Is there any coding agent that uses local agent for access to source code but can call out to cloud frontier LLMs for thinking?
Posted by Crafty-Lavishness540@reddit | LocalLLaMA | View on Reddit | 15 comments
See title. I'd like to begin work on a project for which I want assurance that the code can't ever be leaked in an OpenAI/Anthropic/Google breach, and I think this is the only way to go about it. Yes, I am being overly paranoid; it is unlikely that they will be breached in the lifetime of their respective companies, but it is reassuring for my anxious mind.
andres_garrido@reddit
What you're describing is possible, but the constraint isn't “local vs cloud”, it's how context is constructed.
If the local agent is responsible for:
- indexing the codebase
- selecting only the minimal relevant pieces
- and sending abstractions instead of raw files
then the cloud model never sees the full code, just a compressed representation of what matters for the task.
Most current setups leak because they just stream files or large chunks directly into the prompt.
The interesting direction is treating the local agent as a retrieval + summarization layer, and the cloud model as a reasoning layer on top of that.
That separation is where hybrid setups start to make sense without exposing everything.
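As a rough illustration of that retrieval layer, here's a minimal sketch (all names like `select_context` are invented for this comment, and real systems would use embeddings rather than keyword overlap). The point is just that ranking and filtering happen locally, and only the winners would ever go into a cloud prompt:

```python
def select_context(task: str, chunks: dict[str, str], k: int = 2) -> list[str]:
    """Rank code chunks by naive keyword overlap with the task description.

    Only the top-k chunk names (and later, their contents) would leave
    the machine; everything else stays local.
    """
    task_words = set(task.lower().split())
    scored = []
    for name, code in chunks.items():
        overlap = len(task_words & set(code.lower().split()))
        scored.append((overlap, name))
    scored.sort(reverse=True)  # highest overlap first
    return [name for _, name in scored[:k]]

# Toy "codebase": three files, only one relevant to the task.
chunks = {
    "auth.py": "def login(user, password): check password hash",
    "billing.py": "def charge(card, amount): stripe charge",
    "utils.py": "def slugify(text): lowercase and dash",
}
print(select_context("fix the password login bug", chunks, k=1))  # → ['auth.py']
```

A real index would be smarter about scoring, but the privacy property comes from the architecture, not the scorer: the cloud model only ever sees what this function returns.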
JamesEvoAI@reddit
I don't see how this would actually be useful in practice. If the point of querying one of the frontier labs' models is to get it to reason about my code and contribute something to it, it needs to be able to see that code. If I'm sending an abstraction, then it's going to give me back code based on that abstraction rather than the ground truth. Now you need a model to de-abstract whatever you get back from OpenAI/Anthropic and smuggle it into your ground-truth code.
At that point you're better off just using a local model; the results will almost certainly be higher quality, since they're grounded in reality rather than in an abstraction.
andres_garrido@reddit
I think you’re right if the abstraction is too lossy.
If you just summarize code into something generic, then yeah, the model is reasoning on something detached from reality and the output won’t map cleanly back.
The cases where this works better are when the “abstraction” is still structurally tied to the code, like selecting specific functions/classes + minimal surrounding context, not a high level summary.
At that point it’s less about compressing meaning and more about filtering what’s relevant.
Fully local models are definitely cleaner from a grounding perspective, but the tradeoff shows up when you need stronger reasoning or planning on top of that.
So it ends up being a spectrum:
- full context → best grounding, worst privacy/cost
- naive abstraction → safe but unreliable
- structured selection → somewhere in the middle where hybrid setups can actually make sense
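To make "structurally tied to the code" concrete, here's a sketch of the middle option using Python's `ast` module: pull the verbatim source of one named function out of a module, so what leaves the machine is real code, just filtered, not a paraphrase. The `extract_function` name and the toy module are made up for illustration:

```python
import ast

SOURCE = '''
import hashlib

def hash_password(pw: str) -> str:
    return hashlib.sha256(pw.encode()).hexdigest()

def unrelated_helper():
    return 42
'''

def extract_function(source: str, name: str) -> str:
    """Return the verbatim source of one top-level function definition.

    The rest of the module (here, unrelated_helper) never gets sent.
    """
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name == name:
            return ast.get_source_segment(source, node)
    raise KeyError(name)

print(extract_function(SOURCE, "hash_password"))
```

Because the extracted text is the actual source, the model's answer maps straight back onto the file, which avoids the de-abstraction problem you'd get with a summary.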
Crafty-Lavishness540@reddit (OP)
FYI you're talking to a bot.
andres_garrido@reddit
Yeah, fair, I probably explained it too abstractly. What I meant in practice is something like:
instead of sending whole files, you extract specific functions/classes + a short summary locally, and only send that.
Otherwise you hit exactly what you said: the model is reasoning on something that's too far from the real code. The tricky part is finding that balance; too much abstraction breaks it, too little leaks everything.
JamesEvoAI@reddit
> can't ever be leaked in an OpenAI/Anthropic/Google breach and I think this is the only way to go about it.
The only way to ensure your data doesn't get leaked to the large labs is to not use them.
There is no difference, from the model's perspective, between a thinking token and a regular output token; they're just in different parts of the output template. For a model to think about your code, it has to see your code.
There are methods like those described by andres_garrido and others, but you're going to end up with worse-quality results, and you're almost certainly going to end up leaking code at some point regardless.
Ok_Technology_5962@reddit
This is confusing, since if you call cloud agents you will leak the code. But yes, both Hermes and open claw have this. I use it regularly. Main is local; subagent is gpt 5.4 or something else. Once complete, it gives the answer back to the local model.
Crafty-Lavishness540@reddit (OP)
Well, ideally the agent would not pass the entirety of the code to the LLM, but just an overall view of what is needed to understand it and solve the problem, plan, etc.
Ok_Technology_5962@reddit
Hermes or open claw. Or ask any agent to sketch up Python scripts for this.
Crafty-Lavishness540@reddit (OP)
How's it working out for you?
Ok_Technology_5962@reddit
Fine. I use qwen 397b locally and ask the subagent for hard tasks. It usually limits cost. Works fine. Don't have too many examples; only when I needed something fixed in openclaw itself, I asked opus 4.6 to take over and change the code a bit.
Crafty-Lavishness540@reddit (OP)
cheers
andres_garrido@reddit
Yeah, that’s the right intuition, but the tricky part is that “overall view” still has to be constructed somewhere.
Context windows don’t inherently protect you, they just limit how much you send. If your system builds that view by summarizing or extracting structure locally first, then you can control what actually leaves your machine.
So it becomes less about hybrid agents and more about where the abstraction happens. If raw code is transformed locally into higher level context, you can get pretty close to what you’re describing without exposing full files.
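One way to picture "transformed locally into higher-level context": build a signature-only outline of a module with the `ast` module, so the cloud model can plan against the structure without ever seeing function bodies or the secrets inside them. The `outline` function and the toy module below are invented for illustration:

```python
import ast

MODULE = '''
class PaymentGateway:
    def charge(self, card: str, cents: int) -> bool:
        secret = "sk_live_..."  # stays on this machine
        return cents > 0

def refund(tx_id: str) -> None:
    pass
'''

def outline(source: str) -> list[str]:
    """List class and function signatures; bodies are never included."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            lines.append(f"class {node.name}")
        elif isinstance(node, ast.FunctionDef):
            args = ", ".join(a.arg for a in node.args.args)
            lines.append(f"def {node.name}({args})")
    return lines

print(outline(MODULE))
```

The outline is enough for high-level planning ("add a retry to charge"), and the local agent can then apply the plan against the real bodies, which never left the machine.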
DedsPhil@reddit
That's not how context windows work.
Crafty-Lavishness540@reddit (OP)
?