How I replaced Gemini CLI & Copilot with a local stack using Ollama, Continue.dev and MCP servers
Posted by aaronsky@reddit | LocalLLaMA | 18 comments
Over the last few weeks I’ve been trying to get off the treadmill of cloud AI assistants (Gemini CLI, Copilot, Claude-CLI, etc.) and move everything to a local stack.
Goals:
- Keep code on my machine
- Stop paying monthly for autocomplete
- Still get “assistant-level” help in the editor
The stack I ended up with:
- Ollama for local LLMs (Nemotron-9B, Qwen3-8B, etc.)
- Continue.dev inside VS Code for chat + agents
- MCP servers (Filesystem, Git, Fetch, XRAY, SQLite, Snyk…) as tools
What it can do in practice:
- Web research from inside VS Code (Fetch)
- Multi-file refactors & impact analysis (Filesystem + XRAY)
- Commit/PR summaries and diff review (Git)
- Local DB queries (SQLite)
- Security / error triage (Snyk / Sentry)
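To make the wiring concrete, here's a minimal sketch of how a setup like this looks in Continue's `config.yaml` (this is my illustration, not the OP's actual config — field names follow Continue's YAML config format, the model tag and paths are placeholders, and the MCP entries use the reference server packages; check the schema for your installed version):

```yaml
# Hypothetical Continue config.yaml — placeholders throughout
name: local-stack
version: 0.0.1
schema: v1
models:
  - name: Qwen3 8B (local)
    provider: ollama          # talks to the local Ollama server
    model: qwen3:8b
    roles: [chat, edit, apply]
mcpServers:
  - name: filesystem
    command: npx
    args: ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/project"]
  - name: git
    command: uvx
    args: ["mcp-server-git", "--repository", "/path/to/project"]
```

Each `mcpServers` entry is just a command Continue spawns and speaks MCP to over stdio, which is why you can mix Node-based (`npx`) and Python-based (`uvx`) servers freely.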
I wrote everything up here, including:
- Real laptop specs (Win 11 + Radeon RX 6650M, 8 GB VRAM)
- Model selection tips (GGUF → Ollama)
- Step-by-step setup
- Example “agent” workflows (PR triage bot, dep upgrader, docs bot, etc.)
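For anyone who hasn't done the GGUF → Ollama step before, the short version is a tiny Modelfile (a sketch — the GGUF filename and context size are placeholders for whatever quant you downloaded):

```
# Modelfile — import a local GGUF into Ollama
FROM ./qwen3-8b-q4_k_m.gguf
PARAMETER num_ctx 8192
```

Then `ollama create qwen3-local -f Modelfile` registers the model, and `ollama run qwen3-local` (or your editor pointing at the `qwen3-local` tag) uses it.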
Main article:
https://aiandsons.com/blog/local-ai-stack-ollama-continue-mcp
Repo with docs & config:
https://github.com/aar0nsky/blog-post-local-agent-mcp
Also cross-posted to Medium if that's easier to read.
Curious how other people are doing local-first dev assistants (what models + tools you’re using).
Hot-Employ-3399@reddit
For nothing serious, I used aider with qwen30 today. I got a simple Infinite Craft clone (no UI, but readline with autocomplete) that was OK to play after a couple of fixes (e.g. the prompt to combine two words in the initial version put the words at the beginning; after telling it to move them to the end, the game became much faster because the llama.cpp cache finally worked and wasn't discarded completely).
I didn't like the code though, and the YouTube tutorials linked from the aider site itself are poor quality compared to typical gamedev tutorials (e.g. freeCodeCamp).
For Vim, I use llama.vim for autocomplete with Granite, due to its speed.
Sometimes I also copy-paste code into mikupad and ask the model to review it, or use it as a template for similar code.
I used Roo, but at best it wasn't that good, and at worst its context was so big it cancelled the request to the local server due to a timeout.
StardockEngineer@reddit
Continue is terrible. What?
aaronsky@reddit (OP)
Are you specifically referring to the vscode extension or the cli agent or you just mean the platform? I am just using the vscode extension for the purposes of the blog post but there could be alternatives depending on your needs.
StardockEngineer@reddit
Extension. It's gotten so obtuse.
aaronsky@reddit (OP)
Tysm for clarifying. I've been exploring alternatives as well. It worked for what I needed at the time, but there are definitely other attractive options out there.
StardockEngineer@reddit
It was for me, too. But now it’s not even top ten. Keep exploring!
Ill_Barber8709@reddit
I replaced everything with Zed + LMStudio. Totally hassle free.
Zed handles everything related to the coder agent (git, create file etc.), and LMStudio handles the local server, the models, and some MCP tools like web search and visit website.
aaronsky@reddit (OP)
This looks promising for assisted development, for sure. Most of what I need agent assistance with is bug fixing and enhancements, and whether the agent can accomplish the task without my interaction is the priority. Zed supports Ollama as well, I believe, and I've also had success with LMStudio in a different setting.
artificial-dopamine@reddit
I can't seem to get agent mode working properly in Continue.dev and Ollama, no matter what model I use or what I put in the config file. I've tried Qwen2.5 and Qwen3 32B in a few flavours, Mistral, Devstral, Gemma 3, gpt-oss, etc. I'm working with a 3090 and wondering if I should change to Cline on the front end, or swap to llama.cpp or vLLM on the back end. Any suggestions?
StardockEngineer@reddit
Stop using Continue.
artificial-dopamine@reddit
What is the best alternative?
StardockEngineer@reddit
Roo. Cline. Maybe Kilo (I haven’t tried this). VSCode Insiders has local inference now, too. Continue is not even in the conversation. They’ve lost their minds and made it too bloated and difficult to use.
artificial-dopamine@reddit
Also I am struggling to get it to put all of the files that I need into the prompt context.
Aggressive-Bother470@reddit
Buy more 3090s and jack the context.
artificial-dopamine@reddit
What kind of motherboard is best for running multiple? I can only fit one on my current one.
Aggressive-Bother470@reddit
Yes, swap to llama.cpp and Roo.
End of problems.
g_rich@reddit
Goose (from Block, not goose.ai) is another good option.
PotentialFunny7143@reddit
Did you try opencode?