How I replaced Gemini CLI & Copilot with a local stack using Ollama, Continue.dev and MCP servers
Posted by aaronsky@reddit | LocalLLaMA | 18 comments
Over the last few weeks I’ve been trying to get off the treadmill of cloud AI assistants (Gemini CLI, Copilot, Claude-CLI, etc.) and move everything to a local stack.
Goals:
- Keep code on my machine
- Stop paying monthly for autocomplete
- Still get “assistant-level” help in the editor
The stack I ended up with:
- Ollama for local LLMs (Nemotron-9B, Qwen3-8B, etc.)
- Continue.dev inside VS Code for chat + agents
- MCP servers (Filesystem, Git, Fetch, XRAY, SQLite, Snyk…) as tools
What it can do in practice:
- Web research from inside VS Code (Fetch)
- Multi-file refactors & impact analysis (Filesystem + XRAY)
- Commit/PR summaries and diff review (Git)
- Local DB queries (SQLite)
- Security / error triage (Snyk / Sentry)
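To make the wiring concrete, here's a minimal sketch of how a setup like this looks in Continue's `config.yaml` (this is my illustration, not the OP's actual config — field names follow Continue's YAML config format, the model tag and paths are placeholders, and the MCP entries use the reference server packages; check the schema for your installed version):

```yaml
# Hypothetical Continue config.yaml — placeholders throughout
name: local-stack
version: 0.0.1
schema: v1
models:
  - name: Qwen3 8B (local)
    provider: ollama          # talks to the local Ollama server
    model: qwen3:8b
    roles: [chat, edit, apply]
mcpServers:
  - name: filesystem
    command: npx
    args: ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/project"]
  - name: git
    command: uvx
    args: ["mcp-server-git", "--repository", "/path/to/project"]
```

Each `mcpServers` entry is just a command Continue spawns and speaks MCP to over stdio, which is why you can mix Node-based (`npx`) and Python-based (`uvx`) servers freely.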
I wrote everything up here, including:
- Real laptop specs (Win 11 + Radeon RX 6650M, 8 GB VRAM)
- Model selection tips (GGUF → Ollama)
- Step-by-step setup
- Example “agent” workflows (PR triage bot, dep upgrader, docs bot, etc.)
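For anyone who hasn't done the GGUF → Ollama step before, the short version is a tiny Modelfile (a sketch — the GGUF filename and context size are placeholders for whatever quant you downloaded):

```
# Modelfile — import a local GGUF into Ollama
FROM ./qwen3-8b-q4_k_m.gguf
PARAMETER num_ctx 8192
```

Then `ollama create qwen3-local -f Modelfile` registers the model, and `ollama run qwen3-local` (or your editor pointing at the `qwen3-local` tag) uses it.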
Main article:
https://aiandsons.com/blog/local-ai-stack-ollama-continue-mcp
Repo with docs & config:
https://github.com/aar0nsky/blog-post-local-agent-mcp
Also cross-posted to Medium if that's easier to read.
Curious how other people are doing local-first dev assistants (what models + tools you’re using).
Hot-Employ-3399@reddit
For nothing serious, I used aider with qwen30 today. I got a simple Infinite Craft clone (no UI, but readline with autocomplete) that was OK to play after a couple of fixes (e.g. the prompt to combine two words in the initial version put the words at the beginning; after telling it to move them to the end, the game became much faster because the llama.cpp cache finally worked and wasn't discarded completely).
I didn't like the code though, and the YouTube tutorials linked from the aider site itself are poor quality compared to typical gamedev tutorials (e.g. freeCodeCamp).
For Vim, I use llama.vim for autocomplete with Granite, due to its speed.
Sometimes I also copy-paste code into mikupad and ask the model to review it, or use it as a template for similar code.
I used Roo, but at best it wasn't that good, and at worst its context was so big it cancelled the request to the local server due to a timeout.
StardockEngineer@reddit
Continue is terrible. What?
aaronsky@reddit (OP)
Are you specifically referring to the vscode extension or the cli agent or you just mean the platform? I am just using the vscode extension for the purposes of the blog post but there could be alternatives depending on your needs.
StardockEngineer@reddit
Extension. It's gotten so obtuse.
aaronsky@reddit (OP)
Tysm for clarifying. I've been exploring alternatives as well. It worked for what I needed at the time, but there are definitely other attractive options out there.
StardockEngineer@reddit
It was for me, too. But now it’s not even top ten. Keep exploring!
Ill_Barber8709@reddit
I replaced everything with Zed + LMStudio. Totally hassle free.
Zed handles everything related to the coder agent (git, create file etc.), and LMStudio handles the local server, the models, and some MCP tools like web search and visit website.
aaronsky@reddit (OP)
This looks promising for assisted development, for sure. Most of what I need agent assistance with is bug fixing and enhancements, and whether the agent can accomplish the task without my interaction is the priority. Zed supports Ollama as well, I believe, and I've also had success with LMStudio in a different setting.
artificial-dopamine@reddit
I can't seem to get agent mode working properly in Continue.dev and Ollama, no matter what model I use or what I put in the config file. I've tried Qwen2.5 and Qwen3 32B in a few flavours, Mistral, Devstral, Gemma 3, gpt-oss, etc. I'm working with a 3090 and wondering if I should change to Cline on the front end, or swap to llama.cpp or vLLM on the back end. Any suggestions?
StardockEngineer@reddit
Stop using Continue.
artificial-dopamine@reddit
What is the best alternative?
StardockEngineer@reddit
Roo. Cline. Maybe Kilo (I haven’t tried this). VSCode Insiders has local inference now, too. Continue is not even in the conversation. They’ve lost their minds and made it too bloated and difficult to use.
artificial-dopamine@reddit
Also I am struggling to get it to put all of the files that I need into the prompt context.
Aggressive-Bother470@reddit
Buy more 3090s and jack the context.
artificial-dopamine@reddit
What kind of motherboard is best for running multiple? I can only fit one on my current one.
Aggressive-Bother470@reddit
Yes, swap to llama.cpp and Roo.
End of problems.
g_rich@reddit
Goose (from Block, not goose.ai) is another good option.
PotentialFunny7143@reddit
Did you try opencode?