Can I use Cursor Agent (or similar) with a local LLM setup (8B / 13B)?
Posted by BudgetPurple3002@reddit | LocalLLaMA | 8 comments
Hey everyone, I want to set up a local LLM (running 8B and possibly 13B parameter models). I was wondering if tools like Cursor Agent (or other AI coding agents) can work directly with my local setup, or if they require cloud-based APIs only.
Basically:
Is it possible to connect Cursor (or any similar coding agent) to a local model?
If not Cursor specifically, are there any good agent frameworks that can plug into local models for tasks like code generation and project automation?
Would appreciate any guidance from folks who’ve tried this. 🙏
grabber4321@reddit
VSCode RooCode with Qwen2.5-Coder-7B (search for Cline)
Arkonias@reddit
Cursor doesn't really work well with local models.
Use Roo Code or Cline in VSCode with your localhost server of choice.
FrozenBuffalo25@reddit
I use Cursor with Qwen Coder and it works just fine. Used it before with Phi4 when I had less VRAM and it worked too.
What’s the problem?
BudgetPurple3002@reddit (OP)
Appreciate the clarification! Just to double-check, if I run Roo Code or Cline against a local API (Ollama / LM Studio etc.), the agent-style functionality (code generation + iterative updates from a prompt) still works fine, correct?
ResidentPositive4122@reddit
Eh... debatable. Small models aren't really that strong: they suffer from context degradation quickly and give subpar results overall. They might work (as in, generate tool calls etc.) but will probably get confused pretty quickly. Just because they mimic agentic behaviour doesn't mean they're actually good at it.
We've had good success with bigger models though. Devstral is surprisingly good for its size. GLM has a ~100B model that's decent at agentic stuff. Qwen has some models too, but it's highly dependent on what your needs are.
Cline/Roo/Kilo all have prompt-editing capabilities. Make sure to edit the prompts and use the minimum necessary for your tool needs. If you use the out-of-the-box prompts, you'll notice the very first query is already in the tens of thousands of tokens. That's too much for small models. You'll have to experiment with what you actually need in the prompt, and with how well the small models adhere to it. Trim a lot, and only add parts back as you notice the model failing.
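A quick way to sanity-check whether a trimmed prompt will even fit a small model is a rough token estimate. This is a minimal sketch using the common ~4-characters-per-token heuristic (an approximation, not the model's real tokenizer); the context-window and reserve numbers are assumptions you'd adjust for your model:

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    # An approximation only, not the model's actual tokenizer.
    return max(1, len(text) // 4)

def fits_context(prompt: str, context_window: int = 8192, reserve: int = 2048) -> bool:
    # Leave `reserve` tokens free for the model's reply and tool output.
    return estimate_tokens(prompt) <= context_window - reserve

# Example: a heavily trimmed system prompt easily fits an 8K context.
trimmed = "You are a coding assistant. Use the provided tools to edit files."
print(estimate_tokens(trimmed), fits_context(trimmed))
```

If your exported agent prompt estimates at 20K+ tokens against an 8K window, that alone explains the confusion small models show.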
mearyu_@reddit
yes, see GIF https://media.githubusercontent.com/media/RooCodeInc/Roo-Code/main/src/assets/docs/demo.gif
Blizado@reddit
You can use the OpenAI API setting inside Cursor and change the API URL to a local LLM with an OpenAI-compatible API. But it looks like some Cursor features stop working when you use a custom API; I got a warning about it right away, so I haven't even tested it yet.
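For anyone wondering what "OpenAI-compatible API" means concretely: most local servers (LM Studio, Ollama, llama.cpp server) expose a `/v1/chat/completions` endpoint with the same request shape. A stdlib-only sketch of building that request, assuming LM Studio's default port 1234 and a hypothetical model name (swap in whatever your server reports):

```python
import json
import urllib.request

def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    # OpenAI-compatible chat-completions payload; most local OpenAI-style
    # servers accept this shape at /v1/chat/completions.
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Local servers typically ignore the API key, but the header
            # keeps OpenAI client libraries happy.
            "Authorization": "Bearer not-needed",
        },
        method="POST",
    )

req = build_chat_request("http://localhost:1234",
                         "qwen2.5-coder-7b-instruct",  # assumed model id
                         "Write a hello world in Python.")
# resp = urllib.request.urlopen(req)  # uncomment with a local server running
```

Tools like Cline/Roo do essentially this under the hood; you just point their base URL at your localhost port (1234 for LM Studio, 11434 for Ollama by default).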
Electronic-Ad2520@reddit
Cline works fine with a local setup. I use Qwen3 Coder with the LM Studio API. But you can run Grok Code Fast for free in Cline right now. It's not Claude, but for small tasks it's enough.