I built a coding agent that gets 87% on benchmarks with a 4B parameter model, here's how

Posted by Glittering_Focus1538@reddit | LocalLLaMA | View on Reddit | 321 comments

I built a coding agent that gets 87% on benchmarks with a 4B parameter model, here's how

I was frustrated that every coding agent (OpenCode, Cursor, Claude Code) assumes you're running GPT-5.4 or Claude Opus. If you try them with a local model like Gemma or Qwen they fall apart. I find that often tool calls fail, context overflows, multi-step tasks collapse.

So I built SmallCode. It's designed from the ground up for small local models.

The result: 87/100 benchmark tasks pass with a Gemma 4 model that only activates 4B parameters per token. OpenCode scores \~75% with 14B models. The harness does the heavy lifting, not the model size.

How it works (the tricks that make small models reliable):

What it looks like:

Full-screen terminal UI (like OpenCode/vim), scrollable chat, command palette with /, plugin system, persistent memory across sessions.

What it doesn't do:

Install:

npm install -g smallcode
cd your-project
smallcode

Point it at LM Studio, Ollama, or any OpenAI-compatible endpoint.

MIT licensed, everything's on GitHub: https://github.com/Doorman11991/smallcode

Happy to answer questions about the architecture or benchmark methodology.