Built an AI agent runtime in Go that routes each step to a different model — tool calls on gpt-4o-mini, reasoning on gpt-4o, automatically
Posted by Aromatic-Ad-6711 | LocalLLaMA
I have been building ARK, an open-source AI agent runtime that solves three problems I kept hitting:
- Context waste: connecting MCP tools dumps 60K+ tokens of schemas into every prompt. ARK loads only the 3-5 tools relevant to each task — a 99% reduction.
- One model for everything: most frameworks use the same model for simple tool calls and complex reasoning. ARK routes each step to the right model automatically — tool calls go to gpt-4o-mini ($0.15/M), reasoning goes to gpt-4o ($2.50/M). Configurable in one YAML block.
- No cost visibility: you know your API bill, but not which decision costs what. ARK tracks cost per step:
```
Step 1 [tool_call: github_list_repos]   $0.000056  gpt-4o-mini
Step 2 [tool_call: github_list_issues]  $0.000202  gpt-4o-mini
Step 3 [complete]                       $0.000549  gpt-4o
Total: $0.000807 | Fast model: 2 steps | Strong model: 1 step
```
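For a sense of what the "one YAML block" routing might look like, here's a sketch — the key names are my illustration, not ARK's actual schema:

```yaml
# Illustrative only — check the repo for ARK's real config format
routing:
  fast_model: gpt-4o-mini    # simple tool calls ($0.15/M)
  strong_model: gpt-4o       # reasoning / final completion ($2.50/M)
```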
The router learns from failures: if the cheap model fails on a step type, it promotes that step type to the strong model next time. Learning persists across restarts.
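The promotion logic above can be sketched roughly like this — a minimal illustration in Go, with names (`Router`, `RecordFailure`, etc.) that are my own, not ARK's actual API:

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// Router maps step types to models, defaulting to the cheap model
// and promoting a step type to the strong model after a failure.
type Router struct {
	Assignments map[string]string `json:"assignments"` // step type -> model
	fastModel   string
	strongModel string
}

func NewRouter(fast, strong string) *Router {
	return &Router{
		Assignments: map[string]string{},
		fastModel:   fast,
		strongModel: strong,
	}
}

// ModelFor returns the model assigned to a step type; unseen step
// types start on the cheap model.
func (r *Router) ModelFor(stepType string) string {
	if m, ok := r.Assignments[stepType]; ok {
		return m
	}
	return r.fastModel
}

// RecordFailure promotes a step type to the strong model, so the
// next run routes it there directly.
func (r *Router) RecordFailure(stepType string) {
	r.Assignments[stepType] = r.strongModel
}

// Save persists the learned assignments so they survive restarts.
func (r *Router) Save(path string) error {
	data, err := json.Marshal(r.Assignments)
	if err != nil {
		return err
	}
	return os.WriteFile(path, data, 0o644)
}

func main() {
	r := NewRouter("gpt-4o-mini", "gpt-4o")
	fmt.Println(r.ModelFor("tool_call")) // gpt-4o-mini (default)
	r.RecordFailure("tool_call")
	fmt.Println(r.ModelFor("tool_call")) // gpt-4o (promoted)
}
```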
Built entirely in Go. Single binary. Zero dependencies. 106 tests. 11 tools (GitHub, web search, file system, custom HTTP). 3 LLM providers (Anthropic, OpenAI, Ollama).
GitHub: https://github.com/atripati/ark
Would love feedback from anyone building agent infrastructure. What's missing?