Small open-source models can behave like real agents if the runtime owns the protocol
Posted by -eth3rnit3-@reddit | LocalLLaMA | 5 comments
I’ve been working on a Ruby project called Kernai.
It’s technically an agent runtime, but I’m not trying to make the 100th “agent harness”. The thing I wanted to explore was a bit different:
what happens if the runtime owns the execution protocol, instead of depending on provider-native tool calling, framework abstractions, or huge prompt-injected tool registries?
The core idea is very small:
- the model emits structured blocks
- the kernel parses them
- the kernel executes skills, protocol calls, and workflows
- results are injected back into the context
- the loop repeats until completion
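The loop above can be sketched in a few lines of Ruby. Everything here (`run_agent`, the `<action>` block format, the `RESULT:` injection) is an illustrative assumption, not Kernai's actual API — the point is that any text-only model can drive it, since the kernel owns parsing and execution:

```ruby
# Minimal sketch of a kernel-owned execution loop (hypothetical names,
# not Kernai's real API). The model emits plain-text <action> blocks;
# the kernel parses them, runs the matching skill, injects the result,
# and loops until the model replies without a block.

BLOCK = %r{<action>\s*(?<name>\w+)\s+(?<arg>.*?)\s*</action>}m

def run_agent(model, skills, prompt, max_steps: 10)
  transcript = prompt.dup
  max_steps.times do
    reply = model.call(transcript)                 # plain text in, plain text out
    match = BLOCK.match(reply)
    return reply unless match                      # no block => final answer
    skill  = skills.fetch(match[:name]) { ->(_) { "error: unknown skill" } }
    result = skill.call(match[:arg])
    transcript << "\n#{reply}\nRESULT: #{result}"  # inject result, loop again
  end
  transcript                                       # step budget exhausted
end
```

Because the contract is just text in, text out, nothing here depends on provider-native tool calling.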
What I find interesting is that this makes agent behavior much more portable across models.
Even models with no native tool calling can still work in this setup.
And in my tests, even small open-source models can handle surprisingly complex scenarios if the execution contract is clear enough. They usually take more steps and are less reliable than bigger models, but they still work as agents.
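What an "execution contract" might look like, phrased as a system-prompt fragment (my wording, not Kernai's actual prompt) — a model with no native tool calling only has to emit plain text in a fixed shape the kernel can parse:

```ruby
# Illustrative execution contract (assumed wording, not from the repo):
# the only requirement on the model is emitting text in this shape.

CONTRACT = <<~PROMPT
  To act, emit exactly one block of the form:
    <action>skill_name arguments</action>
  The runtime will execute it and append the result as:
    RESULT: ...
  Reply without an <action> block when you have the final answer.
PROMPT
```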
Another thing I think matters a lot: the agent context stays very light.
A lot of current agent systems inject huge tool definitions, MCP registries, schemas, etc. directly into the prompt. That works, but it bloats the context and mixes everything together from the start.
With this approach, the runtime stays much more exploratory:
- the agent knows it can access commands
- it discovers what exists when needed
- then drills down only when necessary
- and keeps descending toward more precise information before acting
So instead of dumping every skill and every MCP tool into context upfront, the agent explores capabilities progressively:
- list what exists
- inspect the relevant thing
- then call it with the right shape
That keeps the prompt lighter, makes the execution model cleaner, and in practice seems to help even smaller models.
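The list-inspect-call progression above can be sketched as a tiny registry. The `Skill` and `Registry` names are hypothetical, not from the Kernai codebase — the idea is just that the agent only pays context for what it asks about:

```ruby
# Hypothetical sketch of progressive capability discovery (names are
# mine, not Kernai's): three meta commands replace a prompt-injected
# registry of full tool schemas.

Skill = Struct.new(:description, :schema, :handler)

class Registry
  def initialize(skills)
    @skills = skills
  end

  # Step 1: list what exists -- names only, cheap on context
  def list
    @skills.keys.join(", ")
  end

  # Step 2: inspect the relevant thing -- description and argument shape
  def inspect_skill(name)
    s = @skills.fetch(name)
    "#{name}: #{s.description} | args: #{s.schema}"
  end

  # Step 3: call it with the right shape
  def call(name, args)
    @skills.fetch(name).handler.call(args)
  end
end
```

The agent's first turn only ever sees the output of `list`; full schemas enter the context one at a time, on demand.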
I also wanted to keep the whole thing very minimal:
- no runtime dependencies
- no giant abstraction layers
- explicit execution loop
- dynamic skills
- protocols like MCP
- workflow / sub-agent support
- observability built in
There are a bunch of tested scenarios in the repo, including:
- parallel and sequential workflows
- failure recovery
- deadlocks / invalid plans
- multimodal OCR / image flows
- MCP scenarios
- mixed skill + protocol execution
What made it feel real to me is that I’ve already built a personal shell on top of it, and I’m now integrating the same approach into an existing commercial product where agents interact with the app at different levels.
So this isn’t really me trying to launch a shiny new AI framework.
It’s more me sharing an approach that feels simpler, lighter, and more robust than most of what I’ve tried in this space.
Repo if anyone wants to take a look: https://github.com/Eth3rnit3/kernai
Curious what people think, especially if you also feel that a lot of current agent stacks are getting too heavy.
croninsiglos@reddit
Looks like an agent harness, but made worse by not using the provider's pre-parsed tool calls, which are already provided in a clean JSON response.
Why not use Pi?
No_Run8812@reddit
so you created a multi-agent system?
-eth3rnit3-@reddit (OP)
Not exactly.
It can run workflows with sub-agents, so it supports multi-agent patterns, but that’s not really the main idea.
The main idea is more the runtime itself:
a minimal execution kernel based on a universal structured block protocol, with skills, protocols, multimodal support, and observability built around that.
So multi-agent is one capability, not the whole point.
autisticit@reddit
Would love to see a video of a real test. It looks great.
-eth3rnit3-@reddit (OP)
Thanks, and yes, I should probably record a short demo.
Another important part is that observability is native in the runtime, so I can inspect runs very easily, compare models side by side, and see exactly where they diverge.
That made it much easier to test real scenarios, not just toy demos.
The repo already has scenario scripts for things like MCP + local skills, workflows, OCR/image flows, failure recovery, etc., but I agree that a short video would make the whole thing much easier to grasp quickly.