Small open-source models can behave like real agents if the runtime owns the protocol
Posted by -eth3rnit3-@reddit | LocalLLaMA | 5 comments
I’ve been working on a Ruby project called Kernai.
It’s technically an agent runtime, but I’m not trying to make the 100th “agent harness”. The thing I wanted to explore was a bit different:
what happens if the runtime owns the execution protocol, instead of depending on provider-native tool calling, framework abstractions, or huge prompt-injected tool registries?
The core idea is very small:
- the model emits structured blocks
- the kernel parses them
- the kernel executes skills, protocol calls, and workflows
- results are injected back into the context
- the loop repeats until completion
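The loop above can be sketched in a few lines of Ruby. Everything here (`run_agent`, the `<action>` block format, the `RESULT:` injection) is an illustrative assumption, not Kernai's actual API — the point is that any text-only model can drive it, since the kernel owns parsing and execution:

```ruby
# Minimal sketch of a kernel-owned execution loop (hypothetical names,
# not Kernai's real API). The model emits plain-text <action> blocks;
# the kernel parses them, runs the matching skill, injects the result,
# and loops until the model replies without a block.

BLOCK = %r{<action>\s*(?<name>\w+)\s+(?<arg>.*?)\s*</action>}m

def run_agent(model, skills, prompt, max_steps: 10)
  transcript = prompt.dup
  max_steps.times do
    reply = model.call(transcript)                 # plain text in, plain text out
    match = BLOCK.match(reply)
    return reply unless match                      # no block => final answer
    skill  = skills.fetch(match[:name]) { ->(_) { "error: unknown skill" } }
    result = skill.call(match[:arg])
    transcript << "\n#{reply}\nRESULT: #{result}"  # inject result, loop again
  end
  transcript                                       # step budget exhausted
end
```

Because the contract is just text in, text out, nothing here depends on provider-native tool calling.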
What I find interesting is that this makes agent behavior much more portable across models.
Even models with no native tool calling can still work in this setup.
And in my tests, even small open-source models can handle surprisingly complex scenarios if the execution contract is clear enough. They usually take more steps and are less reliable than bigger models, but they still work as agents.
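What an "execution contract" might look like, phrased as a system-prompt fragment (my wording, not Kernai's actual prompt) — a model with no native tool calling only has to emit plain text in a fixed shape the kernel can parse:

```ruby
# Illustrative execution contract (assumed wording, not from the repo):
# the only requirement on the model is emitting text in this shape.

CONTRACT = <<~PROMPT
  To act, emit exactly one block of the form:
    <action>skill_name arguments</action>
  The runtime will execute it and append the result as:
    RESULT: ...
  Reply without an <action> block when you have the final answer.
PROMPT
```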
Another thing I think matters a lot: the agent context stays very light.
A lot of current agent systems inject huge tool definitions, MCP registries, schemas, etc. directly into the prompt. That works, but it bloats the context and mixes everything together from the start.
With this approach, the runtime stays much more exploratory:
- the agent knows it can access commands
- it discovers what exists when needed
- then drills down only when necessary
- and keeps descending toward more precise information before acting
So instead of dumping every skill and every MCP tool into context upfront, the agent explores capabilities progressively:
- list what exists
- inspect the relevant thing
- then call it with the right shape
That keeps the prompt lighter, makes the execution model cleaner, and in practice seems to help even smaller models.
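The list-inspect-call progression above can be sketched as a tiny registry. The `Skill` and `Registry` names are hypothetical, not from the Kernai codebase — the idea is just that the agent only pays context for what it asks about:

```ruby
# Hypothetical sketch of progressive capability discovery (names are
# mine, not Kernai's): three meta commands replace a prompt-injected
# registry of full tool schemas.

Skill = Struct.new(:description, :schema, :handler)

class Registry
  def initialize(skills)
    @skills = skills
  end

  # Step 1: list what exists -- names only, cheap on context
  def list
    @skills.keys.join(", ")
  end

  # Step 2: inspect the relevant thing -- description and argument shape
  def inspect_skill(name)
    s = @skills.fetch(name)
    "#{name}: #{s.description} | args: #{s.schema}"
  end

  # Step 3: call it with the right shape
  def call(name, args)
    @skills.fetch(name).handler.call(args)
  end
end
```

The agent's first turn only ever sees the output of `list`; full schemas enter the context one at a time, on demand.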
I also wanted to keep the whole thing very minimal:
- no runtime dependencies
- no giant abstraction layers
- explicit execution loop
- dynamic skills
- protocols like MCP
- workflow / sub-agent support
- observability built in
There are a bunch of tested scenarios in the repo, including:
- parallel and sequential workflows
- failure recovery
- deadlocks / invalid plans
- multimodal OCR / image flows
- MCP scenarios
- mixed skill + protocol execution
What made it feel real to me is that I’ve already built a personal shell on top of it, and I’m now integrating the same approach into an existing commercial product where agents interact with the app at different levels.
So this isn’t really me trying to launch a shiny new AI framework.
It’s more me sharing an approach that feels simpler, lighter, and more robust than most of what I’ve tried in this space.
Repo if anyone wants to take a look: https://github.com/Eth3rnit3/kernai
Curious what people think, especially if you also feel that a lot of current agent stacks are getting too heavy.
croninsiglos@reddit
Looks like an agent harness, but made worse by not using the provider's pre-parsed tool calls, which are already provided in a clean JSON response.
Why not use Pi?
No_Run8812@reddit
so you created a multi-agent system?
-eth3rnit3-@reddit (OP)
Not exactly.
It can run workflows with sub-agents, so it supports multi-agent patterns, but that’s not really the main idea.
The main idea is more the runtime itself:
a minimal execution kernel based on a universal structured block protocol, with skills, protocols, multimodal support, and observability built around that.
So multi-agent is one capability, not the whole point.
autisticit@reddit
Would love to see a video of a real test. It looks great.
-eth3rnit3-@reddit (OP)
Thanks, and yes, I should probably record a short demo.
Another important part is that observability is native in the runtime, so I can inspect runs very easily, compare models side by side, and see exactly where they diverge.
That made it much easier to test real scenarios, not just toy demos.
The repo already has scenario scripts for things like MCP + local skills, workflows, OCR/image flows, failure recovery, etc., but I agree that a short video would make the whole thing much easier to grasp quickly.