it's all about the harness
Posted by Emotional-Breath-838@reddit | LocalLLaMA | View on Reddit | 30 comments
over the arc of local model history (the past six weeks) we have reached a plateau in models and quantization that would have left our ancient selves (back in the 2025 dark ages) stunned and gobsmacked at the progress we currently enjoy.
Gemma and (soon) Qwen3.6 and 1bit PrismML and on and on.
But now, we must see advances in the harness. This is where our greatest source of future improvement lies.
Has anyone taken the time to systematically test the harnesses the same way so many have done with models?
if I had a spare day to code something that would shake up the world, it would be a harness comparison tool that lets users select their hardware and model and then outputs which harness has the advantage.
Recommend a harness, tell me my premise is wrong, or claim that my writing style reeks of AI slop (even though this was all single-tapped, AI-free, on my iOS keyboard with spell check off, since iOS spellcheck is broken...)
layer4down@reddit
And no need to build from scratch. Just fork OpenCode or similar and off ya go.
sersoniko@reddit
There's also Claw-Code now; however, I'm not sure how long it's going to last with Anthropic trying to take it down.
AurumDaemonHD@reddit
Always from scratch; I wouldn't fork. They don't have anything valuable in them anyway.
NotArticuno@reddit
What clown wrote this 😂
AurumDaemonHD@reddit
I'm the whole circus bro, never underestimate
NotArticuno@reddit
This response is funny enough I don't even want to call you a clown anymore LMAO.
But fr, you sound goofy saying stuff like that, dude. I'm sure you're super good at programming, but saying it like that sounds like you're trying to cosplay as Batman.
AurumDaemonHD@reddit
Thank you that was kind. You are right. The demon in me shadows the gold sometimes. A lifelong journey to learn to live with it and master it.
rorykoehler@reddit
Not much advantage to doing that. You can also fork codex cli. But coding from scratch gives you more possibilities
FeiX7@reddit
Optimization is all we need
Available-Craft-5795@reddit
Attention is All You Need? Nah, Optimization is All You Need.
cleverusernametry@reddit
Another idiotic neologism: "harness" and "harness engineering". The LLM was always a component to be used as part of a software system. Think of it like a wheel: so far we've been seeing unicycles, and we're only now starting to see primitive cars.
Pleasant-Shallot-707@reddit
I agree the harness is the important part; however, I'd say that all we've seen so far is potential. We still need to see more 1-bit models, wider adoption of turboquant, and PowerInfer becoming fully available.
Things will be amazing for local models by EOY.
Inevitable_Raccoon_9@reddit
Which harness do you mean? From what I've seen, no one is adding governance and security - guardrails and firewalls - to any LLM.
So instead of just more and more band-aids plastered onto models - which don't fix the REAL problem - I built my own solution, with governance, security, and budgets built into the foundation. Effectively guarding LLMs.
boutell@reddit
Why do people keep trying to solve this problem from inside the harness when Docker and even plain old Unix permissions are right there?
Extremely lazy version:
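A minimal sketch of what that lazy version could look like, wrapping the agent CLI in a throwaway container so the sandboxing lives outside the harness (the base image, UID, and paths here are assumptions, not the commenter's actual setup):

```python
import subprocess

def build_sandbox_cmd(project_dir: str, agent_cmd: list[str]) -> list[str]:
    """Build a docker invocation that confines the agent to the project dir."""
    return [
        "docker", "run", "--rm",
        "--network", "none",           # block exfiltration of the project
        "--user", "1000:1000",         # plain old Unix permissions do the rest
        "-v", f"{project_dir}:/work",  # only the project is visible inside
        "-w", "/work",
        "python:3.12-slim",            # assumed base image
        *agent_cmd,
    ]

def run_sandboxed(project_dir: str, agent_cmd: list[str]) -> int:
    """Run the agent inside the container and return its exit code."""
    return subprocess.run(build_sandbox_cmd(project_dir, agent_cmd)).returncode
```

Dropping `--network none` restores internet access if the agent needs to fetch packages, at the cost of the exfiltration guard.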
The only use case I can see for more than that is blocking external network access, to guard against any risk of the project itself being exfiltrated... but most of those risks are easily exploited against human devs too.
Do this at your own risk obvs
AurumDaemonHD@reddit
Exactly. The market is so bad we are all building the same thing in isolation.
Inevitable_Raccoon_9@reddit
I've been building my tool for 4 weeks now - and I figured out ONE thing.
In the past 2 years NOBODY built the tool that is needed!
In the past 4 weeks, not ONE developer realized what the ONLY SAFE solution is!
I am sorry - but my tool fixes exactly what is necessary - so I am sure many people will scratch their heads and ask: WHY didn't anyone else build it THIS way?
AurumDaemonHD@reddit
The last straw for me was Claw. It's a dumpster fire. The fix? Nemoclaw - a Rust-engine policy proxy on top of it.
How do you fix bad design? It's wild to me how all these people are oblivious to the simple truth. It seems to me like common sense is dying. All that's left is social dogma, FOMO, and hype.
Inevitable_Raccoon_9@reddit
Please have a look at https://github.com/GoetzKohlberg/sidjua - I will hopefully get v1.1 out in a few days so that the Openclaw importer and MCP tools are available. Hope it helps :)
thrownawaymane@reddit
If you think this is novel please make a separate post so that it can be evaluated.
AurumDaemonHD@reddit
Cool. I don't know TS much; I do my app in Python with Litestar and HTMX/Alpine/Tailwind, cuz JavaScript hurt me, but it only made me stronger.
Look into nono and zerobox if you want to isolate processes: one goes with seccomp and bubblewrap, the other with Landlock.
I do 2 containers with an API in between, one secure and one insecure, both rootless with SELinux.
I wouldn't support Windows; that platform is going to be dead soon in my eyes, hence I use Podman systemd quadlets. My credo is: support 1 system well rather than many poorly. So Linux it is. Postgres, Arize Phoenix, and so on. Of course it's a plugin architecture, so you can replace things if you don't like them, but I ain't doing it.
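For reference, a rootless Podman quadlet for one such isolated agent container could look roughly like this (the file name, image, and paths are illustrative assumptions, not the setup described above):

```ini
# ~/.config/containers/systemd/agent-sandbox.container (hypothetical name)
[Unit]
Description=Untrusted agent container, rootless

[Container]
Image=docker.io/library/python:3.12-slim
Network=none
SecurityLabelType=container_t
Volume=%h/projects/demo:/work:Z
Exec=python /work/agent.py

[Install]
WantedBy=default.target
```

systemd generates a user service from this at login; the insecure twin would get its own `.container` file with network access but a tighter volume list.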
Your governance layer is external, and while it works, I'm not sure it's the best. I mean, why not solve the problem at the core, directly inside the agentic graphs?
Though your point that the md files are a strongly worded suggestion is spot on.
Look_0ver_There@reddit
You are correct. The harness counts for a lot.
I've tried OpenCode, Aider, OhMyPi, something I can't remember now, and ForgeCode.
ForgeCode is my current favorite. It does require you to use zsh. I'm an old school bash user, but once I sat down and looked at what zsh brought to the table it was an easy decision to switch to zsh.
ForgeCode lets you either enter full agentic mode or just fire off one-off requests to the agent from your regular command line by starting the line with a : character. It uses multiple agent types (akin to an analyser, a planner, and an implementer), and it has a completely optional free online integration that acts as an overseer to your local ForgeCode agents to guide them a little better.
ForgeCode ranks as the top agent for coding over on the Terminal Bench rankings, even beating out Claude Code when using Claude Opus as the back end LLM model.
You can use any models you want with it of course, including local.
Emotional-Breath-838@reddit (OP)
really good to see actual usage feedback!
Look_0ver_There@reddit
I was doing a session with ForgeCode last night, and trialing out Qwen3-Coder-Next on a new hardware setup. Q3CN started off at 50t/s, but was down to 20-25t/s at 150K+ context depth. The good news though was that ForgeCode was still able to make the whole experience feel not that different to using native ClaudeCode+Opus/Sonnet as the back end in terms of speed and interactivity. That was certainly some "trick" it was pulling.
Watching the llama.cpp logs, I could see that every request from ForgeCode was properly hitting the prompt cache, whereas most other agents cause occasional misses that slow the whole shebang down significantly. I think this right here is what the ForgeCode team have properly focused on and sorted out, over and above the other coding agents.
You can read some of their blogs regarding their focus on tooling correctness here:
https://forgecode.dev/blog/benchmarks-dont-matter/
https://forgecode.dev/blog/gpt-5-4-agent-improvements/
I believe that this last section of the blog here speaks almost completely to your opening post:
https://forgecode.dev/blog/gpt-5-4-agent-improvements/#what-comes-next
DeepOrangeSky@reddit
I am a noob and don't know what harnesses are or what they do or what the different types are or how people use them, etc. (Right now I'm just running models in LM Studio, without doing any modifications or knowing how to do anything fancy with them yet).
Can you explain in a way that a noob can understand, what harnesses are/what I need to know about them, why they are important, etc?
amb007_@reddit
I would include plugins as viable harnesses, e.g. https://github.com/microsoft/skills/tree/main/.github/plugins/deep-wiki (built based on full apps, reusable by Claude). It improves a lot on naively guiding an LLM to document a codebase.
341913@reddit
Here's an example: I built an app that lets users receiving stock into our warehouses take a picture of an invoice, which AI then extracts and automatically captures in our ERP. Pretty simple, right? Not quite.
AI has a tendency to hallucinate so the bulk of the effort went into building a harness which catches the AI attempting to cheat.
When you scan the invoice, you need to look up the purchase order in the app and also enter the total incl. tax into the app. Traditional code calling APIs.
This total, along with the image(s) of the invoice is sent to AI 1, qwen VL, that extracts the data. The output from AI 1, along with the original PO is then sent to AI 2, something like gemini flash, to reason and map the supplier codes to the internal codes required by the ERP.
When AI 2 is done, a scoring engine is run - boring code doing math - which measures AI consensus, i.e. AI 1 said the invoice had 20 lines but AI 2 says it's 21: a clear hallucination. It does a bunch of other simple calcs, like checking that total / units = unit price and that the internal item codes mapped by AI 2 actually exist on the PO, etc.
Based on this, a confidence score is calculated which determines whether the invoice can be posted to the ERP or flagged for human review.
That is a harness.
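As a rough illustration of that scoring engine - the field names, tolerances, and threshold below are made up for the sketch, not the actual app's code:

```python
def consensus_score(ai1_lines: list[dict], ai2_lines: list[dict],
                    po_codes: set[str], entered_total: float) -> float:
    """Fraction of sanity checks that pass across the two AI extractions."""
    checks = []
    # AI consensus: both extractors must agree on the invoice line count
    checks.append(len(ai1_lines) == len(ai2_lines))
    # Internal codes mapped by AI 2 must actually exist on the PO
    checks.append(all(l["internal_code"] in po_codes for l in ai2_lines))
    # Arithmetic sanity: qty * unit price should reproduce each line total
    checks.append(all(abs(l["qty"] * l["unit_price"] - l["total"]) < 0.01
                      for l in ai1_lines))
    # Grand total must match what the user typed into the app
    checks.append(abs(sum(l["total"] for l in ai1_lines) - entered_total) < 0.01)
    return sum(checks) / len(checks)

def disposition(score: float, threshold: float = 1.0) -> str:
    """Post automatically only when every check passed."""
    return "post_to_erp" if score >= threshold else "human_review"
```

The interesting part is that none of this is AI: it is deterministic math that refuses to trust the models until they agree with each other and with the human-entered total.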
plaintexttrader@reddit
LLMs are the engines. Harnesses are the rest of the car, including the transmission, drive train, suspension, etc. LLMs do basic question answering and reasoning. Harnesses wrap around LLMs and make them much more useful by augmenting them with capabilities like tool calling, web search, querying for information, multi step reasoning, multi session memory, etc. that makes for smarter and more useful applications.
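A toy sketch of the simplest thing a harness adds, a tool-calling loop around a bare chat model (the message format, tool set, and `call_model` callable are all illustrative assumptions, not any particular product's API):

```python
TOOLS = {
    "search": lambda q: f"results for {q!r}",  # stub web search
    "calc":   lambda expr: str(eval(expr)),    # toy calculator (eval: demo only)
}

def harness_loop(call_model, user_msg: str, max_steps: int = 5) -> str:
    """Feed tool results back to the model until it gives a plain answer."""
    messages = [{"role": "user", "content": user_msg}]
    for _ in range(max_steps):
        reply = call_model(messages)  # model either answers or requests a tool
        if reply.get("tool") in TOOLS:
            result = TOOLS[reply["tool"]](reply["args"])
            messages.append({"role": "tool", "content": result})
            continue
        return reply["content"]       # plain answer: we're done
    return "step budget exhausted"
```

Everything a real harness does (web search, memory, multi-step planning) is some elaboration of this loop: intercept the model's output, do real work, feed the result back in.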
theUmo@reddit
LM Studio is more or less raw-dogging your model. It has a system prompt, you open a chat, you type a thing, it responds; lather, rinse, repeat. You just have one context throughout the conversation, and it more or less contains your conversation history for that session.
A harness is just an app or other way of running the model that adds some structure in to try to overcome some of the weaknesses of working with a raw chat.
madaradess007@reddit
applications are what we need; these things are like fun little e-motors that haven't been put into washing machines, scooters, and e-bikes yet
NotArticuno@reddit
I think this is what a huge amount of people are working on, and I totally agree!