it's all about the harness
Posted by Emotional-Breath-838@reddit | LocalLLaMA | View on Reddit | 30 comments
over the arc of local model history (the past six weeks) we have reached a plateau in models and quantization that would have left our ancient selves (back in the 2025 dark ages) stunned and gobsmacked at the progress we currently enjoy.
Gemma and (soon) Qwen3.6 and 1bit PrismML and on and on.
But now, we must see advances in the harness. This is where our greatest source of future improvement lies.
Has anyone taken the time to systematically test the harnesses the same way so many have done with models?
if I had a spare day to code something that would shake up the world, it would be a harness comparison tool that lets users select their hardware and model and then outputs which harness has the advantage.
Recommend a harness, tell me my premise is wrong, or claim that my writing style reeks of AI slop (even though this was all single-tapped, AI-free, on my iOS keyboard with spell check off, since iOS spellcheck is broken...)
layer4down@reddit
And no need to build from scratch. Just fork OpenCode or similar and off ya go.
sersoniko@reddit
There's also Claw-Code now; however, I'm not sure how long it's going to last with Anthropic trying to take it down.
AurumDaemonHD@reddit
Always from scratch; I wouldn't fork. They don't have anything valuable in them anyway.
NotArticuno@reddit
What clown wrote this 😂
AurumDaemonHD@reddit
I'm the whole circus bro, never underestimate
NotArticuno@reddit
This response is funny enough I don't even want to call you a clown anymore LMAO.
But fr, you sound goofy saying stuff like that, dude. I'm sure you're super good at programming, but saying it like that sounds like you're trying to cosplay as Batman.
AurumDaemonHD@reddit
Thank you that was kind. You are right. The demon in me shadows the gold sometimes. A lifelong journey to learn to live with it and master it.
rorykoehler@reddit
Not much advantage to doing that. You can also fork codex cli. But coding from scratch gives you more possibilities
FeiX7@reddit
Optimization is all we need
Available-Craft-5795@reddit
Attention is All You Need? Nah, Optimization is All You Need.
cleverusernametry@reddit
Another idiotic neologism: "harness" and "harness engineering". The LLM was always a component to be used as part of a software system. Think of it like a wheel: so far we've been seeing unicycles, and we're only now starting to see primitive cars.
Pleasant-Shallot-707@reddit
I agree the harness is the important part; however, I'd say that all we've seen so far is potential. We still need to see more 1-bit models, wider adoption of turboquant, and PowerInfer becoming fully available.
Things will be amazing for local models by EOY.
Inevitable_Raccoon_9@reddit
Which harness do you mean? From what I've seen, no one is adding governance and security - guardrails and firewalls - to any LLM.
So instead of just more and more band-aids plastered onto models - which don't fix the REAL problem - I built my own solution, with governance, security, and budgets built into the foundation. Effectively guarding LLMs.
boutell@reddit
Why do people keep trying to solve this problem from inside the harness when Docker and even plain old Unix permissions are right there?
Extremely lazy version:
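A minimal sketch of what that lazy version could look like, wrapping the agent CLI in a throwaway container so the sandboxing lives outside the harness (the base image, UID, and paths here are assumptions, not the commenter's actual setup):

```python
import subprocess

def build_sandbox_cmd(project_dir: str, agent_cmd: list[str]) -> list[str]:
    """Build a docker invocation that confines the agent to the project dir."""
    return [
        "docker", "run", "--rm",
        "--network", "none",           # block exfiltration of the project
        "--user", "1000:1000",         # plain old Unix permissions do the rest
        "-v", f"{project_dir}:/work",  # only the project is visible inside
        "-w", "/work",
        "python:3.12-slim",            # assumed base image
        *agent_cmd,
    ]

def run_sandboxed(project_dir: str, agent_cmd: list[str]) -> int:
    """Run the agent inside the container and return its exit code."""
    return subprocess.run(build_sandbox_cmd(project_dir, agent_cmd)).returncode
```

Dropping `--network none` restores internet access if the agent needs to fetch packages, at the cost of the exfiltration guard.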
The only use case I can see for more than that is blocking external network access, to guard against any risk of the project itself being exfiltrated... but most of those risks are easily exploited against human devs too.
Do this at your own risk obvs
AurumDaemonHD@reddit
Exactly. The market is so bad we are all building the same thing in isolation.
Inevitable_Raccoon_9@reddit
I've been building my tool for 4 weeks now - and I figured out ONE thing.
In the past 2 years NOBODY built the tool that is needed!
In the past 4 weeks, not ONE developer realized what the ONLY SAFE solution is!
I am sorry - but my tool fixes exactly what is necessary - so I am sure many people will scratch their heads and ask: WHY didn't anyone else build it THIS way?
AurumDaemonHD@reddit
The last straw for me was Claw. It's a dumpster fire. The fix? Nemoclaw - a Rust-engine policy proxy on top of it.
How do you fix bad design? It's wild to me how all these people are oblivious to the simple truth. It seems to me like common sense is dying. All that's left is social dogma, FOMO, and hype.
Inevitable_Raccoon_9@reddit
Please have a look at https://github.com/GoetzKohlberg/sidjua - I will hopefully get v1.1 out in a few days so that the Openclaw importer and MCP tools are available. Hope it helps :)
thrownawaymane@reddit
If you think this is novel please make a separate post so that it can be evaluated.
AurumDaemonHD@reddit
Cool. I don't know TS much; I do my app in Python with Litestar and HTMX/Alpine/Tailwind, cuz JavaScript hurt me, but it only made me stronger.
Look into nono and zerobox if you want to isolate processes: one goes with seccomp and bubblewrap, the other with Landlock.
I do 2 containers with an API in between, one secure and one insecure, both rootless with SELinux.
I wouldn't support Windows; that platform is going to be dead soon in my eyes, hence I use Podman systemd quadlets. My credo is: support 1 system well rather than many poorly. So Linux it is. Postgres, Arize Phoenix, and so on. Of course it's a plugin architecture, so you can replace things if you don't like them, but I ain't doing it.
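For reference, a rootless Podman quadlet for one such isolated agent container could look roughly like this (the file name, image, and paths are illustrative assumptions, not the setup described above):

```ini
# ~/.config/containers/systemd/agent-sandbox.container (hypothetical name)
[Unit]
Description=Untrusted agent container, rootless

[Container]
Image=docker.io/library/python:3.12-slim
Network=none
SecurityLabelType=container_t
Volume=%h/projects/demo:/work:Z
Exec=python /work/agent.py

[Install]
WantedBy=default.target
```

systemd generates a user service from this at login; the insecure twin would get its own `.container` file with network access but a tighter volume list.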
Your governance layer is external, and while it works, I'm not sure it's the best. I mean, why not solve the problem at the core, directly inside the agentic graphs?
Though your point that the md files are a strongly worded suggestion is spot on.
Look_0ver_There@reddit
You are correct. The harness counts for a lot.
I've tried OpenCode, Aider, OhMyPi, something I can't remember now, and ForgeCode.
ForgeCode is my current favorite. It does require you to use zsh. I'm an old school bash user, but once I sat down and looked at what zsh brought to the table it was an easy decision to switch to zsh.
ForgeCode lets you either enter full agentic mode or just fire off one-off requests to the agent from your regular command line by starting the line with a : character. It uses multiple agent types (akin to an analyser, a planner, and an implementer), and it has a completely optional free online integration that acts as an overseer to your local ForgeCode agents to guide them a little better.
ForgeCode ranks as the top agent for coding over on the Terminal Bench rankings, even beating out Claude Code when using Claude Opus as the back end LLM model.
You can use any models you want with it of course, including local.
Emotional-Breath-838@reddit (OP)
really good to see actual usage feedback!
Look_0ver_There@reddit
I was doing a session with ForgeCode last night, and trialing out Qwen3-Coder-Next on a new hardware setup. Q3CN started off at 50t/s, but was down to 20-25t/s at 150K+ context depth. The good news though was that ForgeCode was still able to make the whole experience feel not that different to using native ClaudeCode+Opus/Sonnet as the back end in terms of speed and interactivity. That was certainly some "trick" it was pulling.
Watching the llama.cpp logs, I could see that every request from ForgeCode was properly hitting the prompt cache, whereas most other agents cause occasional misses that slow the whole shebang down significantly. I think this right here is what the ForgeCode team have properly focused on and sorted out, over and above the other coding agents.
You can read some of their blogs regarding their focus on tooling correctness here:
https://forgecode.dev/blog/benchmarks-dont-matter/
https://forgecode.dev/blog/gpt-5-4-agent-improvements/
I believe that this last section of the blog here speaks almost completely to your opening post:
https://forgecode.dev/blog/gpt-5-4-agent-improvements/#what-comes-next
DeepOrangeSky@reddit
I am a noob and don't know what harnesses are or what they do or what the different types are or how people use them, etc. (Right now I'm just running models in LM Studio, without doing any modifications or knowing how to do anything fancy with them yet).
Can you explain in a way that a noob can understand, what harnesses are/what I need to know about them, why they are important, etc?
amb007_@reddit
I would include plugins as viable harnesses, e.g. https://github.com/microsoft/skills/tree/main/.github/plugins/deep-wiki (built based on full apps, reusable by Claude). It improves a lot on naively guiding an LLM to document a codebase.
341913@reddit
Here's an example: I built an app that lets users receiving stock into our warehouses take a picture of an invoice, which AI then extracts and automatically captures in our ERP. Pretty simple, right? Not quite.
AI has a tendency to hallucinate so the bulk of the effort went into building a harness which catches the AI attempting to cheat.
When you scan the invoice, you need to look up the purchase order in the app and also enter the total incl. tax into the app. Traditional code calling APIs.
This total, along with the image(s) of the invoice is sent to AI 1, qwen VL, that extracts the data. The output from AI 1, along with the original PO is then sent to AI 2, something like gemini flash, to reason and map the supplier codes to the internal codes required by the ERP.
When AI 2 is done, a scoring engine is run - boring code doing math - which measures AI consensus, i.e. AI 1 said the invoice had 20 lines but AI 2 says it's 21: a clear hallucination. It does a bunch of other simple calcs, like checking that total / units = unit price and that the internal item codes mapped by AI 2 actually exist on the PO, etc.
Based on this, a confidence score is calculated which determines whether the invoice can be posted to the ERP or flagged for human review.
That is a harness.
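As a rough illustration of that scoring engine - the field names, tolerances, and threshold below are made up for the sketch, not the actual app's code:

```python
def consensus_score(ai1_lines: list[dict], ai2_lines: list[dict],
                    po_codes: set[str], entered_total: float) -> float:
    """Fraction of sanity checks that pass across the two AI extractions."""
    checks = []
    # AI consensus: both extractors must agree on the invoice line count
    checks.append(len(ai1_lines) == len(ai2_lines))
    # Internal codes mapped by AI 2 must actually exist on the PO
    checks.append(all(l["internal_code"] in po_codes for l in ai2_lines))
    # Arithmetic sanity: qty * unit price should reproduce each line total
    checks.append(all(abs(l["qty"] * l["unit_price"] - l["total"]) < 0.01
                      for l in ai1_lines))
    # Grand total must match what the user typed into the app
    checks.append(abs(sum(l["total"] for l in ai1_lines) - entered_total) < 0.01)
    return sum(checks) / len(checks)

def disposition(score: float, threshold: float = 1.0) -> str:
    """Post automatically only when every check passed."""
    return "post_to_erp" if score >= threshold else "human_review"
```

The interesting part is that none of this is AI: it is deterministic math that refuses to trust the models until they agree with each other and with the human-entered total.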
plaintexttrader@reddit
LLMs are the engines. Harnesses are the rest of the car, including the transmission, drive train, suspension, etc. LLMs do basic question answering and reasoning. Harnesses wrap around LLMs and make them much more useful by augmenting them with capabilities like tool calling, web search, querying for information, multi step reasoning, multi session memory, etc. that makes for smarter and more useful applications.
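A toy sketch of the simplest thing a harness adds, a tool-calling loop around a bare chat model (the message format, tool set, and `call_model` callable are all illustrative assumptions, not any particular product's API):

```python
TOOLS = {
    "search": lambda q: f"results for {q!r}",  # stub web search
    "calc":   lambda expr: str(eval(expr)),    # toy calculator (eval: demo only)
}

def harness_loop(call_model, user_msg: str, max_steps: int = 5) -> str:
    """Feed tool results back to the model until it gives a plain answer."""
    messages = [{"role": "user", "content": user_msg}]
    for _ in range(max_steps):
        reply = call_model(messages)  # model either answers or requests a tool
        if reply.get("tool") in TOOLS:
            result = TOOLS[reply["tool"]](reply["args"])
            messages.append({"role": "tool", "content": result})
            continue
        return reply["content"]       # plain answer: we're done
    return "step budget exhausted"
```

Everything a real harness does (web search, memory, multi-step planning) is some elaboration of this loop: intercept the model's output, do real work, feed the result back in.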
theUmo@reddit
LM Studio is more or less raw-dogging your model. It has a system prompt, you open a chat, you type a thing, it responds; lather, rinse, repeat. You just have one context throughout the conversation, and it more or less contains your conversation history for that session.
A harness is just an app or other way of running the model that adds some structure in to try to overcome some of the weaknesses of working with a raw chat.
madaradess007@reddit
applications are what we need; these things are like fun little e-motors that haven't been put into washing machines, scooters, and e-bikes yet
NotArticuno@reddit
I think this is what a huge amount of people are working on, and I totally agree!