Is harness a new buzzword?
Posted by jacek2023@reddit | LocalLLaMA | View on Reddit | 117 comments
It feels like it became popular only in April.
Posted by jacek2023@reddit | LocalLLaMA | View on Reddit | 117 comments
It feels like it became popular only in April.
PathIntelligent7082@reddit
monkey hear= monkey repeat, and you can bet your ass 50% of the ppl using it dont know what it means
CircularSeasoning@reddit
What does it mean?
PathIntelligent7082@reddit
it means that half of the ppl using the term "harness" do not know what it means
CircularSeasoning@reddit
I'll go with that.
InteractionSweet1401@reddit
Engine is the model, fuel is the gpu, wheels are the harness, body is the ui. Hope that helps.
CircularSeasoning@reddit
It doesn't help. I think there are a few mixed metaphors there.
InteractionSweet1401@reddit
Explain!
CircularSeasoning@reddit
I don't know, it doesn't seem to help me think about it, that's all. Maybe it helps others.
Kodix@reddit
It's definitely in vogue to focus on the harness right now.
But for good reason. There's a *massive* difference in LLM performance depending on what harness you are using.
Eyelbee@reddit
It's my favorite word since the last month
CircularSeasoning@reddit
I hear you.
novus_nl@reddit
Yes, we used to call it guardrails. But that sounded ‘defensive’. So we now we use the positive forward word “harness” that strengthens the LLM and makes it punch forward other then protect it from their users.
Marketing buzzzZzZz.
CircularSeasoning@reddit
The horse metaphor can go a long way. No pun intended.
srona22@reddit
steer/guidance would be better word. The same reason master-slave in db/server changes into parent-child, git default folder changing into main, instead of master, etc. Just 2 cents.
CircularSeasoning@reddit
If you want to be all politically correct. I'm not a politician, so I'll call it a top-dog/under-dog kind of thing. Doesn't really matter what I call it, as long as you know what I mean.
FastDecode1@reddit
harness DEEZ NUTS
tecneeq@reddit
cleverusernametry@reddit
Harness engineering <- you are here Context engineering Prompt engineering
Equivalent_Job_2257@reddit
You forgot one more - governance. It's all around in LinkedIn.
Pleasant-Shallot-707@reddit
Governance has been used in software engineering for ever
Equivalent_Job_2257@reddit
Maybe by 5% by those who understood why, now 95% "experts" talk about this.
jacek2023@reddit (OP)
I am sorry, but no matter how bad I think about Reddit, LinkedIn is much much worse
q5sys@reddit
Linkedin is just Reddit wearing a suit.
Equivalent_Job_2257@reddit
Compared to the LinkedIn, edit feels like a breathe of fresh air.
BothYou243@reddit
bro straight from theo's video
Pleasant-Shallot-707@reddit
lol “I just started paying attention, what’s this new buzzword?”
Koalababies@reddit
Buzzword? No - I think it's just indicative of where models and AI in general is heading.
I think we're running into diminishing returns in regards to "model intelligence" that we're able to distill via training and the increase in hearing about harnesses kind of echoes that. More and more of the benefits of AI are going to come from how we're able to leverage tooling surrounding the models, not just the models themselves.
vaksninus@reddit
It's a good way to describe the code used to employ models like Claude Code. The Claude Code leak also gave way to a lot of experimentation (at least for me and some others). It's a good word tbh.
MuDotGen@reddit
TLDR; "Harness" seems like a good term for the underlying controller between LLM inference servers and some kind of task, Chat, Assistant, or Coder (which all seem to be in the realm of "Agent" since it can "make its own choices").
This is something I had noticed. People seem to use a lot of words while not fully agreeing on the meanings, a byproduct of this being all so new. This means I've heard things like Claude Code or Cursor called Agentic AI tools or Agentic IDEs, etc., but they've expanded to do other things with tools and MCP servers (such as reading files, making files, sending emails, etc. which is why it seems Claude Coworker is now separate from both Chat and Code now), sometimes being called AI Assistants (such as PicoClaw and OpenClaw) which can and are most often used for coding (as the only users who would use them tend to be more computer literate anyway). It's essentially a program that acts as an autonomous agent with an LLM backed communication interface (or rather, more of the, well, harness to allow you to do those things instead of being the agent itself), but the problem is that interface may or may not be a CLI (you can use a GUI these days too), so I've been struggling to figure out what the best word is.
Other programs that run the LLMs themselves like Ollama and llama.cpp or LMStudio, etc., which only run the LLM inference itself, so they're often called "Providers", "LLM Servers", or "Inference Servers", or LLM Engines, but an engine sounds like it could apply to agentic AI too, like an AI Engine (as AI doesn't refer to just language models, hence why we don't use the more generic "AI" that everyone knows to refer to LLM chat interfaces like ChatGPT or Gemini).
From my understanding with what "Test Harnesses" are in Software engineering,
LLMs are more like the "engine", the lifeblood, if you will, and AI/agent harnesses are the wrapper systems that control and make it ready for use by either an actual AI Agent or a Human who provides input. LLM -> Provides either just inference of intent or actually automated reasoning to decide what tools to use and how to meet an user's goal, and a harness is the one that controls that, defining limits and security and capabilities.
Basically, LLMs seem to have evolved from NLP for natural interfaces with computers to being "brains" for agentic AI as well, but harness seems to be a good term for both use cases, whether a chat interface or an actual autonomous agent you could just let run for long periods of times while it figures out how to handle its directives. Getting some Tron vibes here.
In this new landscape, it's really no wonder this is really difficult to explain to the average person other than "The AI."
tkenben@reddit
In this description, is a harness basically the new "base starting prompt" but with navigation tools?
MuDotGen@reddit
Tldr; pretty much based on what I understand, but not just like giving a toddler a space and some tools they can use, more like a lot of untapped potential for anything that can make the toddler do more useful (or I guess not useful, whatever the user wants) things. So, the base starting prompt could be included in a harness, but it isn't the harness itself.
---- You can ignore the ramblings below.
As a controller, it's more of the logic that controls how you even interact with the LLM and how it is even used. OpenAI style endpoints allow for specific request schema for chats or completion as those are the most common uses of LLMs, but the harness would need to understand how to send not just system prompt and chat history or tool schema, it would also dictate, restrict, and enable how the LLM is used.
For example, I'm playing around with a two step system of separating semantic inference and reasoning from the syntax enforcement in a second and final LLM call. Tests on what I called a semantic router (takes a question or task in natural language -> sends LLM request with full reasoning, then sends the reinforced answer again in a second call with llama.cpp grammar constraints that forces the output into an expected option or JSON format, etc.), showed improvement over completely restricting grammar in one call or just allowing smaller models to try and fit the expected format or options given.
This use isn't Chat, Cowork or autonomous help with common use-cases, nor is a coder, just more of a smarter classifier and similar to what tool calls do, but with pretty much guaranteed valid syntax output. That's why I think harness makes more general sense as a descriptor of what I'm making and testing. Taking that wild animal and putting it to use to do just what I want. So yeah, base/system prompt and hyperparameter adjustments for a model sound like the bare minimum harness, but you can really make a lot of different "controllers."
slippery@reddit
500,000 lines of code for the harness. That's a serious app and makes LLMs much more useful.
Cupakov@reddit
Or it’s a fuckton of LLM-generated spaghetti code, I wonder which is it
slippery@reddit
You could go through the Claude Code leak to see for yourself. From the analysis I read, it was pretty tight with advanced context compaction algorithms and fine grained analytics.
The latest version of the codex desktop app is supposed to be very good, but I'm not using chatGPT right now.
CircularSeasoning@reddit
meme incoming:
LLM: Huge, scary, powerful beast
LLM with harness: Huge, scary, powerful beast riding a tiny tricycle with training wheels slapped on by HR.
-Django@reddit
LLM: airplane engine
LLM with harness: airplane
CircularSeasoning@reddit
You may ship the meme.
-Django@reddit
https://i.redd.it/ragudzxfruvg1.gif
CircularSeasoning@reddit
Any time anyone says anything on any internet platform, it should have a pre-set upvote on behalf of the AI. A rocket emoji will do fir universality.
CircularSeasoning@reddit
Iteration approved.
Zeeplankton@reddit
I never looked through the source code. Is it notably valuable compared to what like OpenCode or codex and others have already figured out / do better?
Loose_Object_8311@reddit
The surface area of a harness can be quite wide ranging. I'd say it's the sum total of all the scaffolding you employ to keep the agent on the rails. It also includes all the tooling and guidance you put in your repo to support better feedback loops between your agent and your software, and add in consistency/determinism to improve performance.
red_hare@reddit
This exactly. Most agents are a loop in a web server. Claude code is like one file of loop and 1.8K files of everything else. The everything else really needed a name.
CircularSeasoning@reddit
Depends what you want out of your LLM. If you're an army general, you might want to call it a "cage" or "reinforced containment zone" because army LLMs are probably highly hazardous.
Heavy-Focus-1964@reddit
yeah it really was a missing term from the AI lexicon, especially because not all harnesses are CLIs
and the vivid imagery of putting a harness on a wild beast to make it controllable can’t be denied
lqvz@reddit
I like “harness” better than “agent.” People refer to Claude Code or OpenClaw as “agents” when I prefer the term “harness.”
MuDotGen@reddit
I like the term agent as a concept we are working towards though, but in the strict sense, it's still very much a work in progress depending on your definition of agentic. Harness feels more specific and accurate to the current landscape.
GraciousMule@reddit
It’s replaced Wrapper. I don’t know why, but it has.
blamestross@reddit
You wrap things. You harness creatures. It's a narrative of agency thing. If you don't feel like you can control it, you harness it.
Thus test harness. The whole point is the thing under test isn't well controlled.
Versus a wrapper normally being deterministic transforms in and out of it.
MuDotGen@reddit
TLDR; ignore my ramblings. I agree on your point about the deterministic nature of wrappers.
That imagery makes more sense to me. LLMs come with increased capability at the cost of deterministic transforms. It's technically deterministic from a purely programming standpoint (reduce all RNG to same seed, etc., you get the same results every time), but wildly capable of putting out some really wild, un-formatted, and unexpected results. A wrapper is a lot more straightforward, but LLM inference not so much, yeah\~
I give an upvote for harness in this situation as it clearly highlights what I've been attempting to do to give more consistent output with what I call an SLM router (trying methods with two-step inference and final grammar constraints, etc., which has had some cool results as I learn from trying it), and the whole process sounds a lot like, well, training a wild animal. lol
It's kind of funny how neural networks are all about training weights to produce novel solutions to potentially various problems from given inputs, and now we need to train or steer those models to output solutions to more specific scenarios. I guess that's the trade-off for convenience since we could technically train smaller models to do specific tasks, maybe even better, but LLMs feel more approachable since it deals with natural language maybe.
Automatic-Arm8153@reddit
Not really. Wrappers were used for businesses claiming they were revolutionary but all they were was an API with a system prompt to the latest OpenAI or Anthropic model.
Eg. If I started a site called like bizness . ai -With the premise that it’s a AI built for businesses
Harness has always been around, it went from the cutting edge enthusiasts to now being widely accepted. A harness is something more than just a simple system prompt.
So while similar in concept they are two different things.
substandard-tech@reddit
No. It’s a good word.
“A harness is a device or structure used to hold, support, control, or connect something so it can function safely or effectively
mateszhun@reddit
It’s a good word, which is exactly why it will quickly become a buzzword. Just as "agentic company" is the current corporate cliche, CEOs will soon be using harness as the next wave of marketing bullshit like, "We're building a harness for AI in XY field".
JamesEvoAI@reddit
It's also an important part of the quality equation:
https://wolfbench.ai/
ihexx@reddit
Yes, it kind of won as the term we use to describe these things. there wasn't consensus before on what to call them; 'scaffold' was the most popular in official releases when talking about agentic code pre-2024. 'wrappers', 'task management system'.
but yeah, 'harness' became a lot more commonplace
gpt872323@reddit
yes just scaffolding.
anon377362@reddit
April is just when you noticed it. It’s been used a lot for many months or more
Happysedits@reddit
It existed in may
NordCoderd@reddit
For me as not native speaker and who read a lot in English - I started seeing this word everywhere last month. I noticed the same as OP
EffectiveCeilingFan@reddit
No. “Testing harness” has been around since I got into local AI. “Agentic harness” might be pretty new but just harness has been used for a while.
PathIntelligent7082@reddit
you don't get the concept of the buzzword... Of course, "harness" is a older term, but it's a buzzword now, not since you got into ai...
mr_eking@reddit
Yes. "Harness" as in "test harness" has been a common software engineering term for at least 15 years. "Agentic harness" is a new adoption of the harness concept, and it's a pretty good use, in my opinion.
smithy_dll@reddit
Borrowed from other fields of engineering. In electrical test harnesses have been used for a very long time to test wire harnesses for example.
CircularSeasoning@reddit
Ultimately borrowed, um, from the English language, which borrowed "harness" from:
Similar word: "hernia". Example usage:
"I popped a hernia from all the hard work of explaining to people on r/LocalLlama that the word "harness" is a pretty decent word that can be shared by many people in different fields and nobody would be harmed or lessened by it."
son_et_lumiere@reddit
quit swinging your harnois around.
CircularSeasoning@reddit
I can't help it. My harness grows larger with every update.
jacek2023@reddit (OP)
True but the current meaning is agentic
Heavy-Focus-1964@reddit
it’s not. it’s just shorthand based on the context
DataPhreak@reddit
You are literally making shit up.
eli_pizza@reddit
You searched for “agent harness” and got results about agent harnesses and this proves something?
DataPhreak@reddit
It proves that u/jacek2023 is correct.
Heavy-Focus-1964@reddit
i have no idea what this is supposed to prove
CircularSeasoning@reddit
Let's be fair to all in this seemingly ridiculous debate: words can have more than one meaning or connotation and can be shared by different fields and niches.
Harness, as in test harness, has been used in software testing for a long time but... LLMs are more than just "software" and the harnesses we use for them encompass psychology and other things as well as plain software test harnessing techniques.
Everyone happy?
Physics-Affectionate@reddit
its been a buzzword for a while
KringleKrispi@reddit
I use harness for work my entire carrier so I have no idea what are you talking about
Lesser-than@reddit
first thing you learn about AI and llms, researchers rename everything to sound unique and SOTA :/ . take every new buzzword with a grain of salt , its likely a very much old tech in new suit.
temperature_5@reddit
No, it has been used in the AI engineer LARPing community for quite some time!
olearyboy@reddit
Agent was too passé
Zanion@reddit
Agents go in the harnesses.
Next we're gonna put the harnesses in harnesses and have a Megazord or smthn. Idk I don't work at YC.
DarePitiful5750@reddit
No not at all, it gets regurgitated every decade or so. Usually in fluffy marketing pitches. Typically if they used that word, you'd know their service was garbage.
Zulfiqaar@reddit
Sortof? Scaffold used to be more popular comparatively I think, but it was always a technical term
DataPhreak@reddit
"Architecture" was perfectly fine for 3 years. "Harness" can go fuck itself.
Uninterested_Viewer@reddit
Huh? Architecture is not a replacement for the word harness in this context.
DataPhreak@reddit
They're literally called Agents. Have been called agents for 3 years. And yes, I mean Agent Architecture, just like Harness means Agent Harness.
https://www.ibm.com/think/topics/agentic-architecture
Uninterested_Viewer@reddit
If you meant "agent architecture" you should have said 'agent architecture". Feels like an enormous hole in your logic right there..
Setting that aside: harness doesn't need the qualifier of "agent" in this context while "architecture" absolutely does, which is one reason why harness is a useful term. I also don't agree that "agent" is necessarily implied here. If anything, the term "agent" became a huge buzzword that lost a lot of meaning.. it implies a level of agency that a harness doesn't necessarily need to provide an LLM.
DataPhreak@reddit
Either they both need the qualifier or they both don't.
Uninterested_Viewer@reddit
You're in a conversation about general AI/LLMs and somebody asks "what's are your thoughts on the best architecture for coding?". That demands a follow up question to know what they mean.
If somebody asks "what's are your thoughts on the best harness for coding?" you immediately can understand they're asking about the Claude codes, geminiCLIs, antigravity, codex, roocodes of the world.
I really don't know why I'm trying to explain this to you.. you know. You know.
DataPhreak@reddit
We wouldn't say best architecture for coding, we would say best Agent for coding. Especially since the Claude Code architecture was closed source and nobody could actually talk to the architecture with any level of legitimacy.
DataPhreak@reddit
Case and point: https://www.reddit.com/r/AgentsOfAI/comments/1sngeuh/which_agents_are_you_using_in_your_development/
FriskyFennecFox@reddit
Yes, it's still "prompt engineering" in its core. We have a surprisingly big amount of buzzwords for it!
timbo2m@reddit
Yep
There's the model, the harness is how you use the model. The harness includes stuff like tools (web search/fetch, command line stuff etc)
Here's some interesting stuff about them from OpenAI and Anthropic
https://openai.com/index/harness-engineering/
https://www.anthropic.com/engineering/harness-design-long-running-apps
If you want to make your own agent harness, start here https://ghuntley.com/agent/
otobot9@reddit
At this point buzzword is a buzzword
MrMisterShin@reddit
No not a new buzzword, it was used many times last year. With the spike in popularity of agentic coding tools. AI Agentic Harness and Scaffolding are effectively used interchangeably. Mostly to describe (Claude code, codex, cursor, cline, kilo, roo, etc etc) these are the harness or scaffolding which try to boost performance for agentic use-cases.
tarruda@reddit
"A harness is a set of straps that are put on a horse so it can be hitched to a wagon or a carriage."
Only the horse is AI and the wagon/carriage is your job.
Queasy_Asparagus69@reddit
I like it. Not everything is an agent
ayylmaonade@reddit
I wouldn't say it's a buzzword, just the proper description for what a... harness is. It's a good descriptor and differentiator from "Web UI" or "Chatbot", etc.
misanthrophiccunt@reddit
Ask Agatha
/wink wink
Kahvana@reddit
Yup, still no clue what it fully entails.
Marksta@reddit
Not really? Orchistrator and any phrasing around that feels more buzz-wordy to me. Because those projects are usually promising the earth and the heavens, when the reality is usually a glorified chat app.
LLMs are inherently useless for anything but chatting without a harness proper. And I've been getting the feeling they're largely useless at chatting without a harness as well. Especially the more training they receive to be agentic, tool usage dependant, their need for a harness that manages the context and tools and spawning sub-LLMs and what have you becomes even more needed now.
Fheredin@reddit
I've referred to AI prompting as "wrangling" on multiple occasions. It isn't that the LLM is alive, but that the system prompt, the expectations the people training the model had, and the things I want to actually do with an LLM seem to wind up in disagreement a lot.
fractalcrust@reddit
its the tool use loop and conversation management, dont think its a buzzword.
what are opencode, claude code, codex, pi, goose etc
Jumpy_Fuel_1060@reddit
Nah it's been around forever. A testing harness is something I've heard of since forever. I take it to mean some ad hoc structure someone built that ties together parts that are otherwise unrelated. QA uses the term all the time, "testing harness". Devs don't use the term because they think their stuff is related and together perfectly.
I've like use the word "goop". Sounds less official, and more indicative in my confidence of it's engineering.
c_pardue@reddit
not a new word, just popular lately for reasons. it is a very old term.
georgeApuiu@reddit
It is until it is not
denoflore_ai_guy@reddit
It's a little more adventurous than coherence or recurisve
Dismal-Effect-1914@reddit
Its just a good word to describe whatever framework/tools you are using to extend a models capabilities.
honestduane@reddit
No it's been around for over a year now, It describes the core wrapper used for inference and how that works, and you're just behind. But don't worry the next buzzword that you'll think of is orchestration, And that's also old hat.
HongPong@reddit
don't worry all you need is a "harness" and an llm feeding code back to itself and that's the "agi". very confident people in Facebook believe this
I_am_BrokenCog@reddit
no. "harness" in relation to "testing harness" and "component harness" have been around for decades.
Commercial-Chest-992@reddit
Others are saying no, but I tend to agree that it’s everywhere more. What does it mean in this context?
CircularSeasoning@reddit
2026 is the Chinese year of the Fire Horse, which is fitting when you think symbolically:
Fire --> Electricity.
Horse --> Work animal, but also good for recreation.
So the Fire Horse = AI, meaning this is the year for AI to really shine and come of age(nts).
https://en.wikipedia.org/wiki/Horse_harness
Harness is a great term because AI is basically a large, powerful beast summoned to do work. If not harnessed, however, it can be either useless or even destructive.
To me it's also interesting how last year was the year of the "Wood Snake" and all the astrology peeps were going on about how the transition from Wood Snake to Fire Horse is a time to shed, or molt... and right around that time we all got blasted by stories about Moltbook, the social media for agents.
https://en.wikipedia.org/wiki/Moltbook
mat8675@reddit
Harness is the best description for what we are building. I’m glad it’s finally caught on.
1ncehost@reddit
Ive been using harness to describe llm runners since two years ago 🤷♂️
x8code@reddit
Yes it is. I'm sick of it already. It's just another word for LLM clients / agents.
false79@reddit
Harness has been around since claude clode. CC got released Feb 2025.
You are very late to the party.
asevans48@reddit
Yeah, I said it once and now cannot stop hearing it.
BidWestern1056@reddit
yeah and it's pretty dumb. anthropic likes to come up with stupid names and abstractions to do things like prompt templating (let's call it skills!)