What exactly does Pi harness mean?
Posted by FrozenFishEnjoyer@reddit | LocalLLaMA | View on Reddit | 71 comments
Hello everyone. I've been reading through this sub for a long time trying to understand what exactly this harness thing is.
The most common word people use here is "Pi Harness", but I'm not sure what exactly it is. I think a lot of people new to local LLMs have this question.
To those who use this Pi Harness, can you explain in the comments what exactly it is? How does it work?
Thanks!
-dysangel-@reddit
A harness just means a framework/set of tools that the model can use to do things. Pi is a coding agent harness
MuDotGen@reddit
To be fair, it's only this month that we've all come to agree that "harness" is a fairly accurate term for the AI orchestrator programs we use (as it "harnesses" the wild and unruly LLM inference into something usable for agentic AI tasks), so it's a pretty new buzzword.
To answer OP's question though, Pi (also called pi-mono, or pi.dev after its domain name, since "Pi" by itself is very ambiguous) is an AI/agent harness: a small program that runs locally on your computer. Given access to a provider (local, like a llama.cpp server, or an external API like Claude or OpenAI/GPT), you can talk to it and customize it to use tools for reading and writing files, looking things up, coding, whatever you want.
There are other common harnesses like OpenCode, which is built specifically for coding, but Pi is becoming popular for how lightweight and customizable it is, since you can make skills and tools, or fork it and build on top of it. It has a very small system prompt, so it felt very usable to me, even with a small model like Qwen3.5-4B. Coding will always require much bigger models to be useful (a common debate on this sub is what the bare minimum actually is).
If you don't know what you'd want to do, or can't run a very powerful local LLM, it's actually quite easy to start with in my experience so far. You can bring your own API keys too, and it has a setup flow for that. (Ironically, I had more trouble setting up the local LLM provider. It has a models.yaml or JSON file, I believe, to configure that, depending on what you're comfortable with. If you can't do local, others have suggested OpenRouter as a useful choice.)
I may have made some inaccurate points, but hopefully that gets the gist across.
Cane_P@reddit
Harness is an old term that has been used in software engineering for decades. A "test harness" is a collection of software tools, data, and configurations used to automate testing by simulating the environment in which a component operates. So not far off from what LLM harnesses do. It just makes sense to keep using the same term.
arcanemachined@reddit
Don't forget about https://shittycodingagent.ai/
-dysangel-@reddit
Is it any good?
relmny@reddit
Aren't both "harness" and "orchestrator" terms kinda interchangeable? (at least for some harnesses/orchestrators)
cheesecakegood@reddit
Do you know if there is a meaningful difference between that and the fork ohmypi?
Fine_League311@reddit
I do this with math, and did so well before "harness". I call it ADI = anti-dump index. That way I can even filter during input for training. I think my tinkering is older, lol. Many tools don't pay attention to noise.
_derpiii_@reddit
I like how you explained it. But how is a harness different than a 'scaffold'?
arcanemachined@reddit
Nobody calls it a scaffold, so there's that.
It's a new class of tool which allows you, an LLM, and your computer to interact with each other. The world has settled on the term "harness" to describe it, so that's what it is.
_derpiii_@reddit
> Nobody calls it a scaffold
Maybe not within this community, but that term's been used within my circles :)
thread-e-printing@reddit
Yeah, LLMs like to make up stupid private terminology so that you can't make sense with other humans
_derpiii_@reddit
> some kind of magic domination gun
AHAHAH
cheesecakegood@reddit
They are very different nouns and one fits much better. A scaffold is a temporary structure that supports construction and enables slow progress. Also sometimes people already call stuff like a project template setup “scaffolding” in a similar context (filler files and folders that show shape but are temporary)
Harness is what you might fasten to a horse or something. It guides effort in a direction, connecting power to a task. It’s not a perfect word in the sense that it doesn’t give a sense of orchestration or delegation, but it does match what you want the agentic setup to do: offer a means of steering models in helpful productive directions.
_derpiii_@reddit
Wow, I love that analogy! I would never have made either connection (I was not aware of a harness being a connection point), thank you for that :D
-dysangel-@reddit
I'd say practically they're pretty interchangeable terms, but if I were to make up some difference, I'd say a scaffold sounds more fixed in place, so it has more implications of a fixed set of tools and maybe even a fixed workflow. Whereas a harness sounds to me like something much more flexible, where the agent has a lot more choice in how to approach things, and it's easy to add MCP servers to extend functionality.
_derpiii_@reddit
Got it, I like that view.
I don't mean to sound pedantic. I'm new and like knowing the nuanced terms.
Scaffold to me sounds 'fixed' too, aka environment/runtime overhead. Harness feels like more of the tooling abstraction layer.
HomsarWasRight@reddit
I don’t think there’s a substantive difference. Just different words for the same idea. Harness is just what’s become the popular term very recently.
_derpiii_@reddit
I'm getting that vibe as well. Just curious if there's any nuances in the technical definitions between them.
Aaaaaaaaaeeeee@reddit
The "kanban" GUI of Cline has different options for the harness, so I did think I'd missed out on the newest ideas. I thought it might be more involved, like the OpenAI-compatible API standards. But if they're not some standardized plug-n-play material, there's no need to think about it too much.
thread-e-printing@reddit
Scaffolds don't connect a draft animal to an implement
jacek2023@reddit
It's like opencode but better, I use it each day now
https://github.com/badlogic/pi-mono
_derpiii_@reddit
> It's like opencode but better, I use it each day now
What got you to switch? It's been trending recently so I've been watching videos about it but I haven't seen the appeal yet.
Pleasant-Shallot-707@reddit
It’s cleaner and way less opinionated on prompts so you can construct your own harness beyond the basics and have relatively free rein. It keeps the context very clean this way
_derpiii_@reddit
So it's the archlinux of harnesses?
my_name_isnt_clever@reddit
Actually, yeah pretty much.
_derpiii_@reddit
Okay, that's got me sold. Now looking into the meta of what to set up :)
my_name_isnt_clever@reddit
Honestly you only need to know one thing: run `pi ~/.pi` and tell it to create its own extensions for anything you want. I had it build out basic web tools and a todo list that way.
_derpiii_@reddit
No recommended plugins/workflows? I'm looking forward to it :)
jacek2023@reddit
I was big fan of Arch Linux 20 years ago I remember I was maintaining some packages in AUR, is it still fun? :)
_derpiii_@reddit
Arch + i3wm is **the most** fun and crisp OS I've ever had. Ever.
It's a shame the hardware never really evolved much. But I would pick it up again in a heartbeat if an M macbook could run it.
ariagloris@reddit
pi btw
annodomini@reddit
The appeal to me is that it's minimalist and extensible.
I want a fairly bare bones harness, so I can understand every part of it, before I add more on top of it. Also helps with starting with very little context usage to begin with, so you get less context rot.
Also, the features that Pi does have are great; the /tree mode is really nice, lets you go back and start over from certain points in your conversation.
_derpiii_@reddit
That sounds very appealing, esp since I've been experimenting down the opposite unnecessary overhead approach of OmO (it's jfc level "why? tf" a minute)
audioen@reddit
I've tried to use this, but I eventually threw it out.
The main reason is that qwen3.6-27b struggles with using the edit tool. Quite a lot -- something I haven't seen happen on any other harness. It gets so bad that the model may suddenly decide the edit tool is unusable and start writing bash scripts and Python programs to perform the edits instead, apparently with success. It should not be a quantization issue: the KV cache is either bf16 or fp16, and the model has been either the official fp8 or at minimum an unsloth q6_k GGUF, both of which should be fine in terms of general accuracy.
As commentary, it is weird to me that the text replacement is literally a search-replace operation. I always assumed it worked on line ranges, e.g. the model instructs the edit tool to remove lines 50-55 and provides the replacement text. In fact the edit operations are based on providing an exact copy of the old text, down to the last tab/space of whitespace, and it must match exactly once in the file to be accepted. I see models struggling with the whitespace in particular, and writing sed scripts all the time just to see the exact tab/space arrangement of the text to substitute. I don't know why that is necessary in the first place, since the model should have already seen the exact whitespace from its file reads. (There may be some kind of Python bias at work here, since whitespace is more regular and controlled in that language, whereas I have mixed tab-space arrangements due to multiple people working in a non-Python language.)
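The exact-match semantics described above can be sketched roughly like this (a hypothetical illustration of the general technique, not Pi's actual implementation):

```python
def apply_edit(text: str, old: str, new: str) -> str:
    """Replace `old` with `new`, requiring exactly one exact match.

    The old text must match character-for-character, including every
    tab and space, and must occur exactly once in the file --
    otherwise the edit is rejected and the model has to try again.
    """
    count = text.count(old)
    if count == 0:
        raise ValueError("edit rejected: old text not found (check whitespace)")
    if count > 1:
        raise ValueError(f"edit rejected: old text matches {count} times, must be unique")
    return text.replace(old, new, 1)

# A single stray space or a tab-vs-spaces mismatch is enough to fail:
source = "def main():\n\tprint('hi')\n"
apply_edit(source, "\tprint('hi')", "\tprint('hello')")   # succeeds
```

This also shows why a line-range interface would be more forgiving: the model would only need to name the lines, not reproduce their whitespace.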
The other thing I don't like about tool calls in vLLM land is that there is no grammar-based enforcement of tool call syntax. As far as I know, in llama.cpp tool calls are grammar-constrained generation: once the model writes the tokens that start a tool call, schema-constrained generation is enforced until the end of the tool call. In vLLM there is only a post-completion general parser, and that sort of thing is 100% reliant on the model writing the call correctly. For whatever strange reason, with Pi, qwen3.6-27b makes a lot of mistakes, typically providing the path incorrectly, for example twice in the tool call, which immediately causes rejection even though the redundant path is, in principle, harmless. I haven't read the edit tool description shown to the model, but I bet it's somehow unclear, because whatever the reason, the model struggles mightily with file edits even though it knows exactly what it should get done.
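The failure mode described above (a harmless-but-redundant argument sinking the whole call) can be illustrated with a strict post-hoc tool-call parser. This is just a sketch of the general idea, not vLLM's actual parser:

```python
import json

def parse_tool_call_strict(raw: str) -> dict:
    """Parse a model-emitted JSON tool call, rejecting duplicate keys.

    With grammar-constrained decoding the model *cannot* emit a
    malformed call; with post-hoc parsing like this, a duplicated key
    fails the whole call, even when the duplicate values agree.
    """
    def no_dupes(pairs):
        keys = [k for k, _ in pairs]
        if len(keys) != len(set(keys)):
            raise ValueError(f"duplicate keys in tool call: {keys}")
        return dict(pairs)
    return json.loads(raw, object_pairs_hook=no_dupes)

parse_tool_call_strict('{"tool": "edit", "path": "a.c"}')   # accepted
# parse_tool_call_strict('{"path": "a.c", "path": "a.c"}')  # rejected
```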
Karyo_Ten@reddit
Replace the edit tool with hashlines: https://github.com/RimuruW/pi-hashline-edit
Or use Oh-my-pi
Writeup from the author: https://blog.can.ac/2026/02/12/the-harness-problem/
jacek2023@reddit
You just confirmed that you are a real user of local LLMs (most people here just lie). I had exactly the same issue with Gemma models. I solved it by gradually adding rules to AGENTS.md, but now I think a better solution may be to reimplement the edit tool. I also tried hashedit, which adds hashes to lines.
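The line-hash idea can be sketched like this (a hypothetical illustration of the general technique, not the actual pi-hashline-edit or hashedit implementation):

```python
import hashlib

def hash_lines(text: str) -> str:
    """Prefix each line with a short content hash the model can cite.

    Instead of reproducing exact whitespace, the model addresses a
    line by its hash, sidestepping the tab-vs-space matching problem.
    """
    out = []
    for line in text.splitlines():
        h = hashlib.sha1(line.encode()).hexdigest()[:6]
        out.append(f"{h}|{line}")
    return "\n".join(out)

def edit_by_hash(text: str, target_hash: str, new_line: str) -> str:
    """Replace the unique line whose short hash matches target_hash."""
    lines = text.splitlines()
    hits = [i for i, l in enumerate(lines)
            if hashlib.sha1(l.encode()).hexdigest()[:6] == target_hash]
    if len(hits) != 1:
        raise ValueError(f"hash {target_hash} matched {len(hits)} lines")
    lines[hits[0]] = new_line
    return "\n".join(lines)
```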
my_name_isnt_clever@reddit
What, you think anyone who is able to find them useful is just lying for no reason? Maybe the problem is you.
Subject_Mix_8339@reddit
I ran into this quite a bit with 27B. Oddly enough, the 35B-A3b seemed to use the edit tool correctly most of the time.
VoidAlchemy@reddit
Sames. I last used opencode to vibe up a pi extension to auto-detect llama-server models running on localhost:8080 and haven't moved back!
pi is *much* leaner, so I enjoy that first, fastest 10k of the context window now. Plus it's not a TUI, so my copy/paste between terminals just works. I like it.
hemantkarandikar@reddit
I don't have an answer to OP's question. Instead, as a novice, I have question:
My need is PDF processing - medical reports, investment reports. Basically private stuff.
Mac mini M4, 16GB RAM, macOS 15.
Have Ollama, OpenWebUI and RAGFlow in Docker.
Have tried:
NAME ID SIZE MODIFIED
MHKetbi/DeepSeek-R1-Distill-Llama-8B-NexaQuant:latest fc632354bc24 5.3 GB 19 hours ago
qwen2.5:7b 845dbda0ea48 4.7 GB 19 hours ago
gemma3:12b f4031aab637d 8.1 GB 2 weeks ago
mxbai-embed-large:latest 468836162de7 669 MB 2 weeks ago
and Gemma 4 with various settings like chunk size, overlap, temp, top-k, full context, etc. The models are too slow and make mistakes.
I tried RAGFLOW, and I can see that it prepares the input as chunks of clean tables. But the LLM queries return incomplete or wrong answers. I also tried MedGemma. Same issues.
How do you guys get decent results? Will Pi harness help?
Can someone point to some good guides? I will learn. Will try to.
LocoMod@reddit
It's a successful stealth self-promotion campaign perpetrated in this sub and in random comments for the past few weeks. One of the more successful attempts I've seen at skirting this sub's rules. There are 1000 harnesses. This is just the latest one, and it will be dead in a few months.
Yea. I said it.
SnooPaintings8639@reddit
I used to use OpenCode, but I've dropped it in favor of Pi.
Of the other 1000 harnesses, which single one would you actually suggest from the same class? I mean an agentic CLI tool, not a VSCode extension or single-task tooling like Aider. I see lots of vibe-coded and forked stuff, but I genuinely can't find anything legit.
So yeah, fight the stealth-self-promotion by providing better alternatives. I will gladly test something that is **good**.
my_name_isnt_clever@reddit
Same. I've tried Cline, Aider, Mistral Vibe, OpenCode, and Pi, in that order. And I haven't touched the others since I started digging into Pi; it's just the best option for the specific constraints of local LLMs, in my opinion.
my_name_isnt_clever@reddit
Jesus christ people, I know LLMs are new, but every name you see is not a malicious orchestrated astroturfing campaign. People can just like a thing and talk about it; there's no conspiracy against you and your favorite tool.
our_sole@reddit
Naming that project pi (pi.dev?) was a really dumb idea. I've been ignoring it, thinking it's about Raspberry Pi.
tecneeq@reddit
Pi is an agent that you chat with. It then uses a remote LLM server to get you answers or execute scripts.
Back then we called it a program. Not long ago people called it an app. Today it's a harness. Tomorrow we will call it something else, but it will stay a piece of software you run to get stuff done.
rosie254@reddit
it's a lightweight coding agent that's meant to be an alternative to the likes of OpenCode and Claude Code. When I tried to use it, it didn't seem to work so well with small-ish local models such as gemma4 26b, mostly because it needs to always get the search/replace exactly right.
Important_Quote_1180@reddit
OpenClaw, Hermes, opencode, codex, and pi are all harnesses or wrappers. Try to imagine the LLM as billions of points of knowledge. The harness gives these points structure to follow.
Zanion@reddit
It's a minimalist wrapper that calls an LLM in a loop and lacks almost all the capabilities you expect from an agentic harness.
o0genesis0o@reddit
A harness is the collection of deterministic software wrapped around an LLM provider that turns the LLM from an autocomplete engine into an "agent". At the very least, the harness executes the LLM in a loop and triggers all of its tool calls until the LLM decides to stop calling tools, at which point the final response is returned to the user. The harness can do all sorts of extra things its developers believe to be useful, such as modifying the chat history before sending it to the LLM, injecting or removing stuff, changing tool lists, sandboxing all the tool calls, checking the LLM's tool calls for security violations, sanitising inputs and outputs, etc.
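The loop described above can be sketched in a few lines (a minimal illustration with a hypothetical `call_llm`/`run_tool` interface, not any particular harness's code):

```python
def agent_loop(call_llm, run_tool, messages):
    """Run the LLM in a loop, executing tool calls until it stops asking.

    `call_llm(messages)` is assumed to return a message dict with an
    optional "tool_calls" list; `run_tool(name, args)` executes one
    tool and returns a string result. Both are hypothetical stand-ins
    for a real provider and a real tool registry.
    """
    while True:
        reply = call_llm(messages)
        messages.append(reply)
        tool_calls = reply.get("tool_calls") or []
        if not tool_calls:
            return reply["content"]        # final answer for the user
        for call in tool_calls:
            result = run_tool(call["name"], call["args"])
            # Feed each tool result back so the next LLM call sees it
            messages.append({"role": "tool", "name": call["name"],
                             "content": result})
```

Everything else a harness does (context pruning, sandboxing, input sanitising) is layered around this core loop.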
Pi is a lightweight and simple harness that does very little beyond running the loop and giving the LLM access to tools. If you want other features, you ask Pi to build them for you. You can use Pi as the core to build other applications (think of OpenClaw). OpenCode, Codex, Claude Code, even the loop running inside the ChatGPT web app, and similar, are all LLM harnesses.
Note that the definition I described above is somewhat different from what some people mean by "agent harness". My colleagues working on LLM evaluations think of an agent harness as a test harness for agents. So, from their view, the whole of Pi or Claude Code is the agent, and the harness is the thing that wraps around the agent to run tests, evaluate it, or just stop it from rm -rf'ing the system.
_derpiii_@reddit
> Harness is the collection of deterministic software wrapping around an LLM provider, to turn LLM from autocomplete engine into an "agent"
I like how you're specific about the deterministic software. Where would you fit in the other overhead like file system?
o0genesis0o@reddit
When I design my harness, I consider file system a built-in tool. Technically, it's no different from any other tool, but I just find it easier if my own code handles the interfacing with the file system. Nothing stops one from designing an MCP for filesystem, nor injecting snapshot of file system into agent's context. Technically, from LLM perspective, ext4 or S3 makes no difference.
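A built-in filesystem tool of the kind described above can be sketched as follows (hypothetical names, just to show that tools are ordinary local functions behind a schema the harness exposes to the model):

```python
from pathlib import Path

# Hypothetical tool registry: the harness advertises these descriptions
# to the LLM and dispatches its tool calls to plain local functions.
TOOLS = {}

def tool(name, description):
    def register(fn):
        TOOLS[name] = {"description": description, "fn": fn}
        return fn
    return register

@tool("read_file", "Read a UTF-8 text file and return its contents")
def read_file(path: str) -> str:
    return Path(path).read_text(encoding="utf-8")

@tool("write_file", "Write text to a file, creating parent directories")
def write_file(path: str, content: str) -> str:
    p = Path(path)
    p.parent.mkdir(parents=True, exist_ok=True)
    p.write_text(content, encoding="utf-8")
    return f"wrote {len(content)} chars to {path}"
```

Swapping the `Path` calls for S3 client calls would leave the tool schema, and therefore the model's view of it, unchanged.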
_derpiii_@reddit
> Technically, from LLM perspective, ext4 or S3 makes no difference.
I guess that's true. It's a different abstraction layer - the runtime environment (besides niche edge cases) doesn't really matter.
Protopia@reddit
Others have answered what a harness is, but the reasons that your choice of harness is important, and what makes Pi distinctive are...
HARNESSES
The choice of harness has at least as much impact - if not more - on the quality of your agentic output as the LLM you use.
The harness is responsible for everything other than the thinking and token output.
PI HARNESS
Pi is a basic but highly extensible starting point upon which you can reinvent your own specific variant of wheel.
If you want to do something unique - like a personal agent that communicates via Telegram, has a "soul"/"personality", has a time-based scheduler, and remembers your preferences - it's a great piece of software to use as a starting point.
It also has loads of extensions that people write and share, so if you can find one that fits exactly what you want (and it's high quality and actively supported), then you don't have to reinvent the wheel.
But if you simply want a GOAT agentic coding harness, especially for Spec-Driven Development, and want to use it to create great code WITHOUT reinventing the wheel, Pi may NOT be what you are looking for.
JuniorDeveloper73@reddit
It's the new buzzword, like "agents". The USA loves stupid shit one after another, you know, gAmE cHaNgEr, things like that.
cms2307@reddit
Crazy how someone that's supposedly a newbie like you can have such strong opinions, especially about one of the two countries releasing SOTA models
JuniorDeveloper73@reddit
of course all comes from USA
cms2307@reddit
Yeah, we're the champ. I hope you think about that every time you run a model. China is no different with stealing data, so if you don't like it, better quit using local LLMs
JuniorDeveloper73@reddit
And what about all the content from other countries, Einstein??? Better stop using that part of the models.
cms2307@reddit
Nice job editing your comment and deleting your other one, if all you have to do is steal data then why hasn’t Argentina made a SOTA model? 🤣🤣 clearly it takes real talent and only two countries have it. Also, America is absolutely a country and not a continent, there’s three tectonic plates, the North American and Central American plates correspond to the continent of North America, and the South American tectonic plate corresponds to South America. The Americas aren’t even really physically connected because of the Darien gap and there was a water gap until 2.7 million years ago. Saying America is one continent is just Latin American cope.
JuniorDeveloper73@reddit
What comment deleted??? America is a continent; again, go to school.
cms2307@reddit
The content doesn’t train itself 🤷♂️
JuniorDeveloper73@reddit
lol you dont even have strong arguments.
Makers7886@reddit
hell yeah USA #1
fastlanedev@reddit
A harness is what enables generated text to have an effect on the world around it, by extracting/injecting/doing things with that generated text at runtime
natermer@reddit
They are talking about pi coding agent.
https://pi.dev/
It is described as a "harness" because it is designed to be extensible so you can essentially make your own personal agents out of it.
This is where OpenClaw came from. It was built on top of Pi.
cms2307@reddit
If all you have to do is steal content why hasn’t Argentina made a Sota model then 🤣🤣 clearly you need real talent and only two countries have it
rebelSun25@reddit
A harness is the "thing" you interact with as a user. You either use the CLI or some visual GUI to type a prompt, command, etc. into this harness. Pi is just another harness, like opencode, GitHub Copilot, Claude Code, etc. It's very light. By light, I mean that each harness comes with its own set of default behaviours and tools; Pi is quite minimal and relies on just 4 tools to begin with.
imshookboi@reddit
It's basically a bare-minimum coding agent harness (some use the term agent framework or agent runtime). Think of it this way: Claude Code sends something like 65k tokens to the model before your request is even included; Pi, I think, is less than 1k. Way fewer features, but it eats fewer tokens, which is important with local models.
reality_comes@reddit
Harness is the new word to describe the software that runs an AI model as an agent. Pi is a popular one for coding specifically.
The harness is basically trying to replicate what the human does: make plans, take actions, make memories, remember things that are important, monitor progress, test, reconfigure, test again, throw a tantrum and delete the whole codebase, etc.