What exactly does Pi harness mean?
Posted by FrozenFishEnjoyer@reddit | LocalLLaMA | View on Reddit | 71 comments
Hello everyone. I've been reading through this sub for a long time trying to understand what exactly this harness thing is.
The most common word people use here is "Pi Harness", but I'm not sure what exactly it is. I think a lot of people new to local LLMs have this question.
To those who use this Pi Harness, can you explain in the comments what exactly it is? How does it work?
Thanks!
-dysangel-@reddit
A harness just means a framework/set of tools that the model can use to do things. Pi is a coding agent harness
MuDotGen@reddit
To be fair, it's only this month that we've all come to agree that "harness" is a fairly accurate term for the AI orchestrator programs we use (as it "harnesses" the wild and unruly LLM inference into something usable for agentic AI tasks), so it's a pretty new buzzword.
To answer OP's question though, Pi (also called pi-mono, or pi.dev after its domain name, since "Pi" by itself is very ambiguous) is an AI/agent harness: a small program that runs locally on your computer. Given access to a provider (local, like a llama.cpp server, or an external API like Claude or OpenAI/GPT), you can talk to it and customize it to use tools for reading and writing files, looking things up, coding, whatever you want.
There are other common harnesses like OpenCode, which is built specifically for coding, but Pi is becoming popular for how lightweight and customizable it is, since you can make skills and tools, or fork it and build on top of it. It has a very small system prompt, so it felt very usable to me, even with a small model like Qwen3.5-4B. Coding will always require much bigger models to be useful (a common debate on this sub is what the bare minimum actually is).
If you don't know what you'd want to do, or can't run a very powerful local LLM, it's actually quite easy to start with in my experience so far. You can bring your own API keys too, and it has a setup flow for that. (Ironically, I had more trouble setting up the local LLM provider. It has a models.yaml or JSON file, I believe, to configure that, depending on what you're comfortable with. If you can't do local, others have suggested OpenRouter as a useful choice.)
I may have made some inaccurate points, but hopefully that gets the gist across.
Cane_P@reddit
Harness is an old term that has been used in software engineering for decades. A "test harness" is a collection of software tools, data, and configurations used to automate testing by simulating the environment in which a component operates. So not far off from what LLM harnesses do. It just makes sense to keep using the same term.
arcanemachined@reddit
Don't forget about https://shittycodingagent.ai/
-dysangel-@reddit
Is it any good?
relmny@reddit
Aren't both "harness" and "orchestrator" terms kinda interchangeable? (at least for some harnesses/orchestrators)
cheesecakegood@reddit
Do you know if there is a meaningful difference between that and the fork ohmypi?
Fine_League311@reddit
I do this with math, and did so well before "harness". I call it ADI = anti-dump index. That way I can even filter during input for training. I think my tinkering is older, lol. Many tools don't pay attention to noise.
_derpiii_@reddit
I like how you explained it. But how is a harness different than a 'scaffold'?
arcanemachined@reddit
Nobody calls it a scaffold, so there's that.
It's a new class of tool which allows you, an LLM, and your computer to interact with each other. The world has settled on the term "harness" to describe it, so that's what it is.
_derpiii_@reddit
> Nobody calls it a scaffold
Maybe not within this community, but that term's been used within my circles :)
thread-e-printing@reddit
Yeah, LLMs like to make up stupid private terminology so that you can't make sense with other humans
_derpiii_@reddit
> some kind of magic domination gun
AHAHAH
cheesecakegood@reddit
They are very different nouns and one fits much better. A scaffold is a temporary structure that supports construction and enables slow progress. Also sometimes people already call stuff like a project template setup “scaffolding” in a similar context (filler files and folders that show shape but are temporary)
Harness is what you might fasten to a horse or something. It guides effort in a direction, connecting power to a task. It’s not a perfect word in the sense that it doesn’t give a sense of orchestration or delegation, but it does match what you want the agentic setup to do: offer a means of steering models in helpful productive directions.
_derpiii_@reddit
Wow, I love that analogy! I would never have made either connection (I was not aware of a harness being a connection point), thank you for that :D
-dysangel-@reddit
I'd say practically they're pretty interchangeable terms, but if I were to make up some difference, I'd say a scaffold sounds more fixed in place, so it has more implications of a fixed set of tools and maybe even a fixed workflow. Whereas a harness sounds to me like something much more flexible, where the agent has a lot more choice in how to approach things, and it's easy to add MCP servers to extend functionality.
_derpiii_@reddit
Got it, I like that view.
I don't mean to sound pedantic. I'm new and like knowing the nuanced terms.
Scaffold to me sounds 'fixed' too, aka environment/runtime overhead. Harness feels like more of the tooling abstraction layer.
HomsarWasRight@reddit
I don’t think there’s a substantive difference. Just different words for the same idea. Harness is just what’s become the popular term very recently.
_derpiii_@reddit
I'm getting that vibe as well. Just curious if there's any nuances in the technical definitions between them.
Aaaaaaaaaeeeee@reddit
The "kanban" GUI of Cline has different options for the harness, so I did think I'd missed out on the newest ideas. I thought it might be more involved, like the OpenAI-compatible API standards. But if they're not some standardized plug-n-play material, there's no need to think about it too much.
thread-e-printing@reddit
Scaffolds don't connect a draft animal to an implement
jacek2023@reddit
It's like opencode but better, I use it each day now
https://github.com/badlogic/pi-mono
_derpiii_@reddit
> It's like opencode but better, I use it each day now
What got you to switch? It's been trending recently so I've been watching videos about it but I haven't seen the appeal yet.
Pleasant-Shallot-707@reddit
It’s cleaner and way less opinionated on prompts so you can construct your own harness beyond the basics and have relatively free rein. It keeps the context very clean this way
_derpiii_@reddit
So it's the archlinux of harnesses?
my_name_isnt_clever@reddit
Actually, yeah pretty much.
_derpiii_@reddit
Okay, that's got me sold. Now looking into the meta of what to set up :)
my_name_isnt_clever@reddit
Honestly you only need to know one thing: run `pi ~/.pi` and tell it to create its own extensions for anything you want. I had it build out basic web tools and a todo list that way.
_derpiii_@reddit
No recommended plugins/workflows? I'm looking forward to it :)
jacek2023@reddit
I was big fan of Arch Linux 20 years ago I remember I was maintaining some packages in AUR, is it still fun? :)
_derpiii_@reddit
Arch + i3wm is **the most** fun and crisp OS I've ever had. Ever.
It's a shame the hardware never really evolved much. But I would pick it up again in a heartbeat if an M macbook could run it.
ariagloris@reddit
pi btw
annodomini@reddit
The appeal to me is that it's minimalist and extensible.
I want a fairly bare bones harness, so I can understand every part of it, before I add more on top of it. Also helps with starting with very little context usage to begin with, so you get less context rot.
Also, the features that Pi does have are great; the /tree mode is really nice, lets you go back and start over from certain points in your conversation.
_derpiii_@reddit
That sounds very appealing, esp since I've been experimenting down the opposite unnecessary overhead approach of OmO (it's jfc level "why? tf" a minute)
audioen@reddit
I've tried to use this, but I eventually threw it out.
The main reason is that qwen3.6-27b struggles with using the edit tool. Quite a lot -- something I haven't seen happen on any other harness. It gets so bad that the model may suddenly decide the edit tool is unusable and start writing bash scripts and Python programs to perform the edits instead, apparently with success. It should not be a quantization issue: the KV cache is either bf16 or fp16, and the model has been either the official fp8 or at minimum an unsloth q6_k GGUF, both of which should be fine in terms of general accuracy.
As commentary, it is weird to me that the text replacement is literally a search-replace operation. I always assumed it worked on line ranges, e.g. the model instructs the edit tool to remove lines 50-55 and provides the replacement text. In fact the edit operations are based on providing an exact copy of the old text, down to the last tab/space of whitespace, and it must match exactly once in the file to be accepted. I see models struggling with the whitespace in particular, and writing sed scripts all the time just to see the exact tab/space arrangement of the text to substitute. I don't know why that is necessary in the first place, since the model should have already seen the exact whitespace from its file reads. (There may be some kind of Python bias at work here, since whitespace is more regular and controlled in that language, whereas I have mixed tab-space arrangements due to multiple people working in a non-Python language.)
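The exact-match semantics described above can be sketched roughly like this (a hypothetical illustration of the general technique, not Pi's actual implementation):

```python
def apply_edit(text: str, old: str, new: str) -> str:
    """Replace `old` with `new`, requiring exactly one exact match.

    The old text must match character-for-character, including every
    tab and space, and must occur exactly once in the file --
    otherwise the edit is rejected and the model has to try again.
    """
    count = text.count(old)
    if count == 0:
        raise ValueError("edit rejected: old text not found (check whitespace)")
    if count > 1:
        raise ValueError(f"edit rejected: old text matches {count} times, must be unique")
    return text.replace(old, new, 1)

# A single stray space or a tab-vs-spaces mismatch is enough to fail:
source = "def main():\n\tprint('hi')\n"
apply_edit(source, "\tprint('hi')", "\tprint('hello')")   # succeeds
```

This also shows why a line-range interface would be more forgiving: the model would only need to name the lines, not reproduce their whitespace.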
The other thing I don't like about tool calls in vLLM land is that there is no grammar-based enforcement of tool call syntax. As far as I know, in llama.cpp tool calls are grammar-constrained generation: once the model writes the tokens that start a tool call, schema-constrained generation is enforced until the end of the tool call. In vLLM there is only a post-completion general parser, and that sort of thing is 100% reliant on the model writing the call correctly. For whatever strange reason, with Pi, qwen3.6-27b makes a lot of mistakes, typically providing the path incorrectly, for example twice in the tool call, which immediately causes rejection even though the redundant path is, in principle, harmless. I haven't read the edit tool description shown to the model, but I bet it's somehow unclear, because whatever the reason, the model struggles mightily with file edits even though it knows exactly what it should get done.
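The failure mode described above (a harmless-but-redundant argument sinking the whole call) can be illustrated with a strict post-hoc tool-call parser. This is just a sketch of the general idea, not vLLM's actual parser:

```python
import json

def parse_tool_call_strict(raw: str) -> dict:
    """Parse a model-emitted JSON tool call, rejecting duplicate keys.

    With grammar-constrained decoding the model *cannot* emit a
    malformed call; with post-hoc parsing like this, a duplicated key
    fails the whole call, even when the duplicate values agree.
    """
    def no_dupes(pairs):
        keys = [k for k, _ in pairs]
        if len(keys) != len(set(keys)):
            raise ValueError(f"duplicate keys in tool call: {keys}")
        return dict(pairs)
    return json.loads(raw, object_pairs_hook=no_dupes)

parse_tool_call_strict('{"tool": "edit", "path": "a.c"}')   # accepted
# parse_tool_call_strict('{"path": "a.c", "path": "a.c"}')  # rejected
```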
Karyo_Ten@reddit
Replace the edit tool with hashlines: https://github.com/RimuruW/pi-hashline-edit
Or use Oh-my-pi
Writeup from the author: https://blog.can.ac/2026/02/12/the-harness-problem/
jacek2023@reddit
You just confirmed that you are a real user of local LLMs (most people here just lie). I had exactly the same issue with Gemma models. I solved it by gradually adding rules to AGENTS.md, but now I think a better solution may be to reimplement the edit tool. I also tried hashedit, which adds hashes to lines.
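The line-hash idea can be sketched like this (a hypothetical illustration of the general technique, not the actual pi-hashline-edit or hashedit implementation):

```python
import hashlib

def hash_lines(text: str) -> str:
    """Prefix each line with a short content hash the model can cite.

    Instead of reproducing exact whitespace, the model addresses a
    line by its hash, sidestepping the tab-vs-space matching problem.
    """
    out = []
    for line in text.splitlines():
        h = hashlib.sha1(line.encode()).hexdigest()[:6]
        out.append(f"{h}|{line}")
    return "\n".join(out)

def edit_by_hash(text: str, target_hash: str, new_line: str) -> str:
    """Replace the unique line whose short hash matches target_hash."""
    lines = text.splitlines()
    hits = [i for i, l in enumerate(lines)
            if hashlib.sha1(l.encode()).hexdigest()[:6] == target_hash]
    if len(hits) != 1:
        raise ValueError(f"hash {target_hash} matched {len(hits)} lines")
    lines[hits[0]] = new_line
    return "\n".join(lines)
```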
my_name_isnt_clever@reddit
What, you think anyone who is able to find them useful is just lying for no reason? Maybe the problem is you.
Subject_Mix_8339@reddit
I ran into this quite a bit with 27B. Oddly enough, the 35B-A3b seemed to use the edit tool correctly most of the time.
VoidAlchemy@reddit
Sames. I last used opencode to vibe up a pi extension to auto-detect llama-server models running on localhost:8080 and haven't moved back!
pi is *much* leaner, so I enjoy that first, fastest 10k of the context window now. Plus it's not a TUI, so my copy/paste between terminals just works. I like it.
hemantkarandikar@reddit
I don't have an answer to OP's question. Instead, as a novice, I have question:
My need is PDF processing - medical reports, investment reports. Basically private stuff.
Mac mini M4, 16GB RAM, macOS 15.
Have Ollama, OpenWebUI and RAGFlow in Docker.
Have tried:
NAME ID SIZE MODIFIED
MHKetbi/DeepSeek-R1-Distill-Llama-8B-NexaQuant:latest fc632354bc24 5.3 GB 19 hours ago
qwen2.5:7b 845dbda0ea48 4.7 GB 19 hours ago
gemma3:12b f4031aab637d 8.1 GB 2 weeks ago
mxbai-embed-large:latest 468836162de7 669 MB 2 weeks ago
and Gemma 4 with various settings like chunk size, overlap, temp, top-k, full context, etc. The models are too slow and make mistakes.
I tried RAGFLOW, and I can see that it prepares the input as chunks of clean tables. But the LLM queries return incomplete or wrong answers. I also tried MedGemma. Same issues.
How do you guys get decent results? Will Pi harness help?
Can someone point to some good guides? I will learn. Will try to.
LocoMod@reddit
It's a successful stealth self-promotion campaign perpetrated in this sub and in random comments for the past few weeks. One of the more successful attempts I've seen at skirting this sub's rules. There are 1000 harnesses. This is just the latest one, and it will be dead in a few months.
Yea. I said it.
SnooPaintings8639@reddit
I used to use OpenCode, but I've dropped it in favor of Pi.
Of the other 1000 harnesses, which single one would you actually suggest from the same class? I mean an agentic CLI tool, not a VSCode extension or single-task tooling like Aider. I see lots of vibe-coded and forked stuff, but I genuinely can't find anything legit.
So yeah, fight the stealth-self-promotion by providing better alternatives. I will gladly test something that is **good**.
my_name_isnt_clever@reddit
Same. I've tried Cline, Aider, Mistral Vibe, OpenCode, and Pi, in that order. And I haven't touched the others since I started digging into Pi; it's just the best option for the specific constraints of local LLMs, in my opinion.
my_name_isnt_clever@reddit
Jesus christ people, I know LLMs are new, but every name you see is not a malicious orchestrated astroturfing campaign. People can just like a thing and talk about it; there's no conspiracy against you and your favorite tool.
our_sole@reddit
Naming that project pi (pi.dev?) was a really dumb idea. I've been ignoring it, thinking it's about Raspberry Pi.
tecneeq@reddit
Pi is an agent that you chat with. It then uses a remote LLM server to get you answers or execute scripts.
Back then we called it a program. Not long ago people called it an app. Today it's a harness. Tomorrow we will call it something else, but it will stay a piece of software you run to get stuff done.
rosie254@reddit
it's a lightweight coding agent that's meant to be an alternative to the likes of OpenCode and Claude Code. When I tried to use it, it didn't seem to work so well with small-ish local models such as gemma4 26b, mostly because it needs to always get the search/replace exactly right.
Important_Quote_1180@reddit
OpenClaw, Hermes, opencode, codex, and pi are all harnesses or wrappers. Try to imagine the LLM as billions of points of knowledge. The harness gives these points structure to follow.
Zanion@reddit
It's a minimalist wrapper that calls an LLM in a loop and lacks almost all the capabilities you expect from an agentic harness.
o0genesis0o@reddit
A harness is the collection of deterministic software wrapped around an LLM provider that turns the LLM from an autocomplete engine into an "agent". At the very least, the harness executes the LLM in a loop and triggers all of its tool calls until the LLM decides to stop calling tools, at which point the final response is returned to the user. The harness can do all sorts of extra things its developers believe to be useful, such as modifying the chat history before sending it to the LLM, injecting or removing stuff, changing tool lists, sandboxing all the tool calls, checking the LLM's tool calls for security violations, sanitising inputs and outputs, etc.
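The loop described above can be sketched in a few lines (a minimal illustration with a hypothetical `call_llm`/`run_tool` interface, not any particular harness's code):

```python
def agent_loop(call_llm, run_tool, messages):
    """Run the LLM in a loop, executing tool calls until it stops asking.

    `call_llm(messages)` is assumed to return a message dict with an
    optional "tool_calls" list; `run_tool(name, args)` executes one
    tool and returns a string result. Both are hypothetical stand-ins
    for a real provider and a real tool registry.
    """
    while True:
        reply = call_llm(messages)
        messages.append(reply)
        tool_calls = reply.get("tool_calls") or []
        if not tool_calls:
            return reply["content"]        # final answer for the user
        for call in tool_calls:
            result = run_tool(call["name"], call["args"])
            # Feed each tool result back so the next LLM call sees it
            messages.append({"role": "tool", "name": call["name"],
                             "content": result})
```

Everything else a harness does (context pruning, sandboxing, input sanitising) is layered around this core loop.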
Pi is a lightweight and simple harness that does very little beyond running the loop and giving the LLM access to tools. If you want other features, you ask Pi to build them for you. You can use Pi as the core to build other applications (think of OpenClaw). OpenCode, Codex, Claude Code, even the loop running inside the ChatGPT web app, and similar, are all LLM harnesses.
Note that the definition I described above is somewhat different from what some people mean by "agent harness". My colleagues working on LLM evaluations think of an agent harness as a test harness for agents. So, from their view, the whole of Pi or Claude Code is the agent, and the harness is the thing that wraps around the agent to run tests, evaluate it, or just stop it from rm -rf'ing the system.
_derpiii_@reddit
> Harness is the collection of deterministic software wrapping around an LLM provider, to turn LLM from autocomplete engine into an "agent"
I like how you're specific about the deterministic software. Where would you fit in the other overhead like file system?
o0genesis0o@reddit
When I design my harness, I consider file system a built-in tool. Technically, it's no different from any other tool, but I just find it easier if my own code handles the interfacing with the file system. Nothing stops one from designing an MCP for filesystem, nor injecting snapshot of file system into agent's context. Technically, from LLM perspective, ext4 or S3 makes no difference.
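A built-in filesystem tool of the kind described above can be sketched as follows (hypothetical names, just to show that tools are ordinary local functions behind a schema the harness exposes to the model):

```python
from pathlib import Path

# Hypothetical tool registry: the harness advertises these descriptions
# to the LLM and dispatches its tool calls to plain local functions.
TOOLS = {}

def tool(name, description):
    def register(fn):
        TOOLS[name] = {"description": description, "fn": fn}
        return fn
    return register

@tool("read_file", "Read a UTF-8 text file and return its contents")
def read_file(path: str) -> str:
    return Path(path).read_text(encoding="utf-8")

@tool("write_file", "Write text to a file, creating parent directories")
def write_file(path: str, content: str) -> str:
    p = Path(path)
    p.parent.mkdir(parents=True, exist_ok=True)
    p.write_text(content, encoding="utf-8")
    return f"wrote {len(content)} chars to {path}"
```

Swapping the `Path` calls for S3 client calls would leave the tool schema, and therefore the model's view of it, unchanged.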
_derpiii_@reddit
> Technically, from LLM perspective, ext4 or S3 makes no difference.
I guess that's true. It's a different abstraction layer - the runtime environment (besides niche edge cases) doesn't really matter.
Protopia@reddit
Others have answered what a harness is, but the reasons that your choice of harness is important, and what makes Pi distinctive are...
HARNESSES
The choice of harness has at least as much impact - if not more - on the quality of your agentic output as the LLM you use.
The harness is responsible for everything other than the thinking and token output.
PI HARNESS
Pi is a basic but highly extensible starting point upon which you can reinvent your own specific variant of wheel.
If you want to do something unique - like a personal agent that communicates via Telegram, has a "soul"/"personality", has a time-based scheduler, and remembers your preferences - it's a great piece of software to use as a starting point.
It also has loads of extensions that people write and share, so if you can find one that fits exactly what you want (and it's high quality and actively supported), then you don't have to reinvent the wheel.
But if you simply want a GOAT agentic coding harness, especially for Spec-Driven Development, and want to use it to create great code WITHOUT reinventing the wheel, Pi may NOT be what you are looking for.
JuniorDeveloper73@reddit
It's the new buzzword, like "agents". The USA loves stupid shit one after another, you know, gAmE cHaNgEr, things like that.
cms2307@reddit
Crazy how someone that's supposedly a newbie like you can have such strong opinions, especially about one of the two countries releasing SOTA models
JuniorDeveloper73@reddit
of course all comes from USA
cms2307@reddit
Yeah, we're the champ. I hope you think about that every time you run a model. China is no different with stealing data, so if you don't like it, better quit using local LLMs
JuniorDeveloper73@reddit
And what about all the content from other countries, Einstein??? Better stop using that part of the models.
cms2307@reddit
Nice job editing your comment and deleting your other one, if all you have to do is steal data then why hasn’t Argentina made a SOTA model? 🤣🤣 clearly it takes real talent and only two countries have it. Also, America is absolutely a country and not a continent, there’s three tectonic plates, the North American and Central American plates correspond to the continent of North America, and the South American tectonic plate corresponds to South America. The Americas aren’t even really physically connected because of the Darien gap and there was a water gap until 2.7 million years ago. Saying America is one continent is just Latin American cope.
JuniorDeveloper73@reddit
What comment deleted??? America is a continent; again, go to school.
cms2307@reddit
The content doesn’t train itself 🤷♂️
JuniorDeveloper73@reddit
lol you dont even have strong arguments.
Makers7886@reddit
hell yeah USA #1
fastlanedev@reddit
A harness is what enables generated text to have an effect on the world around it, by extracting/injecting/doing things with that generated text at runtime
natermer@reddit
They are talking about pi coding agent.
https://pi.dev/
It is described as a "harness" because it is designed to be extensible so you can essentially make your own personal agents out of it.
This is where OpenClaw came from. It was built on top of Pi.
cms2307@reddit
If all you have to do is steal content why hasn’t Argentina made a Sota model then 🤣🤣 clearly you need real talent and only two countries have it
rebelSun25@reddit
A harness is the "thing" you interact with as a user. You either use the CLI or some visual GUI to type a prompt, command, etc. into this harness. Pi is just another harness, like opencode, GitHub Copilot, Claude Code, etc. It's very light. By light, I mean that each harness comes with its own set of default behaviours and tools; Pi is quite minimal and relies on just 4 tools to begin with.
imshookboi@reddit
It's basically a bare-minimum coding agent harness (some use the term agent framework or agent runtime). Think of it this way: Claude Code sends something like 65k tokens to the model before your request is even included; Pi, I think, is less than 1k. Way fewer features, but it eats fewer tokens, which is important with local models.
reality_comes@reddit
Harness is the new word to describe the software that runs an AI model as an agent. Pi is a popular one for coding specifically.
The harness is basically trying to replicate what the human does: make plans, take actions, make memories, remember things that are important, monitor progress, test, reconfigure, test again, throw a tantrum and delete the whole codebase, etc.