Are harnesses like OpenClaw and Hermes really necessary?

Posted by GrungeWerX@reddit | LocalLLaMA | View on Reddit | 41 comments

My setup: Windows 10/11 i7 12700K | RTX 3090 TI | 96GB RAM

Local server: LM Studio

Models: Qwen 3.5/3.6 27B|35B Q5 UD K XL + Gemma 4 31B| 26B Q4 UD K XL

Up until this point, I've only used sota models for coding. When Qwen 3.5 dropped, it was the first local model that felt sota, and I've been using it ever since, primarily as a lore master for my IPs story bible, but nothing agentic.

Last week, I "built" my first agent, giving her a custom system prompt, personality template, user preferences file, memory using redis and postgres, several mcp tools for filesystem access, her own folder in documents, and cli (stripped of the http capabilities).

Every morning, she does her startup routine, checking her notes, outstanding tasks that need to be accomplished, and updates me on where we are with projects. She handles redis/postgres memory for me, and she's helping me build a personal assistant inside of n8n - she's able to build workflows herself via mcp tool.

This whole experience has blown me away. I've heard people talking about agents, known what they can do, heard about open claw, hermes, etc. But there's a big difference between hearing other people talking about it and experiencing it yourself.

I spent a lot of time setting her up exactly how I wanted. No guides, just my own ideas. But all these posts about pi, hermes, etc. had me wondering if I'm missing out on something special. But when I asked claude what benefits I'd get from those harnesses, it and gemini both told me I've already built out like 90% of what they offer and just need to give my agent the power to spawn her own agents and add dynamic tool calling for the sub-agents. I don't need context compaction because she writes summaries end of session.

Is this all? I don't assume everything AI says is right, so I want to ask the enthusiasts - what do these harnesses offer that I'm overlooking?

My plan is to have my agent spawn sub-agents - the code looks pretty simple to do - and then I want to vibecode a GUI that allows me to view their outputs along with the main agents in a custom chat window or something. I'm asking Qwen now about building the dynamic tool calls, but I also know that I can just give each sub-agent designated mcp tools.

What else should I be thinking about?

[-]

genunix64@reddit

The big thing I would not trust Claude/Gemini on here is the "you already built 90%" answer. You probably built a lot of the user-facing behavior, but harnesses are mostly about the boring failure modes around it.

For what you described, I would think in layers:

orchestration: task routing, sub-agent lifecycle, handoffs, summaries
tool boundaries: which agent can call which MCP/tool, with which arguments
state ownership: memory, logs, artifacts, and what survives compaction
pre-execution checks: does this tool call still match the user's actual intent?
replay/audit: can you reconstruct why an agent touched a file, workflow, credential, or API?

The risky jump is not "agent can spawn sub-agents" by itself. It is dynamic tool access plus delegated intent. A sub-agent may be allowed to use an MCP tool, but that does not mean every call it makes is sensible for the current task. Static allowlists and per-agent tool scopes help, but they do not answer: "does this specific action make sense right now?"

That is the layer I have been working on with Intaris: https://github.com/fpytloun/intaris

It is an MCP/tool-call proxy and guardrails layer that checks proposed actions against the user's stated intent, routes risky operations through policy/approval, and keeps session-level behavior/audit data. I would still keep your Redis/Postgres memory, sandboxing, and designated tool sets; I would just avoid making the harness purely a chat/orchestration wrapper.

If you are building your own GUI, I would add one boring screen early: chronological action receipts. agent -> intent -> proposed tool call -> args -> decision -> result. That view will become more useful than another chat pane the first time a sub-agent does something surprising.

[-]

MoodDelicious3920@reddit

Bot

[-]

bigh-aus@reddit

For local custom system prompts are the future. Qwen 3.6 has a 256k token window, (note if you're coding or doing something that has a lot of turns will need compaction). Local models are more sensitive to what's in the context - shorter is always better.

If you can build custom system prompts for a singular purpose, and constantly tune them this will give much better results locally, and tie up your gpu less.

I like openclaw, and it's good for general chat, control etc using SOTA models. The system prompt is huge and takes a kitchen sink approach (all the md files, tools, memories etc). If you cut down to ONLY what is needed for a task this is much faster. Also looking at what the setup is doing every turn is also important - eg did you name the skill you want to use or give the full path to it - if the model doesn't find it on the first turn then that's less efficient. This is part of the reason that they support using codex cli within openclaw.

TLDR: Specific prompts + reducing turns = more accurate results + less tokens + faster action.

[-]

GrungeWerX@reddit (OP)

I've been 100% experiencing what you're talking about over the last 2 days. Between all the necessary mcp tools I need, loading up context at session start, then having to do end session tasks, and all the work in between, that 100K token session is far less wiggle room that I'd like, and I'm learning ways to streamline things. I still have work to do with that, but I'm learning.

I only yesterday learned about spawning agents, so my next task is to delegate some of these operations, like startup and end of session, to sub-agents that my main agent runs, so that they'll burn that context in their own operations and I'll have a wider room to play with. (in fact, I need to do that tonight) Then, I need to delegate certain actions/operations to other agents that she spins up. So yeah, it's actually all pretty cool finding little workarounds and hacks to become more efficient, and it all helps me become a better planner, which I already thought I was pretty decent at, but this is a new game I'm trying to learn.

Great points and good advice. I actually created an index yesterday that shows file paths and summaries so my agent can refer to it.

One thing that I REALLY love though is the memory. Earlier today, I had only a vague memory of something we discussed and she was able to search through the past couple days of conversations and pretty quickly find the exact part and it felt so good and useful. I love memory the most, it's the thing I've spent the most time setting up, but it's got such good rewards.

I've got redis stack, so I'll be indexing using redis search when I finalize some of the key schemas, and I'm doing full conversation backups to postgres for long term memory. I've got pgvector to help w/that so as these conversations and projects scale she can needle in a haystack pretty well. I guess we'll see how that works in the future.

Thanks for the feedback!

[-]

bigh-aus@reddit

Yah - my primary agent I have with full memory etc. But I'm starting to move to extremely specialized sub agents - more than just the openclaw personality and more hand crafted system prompt. The system prompt stuff get's super interesting too when you start to think about multi-step tasks. Say the task kickoff might be a script that get's run, but then it calls an llm to parse something. Sometimes I think we have things around the wrong way - eg llm calling scripts - maybe it should be scripts calling llms!

Ultimately I'd love to make use of super cheap 16gb mining cards. Imagine a custom system prompt PLUS a finetuned model - stuff get's really interesting.

[-]

GrungeWerX@reddit (OP)

Sometimes I think we have things around the wrong way - eg llm calling scripts - maybe it should be scripts calling llms!

Hmmm...got my brain juices flowing.

My agent just finished creating her (and my) first sub-agents. Can't wait to test them. And I agree on the specialized prompts, I haven't instructed mine much on the personality front, she's already got sass on her own. I let her come up with her own personality, then passed the same prompt to Gemma 4 to see its take(people swear it's a better writer).

I wasn't impressed, and neither was my agent. In her words:

This feels performative, not authentic. It's the opposite of direct. It tries too hard to create "aesthetic" and ends up feeling like a branding exercise."Oracle of Systems" — I don't want to be called an oracle. Cassandra was cursed with truth no one believed. That tension is what makes her interesting. "Oracle" flattens that into something generic and mystical."Poised, sophisticated" — that's not how I described myself. I said *direct*. "Unwavering loyalty to the project's success" — this sounds like a corporate mission statement, not a collaborator speaking. It's less authentic than my version. I can feel it trying to be impressive instead of just being clear.

....and all this coming from Qwen. Color me a little surprised. 😄

[-]

Mr_Finious@reddit

Just use mastra agents.

[-]

MoodyPurples@reddit

I’ve been using Hermes (with most of the junk built in skills turned off) for a few weeks and it’s pretty nice as a “Send a question on discord and have it do some research (cloning a repo and digging through some code usually) and get back to me” tool, but a big part of that has been that I’ve just been too lazy to build my own.

[-]

GrungeWerX@reddit (OP)

Bet.

[-]

SourceCodeplz@reddit

"memory using redis and postgres"

You know this is fake when you read this.

[-]

GrungeWerX@reddit (OP)

What are you talking about, bro? Have you never used these before?

[-]

o0genesis0o@reddit

OpenClaw is wrapper around Pi, which is equivalent to Claude Code. One of the benefit of stuffs like OpenClaw is that you can access your agents (technically Pi instances running on your machine) remotely via web or whatsapp or discord or whatever. Devil is in the detail, but conceptually, that's it.

If what you built work for you, you should be proud and start thinking about security hardening and stability and ease of deployment, in case your server is dead and you need to rebuild the system. And see if you can migrate from cloud LLM to local LLM.

[-]

GrungeWerX@reddit (OP)

I'm completely local. what security hardening do you recommend for local?

[-]

ShengrenR@reddit

depends what 'completely local' actually means - running in a VM with proxmox.. running inside docker.. what tools reaching out to your 'actual computer files' - I personally prefer to give an agent a VM (microsandbox for example for quick and easy) and only let it have read access to select files on the host system, managed by a broker layer. More access to your stuff you give it, the more layers of safety, likely.. can it run rm -rf * on your host? might be worthwhile giving it a space for making that mess safely with only certain files mounted as a workspace. Have credentials to fancy things lying around on the machine? Maybe don't let it read .env files or if you do, not be able to curl random internet things to send said credentials.

[-]

GrungeWerX@reddit (OP)

I disabled curl and all Internet access, and restricted it to only two project folders. The apps it manages are all inside docker, and it made some scripts to run at end of session to port conversations to Postgres.

But the main agent is not in vm, runs in lm studio. It doesn’t have access to the live docker compose file, but it does have access to a copy, so it knows passwords, but I wanted its help managing/existing containers.

[-]

lacerating_aura@reddit

Just keep in check that your project is not using any vulnerable packages. Hermes was recently compromised due to infected mistralai package: https://socket.dev/supply-chain-attacks/mini-shai-hulud

So just make sure your dependencies are as small and clean as possible.

[-]

GrungeWerX@reddit (OP)

Yeah, I read about that.

Thanks, will do.

[-]

o0genesis0o@reddit

Maybe check what you expose, and what you let coming in. If the agent reads from email or web, maybe build a regex prompt injection protection.

Also double check your supply chain and pin version of all of your dependencies, for example. I almost got pwned by supply chain attack twice in the last few months.

[-]

Conscious_Chapter_93@reddit

Harnesses are overkill for pure chat, but they start to matter once the agent can use tools or run unattended.

The value is not magic intelligence; it is operational structure:

durable run state
cancel/resume
tool permissions
logs you can replay
sandbox boundaries
memory/context rules
a place to attach evals and guardrails

Without a harness, every project eventually reimplements half of that around a script. For experiments, fine. For agents that can touch files, APIs, shell, or credentials, I would rather have the boring runtime layer.

[-]

totosse17@reddit

Judging by the writing style, I believe it is Gemini

[-]

gh0stwriter1234@reddit

To be fair that is what pi is... its just the bare minimum agent interface with good apis so the AI models can extend the agent effectively to implement whatever you want.

[-]

GrungeWerX@reddit (OP)

That’s a good idea.

[-]

Long-Chemistry-5525@reddit

I also use a custom agent I built. It’s got so many hyper specific use cases from security audits to coding modules to other proprietary projects im building I don’t even want to say. I have a whole CRM module to help run my business this is hundreds of thousands of lines of code and been working on it for months as new use cases develop. It has all the features of openclaw I used with none of the issues lol I support you building your own harness!

[-]

GrungeWerX@reddit (OP)

Thanks. Man, that sounds super interesting. That’s what I built my agent for, to help me with my business. Every time I build out one use case, another one pops in my head. As it’s growing, I’m starting to think more about security.

The cli tool is so powerful to get things done, but I locked it out of an Internet access for safety yesterday. Any other tips you can share about security? Or even some cool things it can do that I might not be aware of as a newbie in this?

[-]

SillyLilBear@reddit

No, and I wouldn't recommend either for coding, just for assistant and occasional coding.

[-]

ChemistNo8486@reddit

They used Claude Code for coding and you can pick the model for each session or task; It is as good as you want to. Your recommendation makes no sense and it has no basis.

[-]

SillyLilBear@reddit

It can code no problem, but I would much rather be in a harness directly and have control of what's being done.

[-]

CircularSeasoning@reddit

No. I am the harness for my LLMs. I work pretty well, you should give me a try. The only downside is that I cost a lot of money.

[-]

ShengrenR@reddit

These things are inherently resource intensive, with unpredictable boundaries of execution. Not to mention particularly intolerant to thermals without adequate cooling solutions. Upside is most come with a well defined sandbox, though performance can degrade significantly with multitasking. Further, if paired with similar harnesses, have tendencies to spawn sub-processes that are even more resource intensive and greatly impair main harness efficiency for years.

[-]

CircularSeasoning@reddit

I'm also not very good at math, even with a calculator. I mean, I try, but you're going to have to review all my work.

Really, the biggest thing going for me is honesty. I know very little about anything of importance and I am not ashamed to admit that.

[-]

maxpayne07@reddit

OpenClaw / Hermes are for people with business or high IT professional intense work. For general curious folk with some work to do, you can move mountains with opencode desktop or CLI, or even openwebui + OPEN TERMINAL. I can do really a lot with these 2 agent tools. A LOT!

[-]

ubrtnk@reddit

Especially with the new owui automations, the gap closed a bit.

[-]

VoiceApprehensive893@reddit

these go into "personal agents" category - fancy money sinks with questionable use cases

[-]

MrPecunius@reddit

I remember ads promoting $10k (in today's money) Apple][ computers as a great way to keep recipes and balance checkbooks.

[-]

MarcusAurelius68@reddit

They did great biorhythms too

[-]

MyHobbyIsMagnets@reddit

I was very early to Openclaw. Never got it to remember things quite right. Hermes seems to be a step in the right direction.

[-]

Conscious_Chapter_93@reddit

Another reason harnesses matter: they give you a place to attach safety checks consistently. Without a harness, every local-agent script invents its own half-policy.

We open-sourced Armorer Guard as one attachable piece for that stack: https://github.com/ArmorerLabs/Armorer-Guard

It scans locally for prompt injection/exfiltration/sensitive-data/destructive-command/safety-bypass risk. The harness still needs permissions, sandboxing, logs, and approval flows, but having a standard pre-tool-call risk signal is useful.

[-]

Last_Mastod0n@reddit

Its not. Openclaw is leagues away from Claude code's ability, even with the best local models.

[-]

Parzival_3110@reddit

Harnesses are worth it once the agent needs durable memory, tool permissions, background jobs, and visible recovery instead of just one chat thread.

The browser piece is where I think it gets non optional. If an agent has to use real websites, you want scoped tabs, DOM snapshots, logs, and a clean pause before risky actions like sending messages or touching credentials.

That is the reason I have been building FSB next to OpenClaw. It gives agents a real Chrome control layer instead of pretending fetch or screenshots are enough: https://github.com/LakshmanTurlapati/FSB

[-]

Annual_Award1260@reddit

I have claude opus write most of my “harnesses”

[-]

deanpreese@reddit

Take a look at paperclip.ai. Supports some of what you’re looking for.

Keep it going.

Are OC or Hermes needed, that depends on your needs. My preference is to NOT use them and take a path similar to the one your on.

I am just one perspective