Anybody who tried Hermes-Agent?
Posted by HaAtidChai@reddit | LocalLLaMA | View on Reddit | 75 comments

Curious to hear from those who have tried this new open source project from NousResearch, and how it compares to OpenClaw? I know the latter is rife with security vulnerabilities, but I'd love to hear if it functions similarly. Also, their repo mentions Honcho integration for persistent memory across sessions.
Suitable_Currency440@reddit
It's amazing. It's OpenClaw already set up and working: like an OC with 1 week of manual debugging already done, plus RAG, memory persistence, and better tool calling. (Qwen3.5-9b, 16GB VRAM.) 10/10, I'll only go back to OC if it becomes at least on par with this.
huzbum@reddit
Hmm. How many tps you getting on 9b? Might be worth switching to 35b with some experts offloaded. I think 35b MoE is smarter than 9b, and it might be faster with all layers offloaded to GPU and some of the experts offloaded to CPU.
I get 35tps with 35b on my 3060; I imagine your 16GB GPU would do better, whatever it is.
ay-em-real@reddit
Did you mean 35B, or something like Gemma 4? People say Gemma 4 is better overall? I mean, it would be best to just download and try each of them for our personal preferences, I guess. I have an RX 6600 8GB and for some reason I can't get it to run even Gemma e4b or Llama 3.2 3b; it just times out and gives me an error. I'm definitely doing something wrong here lol. I just got into openclaw and agentic models, I only recently started learning.
huzbum@reddit
I meant Qwen3.5 35b, but now I would definitely look at Qwen3.6 35b!
I was using 3.5 27b on my 3090 with 4b on my 3060 for faster secondary tasks. Now I switched to 3.6 35b on my 3090 for everything.
I tried 3.5 35b on my 3060 and it was generating tokens at a very useable 35 tokens per second. I tried the official Gemma 4 26b and it was using a lot more memory than it should have. I later tried an Unsloth quant and it worked as expected.
On an 8GB GPU it could go either way for performance, depending on system RAM vs GPU bottlenecks. 3.6 35b is definitely smarter than 3.5 9b; I would try it. Use llama.cpp or LM Studio. Enable flash attention and 8-bit KV cache, offload all layers to GPU, offload all experts to CPU, then turn down expert offload until it’s a snug fit. I’m guessing 2/3 to 3/4.
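For concreteness, the steps above might look something like this with a recent llama-server build. This is a sketch, not a recommendation: the model filename and the `--n-cpu-moe` count are placeholders you would tune for your own VRAM.

```shell
# Hypothetical llama-server launch following the tuning advice above.
# Model filename and the expert-offload count are placeholders.
llama-server \
  -m ./Qwen3.5-35B-Q4_K_M.gguf \
  --flash-attn on \
  --cache-type-k q8_0 \
  --cache-type-v q8_0 \
  --n-gpu-layers 99 \
  --n-cpu-moe 24
```

The idea is to start with a high `--n-cpu-moe` (many expert layers on CPU) and lower it run by run until VRAM is a snug fit.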
Different_Fun@reddit
Does it work with local models?
sickleRunner@reddit
I tried hermes on primeclaws.com, it's nice that you can switch between hermes and openclaw and also you get AI models for free
Suitable_Currency440@reddit
Fairly good with qwen3.5-4b, very decent with qwen3.5-9b, VERY good with 27b. Personal agent? Yes Coding for high complexity tasks? Not really, but with high guidance? Yes
Different_Fun@reddit
With what GPU are you running the 27b?
HaAtidChai@reddit (OP)
Hermes uses the Honcho memory package; I want to know if it comes installed from the repo. Also, how much memory do you need to launch sessions per agent (not counting the memory for local inference)?
Iziman95@reddit
So far disappointing.
Coming from OpenClaw with multiple agents, Hermes currently only supports a single agent.
Setup also took around 15 minutes because it kept looping on “It looks like Hermes isn't configured yet” without any error explaining why it thought the setup was incomplete. It eventually worked after several attempts.
On the first prompt it started executing things I never asked for, like importing OC crons.
The Telegram token also got truncated during onboarding.
Overall, too much hassle for something that is supposed to replace an OC setup
Jonathan_Rivera@reddit
Yeah, I am trying to turn it into a personal assistant and feel like I'm training it, and it keeps making mistakes. One thing that helped immensely is creating a Claude Code skill to optimize the Hermes agent, plus a memory file that Claude writes to, documenting changes along the way. Every time Hermes fucks up, I tell Claude to look at the Telegram conversation and see what went wrong. It is connected to Obsidian locally, and they share a readme file so Claude can step in and help while keeping things consistent.
Hopeful-Cricket5740@reddit
why don't you just use Claude Code? just curious... like, you made it better by adding Claude skills, you use Claude to bug-test it, and you basically connected Claude to the memory (Obsidian). At this point, why don't people just build a wrapper for Claude that talks to Telegram or whatever (Claude already has similar features)?
Jonathan_Rivera@reddit
Fair question. I like the product, but I don't think they have the best intentions for their users, and I can't rely on one company for everything.
About a month ago, myself and many others on reddit noticed we were burning through our plan budgets at a ridiculous pace. I could go all week and not hit the budget, and then I was hitting it 3 days into the week. You hit the support chat: no response, or it's broken. Hundreds of people cannot reach anyone to complain. After a week the topic is dominating all the Claude-related subs, and an employee posts on X that there is no issue. Frustration spills over to X. Another week goes by and they acknowledge there was some A/B testing going on. Again, it was a post from a random employee, not official Anthropic support. Now they are A/B testing removing Claude Code from the Pro plan on new user sign-ups.
They focus on integrations and kill market share for other companies like Figma, and once they convert those customers it becomes a walled garden like Apple. Open source is the way forward.
kidflashonnikes@reddit
I run a lab at one of the largest AI companies in the world, and we just deployed Hermes. Incredible. I can't even believe someone made this. This is coming from someone who runs a lab in SF that compresses brainwave data in real time with LLMs, with direct threads into brain tissue. I gave everyone on my team 2 days off from work to conduct a hackathon, and science-fiction things were created in 48 hours by my team.
Superb-Egg9541@reddit
Any chance we can get an inside look on this? Maybe a blog post or a video? Hell, have hermes do a write up. I'd be interested.
kidflashonnikes@reddit
I'm sorry, I can't. I work for one of the big labs; sadly I can't. I will say this: the go-to model for intelligence on decent hardware is qwen 3.5 27B. If you have two 3090s, use the UD_5XL quant from Unsloth; it's amazing. You will get about 25 t/s with that one at a context size of 32k, which is perfect for agentic coding on Hermes. If you want more speed, go with GLM 4.7 flash: you will get up to 50 t/s on one 3090 at a 32k context window thanks to the MLA architecture for the KV cache etc. Run the Unsloth quant. These are the 2 models my team used for a quick spin-up to get the agents actually working. This was before the v6 update that came out yesterday.
My team did an amazing job with this as a fun thing for work - one of their agents is making money already with Telegram. Good luck, this is about as much as I can say
kidflashonnikes@reddit
Also, something important to add: use llama.cpp. The newer update fixes thinking for the qwen 3.5 models; tool calling failed when using thinking mode, and that's fixed now. Be aware that if you run qwen 3.5 27B across two GPUs (multi-GPU), flash attention will likely crash the CUDA work.
spaceface83@reddit
Ohhh I need to try this. I'm running qwen 3.5 122b on my DGX Spark and the time spent thinking is insane. I'm not even using an nvfp4 quantized model, but stillllll, thinking takes forever. Hopefully grabbing a newer version helps with how much thinking it takes to respond to "testing", for instance. The plan was to use an LLM router like nadirclaw with a 27b model for basic calls and the 122b for complex ones. I'll probably just end up switching to vLLM though, for the nvfp4 support.
Then again I was gonna do that with openclaw now I need to go play with hermes first!
huzbum@reddit
Is there a reason to run both? benchmarks look like 27b dense is roughly equivalent to 122b. If I were going to run a faster model, it would be 35b or drop down to like 9b or 4b.
spaceface83@reddit
For hermes I ended up running everything on 122b. If I was hardware constrained I would choose the 27b over the 35b though just because it appears much better at that size to use a dense model.
huzbum@reddit
I'm running 27b on my 3090. It's not fast; I'm getting about 35tps. I guess it IS faster than 122b on my system, where I'm only getting about 15tps on my 3090 with most of the experts offloaded, and 12tps on my 3060 with all but 2 experts offloaded.
So I'm guessing 27b is *probably* significantly faster than 122b on a unified memory system, but if you're offloading simpler tasks, I'd want something a lot faster like 35b. I get 110tps with 35b on my 3090.
I'm still running Hermes on my GLM 5 Turbo, because why not. I might switch it over to 27b at some point though. Might experiment with some offloading to 4b on my 3060.
kidflashonnikes@reddit
I have 4 RTX PRO 6000s with 1 TB of DDR5-5600 RAM, a 96-core CPU, and 14 TB of NVMe storage. I run many models together and then sub-agent them. For example, I am running multiple full-sized qwen 3.5 27B models for tool calling, creating a fleet of 10-12 agents that are effectively running a business (side hustle) for me. It's probably one of the first companies in the world entirely run by agents, with a single human (me) in the loop. So far I've made a few thousand USD. It's not much, and it's mainly a fun side project, but people underestimate how effective the qwen dense 3.5 model is when used properly on the right hardware. I would not be surprised if they only released a small variant of qwen 3.6 but kept the main model gated.
It's clear to me what happened with qwen 3.5 27B: it will likely be the pinnacle of open-source models in terms of quality. Moving forward, unless weights are hacked and leaked, you will absolutely begin to see a downward trend in open-source model releases. The trend is clear: we are hitting a point where these models are too powerful to be released. They can no longer be public; weights will be treated as a matter of national security.
Disclaimer: I work for one of the big three AI labs, so I can personally attest that we have already spoken with many labs that open source models, and I can 100% confirm this is going to happen, mainly because we got the "talk" internally from the current White House admin (NDA, can't say more than this) that open source models will be cracked down on. This will also apply to hardware: you will see Nvidia begin to stop supplying the world with more consumer GPUs. It sucks, I know, but we need to push back hard and prevent this from happening.
AlienRedditMaster@reddit
Even more important to release them to the people. The harder the elite cracks down, the cooler the leaks will be. Cyberpunk ftw! ~~Information~~ Intelligence wants to be free.
Silly_Individual4056@reddit
You could be Prometheus and bring fire to us mortals
spaceface83@reddit
yeah honestly for agentic processing I don't care that much about tokens per second as long as it's within reason. I care more about how sound the model's reasoning is. I typically get like 30 tokens/sec at 122B, I think. Even with a DGX Spark though, a 122B model plus some room for context and you can't do much more.
I have a 5080 on my "normal" computer, so if i ever cared enough i could run some smaller models there at much faster speeds, but thats too much effort for me to orchestrate that compared to the gain i'd get :D
kidflashonnikes@reddit
Get the latest version of llama cpp - the thinking bug is fixed but not the reprocessing bug
spaceface83@reddit
I'll try it out!
ArthurDentsBlueTowel@reddit
It’s a cool story but yeah that’s wildly vague and not helpful.
Particular-Cause-862@reddit
Yea it has bugs, but it doesn't have the vulnerabilities, because they don't merge 100 PRs each day, which is fine.
huzbum@reddit
Not sure what you mean about single agent... each context is an agent. I just open another slack conversation and it's a new agent. Or a new terminal/CLI.
houseofmates@reddit
exactly
Independent-Pin8300@reddit
AFAIK it actually supports multiple agents? see https://hermes-agent.nousresearch.com/docs/user-guide/profiles/
Final_Elevator_1128@reddit
the architecture split nobody talks about enough. Hermes for the outer loop, llm-wiki for the inner loop. each layer has one job. completely changes how capable your agent is on niche topics
Final_Elevator_1128@reddit
been running the inner/outer infra split for two months. the domain knowledge gap disappeared. Hermes + llm-wiki-compiler. github.com/atomicmemory/llm-wiki-compiler
Double-Fun2396@reddit
I set it up yesterday, and now it has its own repo. Memo (my Hermes agent) writes issues that I can approve. From there, it automatically sets up a plan and an Excalidraw architecture diagram where we discuss the architecture. After that, Memo just starts building it. Man, I love the Hermes-agent. 😍
Potential-Toe1320@reddit
I don't see what's so great about it. It has a brutal time trying to operate local models and API services at the same time... Back to CrewAI for me... huge waste of time, unfortunately.
PastTumbleweed6713@reddit
It took me an hour to set up
used openrouter and defaulted to Opus 4.6
would recommend.
Ok-Internal9317@reddit
Hows the cost looking?
sebas85@reddit
It can use your Claude or ChatGPT subscription. That will keep the costs at least predictable.
LynxComprehensive193@reddit
How can you use Claude subscription with it?
ZeroPiXEL-@reddit
Loving Hermes. It was incredibly easy to set up, and I got Ollama working. I was also able to get Claude to work with it: just type hermes model -> Anthropic -> oauth (claude code). (Requires installing Claude Code with npm.) Works great using Sonnet.
OJ-Houston@reddit
Why install Claude code with npm instead of curl https://claude.ai/install.sh | bash
dhlrepacked@reddit
And why is Claude Code required? Shouldn’t Hermes replace it?
dontquestionmyaction@reddit
It uses the Claude Code login flow.
8Frostyunderpants@reddit
Also keen to know and which other low cost models could work
fathah_crg@reddit
If you are looking for an easy setup, then you should try the native desktop app.
One-click setup: https://github.com/fathah/hermes-desktop
OMGThighGap@reddit
What's the difference between Hermes (or agents in general) and something like opencode?
houseofmates@reddit
opencode is mostly a coding tool for working in the terminal on a project, like cursor. or kiro. or antigravity. or jetbrains ide. or windsurf. hermes/openclaw style agents are broader automation systems that can run continuously, use tools, manage tasks, and act more like a background worker or assistant. you can have several agents running at once on both openclaw and hermes
Comfortable-Air-4630@reddit
Configuration is super easy, and the Telegram integration works perfectly. Will be testing tomorrow via the Docker setup.
mroj84@reddit
I'm having a hard time getting it up in Docker, but I wanted to run it as a sidecar with openclaw. I think that path is harder. How did you all configure the container?
papaloukas@reddit
Not fully dockerized yet: https://github.com/NousResearch/hermes-agent/pull/1841
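In the meantime, a rough sketch of what a containerized setup could look like, building the image from the repo yourself. Everything here is an assumption, not documented usage: the Dockerfile may not exist until the PR above lands, and the image tag, volume path, and env-file name are all made up for illustration.

```shell
# Hypothetical container build-and-run, assuming the repo ships a Dockerfile.
# Tag, volume mount, and .env file are illustrative guesses.
git clone https://github.com/NousResearch/hermes-agent
cd hermes-agent
docker build -t hermes-agent:local .
docker run -d --name hermes \
  --env-file .env \
  -v "$PWD/data:/app/data" \
  hermes-agent:local
```

Running it alongside an existing OpenClaw container (the "sidecar" idea above) would then just mean putting both on the same Docker network.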
Holiday-Pirate-5258@reddit
I've got this inside docker today and it is working perfectly. The only setback was the sudo thing but easy to fix
spaceface83@reddit
I'm running an ARM version of the docker container on my DGX Spark and it works great!
papaloukas@reddit
The PR was merged into the main branch. It has some pending issues, like a missing arm64 image and the uvx command missing.
ambassadortim@reddit
I have a question. Why is Telegram often used for this type of setup? Is it for remote connection via phone from anywhere, or something else?
thatguyinline@reddit
Socket-based messaging providers let the agent maintain the connection, instead of having to be exposed to the internet to receive webhooks. It's also a very simple integration.
ambassadortim@reddit
Thanks for the reply
jreoka1@reddit
I like it better than openclaw especially lately. it just works vs openclaw having a lot of issues for me for some reason
Key-Substance5991@reddit
it's excellent. I have tested most of them. My current setup is nanclaw, but I will migrate fully to Hermes agent soon.
TastyChickenLegs@reddit
Installed this morning in less than 10 minutes. The installer migrated my OC agent and took care of the Telegram as well. I literally did nothing except configure the Ollama model.
So far it's crazy fast and has home assistant support builtin.
I fought with OpenClaw for days with memory problems and broken configs. I realize it's still early to post a good review, but the setup was flawless.
Adventurous_Machine2@reddit
How did you set up Home Assistant? It only shows that it can control Philips Hue
ariefb79@reddit
I think Hermes is a great agent, but because it's new it has some bugs 🧐
Crazy_horse_72@reddit
Guys...Hermes Agent...is super...you have to try it...I tried Openclaw...and agent zero..they are good ..but hermes with Openai Codex is SUPPPPER
matr_kulcha_zindabad@reddit
hey , I am curious. What all are you using it for ?
Crazy_horse_72@reddit
No... I'm here. I'm using it for every task on Linux servers: Proxmox, Python coding, LXC, Docker. I ask it to act, connect via SSH, et voilà. I use it with MiniMax 2.7 ($9 per month) and OpenAI Codex ($20 per month).
for the rest the other tool I use is Zo computer, for everything related to coding and app development.
First try Hermes, then report your feedback. For me it's great.
golden_corn01@reddit
he seems a little suspect lol
matr_kulcha_zindabad@reddit
nah just looking for inspiration
golden_corn01@reddit
no, not you. Crazy horse. Some of the responses in here seem like bots or spammers
brianlmerritt@reddit
these days you are a bot. i am a bot. we're all just fucking bots :D
sanchomuzax@reddit
I love the memory management of Hermes Agent. I haven't tried OC; I was looking for a Raspberry Pi-level agent with proper memory management. When I couldn't find one, I started developing my own CodexClaw agent, but I'm not skilled enough to make something like that work well. That's when I came across Hermes. I really like it!
Avios0101110@reddit
Glad i'm not the only one.
Excellent-Baker-1177@reddit
I’ve tried openclaw, nanobot, nanoclaw, etc., but Hermes-Agent is my favorite! It has all the features I actually care about with none of the bloat.
More auditable/safe imo. Works well with my local models. I have hermes installed on one always-on mac mini with ssh access to my ubuntu servers via pre-shared keys. The mac mini is the “source of truth” or hub for all the orchestration.
I use it either directly on mac mini, or ssh to mac mini from macbook, or via telegram. No multi-node setup!
For lightweight stuff I'm currently testing pi.dev against opencode, but so far really liking the lighter token usage.
tracagnotto@reddit
Well, to be honest I run both openclaw and hermes.
I was also thinking of making one of them integrate their A2A protocol (or an MCP) so they can communicate, and well, let's get to the point:
Openclaw developers are a fucking disgrace and often break it on every release. That said I think it has more functionalities and multi agents that win over Hermes.
But Hermes is much, much more stable and straightforward, and it manages overall stability much better, so I really like that it actually runs and never breaks or suddenly interrupts without reason, things that happen in openclaw.
Recommended to try at least. Running on free Stepfun 3.5 flash
Comfortable-Rice9403@reddit
I will set this up and give it a try
zmanning@reddit
it's really nice. highly recommend