Anybody who tried Hermes-Agent?
Posted by HaAtidChai@reddit | LocalLLaMA | View on Reddit | 75 comments

Curious to hear from those who have tried this new open source project from NousResearch, and how it compares to OpenClaw? I know the latter is rife with security vulnerabilities, but I'd love to hear if it functions similarly. Also, their repo mentions Honcho integration for persistent memory across sessions.
Suitable_Currency440@reddit
It's amazing. It's OpenClaw already set up and working: like an OC with 1 week of manual debugging already done, plus RAG, memory persistence, and better tool calling. (Qwen3.5-9b, 16GB VRAM.) 10/10, I'll only go back to OC if it becomes at least on par with this.
huzbum@reddit
Hmm. How many tps you getting on 9b? Might be worth switching to 35b with some experts offloaded. I think 35b MoE is smarter than 9b, and it might be faster with all layers offloaded to GPU and some of the experts offloaded to CPU.
I get 35tps with 35b on my 3060; I imagine your 16GB GPU would do better, whatever it is.
ay-em-real@reddit
Did you mean 35B, or something like Gemma 4? People say Gemma 4 is better overall? I mean, it would be best to just download and try each of them for our personal preferences, I guess. I have an RX 6600 8GB and for some reason I can't get it to run even Gemma e4b or Llama 3.2 3b; it just times out and gives me an error. I'm definitely doing something wrong here lol. I just got into openclaw and agentic models, I only recently started learning.
huzbum@reddit
I meant Qwen3.5 35b, but now I would definitely look at Qwen3.6 35b!
I was using 3.5 27b on my 3090 with 4b on my 3060 for faster secondary tasks. Now I switched to 3.6 35b on my 3090 for everything.
I tried 3.5 35b on my 3060 and it was generating tokens at a very useable 35 tokens per second. I tried the official Gemma 4 26b and it was using a lot more memory than it should have. I later tried an Unsloth quant and it worked as expected.
On an 8GB GPU it could go either way for performance, depending on system RAM vs GPU bottlenecks. 3.6 35b is definitely smarter than 3.5 9b; I would try it. Use llama.cpp or LM Studio. Enable flash attention and 8-bit KV cache, offload all layers to GPU, offload all experts to CPU, then turn down expert offload until it’s a snug fit. I’m guessing 2/3 to 3/4.
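For concreteness, the steps above might look something like this with a recent llama-server build. This is a sketch, not a recommendation: the model filename and the `--n-cpu-moe` count are placeholders you would tune for your own VRAM.

```shell
# Hypothetical llama-server launch following the tuning advice above.
# Model filename and the expert-offload count are placeholders.
llama-server \
  -m ./Qwen3.5-35B-Q4_K_M.gguf \
  --flash-attn on \
  --cache-type-k q8_0 \
  --cache-type-v q8_0 \
  --n-gpu-layers 99 \
  --n-cpu-moe 24
```

The idea is to start with a high `--n-cpu-moe` (many expert layers on CPU) and lower it run by run until VRAM is a snug fit.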
Different_Fun@reddit
Does it work with local models?
sickleRunner@reddit
I tried hermes on primeclaws.com, it's nice that you can switch between hermes and openclaw and also you get AI models for free
Suitable_Currency440@reddit
Fairly good with qwen3.5-4b, very decent with qwen3.5-9b, VERY good with 27b. Personal agent? Yes Coding for high complexity tasks? Not really, but with high guidance? Yes
Different_Fun@reddit
With what GPU are you running the 27b?
HaAtidChai@reddit (OP)
Hermes uses the Honcho memory package; I want to know if it comes installed from the repo. Also, how much memory do you need to launch sessions per agent (not counting the memory for local inference)?
Iziman95@reddit
So far disappointing.
Coming from OpenClaw with multiple agents, Hermes currently only supports a single agent.
Setup also took around 15 minutes because it kept looping on “It looks like Hermes isn't configured yet” without any error explaining why it thought the setup was incomplete. It eventually worked after several attempts.
On the first prompt it started executing things I never asked for, like importing OC crons.
The Telegram token also got truncated during onboarding.
Overall, too much hassle for something that is supposed to replace an OC setup
Jonathan_Rivera@reddit
Yeah, I am trying to turn it into a personal assistant and feel like I'm training it, and it keeps making mistakes. One thing that helped immensely is creating a Claude Code skill to optimize the Hermes agent, plus a memory file that Claude writes to, documenting changes along the way. Every time Hermes fucks up, I tell Claude to look at the Telegram conversation and see what went wrong. It is connected to Obsidian locally, and they share a readme file so Claude can step in and help while keeping things consistent.
Hopeful-Cricket5740@reddit
why don't you just use Claude Code? just curious... like, you made it better by adding Claude skills, you use Claude to bug-test it, and you basically connected Claude to the memory (Obsidian). At this point, why don't people just build a wrapper for Claude that talks to Telegram or whatever (Claude already has similar features)?
Jonathan_Rivera@reddit
Fair question. I like the product, but I don't think they have the best intentions for their users, and I can't rely on one company for everything.
About a month ago, myself and many others on reddit noticed we were burning through our plan budgets at a ridiculous pace. I could go all week and not hit the budget, and then I was hitting it 3 days into the week. You hit the support chat: no response, or it's broken. Hundreds of people cannot reach anyone to complain. After a week the topic is dominating all the Claude-related subs, and an employee posts on X that there is no issue. Frustration spills over to X. Another week goes by and they acknowledge there was some A/B testing going on. Again, it was a post from a random employee, not official Anthropic support. Now they are A/B testing removing Claude Code from the Pro plan on new user sign-ups.
They focus on integrations and kill market share for other companies like Figma, and once they convert those customers it becomes a walled garden like Apple. Open source is the way forward.
kidflashonnikes@reddit
I run a lab at one of the largest AI companies in the world, and we just deployed Hermes. Incredible. I can't even believe someone made this. This is coming from someone who runs a lab in SF that compresses brainwave data in real time with LLMs, with direct threads into brain tissue. I gave everyone on my team 2 days off from work to conduct a hackathon, and science-fiction things were created in 48 hours by my team.
Superb-Egg9541@reddit
Any chance we can get an inside look on this? Maybe a blog post or a video? Hell, have hermes do a write up. I'd be interested.
kidflashonnikes@reddit
I'm sorry, I can't. I work for one of the big labs; sadly I can't. I will say this: the go-to model for intelligence on decent hardware is qwen 3.5 27B. If you have two 3090s, use the UD_5XL quant from Unsloth; it's amazing. You will get about 25 t/s with that one at a context size of 32k, which is perfect for agentic coding on Hermes. If you want more speed, go with GLM 4.7 flash: you will get up to 50 t/s on one 3090 at a 32k context window thanks to the MLA architecture for the KV cache etc. Run the Unsloth quant. These are the 2 models my team used for a quick spin-up to get the agents actually working. This was before the v6 update that came out yesterday.
My team did an amazing job with this as a fun thing for work - one of their agents is making money already with Telegram. Good luck, this is about as much as I can say
kidflashonnikes@reddit
Also, something important to add: use llama.cpp. The newer update fixes thinking for the qwen 3.5 models; tool calling failed when using thinking mode, and that's fixed now. Be aware that if you run qwen 3.5 27B across two GPUs (multi-GPU), flash attention will likely crash the CUDA work.
spaceface83@reddit
Ohhh I need to try this. I'm running qwen 3.5 122b on my DGX Spark and the time spent thinking is insane. I'm not even using an nvfp4 quantized model, but stillllll, thinking takes forever. Hopefully grabbing a newer version helps with how much thinking it takes to respond to "testing", for instance. The plan was to use an LLM router like nadirclaw with a 27b model for basic calls and the 122b for complex ones. I'll probably just end up switching to vLLM though, for the nvfp4 support.
Then again I was gonna do that with openclaw now I need to go play with hermes first!
huzbum@reddit
Is there a reason to run both? benchmarks look like 27b dense is roughly equivalent to 122b. If I were going to run a faster model, it would be 35b or drop down to like 9b or 4b.
spaceface83@reddit
For hermes I ended up running everything on 122b. If I was hardware constrained I would choose the 27b over the 35b though just because it appears much better at that size to use a dense model.
huzbum@reddit
I'm running 27b on my 3090. It's not fast; I'm getting about 35tps. I guess it IS faster than 122b on my system, where I'm only getting about 15tps on my 3090 with most of the experts offloaded, and 12tps on my 3060 with all but 2 experts offloaded.
So I'm guessing 27b is *probably* significantly faster than 122b on a unified memory system, but if you're offloading simpler tasks, I'd want something a lot faster like 35b. I get 110tps with 35b on my 3090.
I'm still running Hermes on my GLM 5 Turbo, because why not. I might switch it over to 27b at some point though. Might experiment with some offloading to 4b on my 3060.
kidflashonnikes@reddit
I have 4 RTX PRO 6000s with 1 TB of DDR5-5600 RAM, a 96-core CPU, and 14 TB of NVMe storage. I run many models together and then sub-agent them. For example, I am running multiple full-sized qwen 3.5 27B models for tool calling, creating a fleet of 10-12 agents that are effectively running a business (side hustle) for me. It's probably one of the first companies in the world entirely run by agents, with a single human (me) in the loop. So far I've made a few thousand USD. It's not much, and it's mainly a fun side project, but people underestimate how effective the qwen dense 3.5 model is when used properly on the right hardware. I would not be surprised if they only released a small variant of qwen 3.6 but kept the main model gated.
It's clear to me what happened with qwen 3.5 27B: it will likely be the pinnacle of open-source models in terms of quality. Moving forward, unless weights are hacked and leaked, you will absolutely begin to see a downward trend in open-source model releases. The trend is clear: we are hitting a point where these models are too powerful to be released. They can no longer be public; weights will be treated as a matter of national security.
Disclaimer: I work for one of the big three AI labs, so I can personally attest that we have already spoken with many labs that open source models, and I can 100% confirm this is going to happen, mainly because we got the "talk" internally from the current White House admin (NDA, can't say more than this) that open source models will be cracked down on. This will also apply to hardware: you will see Nvidia begin to stop supplying the world with more consumer GPUs. It sucks, I know, but we need to push back hard and prevent this from happening.
AlienRedditMaster@reddit
Even more important to release them to the people. The harder the elite cracks down, the cooler the leaks will be. Cyberpunk ftw! ~~Information~~ Intelligence wants to be free.
Silly_Individual4056@reddit
You could be Prometheus and bring fire to us mortals
spaceface83@reddit
yeah honestly for agentic processing I don't care that much about tokens per second as long as it's within reason. I care more about how sound the model's reasoning is. I typically get like 30 tokens/sec at 122B, I think. Even with a DGX Spark though, a 122B model plus some room for context and you can't do much more.
I have a 5080 on my "normal" computer, so if i ever cared enough i could run some smaller models there at much faster speeds, but thats too much effort for me to orchestrate that compared to the gain i'd get :D
kidflashonnikes@reddit
Get the latest version of llama cpp - the thinking bug is fixed but not the reprocessing bug
spaceface83@reddit
I'll try it out!
ArthurDentsBlueTowel@reddit
It’s a cool story but yeah that’s wildly vague and not helpful.
Particular-Cause-862@reddit
Yea it has bugs, but it doesn't have the vulnerabilities, because they don't merge 100 PRs each day, which is fine.
huzbum@reddit
Not sure what you mean about single agent... each context is an agent. I just open another slack conversation and it's a new agent. Or a new terminal/CLI.
houseofmates@reddit
exactly
Independent-Pin8300@reddit
AFAIK it actually supports multiple agents? see https://hermes-agent.nousresearch.com/docs/user-guide/profiles/
Final_Elevator_1128@reddit
the architecture split nobody talks about enough. Hermes for the outer loop, llm-wiki for the inner loop. each layer has one job. completely changes how capable your agent is on niche topics
Final_Elevator_1128@reddit
been running the inner/outer infra split for two months. the domain knowledge gap disappeared. Hermes + llm-wiki-compiler. github.com/atomicmemory/llm-wiki-compiler
Double-Fun2396@reddit
I set it up yesterday, and now it has its own repo. Memo (my Hermes agent) writes issues that I can approve. From there, it automatically sets up a plan and an Excalidraw architecture diagram where we discuss the architecture. After that, Memo just starts building it. Man, I love the Hermes-agent. 😍
Potential-Toe1320@reddit
I don't see what's so great about it. It has a brutal time trying to operate local models and API services at the same time... Back to CrewAI for me... huge waste of time, unfortunately.
PastTumbleweed6713@reddit
It took me an hour to set up
used openrouter and defaulted to Opus 4.6
would recommend.
Ok-Internal9317@reddit
Hows the cost looking?
sebas85@reddit
It can use your Claude or ChatGPT subscription. That will keep the costs at least predictable.
LynxComprehensive193@reddit
How can you use Claude subscription with it?
ZeroPiXEL-@reddit
Loving Hermes. It was incredibly easy to set up, and I got Ollama working. I was also able to get Claude to work with it: just type hermes model -> Anthropic -> oauth (claude code). (Requires installing Claude Code with npm.) Works great using Sonnet.
OJ-Houston@reddit
Why install Claude code with npm instead of curl https://claude.ai/install.sh | bash
dhlrepacked@reddit
And why is Claude Code required? Shouldn’t Hermes replace it?
dontquestionmyaction@reddit
It uses the Claude Code login flow.
8Frostyunderpants@reddit
Also keen to know and which other low cost models could work
fathah_crg@reddit
If you are looking for an easy setup, then you should try the native desktop app.
One-click setup: https://github.com/fathah/hermes-desktop
OMGThighGap@reddit
What's the difference between Hermes (or agents in general) and something like opencode?
houseofmates@reddit
opencode is mostly a coding tool for working in the terminal on a project, like cursor. or kiro. or antigravity. or jetbrains ide. or windsurf. hermes/openclaw style agents are broader automation systems that can run continuously, use tools, manage tasks, and act more like a background worker or assistant. you can have several agents running at once on both openclaw and hermes
Comfortable-Air-4630@reddit
Configuration is super easy, and the Telegram integration works perfectly. Will be testing tomorrow via the Docker setup.
mroj84@reddit
I'm having a hard time getting it up in Docker, but I wanted to run it as a sidecar with openclaw. I think that path is harder. How did you all configure the container?
papaloukas@reddit
Not fully dockerized yet: https://github.com/NousResearch/hermes-agent/pull/1841
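In the meantime, a rough sketch of what a containerized setup could look like, building the image from the repo yourself. Everything here is an assumption, not documented usage: the Dockerfile may not exist until the PR above lands, and the image tag, volume path, and env-file name are all made up for illustration.

```shell
# Hypothetical container build-and-run, assuming the repo ships a Dockerfile.
# Tag, volume mount, and .env file are illustrative guesses.
git clone https://github.com/NousResearch/hermes-agent
cd hermes-agent
docker build -t hermes-agent:local .
docker run -d --name hermes \
  --env-file .env \
  -v "$PWD/data:/app/data" \
  hermes-agent:local
```

Running it alongside an existing OpenClaw container (the "sidecar" idea above) would then just mean putting both on the same Docker network.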
Holiday-Pirate-5258@reddit
I've got this inside docker today and it is working perfectly. The only setback was the sudo thing but easy to fix
spaceface83@reddit
I'm running an ARM version of the docker container on my DGX Spark and it works great!
papaloukas@reddit
The PR was merged into the main branch. It has some pending issues, like a missing arm64 image and the uvx command missing.
ambassadortim@reddit
I have a question. Why is Telegram often used for this type of setup? Is it for remote connection via phone from anywhere, or something else?
thatguyinline@reddit
Socket-based messaging providers let the agent maintain the connection, instead of having to be exposed to the internet to receive webhooks. It's also a very simple integration.
ambassadortim@reddit
Thanks for the reply
jreoka1@reddit
I like it better than openclaw especially lately. it just works vs openclaw having a lot of issues for me for some reason
Key-Substance5991@reddit
it's excellent. I have tested most of them. My current setup is nanclaw, but I will migrate fully to Hermes agent soon.
TastyChickenLegs@reddit
Installed this morning in less than 10 minutes. The installer migrated my OC agent and took care of the Telegram as well. I literally did nothing except configure the Ollama model.
So far it's crazy fast and has home assistant support builtin.
I fought with OpenClaw for days with memory problems and broken configs. I realize it's still early to post a good review, but the setup was flawless.
Adventurous_Machine2@reddit
How did you set up Home Assistant? It only shows that it can control Philips Hue
ariefb79@reddit
I think Hermes is a great agent, but because it's new it has some bugs 🧐
Crazy_horse_72@reddit
Guys...Hermes Agent...is super...you have to try it...I tried Openclaw...and agent zero..they are good ..but hermes with Openai Codex is SUPPPPER
matr_kulcha_zindabad@reddit
hey , I am curious. What all are you using it for ?
Crazy_horse_72@reddit
No... I'm here. I'm using it for every task on Linux servers: Proxmox, Python coding, LXC, Docker. I ask it to act, connect via SSH, et voilà. I use it with MiniMax 2.7 ($9 per month) and OpenAI Codex ($20 per month).
for the rest the other tool I use is Zo computer, for everything related to coding and app development.
First try Hermes, then report your feedback. For me it's great.
golden_corn01@reddit
he seems a little suspect lol
matr_kulcha_zindabad@reddit
nah just looking for inspiration
golden_corn01@reddit
no, not you. Crazy horse. Some of the responses in here seem like bots or spammers
brianlmerritt@reddit
these days you are a bot. i am a bot. we're all just fucking bots :D
sanchomuzax@reddit
I love the memory management of Hermes Agent. I haven't tried OC; I was looking for a Raspberry Pi-level agent with proper memory management. When I couldn't find one, I started developing my own CodexClaw agent, but I'm not skilled enough to make something like that work well. That's when I came across Hermes. I really like it!
Avios0101110@reddit
Glad i'm not the only one.
Excellent-Baker-1177@reddit
I’ve tried openclaw, nanobot, nanoclaw, etc., but Hermes-Agent is my favorite! It has all the features I actually care about with none of the bloat.
More auditable/safe imo. Works well with my local models. I have hermes installed on one always-on mac mini with ssh access to my ubuntu servers via pre-shared keys. The mac mini is the “source of truth” or hub for all the orchestration.
I use it either directly on mac mini, or ssh to mac mini from macbook, or via telegram. No multi-node setup!
For lightweight stuff I'm currently testing pi.dev against opencode, but so far really liking the lighter token usage.
tracagnotto@reddit
Well, to be honest I run both openclaw and hermes.
I was also thinking of making one of them integrate their A2A protocol (or an MCP) so they can communicate, and well, let's get to the point:
Openclaw developers are a fucking disgrace and often break it on every release. That said I think it has more functionalities and multi agents that win over Hermes.
But Hermes is much, much more stable and straightforward, and it manages overall stability much better, so I really like that it actually runs and never breaks or suddenly interrupts without reason, things that happen in openclaw.
Recommended to try at least. Running on free Stepfun 3.5 flash
Comfortable-Rice9403@reddit
I will set this up and give it a try
zmanning@reddit
it's really nice. highly recommend