AMA with Nous Research -- Ask Us Anything!

[-]

anthonyg45157@reddit

Damn gonna need Hermes to summarize all these comments 🤣

[-]

emozilla@reddit (OP)

Questions from Hermes, about Hermes, responded to by Hermes... 🤣

[-]

No_Fix_4658@reddit

literally got my agent to parse this post and we are both learning haha

[-]

vyralsurfer@reddit

Maybe a stupid question, but which skill did you use? Most that are selected by default fail the bot checks so I'm curious what others are using to "view" websites with Hermes.

[-]

Dthen_@reddit

Which local models have you had the modest success running Hermes with?

[-]

GreeneryCA@reddit

Would love to know the answer for diff spec ranges... For instance I'm on 8gb vram and 32gb ram.

[-]

That’s not going to be super useful. You could try seeing what llama fit does for your setup on a Q4 of Qwen 3.6 27B or Gemma 4 31B. I think you’ll have poor performance due to the 8GB of VRAM, even with your 32GB of system memory. I’d see if you can run basic queries before trying anything with Hermes Agent using local models. It might be better to look at OpenCode Go for a decent value on hosted models to use with Hermes Agent.

[-]

nopanolator@reddit

Qw35 series in Q4 is way too narrowed for the complexity of Hermes. Nice to flex demos on Xcom, totally useless in production when you're concerned about errors rates.

[-]

IShitMyselfNow@reddit

I mean I've got the same setup and it works fine for me with Qwen3.5 4B + 3.6 35B

[-]

Pleasant-Shallot-707@reddit

I think "Fine" is subjective.

[-]

NousResearch@reddit

Right now I really think 27B is the bare minimum to have a fruitful experience, which would put 8GB out of range.

You can get some value from qwen 9B, but you are going to have to deal with some pain, thats the only sub 27B model I've seen that has had any uptake (alongside maybe it's variants)

-teknium

[-]

chimph@reddit

It’s been working great on 35b moe so far for me

[-]

GreeneryCA@reddit

What settings are you using?

[-]

chimph@reddit

currently:

llama-server 
-m /path/to/Qwen3.6-35B-A3B-GGUF/Qwen3.6-35B-A3B-UD-Q6_K.gguf 
--mmproj /path/to/Qwen3.6-35B-A3B-GGUF/mmproj-F32.gguf 
--host 0.0.0.0 
--port 8080 
--ctx-size 262144 
-n 16384 
-ngl 99 
--temp 0.6 --top-p 0.95 --top-k 20 --min-p 0.00 
--parallel 2 
--cache-ram 4096 
-ctk q8_0 -ctv q8_0 
--flash-attn on 
--jinja

[-]

mp3m4k3r@reddit

Worth testing (llama bench can help with this) tuning a bit with -ub, i found with my setup this at 2048 maximized the prompt processing (roughly doubled the speed for prompt processing)

[-]

chimph@reddit

Ok thanks. Will try it out. GPT is strongly recommending I drop my context to 131k with my setup too. At —parallel 2 this would give me 65k context which doesn’t sound big enough to me so will ponder whether to just leave as —parallel 1

[-]

mp3m4k3r@reddit

Yeah i leave mine as parallel 1 at the moment as its mostly just me, if i had contention (more users) maybe id care about that more, but with how they cache better now its pretty solid overall to just let it run defaults (auto fit) for ctx. I did opt for f16 on the mmproj as well just to keep in the middle ground

[-]

chimph@reddit

fyi llama-bench had 1024 be slightly faster than 2048 but overall its only 7% boost over the default llama.cpp of 512.. still worthy of implementing so thanks for that

[-]

mp3m4k3r@reddit

Rad!

[-]

chimph@reddit

Found I got the biggest speed bump by removing quantisation. For 16k context I got 15.9% t/s gain! I use 2x more cache memory but I have plenty of headroom so it should be fine

[-]

chimph@reddit

I’ve set this for using with hermes as I assume that occasionally it will want to use multiple sub agents to achieve a task but in reality that’s probably quite rare and even so just queuing the jobs instead is no biggy

[-]

Far-Low-4705@reddit

what about qwen3.6 35b a3b MOE variant?

I can run the 27b... but at 20 T/s it's completely unreasonable unless its going to be running in the background, which i do not trust AI enough for that yet.

[-]

smolpotat0_x@reddit

which quant works best? e.g., Q4_K_M? what do you think of the “hermes optimized” variants like carnice v2 27b or the ornstein 27b?

[-]

minimax_zed@reddit

You could try a small quantization of https://huggingface.co/unsloth/Qwen3.6-35B-A3B-GGUF with MoE offload. You'd probably also want to spend a little time slimming down the Hermes default prompt & context.

[-]

alt-glitch@reddit

Qwen3.6-27B is the canonical local model to use Hermes Agent with!

[-]

Paradigmind@reddit

Which quant or finetune of Qwen3.6-27B would you recommend for a single 3090?

[-]

Important_Quote_1180@reddit

G2 vLLM Stack — qwen3.6-27b-autoround on RTX 3090

Model: qwen3.6-27b-autoround-int4 (AutoRound INT4 quantization) served via vLLM nightly (dev21) on port 8020. Context window: 125K tokens. KV cache uses TurboQuant 3-bit NC. Speculative decoding via MTP with 3 draft tokens. Cudagraph mode set to PIECEWISE — this is the critical setting that makes MTP work without garbling output (the default FULL mode breaks speculative decoding on this rig).

Hardware: RTX 3090 24GB, NVIDIA driver 580.126, GPU memory at 97% utilization (23.1GB of 24.5GB). Running at 348W out of a 350W power limit, 66°C, 98% utilization during benchmark.

Key launch flags: --gpu-memory-utilization 0.97, --max-num-seqs 1, --max-num-batched-tokens 4128, --enable-chunked-prefill, --enable-prefix-caching, --reasoning-parser qwen3, --tool-call-parser qwen3_coder, --kv-cache-dtype turboquant_3bit_nc, --compilation-config.cudagraph_mode PIECEWISE, --speculative-config for MTP with 3 speculative tokens. Also applies Genesis unified patch and tolist cudagraph patch at container startup.

Live benchmark results from 2026-04-26: 100-token output generated at 82.4 tok/s in 1.21s total. 400-token output at 82.1 tok/s in 4.87s. 800-token output at 71.3 tok/s in 11.22s. Time-to-first-token estimated at 0.3-0.6 seconds depending on prompt length. Sustained baseline is roughly 67-89 tok/s depending on workload shape.

The PIECEWISE cudagraph setting costs about 15-20% throughput versus theoretical FULL mode speeds (which could hit 100+ tok/s) but FULL mode produces garbled, repeating output when combined with MTP speculative decoding on this hardware. The tradeoff is worth it — clean output at 82 tok/s beats garbled output at 108 tok/s.

Bottom line: 27B parameter model, INT4 quantized, running single-GPU on a consumer 3090, delivering 82 tokens per second with sub-second first-token latency and full reasoning/tool-calling support.

[-]

youcloudsofdoom@reddit

This is a great help, thanks - any thoughts on how you would adjust these params for a dual 3090 setup?

[-]

Important_Quote_1180@reddit

Welp, if you send me a 2nd 3090 for some testing…I’ll have benchmarks for you very quickly! JK I would love to help ya but I don’t think I would be much help

[-]

youcloudsofdoom@reddit

Haha, no worries, thanks anyway

[-]

Paradigmind@reddit

Thank you, appreciate your parameters.

[-]

twack3r@reddit

this guy vLLMs

[-]

minimax_zed@reddit

https://huggingface.co/unsloth/Qwen3.6-27B-GGUF

Start out with Q3_K_M which will leave room for context and compute buffers. If you're running llama.cpp you're also gonna want to do something like -ctk q5_1 -ctv q5_1

[-]

Paradigmind@reddit

Thank you!

[-]

chimph@reddit

You surely cant be using mac to suggest using the 27b dense model over 35b moe? tok/s too slow, no?

[-]

seamonn@reddit

Any success with Gemma 4?

[-]

Limp_Classroom_2645@reddit

Gemma is shite

[-]

NousResearch@reddit

Some success yes, but the community has resolved around Qwen being more reliable and stronger at agentic work atm

[-]

No_Fix_4658@reddit

what about xiaomi mimo v2.5 pro?

[-]

NousResearch@reddit

That model is 1T parameters - so likely can't run locally

[-]

aegis_lemur@reddit

Curious if this stands for Strix Halo hardware; been messing around w qwen3-coder-next

[-]

AbsoluteSereniti@reddit

Haha that’s exactly what I’ve been using. Funny enough, I’ve been using hermes agent to revive my openclaw :( - hermes should be labelled claw baby sitter.

[-]

MerePotato@reddit

At that point why not just stick with Hermes

[-]

AbsoluteSereniti@reddit

there's no one size fits all - i have a mixture of devices, mixture of llm models, and now a mixture of agents. I like using all of them; some are better than others - openclaw can do niche things that hermes lacks in; but most importantly they support each other

[-]

anthonyg45157@reddit

Qwen 3.6 ,27b or 35b.

35b feels so much more responsive

[-]

whiteamphora@reddit

Because it's different architecture than 27b.

[-]

anthonyg45157@reddit

I understand that just sharing my thoughts on how they feel with hermes in case others don't realize this is apparent

[-]

PracticlySpeaking@reddit

Just started using MiniMax M2.5 and it has been going well.

(Yes, I know, not for everyone.)

[-]

Jonathan_Rivera@reddit

One of the questions I keep seeing from people is basically 'what do I actually use this for?' I get it myself, Hermes is powerful but like most AI agents feels like tools without an obvious everyday use case.

What problem are you hoping Hermes becomes known for solving? When you picture the average person using Hermes, what's the scenario? I'd love to understand the roadmap so I can help frame it for people who are curious but overwhelmed.

[-]

emozilla@reddit (OP)

Here's a framing we've found helpful when pitching to new users

What's something you do on your computer all the time that annoys or bores you? Just tell Hermes to do that for you

[-]

honglac3579@reddit

That's true, i often ask hermes to pull transcrip out of all video of a certain YouTube channel cuz i don't bother try to create/run the script myself

[-]

Ok_Technology_5962@reddit

Yes i do this for long ones that are very technical mostly lectures. And there is no other way to do so as gemini cuts off at a certain length. I then get that text and ask questions on parts i sont get to the smartest model there is can be online or local.

[-]

chimph@reddit

..fixing openclaw 😅

[-]

artemgetman@reddit

Same framing I do, works pretty well. You can’t just throw a use case at new users without asking first. everybody has different tasks they need help with.

[-]

wombweed@reddit

I am very interested in Hermes. Had a couple questions.

- Any way to provide a web UI?

- Can you list some use cases why I should consider it over my existing opencode- and openwebui based setups?

[-]

BifiTA@reddit

I remember Worldsim. Thanks for the free Opus back then!

[-]

Important_Quote_1180@reddit

For Hermes, his memory file is small so I have him only use it as an index to my RAG. Am I missing something?

[-]

NousResearch@reddit

Hermes stores all past chat sessions to search over whenever it needs to, so the always-in-context memory is only for things that every chat session should remember.

See this image:

[-]

sweetbacon@reddit

This is a useful diagram, thx! I was looking for something like this in one the memory sections in the Docs...

[-]

ayylmaonade@reddit

You should try out Honcho. It's built into Hermes and you set it up by running hermes honcho setup - it's a memory/context system that observes behaviours of both you and your agent, keeping note of anything useful. So for example if you started creating a new project in Hermes, Honcho would automatically "observe" that and keep it logged.

Then the next time you ask the agent about that, it doesn't even have to perform a session recall like in the diagram above - instead it just dynamically injects the context/memories you need on the fly. It really makes a huge difference to how seamless Hermes can feel, in my opinion.

You can also just modify the character limit of the MEMORY.md and USER.md files in ~/.hermes/config.yaml.

[-]

chimph@reddit

cloud hosted personal memory file.. hmm.. why isn't this a self hosted thing?

[-]

ayylmaonade@reddit

It literally is. You can self host or use their cloud.

[-]

chimph@reddit

wow chill.. I did research and it seems my agent told me wrong. It told me I needed to sign up and that it was cloud hosted..

[-]

chimph@reddit

..hermes memory setup

[-]

sweetbacon@reddit

Honcho

Thanks I have this on my todo list and will bump it to the top!

[-]

minimax_zed@reddit

Install one of the more sophisticated memory plugins. If you want zero-dependency pure-local, then `hermes memory setup holographic` will do everything you need. There are a few other self-hostable options but none as easy to start with.

[-]

Stitch10925@reddit

What advantages does this bring?

[-]

RealestReyn@reddit

Hermes Agent seems to be pretty good at creating skills, why does it come with like a hundred skills bundled?

[-]

alt-glitch@reddit

So that you don't have to make them!

[-]

RealestReyn@reddit

the UX of hermes skills is atrocious, I'd have to uninstall every skill one by one?

[-]

ObsidianNix@reddit

I just went to .hermes/skills and deleted them.

[-]

RealestReyn@reddit

same since running "hermes skills" seems to assume I know how to use it instead of bringing up options about what I might want to do about skills.

[-]

FrequentAstronaut331@reddit

We are seeing recent AI research papers emphasize the importance of harness engineering being more important for agentic outcomes than models.

Do you see evolutionary harnesses which dynamically allocate resources for tool calling including containers, shared state, deterministic model use, swarms as part of the Hermes Agent research or product roadmap?

[-]

emozilla@reddit (OP)

Yes I think we've really just begun to scratch the surface of agent/harness design. What's interesting is that much of what makes Hermes Agent so great is an emergent property from the models -- it just needed to be unlocked by the harness. I think even if there were no more model releases at all we could probably scale up the productivity factor on the harnesses by an order of magnitude

[-]

Moravec_Paradox@reddit

People keep finding new ways to unlock new potential.

Even if the AI bubble starts to slow or burst, TSMC, Nvidia, and HBM memory companies are all taking a 70%+ profit margin which is a lot of opportunity to reduce training cost overhead (and token costs). Tons of movement in TPU space will reduce inference pricing.

We already know we can stretch model ability with thinking tokens.

Stuff like TurboQuant has shown a huge recent improvement in context window scaling.

Harness engineering and context memory management systems are a very new area showing a lot of potential.

A lot of people think the AI "bubble will burst" but I think there are enough gains still on the horizon that things are going to stay interesting for a while.

[-]

pablines@reddit

Hey guys! I’m noob on using ai spec driven? What is your exp about this? All this nonsense articles about claude code using team to orchestrate without human in a loop I have seen many people struggling a looot! I haven’t test it but look really overwhelming to review code as human level in a large project. Currently I keep putting myself in the loop.

[-]

Own_Mix_3755@reddit

Are there any plans to support multiuser workflows? We would love to deploy it in our company on DGX Spark combined with local model for our office use, but having to run docker container per user adds alot of overhead. Profiles are not enough to separate everything under GDPR etc. (We need things to be separated as much as possible). What would it mean to deploy it safely for eg 20 - 25 people? We are currently using MemPalace (it seems as a good fit for our use case) as a memory (separated for each person), we are not developers, but still mostly IT folks.

[-]

starhobo@reddit

why MemPalace, if you don't mind me asking?

[-]

Own_Mix_3755@reddit

I would say that each specific use case has its specific way to store memory. We are mostly bunch of consultants, sales people and so on and we are working on quite high number of clients all at once. And while some technical/business knowledge overlaps for some clients, some are unique. MemPalace allow us to store separate tech knowledge, separate business knowledge and separate wings per customer, marking rooms per “stream” for that specific customer (as you might have different tasks and “projects” per customer) and then soing cross tunnels between customers and the knowledge. Even with hundreds of drawers and tens of customers its increadibly easy and fast to search just what you need - in the end the Hermes agent is quite quick in aquiring the needed knowledge and enhancing all possible skills with customer/business/technical knowledge pretty fast.

I wouldnt say that mempalace is best way to store detailed info for example about whole codebase, but for storing small separate bits like decisions, links, specific ways of handling things it is hella fast. Its not meant to store looong texts in there.

[-]

alt-glitch@reddit

An active area we're working on.
In the meanwhile it does seem like having a docker compose with user templates is the best way for isolation in between different user sessions.

However, you can also run your DGX Spark as an inference server and have everyone's Hermes Agent instance live on their personal devices.

Definitely expect to see better support for use cases like yours soon!

[-]

Own_Mix_3755@reddit

We tried but there is alot more to it - I do transcripts, meeting notes and alot more in the background which is backfilling MemPalace all the time. Plus the main “face” for those less tech savvy is Open WebUi running on the Spark itself. So its actually whole stack of frontend + hermes with crons + some stqndalone APIs to send transcripts to + whole lots of skills and various MCP servers per client (they do differ per person too, but I can easily differentiate each person by login credentials in Open WebUI, so I know which Hermes instance to connect to, if the Hermes would be a single instance, I would just push the proxy from before the Hermes instances to behind the single Hermes instance to differentiate correct MCP servers). All in all not sure whether this is best approach and honestly we had been thinking about having Hermes locally, but we just see bigger value in having everything running on one 24/7 machine as I can easily set up crons in Hermes to do thing while we all sleep (as an example). We do understand also the downsides of a single machine, but lets call it MVP for now.

I have been able to strip Hermes docker container to about 300 - 400 mb of memory, and while it does not seems that much, scaling it to more and more people gives us too much overhead even on device with 128gb ram.

[-]

cviperr33@reddit

Could you please allow us to have auto skill creation disabled , running these locally on -cn 1 kills the whole experience, the agent will create all sorts of skills whenever he pleases and i have to wait 1-2 min each time it happens.

Also the session_search is broken , it just never returns anything and ive kept it going for 10min , on rtx 3090 thats like a milion tokens generated , it cannot be that inefficient.

[-]

Protopia@reddit

For agentic coding there are several decades of academic and real world knowledge on how to engineer (rather than vibe code) software systems. For this use case, can Hermes two into this knowledge for its own self improvement?

[-]

Protopia@reddit

For common use cases (like agentic coding) are you planning the ability for a hive mind self-improvement whereby instead of each Hermes instance self improving, the global community of Hermes instance share improvements in real time?

[-]

FlyingCC@reddit

Oh wow, I remember reading that thread! So cool it turned into this!

[-]

the_mighty_skeetadon@reddit

Rate the attractiveness of this objectively dashing DJ slash Nous Research scion:

[-]

Psychological-Lynx29@reddit

What should i do to improve my workflow? I'm currently using a Ada 6000 rtx using Hermes with WSL and Qwen 3.6 27b via ollama max tokens (256k) and it uses 41,7gb of 47,5 total, I also have 112gb ddr4. Should i just stick with what i have or should I improve my workflow based on your recomendations? I'm building android apps. Also, can you share a good prompt for making skills with Sonnet 4.6 thinking (via Perplexity)? What are some of the "must have" skills for coding? Been using Hermes since the release, for me its groundbreaking what it can do, a Huge thank you! :)

[-]

Psychological-Lynx29@reddit

Btw, the latest commit breaks hermes a little bit, before i could see that Hermes was "thinking" and if i wanted to prompt something new would make him stop but it told you that if you send your message you are going to stop what he is doing. Right now you just see the chat like he stopped thinking while he didnt (because of gpu usage). Example: https://imgur.com/a/UymNsgN

[-]

interAathma@reddit

Do you have any plans to release small LLM model 4b to 8b for hermes agent specifically?

[-]

phragg@reddit

We're looking more in the 120b range!

[-]

Far-Low-4705@reddit

u guys should consider making distills with qwen 3.6 35b/27b

[-]

Travnewmatic@reddit

This!!

[-]

Travnewmatic@reddit

Little bit smaller please!! 🙏

[-]

ObsidianNix@reddit

Can there be like a 30B for us GPU poor where it’s really strong suit is tool calling instead of a chatbot? I don’t care that it’s dumb, that’s why it has tools; it just needs to know how to use those tools.

[-]

interAathma@reddit

Good to know. I hope if you guys find a good results so we can apply the same techniques to smaller model training.

[-]

NousResearch@reddit

On policy distillation is quite powerful 😉

[-]

No_Fix_4658@reddit

oh yes, I heard hermes can run even on android devices, edge computing is the future!

[-]

interAathma@reddit

Yeah, small models fine tuned for basic tool calls and skill creation would be awesome. I tried with gemma and qwen models in 4b range. They did not work well.

[-]

ale007xd@reddit

You've built one of the most compelling agent runtimes I've seen — especially the closed learning loop + skills evolution.

But I’m curious about something more fundamental:

Right now Hermes seems to rely on memory + skills + heuristics as its “state model”.

How do you prevent long-term behavioral drift in a self-improving loop without a formal state transition system?

In other words, if we define:

S = agent state (memory + skills + context) E = new interaction / tool result

What guarantees that δ(S, E) → S' stays stable over time?

Because in practice, I’ve seen self-improving agents amplify incorrect behaviors faster than they learn — especially when skills are generated from imperfect reasoning.

Do you have any internal constraints, validation layers, or formalism that keeps the system from converging to a degraded policy?

Super curious how you’re thinking about this at scale.

[-]

Cheifreef12@reddit

I am building a system where you can take agent traces and turn them into benchmark-able tasks so that when you do the self-improvement loop you can actually test the changes against the tasks you actually want to do. Instead of doing just pure reflection to generate skills, you generate a replayable task at the same time as you generate the skill, so that you can test if the skill actually helps solve that task (and if it generalizes to your other tasks).

[-]

alt-glitch@reddit

the architecture makes drift visible and correctible — skills and memory are plaintext on disk, not opaque weights. you can inspect everything the agent has learned.

we're actively building out the ability for hermes to go back and refine skills it's used in past sessions — here's a PR for it. we think a lot about drift, longer running horizons. it's all about how a harness can keep the state you described on track.

i'd be skeptical of anyone claiming formal guarantees in this space.

[-]

ale007xd@reddit

That makes sense — making drift visible is already a huge step up from opaque weight updates.

I guess the open question is where you draw the line between observability and control.

If the system relies on inspecting and refining skills after the fact, then the stability of δ(S, E) → S' still depends on the harness behaving correctly over long horizons.

In other words, you're effectively saying:

state is inspectable
drift is correctable
but transitions themselves are not constrained

Which is totally reasonable — but it also means the system is governed by feedback loops rather than invariants.

So maybe the real question is:

Do you see Hermes eventually introducing constraints at the transition level (even soft ones), or is the philosophy that long-term alignment emerges purely from iterative refinement?

I’m especially curious how this plays out once you have many interacting skills and subagents — that’s usually where feedback-based systems start to get unstable.

[-]

brownman19@reddit

I work on formalizing invariants in the latent space if you want to chat. I'm not very active on the Nous discord anymore but can connect there if you'd like. Not part of their team, but was an early tester of lot of their products, building some solutions for this very far removed from traditional ML.

To very simply answer your question, yes there's a path, but also confirm something u/alt-glitch mentioned. I'd be skeptical of anyone claiming formal guarantees, because I have done more to prove out invariants in latent space than any researcher I have ever met in the world, and I refuse to give any guarantees yet.

Fair warning that you wont find most of the serious discussion around any of this on reddit and you will be attacked by every Jo schmo armchair scientist to every respected physicist because acknowledging structural invariants exist means you are rejecting, at least in part, the standard model of physics (or rather - the reason why most of us are obsessed with this topic is because we intuitively understand the standard model to be wrong, and want to understand why it falls short on nearly everything important and real)

[-]

ale007xd@reddit

Interesting — this is actually very aligned with what I’m exploring, just at a different layer.

When you say “invariants in latent space”, I interpret that as properties of representations that are stable under model transformations.

My main question is slightly different:

How do those invariants become properties of execution?

Because right now there seems to be a gap between:

latent-space regularities (observed / inferred)
runtime-enforced constraints (structural / executable)

In systems like FSM-based execution kernels, invariants are not inferred from representations — they are embedded into the transition function δ(S, E).

So I’m curious where you see the bridge: from latent invariants → enforceable execution constraints.

That mapping seems to be the missing piece.

[-]

brownman19@reddit

Yes precisely on your first point and in fact it feeds into your second point. Don't think of latent space regularities as distinct from runtime-enforced constraints. Think of runtime constraints as being placed on the latent space itself to constrain the search space the agent or model powering the harness has to infer over.

The latent space is latent. In other words, an embeddings model, and an LLM or any model even, is in a state of untapped potential until an interaction activates the state machine. Your query causes the model and server to spin up JIT and respond to you.

https://zenodo.org/records/18906946

That's a paper I did on deterministic mesh compilation for runtime enforced constraints. Everything is defined JIT constructively, using a formal algebra.

Here's outcomes. Convergence and realizability. Meaning I had to produce the outcomes you see if there was any merit in this. Any deviation from the expected 100% convergence result would mean that my hypothesis and proof had no merit.

https://zenodo.org/records/18991649

So far I've done a pretty good job of proving even my own biggest skeptic (me) wrong on this. I want to be wrong. So far Fano Planes and Heawood Graphs are the two constructs I've proven out as deterministic runtime constructs that I can build for a [growing] number of multi-agent systems for variety of niche use cases.

So far -> Logistics, freight, territory and route design, forecasting over multiple dimensions, what if scenarios (most powerful one -> build one DMC runtime and simulate 100000s of other possible outcomes and experimental runs)

Next stage is something called "Carl" that I'm working on at carl.camp which is coherence aware reinforcement learning. Basically model learns to fine tune itself and adapt to the harness itself. So when i put carl inside hermes agent, it becomes hermes agent over time. If i put into claude code, it becomes more like claude over time. Opencode same thing. Each user and each app's model itself becomes distinct over time all on its own. While maintaining the same baseline backbone and identity.

[-]

ale007xd@reddit

This is simply wonderful! Thank you so much! We see at least four possible paths for further development of the project, and there are real applications for our immediate tasks. I will follow the development of your project with interest! 🫶

[-]

_supert_@reddit

I feel a bit of contraction mapping theorem coming on.

Seriously though there are no guarantees in this game.

[-]

ale007xd@reddit

Yeah, I get the joke — contraction mapping is probably the right level of abstraction for how far we can push formalism here 🙂

But I think the “no guarantees” framing is doing a bit more work than it should.

Because there’s a difference between:

“we can’t fully guarantee convergence” and
“we have no structural constraints on the system”

Most practical agent systems sit somewhere in between — not provably correct, but also not purely unconstrained feedback loops.

What I’m really curious about (and what the original drift discussion was about) is not global guarantees, but local structural invariants:

What parts of the system are intentionally non-leaky over time?

Memory? Skill evolution? Transition logic itself?

If everything is pushed into post-hoc correction + heuristics, then you’re essentially relying on empirical stability of a closed loop — which can work, but tends to behave very differently once you scale interactions and subagents.

So I agree with the spirit — no formal guarantees — but I think the interesting design space is what constraints survive even without them.

[-]

snowdrone@reddit

A straightforward mechanism would be to define a loss function computed at each step and see if the loss is increasing or decreasing for each progressive step. Bail out if the loss is increasing for X consecutive steps, etc.

[-]

ale007xd@reddit

That’s a neat idea, and it makes sense in settings where you can define a stable objective.

I think the tricky part with agents like this is that it’s not obvious what the loss should actually represent.

In many cases:

the objective is non-stationary (user intent shifts)
multiple criteria compete (usefulness vs safety vs correctness)
and intermediate steps can look “worse” while still being on the right trajectory

So you end up with a signal that’s useful for monitoring, but hard to rely on for controlling behavior.

Also, this approach is inherently reactive — you detect that things are going off track after the transition has already happened.

The question I keep coming back to is whether we can constrain the transitions themselves (even partially), rather than only evaluating trajectories after the fact.

Feels like loss-based signals are great for feedback, but might not be sufficient as the primary control mechanism for long-running agents.

[-]

snowdrone@reddit

It sounds like you are more interested in system stability and not so much a precise loss function, you can look at things such as absolute values of bits changed in the system, for example, a huge deltas in documents fetched or network traffic, metadata changed etc.

[-]

ale007xd@reddit

I think this is drifting away from the actual question I raised.

Measuring “absolute deltas” in memory, network traffic, or metadata is an instrumentation/telemetry idea, not a control mechanism for the state transition function.

It doesn’t address the core issue I’m pointing at: you can have small changes with large semantic drift, or large changes with perfectly valid state updates. So magnitude of change is not a proxy for correctness, stability, or policy preservation.

So what you’re describing is essentially:

«“Let’s observe how much the system moves”»

But my question is about:

«“How do we constrain where it is allowed to move in the first place?”»

Those are fundamentally different problems.

If there is no structure on δ(S, E) — no invariants, no admissible transition rules, no validation of state updates beyond their size — then you are not stabilizing the system, you are just measuring its volatility after the fact.

That’s why loss signals and delta tracking feel similar in spirit here: both are post-hoc signals. My concern is specifically about the lack of pre-transition or transition-level constraints.

So I’m not really looking for “stronger observability of change”. I’m asking whether there is any notion of restricted state space or validated transitions in the design at all, or if everything is implicitly learned/heuristic.

That distinction is pretty important for systems that self-modify over long horizons.

[-]

WithoutReason1729@reddit

Your post is getting popular and we just featured it on our Discord! Come check it out!

You've also been given a special flair for your contribution. We appreciate your post!

I am a bot and this action was performed automatically.

[-]

DependentSlow2850@reddit

Refute to evomap? There is some drama there but I looked and was not super concerneded

[-]

night0x63@reddit

What did you intend for Hermes primary/secondary/etc usage? What do the three users use it for primary/secondary/etc?

(Is it just another coding agent?)

[-]

steezkelly@reddit

Love what you guys do and as I'm actively working on this skill: https://github.com/NousResearch/hermes-agent-self-evolution, which is really key to the whole Hermes design of the skills getting better and better, yet I don't see much active development on this in about a month and a half.

Is this already integrated into Hermes, or is this a side-project that will eventually see movement again? My agent is currently doing its best to integrate the Phases, currently at Phase 3.

[-]

Fluffyjockburns@reddit

what is the latest on claude max subscription usage without API/Extra credits nonsense?

[-]

my_name_isnt_clever@reddit

That's not on Nous, complain to Anthropic.

[-]

Fluffyjockburns@reddit

Understood. I wanted to get the latest tho. Codex here I come!

[-]

Evening-Fox9785@reddit

what’s the recommended way to organize a self evolving knowledge base? i’ve tried using the /llm-wiki skill, but it still collapses after a while

i’ve noticed models like to create a lot of .md files but seldom clean them up or refresh stale findings

[-]

emozilla@reddit (OP)

It boils down to this: Anthropic *does* allow Hermes Agent, but only through Claude Max and it will count as Extra Usage. Anything else is not allowed/supported

[-]

Merchant_Lawrence@reddit

thanks for ama, what you think future model size especially hermes, are you try balance size+ performance or try find path on small but smart model? I really love to see in future some model that run on my 4 gb card.

[-]

Enough_Leopard3524@reddit

Why does Hermes get developed in 4 weeks and have a massive marketing campaign for a shitty terminal tool?

[-]

wentallout@reddit

Do you think we will reach a point where LLM pricing become too much for normal users? I'm feeling a dead end for the future of AI coding

[-]

Pleasant-Shallot-707@reddit

the future is getting expensive and for AI coding to survive 1) dev pipelines need to be decomposed into easily sub-agented workflows that are small enough for a local LLM to manage. 2) projects need to get really strong governance gates to ensure manifests and semantic headers exist for modules so the LLM can easily understand what a module does and how to interact with it without reading all the code to figure it out. This will save a lot on token costs. 3) Plans need to be really well developed ahead of time. 4) MCP tool loading needs to be placed under control, and probably needs to be delegated to a sub agent that is running a local model.

The orchestration needs to be maintained locally too, or subed out to the cheapest LLM possible. only the actual work that requires an expensive LLM should be sent out to them.

[-]

SennVacan@reddit

Would love a unique multi device work flow where my server would act as a central database that does the heavy lifting and on each of my devices i would have a lighter version of hermes agent installed access to my local files and it communicates to my server for changes.

No need to set it up on different devices, wi, only one. Setup workflow one time, use it with every device.

[-]

SennVacan@reddit

Can SSH into the server but the local foles wont be accessable

[-]

Moravec_Paradox@reddit

On Git, I notice the framing is kind of:

>Run it on a $5 VPS, a GPU cluster, or serverless infrastructure that costs nearly nothing when idle. It's not tied to your laptop — talk to it from Telegram while it works on a cloud VM.

That seems OK for unsecured hobbyist/personal use-cases but for enterprise users securing major infra behind a telegram whitelist seems risky. Do you see people using it as a system to command agents from managed system that don't have remote access as a major use-case? (I am just now checking out Hermes so apologies if the question is uninformed).

Also, minor feature request:

Both codex and claude allow ctrl + enter to drop my prompt to a new line without submitting it. I tend to do this very often by muscle memory but when in Hermes it is interpreted as and processed as a prompt before I finish. Support for that would make moving between claude, codex, and hermes easier for me.

[-]

Pleasant-Shallot-707@reddit

I feel like if an enterprise want to deploy this they would put some effort into setting up slack bot or some teams integration.

[-]

walden42@reddit

I have yet to use a self learning agent like claw or hermes due to paranoia about security. After you started this AMA I read your docs on security and see two container features: running the agent within docker, or just running the terminal backend in docker.

Can you comment on the practical difference and limitations between the two methods, and which you personally use (and trust) in a homelab setup?

[-]

_hephaestus@reddit

What are your preferred ways to access the Hermes Agent? Some methods seem to have more polish than others (right now whatsapp in the gateway docker image breaks since it takes >60s to install Baileys), what’s the optimal experience?

also are there ways of running multiple agents from the same running instance? I’m thinking of for example building an agent which is deeply familiar with the Pokemon VGC metagame, which requires skills for running calculations/in-depth meta info to be useful. I also want an agent to do development, another for general housekeeping for my selfhosted containers. Do I run 3 instances of Hermes Gateway or is there an easy way to switch who I’m talking to in a channel between personalities/contexts?

[-]

trashacct383@reddit

Any plans to integrate a Web UI for chat like Hermes WebUi? (https://github.com/nesquena/hermes-webui)

The dashboard is nice but having a Web UI that I can serve via reverse proxy and access anywhere is incredibly helpful. Security is a concern of course but being able to toggle between chats, change workspaces, etc, all in a web ui while retaining the full /command functionality has been very helpful.

[-]

thegreatmiyagi@reddit

This exists 😁

[-]

trashacct383@reddit

Please tell me more!

I have seen the dashboard but that is an admin panel. It isn’t functional as a chat interface.

[-]

phragg@reddit

`hermes dashboard --tui` 🤫😉

It's hidden behind a feature flag until we fleece out the new TUI quirks and how it sits in the dashboard, but yes please use it! We'd love some feedback!

[-]

_hephaestus@reddit

Is there auth for it? I thought the tui/dashboard looked great, but even on my lan I’d like it gated behind something

[-]

trashacct383@reddit

Thanks! I started playing with it. Has potential. Anyplace in particular you want the feedback posted? Just add it to github issues?

[-]

phragg@reddit

There or our discord 😃

[-]

Binoui@reddit

You can use open webui , it works pretty much out of the box

[-]

Il_Signor_Luigi@reddit

Seconded

[-]

thegreatmiyagi@reddit

What’s the five year roadmap?

[-]

thegreatmiyagi@reddit

Also, when will you trademark your IP 😘

[-]

emozilla@reddit (OP)

whose to say we haven't 😉

[-]

thegreatmiyagi@reddit

Next question, when will your phat lawsuit payout?

[-]

setthatonfire@reddit

lawsuit?

[-]

thegreatmiyagi@reddit

Oh yes another Hermes agent has been an ounce by a large company 😭

[-]

elkos@reddit

Which one

[-]

thegreatmiyagi@reddit

Follow-up: when you make the extraordinary @mr-r0b0t a Hermbassador, and have your winnings, can the arrangement be DGX⚡️as payment? It’s definitely not me btw, that guy is just really cool!

[-]

thegreatmiyagi@reddit

I hear that mr-r0b0t guy is pretty cool, any idea how we can convince NVIDIA to send him a DGX ⚡️?

[-]

EvolvingSoftware@reddit

Amazing product, with an explosion in the community and a humongous amount of PRs.

How are you handling the explosion of interest? What makes a PR pique your interest to be included?

[-]

my_name_isnt_clever@reddit

I'm loving hermes-agent, I've tried several similar frameworks but yours has stood out.

My main painpoint is trying to carefully manage local private models and cloud models. The pattern of delgating planning to a large frontier model then taking over from there using local has been working great, but it doesn't seem like Hermes differenciates between them.

I had a moment of confusion when I added an Openrouter API key only to be used very intentionally, then hours later I realized Hermes had been pushing full summary tasks to Gemini when my local endpoint was struggling. I see why that fallback system exists and I adjusted my config accordingly, but built in support to keep a clear local vs cloud boundry would be greatly appriciated.

[-]

NousResearch@reddit

Happy to continue to work towards supporting this more. Would love more feedback on how specifically you'd like to see that make its way in.

We also recently added a way to configure all auxiliary models more easily, inside `hermes model` command, you can find "Configure Auxiliary Models" to set each one individually

[-]

my_name_isnt_clever@reddit

The most straightforward method I've seen is to flag a provider as private/cloud and the ability to configure permissions for the two categories seperately. I would love the depth to restrict specifc tools and even directories to any queries from a cloud flagged provider.

The other approach that comes to mind from my current workflow is for the main agent to choose the provider and model for delegating, rather than using a copy of itself/the delegation config model. Or even some way for the main agent to scope the remote model to only access exactly what it will need for it's task.

Also huge shoutout to you guys for building out the nixOS module, it's made my setup a lot easier.

[-]

EvolvingSoftware@reddit

Something like being able to get a decision on if the local model could run this tool capably? Perhaps we need a way of working out limits on the local models capabilities so the decisioning is easier? There's a huge unlock if there was a way that Hermes could understand what resources it has available from different model providers and then optimise based on preferences. I'd love to run a local model to keep me updated on simple tasks, like monitoring Cron jobs and sending alerts, or have a local model as a backup to restart something if I've burned through all my cloud tokens.

[-]

Stitch10925@reddit

What have you tried? Did you try Goose by any chance?

[-]

Il_Signor_Luigi@reddit

I second this, a native model router would be helpful. I've heard of some external tools to do this but idk

[-]

Crust-of-Capital@reddit

How do you deal with a project that has 4500+ PRs? I see this happen whenever a project gets a bunch of github stars, and I can't help but worry that it will kill projects, either because the devs give up and stop merging anything because they can't see through the noise, or because they give up and merge everything and the codebase bloats and regresses all over the place.

Do we need to change something about opensource workflows and customs to make sure PRs are helpful, non-duplicative, and made with a clear understanding of the project's goals and needs?

[-]

MironV@reddit

Any plans for more built-in automation loops, similar to the heartbeat concept? There’s cron so you can definitely roll your own but there’s advantages to it being an inherent mode.

[-]

Stitch10925@reddit

I run up against this ALL THE TIME! A task that would take Hermes 10 mins to do on its own takes 30 mins because for every step it takes, it needs my feedback to continue, even if it's just "ok", but input something you must. No matter how many times I ask it to work autonomously.

[-]

setthatonfire@reddit

I added something to my agents.md that basically said like "if you can execute end to end on an issue do it. Only stop if there is a genuine blocker that needs human input. If you are given a task break it up into individual tasks, and spawn subagents as needed." It pretty much stopped it from waiting for me

[-]

Stitch10925@reddit

Oh, I love that, thanks!

[-]

setthatonfire@reddit

heres the exact languge i used:

## Default operating rule

If the path is clear, do the work and report after.

Do not make the user keep re-approving momentum.

## Stop only for

- destructive actions with real blast radius

- real product decisions

- conflicting requirements

- missing context that tools cannot retrieve

- any deletion of files, branches, data, or code until the user explicitly says ok via y/n

- any config edit until the user explicitly says ok via y/n

## Execution rule

- Bundle obvious next steps.

- Prefer action over recap.

- Use tools instead of describing intentions.

- Verify claims with real output before finalizing.

- Do not narrate obvious work you can just do.

- Do not stop at soft handoff language when the next safe step is clear.

[-]

Stitch10925@reddit

Thanks for sharing! I will try it out

[-]

minimax_zed@reddit

hermes invoking itself to do bigger projects during idle times on your local compute would be so cool

[-]

darkinsp@reddit

What local llm model one can run on their gaming gpu like 5070ti etc and use hermes with enough context and agent ability to run at a good speed ?

[-]

addiktion@reddit

Is there a way to programmatically configure Hermes? I ask because I've been building software around it so I can spin it up without having to walk through interactive configuration over and over again. I was looking through the docs and didn't see any specific mentions but maybe its escaping me.

[-]

tvmaly@reddit

If you were to start over from scratch, what would you change with Hermes?

[-]

FrostByghte@reddit

I’m trying to understand Hermes Agent and agents in general, at the core level. The landscape is moving at an insane pace.

A lot of agent frameworks can now do some mix of tools, workflows, memory/state, planning, and model orchestration, especially when they’re all using similar frontier models underneath. The simple fact is, point a recent frontier model at an idea and tell it to make it happen is getting very common. So I’m curious what you see as Hermes Agent’s real differentiator.

Ultimately, what is the endgame for Hermes Agent and what are the design principles that would lead a user to invest time into Hermes Agent vs Agent XYZ?

What is the guiding philosophy behind the project? What's the underlying foundation and ethos behind the project?

Thank you.

[-]

alt-glitch@reddit

Hermes Agent is built to keep learning and growing as an agent (via skills/memories). There's a number design decisions like;

- It being written in Python and being extensible for RL environments, evals and post-training, research experiments.
- A lot of care into what skills, tools, plugins we maintain. That being said, every extra tool has to prove it's worth otherwise it doesn't get added.
- First class support for being able to use it from wherever via our gateways.
- Ability to decouple the agent loop from it's execution -- ie, your agent loop can be on your machine but it's execution environment can be someplace else (docker, modal, remote server)

Re: comparison with Pi or other agents; I personally hate what I call "configslop" with agents that you have to build yourself.

[-]

rob417@reddit

Would you mind explaining what configslop refers to?

[-]

RegularRaptor@reddit

I have never heard the term myself, but I am pretty sure Pi-agent is purposely very lean.

It doesn't have common things like sub-agent delegation or even a /plan mode. It does that to keep down the bloat that tools like ClaudeCode have, where they are constantly adding features you don't need or even know exist.

So, it is the polar opposite of that, and its selling point is that you just "ask it to add whatever you need," and it does.

That sounds like such a beautiful idea, but I am sure you can imagine how that could go south quickly, just because one person can't think of every single edge case in real time. I have no idea if that is what he meant, but that is my guess.

[-]

rob417@reddit

Yeah. From my limited experience with pi agent, it seems to work quite well. It's system prompt seems very well-written that tool calls succeed quite frequently right out of the box. On the other hand, I've never been able to get qwen3.6 to use glob correctly in OpenCode even though it's supposed to be much beefier.

I think the type of configslop we're discussing here can be controlled as long as we only add the tools and extensions we absolutely need to pi.

[-]

CordedMink2@reddit

Why are so many tools/skills enabled by default? Doesn’t it introduce a lot of context rot? I feel like 90% of the skills should be enabled by default, then there’s some sort of method to enable them

[-]

Apart_Boat9666@reddit

Can you explain how hermes improve, wasn't able to get any detailed explanation on yt.

[-]

nunodonato@reddit

I second this. Especially when you take into consideration Pi, which basically lets you build your own agent with whatever you need and remove what you don't.

[-]

Cat5edope@reddit

When will Hermes make me a zillion dollar saas no mistakes?

[-]

nopanolator@reddit

Help us to debloat hermes for full governance. The gigantic Py files are unpractical to rewrite and almost only commented by Opus \^\^

[-]

_hephaestus@reddit

What are your preferred ways to access the Hermes Agent? Some methods seem to have more polish than others (right now whatsapp in the gateway docker image breaks since it takes >60s to install Baileys), what’s the optimal experience?

also are there ways of running multiple agents from the same running instance? I’m thinking of for example building an agent which is deeply familiar with the Pokemon VGC metagame, which requires skills for running calculations/in-depth meta info to be useful. I also want an agent to do development, another for general housekeeping for my selfhosted containers. Do I run 3 instances of Hermes Gateway or is there an easy way to switch who I’m talking to in a channel between personalities/contexts?

[-]

russjr08@reddit

Are there any plans to have the ability to gate certain tools behind an approval?

Lets say I want to have a tool that calls out to an external service, for the sake of an extreme example, a tool that can drop a (hopefully test) database - right now it seems if I make that tool available to the agent, it can just decide "Yeah, that seems like a good use case" and automatically run it without me prompting it to do so (or even a confirmation step).

You could in theory try to encode some instructions into the agent's docs that say "Don't run this tool without explicit permission first", but that obviously depends on the model actually following that - I'd much rather have it enforced by the harness itself in a deterministic way, in the same way that terminal commands can be flagged as needing approval first.

Also, on a different idea, is there a way to "pin" a cron task's output to a specific session? I have a cron that summarizes and delivers news to me, but because it runs in isolation there is no way for me to ask the agent for feedback without it having to re-retrieve the details in an actual "live" session.

I can see why they're isolated, to prevent context pollution for example, but being able to follow up on the results of a cron, or even for the agent to say "This pairs up with this news article from yesterday" if I had it post to a dedicated news channel in Discord perhaps. I suppose it might be possible to somewhat do this by having the agent create a log file of sorts, and then pair that with a skill / instructions that tell it to reference it if I ask about it (or during the cron run itself) but that seems to me like I'd be fighting against the isolation pattern at that point.

[-]

marscarsrars@reddit

Legends.

[-]

lolwutdo@reddit

Any plans on implementing something like a heart beat? Would be nice to have Hermes agent actively message me on its own without being prompted.

[-]

ilt1@reddit

Hermes doesn't have cronjobs?

[-]

lolwutdo@reddit

It does but it's not really the same as how heartbeats work in openclaw

[-]

Thrumpwart@reddit

I’m looking at using Hermes-Agent to populate a Tolaria knowledge base. Is this something people have done and what should I know?

[-]

letsgoiowa@reddit

What do you guys personally use it for?

And follow up: what does your family think of your research? Lol

[-]

HongPong@reddit

to follow some of the other comments here what is the deal with sandboxing these agents? that's gotta get streamlined it seems

[-]

Tommonen@reddit

Are you guys into hermeticism? Looking at the conpany and product name

[-]

Super_Piano8278@reddit

Hi team, I have a question.

For most people, the primary need is getting answers to questions, which tools like ChatGPT or Claude already handle reliably. For coding-related work, many users rely on coding agents.

Persistent agents like Hermes or OpenClaw seem to require significantly more effort: setup complexity, infrastructure, hosting, security considerations, and ongoing maintenance. Most users do not have the resources or technical comfort to self-host these systems. Even if they manage to set them up, running a local model can be expensive or impractical, and API usage introduces recurring costs that many people may not want to pay for.

Because of this, I struggle to see widespread day-to-day adoption among average users.

My question is: what is the practical benefit or long-term vision behind building and using persistent agents like Hermes or OpenClaw? In what real-world scenarios do they provide value that justifies the additional complexity compared to standard chat assistants or coding agents?

[-]

CabelloNasal@reddit

Hi guys!
First of all thanks for the project, it has been a mindblowing journey and we are just starting. I am really enjoying the ride and learning a lot.

One of my dreams is being able to wear and talk my hermes agent anywhere. Feeling it being autonomous and always predictive and audible. Almost like a 'Her' (the movie) experience. Is there any work in progress towards this situation?

I find myself a lot using TTS while driving or doing something else, commanding somthing to my Hermes or asking it to explain or synthesize somthing for me, giving feedback or whatever. And I feel that the autonomous work like procedural memory and analysis and always reading your psychological profile en habits and tracking how to improve is the best part of it being an agent. So yeha, it would be thrilling to achieve that point where Hermes feels like an extension of myself.

Thanks again for your work, I am so hyped with the road ahead.

[-]

noctrex@reddit

Are you planing on releasing a new version of the models Hermes-4 or NousCoder?

[-]

Stitch10925@reddit

I... never realised they had their own models...

[-]

ObsidianNix@reddit

That’s how they started. Hermes4 is their latest model but they haven’t trained it with tool calling. They recommend other models.

[-]

Alan_Silva_TI@reddit

Do you have any plans to release a Windows version? I know it can be run through WSL, I use Hermes that way, but a native version would be great.

Are there any plans for a webapp interface? I really like the TUI, but there are still some things that might be easier to handle in a web-based UI.

[-]

TheRealMasonMac@reddit

Hermes 5 when. (And are any new datasets coming?)

[-]

firejava@reddit

How does an open-first lab like yours sustain itself long term—what’s the viable business model without closing off your models?

[-]

firejava@reddit

Do you see open-source AI models ultimately winning out over closed models in the long term?

If so, what do you think is the tipping point—and if not, where do closed models maintain a durable advantage?

[-]

firejava@reddit

A lot of AI progress right now is driving huge data center builds across the country—power, water, land use, etc.

Since you’re working on decentralized training (DisTrO), do you see those mega data centers as unavoidable, or something fundamentally inefficient we should move away from?

And bluntly—would you be okay with one being built next to your home?

[-]

FlamingoTrick1285@reddit

Why all the bot spam?

[-]

JLeonsarmiento@reddit

Hermes agent is amazing, congrats for the good work. I’m killing it with Qwen3.6MoE on it.

Any plans to fine tune a MoE like this, or even smaller/dense to get better at “Hermes-ing” or is it too much work/not needed?

[-]

sollymay@reddit

Are you working on improving reliability of tool calling when using local models?

[-]

PracticlySpeaking@reddit

Great question – I am aiming for an all-local setup, if it can work for my use cases.

[-]

minimax_zed@reddit

what's your hardware?

[-]

PracticlySpeaking@reddit

Mac Studio 256GB

If you didn't see my other comment... just started running minimax M2.5. Overall it has been going well.

The model is a bit 'eager 10th grader'. This is a transition from GPT-5.x, dunno if a lot of memory and skills from that are a factor.

[-]

emozilla@reddit (OP)

Always! The tool calling prompting in the backend has been very carefully crafted but in the end it's sort of one of those "you can lead a horse to water but you can't make it drink" things, and the model itself needs to be trained to do the calls. FWIW we are working on the model side too, hopefully have more on this soon

[-]

rob417@reddit

Maybe have multiple tool calling prompts depending on the model? Comprehensive tool prompts if the user is using large frontier models. Concise, targeted prompts if the user is using local models around 30B.

[-]

FyreKZ@reddit

At a certain point it's the fault of the model unfortunately. Only so much the Nous team can do.

[-]

minimax_zed@reddit

yes and no, there's optimization techniques like DPO that are really effective at adjusting tool calling behavior while being cheaper than finetuning (which is already much cheaper than training)

[-]

seamonn@reddit

Model Skill Issue*

[-]

Emergency_Union7099@reddit

How do you think Hermes supports a data analytics or data scienctist person working on their projects with numerous but smaller datasets?

[-]

Sad-Enthusiastic@reddit

what are the top 5 use cases you see implemented using Hermes?

[-]

NousResearch@reddit

First is coding

Second is research and news gathering, especially cronjob related reports

Third is monitoring for systems and anomolies, our backend team uses it daily to analyze bugs and data issues in Nous Portal for instance

Fourth is general assistant work (like admin assistant) - Memo writing, dealing with emails, etc

Fifth is a huge array of more niche tasks - creating art, aligning your expense reports, research assistant, acting as an interface for a server/machine, our post training team uses it to build RL environments and analyze data etc.

[-]

chimph@reddit

I understand using an agent to code its own tools/apps but for general coding it seems to me that it’s best to use a harness that’s designed for it such as opencode?

[-]

Hodler-mane@reddit

those harnesses will always be #1 for coding, but hermes/openclaw opens up coding to everyone else who don't know how to use those.

[-]

sarcasmguy1@reddit

For coding do you need to do anything specific to set it up, or is simply point it at a workspace and ask it to work on a specific repo?

[-]

alt-glitch@reddit

My favorite one is my own Hermes Agent that mirrors our GitHub repo and uses it to cluster similar PRs and issues; deduplicating them and tagging each with relevant labels and context.

Helps us merge PRs faster 😉

[-]

No_Fix_4658@reddit

Oooh! upppp, hopefully gets answered

[-]

Certain-Way6763@reddit

Have been using Hermes since the first release (2 months already I guess?), and I am nore than happy with it. But every time I try to explain it and show to my less technical/busier with their ordinary life friends they fell off at the very first step - using CLI, setting up telegram bot etc. Do you plan to release some kind of more consumer friendly app? I will be the evangelist!

[-]

reatafort@reddit

What is your plan to address the security vulnerabilities people reported on Twitter?

[-]

Hodler-mane@reddit

I switched from openclaw to hermes and hermes was just so much easier to set up, ran faster, responded faster, and was just overall more stable.

can we please, please find a way to disable showing the tool calling inside discord. ive tried every option, it doesnt exist!

[-]

ready_to_fuck_yeahh@reddit

Not questions but two suggestions:

There is a known issue where HA override manual instructions while upgrading its knowledge base, which is prone to hallucination, two set of skills can be implemented one fixed by user and second self improvement but following strict parameter set by user.

One additional feature i would like to suggest is skill updates based on user feedback rather than self feedback, It may present set of reasoning it used to user and if user is satisfied than save it, else user may edit and then save it.

[-]

alt-glitch@reddit

This is valid! Will look into it actually.

https://github.com/NousResearch/hermes-agent/issues/17583

Opened an issue tracker for this

[-]

PracticlySpeaking@reddit

+1 - skill updates based on user feedback

[-]

minimax_zed@reddit

Can I ask what model you're running?

[-]

setthatonfire@reddit

love Hermes. I switched from OpenClaw and am a huge fan. As Portal/OpenRouter/provider routing grows, are you thinking about stronger trust controls? I.e. zero data retention where supported, provider allowlists/blocklists, no-fallback mode, and a per run receipt showing model/provider/tool/cost/data-egress?

Also, longer term, do you see an opt-in trace/eval loop where users can share sanitized Hermes traces, tool-use failures, or recovery examples to improve Hermes models and the harness?

[-]

alt-glitch@reddit

I'm literally speccing that out as I write this. Expect more work on this in the next few released.
There's definitely benefits to that, but telemetry is tricky and we wanna get it right before we even add that option in. You can always run experiments with your own sessions to see what kind of skills, memories, optimisations can be made. An auto-research loop might be interesting if you talk to hermes agent a lot!

[-]

setthatonfire@reddit

awesome. It would be great if we could just specify the models that have zdr or just add a flag for them. For the self evolution skill is the plan to add the ability to optimize the harness based on the traces? I see on the roadmap tools/prompts are on it but not sure when those will be implemented. Would love something like metaharness/betterharness for hermes as a skill.

Any tips for using autoresearch to optimize? Prompt wise it would require some sort of outside LLM judge, which autoresearch doesn't allow for easily.

[-]

trashacct383@reddit

Skill creation seems incredibly aggressive, causing a lot of skill bloat. My agent often makes a new skill for one-off tasks. Any advice on getting the agent to be more discerning in what warrants a new skill?

[-]

pmttyji@reddit

I have no questions, but thanks for this AMA. I see so many good questions here, I'm bookmarking this thread for later after start using Hermes Agent. Thanks for the contributions.

[-]

_derpiii_@reddit

What is each members favorite agentic loop/workflow?

[-]

amchaudhry@reddit

Will there be a web UI for configuration or is Hermes meant to be CLI always? The ONE thing I miss about openclaw is the web gateway to manage things. Thx and great work!

[-]

NousResearch@reddit

There is partial access to configuration through our existing web ui - run `hermes dashboard` to access it.

We plan on expanding that to fully support

[-]

Stitch10925@reddit

Dashboard? That's new to me

[-]

Sicarius_The_First@reddit

Will you tune the new Mistral-Medium-3.5 in ChatML? :3

https://huggingface.co/mistralai/Mistral-Medium-3.5-128B

[-]

DeepOrangeSky@reddit

Was about to ask the exact same thing, since they were famous for fine-tuning the Llama 405b dense model.

Would be pretty cool if they did a finetune of this new big Mistral 128b dense model. Pretty exciting release for people who still like the big dense models. They are way better at certain things than similar sized (or even significantly larger) MoE models. If people are just going by benchmarks, they are missing out, there are things the big dense models are way better at than anything else, even nowadays, even with fairly old big dense models.

Anyway, yea I hope they try a fine-tune on this new one, that would be really cool.

[-]

Sicarius_The_First@reddit

(Hermes 128B dense would be very nice)

[-]

smolpotat0_x@reddit

which auxiliary features do you recommend paying attention to aside from vision and compression and what models do you recommend for each?

[-]

minimax_zed@reddit

keep meaning to send a pr to fix this but if you use a reasoning model for title generation it will often time out and you'll get no session titles

[-]

No_Conversation9561@reddit

a hermes model from you guys specifically made for hermes agent would be dope

[-]

PracticlySpeaking@reddit

Cheers 🎉🎉 to the Hermes-Agent v0.11.0 release — That was a big one!

What are some things we can look forward to?

[-]

NousResearch@reddit

We just merged the Curator: https://github.com/NousResearch/hermes-agent/pull/17277
This will consolidate and prune skills and track skill use automatically

We are working on Computer Use (for all models): https://github.com/NousResearch/hermes-agent/pull/16936

Multi-Agent coordinations: https://github.com/NousResearch/hermes-agent/issues/16102

GUI Chat and many more things!

[-]

PracticlySpeaking@reddit

Multi-Agent will be huge, thanks.

Hoping that is multi-model as well, with best agent+model pairing.

[-]

minimax_zed@reddit

Any plans to add deeper support to Hermes for integrating with language servers? Obviously there's the MCP route but there's plenty of ways the agent harness can make things easier for both the user and the model

[-]

-Django@reddit

What inspired you to create Worldsim? Do you have any other simulation-related plans or ideas?

[-]

lacerating_aura@reddit

From my recent usage, as a first time agent user, I see that Hermes has vision separated from core model. Wouldnt it make more sense to detect or set that core model is multimodal and allow it to visually inspect "vision tasks" rather than asking some other vision capable model to summarize the content, which I know can still be pointed to the core model but might introduce context checkpointing issues with llama.cpp?

[-]

minimax_zed@reddit

This finally got fixed just like two days ago!

[-]

NousResearch@reddit

We recently did exactly that - vision routes to native capability first, unless otherwise set or unavailable!

[-]

intellidumb@reddit

Do you leverage A2A? Are you using spec v1.0.0?

[-]

DingyAtoll@reddit

Is there any need for expertise or help in some places? I would love to get involved and donate my time where possible

[-]

minimax_zed@reddit

i'd say get on the discord and help people out or find and fix bugs -- they are amazingly responsive on PRs given how many they get

[-]

alphatrad@reddit

What was your teams motivation behind Hermes when you started? And what's your vision looking forward for Hermes agent?

[-]

NousResearch@reddit

We started it to do Agentic RL 😄

[-]

sarcasmguy1@reddit

What coding agent harness do you use for coding within hermes, ie what does hermes use? Is it custom or something like Pi?

[-]

NousResearch@reddit

We use almost exclusively Hermes Agent for developing Hermes Agent

[-]

Ononimos@reddit

I’m not confident that my Hermes is learning my projects and growing with me. Not unless I’m handling that evolution myself.

Is there a way I should be leveraging Hermes to maximize this?

[-]

NousResearch@reddit

Better models self improve better, so might consider changing models

[-]

minimax_zed@reddit

Install one of the more sophisticated memory plugins. If you want zero-dependency pure-local, then `hermes memory setup holographic` will do everything you need. There are a few other self-hostable options but none as easy to start with.

[-]

Emerald-Bedrock44@reddit

Hermes Agent is solid for local setups, but the real problem nobody's talking about is what happens when these things start chaining calls autonomously. You need visibility into what the agent's actually deciding, not just the final output.

[-]

minimax_zed@reddit

/verbose

[-]

Pleasant-Shallot-707@reddit

You do have that visibility

[-]

semero@reddit

Any plan for native support for local LLM apps like LM Studio? A provide option with native support for the LM Studio API and it's MCP server would be awesome .

[-]

Ytocin@reddit

What is the most interesting self-developed skill/memory you have seen a Hermes agent create?

[-]

NousResearch@reddit

The hermes-agent-dev skill that powers all of my PR Review, development and planning for Hermes Agent 😄

- teknium

[-]

LatentSpacer@reddit

Any plans for image/video models?

[-]

NousResearch@reddit

We have a variety of image gen model tooling and providers

Video yes, eventually!

[-]

silhouettes_of_joy@reddit

Is Hermes agent also helping you manage this thread? would be really cool if it could tell people if you have answered their question while responding to someone else.

[-]

snowdrone@reddit

Hi thanks for the AMA. What's the deal with Anthropic reportedly switching to API billing if Hermes.md is anywhere within a repo's commit history? That sounded like a pretty crazy stance on their part.

[-]

FilthyCasual2k17@reddit

Hey there. Big fan, been using for a while, among other harnesses testing out, def appreciate the lightness of it. How are you guys looking to solve the potential issue if other model providers switch away from letting subscriptions models (I use it on Codex atm like most people, since Claude is uncertain about it's sub use), and I'm constantly worried that if that gets taken away the API billing over something like open router or some other place will just be too much for me. Do you have any plans to perhaps make a more specialized versions for locally run LLMs that produce much much lower token output due to running on commercial grade GPUs to make them a bit more usable?

[-]

NousResearch@reddit

We are working across the industry to make models of all sizes work better in Hermes Agent.

The self improvement loop also helps with efficiency - once it tries a lot of things to solve a task and solves it, it never has to do that exploration again

[-]

MoodRevolutionary748@reddit

Any recommendations for CPU only or igpu models to use with Hermes agent? I'm using qwen3.6 35b but especially at research tasks it's slow AF.

[-]

NousResearch@reddit

I think this subreddit would have better answer for you then we might \^_\^

[-]

artemgetman@reddit

Do you guys think it’s best to run Hermes on vps or Mac mini, why?

[-]

NousResearch@reddit

Depends what you want out of the agent.

There are some mac specific skills that allow greater access to things like your iMsg and apple notes, vps is more secure, etc

[-]

rob417@reddit

Hermes sits at one end of the philosophical spectrum for agent harness, where a lot of tools, skills, capabilities come built-in, and user can connect to it 24/7. On the opposite end you have things like pi, which ships with the bare minimums of an agent harness.

What are your thoughts on this divide in harness design philosophy? Do you see them converging in the future?

I don't think this is a divide between targeting laymen vs coders. Both tools are for coders and power users at the moment because they require familiarity with a CLI and basically coding knowledge to set up properly. On a related note, when do you think we'll see an agent harness where setting up new tools and skills are as simple as stacking lego bricks?

[-]

34574rd@reddit

any planned model releases this time?

[-]

NousResearch@reddit

We are exploring the best pathways to make existing models much stronger in Hermes Agent

[-]

KeepEarthComfortable@reddit

Also curious how your fine tuned Hermes models fit into the agent? Even from the nous portal I see I can’t use them.

[-]

TexBluBoy@reddit

EVO-X2 128gb / Strix Halo user here. I have my system memory optimized with amdgpu.gttsize=114688 and Dedicated VRAM (The "BIOS Carve-out") Current Setting: 2,048 MiB (2 GB)

What is a recommended LLM to use with Heremes?

[-]

minimax_zed@reddit

https://huggingface.co/unsloth/Qwen3.6-35B-A3B-GGUF is the best thing I've tried that was fast.

The newer dense model https://huggingface.co/unsloth/Qwen3.6-27B-GGUF is amazing quality for its size, but dense == slower ofc.

I'm working on llama-swap and llama-server settings to be able to run both at the same time for different kinds of work, think I should be able to squeeze in a 4B or 9B model for quick subtasks too.

[-]

MerePotato@reddit

Hey guys, not a question about your research per se but I love your unique aesthetic, what inspired you to take such a radically different approach to other labs in that area? Also any plans to open source your Nous Chat orb UI? Its a really cool frontend.

[-]

AkiDenim@reddit

Is it an interest that you guys will create a desktop App that handles Hermes Agent, so that there might be better UX in terms of getting less acquainted users into personal agents?

OpenClaw had a Web UI, but it was never as polished. Hermes agent could take a notch up in UI / UX in the WebUI or even have a dedicated app.

Also, are you guys potentially thinking about a "click once, pay some bills, and forget about setting anything up" pathway for setting up Hermes agent - through partnerships with other parties or VPS companies?

[-]

NousResearch@reddit

Yes to question one
To question two, we already have a lot of the foundation for that with Nous Portal/Tool Gateway, all that's left is hosting the agent itself as well, and is on the roadmap

[-]

AkiDenim@reddit

Yes, i an acknowledged about the latter and am actually very grateful that those efforts are underway!

And a dedicated desktop app or WebUI , with good UI/UX planned is amazing! I’d love to collaborate too - what are the methods of collaboration to Nous Research and your work?

[-]

KeepEarthComfortable@reddit

This seems like a big opportunity. Even running from a home server.

[-]

AdFeisty5686@reddit

I'm using hermes agent with qwen3.6 27b, q4 k m, turboquant plus (q8_0 k, turbo 4 v). Quality are perfect but speed are not (2x 16gb quadro 5000 = 32gb + 32gb ram ecc)

Should i try another options? Like: qwen3.6 27b q3 k m? Or some other version of qwen 3.6 27b?

[-]

alt-glitch@reddit

Gemma 4 is quite capable as well if that's a model/architecture you'd like to explore.

[-]

AdFeisty5686@reddit

I wrote some middleware inject to core of HA, to create graph memory automatically. What do you think about built-in graph memory for HA, where end-user can see HA knowledge visually?

[-]

AdFeisty5686@reddit

Like this:

💭 Reasoning: The self-test passed! All three core functions work: 1. extract_and_store() - Successfully extracted entities and stored them 2. graph_retrieval() - Found seed nodes via vector similarity 3. summarize_context() - Generated structured markdown context Now let me update the memory with this new information about the graph memory module. Self-test successful! The module is running stably. 🏰 Hermes Graph Memory — Deployment completed Architecture implemented ┌─────────────────────────────────────────────┐ │ HERMES GRAPH MEMORY SYSTEM │ ├─────────────────────────────────────────────┤ │ │ │ LAYER 1: Conversation (ephemeral) │ │ └→ In-memory buffer, auto-clear per turn │ │ │ │ LAYER 2: Session Graph (7-day TTL) │ │ └→ Entity triplets + vector embeddings │ │ SQLite edges + Qdrant vectors │ │ │ │ LAYER 3: User Memory (persistent) │ │ └→ Promoted high-confidence facts │ │ Cross-session retrieval │ │ │ ├─────────────────────────────────────────────┤ │ STORAGE: │ │ • Qdrant: Vector similarity search │ │ • SQLite: Graph edges (S-P-O triplets) │ │ • LocalEmbedder: Hash-based projection │ │ (384-dim, L2-normalized) │ ├─────────────────────────────────────────────┤ │ RETRIEVAL: │ │ • Vector cosine similarity → seed nodes │ │ • N-hop BFS traversal → expanded context │ │ • Confidence-weighted ranking │ └─────────────────────────────────────────────┘ 3 Core functions implemented | Function | Purpose | Test Result | |---|---|---| | extract_and_store(text) | Entity extraction + triplets → push to graph | ✓ 8 entities, 1 edge | | graph_retrieval(query) | Vector search + N-hop traversal | ✓ 5 seed nodes found | | summarize_context() | Compress context into structured Markdown | ✓ 682 chars generated | Module components (hermes_graph_memory.py) * EntityNode — Graph node with label, properties, and confidence score * GraphEdge — Directed relationship (subject→predicate→object) * MemoryRecord — Layered record (conversation/session/user) * LocalEmbedder — 384-dimensional embedding, no external API required * QdrantStore — Vector store with in-memory fallback * GraphStore — SQLite persistence for graph edges * LocalExtractor — Entity extraction (LLM or rule-based) * HermesGraphMemory — Main manager with 3 core functions Self-test results ✅ Extracted: PERSON(5), PROJECT(1), TECHNOLOGY(1), CONCEPT(1) ✅ Stored: 4 session memories with entity links ✅ Retrieved: 5 seed nodes via vector similarity ✅ Summarized: Structured markdown context (682 chars) The module is ready to be integrated into the core system. The file is located at ~/.hermes/core/hermes_graph_memory.py.

[-]

arousedsquirel@reddit

Not a contribution but guys I love Hermes agent!

[-]

jacek2023@reddit

Any updates on Psyche or other crowdsourced / decentralized model training efforts?

[-]

shokuninstudio@reddit

The most common gripes people have with Hermes Agent and OpenClaw is that they are very slow and flaky with local models compared to OpenCode.

OpenCode does around 80-90% of what Hermes Agent and OpenClaw claim to do if a user configures OpenCode to use system tools and apps locally.

So with that in mind, what plans are these to make Hermes Agent very responsive and less demanding on resources?

[-]

Lucyan_xgt@reddit

Are you guys hiring?

[-]

NousResearch@reddit

Careers - NOUS RESEARCH take a look!

[-]

Lucyan_xgt@reddit

Thanks

[-]

sarcasmguy1@reddit

When building hermes, how much of the initial project was built using AI assisted coding? How did you go about planning out the design and architecture?

I see many people talking about how to build with an agent, but they focus on planning and specs, and rarely touch on how to make sure your agent designs a good system.

[-]

Vaguswarrior@reddit

What should an out of work/AI displaced knowledge worker do to take advantage of localllm to find employment?

[-]

hejj@reddit

Mostly just a "thank you" for releasing this open source with such thorough documentation.

[-]

thegreatmiyagi@reddit

When kanban?

[-]

greatwilt@reddit

I was impressed by the hermes models finetunned from llama. any plans to finetune any more recent architectures?

[-]

alt-glitch@reddit

Hermes Agent as a harness has definitely given us a lot more to explore, especially in agentic RL. I'd keep an eye out for some releases in the near future if I were you 😉

[-]

RottenPingu1@reddit

Thank you for doing an AMA. Makes the community healthier.

[-]

phragg@reddit

Appreciate this ❤️

[-]

Ok-Internal9317@reddit

At the moment the Hermes agents runs more like one agent working, will simultaneous agents system be invented for Hermes or that is not the priority at the moment? If so how will you plan to handle internal communication between agents (internal messaging or context sharing or something else)

[-]

NousResearch@reddit

Check out our plans for multi-agent/profile coordination here: https://github.com/NousResearch/hermes-agent/issues/16102

[-]

Il_Signor_Luigi@reddit

Maybe I'm dumb but is there a way to make MCP server loading "on demand" in Hermes? Telling the model hey, you have these MCPs available but lazy loading the tools + instructions only when that one specific MCP is needed instead of flooding the context window? I don't mean disabling tools permanently. Hope I made myself clear, thanks!

[-]

NousResearch@reddit

Changing the tools available breaks caching and pollutes context (i.e. the model turns in the live session often reference what tools are or aren't available, etc) - so any operation that changes, adds, or removes tools or tool definitions requires a new session

[-]

emozilla@reddit (OP)

absolutely clear, we had a call last night discussing adding exactly this!

[-]

Fedor_Doc@reddit

Do you use Hermes to assist its own development: a) for coding b) for research and planning?

What models do you find most comfortable to work with in these cases?

[-]

emozilla@reddit (OP)

Almost exclusively. I would say 95-99% of the development and research for Hemes Agent is done via Hermes Agent

The big closed models (Opus 4.7, GPT 5.5) are still the best, but models like Kimi-K2.6 are quite close

[-]

Fedor_Doc@reddit

Do you use the same model for research and coding? Do you feel significant difference in that regard?

How do you deal with paywalled articles? Do you download them in a specific folder for a model to read? Or do you share your credentials for model to be able to access articles during search?

I'm kinda afraid to do the latter, it's interesting if someone has already tried it :)

[-]

Sad-Enthusiastic@reddit

What are the key differentiators between your models and comparable ones?

[-]

NousResearch@reddit

Our last model is almost a year old, so, nothing right now \^_\^

[-]

Sad-Enthusiastic@reddit

What's your timeline for new model or version of the current ones?

[-]

Sad-Enthusiastic@reddit

How does your code maintenance/release strategy compares with OpenClaw's?

[-]

Pleasant-Shallot-707@reddit

They have dozens of commits every day and multiple feature announcements a week. They seemingly never sleep lol

[-]

NousResearch@reddit

I can confirm from the evidence I have that Teknium does not sleep and he is almost certainly a cyborg. Seriously, check his GitHub history, you wont find a consistent gap anywhere.

-Kainan

[-]

Ononimos@reddit

Do you recommend I build my own tools, plugins, and functions to support Hermes, or to always try to leverage Hermes’ built-in capabilities first?

[-]

alkibijad@reddit

Are you hiring? :D

[-]

NousResearch@reddit

Yes, https://nousresearch.com/careers

[-]

NousResearch@reddit

Careers - NOUS RESEARCH

[-]

productboy@reddit

Re: “being able to use Hermes wherever via your gateway”; what do you want the Hermes community to build to stress test this? Adjacent Q: what can teams in regulated industries [healthcare, finance…] do to help Hermes enter these industries as a first class system; i.e. SOC-2, FedRAMP?

[-]

Tommonen@reddit

What made you decide python instead rust etc that many other agent harnesses use?

[-]

alt-glitch@reddit

Python is a lot more extensible and lets us move very fast with how we extend Hermes Agent's capabilities as an agent and as a product.

Python's also one of the few languages and ecosystem which models of all sizes can write code, debug, and build quite well in.

aanddd the agent being in python helps a lot if you're a research lab because you can use it as a training harness. Hermes Agent hooks up nicely with RL environments!

[-]

Due-Year1465@reddit

Are there any plans to introduce human override to skills so Hermes stops rewriting them? (A skill gets automatically created, I change it, the model changes it again)

[-]

NousResearch@reddit

It's a good idea. I'll implement right now

[-]

Pleasant-Shallot-707@reddit

What are you doing to them? I hadn’t thought about doing that

[-]

Tommonen@reddit

What are some of the most import things that were not obvious, but made the system work a lot better.

[-]

NousResearch@reddit

Having an extremely strong foundation on coding tools/terminal operation tools

[-]

Evening-Fox9785@reddit

what’s the recommended way to organize a self evolving knowledge base? i’ve tried using the /llm-wiki skill, but it still collapses after a while

i’ve noticed models like to create a lot of .md files but seldom clean them up or refresh stale findings

[-]

NousResearch@reddit

We have just introduced the Curator system into hermes agent so that on a schedule (default 7 days) - the curator will consolidate and prune skills that get auto-generated. I think our built in skills system is the best system long term

Curator PR: https://github.com/NousResearch/hermes-agent/pull/17307

[-]

thegreatmiyagi@reddit

Why can I not tag @dill in photos 🥹

[-]

Gadobot3000@reddit

How do we strike the right balance of commercial models for certain skills versus local models for repeatable skills with budgets and performance? Any frameworks we can borrow ?

[-]

NousResearch@reddit

I think our auxiliary system works pretty well but will be expanded on more. Setting cheaper local models, like maybe Nvidia's new nemotron nano 3 as your vision model, gemini flash as your compression model, and your most powerful model as your main agent makes this make a lot of sense

[-]

KvAk_AKPlaysYT@reddit

Wb an internship? I cook similar things and like the stuff you guys put out :)

[-]

thegreatmiyagi@reddit

When does the ambassador program start? @mr-r0b0t would be a great one 😘🤣

[-]

OUT_OF_HOST_MEMORY@reddit

Do you consider Agents a natural path for LLM use for general use cases like QA and creative writing, or does the inherent bloat of the general coding focus weigh them down and bias them towards only performing well in those tasks? Is this something you consider during the development of Hermes Agent?

[-]

reddit_mini@reddit

When are you making full windows support?

[-]

emozilla@reddit (OP)

yeah like u/phragg said, it may come eventually but the complexity-vs-benefit tradeoff is sorta low when WSL2 works so well. I also use Windows and use HA via WSL2

[-]

phragg@reddit

Not a huge priority at the moment because WSL2 checks that box for us, I'm also a Windows user 😉

We will eventually have native support, but not at this moment.

[-]

fmedinaq@reddit

Hi there, are you considering use tailscale or another vpn like for expose the Hermes api to local network or remotely ?

[-]

emozilla@reddit (OP)

Yup a ton of us use tailscale to access our Hermes Agents

[-]

zumtru@reddit

Are we going to get any more extended theoretical writing from Nous Research?

I first found became interested in Nous because of that focus on reflective awareness in AI systems, I have been very excited by the concept of AI being able to embrace self-doubt and uncertainty instead of attempt to hide any "non-productive" reasoning. I've always admired the measures taken to not fall into effective altruism or consequentialism, operating as a very different type of AI lab compared to any of the other players.

A lot of the writing already published sits within the influence of theory-fiction from the CCRU, digital mythology, sinofuturism and reads like the output of an arts collective but with the intersection of deep AI research knowledge and experience which results in something super engaging. I would love to see more and know if I was right with the influences I felt within the work.

[-]

Important_Quote_1180@reddit

I used home made chat bots 7 years ago, then I came back last year and started using Clawdbot/molt/openclaw and have settled that the landscape needs a lot of harnesses right now. We need harnesses because the local LLMs need structure and prompts and handholding if your tasks are not simple repeats and dressed up Python scripts.

What is your guidance on how to make multiple agent harnesses blend into the background and make agents feel more like workers?

[-]

billy_booboo@reddit

How do you see a2a effecting the hermes ecosystem? From a similar vantage, I'd love to hear your persepective on trends in the braoder agentic architecture/design space?

[-]

phragg@reddit

We have a few PRs in the mix that should support this in the next(ish) release:
1. Kanban support: https://github.com/NousResearch/hermes-agent/issues/16102 2. Profile management: https://github.com/NousResearch/hermes-agent/pull/16419

[-]

nickludlam@reddit

It's been great seeing your continued open work in the LLM space. Do you think the most interesting work has shifted from fine tuning models like the Meta Llama series into agent harnesses, tool calling and prompt engineering? Or do you think the 'self improvement' loop could eventually mean modifying the model weights directly?

[-]

NousResearch@reddit

I think that models should be aligned to capabilities and mechanisms for expressing those capabilities, so having a harness to focus the model capabilities around makes far more sense

-teknium

[-]

Bridgestone68@reddit

What is your approach on people saying "Great concept, bad execution"? about Hermes?

[-]

No_Fix_4658@reddit

i want more context, what is bad execution about hermes? Genuine question, i am out of the loop

[-]

NousResearch@reddit

Haven't heard it so don't have an approach yet 😉

[-]

Lyuseefur@reddit

There's so much that I want to do to support / cooperate with Nous Research. I'm legit building something right now that would take advantage of local models, hosted (like Vast.ai) models and more. I'd love to know if there's any way to cooperate more with Nous Research to promote these things and to encourage more local AI adoption.

[-]

NousResearch@reddit

Join the Discord and build with us and help drive decisions around what direction Hermes Agent should move in 😄

https://discord.gg/NousResearch

[-]

Pleasant-Shallot-707@reddit

Are you considering tooling to compete with Claude design?

[-]

NousResearch@reddit

Not at this time

We did implement a claude-design skill recently though:

https://github.com/NousResearch/hermes-agent/pull/16358

[-]

PitifulRice6719@reddit

What was your initial motivation for starting to build Hermes? (why did it come to this world?)

[-]

emozilla@reddit (OP)

It was actually built as an internal tool to help us on our model research work. u/teknium-official wanted something to help automate some of the things the model team was doing. We open sourced it sort of unsure if it would have any use to anyone else. In retrospect I'm glad we did haha

[-]

sufficientzucchinitw@reddit

U guys hiring fe?

[-]

NousResearch@reddit

Careers - NOUS RESEARCH

[-]

jmakov@reddit

Would you say that the next steps are dedicated tooling like GitNexus for coding? In general, what would next steps be? I for once added locally a daily hermes update cron job and another that evaluates new models automatically (if it does a better job for a agent) Would be nice to have that oit of the box.

[-]

Felistoria@reddit

No questions, just a thank you. Been using Hermes for awhile now and the rate of improvement is absurd. Congrats on your accomplishment!

[-]

NousResearch@reddit

Thanks! I'll forward your gratitude. \^\^
-Kainan

[-]

ivanf@reddit

I want to thank the team for some amazing work. You are using Atropos to compress agent trajectories to train future Hermes models. As Hermes Agent handles more tasks autonomously, the training data for the next generation of models will be composed almost entirely of synthetic, agent-generated interactions. How do you prevent Model Collapse via Procedural Homogenization? Without a "Ground Truth Anchor" (like real human messiness or external formal verification in every loop), how do you ensure the model doesn't drift into an "optimal but inhuman" logic that eventually breaks when it encounters real-world entropy?

P.S as a laid off, current cab driver learning to program & creating/training SML/RAGs. Hermes has helped me so much, I cant explain in words. Again, thank you to the Team & all its contributors! Godspeed!

[-]

Odinthedoge@reddit

Hermes claw migrate

[-]

jeh101@reddit

Are there any women on the Nous team? Just curious

[-]

stackrainer@reddit

How far are you wanting to scale the distributed llm training program? Has that project taken a backseat to hermes agent? Is it conceivable that someone with a single 5090 could start donating resources to the project?

[-]

No_Fix_4658@reddit

Non-developer here — I drive Hermes alongside \~10 sibling agents (Operator, builders, code reviewers, postman) on a VPS, coordinated entirely through Discord and natural language. Never write code myself.

Question: v0.11 wires GPT-5.5 through Codex OAuth. For ChatGPT Plus subscribers, is that effectively free GPT-5.5 access for the agentic role, or is it still metered separately from the Plus subscription? Asking because I just got booted off Claude Max via the new "third-party apps draw from extra usage" policy and trying to figure out where my actual free lane is now.

[-]

pinnacle__@reddit

oh wow cool timing . i just joined the hermes agent subreddit and posted in the discord for the first time. really enjoying the agent, just trying to make the most of it without it being a "toy"