your daily driver stack, what's it look like? and why?
Posted by Pyrenaeda@reddit | LocalLLaMA | View on Reddit | 12 comments
What it says in the title, I'm interested in hearing what you all have landed on as a workable / useful stack for you.
Mine looks like this:
back end inference servers - llama.cpp, vLLM
|
V
hermes-agent - cron jobs + OpenAI compatible endpoints
|
V
home-grown web UI & iOS / Swift client
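For anyone wiring up something similar: the glue between a home-grown client and an OpenAI-compatible endpoint is small. A rough sketch of what that request looks like in Python - the base URL and model name here are placeholders, not anything specific from my setup:

```python
# Build a streaming request against an OpenAI-compatible /v1/chat/completions
# route. No API key is included since local servers often don't require one.
import json
import urllib.request

def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Construct (but don't send) a POST to the chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        # streaming matters for mobile/web UIs that render tokens as they arrive
        "stream": True,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

req = build_chat_request("http://localhost:8080", "local-model", "hello")
```

Pass the request to `urllib.request.urlopen` (or swap in `httpx`/`requests`) and read the SSE stream line by line.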
I landed on this for a couple reasons:
- I have test driven a bunch of the go-to front ends - Open WebUI, LobeHub, LibreChat, etc. Couldn't get behind them. Too many knobs and too many features. I don't mind lots of knobs but I don't want them in my chat UI. For that I'm looking for a slick and simple experience similar to the ChatGPT and Claude UIs (the chat side, not cowork). Plus I hate that they don't have good native mobile apps with streaming support. A slick, mobile-friendly experience is a must-have for me, and the solution of just dropping a shortcut to the mobile version of the web UI on my homescreen doesn't quite cut it.
- hermes-agent comes with a very nice and extensive packet of tools right out of the box, which really cuts down on the number of MCPs one needs. And cron jobs for agentic background work are great to have of course. I couldn't get behind using a messenger app as my primary "chat assistant" UI though for one main reason: it doesn't work for me to not be able to have multiple conversations running with an assistant at once and jump around between them.
So, that landed me where I am: a couple of hermes-agent instances, one for background agentic work (for which I use one of the messenger apps as a control interface) and one as an AI assistant that I interface with through my vibe coded POS-but-pretty web UI and iOS client, using the hermes OpenAI compatible API.
How bout you all? OWUI + llama? straight hermes-agent / OpenClaw / etc? llama.cpp web UI and done? something more exotic / esoteric? rationale? lemme hear it.
ai_guy_nerd@reddit
The shift towards agentic backends is definitely where the real utility is. Most of the "chat" interfaces are just thin wrappers, but the real magic happens when the AI can actually trigger cron jobs or handle state outside of a single session.
A solid setup usually involves a mix of vLLM for the heavy lifting and a dedicated orchestrator for the logic. Some people use n8n or LangGraph, but for a more integrated experience, something like OpenClaw can work well for automating outreach or content pipelines in the background.
The biggest hurdle is always the "memory" piece. Moving away from simple chat history to a proper long-term memory file or vector DB changes the experience from a toy to a tool.
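The simplest version of that "long-term memory file" idea is just a persisted list of facts with some retrieval on top. A toy sketch, assuming keyword overlap as the (very crude) relevance score and a JSON file name I made up - a real setup would swap in embeddings and a vector DB:

```python
# Persist facts across sessions in a JSON file and pull relevant ones
# back by keyword overlap. Purely illustrative; not a real memory system.
import json
from pathlib import Path

MEMORY_FILE = Path("memory.json")

def remember(fact: str) -> None:
    """Append one fact to the memory file, creating it if needed."""
    facts = json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else []
    facts.append(fact)
    MEMORY_FILE.write_text(json.dumps(facts))

def recall(query: str, top_k: int = 3) -> list[str]:
    """Return the top_k stored facts ranked by shared words with the query."""
    if not MEMORY_FILE.exists():
        return []
    facts = json.loads(MEMORY_FILE.read_text())
    words = set(query.lower().split())
    scored = sorted(
        facts,
        key=lambda f: len(words & set(f.lower().split())),
        reverse=True,
    )
    return scored[:top_k]
```

The point is the shape: writes happen outside any single chat session, and reads get injected back into the prompt, which is exactly the toy-to-tool jump.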
dlcsharp@reddit
100% agree with your takes on frontends. I went with a custom Python WebUI + Android wrapper app, taking inspiration from OpenWebUI and Claude Code in terms of features. I only implement what I think is relevant on my setup and my use cases, but I also like just messing around with it lol.
I've never used OpenClaw because I have no interest in using it; it's mostly just hype imo. If one day I find a real use case, I'll just implement a cron service and a few APIs in my system. I try to self-host as much as I can, so why would I want to talk with LLMs on Discord? It makes no sense to me, as it goes against the point imo. Also, you lose custom UI, formatting, etc.
Same goes for Hermes, though the "creates skills from experience" part could be interesting to replicate.
teleolurian@reddit
lm studio + gemma 26b + glm 4.7 + custom framework
FullOf_Bad_Ideas@reddit
Qwen 3.5 397B exl 3.5bpw in TabbyAPI at 262k ctx and OpenCode.
Sometimes OpenWebUI for simple chat. I don't use OpenClaw/Hermes agents yet, I'll wait till security is in the clear on them as there are too many Hermes/OpenClaw security failure stories.
Pyrenaeda@reddit (OP)
I’ve heard some OpenClaw security failure stories. Haven’t heard any yet involving hermes-agent. That’s not to dismiss the possibility, just to say in practice I haven’t heard of a case yet. Would be interested in reading about any cases you could point me to.
FullOf_Bad_Ideas@reddit
here you go - https://old.reddit.com/r/LocalLLaMA/comments/1sqqasu/hermes_just_mass_emailed_a_bunch_of_accounts_from/
Pyrenaeda@reddit (OP)
Ty
MrDwarf7@reddit
FYI: you can add a single bot to a new group on telegram and each is instanced - same as opening a new chat via TUI/tabs PLUS you can name the groups too!
Pyrenaeda@reddit (OP)
ya I messed around a little with doing telegram groups with the bot but it had a little more friction than what I like for my flow which amounts to "hit + button, start typing". Which is why I stopped going down that route - that and the fact that telegram even on desktop (Mac at least) keeps the conversation squished down to chat bubbles of fairly narrow width rather than being able to expand to use more of the screen real estate.
But if there's a trick to a two tap start of a new group with the bot ready for a conversation LMK, sure wouldn't turn it down!
Clean_Initial_9618@reddit
Adding the Hermes agent bot to a group starts a new session?
ttkciar@reddit
llama.cpp, with either bash scripts wrapping `llama-completion` or Perl / Python CLI scripts interfacing with `llama-server`'s API. I also have a Perl CGI script wrapping a Lucy index of Wikipedia, for RAG.
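For reference, a CLI script against `llama-server` doesn't need much - the server exposes a native `/completion` route that takes a JSON body. A minimal Python sketch, assuming the default port 8080 and made-up sampling values:

```python
# Tiny client for llama-server's native /completion endpoint.
# Host/port and n_predict are illustrative defaults.
import json
import urllib.request

def completion_payload(prompt: str, n_predict: int = 128) -> bytes:
    """Encode the JSON body llama-server's /completion endpoint expects."""
    return json.dumps({"prompt": prompt, "n_predict": n_predict}).encode()

def complete(prompt: str, host: str = "http://127.0.0.1:8080") -> str:
    """POST the prompt and return the generated text from the response."""
    req = urllib.request.Request(
        f"{host}/completion",
        data=completion_payload(prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["content"]
```

Wrap `complete()` in `argparse` and you have roughly the kind of CLI glue described above.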
I have a handful of works-in-progress, including my Wikipedia-backed RAG inference, an Evol-Instruct implementation, a few different synthetic dataset generation projects, and a RAG-backed technical support IRC chatbot.
And most recently, a script which scans this subreddit and tries to detect rule-breaking posts.
Pyrenaeda@reddit (OP)
ooo. I like the sound of that last one. Particularly if it is oriented towards bot detection, of which we seem to have far too many these days. It gets old commenting and asking them for recipes for warm apple pie and such.