What frontend do you guys use?
Posted by Borkato@reddit | LocalLLaMA | View on Reddit | 77 comments
I’m using vim lmao with a custom made plugin for completing text, so I was curious what yall use. Llama-server seems like a sensible default but it seems limited
Vusiwe@reddit
Oobabooga textgen webui
Napster3301@reddit
the "asks for frontend, gets coding agents" replies are the real answer. local llm community has collapsed to coding agents because thats the only space where local measurably wins vs frontier apis.
genuine question: anyone here actually daily-driving a local model for non-coding chat work? not "tried it once" but replaced chatgpt for general use?
Kapper_Bear@reddit
That's basically all I do with LLMs and I use LM Studio for that. It's simple enough for my simple uses. I have enough complexity running ComfyUI for image stuff. :)
Borkato@reddit (OP)
meth_priest@reddit
figured id pitch in with a project ive been working on for past 4 months.
been building a complex local RAG tool focused on code and document understanding. Basically as user you point it at any folder containing source code, docs, mixed repos — and it indexes and LLM retrieves across it. you pick whatever model you have running locally and it adapts around it (auto-detect based on model+hardware+kb).
hybrid search with a reranker on top, indexing rates each file based on your hardware so it doesnt just choke on larger codebases. supports 20+ programming languages and most common document formats so far. mostly built it because existing tools made too many assumptions about what kind of content you were throwing at them or what hardware you were running. this one tries not to.
still finishing it up — will post on this sub once the website and beta launch
Ill_Barber8709@reddit
I use Claude Code with Paul Hudson's agent and skills with Qwen3.6 27b because it works very well for me. I get the "MCP are useless, model knows how to do that stuff already" idea behind Pi, but I prefer good ol' deterministic code. And I don't think bothering on a few thousands of agentic tokens is worth the hassle if it means you have to give all of your trust to the model's capabilities.
Borkato@reddit (OP)
Link to the agent? 👀 I tried googling but didn’t see much
Ill_Barber8709@reddit
https://github.com/twostraws/Swift-Agent-Skills
https://github.com/twostraws/SwiftUI-Agent-Skill
TechnoByte_@reddit
Llama-server is more than enough tbh.
Fast, lightweight, built-in to llama.cpp.
I used Open WebUI before but I find it too bloated
a_beautiful_rhind@reddit
People spam all coding agents.
javasux@reddit
And what do you think agents are if not frontends?
relmny@reddit
that's, I guess, because many ppl here relate anything about LLMs to code.
LORD_CMDR_INTERNET@reddit
And so your contribution is?
Those tools are multimodal and can do any task, coding or otherwise
If you're thinking more like Claude Cowork then there's Eigent but what else?
ab2377@reddit
look at those down votes man! proves for most people in this sub most words are just jagons with no understanding hah! i will upvote this.
LORD_CMDR_INTERNET@reddit
lol who knows with reddit sometimes, I didn't even say anything controversial. guess i'd rather be right and downvoted than wrong and upvoted
a_beautiful_rhind@reddit
To me things like openwebui, sillytavern, mikupad are front ends. Agents are agents.
Awkward-Customer@reddit
OP is talking about using vim as their front end. Coding harnesses like Claude code definitely count in that context
a_beautiful_rhind@reddit
Sounds most like mikupad, tbh. Or they are doing FiM?
Borkato@reddit (OP)
Mikupad is actually the closest ngl, but when I wrote this post I was asking about actual frontends like sillytavern lol
LORD_CMDR_INTERNET@reddit
"front end" != "gui"
Stunning_Inside5182@reddit
I use Pi for my coding and then use BoltAI on Mac since that is my computer I use most of the time and the app is really polished and stable. I ran openwebui for around 6 months but personally I had issues with it being quite unstable and chats dissapearing and things. Still need to find another front end UI that can connect and be used with my iPhone.
daniel_nguyenx@reddit
Thanks for using BoltAI. FYI you can use the BoltAI mobile app for free while it's in beta, I'm going to release a really big update soon (same feature parity with Mac app including remote MCP servers). Try it here https://boltai.com/docs/boltai-mobile/getting-started
Thanks
Stunning_Inside5182@reddit
Oh thanks I didn't even realize this I will give it a go and will get back to you. For the MacOS app are there any plans to add support for searxng for local web search so I can keep everything local? Thanks alot for your work
daniel_nguyenx@reddit
Ah good idea. I think I had this feature request before but I forgot about it. I will try to add official support for it. But in the meantime, you can try to find a good MCP server for it.
Stunning_Inside5182@reddit
Thanks will do. Is there any chance you could look into adding an option to be able to choose an agent as the default for a particular project?
LORD_CMDR_INTERNET@reddit
pi
Objective-Error1223@reddit
Pi all the way, if you don't want the bare bones version just grab Little-Coder (it's designed for small models but I use it with 27B Q8 and it honestly works great).
ab2377@reddit
there never was any problems small models (or big) with pi or opencode, dont know what made up problem little coder solved.
Objective-Error1223@reddit
Dunno, I didn’t make the harness it just works. 🤷♂️
ab2377@reddit
and you never used pi or opencode that didnt work and you had to use little-coder?
Objective-Error1223@reddit
That’s like asking “and you’ve never used Gemma or Mistral that didn’t work and you had to use Qwen?”.
We’re in the age of evolving harnesses, models, IDEs, etc. Hell we even have different ways of running the various thousands of different models from llama.cpp, omlx, vmlx, vllm, lm studio to kobold.
What works for one person may not work for another due to their software/hardware limitations, OS uses, purpose of use, scope of the work, and finally their personal preferences.
film_man_84@reddit
LM Studio, sometimes Open Web UI, sometimes llamacpp-server web UI, sometimes Koboldcpp web ui and sometimes even Silly Tavern.
Most used ones are LM Studio and Open Web UI.
WishfulAgenda@reddit
VS Code with Continue Dev, OpenCode, LibreChat/LangFuse.
PeteInBrissie@reddit
I'm really liking LibreChat
WishfulAgenda@reddit
Danny’s done a great job with it and when paired with clickhouse it’s even better :-)
chris_0611@reddit
Frontend?!? What's that?
Raw OpenAI api calls straight from the commandline using curl
Borkato@reddit (OP)
Do you have an alias set up or anything or do you type out “{messages: [role: user, content…” every time?
ProfessionalSpend589@reddit
There are a few examples in the llama.cpp docs which you can copy from and just edit the prompt.
Alternative_Web7202@reddit
He just forgot to put /s in the end
ea_man@reddit
Come on man don't be lazy, telnet there!
ab2377@reddit
everyone's life should be this simple!
Ill_Barber8709@reddit
You do not like bloated software like Pi I see. A man of commitment.
Southern_Sun_2106@reddit
Telegram, so that I can talk with my agent from anywhere. Plus, it can generate music for me, send my any file I want (and it has access to), do long-horizon agent work (I don't need to keep telegram alive, it will just notify me). Of course, you have to be careful what you give it access to, just like with any other setup. In use Qwen3.6 35B A3B q4km gguf at the moment running locally on a dedicated machine.
the-username-is-here@reddit
Claude Code.
Tried pi and opencode - they are nice and light, but are really bare bones and need a lot of work to start using properly.
spammmmmmmmy@reddit
I made a html chat page to manage ollama api requests. Cut/paste from code blocks, copy files etc.
Tormeister@reddit
OpenWebUI for big LLM queries
OpenCode for local LLM coding
temperature_5@reddit
I'm dog fooding my own chat/agent UI, along with llama-server. So simple to make a chat client, so impossible to stop adding features... :-D
Barafu@reddit
I use Cherry Studio. When I needed an application, it was the only decent app available on FlatHub. Since that, I did not have a reason to change. Chatbox was nice too.
But I also use a lot the webpage chat of DeepSeek. Its ability to do internet search is majestic. It has almost replaced DuckDuckGo in my daily use. I wish I could set up an LLM search that good with a local LLM.
Risen_from_ash@reddit
My agent’s web search game got better by orders of magnitude when I started using Firecrawl instead of my locally hosted SearXNG. Qwen 3.6 35b a3b with Firecrawl feels like GPT 5.5. Blows my mind.
meth_priest@reddit
figured id make an account to pitch in
been building a local RAG tool focused on code and document understanding. point it at any folder like source code, docs, mixed repos — and it indexes and retrieves across it. you pick whatever model you have locally and it adapts around it. hybrid search with a reranker on top, indexing rates each file based on your hardware so it doesnt just choke on larger codebases. supports 20+ programming languages and most common document formats so far.
mostly built it because existing tools made too many assumptions about what kind of content you were throwing at them or what hardware you were running. this one tries not to. also works 100% offline - so usable for sensitive files
still finishing it up — will keep hear from me once the website and beta launch
ansmo@reddit
Also made my own from scratch. It's obviously not as polished as the big boys but it has features tailored to me and my usecases (reading, writing, tts, research, granular experimentation and pattern tracking, daily scouting reports, artifacts, image gen and editing, etc) and I know exactly how all of it works. I use it for basically everything except for coding. For coding agents, I'm on opencode and a fork of openclaude.
skryking@reddit
My own custom tui that I spec'd out for qwen to code.
bgravato@reddit
Long time vim user here...
I have now been experimenting with VSCodium + continue add-on for AI integration (+ vim add-on, to get vim shortcuts on VSC). I'm still getting used to it, but the vim add-on to replicate vim shortcuts/modes really helps in the transition.
suprjami@reddit
There is https://github.com/madox2/vim-ai if you prefer the VSCode style of chat/rewrite/fill-in.
OpenCode obeys your
EDITORenvironment variable if you want to drop in and out of agent mode.o0genesis0o@reddit
I have two "frontend"
One is a custom made productivity system + workflow + agent web app that I built for myself and my wife to use. It's accessible on all of my devices via VPN and good enough for tasks that we usually use chatgpt or gemini or whatever web frontend.
The other "frontend" is Pi agent + Obsidian or neovim. This one is for local coding or knowledge base management. I wish to migrate more responsibility to the web app frontend, but it's not that convenient vs opening a terminal.
TheDapperYank@reddit
I used Claude to build my own front end with tool calling, memory, compaction, etc and use llama.cpp for the backend. Nothing against anything else already on the market, but I kind of viewed this as more of an academic exercise and wanted to spec out my own with my own feature set.
PigSlam@reddit
I’m doing this with Codex right now.
Fit_Squash6874@reddit
Lmstudio
elijahebanks@reddit
Your mum's
florinandrei@reddit
Pleasant-Shallot-707@reddit
Pi and Hermes
drunnells@reddit
Open WebUI for conversation, Hermes Agent CLI or dashboard for getting work done, trying to get into Open Code. For non-local I like the direction the Codex desktop app is going.
ayylmaonade@reddit
I've been using Open-WebUI in docker since like, I wanna say jan-feb 2025? That's about the time I got into local LLMs (and AI in general). It's definitely not perfect and has its issues, but I find it to be the best all arounder. It's like a customisable ChatGPT.com, sort of.
Just the ability to add my own custom tools and have a debian sandbox, memory, crons, great RAG, and the ability to add any openAI style endpoint (openrouter, etc) makes it better than everything else out there for my use cases.
I do use Hermes Agent alongside it though. I have open-webui setup in firefox as the sidebar AI chatbot, so it's just convenient.
fallingdowndizzyvr@reddit
Llama-cli.
Miriel_z@reddit
I use my own gui made in Python. Not sure if it is the best way, but it gives me the control.
Some-Cauliflower4902@reddit
Same here. Build the tools as I need it. It’s more lean and efficient.
Borkato@reddit (OP)
I did the same!
Miriel_z@reddit
Well then, our idea is not so stupid it looks like🤣
philmarcracken@reddit
Late cli has done more work with less headaches for me. little coder is faster and likes it rough.
I'm considering my own using Pi and copying aspects of little coder(model swapping) and late(subagent splitting, plan first). Both have annoyances
Smallcoder just shits the bed for me
Hot-Employ-3399@reddit
From agents I just switch between anything now. After a month it feels all of them can be very bad.
Front-end chat, I use rarely and then it's usually cloud models. Usually several at the same time(Glm, Gemini, chatgpt).
When it's not cloud, it's alt tab to current agent and question is asked to it.
kevin_1994@reddit
Open webui still good. I have no reason to change it.
Alan_Silva_TI@reddit
I mostly use CLI harnesses like PI or Hermes agent.
I also have some self-made tools that connect to the server with their own purposes like brainstorming or text correction.
rmhubbert@reddit
For coding, I use Neovim with Opencode as the backend via https://github.com/sudo-tee/opencode.nvim for agentic & interactive coding, and https://github.com/cursortab/cursortab.nvim for code completion.
For general chat, Open WebUI.
LeMochileiro@reddit
Since the models run on a different machine with a dedicated GPU (a node in a K8s/K3s cluster), I need more flexibility to run different models with different configurations and parameters, without having to access the machine via SSH or create containers/pods all the time.
LocalAI is what has been serving me quite well lately. It deploys different backends (like llama.cpp or Vulkan), download different Hugging Face LLMs, change parameters, check usage, and easily integrate with tools like OpenCode. Everything is done directly through the frontend.
I'm open to exploring alternatives, but it has to be in Docker/a container for me to be able to run it on Kubernetes.
666666thats6sixes@reddit
llama-server --fim-qwen-7b-spectalking to llama.vim in neovim or llama.vscode in vscode is golden for IDE-ish work. Agentic work, opencode/pi/something DIY running inside bubblewrap to do no harm, again talking to llama.cppEmbarrassed-Area4652@reddit
obsidian-copilot
Thick-Protection-458@reddit
opencode in terminal. A bit of hermes for non-coding stuff, but that's a rare situation