I've seen a lot of folks ask "can local LLMs actually do anything useful?"
Posted by NoWorking8412@reddit | LocalLLaMA | 71 comments
And I'm here to share my experience. The answer is resoundingly 'yes'.
Let me start with the local models I use every day in my AI harness: embedding models. I'm using an embedding model to give my AI's persistent memory system semantic search, which makes its memory recall feel seamless to the human user.
Now my more recent use case:
Lately, I have been trying new applications for Qwen3.6-35B-A3B. I have been experimenting with a flow that runs on a regular weekly interval:
1. Qwen evaluates a database against criteria I give it, then sends me an email listing the items that meet those criteria.
2. I respond via email with my choice of which items to move forward with.
3. Qwen runs my choice against our list of sources and our knowledge base to create a document, pushes it to a Google Doc, and emails me said Doc.
4. I edit the Google Doc and leave comments for Qwen to incorporate as feedback.
5. When we are done iterating, I email Qwen and tell it to convert the doc to our PDF template. It converts the work into a nicely formatted PDF and emails it back to me so I can prepare it to send to the end user.
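For the curious, step 1 might be wired up roughly like this. This is a sketch only: the items schema, the OpenAI-compatible endpoint (e.g. llama-server or Ollama), the model name, and the addresses are all placeholders, not OP's actual setup.

```python
# Sketch of the weekly "evaluate and email" step; everything concrete
# here (schema, endpoint, model name, addresses) is a placeholder.
import json
import smtplib
import sqlite3
from email.message import EmailMessage

import requests

CRITERIA = "Flag items with status 'open' that are older than 30 days."  # hypothetical

def evaluate_items() -> str:
    # Pull candidate rows from a hypothetical items table.
    con = sqlite3.connect("items.db")
    rows = con.execute("SELECT id, title, status, created FROM items").fetchall()
    con.close()

    # Ask the local model which rows meet the criteria.
    resp = requests.post(
        "http://localhost:8080/v1/chat/completions",
        json={
            "model": "qwen3.6-35b-a3b",  # placeholder model name
            "messages": [
                {"role": "system", "content": f"Criteria: {CRITERIA}"},
                {"role": "user", "content": json.dumps(rows)},
            ],
        },
        timeout=600,
    )
    return resp.json()["choices"][0]["message"]["content"]

def email_report(body: str) -> None:
    msg = EmailMessage()
    msg["Subject"] = "Weekly items matching your criteria"
    msg["From"] = "agent@example.com"
    msg["To"] = "me@example.com"
    msg.set_content(body)
    with smtplib.SMTP("localhost") as s:  # assumes a local MTA
        s.send_message(msg)

if __name__ == "__main__":  # run weekly, e.g. via cron: 0 9 * * MON
    email_report(evaluate_items())
```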
I'm starting simple and moving to more complex tasks, but so far Qwen3.6-35B-A3B is just knocking down every task I put in front of it. I'll report back as things develop, but seriously, the verdict is yes: you can do many useful things with local LLMs.
What are you doing with your local LLMs?
ttkciar@reddit
I've been doing a few things:
GLM-4.5-Air: Codegen, physics assistant (mostly critiquing my neutron transport notes and suggesting relevant subjects for further study), and medical assistant (mostly explaining medical journal publications to me).
Gemma-4-31B-it: Wikipedia-backed RAG for general Q&A, creative writing, business writing, language translation, Evol-Instruct pipelines, sometimes debugger for GLM-4.5-Air's code.
Big-Tiger-Gemma-27B-v3: Critiques my Reddit activity and provides constructive criticism, persuasion research, violent creative writing (Murderbot Diary fan-fic; non-erotic but very violent). I'm looking forward to TheDrummer giving Gemma-4-31B-it the Big Tiger treatment so it can take over these tasks.
K2-V2-Instruct: Long-context tasks like system log analysis and IRC log analysis, also what my "actlikettk" (self-clone) script uses, though Gemma4 might be taking over that role, not sure yet.
Qwen3.5-9B: Synthetic dataset upcycling and augmentation.
All models are quantized to Q4_K_M.
GLM-4.5-Air and K2-V2-Instruct are too big to fit in 32GB VRAM, so I use them via pure-CPU inference, which is slow but I adapt my workflow around that, so I'm either working on other things or sleeping while they infer.
The rest of these models fit in VRAM. Usually Gemma-4-31B-it stays resident in my MI60, Big-Tiger-Gemma-27B-v3 stays resident in my MI50, and Qwen3.5-9B stays resident in my V340.
Tccybo@reddit
I am extremely curious about your wiki RAG setup; please give a few pointers if you find time.
Silver-Champion-4846@reddit
You sound like you're in Earthly Heaven. If only I had a GPU or even a good CPU, I could just install Pi and build it around learning and cooperating on many things.
NoWorking8412@reddit (OP)
I love that! Tell me more about your self-clone? Is that your doppelganger?
ttkciar@reddit
Yeah, it's a doppelganger-type script. I hacked it together in about an hour using bash and .md, mostly from my highest-scoring Reddit comments which I felt were most articulate and best expressed my world outlook.
There's not a lot to it. The ttk.md file is 161KB of writing samples, and the "actlikettk" bash script frames a user-provided prompt and the contents of ttk.md into a synthetic prompt and passes it to llama-completion for K2-V2-Instruct to infer upon. It's very, very slow (K2-V2-Instruct is a 72B dense model, and it's inferring pure-CPU), but it does a good job of responding with my own voice and attitudes. When one of its responses is unlike me, I write something on the subject and add it to ttk.md, so it's incrementally getting better at emulating me.
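The original is a few lines of bash; here is the same framing idea sketched in Python, with the prompt wording guessed and llama.cpp's llama-cli standing in for the llama-completion invocation:

```python
# Sketch of the "actlikettk" idea. The prompt framing is guessed (the real
# script's wording isn't public), and llama.cpp's llama-cli flags (-m, -f)
# stand in for llama-completion.
import pathlib
import subprocess
import sys
import tempfile

SAMPLES = pathlib.Path("ttk.md").read_text()  # the 161KB of writing samples
question = " ".join(sys.argv[1:])

prompt = (
    "Below are writing samples from ttkciar. Answer the question at the end "
    "in the same voice and with the same attitudes.\n\n"
    f"{SAMPLES}\n\nQuestion: {question}\nAnswer:"
)

with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write(prompt)

# Pure-CPU inference on a 72B dense model: expect this to take a while.
subprocess.run(["llama-cli", "-m", "K2-V2-Instruct-Q4_K_M.gguf", "-f", f.name])
```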
It's purely a toy. I've not used it for anything practical, and don't know if I ever will, but it's fun to poke around with from time to time.
Zynbab@reddit
So it writes fear-mongering headlines?
ttkciar@reddit
wat
NoWorking8412@reddit (OP)
That sounds super amusing. I think the art of creating doppelgangers has fortunately/unfortunately already become profitable. I think I have many doppelgangers out there already!
Last_Mastod0n@reddit
Ok, you've piqued my interest in GLM, but I have a 4090, so only 24GB of VRAM. My CPU is a Ryzen 9800X3D, so I wonder how well it would run with many layers on the CPU (probably not well) lol
ttkciar@reddit
Yeah, pure-CPU inference is slow, and I doubt your 4090 could load enough layers to speed it up much.
Still, sometimes high quality results are worth the wait, and you can leave it inferring while you're sleeping or working on other things or out running errands or whatever.
There's no harm in giving it a try!
DaMoot@reddit
Absolutely. I'm running Hermes Agent with local Qwen3.6 27B. If AI is getting stuff done for me right now, it's running local, because I interact with sensitive company and client data.
Not to say I don't still chat with Claude and GPT. I still vibe with Opus on the 20 bucks plan. But Qwen has done plenty of solid coding on its own!
It's slowly replacing our SaaS SIEM tools for daily alerting, digests, and triage diagnosis at work. The agent interacts with tooling that pulls info from an ELK stack. It's been a fantastic addition for getting eyes on server issues the other SIEM wasn't alerting on. Yes, it could likely all be scripted, but the LLM adds rich context and just helps demystify stupid Windows event log spam. The agent does a really good job of selecting the right tool (often multiple of them) to get the right info asked of it. A lot of these log processes, even heavily truncated, are 30-40k token payloads, and the agent just gobbles them up.
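A tool in that vein might look roughly like this; the index, host field, and message field names are guesses, not DaMoot's actual config:

```python
# Hypothetical agent tool for pulling Windows event noise out of an ELK
# stack; index name, host, and field names are assumptions.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

def recent_error_events(host: str, minutes: int = 60) -> list[str]:
    """Return truncated error-level events for one host, newest first."""
    resp = es.search(
        index="winlogbeat-*",
        size=50,
        sort=[{"@timestamp": "desc"}],
        query={
            "bool": {
                "filter": [
                    {"term": {"host.name": host}},
                    {"term": {"log.level": "error"}},
                    {"range": {"@timestamp": {"gte": f"now-{minutes}m"}}},
                ]
            }
        },
    )
    # Truncate each message so a batch stays within the agent's context.
    return [hit["_source"]["message"][:500] for hit in resp["hits"]["hits"]]
```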
It helps me immeasurably with email and follow-up; I have a great oversight job that runs to catch potentially missed emails. Even counting duplicate stuff I tell the agent to disregard, the multiple-times-a-day digest has already raised red flags (in a good way). I don't want a company ingesting my work email and any sensitive info I may get from clients. My agent interacts with MS Graph and keeps it all local. I can tailor it to do anything I want. It does not draft or send emails for me, though.
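A minimal sketch of the Graph side of such a digest, assuming an OAuth token has already been acquired (e.g. via the msal library) and with purely illustrative filters:

```python
# Sketch of a local email digest against MS Graph. The token acquisition
# is out of scope here, and the filters are illustrative only.
import requests

GRAPH = "https://graph.microsoft.com/v1.0"

def unread_messages(token: str) -> list[dict]:
    resp = requests.get(
        f"{GRAPH}/me/messages",
        headers={"Authorization": f"Bearer {token}"},
        params={
            "$filter": "isRead eq false",
            "$select": "subject,from,bodyPreview,receivedDateTime",
            "$top": "25",
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["value"]

def build_digest(messages: list[dict]) -> str:
    # Hand this text to the local model and ask it to flag anything that
    # looks like a missed follow-up; nothing leaves the machine but the
    # Graph call itself.
    lines = [
        f"- {m['receivedDateTime']} | {m['from']['emailAddress']['address']} | {m['subject']}"
        for m in messages
    ]
    return "\n".join(lines)
```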
I'm also recovering my wrist from moderate carpal tunnel, so I have tools for my agent that can open, close, put time into, and otherwise interact with tickets for me, all through a single typed prompt and confirmation. I can type better than I can mouse these days; no mousing and clicking required. I can do it from my cell phone with speech-to-text from anywhere, since I use Discord.
NoWorking8412@reddit (OP)
That's incredible. I have been meaning to experiment with the Hermes Agent. I also interact with sensitive client data and that is a huge driver for my explorations.
My persistent memory project is Crow: https://github.com/kh0pper/crow
It's an MCP gateway I'm developing. It has an embedding endpoint to extend the FTS5 SQLite persistent memory with semantic search.
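In principle, the FTS5-plus-embeddings pattern looks something like the sketch below. To be clear, this is a guess at the general shape, not Crow's actual code (see the repo for that); the model choice and schema are assumptions.

```python
# Rough sketch: FTS5 keyword recall, re-ranked by embedding similarity.
import sqlite3

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")
con = sqlite3.connect("memory.db")
con.execute("CREATE VIRTUAL TABLE IF NOT EXISTS mem USING fts5(text)")
con.execute("CREATE TABLE IF NOT EXISTS vec (rowid INTEGER PRIMARY KEY, emb BLOB)")

def remember(text: str) -> None:
    """Store a memory in both the keyword index and the vector table."""
    cur = con.execute("INSERT INTO mem(text) VALUES (?)", (text,))
    emb = model.encode(text, normalize_embeddings=True)
    con.execute("INSERT INTO vec VALUES (?, ?)", (cur.lastrowid, emb.tobytes()))
    con.commit()

def recall(query: str, k: int = 5) -> list[str]:
    """Keyword candidates from FTS5, re-ranked by cosine similarity."""
    rows = con.execute(
        "SELECT rowid, text FROM mem WHERE mem MATCH ? LIMIT 50", (query,)
    ).fetchall()
    q = model.encode(query, normalize_embeddings=True)

    def score(rowid: int) -> float:
        (blob,) = con.execute("SELECT emb FROM vec WHERE rowid = ?", (rowid,)).fetchone()
        return float(np.frombuffer(blob, dtype=np.float32) @ q)

    rows.sort(key=lambda r: score(r[0]), reverse=True)
    return [text for _, text in rows[:k]]
```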
Silver-Champion-4846@reddit
Is it for all kinds of memory?
codehamr@reddit
Same here, local is doing real work now. One thing, though: if it's mostly coding, I would put the 27B dense ahead of the 35B A3B. It feels way more consistent in long agent loops, and the benchmarks back it up. The A3B is fun when you want throughput on easy turns, but for real repo work the dense one is my daily. I've been building my own coding agent for a while, and the biggest lesson so far is that careful context management beats stuffing the window every single time. A lean context with the right snippets outperforms a fat one with everything thrown in.
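As a toy illustration of that lesson (the scoring heuristic and token budget below are deliberately crude stand-ins, not codehamr's agent):

```python
# Toy version of "lean context beats fat context": score candidate
# snippets against the task and pack only the best under a budget.
def select_snippets(task: str, snippets: list[str], budget_tokens: int = 8000) -> list[str]:
    task_words = set(task.lower().split())

    def score(snippet: str) -> int:
        # Crude relevance signal: word overlap with the task description.
        return len(task_words & set(snippet.lower().split()))

    picked, used = [], 0
    for snip in sorted(snippets, key=score, reverse=True):
        cost = len(snip) // 4  # rough chars-to-tokens estimate
        if used + cost <= budget_tokens:
            picked.append(snip)
            used += cost
    return picked
```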
Endurance_Beast@reddit
Great use cases. I have been actively doing this lately.
I also use it for my homelab maintenance, which is a great use case that has saved me tons of time.
suicidaleggroll@reddit
The people who ask that aren’t regulars here, and clearly don’t search before posting, so they’ll never see this thread.
ceo_of_banana@reddit
I made a post like that after being on this sub for a while. There's a difference between "it does something useful" and "it justifies the price", which was the question I was trying to answer with that post. Embeddings cost close to nothing, and handling emails could be done with Codex on a Plus subscription, which most people have anyway. If you use their cheaper models, I don't see you hitting rate limits easily by delegating simple tasks like that, unless it's on a large scale like "handle these 300 docs". Agentic coding is what you hit rate limits with easily, and typically you'll want a frontier model for that.
I'm sure there are people with use cases where it's necessary, but I'm not sure there are that many. Of course, if you already have the hardware, you don't need to have that discussion with yourself.
Ell2509@reddit
Yes, this is what people are really asking. You just bridged a double gap!
NoWorking8412@reddit (OP)
Lol
SimilarWarthog8393@reddit
I'm a teacher and I use Qwen3.6 35B A3B to lesson plan, generate worksheets and exams, brainstorm, etc. I use ComfyUI to generate custom images for my worksheets or PPTs to better engage students. I also don't use cloud models for web searching anymore, Cherry Studio + Brave MCP + a good system prompt is more than sufficient for many simple research tasks.
GrungeWerX@reddit
What a delightful use case!
NoWorking8412@reddit (OP)
Nice! I work in public education. Not currently a teacher, but taught for many years. I would love to chat with you more about your use of LLMs for education.
SimilarWarthog8393@reddit
Slide into my DM (;
Last_Mastod0n@reddit
It can do so many things. People just expect it to be able to code on the level of Claude Opus or GPT 5.5, which is just unrealistic.
NoWorking8412@reddit (OP)
Sure, it's unrealistic, but even a local embedding model with the persistent memory improves my experience with Opus 10x. But with the Qwen3.6 models, it really is like witchcraft. The reasoning is just too good. It may be a little slow, but it hits it on the head every time. It's not Opus, but it hits like Opus for some reason.
Last_Mastod0n@reddit
I turn reasoning off because it takes too long. My token generation speed is great, but it usually spends around 2k tokens per reasoning response, which I don't have the patience for.
But even without reasoning I can't complain. The quality of my model matches and sometimes beats GPT 5.4 mini at coding and vision. But it still is nowhere close to GPT 5.5 or Opus 4.6.
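For reference, the Qwen3 family exposes a chat-template switch for exactly this; assuming the 3.6 line keeps the same mechanism (the model id below is a current Qwen3 stand-in), turning the thinking block off looks like:

```python
# Hedged sketch: Qwen3's chat template accepts an enable_thinking flag;
# whether the 3.6 line keeps this exact switch is an assumption here.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B")  # stand-in model id
prompt = tok.apply_chat_template(
    [{"role": "user", "content": "Summarize this stack trace."}],
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,  # skip the <think> block and its ~2k tokens
)
print(prompt)
```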
GrungeWerX@reddit
I haven't even turned on thinking for my Qwen 3.6 27B yet.
NoWorking8412@reddit (OP)
I'm using Qwen3.6-35B-A3B UD-Q6_K. All I can say is it is nailing it pretty much every time I need it to do something. Feels like Sonnet or Haiku.
EndlessB@reddit
What do you use for embedding?
NoWorking8412@reddit (OP)
I was using nomic, but I switched over to Qwen3-Embedding-0.6B
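If anyone wants to try the same swap, loading it through sentence-transformers is about all it takes (assuming that loader works for you; the sample texts below are made up):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")
docs = ["reset the staging database", "rotate the API keys"]
# Normalized embeddings make cosine similarity a plain dot product.
emb = model.encode(["how do I wipe staging?"] + docs, normalize_embeddings=True)
print(emb[0] @ emb[1], emb[0] @ emb[2])  # query vs. each memory snippet
```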
Miserable-Dare5090@reddit
Have you tried the microslop finetune, Harrier?
NoWorking8412@reddit (OP)
No, what can you tell me about it?
No-Mountain3817@reddit
https://huggingface.co/microsoft/harrier-oss-v1-0.6b
-p-e-w-@reddit
Today’s mid-sized local models absolutely crush Opus 4.0 and GPT 5.0 though, which were the frontier not too long ago.
dark-light92@reddit
At this point the question should be what frontier models can do that local models can't, because sub-40B local models can probably take over 80% of tasks.
NoWorking8412@reddit (OP)
That's where I am as well. Use the frontier model to set up the task (10-20% of the work). Use the local model to knock it down. That seems to be the way forward. I'm also connecting both frontier and local models to use the same persistent memory using Crow, which helps too.
GrungeWerX@reddit
I typically use frontier models as a second opinion to see if Qwen misses anything in its strategies. Most of the time they just approve its plan. Occasionally, Qwen will find a rare solution that both of them (Gemini, Claude) missed, which is ironic considering they all have internet access.
That emboldened me to start using Qwen more deeply for projects, and I've been very happy since.
No-Mountain3817@reddit
this might be better than Crow given your use case https://github.com/ogham-mcp/ogham-mcp
GreenHell@reddit
I'm not going to do a full writeup, but Qwen 3.6 35B found, and fixed, some startup issues in my Debian startup log that Gemini Flash missed, so there's that I suppose.
Southern_Sun_2106@reddit
Qwen3.6-35B-A3B flagged you as a self-promoter based on your prior posts about your project called Crow. Just FYI.
cbpn8@reddit
Any useful use cases for small businesses, especially something that justifies 4000 in upfront investment?
marscarsrars@reddit
Why Qwen3.6 35B and not the 27B?
EducationalGood495@reddit
Hi, I am new to LLMs and planning to buy either a 2080 Ti 11GB or a 3060 12GB to run Qwen 35B with offloading to the CPU. Both are second-hand and good value, but the 2080 Ti draws 70 watts more and has 1GB less VRAM, with roughly 2x the bandwidth. What do you think?
Formal-Exam-8767@reddit
Define useful.
Considering that people mostly use chatbots the wrong way (as an authoritative source of truth), I am not surprised they don't find local LLMs useful.
Sofakingwetoddead@reddit
Local models can do lots of things but it depends how much you pay them.
danishkirel@reddit
What you do is very agentic. What's orchestrating? From another answer I can see it's not Hermes. What is it?
NoWorking8412@reddit (OP)
https://github.com/kh0pper/crow
Substantial__Unit@reddit
Every weekend I tell myself to get more into the local setup I started toying around with. I want to build an Alexa clone as well and keep the entire thing in-house.
NoWorking8412@reddit (OP)
I've been making my own "local Alexa" if you will using Crow: https://github.com/kh0pper/crow
Substantial__Unit@reddit
Definitely checking this out. Sounds like just what I was aiming for, thanks.
NoWorking8412@reddit (OP)
Awesome! Let me know if you have any questions or requests.
GigiCodeLiftRepeat@reddit
Really cool. Thank you for sharing!
NoWorking8412@reddit (OP)
Thanks!
SuperWallabies@reddit
The question of "useful" usually comes down to the competition with SaaS. It's a trade-off between infrastructure investment costs and SaaS subscription fees.
Most things people try to do are already implemented as SaaS, and often at a very reasonable price. In those cases, there's no real need to invest in local hardware.
However, the specific use cases mentioned by the author are definitely practical and make a strong case for going local.
Impressive.
NoWorking8412@reddit (OP)
Fair points. I think I'm seeing some potential SaaS-pocalypse forces from the local LLM here, though. It's definitely something to keep an eye on.
Enough_Big4191@reddit
Honestly, it's impressive that you're making local LLMs work, but the whole thing sounds like a lot of manual back-and-forth. I get that Qwen is doing some heavy lifting, but isn't it just fancy automation that's still mostly static? I mean, email, Google Docs, PDFs: aren't we still stuck in the same old routine, just with a more complex tool? I feel like local LLMs can do cool things, but the real value comes when they stop just doing tasks and start thinking ahead.
NoWorking8412@reddit (OP)
So like your LLM is running OLS regression to predict outcomes or what? You could do that with local LLMs for sure if you have the data set.
jacek2023@reddit
Gemma 4 31B and Qwen 3.6 27B have been coding my project for many days now. I also use Claude Code and Codex for other projects so I can compare the workflows, and local models just work: slower, but without any limits.
NoWorking8412@reddit (OP)
What does the gap look like between local and frontier for what you are doing?
jacek2023@reddit
I use Claude Code for a C++ project (UI, embedded, etc.) and local models for a Python project (machine learning); in both cases I can achieve what I want, but in both cases it requires some skill.
NoWorking8412@reddit (OP)
Absolutely. What kind of machine learning are you doing with your local models?
jacek2023@reddit
Kaggle competition
Last_Mastod0n@reddit
This is my first time hearing about this. I would be interested after I finish my personal business project. Soon I will be moving on to marketing, so I'll need a good coding hobby again. It's either that or some open source contribution. Not both, because I am quite competitive lol
jacek2023@reddit
I was active on Kaggle many years ago. Every time I come back, it brings back the joy of programming. That’s also why I know how to train models, so all this local LLaMA stuff feels familiar 😄
NoWorking8412@reddit (OP)
Would love to hear more about it.
jacek2023@reddit
If I win the gold medal, I will definitely write that I did it using local models, but the competitions are brutal and there is less than a month left, so I don't know if I will manage
NoWorking8412@reddit (OP)
Hang in there!
swagonflyyyy@reddit
Qwen3.6-27b-q8 is so good that I'm creating a self-repairing component in my repo so my collaborator can submit issues on GitHub instead of sending me DMs, and the bot can auto-solve them, test them, push, restart the backend, and run the updated agent on our Discord server, out of pettiness. It's the first local LLM I've run with Claude Code locally that I consider trustworthy enough to vibecode indefinitely without supervision on our project, albeit it will take hours to get done. But it will get there.
Before I can do that I need to finish setting up a sandbox environment for my project so I can just send it prompts with --dangerously-skip-permissions enabled without it wrecking my PC. I already have a backup of the project just in case, so it checks out.
Essentially, I'm getting tired of my collaborator sending me DMs for micro-updates every day, because he expects me to mindlessly copy-paste his prompts into Codex. It's getting very annoying, so to get him off my back I'm going to direct him to our private repo to submit issues with a special label that will prompt my vibecoding agent to pull the repo, vibecode the solution, test it extensively and obsessively, then finally push it before restarting the backend and letting it run continuously until the next issue is raised.
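A skeleton for that label-triggered loop might look like the following; the repo, label, token, and sandbox path are placeholders, and the Claude Code flags are the ones mentioned above:

```python
# Skeleton of a label-triggered repair loop. Repo, label, token, and
# sandbox path are placeholders; tests and the backend restart would
# hang off the subprocess call.
import subprocess
import time

import requests

REPO = "owner/private-repo"  # placeholder
LABEL = "auto-fix"           # the special label
TOKEN = "ghp_..."            # GitHub token with repo scope
seen: set[int] = set()

while True:
    resp = requests.get(
        f"https://api.github.com/repos/{REPO}/issues",
        headers={"Authorization": f"Bearer {TOKEN}"},
        params={"labels": LABEL, "state": "open"},
        timeout=30,
    )
    for issue in resp.json():
        if issue["number"] in seen:
            continue
        seen.add(issue["number"])
        # Hand the issue to the sandboxed agent; only sane because the
        # sandbox and backups exist.
        subprocess.run(
            ["claude", "-p", issue["title"] + "\n\n" + (issue["body"] or ""),
             "--dangerously-skip-permissions"],
            cwd="/sandbox/private-repo",
        )
    time.sleep(300)  # poll every five minutes
```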
It'll be the most passive-aggressive piece of automation I'll ever have created, and it's all thanks to Qwen3.6-27B, since it actually behaves like a disciplined programmer that does all the things a diligent, focused, patient programmer should do.
ducksoup_18@reddit
Take a peek at https://plannotator.ai/
Ha_Deal_5079@reddit
that email roundtrip for iteration is a dope pattern ngl. been running similar local automation and it's wild how far these models have come