I've seen a lot of folks ask "can local LLMs actually do anything useful?"

Posted by NoWorking8412@reddit | LocalLLaMA | View on Reddit | 71 comments

And I'm here to share my experience. The answer is resoundingly 'yes'.

Let me start with the local model I use every day in my AI harness: embedding models. I'm using an embedding model to give my AI's persistent memory system a semantic search protocol that makes its memory recall feel seamless to the human user.

Now my more recent use case:

Lately, I have been trying new applications for Qwen3.6-35B-A3B. I have been experimenting with a flow where Qwen evaluates a database based on criteria I give it on a regular weekly interval. It then sends me an email based on the data that meets my criteria. I respond via email with my choice of which items it found to move forward with. It then takes my choice and runs that against our list of sources and our knowledge base to create a document, which it then pushes to a Google Doc, then emails me said Doc. I then edit the Google doc and leave comments for Qwen to incorporate as feedback. When we are done iterating, I email Qwen and tell it to convert the doc to our PDF template. It then converts the work into a nicely formatted PDF and emails it back to me so I can prepare it to send to the end user.

I'm starting simple and moving to more complex tasks, but so far Qwen3.6-35B-A3 is just knocking down every task I put in front of it. I'll report back as things develop but seriously, verdict is yes. You can do many useful things with local LLMs.

What are you doing with your local LLMs?

[-]

ttkciar@reddit

I've been doing a few things:

GLM-4.5-Air: Codegen, physics assistant (mostly critiquing my neutron transport notes and suggesting relevant subjects for further study), and medical assistant (mostly explaining medical journal publications to me).
Gemma-4-31B-it: Wikipedia-backed RAG for general Q&A, creative writing, business writing, language translation, Evol-Instruct pipelines, sometimes debugger for GLM-4.5-Air's code.
Big-Tiger-Gemma-27B-v3: Critiques my Reddit activity and provides constructive criticism, persuasion research, violent creative writing (Murderbot Diary fan-fic; non-erotic but very violent). I'm looking forward to TheDrummer giving Gemma-4-31B-it the Big Tiger treatment so it can take over these tasks.
K2-V2-Instruct: Long-context tasks like system log analysis and IRC log analysis, also what my "actlikettk" (self-clone) script uses, though Gemma4 might be taking over that role, not sure yet.
Qwen3.5-9B: Synthetic dataset upcycling and augmentation.

All models are quantized to Q4_K_M.

GLM-4.5-Air and K2-V2-Instruct are too big to fit in 32GB VRAM, so I use them via pure-CPU inference, which is slow but I adapt my workflow around that, so I'm either working on other things or sleeping while they infer.

The rest of these models fit in VRAM. Usually Gemma-4-31B-it stays resident in my MI60, Big-Tiger-Gemma-27B-v3 stays resident in my MI50, and Qwen3.5-9B stays resident in my V340.

[-]

Tccybo@reddit

I am extremely curious about your wiki rag setup, please give a few pointers if you find time.

[-]

Silver-Champion-4846@reddit

You sound like you're in Earthly Heaven. If only I had a gpu or even a good cpu, I could just install Pi and build it around learning and cooperating on many things

[-]