What are your real life/WORK use cases with LOCAL LLMs
Posted by Adventurous-Gold6413@reddit | LocalLLaMA | 32 comments
Use case, work, model, hardware
Ok_Appearance3584@reddit
Computer use agent through voice commands. No need for mouse & keyboard. Get your presentation slides done while having lunch.
Need to build your own framework though. I'll open source mine next year.
Mkengine@reddit
Which model(s) do you use?
Ok_Appearance3584@reddit
I've been playing around with Qwen3 VL variants and GLM 4.5V.
PatagonianCowboy@reddit
is this an actual thing you do? seems slowish, error-prone and impractical to me
Ok_Appearance3584@reddit
Have you tried it?
PatagonianCowboy@reddit
yes, that was my experience
Ok_Appearance3584@reddit
Yep, mine as well if you use off-the-shelf solutions. What you need to do is engineer a proper system around the concept. The projects I could find were not optimized for a real-time, human-in-the-loop, co-pilot kind of workflow, but rather for more independent agentic stuff where it's more important not to make mistakes than to work fast.
In general, especially with the latest Qwen3 VL series, the basic vision and reasoning capacity is there out of the box. But nobody wants to wait 10 seconds between mouse movements and clicks.
One of the first optimizations I did was to separate reasoning and planning from taking action. No reasoning between trivial actions. This already reduces the latency considerably.
Another thing is that people tend to think about tasks in series. It's better to do them in parallel. Especially with local inference, if you get 10-50 tokens per second, it's better to split thinking into parallel requests; you can get the same output from the model several times faster. vLLM offers great batching, and you don't hit the memory bandwidth issue as badly.
In general, what you get off the shelf is an unoptimized experience. To make it work, you need to engineer a good system for yourself.
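For illustration, a minimal sketch of the parallel-request idea against a local vLLM server's OpenAI-compatible endpoint (the model id and sub-prompts below are placeholders, not a description of the actual system):

```python
# Sketch: fan independent sub-questions out as concurrent requests so the
# vLLM server can batch them, instead of one long serial reasoning chain.
# Assumes `vllm serve <model>` is running locally; the model id is a placeholder.
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="unused")

async def ask(prompt: str) -> str:
    resp = await client.chat.completions.create(
        model="Qwen/Qwen3-VL-8B-Instruct",  # placeholder model id
        messages=[{"role": "user", "content": prompt}],
        max_tokens=256,
    )
    return resp.choices[0].message.content

async def main():
    # Independent sub-tasks that don't need to wait on each other.
    subtasks = [
        "Locate the 'File' menu in this UI description.",
        "List the visible toolbar buttons.",
        "Which window currently has focus?",
    ]
    # gather() fires all requests at once; vLLM batches them server-side.
    answers = await asyncio.gather(*(ask(p) for p in subtasks))
    for answer in answers:
        print(answer)

asyncio.run(main())
```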
Brilliant-Regret-519@reddit
I'm using Gemma for something similar. Whenever I create slides, I let Gemma review them and make suggestions for improvement.
Expert-Highlight-538@reddit
Can you share an example of how good the presentation slides look? And some material someone could use to try this themselves?
Ok_Appearance3584@reddit
Well, I can't share them because they contain sensitive information, but I can describe the quality as better than what I could make myself. Keep in mind this is an iterative process: the slides are drafted by the AI based on my description, then I give feedback and we iterate until it's done. Like working with an employee.
As for material, you can look up computer-use GitHub projects, like this one: https://github.com/trycua/cua
SM8085@reddit
My llm-ffmpeg-edit.bash script has been putting in work. (A bot can explain the script) I have a bunch of videos that I'm having Mistral 3.2 go through.
This video has 7,027 frames I need to go through looking for something. We'll see how much of that 516MB it chops out.
It's basically live footage I was recording and now I need to just get the segments I'm interested in.
Mistral 3.2's accuracy isn't perfect but it's acceptable for me.
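Not their script, but a rough sketch of the same pattern - sample frames with ffmpeg, ask a local vision model about each one, then cut the matching segments. The endpoint, model id, and the yes/no question are all assumptions:

```python
# Sketch: sample one frame per second, ask a local vision model whether each
# frame shows what we're looking for, then print ffmpeg cut commands.
# Assumes an OpenAI-compatible vision endpoint (e.g. llama.cpp or vLLM);
# the model id and the question are placeholders.
import base64
import os
import subprocess
import tempfile
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

def frame_matches(path: str) -> bool:
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    resp = client.chat.completions.create(
        model="mistral-small-3.2",  # placeholder model id
        messages=[{"role": "user", "content": [
            {"type": "text", "text": "Does this frame show a person? Answer yes or no."},
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ]}],
        max_tokens=3,
    )
    return "yes" in resp.choices[0].message.content.lower()

with tempfile.TemporaryDirectory() as tmp:
    # One frame per second: frame N corresponds to second N-1 of the video.
    subprocess.run(["ffmpeg", "-i", "input.mp4", "-vf", "fps=1",
                    os.path.join(tmp, "%06d.jpg")], check=True)
    frames = sorted(os.listdir(tmp))
    hits = [i for i, f in enumerate(frames)
            if frame_matches(os.path.join(tmp, f))]
    # Naive: one 1-second clip per matching frame (merge adjacent hits in practice).
    for i in hits:
        print(f"ffmpeg -ss {i} -t 1 -i input.mp4 -c copy clip_{i:06d}.mp4")
```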
unknowntoman-1@reddit
I am a former systems engineer and currently a land surveyor. Frankly, no LLM seems to have very good support for the specifics of geodesy, or even to fully understand the purpose of a coordinate system. So I am resorting to privately building custom chatbots. Great fun, and it beats scrolling the r/ sections of Reddit.
AlgorithmicMuse@reddit
Trying to get a meaningful, working agentic system versus just a question-answering agent.
cromagnone@reddit
I seem to be becoming an aircraft geek in my middle age and have set up a local knowledge graph to build a complete conceptual history of flying machines.
I have the unsloth Q6 of Qwen3-30B-A3B on my 4090 currently chugging through a list of about 10,000 things on Wikipedia, parsing each article for content matching a basic schema and populating a local neo4j graph database. It's also capturing, each time, any other context it thinks is necessary to interpret the data it has captured in that particular case, and storing it in the main node for each entity. When it's done, I'm going to have it go back across all the extra bits in one pass, augment/modify the original schema with its own important general concepts, and then repopulate the database with the new information in each case, before running another Wikipedia pass to fill in any gaps it can find.
I know this is sort of an inefficient graphRAG, but it’s all human readable and the knowledge graph is interpretable. It’s really interesting to see the schema evolve. In testing, the basic schema was just model<-produced by<-manufacturer and engine_type<-has<-model and by the time it finished the second phase, models had designers, maiden flights, ICAO type designations, and it had invented the concepts of crashes (which had dates and locations), conflicts (likewise), operators and engineering challenges as augmentations to the schema. It’s just very cool to see it formalising an ontology in real time.
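Not the poster's actual pipeline, but a minimal sketch of the extract-then-populate step under the basic schema described above (the prompt, model id, and Neo4j credentials are placeholders):

```python
# Sketch: ask a local model to extract schema fields from one article as JSON,
# then upsert the result into Neo4j. Endpoint, model id, and credentials are
# placeholders, not the poster's setup.
import json
from neo4j import GraphDatabase
from openai import OpenAI

llm = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
db = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

PROMPT = """Extract a JSON object with keys: model, manufacturer, engine_type,
extra_context (a free-text string of anything else needed to interpret the data).
Article:
{article}"""

def extract(article: str) -> dict:
    resp = llm.chat.completions.create(
        model="Qwen3-30B-A3B",  # placeholder model id
        messages=[{"role": "user", "content": PROMPT.format(article=article)}],
        response_format={"type": "json_object"},  # if the server supports JSON mode
    )
    return json.loads(resp.choices[0].message.content)

def upsert(tx, rec: dict):
    # Mirrors the basic schema: model<-produced by<-manufacturer,
    # engine_type<-has<-model, plus free-text context on the model node.
    tx.run(
        "MERGE (m:Model {name: $model}) "
        "MERGE (c:Manufacturer {name: $manufacturer}) "
        "MERGE (c)-[:PRODUCED]->(m) "
        "MERGE (e:EngineType {name: $engine_type}) "
        "MERGE (m)-[:HAS]->(e) "
        "SET m.extra_context = $extra_context",
        **rec,
    )

article_text = open("article.txt").read()  # one of the ~10,000 pages
with db.session() as session:
    session.execute_write(upsert, extract(article_text))
```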
I think when it’s done for real - a couple of days - it should be able to answer questions like “how long does it take for engineering solutions to be implemented as a result of a fatal crash?” or “why don’t more sailplanes have turboprops?” and provide reasoned and referenced answers.
Next to it is a flutter app that I vibe coded with the instruct version of qwen3-30B to get real-time lookups for local planes (I moved to an area near an airport, which is what caused all this). The end point is that I can operate both through a smaller LLM running constantly and literally just have it alert me whenever an interesting plane comes past - where interesting is defined on much more rigorous terms than just being old or numerically rare. If I can get the database to fit, the whole thing can be mobile, and my phone will just tell me to look up when something interesting is around, wherever I am. And I should add that I don't actually code much in Python or know the first damn thing about how to use Cypher for querying the database. The whole thing has been done locally and in languages I don't speak. It's great fun :)
SuperCentroid@reddit
I’ve pretty much stopped using them because they are bad.
MitsotakiShogun@reddit
Lots of small local LLMs (e.g. Qwen2.5-72B, Qwen3-30B-A3B) were better at translation (at least Chinese -> English) with idiomatic expressions than Google Translate.
Sending Mistral 3.2-24B a picture of a document in German typically gives great results for both translation and extraction.
Feeding a daily feed (security vulnerabilities) into an LLM so it can extract named entities and output them as a list also works out of the box with <30B models.
Writing small bash/Python scripts works pretty well too.
Title generation for articles (e.g. if you work at Bloomberg or something) is a pretty good use case for <10B fine-tuned models.
If your scope is small, small models are good enough.
Adventurous-Gold6413@reddit (OP)
Thanks for this
MitsotakiShogun@reddit
YW! Btw, you can use something like n8n to quickly prototype workflows that use AI to do something. In this case, I have a small workflow that, every morning, looks up CVEs, extracts the affected software, and then compares them to the services running in my homelab to see if any are vulnerable. It then sends an email to me with a yes/no in the first line (and the list of affected software after that, for me to double-check).
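For anyone who'd rather script this than use n8n, roughly the same logic might look like the sketch below (the feed URL, model id, service list, and mail settings are all placeholders, not my actual workflow):

```python
# Sketch: pull recent CVEs, have a local model extract affected software,
# compare against homelab services, and email a yes/no verdict plus the list.
# Feed URL, model id, service list, and SMTP details are placeholders.
import json
import smtplib
from email.message import EmailMessage

import requests
from openai import OpenAI

llm = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
MY_SERVICES = {"nginx", "postgresql", "nextcloud"}  # placeholder homelab list

feed = requests.get(
    "https://services.nvd.nist.gov/rest/json/cves/2.0?resultsPerPage=50"
).json()
summaries = "\n".join(
    item["cve"]["descriptions"][0]["value"] for item in feed["vulnerabilities"]
)

resp = llm.chat.completions.create(
    model="Qwen3-30B-A3B",  # placeholder model id
    messages=[{"role": "user", "content":
        'Extract the affected software products as a JSON object '
        '{"products": ["lowercase name", ...]} from these CVE summaries:\n'
        + summaries}],
    response_format={"type": "json_object"},  # if the server supports JSON mode
)
affected = set(json.loads(resp.choices[0].message.content)["products"])
hits = MY_SERVICES & affected

msg = EmailMessage()
msg["Subject"] = "Daily CVE check"
msg["From"] = msg["To"] = "me@example.com"
msg.set_content(("yes\n" if hits else "no\n") + "\n".join(sorted(hits)))
with smtplib.SMTP("localhost") as smtp:
    smtp.send_message(msg)
```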
wahnsinnwanscene@reddit
Can n8n distribute and coordinate workloads across a bunch of computers?
MitsotakiShogun@reddit
Sorry, no idea, I'm not that big of a user. Maybe you can try asking in r/n8n or doing a search?
json12@reddit
I've always wanted to automate part of my work using n8n (or something similar), but the range of things you can do seems overwhelming. Any good resources for beginners to get started?
MitsotakiShogun@reddit
I just watched a bunch of random videos on YouTube and then experimented, googled, and RTFM'd my way until the pipelines were working.
But I just used it to learn a bit more about how it works, not because I really needed it. I'm a programmer so it's easier for me to code things myself (e.g. using python/bash/js, cron, etc) rather than use n8n. Again, YouTube/Google/docs are your friends.
Medium_Chemist_4032@reddit
Mostly tinkering, and a few so-far-failed attempts at map-reduce-like jobs for prototyping something I'm actually tasked with at work. It's nice not to see 20 USD eaten per run.
Red_Redditor_Reddit@reddit
I use mine to make engineering notes better organized.
kevin_1994@reddit
I work as a software engineer and I use local models exclusively (except when my server is down because I've been messing around with config lmao)
Right now I'm using gpt-oss with low reasoning on a 4090 + eGPU 3090, getting 40 tg/s and 1200 pp/s.
No, it's not close to as good as Claude Sonnet 4.5. That means I need to think more, and rely on the local model only for low-hanging fruit. That's good for me!
Horus_simplex@reddit
I've made a small script that compares my CV to job offers on LinkedIn, rates them according to my taste, and generates a PDF report every 3 days with the top 10 offers I could apply to. It uses LM Studio; usually Qwen3 5B or 8B is enough, and I've also tried bigger models without any significant improvement. Now I'm also running it with the CVs of some friends - it has come in quite handy!
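Not the poster's script, but a sketch of the scoring step against LM Studio's OpenAI-compatible local server (default port 1234); the model id and input files are placeholders, and fetching the postings is out of scope here:

```python
# Sketch: score job postings against a CV with a local model served by
# LM Studio, then print the top 10. Model id and file names are placeholders;
# the postings are assumed to have been scraped separately.
import json
from openai import OpenAI

llm = OpenAI(base_url="http://localhost:1234/v1", api_key="unused")
cv = open("cv.txt").read()

def score(posting: str) -> int:
    resp = llm.chat.completions.create(
        model="qwen3-8b",  # placeholder model id
        messages=[{"role": "user", "content":
            'Rate how well this job offer matches the CV on a 0-10 scale. '
            'Reply with JSON {"score": <int>}.\n\n'
            f"CV:\n{cv}\n\nJob offer:\n{posting}"}],
    )
    return json.loads(resp.choices[0].message.content)["score"]

postings = json.load(open("postings.json"))  # list of posting texts
ranked = sorted(postings, key=score, reverse=True)
for posting in ranked[:10]:
    print(posting[:80])
```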
ekaknr@reddit
Hi, this sounds great! How did you manage to scrape the data from LinkedIn? I thought it was prohibitive for bots.
Horus_simplex@reddit
There's a Python library that does it for you. It uses your own cookie, which you have to provide for authentication, so it's half-scraping. You need to carefully select your keywords and potentially restrict your selection to a few dozen results, but it works well :)
Admirable-Star7088@reddit
Local LLMs have revolutionized computer programming for me, as I can now produce code much more quickly. Previously, I had to google around a lot, scroll through discussion forums with similar questions, read websites, etc., to figure out how to tailor a specific piece of code (for example, a function that calculates a specific mathematical formula).
Now (using gpt-oss-120b and GLM 4.5 Air) I can just quickly ask the LLM to write the specific pieces of code I need, without first doing research for ~5-10 minutes or sometimes even hours.
ac101m@reddit
Slightly unreliable QA machine for learning new things. Great at explaining any concept that's well represented in the training data, but occasionally hallucinates. Occasional code snippets too.
For fun (always enjoyed messing with systems of emergent behaviour).
Research projects which require fine tuning or access to network activations.
Rondaru2@reddit
Mostly for entertainment. Roleplaying, casual chatting and thinking assistance.
Fortunately I'm not at all prone to falling for this new epidemic of "AI companionship". It's very easy to see the "button eyes" in their "personality" when you're a bit knowledgeable about the underlying tech. Still ... playing with these "friendly puppets" beats doomscrolling the unfriendly web these days.
ResponsibleTruck4717@reddit
Work and hobby projects.
Usually my hobby projects later help me at work.