Are any of you using local llms for "real" work?
Posted by hmsenterprise@reddit | LocalLLaMA | View on Reddit | 154 comments
I am having fun personally tinkering with local models and workflows and such, but sometimes it feels like we're all still stuck in the "fun experimentation" phase with local LLMs and not actually producing any "production grade" outputs or using it in real workflows.
Idk if it's just the gap between what "personal" LLM-capable rigs can handle vs the compute needs of current best-in-class models or what.
Am I wrong here?
syncro2008@reddit
I get automated recruiter cold email sequences all the time and I hate having no way to fight back.
Tools like Jukebox, Gem, etc. allow recruiters to set up these repulsive “3 email sequence” scripts and then blast them out to 1000s of inboxes.
“Hi {{candidate}}, the founders personally want to wipe your ass with venture $$$”, “Checking in, I know we’re all busy but we raised more $$$$ to wipe your ass with”
“Trying one more time…”
(I’ve received some of these sequences where the {{candidate}} variable wasn’t replaced.) (And yes, this is a first world problem for people who would love to get such inbound in a tough job market, so let’s spare that virtue side quest)
Anyways, Gmail filters that hardcode specific words or phrases always run the risk of false positives.
And I don’t see it in Google’s corporate interests to make solving this problem easy for end users anytime soon.
So I set up the following workflow:
1. A Google Apps Script runs every 5 minutes and fetches today's emails that have not yet been labeled “processed”.
2. It creates a batch request with the email title, body, and sender address for each element.
3. It sends the batch to my home server (Beelink SER5 Max, 6800U) running Qwen 3 8B behind a FastAPI server. The system prompt describes the specific situation of tech-recruiter cold-email patterns, with some few-shot examples, and instructs the model to return a spam boolean for each element in the batch.
4. The Apps Script receives the reply and applies a Gmail label of “processed” to each email so it doesn't get refetched in step 1. For the ones with spam: true, it applies a “recruiter-spam” label and removes them from the Inbox.
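For reference, the classification logic in step 3 can be sketched like this. This is a minimal version of my own, not syncro2008's actual code; the prompt wording, function names, and fail-closed behavior are assumptions, and the real thing would sit behind a FastAPI endpoint that forwards the prompt to the local Qwen server:

```python
import json

# Hypothetical system prompt; the actual few-shot prompt isn't shown in the post.
SYSTEM_PROMPT = (
    "You classify emails. Return ONLY a JSON array of booleans, one per email, "
    "true if the email is part of a tech-recruiter cold-email sequence."
)

def build_user_prompt(emails):
    """Pack a batch of {sender, subject, body} dicts into one prompt."""
    parts = []
    for i, e in enumerate(emails):
        parts.append(f"[{i}] from={e['sender']} subject={e['subject']}\n{e['body'][:500]}")
    return "\n---\n".join(parts)

def parse_booleans(reply, expected):
    """Parse the model's JSON reply; fail closed (not spam) on any error,
    so a flaky model run never hides real mail."""
    try:
        flags = json.loads(reply)
        if isinstance(flags, list) and len(flags) == expected:
            return [bool(f) for f in flags]
    except json.JSONDecodeError:
        pass
    return [False] * expected
```

Failing closed matters here: a malformed model reply should leave mail in the Inbox rather than silently burying it.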
In this way, when I “choose” to see inbound, I can find it in its right place, in the “recruiter-spam” labeled section of Gmail.
And my Inbox holds on to being human just a little while longer.
“My friend… what you allow, will continue.”
hmsenterprise@reddit (OP)
Ha as I read this, I anticipated you were going to be more nefarious and have your server engage in long running fake conversations with the recruiters to eat up their time or something like that
deepaklaksman@reddit
If you are wondering whether your hardware can run llms locally, checkout https://www.fitllms.com/
SM8085@reddit
The most "real" work I've done is that Qwen3-VL-30B-A3B-Thinking is currently going through videos 10 seconds at a time. Based on the bot's True/False boolean output, a wrapping program keeps track of which segments the target subject is within. At the end, we're done using Qwen3-VL and the wrapping program uses the segment information to have FFmpeg make a clipped version where the subject is always present.
^-- Screenshot from the wrapping program/script. Copying the frames into Yes/No (True/False) directories helps massively for checking the bot's accuracy: you can scroll through the directories, and if you see the subject in the No/False directory you know it flubbed. You can also see how slow my hardware is, 12 minutes to analyze 20 frames and output JSON. Oddly, even though the Thinking model chooses not to think, it gave me higher accuracy than the Instruct, so I use it anyway.
A general example is you could have the bot look for every explosion in an action film. Maybe it considers muzzle flash from a gun to be an explosion, not technically wrong as far as I know. So you could prompt that it should only be explosions not from gunfire and try that.
I have a lot of video footage from live sources, so having the bot trim down a small 500mb file to say 30mb clip of what I'm interested in is literally saving NAS space.
If Qwen3-Omni would get llama.cpp support I'd be more open to using Qwen3-Omni to send the audio as well.
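For anyone curious what the wrapper's bookkeeping looks like, here's a minimal sketch of the segment-merging and clipping steps (my own function names and file naming, not SM8085's actual script; the VLM call itself is omitted):

```python
def flags_to_ranges(flags, seg_len=10):
    """Turn per-segment True/False flags into merged (start, end) ranges in seconds."""
    ranges, start = [], None
    for i, keep in enumerate(flags):
        if keep and start is None:
            start = i * seg_len                   # subject just appeared
        elif not keep and start is not None:
            ranges.append((start, i * seg_len))   # subject just disappeared
            start = None
    if start is not None:
        ranges.append((start, len(flags) * seg_len))
    return ranges

def ffmpeg_cut_cmds(src, ranges):
    """One stream-copy ffmpeg command per kept range (fast, no re-encode)."""
    return [
        ["ffmpeg", "-ss", str(a), "-to", str(b), "-i", src,
         "-c", "copy", f"clip_{a}_{b}.mp4"]
        for a, b in ranges
    ]
```

`flags_to_ranges([True, True, False, True])` gives `[(0, 20), (30, 40)]`; concatenating the resulting clips (e.g. with ffmpeg's concat demuxer) yields the trimmed file.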
b_nodnarb@reddit
Looks like people want this. Would you consider putting it on AgentSystems? It lets you discover, run, and distribute self-hosted AI agents like they're apps: https://github.com/agentsystems/agentsystems (full disclosure, I'm a maintainer).
Leopold_Boom@reddit
Oh for god's sake, I spent like 7 min on your GitHub and site and I can't find a link to actual example agents. What's the point of a discovery platform without a big list of "this is our best stuff"?
b_nodnarb@reddit
Good point. The whole system is federated, so I'd need to consume the index API to surface the agents on the site, not just inside the UI. In hindsight that's super obvious, but I haven't done it yet. Will reply here once that's live!
Leopold_Boom@reddit
👍
relmny@reddit
I wonder if it could be used to remove scenes from videos? I guess I'll need some workflow with agents maybe? And some scripts to take big files, cut them into smaller pieces so Qwen can read them, remove frames, and then put it back together as a single file. And so on?
SM8085@reddit
Not sure if it's totally related, but ffmpeg does have some basic scene change detection where you can take frames when there's a major percentage of pixels changed. It seems oddly complicated to turn that into a timecode for each image for some reason. I have been meaning to try to tackle that problem eventually.
With subtitles, you could even intertwine the text between the scene images.
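For what it's worth, the timecode problem is tractable by pairing ffmpeg's `select` scene filter with `showinfo`, which logs a `pts_time` for every selected frame. A sketch (the threshold value and parsing are my own, not SM8085's):

```python
import re

# Extract one frame per detected scene change; showinfo logs frame metadata to stderr.
SCENE_CMD = [
    "ffmpeg", "-i", "input.mp4",
    "-vf", "select='gt(scene,0.4)',showinfo",  # 0.4 = scene-change threshold, tune per source
    "-vsync", "vfr", "frames/%04d.png",
]

def parse_pts_times(ffmpeg_stderr):
    """Each showinfo line carries pts_time, the selected frame's timecode in seconds."""
    return [float(t) for t in re.findall(r"pts_time:(\d+(?:\.\d+)?)", ffmpeg_stderr)]
```

Run `SCENE_CMD` with `subprocess.run(..., capture_output=True, text=True)` and feed `result.stderr` to `parse_pts_times`; the Nth timecode in the list belongs to the Nth extracted frame.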
Mayion@reddit
Porn auto-tagging?
bloomsburyDS@reddit
clockwise rim job, anti-clockwise rim job, gals with dxxk etc.
PersonalCitron2328@reddit
There's no gals with dxxs just dudes with teets
Mayion@reddit
Dude seeing your reply notification made me go, "Wtf did I say for someone to reply with THAT"
Borkato@reddit
This is actually really cool, wtf. I’d love to see this in a git repo
SM8085@reddit
It's there, it's all free software.
hmsenterprise@reddit (OP)
But, out of curiosity and I don't mean this critically, what is the point of that workflow even? Like, is it actually a valuable task? I'm just trying to find someone/anyone who is using local LLMs on "consumer"-y hardware for something more than hobby tinkering
SM8085@reddit
It's valuable to me. When the accuracy is high enough I can delete the original video and save space as I mentioned. My NAS has a bunch of videos I've saved over the years that the bot can go through. That seems "real" to me, but I wouldn't call it "production grade" either.
GP_103@reddit
That could be incredibly useful to millions of people. I have Gigs of video , that could use this
spaceuniversal@reddit
Really spectacular. Let’s imagine all this integrated into a platform like YouTube with the possibility of doing such professional search queries as historical research “find me only black and white videos with vintage sewing machines”
hmsenterprise@reddit (OP)
Nice -- makes sense to me!
ryfromoz@reddit
Yes, me.
mfarmemo@reddit
I used gpt-oss-120b to create a data dictionary for healthcare databases today.
hmsenterprise@reddit (OP)
Nice -- did you feed it the schema and just let it rip? or how did you rig it up?
mfarmemo@reddit
Full metadata export for each data source, standardized to common data types since there are different syntaxes, then chunked by schema with the detailed metadata. Used RAG for internal docs that give additional context, as well as linked tables with shared keys when applicable. Thinking set to medium. Output to JSON.
I'll still need to verify a sample to reach at least 80% accuracy but overall the model does well interpreting the technical language.
Adventurous-Date9971@reddit
Big win: add column profiling and clinical code-set mapping so the model stops guessing. Compute per-column stats (null rate, distincts, min/max) and feed masked samples; ask it to map fields to FHIR resources and LOINC/SNOMED value sets, and to link likely FK targets. Gate acceptance on retrieval score; below threshold, return unknown and queue review. Emit JSON with a JSON Schema (name, description, pii_tag, constraints, fk_targets, citations) and auto-generate Great Expectations checks from it. I've used dbt for type standardization and profiles, OpenMetadata for lineage/glossary, and DreamFactory to expose a consistent REST layer over mixed databases for the RAG and a light reviewer UI. How are you handling cross-source synonyms and PHI masking in the examples? The profiling plus controlled vocab with citations is what makes it production-safe.
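The profiling step is the easy part to make concrete. A minimal pure-Python sketch of per-column stats plus sample masking (my own function names; a real pipeline would use dbt or a proper profiler as mentioned above):

```python
def profile_column(values):
    """Null rate, distinct count, min/max -- feed these stats (not raw rows) to the model."""
    nonnull = [v for v in values if v is not None]
    return {
        "null_rate": round(1 - len(nonnull) / len(values), 3) if values else 0.0,
        "distinct": len(set(nonnull)),
        "min": min(nonnull) if nonnull else None,
        "max": max(nonnull) if nonnull else None,
    }

def mask_value(v, keep=2):
    """Crude PHI masking for sample values: keep a short prefix, star the rest."""
    s = str(v)
    return s[:keep] + "*" * max(len(s) - keep, 0)
```

Feeding the model summary stats and masked samples instead of raw rows is what keeps PHI out of the prompt while still letting it infer types and semantics.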
SituationMan@reddit
I am a tutor. I use local LLM to make worksheets for students.
-dysangel-@reddit
I tried using Claude for real work for a while, but overall I felt quite disconnected from the code, and things ended up going slower because of overly messy implementations that I had to tidy up. At the moment I primarily code by hand, though with auto-complete, occasionally asking for suggestions or farming out drudgery to the AI. For example, I just had it write a test plan for a feature in our app, and I only had to tweak a few things. It's great for exactly the types of things that are so easy or repetitive that they're boring.
_murb@reddit
I use it quite a bit for boilerplate code, tedious SQL queries, and data analysis (natural language to base script). I'm in the midst of taking processed data and feeding it into various LLMs and system prompts to automate weekly/monthly/quarterly reports.
For personal use I just bought a Strix Halo box and am going to run gpt-oss and GLM Air for financial analysis, small agents, batch processing, CVE/patch management, home automation, and so on.
giant3@reddit
I have tried to use LLMs for this exact task. It didn't work, and I ended up wasting a month trying to get it working.
What works is 100 line Perl programs.
The biggest problem is that LLMs slow down the more tokens they ingest, and I haven't found a way to optimize that. Also, if you have 100K records that need to be processed and you miss even 1 record, the customer won't accept it.
_murb@reddit
Luckily I am using Bedrock, so access to high-quality models (Sonnet 4.5) is possible and it does work pretty well, just not quite production grade. I'm using deterministic code and taking the outputs into the LLMs to help reduce the hallucinations. My data is many multiples of 100k records/day, so loading the whole dataset into the LLM isn't feasible, nor do I want to for performance/quality reasons.
giant3@reddit
I think a better option is to use LLM to write Perl code. I gave the LLM all variations of the data and asked it to write the code to handle it. Then I did some minor fixing and integration with other tools.
Lissanro@reddit
I think local models are very capable, but it depends on what hardware you have and what model you choose. I have been using mostly Kimi K2 these past few months, starting with its first version and later 0905, along with R1 when I needed thinking, and then Terminus after it was released. Recently I also downloaded Kimi K2 Thinking, which is promising, but I need to test and use it more before I can decide if it will replace Terminus for me.
For professional purposes, I use LLMs mostly for coding tasks. Often Roo Code. I use ik_llama.cpp rather than the mainline llama.cpp, since I find ik_llama.cpp has better performance, especially at longer context.
Also, since you mentioned workflows, this reminds me... I had experience with ChatGPT in the past, starting from its beta research release and some time after, and one thing I noticed is that as time went by, my workflows kept breaking: the same prompt could start giving explanations, partial results, or even refusals, even though it had worked in the past with a high success rate. Retesting every workflow I ever made and trying to find workarounds for each, every time they do some unannounced update without my permission, is just not feasible for professional use. Usually when I need to reuse a workflow, I don't have time to experiment. Not to mention, for serious work I don't even have the right to send anything to a third party, and I wouldn't want to send personal stuff either. Hence why I had to go local.
ResearcherSoft7664@reddit
Yes. I used qwen 3 vl locally offline to convert images into structured documents. It’s accurate and safe
relmny@reddit
I do. Every single day at work.
It's like my own assistant/colleague/expert. Multiple times a day.
Adventurous_Cat_1559@reddit
Yes, I have one integrated with a few mcp servers I’ve written to manage my obsidian notes. Eg., “hey, add this GitHub link to my todo list as urgent for project X” and having it scrape / make the files in the right folder is great. Also ask it “for each open task, check if there’s a GitHub link, if there is summarise any recent comments” or “hey, so and so on slack said this, add it to the notes on project Y”. So when I get to my issues I’ve everything summarised in my personal notes.
cointegration@reddit
Using qwen3 vl to parse vids to extract car plate numbers and other vehicle markings
Photoperiod@reddit
Does this work well on poor quality video? For example a lot of cctv at street intersections is kinda bad quality and plates are far from the camera.
cointegration@reddit
Let's just say if your eyes can make it out easily, qwen can read it
alfamadorian@reddit
I want to do that in real time, so when I drive down the streets or into a parking lot, I know if I passed someone I know.
cointegration@reddit
I'm using qwen3vl coz i have the luxury of time to slowly parse the vids, if you want realtime you gonna need to use opencv/yolo or train your own cudnn
hmsenterprise@reddit (OP)
Why do you need to do that though? Is it just for fun or actually something important/valuable to you?
cointegration@reddit
coz that's what the client wants
oodelay@reddit
You're fishing for ideas?
hmsenterprise@reddit (OP)
No not at all, surprisingly. I am actually working on a purely cloud based AI writing tool as my main job right now. I just personally am into local AI stuff and just had this growing feeling that I've seen a ton of chatter about fun little hobbyist use cases but nobody is really doing anything super valuable with them yet. I originally tried to make my product support local model workflows but it was a major pain in the ass and people weren't into it (at least at the time -- couple months ago).
oodelay@reddit
For now it's not changing the work environment for everyone but those who use it now will understand it better when it blows up.
Pvt_Twinkietoes@reddit
Personal use? Or you work for the police?
MitsotakiShogun@reddit
I use an LLM, a scraper to fetch security vulnerabilities (CVEs), and an internal API that lists my running services, and the LLM generates a daily report for me about whether any of the software I'm running might have been mentioned.
I've used local LLMs for coding (mostly Qwen-2.5/3 and Mistral/Devstral).
I've used local LLMs (ollama in WSL with a <1B model) for prototyping various stuff in my job, e.g. writing an LLM proxy, or setting up an environment for interview tasks.
And finally, various personal side-projects that have potential to become actual products.
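The CVE-report use case is easy to approximate without a custom scraper; a rough sketch against the public NVD 2.0 API (the report prompt and function names are my assumptions, not MitsotakiShogun's actual code):

```python
import json
import urllib.parse
import urllib.request

NVD_URL = "https://services.nvd.nist.gov/rest/json/cves/2.0"

def fetch_cve_ids(keyword, per_page=20):
    """Query the NVD 2.0 API for CVEs mentioning a keyword."""
    qs = urllib.parse.urlencode({"keywordSearch": keyword, "resultsPerPage": per_page})
    with urllib.request.urlopen(f"{NVD_URL}?{qs}") as resp:
        data = json.load(resp)
    return [v["cve"]["id"] for v in data.get("vulnerabilities", [])]

def build_report_prompt(services, cve_summaries):
    """What the local LLM sees: the running services plus today's CVE text."""
    return (
        "Services I run:\n" + "\n".join(services)
        + "\n\nToday's CVEs:\n" + "\n".join(cve_summaries)
        + "\n\nList any CVE that plausibly affects one of my services, or say 'none'."
    )
```

The LLM only does the fuzzy matching between service names and CVE descriptions; the fetching stays deterministic.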
Pvt_Twinkietoes@reddit
Not using any OCR tool to extract the text to guide the VLM?
MitsotakiShogun@reddit
Not necessary, the VLM works well enough, an accountant double-checks and does data entry in the actual tax forms, and the tax office checks everything too, and may send back for corrections, so there is no fine or other issue even if the VLM gets it wrong.
Even without the accountant doing data entry / double-checking, it would still have been perfectly safe. Unlike what may be the case in other countries, the Swiss tax office is generally pretty chill about mistakes and they'll happily send you corrections (up to years later, if necessary). >!Although I'm not so sure if they'll be chill with omissions, so definitely be careful about that! D:!<
No-Consequence-1779@reddit
Perfect reason why ‘ai’ will not be eliminating many jobs.
Borkato@reddit
I would love to know what model and scraper for that first paragraph!
MitsotakiShogun@reddit
I wrote the scraper myself, mainly because I can, but there are probably more than a few feeds you can use that are simpler and more robust, e.g. DuckDuckGo points to:
* https://nvd.nist.gov/developers/vulnerabilities
* https://cvefeed.io
The model I'm using currently (for all things) is `mistralai/Mistral-Small-3.2-24B-Instruct-2506` in vLLM.
Borkato@reddit
This is super helpful, thank you!!!
b_nodnarb@reddit
I was actually thinking about deploying something like this to AgentSystems (it allows the local AI community to discover and run self-hosted AI agents like they're apps): https://github.com/agentsystems/agentsystems - might be interesting to package the tax reporting agent on there for others to use. Full disclosure, I'm the maintainer and am looking for people with solid local-first agents. People seem to like this one.
ryfromoz@reddit
I do the same thing for security vulnerabilities!
Empty-Tourist3083@reddit
were you fine-tuning the models or using them out of the box?
hmsenterprise@reddit (OP)
Nice! This is the closest I've seen to "actually valuable/important work being done by my local llm setup" in this thread so far
No-Consequence-1779@reddit
Absolutely. But you need to know what real work is. This is where experience is a contributing factor.
Working in organizations with different business apps, both internal and external, and different workflows for information workers, you identify the common bottlenecks, where time can be saved, and other inefficiencies.
Specifically, time is money, and anything that provides an ROI via time savings is a candidate.
So many people are hobbyist coders and have no experience to reference. So they think small, because that is their domain of knowledge.
In summary, a dummy will struggle for useful ideas.
LLMs for coding are immense, and integrated into all enterprise-level IDEs.
LLMs for template-based text or image processing, including meta tagging, classification, and OCR.
LLMs for automating non-deterministic processes, monitored by deterministic algorithms.
There is so much.
CorpusculantCortex@reddit
I have deployed a local 14B model as a sentiment/content summarizer in a self-hosted microservice used by a bot that scrapes select news sources and logs the summaries to a RAG store I use in another bot that ultimately makes me money. It's not for my job or primary income, but it is perpetually doing work for ROI.
hmsenterprise@reddit (OP)
That's pretty cool. What does the other bot do? Is that running on the same machine? Also, is that machine separate from your daily-use PC/laptop?
CorpusculantCortex@reddit
I would rather not say specifically, but there are a few use cases for the auto-populated RAG setup. It all runs on the same machine; I run Ubuntu, and all components are systemd services that I have set up, or DAG orchestration depending on frequency of operation. The other bot does feed the RAG to a cloud LLM, because the final step benefits from power I don't have locally, but all the RAG and summarization helps reduce costs by limiting the tokens that get sent with my structured prompt, so cloud costs are barely anything per month.
Whether the system is separate from my daily driver is... complicated to answer; I have 4 computers on my desk. The tl;dr is it runs on my most-utilized personal system. The longer explanation: my most-used machine is my work laptop, though that is mostly a client for cloud services/servers. I have two PCs. The Ubuntu machine (newer, higher-powered) technically dual-boots Windows, but I rarely boot into Windows; that is the personal machine I use most throughout the day for random tasks, as I have a small portable monitor I can run on the side. My other machine typically runs Windows; it's my older system, but it also dual-boots Ubuntu. That one I use for gaming and photo/design stuff, because Adobe doesn't like Linux. I use it somewhat less frequently, mostly to Steam Link in and play games from my phone, or for periodic photo downloads and editing. And I have my laptop/tablet (a Surface Pro), which sits idle unless I have a specific project; I use it as a client to SSH into my Ubuntu machine if I'm working on a code-based project, or for the stylus if needed for design stuff.
NoWorking8412@reddit
I find local LLMs to be useful for data analysis related tasks when dealing with sensitive data.
kc858@reddit
I use NuExtract-2.0-8B to extract structured data from unstructured data, at least weekly.
sunkencity999@reddit
All of the time. They're great for solving local IT networking and support issues.
Academic-Air7112@reddit
Yep, we use "local" LLMs to write some of our own systems code for research.
false79@reddit
Yep, you're wrong here. I'm using my setup to write boilerplate code, add incremental features, write documentation, and group files for commits.
This technology is very capable. But you have to have a self awareness of what you do most often day to day and breakdown that problem that can be automated.
With that self-awareness comes the realization that you don't need triple-digit-B LLMs to do your work if you can give the model sufficient context for the task.
Then it just frees up time to do more important things.
hmsenterprise@reddit (OP)
Ok how do you switch to local models for just those tasks? Do you have to cognitively evaluate the nature of every task you take on and whether your setup can handle it well?
I have experimented with local models for simple coding tasks (e.g., boilerplate or adding logs to a file or whatever) -- but even just the cognitive load of switching wasn't worth the 10 cents or whatever I'd save.
I-cant_even@reddit
It is also feasible to build a workflow that has a small fast model evaluate the requirements of the request and route to the appropriate model.
false79@reddit
The evaluation is near none. I have VS Code + Cline setup. In my prompt I say refer to these files, do it this way. And like 95% of the time, it does it.
Cline is advertised to hit any of the paid models but its also capable of hitting any Open AI compatible webservice.
These things write directly to the code base and (if you set things up properly) perform linting, run tests, fix imports, take their best guess at why the code won't compile, and make an attempt to resolve it before human intervention.
SkyFeistyLlama8@reddit
Automatically generating commit messages has been a lot of fun. I can write code most days with enough coffee but communicating what I've done is something else entirely. I'm glad to throw it to Qwen or Devstral to help me out.
Savantskie1@reddit
I am learning this very slowly and have to stop myself from just assuming that the model can just understand my intent. Because if I don’t express what I want every model usually misunderstands and takes a path I didn’t want.
Borkato@reddit
Have you tried aider? I just started with it and it’s amazing
Pvt_Twinkietoes@reddit
Oh that's actually quite interesting. How do you go about doing that?
false79@reddit
Do you use Cline workflows? If not read up on it.
/Prep-commit-files-and-commit-message.md
Pvt_Twinkietoes@reddit
Oh this is pretty cool. Thanks will check out.
I-cant_even@reddit
I probably am on the edge of what we consider a 'local' system (4x 3090s) but I'm using quantized versions of Llama 3 R1 Distill 70B and GLM 4.6 Air amongst others to great success. The system I've built is designed to handle some of the nuances of working with 'dumber' models but will be able to take advantage of any model, the quality of the final product varies by model quality but commercial viability is there.
redoubt515@reddit
> Are any of you using local llms for "real" work?
> but sometimes it feels like we're all still stuck in the "fun experimentation"
That's why I'm here.
I work with my hands, my interest in LocalLLMs has little to nothing to do with my "real work."
AI is a fun technical side-interest for me, and my path toward local LLMs stems from an interest in self-hosting, privacy and security, and control/transparency.
onetimeiateaburrito@reddit
This is why I mess with local models too. I'm a truck driver; I can't think of anything an LLM could help me with that I'd want to deal with setting up.
I could get it to look at construction reports state by state and enter in the pickup/delivery times and miles for my load and locations. But really, it would all be superfluous for me.
It is cool tinkering with them, though. I ran a LoRA tuning on Gemma 3 4B and it was difficult, but seeing what changed (and broke) in the model's outputs was interesting.
Firesworn@reddit
DeepSeek-OCR is a remarkably powerful system I can run on the same 3080 TI I use to play games.
hmsenterprise@reddit (OP)
But what do you use it for?
Firesworn@reddit
Processing documents that are too sensitive to use with cloud based LLMs. Being completely local we control the data flow and can promise clients that their data isn't being fed into the giant data collection they are training on.
bluesformetal@reddit
We run many products at scale, ~20M users (e-commerce), and currently use 7-12B LLMs. Of course they are useful for real work. You don't need gpt5-high for many use cases.
Goldstein1997@reddit
Elaborate?
makegeneve@reddit
I do a bunch of things professionally, using a cheap RTX4060 Ti 16GB. Krita + AI for generating ideation brainstorms in object design sessions. LMStudio with Qwen3-coder for kick-starting glue code when patching together workflows from different systems. LMStudio with gpt-oss 20B for drafting emails/reports/doing translations between English and French. All of these save me time, and time is money. On my todo list is PDF invoices to input into my ERP.
MercyChalk@reddit
My work involves developing local LLMs, but I accelerate that work almost exclusively with proprietary LLMs. Turns out the convenient systems, free credits, and fear of missing out trump any benefits of local LLMs.
I would guess the people seriously replying "yes" to this question all have relevant privacy concerns.
timedacorn369@reddit
I have used qwen 3 4b and other smaller models to read my work emails/chats and automatically give me action items/ due dates. Its a simple python code i used where I just fed in the chats/emails and asked to give action items and then used notion API to create tasks.
Still a work in progress but I did see some good results in limited testing.
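The glue for that kind of pipeline is small: one prompt builder and one Notion API payload builder. A sketch (the database property names "Name" and "Due" are assumptions about the target database, not timedacorn369's actual setup):

```python
import json
import urllib.request

def action_item_prompt(messages):
    """Ask the model for action items as machine-readable JSON."""
    return (
        "Extract action items with due dates from these messages as JSON, "
        'shaped like [{"task": "...", "due": "YYYY-MM-DD" or null}]:\n'
        + "\n---\n".join(messages)
    )

def notion_task_payload(database_id, task, due=None):
    """Request body for POST https://api.notion.com/v1/pages."""
    props = {"Name": {"title": [{"text": {"content": task}}]}}
    if due:
        props["Due"] = {"date": {"start": due}}
    return {"parent": {"database_id": database_id}, "properties": props}

def create_task(token, database_id, task, due=None):
    """Create the Notion page (needs an integration token with database access)."""
    req = urllib.request.Request(
        "https://api.notion.com/v1/pages",
        data=json.dumps(notion_task_payload(database_id, task, due)).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Notion-Version": "2022-06-28",
            "Content-Type": "application/json",
        },
    )
    return urllib.request.urlopen(req)
```

Asking for a fixed JSON shape up front makes the small model's output parseable enough to feed straight into the Notion call.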
PhotographMain3424@reddit
I use nvidia/Llama3-ChatQA-1.5-8B to index 2M similar insurance docs using ollama. I load the index into Meilisearch and sell access to it. I did this after a trip to Micro Center and spending a little over $3k.
Busy_Leopard4539@reddit
Yes, for research in history: generating metadata and OCR tasks.
hmsenterprise@reddit (OP)
Very cool
PhotographMain3424@reddit
It really does a great job of normalizing named entities, better than NLP, trained NLP or Regex. I verify the output tokens were in my input tokens which seems to cut down on rare hallucinations when the answer is not in the document.
hmsenterprise@reddit (OP)
Oh damn that is a great idea ... Idk why I have never thought to do that lol. I have done some entity extraction stuff and had a hell of a time keeping it consistently "factual" in its output
RemarkableAd66@reddit
Well, I have a 128GB MacBook Pro. I can run gpt-oss or GLM-4.5-Air in Roo Code and get OK results. But these models still do worse than the paid models, and they run slower due to poor prompt-processing speed on Apple silicon.
So if I am doing some coding I can either do:
1. Use Roo Code set to my local llama.cpp server.
2. Wait slightly longer than a paid model.
3. If results are not good enough then switch to a paid model and redo the work.
Or, I can
1. Use roo code set to Claude/Gemini/Deepseek/GLM/Kimi depending on the price I'm willing to spend.
2. Get results faster and with lower chance of having to redo the work.
And using the paid models can be very cheap if you stick to something like Deepseek 3.1 or GLM-4.6 and only bump it up to Gemini/Claude for more difficult tasks.
So I tend to use paid models for real work even when local models could do the job.
Baldur-Norddahl@reddit
I have been 95% local for coding for some time. I switched to GPT 120b because it is actually fast even with prompt processing. This is on a M4 Max MacBook Pro 128 GB.
Now I also have a server with RTX 6000 Pro. I run GLM 4.5 Air on that. This setup is faster than cloud.
Yes, the model is not the best. But it is my impression that few people actually pay for the very best for 100% of their jobs. That is just too expensive and often also too slow. So I don't think there is actually a big difference between what I am doing and what people running everything in the cloud are doing.
hmsenterprise@reddit (OP)
This is very interesting. How do you use that server though? Are you using it for dev mostly? And if so, do you just somehow point whatever Agentic IDE you're using to that server endpoint?
Baldur-Norddahl@reddit
Yes. There is no built-in security, so I use an SSH tunnel to port-forward and a firewall rule that blocks general access. I use vLLM, and my testing shows that a great number of developers could share this setup, although currently it is just me.
SkyFeistyLlama8@reddit
I've got half your RAM on another architecture but the same complaints apply. Prompt processing isn't great compared to a discrete RTX GPU and it's much slower than a cloud model.
I end up using a mix of cloud, medium and small LLMs for code. Throw architecture questions and Q&A to a cloud model, get something like Devstral or Qwen Coder 30B to write functions, and I keep a tiny 4B model on the NPU for autocorrect and syntax fixes.
hmsenterprise@reddit (OP)
Yeah this is exactly what I mean. I had a similar setup for some writing tasks and just always quickly gravitate back to paid cloud models.
hmsenterprise@reddit (OP)
and fwiw, I also have a 128GB MBP M4 Max
garloid64@reddit
Yeah qwen3-coder 30b can do small tasks for me in my actual job
Goozoon@reddit
Check also:
https://www.reddit.com/r/LocalLLaMA/s/RFTppCZxeU
Forgot_Password_Dude@reddit
Ya.
fab_space@reddit
I use it to learn and code monsters.
Not vibecoding
Past-Grapefruit488@reddit
Qwen3 Coder and Qwen3 VL for processing in air-gapped environments. This system extracted data from the last 10 years of documents.
Electrical_Job_4949@reddit
No. The quality gap with frontier api models is too wide.
chisleu@reddit
I've used models as small as qwen 3 coder 30b to do real work. https://convergence.ninja/post/blogs/000017-Qwen3Coder30bRules.md
I use GLM 4.6 locally every day for real work. Hell yeah, local LLMs are here bro. Hardware to run them is still expensive. But that will change a ton over the next decade as vendors have quickly realized LLM performance is critical to sales in the future.
uriahlight@reddit
I self-host an LLM solution I developed for a large construction company that looks for potential liabilities in multi-million dollar contacts alongside the applicable material spec sheets. It allows their estimators to identify pain points before the contract is manually reviewed by an attorney. The web app I built for them is hosted on AWS but the fine-tuning and inference is done on my local hardware.
Mescallan@reddit
I am a dev on Loggr.info. We built it around Gemma 3 4B to categorize daily journal entries locally and generate lifestyle recommendations.
Anything you can run locally is not going to be insanely powerful, so you need to build the flow around minimizing the complexity of each request.
The real advantage is privacy and cost. If either of those is something you are struggling with on API models, building with local models will be worth it; if you are not having issues in those categories, APIs are probably better overall.
pmttyji@reddit
Not yet, but from next year onwards (coding, writing, etc.).
FZNNeko@reddit
Yeah guys, I’m doing ‘work’ over here too. Don’t mind why my one-hand typing skills are as good as they are. Definitely a skill I developed through ‘work’ ;).
hmsenterprise@reddit (OP)
Lol yes I assume many people are trying to get around the cloud model guardrails with local llms
Pvt_Twinkietoes@reddit
I use them as zeroshot NER/Classification models. They're pretty decent at it out of the box. Training isn't too complicated, but for small repetitive tasks, they're often good enough.
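A zero-shot classification call along those lines usually comes down to a strict-JSON prompt plus a defensive parser. A minimal sketch, where the label set is made up and the actual model call is left out:

```python
import json

LABELS = ["invoice", "complaint", "spam", "other"]  # hypothetical label set

def build_prompt(text: str) -> str:
    """Zero-shot classification prompt that demands strict JSON output."""
    return (
        "Classify the text into exactly one of these labels: "
        + ", ".join(LABELS)
        + '.\nReply with JSON only, e.g. {"label": "spam"}.\n\nText:\n'
        + text
    )

def parse_reply(reply: str) -> str:
    """Pull the label out of the model's reply, tolerating extra prose
    around the JSON and falling back to 'other' on anything malformed."""
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end == -1:
        return "other"
    try:
        label = json.loads(reply[start : end + 1]).get("label", "other")
    except json.JSONDecodeError:
        return "other"
    return label if label in LABELS else "other"
```

The defensive parsing matters more than the prompt: small local models frequently wrap the JSON in commentary, and clamping unknown labels to a fallback keeps the pipeline from silently inventing categories.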
hmsenterprise@reddit (OP)
What are you doing that for, though? Is it an "important" task? That's what I'm trying to understand: how many people are actually using them for important workflows.
EpicSpaniard@reddit
The "fun experimentation" of local LLMs is work, to me. I work in security. Local LLMs provide value that SaaS providers can't: an actual guarantee of privacy and security of our data.
Also I'm using it to automate tasks at home that I have to do but don't want to. Nutrition tracking, organising my documentation of my home network, etc, all count as work to me.
hmsenterprise@reddit (OP)
Agreed on the privacy/security front ofc. Why do you need an LLM to help with home network documentation though? How is that better than sticking it in an apple note or something? (genuinely curious--not criticizing)
EpicSpaniard@reddit
I have ADHD and my personal documentation is beyond unintelligible - or so I thought. AI tidies it up nicely, rewrites it for me, and I can actually understand it when I need to reference it 3 weeks later.
I cannot structure documentation for the life of me - so my alphabet soup mess of notes becomes somehow professionally displayed and visually appealing.
hmsenterprise@reddit (OP)
OK it's ironic that I said "why don't you stick it in an apple note" ... because 1. I also have diagnosed executive functioning issues (ADHD etc), 2. My home network is a freakin byzantine disaster and my documentation for it is scattered everywhere and I dread making changes or interacting with the network for this reason lol.
Do you store it in a markdown file on your PC and edit it with AI in your IDE/VSCode? or how do you do it?
EpicSpaniard@reddit
Obsidian, all markdown, and use it integrated with ollama running locally. Just open up a chat bot with it having access to the current note as context, and request that it rewrites it. Juggle the prompt a little until it's right.
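That workflow can be sketched against Ollama's REST API (`/api/generate` is the real endpoint; the model name and prompt wording here are assumptions):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama endpoint

def build_rewrite_prompt(note: str) -> str:
    """Prompt asking the model to restructure a messy markdown note."""
    return (
        "Rewrite the following markdown note so it is clearly structured, "
        "with headings and bullet lists. Keep every fact; invent nothing.\n\n"
        + note
    )

def rewrite_note(note: str, model: str = "qwen3:8b") -> str:
    """Send the note to a local Ollama server and return the rewrite."""
    payload = json.dumps(
        {"model": model, "prompt": build_rewrite_prompt(note), "stream": False}
    ).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Pointed at a note exported from Obsidian, `rewrite_note(Path(note).read_text())` would return the cleaned-up markdown, assuming Ollama is serving the model locally.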
hmsenterprise@reddit (OP)
🙌
imtourist@reddit
Yeah, I use a local LLM and a vector database to help me with my corporate taxes. I first fed the past 3 or 4 years of expenses, along with their categorizations (meals, fuel, office supplies, etc.), into a Postgres pgvector database, embedding the data with the Nomic model. I then classified several hundred transactions for the past tax year against this vector set.
What normally takes me a few hours, it did in a few seconds, and then I did some cleanup afterwards. Of course it took me several hours of software development to write the whole thing, but I learned a lot and now have a tool to use in the future.
hmsenterprise@reddit (OP)
OK now we're talking! This is one of the more interesting "local LLMs did something very useful for me on consumer-ish hardware"
Available_Hornet3538@reddit
No. Can't afford the hardware. Would need to shell out 5k or so. Just not worth it. Love to play with them though.
ayylmaonade@reddit
Absolutely. I use Qwen3-VL-30B-A3B (thinking variant) as my "personal assistant" on a day-to-day basis and have done with local LLMs since the start of this year. My main usecases are general Q&A, using it like a search engine, researching topics, explaining things to me, coding, image/video analysis + identification, solving problems for me such as programming related issues, general stuff, or maths, and occasional translation.
I do use it for fun too, but it's mainly a genuine assistant. Also, I disagree with what you're saying about there being a gap between local and cloud. The only real restraint locally is compute, and nowadays we have incredibly intelligent models (especially MoEs) that can easily fit into 24GB of VRAM, or hell, much lower if using both CPU + GPU inference offloading, particularly with MoE models. Stuff like Qwen3, Mistral Small, GPT-OSS-20B, Ernie-4.5-21B-A3B, Magistral, etc.
So yeah, being able to use models that are legitimately better than Gemini-2.5-Flash (Reasoning) locally is amazing to me and extremely valuable for my daily life and also my job. I feel like I've got a 24/7 secretary.
hmsenterprise@reddit (OP)
Wow. Do you run all of that on the same machine you dev on? or do you have a separate "server" rig?
ayylmaonade@reddit
Yep! Run it all on the same machine. I keep Qwen3-VL-30B-A3B-Thinking fully offloaded to my GPU pretty much 24/7, set to a context length of 104K, with flash attention and the K/V cache quantized to Q8_0. It ends up using about ~22GB of my 7900 XTX's VRAM, but as an MoE with only 3.3B active params during inference, it runs at about 150 tokens/second.
You really don't need much more than an average gaming PC to run some of the best local models. My buddy who has an RX 6600 (a rather weak 8GB GPU) runs Qwen3-VL-4B-Instruct @ Q4_K_M on his machine, fully GPU offloaded. He gets about 40 tokens/second.
Local AI has come a really long way recently.
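For reference, that kind of setup might map to llama.cpp's `llama-server` flags roughly like this (the model filename, quant, and port are assumptions):

```shell
# A sketch of llama.cpp server flags matching the setup described above;
# the model path and port are assumptions.
llama-server \
  -m ./Qwen3-VL-30B-A3B-Thinking-Q4_K_M.gguf \
  -ngl 99 \
  -c 104000 \
  -fa \
  --cache-type-k q8_0 \
  --cache-type-v q8_0 \
  --port 8080
# -ngl 99 offloads all layers to the GPU; -c sets the ~104K context;
# -fa enables flash attention; --cache-type-k/v quantize the KV cache to Q8_0.
```

Quantizing the KV cache to Q8_0 is what makes a 100K+ context fit alongside the weights in 24GB of VRAM.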
hmsenterprise@reddit (OP)
Wow! You've got quite the optimized setup 🙌
Savantskie1@reddit
You’d be surprised what you can get away with from general gaming gear. Especially graphics cards with more than 8GB of ram.
clebo99@reddit
My MSI works great for my local LLM.
Blaze344@reddit
I take a LOT of value out of it for myself.
I have a gaming PC with a 7900XT with 20gb of VRAM and I happily use GPT-OSS-20b, KQ8 VQ8 with max context (130k), at 130 tok/s at the start and 100 tok/s at the end of context. I provide it to my local network using LM studio and connect by URL from my company's laptop either using Codex (with an LM studio profile) or using Jan (for adhoc Q/A). This works absolutely great in the sense that I can freely send all kinds of crazy stuff directly to the assistant and not have a care in the world about sensitive data, let it run free in private repositories and glance at anything it needs and not have a damn in the world regarding what I feed it. OSS-20b is consistent enough to be able to work with codex itself and can create one-shot ad-hoc scripts for several things if instructed to do so, as well as entirely manage my git usage with more complex stuff that would be part of a release that would otherwise bore me to death (like figuring out which commits are which features, creating branches for them, merging them all, guaranteeing it's all nice and clean).
With Codex, I know what ticks the latent space in the right way so I provide the right context in my code (functions, methods, names, locations, files) and ask the right questions (descriptively narrate the problem, what I can do, what is the expected result) and the model toils away at boring / menial stuff while I do something else in the 6-8 minutes per task. (At some point, one might argue that it would have been better to do some of those things myself if I do describe them with such detail, but I'd argue that I'm much better at breaking down higher level concepts into lower levels than I am with managing pure syntax, so the assistant is really damn helpful. And I really hate doing some of the more menial stuff).
It also works reasonably well to translate my ideas into appropriate spark code when I forget the names of some of the spark utility helpers, and Codex also allows the model to do the aforementioned git boring stuff by itself and assisting me with some more esoteric git stuff. Finally, I also use it as a "final pass" for code reviewing merge requests (mine and from my team's) as it sometimes catches small inconsistencies that I glanced over (thanks to the thinking stuff and the large context and the speed of inference). The model is REALLY GOOD at short bursts of capabilities, in the sense that you wouldn't feed it an entire SQL script for Pyspark adaptation and hope it works one-shot, but rather, I can parse the original SQL script using an AST and feed it small parts (CTEs, large self-contained expressions, etc) and be 99% sure it will do it okay.
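The parse-and-chunk idea can be sketched with a paren-balanced scanner. A real pipeline would use a proper SQL AST parser (e.g. sqlglot); this simplified version only handles a single top-level WITH clause:

```python
import re

def split_ctes(sql: str) -> dict[str, str]:
    """Split a top-level WITH clause into {cte_name: body} chunks so each
    piece can be sent to the model separately. Paren-balanced scan; a real
    pipeline would use a proper SQL AST parser instead."""
    ctes = {}
    m = re.match(r"\s*WITH\s+", sql, re.IGNORECASE)
    if not m:
        return ctes
    i = m.end()
    while True:
        name_m = re.match(r"\s*(\w+)\s+AS\s*\(", sql[i:], re.IGNORECASE)
        if not name_m:
            break
        name = name_m.group(1)
        j = i + name_m.end()          # position just after the opening '('
        depth, start = 1, j
        while j < len(sql) and depth:  # scan to the matching close paren
            if sql[j] == "(":
                depth += 1
            elif sql[j] == ")":
                depth -= 1
            j += 1
        ctes[name] = sql[start : j - 1].strip()
        i = j
        comma = re.match(r"\s*,", sql[i:])
        if not comma:
            break
        i += comma.end()
    return ctes

sql = """WITH orders_2024 AS (SELECT * FROM orders WHERE yr = 2024),
         totals AS (SELECT cust, SUM(amt) AS s FROM orders_2024 GROUP BY cust)
         SELECT * FROM totals"""
```

Each extracted CTE body then becomes a small self-contained prompt, which is exactly the "short bursts of capability" regime where a 20B model is reliable.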
Finally, in my most recent experiment, I downloaded all of the Databrick's documentation and converted it to pure raw text, and set up a folder where I can just ask Codex running on that folder to "deep research" in the documentation to ask whatever I want of it. OSS-20B does that very well and fast and has not failed me so far. I suspect I can improve its capabilities by doing something similar by providing my local agent access to updated documentation that it can read and grep freely, and improve its context management as it looks for its own "few shot" examples in the documentation.
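The grep-style "deep research" retrieval over a local docs dump could look roughly like this (the `.txt` extension, context window, and hit limit are assumptions):

```python
from pathlib import Path

def grep_docs(root: str, query: str, context: int = 2, limit: int = 5) -> list[str]:
    """Naive retrieval over a converted-to-text docs folder: scan every
    .txt file under root and return matching lines with a few lines of
    surrounding context, ready to paste into a model prompt."""
    hits = []
    needle = query.lower()
    for path in Path(root).rglob("*.txt"):
        lines = path.read_text(errors="ignore").splitlines()
        for i, line in enumerate(lines):
            if needle in line.lower():
                lo, hi = max(0, i - context), i + context + 1
                hits.append(f"{path.name}:{i + 1}\n" + "\n".join(lines[lo:hi]))
                if len(hits) >= limit:
                    return hits
    return hits
```

An agent that can call something like this (or plain `grep`) effectively fetches its own few-shot examples from the documentation, which is what makes the approach work without a vector store.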
hmsenterprise@reddit (OP)
This is excellent commentary. Thank you for taking the time to share. Re: the 6-8min task length and "why not do yourself" -- related to what you describe is the frame of mind context shifting required between high level conceptual thinking and design vs lower level pipeworks code implementation. For some engineers, this isn't that big of a deal, but for me it is extremely difficult and costly to context switch mentally between those modes. So, I hear you!
I like your idea of having a downloaded "verified context data source" to pull from as needed
Far_Statistician1479@reddit
I work with local LLMs professionally. So does everyone else in an industry with any degree of data sensitivity.
Aggressive-Bother470@reddit
Of course.
96GB VRAM, gpt120, 160t/s base. When I need more context I go back to Qwen 2507. I occasionally use Seed and Devstral, too.
You have to get creative sometimes and coax it to not attempt to read every file in the project, line by line.
hmsenterprise@reddit (OP)
So you're using it to write code? What wrapper/IDE are you using them in?
Aggressive-Bother470@reddit
roo
munkiemagik@reddit
As someone who doesn't have anything to do with tech/IT professionally or aspirationally, my interest and experimentation with LLMs has just been to scratch an itch. I've taken it to the point of building a Threadripper server with multiple 3090s (and a 5090, but that 5090 was specifically bought for PCVR use and is moonlighting in the LLM server, because who doesn't want 80GB of VRAM if it's just sitting there).
But because I don't really have an aim, I don't really have any idea of what to do with all this in a meaningful way. So I'm just tinkering and pottering about willy-nilly, trying out random things that catch my attention, which makes learning and understanding less efficient. That's why I dip into these posts all the time.
The most useful things I've managed to do with it the last month or two have been a few random apps and scripts, but the purpose of those, again, was really just to see if I could, rather than something I genuinely needed.
I think, just for my own case, I'm slowly coming to the conclusion that it's OK to say I'm doing it for fun. But if I really intended to achieve something with it, then even with the not-insignificant cost of hardware I've committed, I still feel it's better to use non-local LLMs to actually get productive things (within my sphere of conceivability) done. The cost for me so far far outweighs its usefulness and massively overshadows how little it would have cost me to get the same things done had I just paid for tokens.
So for now I'm just putting it mentally in the same hobby box as PCVR simracing, 3D printing, etc.: something I dip into now and then when I feel like it. I'll spend what I want on it as long as it's giving me continued enjoyment in return, e.g. being able to run GPT-OSS-120B at over 40t/s was worth buying another 3090, lol. And if I ever get bored of it all I can always sell the hardware, so it's never really money down the drain.
And I lurk around here in these subs constantly in the hope that some day I see something that inspires me to feverish diligence and genuinely productive output.
hmsenterprise@reddit (OP)
Yes it is a blast, without a doubt! Not trying to besmirch that. I have multiple PCs just for AI stuff running in my garage. This question just came from curiosity after realizing I haven't actually met anyone who is using local LLMs for much beyond uncensored image generation and the occasional Very Technical Person who is using them for specific, constrained tasks within their development workflows.
robberviet@reddit
Coding assistant, RAG, and deep research are real work for me. Not with local LLMs, though; local LLMs are for fun.
XiRw@reddit
I like asking it real questions, pondering philosophy and psychology, and having it as a supplemental therapist, but it wouldn't make sense for coding, since even the current best models struggle with that.
Savantskie1@reddit
I disagree. The big models available online to download are pretty good for coding as long as you lay out a plan with them and do it in chunks. That's how I've been able to create my first memory system considering that I barely understand the basics of coding. And I do it in chunks so I can learn as I go. I'm not coding completely on my own yet, but I can finally grasp the basics and general flow of what's happening. Granted, it's only Python, which seems to be the only noob-friendly language other than early HTML, but I'm learning.
neoscript_ai@reddit
Sure, I am using local LLMs in healthcare for research, questioning, transcription and summarization
DuncanEyedaho@reddit
The only real "work" related use for me is summarizing documents where I don't want to upload them, because I don't have HIPAA or BAA agreements with any of the big companies (yet).
I've been pleasantly surprised with what little old llama_3.2_3B_q4_instruct can pull off. It's nowhere near any of the big models, but it handles language pretty darn well.
I even plopped it in a little project of mine and wrote my own RAG based episodic memory, but that was purely experimentation.
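A toy sketch of that episodic-memory idea, with token overlap standing in for real embedding retrieval (the stored snippets are made up):

```python
class EpisodicMemory:
    """Toy sketch of RAG-style episodic memory: store past exchanges and
    retrieve the most relevant ones by token overlap. A real version would
    use embeddings and a vector index instead."""

    def __init__(self):
        self.episodes: list[str] = []

    def remember(self, text: str) -> None:
        self.episodes.append(text)

    def recall(self, query: str, k: int = 3) -> list[str]:
        q = set(query.lower().split())
        scored = sorted(
            self.episodes,
            key=lambda ep: len(q & set(ep.lower().split())),
            reverse=True,
        )
        return scored[:k]

mem = EpisodicMemory()
mem.remember("user prefers dark roast coffee")
mem.remember("user's router is at 10.0.0.1")
print(mem.recall("what coffee does the user like", k=1))
# → ['user prefers dark roast coffee']
```

The recalled episodes get prepended to the model's prompt each turn, which is how a 3B model can appear to "remember" earlier conversations.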
FullOf_Bad_Ideas@reddit
yes I use them in real workflows but they're not hosted on my own hardware.
Many companies and people rent GPUs. What do you think they run there? They run models, often those based on open weights.
I believe that llama 3.3 70B instruct was popular with in-house enterprise projects recently.
hmsenterprise@reddit (OP)
That's kind of my point, though. It's a much different experience to be able to hit "rented GPU" endpoints at will and in parallel, most of which are beefier than consumer hardware.
Idk about llama-3.3-70b in enterprise projects -- haven't seen anything but interested to learn what they're actually doing
MitsotakiShogun@reddit
One of the other departments in the (S&P500) company I work at used it as a base to finetune, for usage in one of our most critical products. It's a surprisingly decent base, especially for more serious tasks. Also another team used finetuned small models (<10B) for "less critical" products... with ~3M daily page views :D
But we also use models on/from Anthropic, OpenAI, Bedrock, Azure, and probably every other commercial offering available from NA/EU countries.
FullOf_Bad_Ideas@reddit
When you want to do work you want to parallelize and make it efficient.
Kinda like having people working in an office/factory instead of building Ford cars in their gardens and bringing parts together for an assembly in a football field. It's natural.
I do use local GLM 4.5 Air for coding assist on some work related projects which I know it can help me with. I have access to Codex and CC so it's not saving anyone any money, but I have local bias.
hmsenterprise@reddit (OP)
Nice
Terminator857@reddit
I setup local code development and advised a client on how to do the same. They wanted to make sure the code did not leave the premises.
I suggested using Qwen3 Coder. For a low-end workstation I'd suggest Strix Halo; for mid-tier, a 5090 system; and for high-end, an RTX Pro 6000 build.
hmsenterprise@reddit (OP)
Have you set up any rigs for clients with rtx pro 6000s? What were they doing
Terminator857@reddit
No, I don't set up, I just advise. They were just coding for a startup.
oodelay@reddit
Lots of classification of photos and texts, plus keyword extraction. Although BERT can do some of it, I find I get a better retrieval score using Mistral 24B for technical documents in different languages. I'm going to use Qwen OCR to strip PDFs with tables and graphs. I also use Gemma 12B to read the beginning of a JSON and create a parsing formula I can drop in anywhere else to grab those files, and Mistral again to create thousands of question-answer pairs for fine-tuning a LoRA on a much smaller model... and more every day.
hmsenterprise@reddit (OP)
Wait I don't understand that "parsing formula I can put anywhere else to grab those files" ... what does that mean?
Also why do you want to classify all that stuff (photos, texts, etc) -- for search reasons?
Freonr2@reddit
Relevant post over on /r/cline recently that is a bit more specific to using as a programming assistant, you can read my response there:
https://old.reddit.com/r/CLine/comments/1osi1wq/which_open_source_model_do_you_recommend_that_i/
hmsenterprise@reddit (OP)
Yes that has been roughly my experience as well