What actually pushed you to commit to running local models full time?

[-]

Stepfunction@reddit

I never committed to local models full time. When I need privacy, I use local. When I need quick answers, I go to Google's AI Mode. When I need to code, I use GitHub Copilot.

As much as I hate to admit it, my 24GB of VRAM is pretty limiting. Once you get a taste of ChatGPT coding for you, it's hard to go back.

[-]

Necessary-Summer-348@reddit (OP)

Hybrid is probably the right call for most workflows, honestly. The cloud vs. local debate assumes it's binary because that makes for a cleaner argument, but using each where it makes sense is just obvious once you've got both set up.

[-]

PollinosisQc@reddit

I have a 3070 with 8 GB of VRAM, so the kinds of models I could run weren't particularly useful for a while. But something flipped recently: the newer models in the 4B to 8B range became much more capable. I'm obviously not doing hard reasoning tasks or advanced agentic stuff with them, but they're great for tasks like classification, redaction of personal info, basic creative writing or translation, etc.

Basically for me they went from "fun toys" to actual tools with niche uses, so they're now included in actual workflows where I don't see the need to pay for frontier model tokens.
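
For anyone checking whether a model in that size range fits on a card like this, here is a rough back-of-the-envelope sketch. It only counts the weights (parameters times bits per weight) plus a flat allowance for everything else. The function names and the 1.5 GB overhead default are assumptions made up for illustration, and real quantized files usually come out somewhat larger than the nominal bit width suggests.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


def fits_in_vram(n_params: float, bits_per_weight: float, vram_gb: float,
                 overhead_gb: float = 1.5) -> bool:
    """Rough fit check.

    overhead_gb is an assumed placeholder for KV cache, activations and
    runtime buffers; it is not a measured value.
    """
    return weight_memory_gb(n_params, bits_per_weight) + overhead_gb <= vram_gb


if __name__ == "__main__":
    # Compare an 8B-parameter model at 16-bit and 4-bit on an 8 GB card.
    for bits in (16, 4):
        size = weight_memory_gb(8e9, bits)
        print(f"8B @ {bits}-bit: ~{size:.1f} GB weights, "
              f"fits in 8 GB: {fits_in_vram(8e9, bits, 8)}")
```

By this estimate an 8B model at 16-bit needs about 16 GB for weights alone, while a 4-bit quant of the same model needs about 4 GB, which is why this size class became practical on 8 GB cards.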

[-]

Necessary-Summer-348@reddit (OP)

That transition from 'not useful yet' to 'this is actually good enough' happened faster than most people expected. Quant improvements changed the math entirely for mid-range cards. If you're building anything on top of those runs, Sloppr is worth tracking for the distribution layer.

[-]

FusionCow@reddit

I already had a 3090

[-]

Necessary-Summer-348@reddit (OP)

The best local setup is the one you already paid for.

[-]

ProfessionalSpend589@reddit

Rumors, which by the second half of last year looked certain to come true, that hardware would increase in price because production was shifting to servers for LLMs.

[-]

Necessary-Summer-348@reddit (OP)

Smart call. Hardware that you own compounds in value as cloud costs go up. The asymmetry only gets better over time.

[-]

asfbrz96@reddit

ADHD

[-]

Necessary-Summer-348@reddit (OP)

Valid technical justification. No rate limits and no waiting room removes most of the friction that kills focus.

[-]

nomnom2001@reddit

I feel that one. I'm so close to pulling the trigger on a cheap used workstation and having local models Dx

[-]

FlexFreak@reddit

Latency, speed and coil whine

[-]

Necessary-Summer-348@reddit (OP)

The coil whine is doing something for you psychologically. Cloud has no coil whine. Cloud is silent and unaccountable.

[-]

the_bollo@reddit

Coil whine?

[-]

SweptThatLeg@reddit

Distrust in the future

[-]

Necessary-Summer-348@reddit (OP)

Distrust in the future is a feature of the local stack, not a bug. Your compute, your data, your output.

[-]

jacek2023@reddit

I use clouds like ChatGPT or Claude Code and I also use local models.

I use closed-source software, for example Lightroom/Photoshop/DaVinci Resolve, but I also use lots of open-source software.

Local instead of cloud and open instead of closed is something natural for me, maybe because I am a programmer and have used computers since the early 90s.

[-]

Necessary-Summer-348@reddit (OP)

Hybrid is probably the right call for most workflows right now. The piece that's still missing is a clean monetization layer for whatever you build locally. That's what Sloppr is working on.

[-]

Lissanro@reddit

In short, I needed reliability and privacy.

I had experience with ChatGPT in the past, starting from its beta research release and for some time after, and one thing I noticed was that as time went by, my workflows kept breaking: the same prompt could start giving explanations, partial results, or even refusals, even though it had worked in the past with a high success rate. Retesting every workflow I ever made and trying to find workarounds for each one, every time they push some unannounced update without my permission, is just not feasible for professional use. Usually when I need to reuse a workflow, I don't have time to experiment.

Not to mention that as I started integrating more AI into my workflows, data privacy became an important concern, especially for agents that can navigate and process my files. Even within one code base I can have private data, and many projects I work on do not even allow me to send data to a third party.

For these reasons, I strongly prefer running things locally, so I can be sure no one will ever pull the old model I depend on, or change it somehow without my approval.

For general things, I prefer Kimi K2.5, one of the best models currently that I can run on my own PC. I like that it was released in INT4 format that maps nicely to Q4_X GGUF without loss of quality. I am also downloading GLM 5.1 to see how it compares, but the point is, I am in full control - I can still use any old model I choose for as long as I want, or switch models as I desire.

I use smaller models too. When it comes to developing focused workflows or agents for a specific type of task, nothing beats optimizing for the smallest possible model. For simple cases some prompt engineering may be sufficient, but fine-tuning can help even more, especially with the smaller models. This approach lets me build dependable workflows that, once tested and proven to have a certain reliability, will stay that way until I myself decide to change something in them.

[-]

Necessary-Summer-348@reddit (OP)

Reliability plus privacy is a hard combo to get from cloud. The next problem after you solve the infra layer is usually distribution, which is where tools like Sloppr come in if you're building agents on top.

[-]

PotatoQualityOfLife@reddit

I'm doing this now, and it's purely for one reason: price. If I could run on Sonnet for free I'd 100% just do that. But API costs ain't cheap... :-/

[-]

Necessary-Summer-348@reddit (OP)

Price is the honest answer most people won't say out loud. The interesting shift is when the cost savings let you actually ship something. If you're building on top of local, Sloppr is worth a look for the monetization side.

[-]

qwen_next_gguf_when@reddit

Side projects need cheap tokens and sometimes deepseek is too slow.

[-]

Necessary-Summer-348@reddit (OP)

Running local handles the latency but monetizing what you build on top is still messy. Sloppr is trying to solve that layer if you're building anything agent-facing.

[-]

Hector_Rvkp@reddit

Optionality. Relying on cloud alone is risky for lots of reasons. Being dogmatic to solely run locally doesn't make sense either, like insisting on using a Minitel when the internet started scaling up would have been retarded.
The skill / redundancy aspects haven't been mentioned in the comments here yet. We know labs poison models. We know the current price of tokens will change. It makes sense to build a skillset around managing local vs cloud, KV cache management / context window, learning to use the right model for the right task as opposed to defaulting to SOTA for the simplest of requests, and so on.
It's never smart to be dogmatic, and it's never smart to blindly trust anyone, especially big tech. Always have a plan B.
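
To make the KV cache / context window point concrete, here is a minimal sketch of the usual size estimate for a standard transformer that stores one key and one value tensor per layer. The function name and the example model shape are illustrative assumptions, not the specs of any particular model, and architectures with sliding-window or compressed attention will not follow this formula exactly.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_elem: int = 2) -> int:
    """Estimate KV cache size in bytes for a single sequence.

    The leading factor of 2 covers keys and values; bytes_per_elem=2
    assumes an fp16 cache (a quantized cache would be smaller).
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem


if __name__ == "__main__":
    # Hypothetical model shape: 32 layers, 8 KV heads, head dimension 128.
    for ctx in (8192, 32768):
        gib = kv_cache_bytes(32, 8, 128, ctx) / 1024**3
        print(f"context {ctx}: ~{gib:.1f} GiB of KV cache")
```

The cache grows linearly with context length, so quadrupling the context window quadruples this part of the memory budget on top of the weights.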

[-]

Necessary-Summer-348@reddit (OP)

Exactly right. Optionality is the actual value. Use cloud when it makes sense, local when it doesn't, instead of being locked into one.

[-]

Bird476Shed@reddit

Reproducibility. This gguf file, with this build of llama.cpp, will work now, tomorrow, in 1y, in 5y ... the same. And in 10y I may have to put it in a VM to get it going again, but it will still work the same. And I don't have to ask anyone's permission or pay again for that.

Offline use, all data stays local/private.
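
One way to pin down "this gguf file, with this build" is a small manifest that records the file's SHA-256 next to a free-form build label, so you can check later that nothing changed. This is a minimal sketch using only the Python standard library; the function names, the manifest layout, and the `runtime_build` string are hypothetical and not part of llama.cpp or the GGUF format.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so multi-gigabyte model files are not loaded at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(model_path, runtime_build, manifest_path):
    """Record the model's checksum together with a label for the runtime build.

    runtime_build is whatever string you use to identify your build
    (for example a commit hash you noted down yourself).
    """
    manifest = {
        "model_file": Path(model_path).name,
        "sha256": sha256_of(model_path),
        "runtime_build": runtime_build,
    }
    Path(manifest_path).write_text(json.dumps(manifest, indent=2))
    return manifest


def verify_manifest(model_path, manifest_path):
    """Return True if the model file still matches the recorded checksum."""
    manifest = json.loads(Path(manifest_path).read_text())
    return sha256_of(model_path) == manifest["sha256"]
```

Calling `write_manifest` once when a workflow is validated and `verify_manifest` before each later run gives a cheap check that the same bytes are being loaded.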

[-]

Necessary-Summer-348@reddit (OP)

Reproducibility is underrated in this conversation. Most people focus on benchmarks but stability over time is what actually matters if you're building on top of it.

[-]

TheDailySpank@reddit

Security. No rate limits other than my hardware's capabilities. Keeps me warm at night.

[-]

Necessary-Summer-348@reddit (OP)

No rate limits is the sleeper benefit. You realize how much cloud throttling was shaping your workflows without you noticing.

[-]

ASMellzoR@reddit

Censorship, subscription fees, privacy, lack of control.
Companies deciding to change token limits / costs, lobotomizing models or sunsetting them outright.
Outages during peak hours, and after all of that, you're just providing them with more training data on top of paying them? Hell nah.

[-]

Necessary-Summer-348@reddit (OP)

The token limit changes were the tell for me too. You shouldn't have to wonder whether the model you're paying for tomorrow is the same one you used today.

[-]

yami_no_ko@reddit

The term "enshittification" was written on the wall from the beginning, so once I found myself enjoying LLMs, I knew I needed a local setup that works on my terms instead of someone else's.

[-]

Necessary-Summer-348@reddit (OP)

Enshittification is exactly the right frame. The value-extraction playbook is pretty predictable at this point.

[-]

waitmarks@reddit

I realized early on that cloud models were unsustainable. They either have to make them worse or way more expensive, or more likely both. Right now we are in a subsidized era, like the early days of Uber, when people were taking Ubers for everything and saying things like "why own a car when I can just take an Uber everywhere?"

I don't want to be caught reliant on cloud models when that transition happens. So I refuse to use them other than to test and compare against my local setups.

[-]

Necessary-Summer-348@reddit (OP)

The pricing math compounds the longer you use it. Cloud is convenient until the invoice doesn't match what you expected.

[-]

HopePupal@reddit

look at that post history, this guy's a bot