It finally happened, I actually had a use case for a local LLM and it was brilliant
Posted by EntertainerFew2832@reddit | LocalLLaMA | 100 comments

I've had aerosinusitis a few times before in my life and it was fairly painful, but not something that happens often. Today on a flight I had an overwhelming bout of it, the pressure was genuinely unbearable, and I had no painkillers with me.
I was on a cheap flight, in the cheap seats, so no Wi-Fi.
I've been playing around with local LLMs on my laptop for a year or so, but it's always been pure novelty. It suddenly dawned on me that I could use Gemma 4 mid-air, and so I pulled out my laptop and asked for any way I could possibly reduce the pain.
The Toynbee Maneuver, which I had never in my life heard of, slowly but surely relieved the pressure. Within 10 mins I felt completely fine.
It may sound trivial, but without local AI I would have been in blinding pain for probably 90 mins – so it was a rare moment when new technology actually made a palpable difference to my life.
Sharing this here because my wife didn't care and I felt if anyone would appreciate this small win it would be this community.
PassengerPigeon343@reddit
This is a great story and exactly why, even though I run some heavier models locally on my server at home, I always have small on-device models ready to go.
I haven’t had anything this extreme, but I have had a few occasions where I’ve been somewhere without any Internet access and been able to get some useful information.
reflectivecaviar@reddit
What do you use to run models locally on the go?
PurpleWinterDawn@reddit
I went at it the hard way: pure llama.cpp compiled from source under Termux. I run Unsloth's LFM2-8B-A1B Q4_K_M at 85 tok/s prompt processing / 30 tok/s generation, through the llama.cpp WebUI.
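For anyone curious, the build boils down to a handful of commands (the model path here is just whatever quant you pulled):

```
pkg install git cmake clang
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release -j
# llama-server bundles the WebUI; open http://localhost:8080 in the browser
build/bin/llama-server -m ~/models/LFM2-8B-A1B-Q4_K_M.gguf --port 8080
```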
Only gripe I have for now is that I can't get the Hexagon NPU running by compiling it on-device.
Exciting_Variation56@reddit
Goddamn that is the hard way I love this
PurpleWinterDawn@reddit
Gotta do what you gotta do to keep it local. I don't trust mobile apps that ride on a fad and are either snitching on me with telemetry or potentially stealing my bank credentials 🙃 Another advantage, updates are just a git pull and recompile away!
thunderboltspro@reddit
I'm trying out Google AI Edge Gallery, which has Gemma 4. Haven't played around with it too much.
PassengerPigeon343@reddit
Just realized you might have meant what app rather than what models. I’ve been using Locally AI and OnDevice-AI for my phone and tablet. LocallyAI is a little cleaner and seems to be well supported with new models coming relatively quickly after release.
PassengerPigeon343@reddit
Usually the small Gemma models which work surprisingly well, but I just downloaded Bonsai 8B and it’s pretty good so far!
AnticitizenPrime@reddit
I used offline Gemma on the long (14+ hour) flight to Japan to brush up on basic Japanese phrases. Very handy.
EntertainerFew2832@reddit (OP)
That's such a cool use case!
GamerHaste@reddit
You could try setting up a Telegram integration with your local models hosted on a vLLM server or something, so you can access your bigger models from your phone or laptop. I just recently set this up with Gemma 4 using LM Studio/AnythingLLM
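The bridge itself can be tiny. A minimal sketch, with the bot token, server address, and model name as placeholders for your own setup (vLLM speaks the OpenAI chat API, so it's really just two HTTP APIs glued together):

```python
import requests

BOT = "https://api.telegram.org/bot" + "YOUR_BOT_TOKEN"  # placeholder token
LLM = "http://192.168.1.50:8000/v1/chat/completions"     # your vLLM box

def ask_llm(text):
    # vLLM exposes an OpenAI-compatible endpoint; model must match what you served
    r = requests.post(LLM, json={
        "model": "gemma-4-27b",  # placeholder name
        "messages": [{"role": "user", "content": text}],
    }, timeout=120)
    return r.json()["choices"][0]["message"]["content"]

offset = 0
while True:
    # long-poll Telegram for new messages
    updates = requests.get(f"{BOT}/getUpdates",
                           params={"timeout": 30, "offset": offset},
                           timeout=60).json()
    for u in updates.get("result", []):
        offset = u["update_id"] + 1
        msg = u.get("message") or {}
        if msg.get("text"):
            requests.post(f"{BOT}/sendMessage",
                          json={"chat_id": msg["chat"]["id"],
                                "text": ask_llm(msg["text"])})
```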
PassengerPigeon343@reddit
I have Tailscale set up to access them from anywhere, but the on-device models are for instances where I have no internet access. Rare but it does happen.
GamerHaste@reddit
Ohh I see makes sense, that's cool!
SkyFeistyLlama8@reddit
With laptop charging being a thing on flights now, it's great being able to run heavier models like Qwen 35B or 80B on a plane for coding or to chat about a paper.
thavidu@reddit
What gpu/vram and system ram do you have that 26Ba4B fits on your laptop 😮
pas_possible@reddit
So cool, it's crazy to have such a tiny (I mean, it's not huge) model with that much knowledge baked in
NinjaOk2970@reddit
We used to think 100M params were huge
techno156@reddit
It's not like you'll ever need more than 3B parameters. Any more is simply excessive.
bnolsen@reddit
Billions == tiny. I think I want your salary.
pas_possible@reddit
I mean it's not DeepSeek kind of size, that's what I meant. It's not Gemma 270M either, but it's consumer-attainable if you have enough money to pay for a good computer
FenderMoon@reddit
It's amazing how smart these models are.
I sometimes use them when I need medical advice too, simply because I don't love the idea of all of that being done on a cloud AI in case there were a data breach or something.
twi6@reddit
Resorting to AIs for medical advice is Darwin at work, I'd say. ;-)
toptier4093@reddit
As someone who has seen more doctors than I can count, AI is pretty damn good at doing what doctors aren't. The vast majority of doctors I've had appointments with had no clear answers for the debilitating health issues I was experiencing. They would just give me a folder about dealing with my symptoms, then proceed to tell me many of their patients were experiencing what I am. Zero intent to actually figure out what was happening to me.
AI on the other hand has been able to give me various solid leads and ideas to help me figure out more about what I'm dealing with. It's not perfect, but a five minute chat with a good model gives me better info than a one hour appointment with most doctors.
Zhelgadis@reddit
What models do you use? Many refuse to give medical advice
DertekAn@reddit
I think it's sufficient to use uncensored models; with those, in principle, anything is possible, and good answers will come out.
DertekAn@reddit
That's kind of interesting. Previously, large AI models were always overwhelmed by my symptoms and gave incorrect answers or even hallucinated, and I'm also someone who's seen more doctors than anything else in my life. I should take another look; maybe I'm getting some better answers now.
TheRealGentlefox@reddit
At this point not consulting an AI even when you see a human doctor is Darwin at work.
I did a lot of testing using case reports published within the last week (not in training data) and it blew my mind what Gemini could solve.
throwaway2676@reddit
Resorting to overworked, jaded, and often mediocre doctors for medical advice is much more dangerous and stupid
ProfessionalSpend589@reddit
They do have a lot of knowledge baked in and can search through it based on a vague description.
I don't bother reading the systemd manuals for units, services, and timers anymore. I need them maybe once a year, and it's a win not to waste half an hour combing through the dense man pages.
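The kind of thing it spits out on the first try (names and paths made up):

```
# /etc/systemd/system/backup.service
[Unit]
Description=Nightly backup

[Service]
Type=oneshot
ExecStart=/usr/local/bin/backup.sh

# /etc/systemd/system/backup.timer
[Unit]
Description=Schedule nightly backup

[Timer]
OnCalendar=daily
Persistent=true

[Install]
WantedBy=timers.target
```

Then it's just systemctl enable --now backup.timer and done for another year.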
laapsaap@reddit
Literally the ONLY time I use a local LLM is on a flight, and it feels powerful having so much knowledge without any connectivity.
I have qwen 3.5 and gemma 4 right now.
National_Meeting_749@reddit
I'm glad it helped. Truly though, this boils down to "I have a medical condition that is debilitating when it happens, was unprepared for it, hadn't seriously researched how to be prepared, and an LLM picked up the slack of me not being prepared."
If a simple technique like that was able to relieve your pain, then the problem was that you didn't already know it; you were unprepared. Especially a technique that's over 100 years old.
It could have easily told you something that made your pain twice as bad; you're lucky it didn't turn out poorly this time.
Please do not rely on LLMs for medical advice; they will eventually kill you
EntertainerFew2832@reddit (OP)
As I mentioned, I have had it a few times ever. Never badly. I treat all personal medical advice I get critically, from any source. I appreciate that relying on LLMs for medical advice is dangerous in some instances, though I'd argue it's helped me navigate the health system in the past to get better care and the right specialist. In this case, it isn't - to my knowledge - possible to die from sipping water while holding your nose.
National_Meeting_749@reddit
You didn't treat that LLM's advice critically; by your own admission, you followed it without verifying the information.
"navigating the medical system" and "getting medical advice from" are two ENTIRELY different things. It's far more useful for the first, though I still wouldn't trust it for either.
"To my knowledge", exactly. To YOUR knowledge. Unless you're a doctor, then you (and me) know next to nothing about how our bodies work. You don't know what could have went wrong, doing it now will make you more inclined to do it in the future. Eventually it will make a mistake that if you follow its advice will kill you.
pointer_to_null@reddit
This is an overreaction. You're attacking OP's claims of critical analysis of a common remedy suggested by an LLM when all they did was hold their nose and drink water. Oh dear.
The advice was benign; otherwise we'd risk drowning every time we take a drink with a nasal blockage. If that requires critical analysis with verified, reputable medical opinions, then... I'm sorry?
Seems like deja vu of a previous conversation. It's becoming a pattern: first it's those stuck in a desert, then those lacking first aid skills, then those living in natural disaster/war zones, and now it's those with pressure sensitivity on airplanes. You're obviously prepared for all of life's challenges: never without internet, and either a medical doctor yourself or always travelling with one. I'm truly happy for you.
National_Meeting_749@reddit
Yes, I do often tell people not to follow LLMs medical advice, because it's a dangerous thing to do without proper knowledge.
I'm not prepared for everything; I'm prepared for everything likely, and specific to me, where I'm going, and what I'm doing. I have asthma, so when I travel or do things that might trigger it, I keep an inhaler on my person; to not do so would be me being unprepared. When I go mountain biking, I bring a tourniquet and a handful of other useful items that will prepare me for the most likely bad outcomes, including doing the research for the area as to local dangers.
I live in an area where hurricanes are a possibility, so when I moved here I made a hurricane prep bag and plan to evacuate.
I'm not prepared for the black mamba snake, because I don't live or go anywhere where they exist. I'm not prepared to be stranded at sea, because I haven't done anything where I have the possibility of being stranded at sea. I'm not prepared to identify poisonous/edible plants in the Amazon rainforest. Why? Because I'm not going to the Amazon rainforest, I don't live on the same continent and have no plans on going anywhere in the foreseeable future.
Being a mature, functioning adult and member of society does mean understanding the parts of the world you are likely to interact with, and preparing yourself to avoid the likely worst outcomes. That's just reality. That's just the way the world works. I'm sorry you've been taught anything different.
Failing to prepare is preparing to fail. Flat out. Full stop.
pointer_to_null@reddit
To clarify, "critical thinking" implies human inference using simple logic and knowledge. It does not imply total dependence on expert knowledge; on the contrary, it means using available knowledge, available tools (including technology), and logic.
In this instance: the LLM tells me to try holding my nose and drinking water. The critical-thinking brain goes, "Is this unsafe? (insert proof of safety). Then why not attempt it? At worst, I'll still be in the same pain I was before." The benefit/risk calculus becomes infinite, and I'd be stupid to ignore it. The proof of safety stems from knowing that past sinus infections that lasted more than 2 days haven't killed or hospitalized me from severe dehydration.
Your argument's weakness stems from verification and absolute credibility of sources. One can argue that internet searches are barely a step up from LLM outputs (possibly worse if you factor in SEO/ranking bias). It too requires a leap of faith, and plenty of people die from following "expert advice" from the internet. Hell, people have died following advice from MDs.
Like the internet, LLMs are just another tool. Rely on your brain as a filter.
National_Meeting_749@reddit
This is where your argument falls apart entirely: "At worst I'll be in the same pain as before." You don't understand how to be critical of information, it seems, because how do you know that?
You don't, it's pure vibes. An answer popped up from the void of your mind, and you just believed it. The pain could get worse. You could rupture your inner ear. Nothing might happen. An aneurysm might pop.
The real thing is, at worst you die in a way you weren't expecting.
If you argue that info from like mayoclinic is the same level of reliability as an LLM... Then I'm never gonna convince you.
codeprimate@reddit
Medical advice from degreed professionals kills people every day as well.
The sentiment should be “think critically about ANY information provided by an LLM, it may very well be disastrously wrong”
kaeptnphlop@reddit
That goes for any information from the internet. Critical thinking is rare these days, and a lot of people have never learned or trained to use it at all. And then people trust LLMs because they sound so darn confident. Which leads to the blanket statement of DON'T USE FOR MEDICAL ADVICE. I'm preaching to the choir here though, for sure. :)
twnznz@reddit
I'm not sure this deserves downvotes.
Exercise reasonable caution using LLMs for medical advice, particularly because even if the model is good diagnostically, it relies on an untrained human for data collection. If it's bad diagnostically, well...
National_Meeting_749@reddit
LocalLLaMA doesn't like to hear anything negative about AI, and will downvote me when I say something negative about it, in the same way that a bunch of other subs will downvote me if I say anything positive about AI.
Fair_Ad845@reddit
This is exactly the kind of story that makes local models worth the effort. I had a similar "aha" moment on a train through a tunnel — needed to quickly parse a JSON config for a deployment, no internet, and a local 7B model handled it perfectly.
The real takeaway isn't just "offline access" though. It's that these small models have compressed so much general knowledge into a few GB that they're essentially an offline encyclopedia + reasoning engine. The medical knowledge in Gemma 4 is surprisingly solid for a 31B model.
One tip for anyone who hasn't set this up yet: keep a Q4 quant of a strong small model (Gemma 4 12B or Qwen 2.5 7B) permanently loaded on your laptop. The overhead is minimal and you never know when you'll need it.
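If you use llama.cpp, that's a one-liner to leave running in the background (the model filename here is just an example):

```
llama-server -m ~/models/gemma-4-12b-Q4_K_M.gguf -c 8192 --port 8080
# any OpenAI-compatible client can then point at http://localhost:8080/v1
```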
glenrhodes@reddit
The offline thing is underrated. You're on a plane or in a dead zone and suddenly local AI goes from a hobbyist toy to actually useful. I've had similar moments running a model for code review on a flight with no internet. Nothing like a real constraint to make you appreciate having it.
haberdasher42@reddit
Well, that beats the shit out of the first time I had aerosinusitis. I didn't know what the fuck was happening, thought I was going to die, turned off my phone lock and wrote a fucking will and goodbye thoughts, all while holding back from screaming.
My phone was still handy, but much less so.
PurpleUpbeat2820@reddit
Ask AI what you should do about that too. ;-)
Long_comment_san@reddit
AI helped me find the allergy source in my room. I was going bananas; it had been plaguing me for years.
Also helped me understand my reflux a lot better. It made a large impact.
gefahr@reddit
Well, don't keep us in suspense. What was it?
Long_comment_san@reddit
It was invisible mold in my room plant. A while ago I personally examined the soil but found nothing suspicious. I had a lot of suspects: a carpet, latex (mattress) allergy, lactose intolerance or some other food allergy, clothing (which turned out to be true as well - I did have an allergic reaction to a particular cotton item), an old painting (paint, mold, or dust), ventilation, washing machine mold, geographical air quality... You know, it's really hard to diagnose something like that. Every third item can be an allergy source.
FatheredPuma81@reddit
The moldy gym socks in his closet obviously.
Empty-Cake4502@reddit
This is a very meaningful use case for local LLMs!
jduartedj@reddit
This is honestly the killer use case people don't think about enough. I run a small homelab setup and keep llama.cpp with a few models on my laptop for exactly this reason, just having something available when there's no internet.
Had a similar moment camping last summer where I needed to figure out if a plant rash was something I should worry about. No signal obviously, but the local model gave me enough info to calm down and treat it properly.
The privacy angle is underrated too. I'd rather ask a local model about health stuff than have that sitting on OpenAI's servers forever lol
BringOutYaThrowaway@reddit
Hold on, your wife did not care that you found a quick solution to your pain? I’m sure, if the roles were reversed, she would be disappointed if you did that to her.
alitadrakes@reddit
I really just love Gemma 26B, it's the perfect blend of output quality and speed. I'm using Q6 on an RTX 3090 and it gives me 30 tok/s. Amazing
-dysangel-@reddit
this is a great way to sign off pretty much all posts on here
Own_Professional6525@reddit
This is a great example of where local LLMs quietly become truly useful in real life. Not flashy use cases, but immediate, practical impact when connectivity or resources are limited. This is the kind of value that will scale adoption over time.
xrvz@reddit
Gemma 4 E4B, Qwen 3.5 4B, the Bonsai thing and Apple Foundation were useless for this on my phone. Too bad.
And on the other end, ChatGPT 5.3/5.4 doesn't mention Toynbee either. It thinks your problem isn't aerosinusitis.
Both Gemma 4 26B and 31B mentioned it on the first try though.
I have a MacBook Air with 24GB RAM, which just barely can't fit these models comfortably at q4 (a 31B model at q4 is roughly 16GB of weights before context, and macOS only lets the GPU use a portion of unified memory). Didn't necessarily want to upgrade yet, but you just gave me a reason.
CareAmbitious3233@reddit
I had an almost identical experience. I also developed aerosinusitis while on the flight. The difference is that I had not yet deployed a local LLM at that time, and I only received assistance after landing.
EntertainerFew2832@reddit (OP)
Sorry to hear that! It’s really hugely painful isn’t it?
CareAmbitious3233@reddit
yeah....
i have a local llm this time
Ok_Zookeepergame8714@reddit
Bad wife...😔
EntertainerFew2832@reddit (OP)
Worth clarifying my wife was concerned about my sinuses, just not about the specific version of Gemma I was running on my MacBook 😂
DeepOrangeSky@reddit
"Great, so The Terminator fixed your nose. Wake me up if it nukes everyone, bae."
EntertainerFew2832@reddit (OP)
26B quant - so far I've been really impressed. I also think about the Wikipedia idea a lot; it sort of blows my mind that the models can seemingly contain most of Wikipedia while being smaller in pure storage terms.
Usr_name-checks-out@reddit
Yeah my wife and I have been working on this ourselves. We each give the other one big fake interest moment a day if needed, but can say, ‘ok I don’t really care’ after the first one if we want.
Like every time I’m super excited about figuring something out tech-wise, or her version which is some friend I don’t remember the name of has an issue with something I’ve never considered anyone would care about, and apparently it’s a huge deal, and I actively listen and offer her blind support and excitement.
But yeah, neither of us really retain any info. And honestly, she’s better at it than I am.
daynighttrade@reddit
That's so horrible. What kind of wife doesn't care about Gemma's version?
ProfessionalSpend589@reddit
So, she’s team Qwen?
National_Meeting_749@reddit
As she should be. 😂😂
EntertainerFew2832@reddit (OP)
This cracked me up 👌
ArtyfacialIntelagent@reddit
Bad wife...😔
Ok_Zookeepergame8714@reddit
Good wife... 👍😜
DeepOrangeSky@reddit
To be fair, "Gemma" is definitely a super hot chick type of name. Maybe not as hot as "Eva" or "Sasha", but it's def pretty high up there. They should've named it Mildred or Gertrude or something. It's like they are trying to get everyone's wives to be as angry as possible.
c00pdwg@reddit
This would get 5 billion likes in a TikTok comment section
philmarcracken@reddit
And im sure at least half of those are actual people
IrisColt@reddit
heh
david_0_0@reddit
did you have to explain the Toynbee maneuver step by step or did Gemma get it right on the first prompt? seems like that's the kind of proprioceptive instruction that small models usually struggle with
EntertainerFew2832@reddit (OP)
In those terms it was actually probably better than I gave it credit for in my first post - it suggested two or three ways of relieving the pain and walked me through the techniques in its first reply. It then expanded as I described the pain changing (e.g. on the descent). It was very good at understanding the specific cause (e.g. pressure reduction on descent) and explaining step by step how to equalise the pressure.
SkyFeistyLlama8@reddit
I have to do the Riker Maneuver every hour or so on a flight to keep my pants from riding up too far up my waist LOL.
ObsidianNix@reddit
That's awesome! Keep MedGemma on there too. It's been trained on more medical jargon than a regular LLM and it's pretty good for its time.
SkyFeistyLlama8@reddit
Medgemma 4B and 27B are great medical models but I'm not a doctor so I can't evaluate their responses for accuracy. That being said, these smaller models in the hands of a trained medical professional could help improve health care in more remote parts of the world.
EntertainerFew2832@reddit (OP)
Great tip!
AlwaysLateToThaParty@reddit
Local is good. If you're putting medical questions into a cloud AI supplier, you haven't thought it through. There's absolutely zero reason to.
philiparxist@reddit
Local LLMs are great and are going to get better with time. Soon we may not even need the remote ones. Secure, fast, accessible, and improving every day. I'm all for local.
night0x63@reddit
If only they'd release a one, two, three, four, or six hundred billion parameter Gemma. But that's not gonna happen.
TutorDry3089@reddit
Local LLMs are definitely super useful on flights or in remote locations with no internet. Not to mention the privacy benefits, and potentially niche models.
MaCl0wSt@reddit
I haven’t had this exact scenario, which is honestly pretty cool (the pain part isn’t) but I relate a lot on the "finding real use cases" part. I also mostly just mess around with local LLMs and keep looking for something that actually sticks, I’ve only found 2 solid ones for me and my resources
One of the better use cases I found was with manga. I saved a few hundred entries with title, synopsis, and tags using the Jikan API, and I wanted to filter stories by specific vibes or archetypes. To avoid spending money on an API, I ran a local LLM statelessly over batches of 10 entries at a time, always with the same prompt and structured output. It would return the ones that matched, plus a short five word reason for each, and then my script compiled everything into a clean list (used gpt-oss-20b for this one on my gaming PC, if I do it again I’ll probably test it with qwen3.5 35ba3b)
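The core loop was roughly this; the endpoint, model name, and vibe prompt are stand-ins for what I actually used:

```python
import json
import requests

LLM = "http://localhost:8080/v1/chat/completions"  # local OpenAI-compatible server
PROMPT = ('For each entry, reply with JSON only: '
          '[{"title": "...", "match": true/false, "reason": "<5 words"}]. '
          'Match = psychological stories with tournament arcs.')  # stand-in vibe filter

def judge_batch(entries):
    # stateless: identical prompt every call, 10 entries per request, structured output
    block = "\n".join(f"- {e['title']}: {e['synopsis']} [tags: {', '.join(e['tags'])}]"
                      for e in entries)
    r = requests.post(LLM, json={
        "model": "gpt-oss-20b",
        "messages": [{"role": "system", "content": PROMPT},
                     {"role": "user", "content": block}],
        "temperature": 0,
    }, timeout=300)
    return json.loads(r.json()["choices"][0]["message"]["content"])

entries = json.load(open("manga.json"))  # entries saved earlier from the Jikan API
matches = []
for i in range(0, len(entries), 10):
    matches += [m for m in judge_batch(entries[i:i + 10]) if m["match"]]
```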
More recently I’ve been using a small model on an old laptop acting as a server as the final judge in an RSS deduplication pipeline. An embedding model does the initial ranking, and anything clearly duplicate or clearly unique gets filtered automatically. The LLM only handles the gray area where things are too mixed to trust embeddings alone. Still tuning it, but after running a small custom benchmark over a few weeks with models from 0.5B to 4B, fun fact: qwen3 1.7B q4_k_m without reasoning ended up being the best balance of quality and latency so far
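The gray-area split is just two thresholds on cosine similarity; the numbers here are made up, mine came out of the benchmark:

```python
import numpy as np

HI, LO = 0.92, 0.75  # made-up cutoffs; tune on your own data

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_duplicate(emb_a, emb_b, llm_judge):
    sim = cosine(emb_a, emb_b)
    if sim > HI:   # clearly the same item, skip the LLM
        return True
    if sim < LO:   # clearly different, skip the LLM
        return False
    return llm_judge()  # ambiguous middle band only: pay the latency here
```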
Reasoning models (with reasoning enabled) of this size tended to both increase latency and worsen output quality on semantic tasks like these in my tests. gemma4 2b gives incredibly high quality answers for this task but sometimes produced answers that failed to parse, unlike the qwen model; I've heard the llama.cpp build was a bit buggy at the time I tested it, so I'm waiting a few weeks for things to stabilize before benchmarking it again. I also tested some gemma3 models and they were surprisingly strong for the task. gemma3 1b, for example, was blazing fast: roughly a third of the time qwen3 1.7b took, for around 15-20% less accuracy
so all in all, having fun xd
Legitimate-Pumpkin@reddit
Thanks for sharing! We got you bro!
Feztopia@reddit
Yeah the only problem is the bad advice they can give
mohelgamal@reddit
This is honestly a great use case for AI, just getting to learn about anything at anytime.
The media always focuses on the job replacement aspect, the superintelligence aspect, or whatever record frontier models are breaking, but 90% of the legitimate use is just being able to offload small tasks or get to know the basics of something you don't know anything about, and those are perfectly doable by a local LLM.
According_Study_162@reddit
Once ChatGPT recommended I go to the ER even though I didn't think I was that sick; turns out I had pneumonia, so I feel ya. Sometimes that second opinion is kind of amazing.
Necessary-Summer-348@reddit
The real test is when you stop treating it like a novelty and it just becomes another tool in the stack. What was your use case? Always curious what pushes people over the threshold from "this is cool" to "I actually need this running locally."
ebolathrowawayy@reddit
get a new wife
IrisColt@reddit
I gave the trick a try, and now I feel a bit better.
TopChard1274@reddit
because my wife didn't care 😔
magikfly@reddit
I guess you could've googled it just as well? In-flight internet access exists. All I'm saying is, this doesn't really seem like a proper LLM use case. Glad you're feeling better btw
RedParaglider@reddit
Look out everyone the proper use case committee is here. People fucking HATE local LLM's being used for local LLM shit around here lol.
shinto29@reddit
If it’s not a useless benchmark I don’t wanna hear it
H_NK@reddit
Who are you to define the "proper LLM use case"?
EntertainerFew2832@reddit (OP)
No Wi-Fi on this flight for my ticket, even as a purchased add-on.
jgulla@reddit
Pretty cool use