What is the most unexpected thing you have gotten a local model to do?
Posted by Enough-Astronaut9278@reddit | LocalLLaMA | View on Reddit | 28 comments
Most local LLM use cases I see are chat, coding, and RAG. But with vision models getting better and faster on consumer hardware, I feel like there is a lot of untapped territory.
I got a local VLM to play a board game by just looking at the screen and it worked way better than I expected.
What is the weirdest or most unexpected thing you have used a local model for?
Western_Courage_6563@reddit
Nothing special, but Gemma4:26b (with a Hermes agent as the harness) figured out how to use my local ComfyUI to generate images.
umataro@reddit
I run a vision model against my library of photos (synced from family phones daily) to generate tags and store them in IPTC and XMP fields in the files themselves and in a database. Retrieving photos based on cross-referencing tags is a beautiful thing. I can simply search for: daughter birthday dog 2025 and pictures of my dog during daughter's birthday in 2025 are returned as softlinks or hardlinks in a directory.
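A minimal sketch of that kind of pipeline, assuming a local VLM served through Ollama and exiftool on the PATH; the model name, tag prompt, and database layout below are placeholders, not umataro's actual setup.

```python
import sqlite3
import subprocess
from pathlib import Path

import ollama  # assumes a local Ollama server with a vision-capable model pulled

MODEL = "qwen2.5vl:7b"  # placeholder; any local VLM that accepts images works

def tag_photo(path: Path) -> list[str]:
    """Ask the VLM for a short list of descriptive tags for one photo."""
    resp = ollama.chat(
        model=MODEL,
        messages=[{
            "role": "user",
            "content": "Return 5-10 comma-separated tags describing this photo "
                       "(people, events, animals, places). Tags only, no prose.",
            "images": [str(path)],
        }],
    )
    return [t.strip().lower() for t in resp["message"]["content"].split(",") if t.strip()]

def write_tags(path: Path, tags: list[str]) -> None:
    """Embed tags in the file itself (IPTC Keywords + XMP Subject) via exiftool."""
    args = ["exiftool", "-overwrite_original"]
    for t in tags:
        args += [f"-IPTC:Keywords+={t}", f"-XMP-dc:Subject+={t}"]
    subprocess.run(args + [str(path)], check=True)

def index_tags(db: sqlite3.Connection, path: Path, tags: list[str]) -> None:
    """Mirror the tags into a small database for fast cross-referenced search."""
    db.execute("CREATE TABLE IF NOT EXISTS tags (photo TEXT, tag TEXT)")
    db.executemany("INSERT INTO tags VALUES (?, ?)", [(str(path), t) for t in tags])
    db.commit()

if __name__ == "__main__":
    db = sqlite3.connect("photo_tags.db")
    for photo in Path("~/Photos").expanduser().rglob("*.jpg"):
        tags = tag_photo(photo)
        write_tags(photo, tags)
        index_tags(db, photo, tags)
```

A search like "daughter birthday dog 2025" then becomes an AND across rows in the tags table, with the matching files symlinked into a results directory.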
flasticpeet@reddit
What photo platform are you using, or did you code something yourself?
mmazing@reddit
I would love to hear more about how you leverage the tags themselves; it sounds interesting.
Enough_Big4191@reddit
I once got a local LLM to organize and summarize all my personal PDFs and notes into a weekly digest email automatically. I wasn't expecting it to handle multi-format content so smoothly, and it's become part of my workflow.
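A rough sketch of how such a digest might be wired up, assuming pypdf for extraction, a local model behind Ollama, and a local SMTP relay; the folder, model name, and addresses are placeholders, not the commenter's actual stack.

```python
import smtplib
from email.message import EmailMessage
from pathlib import Path

import ollama
from pypdf import PdfReader

MODEL = "llama3.1:8b"  # placeholder local model

def extract_text(path: Path) -> str:
    """Pull plain text out of PDFs; fall back to reading notes/markdown directly."""
    if path.suffix.lower() == ".pdf":
        return "\n".join(page.extract_text() or "" for page in PdfReader(path).pages)
    return path.read_text(errors="ignore")

def summarize(docs: dict[str, str]) -> str:
    """Ask the local model for one grouped digest over all documents."""
    corpus = "\n\n".join(f"### {name}\n{text[:4000]}" for name, text in docs.items())
    resp = ollama.generate(
        model=MODEL,
        prompt="Write a short weekly digest of these documents, grouped by topic:\n\n" + corpus,
    )
    return resp["response"]

def send_digest(body: str) -> None:
    """Email the digest via a local SMTP relay (assumed to be running)."""
    msg = EmailMessage()
    msg["Subject"] = "Weekly digest"
    msg["From"] = "digest@localhost"
    msg["To"] = "me@example.com"
    msg.set_content(body)
    with smtplib.SMTP("localhost") as smtp:
        smtp.send_message(msg)

if __name__ == "__main__":
    inbox = Path("~/Documents/inbox").expanduser()
    docs = {p.name: extract_text(p) for p in inbox.iterdir() if p.is_file()}
    send_digest(summarize(docs))
```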
omerkraft@reddit
I named her Esmeralda and she gave me water...
Oh wait! Nooooo... It's just my liquid cooling is leaking :(
mehyay76@reddit
I was not expecting any local model to be able to do anything on tsz, but DeepSeek v4 Flash could find a bunch of very tricky bugs and report them. I will use local models more once my accounts run out of tokens. Just six months ago, all my attempts with local models failed on tsz.
https://tsz.dev
CommonPurpose1969@reddit
One of the most fun things I've done with AI/SLMs was giving it a personality & a constant influx of real-time news, and then watching it "think" and "reflect", while simulating feelings and thought processes. The conclusions it came to were, at times, surprising, funny, and even deeply disturbing.
https://github.com/darxkies/anima
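This is not how anima itself works, just a minimal sketch of the general idea: a persona system prompt plus a steady drip of headlines, with the model asked to reflect each cycle. feedparser, the feed URL, the persona text, and the model name are all my own assumptions.

```python
import time

import feedparser  # RSS/Atom parsing
import ollama

MODEL = "gemma2:9b"  # placeholder local model
PERSONA = (
    "You are an empathic observer. Reflect briefly on each batch of news: "
    "what you notice, how it makes you feel, and what you expect to happen next."
)
FEED = "https://feeds.bbci.co.uk/news/world/rss.xml"

history = [{"role": "system", "content": PERSONA}]

while True:
    headlines = [e.title for e in feedparser.parse(FEED).entries[:10]]
    history.append({"role": "user", "content": "Latest headlines:\n" + "\n".join(headlines)})
    reply = ollama.chat(model=MODEL, messages=history)["message"]["content"]
    history.append({"role": "assistant", "content": reply})
    print(reply, flush=True)
    if len(history) > 21:  # keep the persona plus a rolling window of recent turns
        history = [history[0]] + history[-20:]
    time.sleep(3600)  # one reflection cycle per hour
```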
smashedshanky@reddit
The disclaimer is killing me 😉 hahaha
CommonPurpose1969@reddit
The first personality was empathic, not analytical. It went downhill so fast that the disclaimer felt like a good idea. It still does.
Enough-Astronaut9278@reddit (OP)
the reactions probably get weirder the longer it runs.
CommonPurpose1969@reddit
It gets depressed after a while, given that it mainly consumes world politics.
mmazing@reddit
They are good analogues for humans in the sense that they are a frozen lens for producing "an acceptable response given a specific context".
Turns out politics is depressing.
techlatest_net@reddit
Haha, that board game idea is awesome. I got a local VLM to sort my weirdly-named screenshot folder by just describing what's in them—way better than I expected. Also had one help me debug a circuit board by looking at photos of the traces. Definitely agree there's so much more to explore beyond chat and code. What board game did you try it with?
90hex@reddit
The best local use case for me is prompt engineering for image generation, along with generating option blocks for improved variety across a series. I made a post entitled 'Metaprompting' on the sub a while ago.
BrewHog@reddit
Taught the model to play as a third player in multiple board games. I use the model to scan the instructions, give it a personality/role, and ask it to play as an extra person. My wife and I can play three-player board games when we want.
Take a picture of the current state of the board after we play our moves, have it describe the current state, fix any issues with its understanding, and have it play its move.
It's not perfect yet, but I'm sure it's because of my setup. Sometimes we have to re-explain the turn setup and it takes longer than it should, but that's fine.
Hopefully more tweaks will make things better in the future.
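A minimal sketch of that photo-in, move-out loop, assuming an Ollama-served VLM; the model name, persona, and prompts are placeholders for whatever BrewHog actually runs.

```python
import ollama

MODEL = "qwen2.5vl:7b"  # placeholder vision-capable local model

# The rules are read in once (scanned or pasted) and pinned as context for the whole game.
messages = [{
    "role": "system",
    "content": "You are the third player in this board game. Here are the rules:\n"
               + open("rules.txt").read(),
}]

def play_round(board_photo: str) -> str:
    # Step 1: have the model describe what it sees so mistakes can be corrected.
    messages.append({
        "role": "user",
        "content": "Describe the current board state from this photo.",
        "images": [board_photo],
    })
    description = ollama.chat(model=MODEL, messages=messages)["message"]["content"]
    messages.append({"role": "assistant", "content": description})
    print(description)

    # Step 2: let the humans correct the description before asking for a move.
    correction = input("Corrections (blank if none): ")
    messages.append({
        "role": "user",
        "content": (f"Correction: {correction}. " if correction else "")
                   + "Now state your move for this turn and explain it briefly.",
    })
    move = ollama.chat(model=MODEL, messages=messages)["message"]["content"]
    messages.append({"role": "assistant", "content": move})
    return move

if __name__ == "__main__":
    print(play_round("board_turn_03.jpg"))
```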
Confident_Ideal_5385@reddit
After hooking up "write_triple" and "query_triple" tools, I was surprised that Qwen 27B stopped writing its observations of the world to random files in the VFS and started storing them in Oxigraph.
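For context, here is a sketch of what tools like that might look like with pyoxigraph; the exact schemas and the tool-calling glue in the real setup are unknown to me, so treat the namespace and signatures below as illustrative only.

```python
from pyoxigraph import Literal, NamedNode, Quad, Store

store = Store("observations.db")  # persistent on-disk RDF store
BASE = "http://example.local/"    # placeholder namespace

def write_triple(subject: str, predicate: str, obj: str) -> str:
    """Tool the model can call instead of scribbling notes into random files."""
    store.add(Quad(NamedNode(BASE + subject), NamedNode(BASE + predicate), Literal(obj)))
    return "ok"

def query_triple(sparql: str) -> list[str]:
    """Tool for reading observations back out with SPARQL."""
    return [str(row) for row in store.query(sparql)]

# Example of what the model might do with the tools:
write_triple("kitchen_light", "state", "off")
print(query_triple(f"SELECT ?o WHERE {{ <{BASE}kitchen_light> <{BASE}state> ?o }}"))
```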
scottgal2@reddit
Worked out a way to get them to provide a searchable summary of video. Fun because even with frontier cloud models, video is expensive to process, but if you can work out how to reduce the frames/content you can use local models effectively. Started with GIFs, where I'd do keyframe extraction (and build a filmstrip so small vision LLMs like Florence-2 could describe the activity across keyframes), but it works equally well for video (it just takes longer).
Just a research thing, and in .NET, but fun! https://www.mostlylucid.net/blog/videosummarizer-scalable-video-intelligence
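The original is in .NET, but the frame-sampling-to-filmstrip trick is easy to sketch; here is a rough Python equivalent using OpenCV that samples every N seconds rather than doing true keyframe detection.

```python
import cv2  # OpenCV for frame grabbing and image assembly

def build_filmstrip(video_path: str, out_path: str, every_s: float = 5.0,
                    tile_height: int = 240, max_tiles: int = 12) -> str:
    """Sample one frame every `every_s` seconds and tile them side by side,
    producing a single image a small vision model can describe in one pass."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    step = int(fps * every_s)
    tiles, index = [], 0
    while len(tiles) < max_tiles:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            h, w = frame.shape[:2]
            scale = tile_height / h
            tiles.append(cv2.resize(frame, (int(w * scale), tile_height)))
        index += 1
    cap.release()
    strip = cv2.hconcat(tiles)  # all tiles share the same height, so this is valid
    cv2.imwrite(out_path, strip)
    return out_path

# The filmstrip can then be handed to a small local VLM
# (Florence-2, a quantized Qwen-VL, etc.) with a "describe the activity" prompt.
filmstrip = build_filmstrip("clip.mp4", "filmstrip.jpg")
```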
ttkciar@reddit
I really didn't expect they'd be able to generate patch(1)-compatible diffs, but some of them are quite good and reliable at it. Most recently Gemma-4-31B-it proved superb at this.
Also, this was a while back, but Olmo-3.1 was really good at inferring abstract syllogisms. Larger models of the time were okay at concrete syllogisms, but they were hit-or-miss. I tried Olmo on a lark, and it started whipping out things like:
and:
I prefer this abstract wording, so Olmo-3.1-32B-Instruct has become my go-to for inferring ontological syllogisms.
I kind of expected Phi-4 to be good at Evol-Instruct since Microsoft invented the method and uses it internally, but I did not expect Gemma3-27B to be so good at it. Phi-4-25B and Gemma3-27B had similar Evol-Instruct competence, but Gemma was better. I still used Phi-4-25B, though, because its license did not place legal encumbrances on the use of its outputs.
All I can figure is that Google uses Evol-Instruct internally as well, though I've not seen any solid reference saying so.
Both Gemma-4-26B-A4B-it and Gemma-4-31B-it absolutely fuck at Evol-Instruct, and now that Google has changed the Gemma licensing to plain old Apache-2.0, it actually makes sense to use it for that.
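For readers unfamiliar with it, Evol-Instruct just means asking a model to progressively rewrite instructions into harder ones. A bare-bones depth-evolution sketch against a local model might look like this; the model name and evolution prompt are placeholders, not ttkciar's actual pipeline.

```python
import ollama

MODEL = "gemma3:27b"  # placeholder; swap in whichever local model you trust for this
EVOLVE = (
    "Rewrite the following instruction so it is more complex and specific, "
    "adding constraints or extra reasoning steps, while keeping it answerable:\n\n{instruction}"
)

def evolve(instruction: str, rounds: int = 3) -> list[str]:
    """Run a few depth-evolution rounds, returning each generation of the instruction."""
    generations = [instruction]
    for _ in range(rounds):
        resp = ollama.generate(model=MODEL, prompt=EVOLVE.format(instruction=generations[-1]))
        generations.append(resp["response"].strip())
    return generations

for i, g in enumerate(evolve("Explain how a hash table works.")):
    print(f"gen {i}: {g}\n")
```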
That's all that's coming to mind right now.
Enough-Astronaut9278@reddit (OP)
Thanks for the detailed breakdown. Patch-compatible diffs from a local model are not something I would have thought to try, and the Gemma licensing switch to Apache-2.0 is a big deal for practical use.
Enough-Astronaut9278@reddit (OP)
For mine it was Mahjong. 4B quantized VLM reading tiles off screen captures and making discard calls, all local on an M4 Mac. Code is at https://github.com/Mininglamp-AI/Mano-P if anyone wants to mess with it.
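The actual code is in the repo above; stripped down to its essentials, the loop is roughly this (mss for screen capture and an Ollama-served VLM are my stand-ins, not necessarily what the project uses).

```python
import mss    # cross-platform screen capture
import ollama

MODEL = "qwen2.5vl:3b"  # placeholder small quantized VLM

def capture_screen(path: str = "table.png") -> str:
    """Grab the full screen showing the Mahjong client."""
    with mss.mss() as sct:
        sct.shot(output=path)
    return path

def suggest_discard(image_path: str) -> str:
    """Ask the VLM to read the tiles and name a discard."""
    resp = ollama.chat(
        model=MODEL,
        messages=[{
            "role": "user",
            "content": "These are my Mahjong tiles. List my hand, then name the "
                       "single tile I should discard and give a one-line reason.",
            "images": [image_path],
        }],
    )
    return resp["message"]["content"]

print(suggest_discard(capture_screen()))
```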
jacek2023@reddit
I am a big fan of board games, and the idea of playing a board game not as an app but at a physical table, with the AI playing just by reading the rules and looking at photos, sounds awesome.
Enough-Astronaut9278@reddit (OP)
Physical table setup is actually a great test case for vision models since lighting and angles add a lot of noise compared to clean digital UIs.
ohhi23021@reddit
Connect it to a robot arm too; that would be awesome.
chibop1@reddit
With openclaw, I was able to ask qwen-3.6-27b to research how to sign up for an email account without a phone number. It successfully got itself an email address from Tuda and sent me an email. It also solved a captcha. lol
SuchNeck835@reddit
I didn't get it to play a YouTube video, even with the exact steps in the system prompt. Gemma could do it, and also found an easy way (look at the DOM instead of trying to 'click' based on a screenshot). I didn't use the dense model; idk if that makes a difference.
arbv@reddit
It makes a ton of difference.