Best Use Cases for Small LLMs
Posted by XhoniShollaj@reddit | LocalLLaMA | 25 comments
Would love to see what the community has been working on and share their experience or use case for small LLMs or VLMs (1b-7b models).
FineCradle@reddit
Looking at these applications makes me question the need for small models
OnyxOrator@reddit
Summarization, grammar correction, writing style changes and formatting, basic code completion
Fit_Flower_8982@reddit
I actually tried using llama 1B as a spelling and grammar checker. I was hoping that for something so simple there wouldn't be much difference between models, but after some tests it worked noticeably worse than 70B. I wouldn't trust it enough not to monitor it closely.
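One way to "monitor it closely" without reading every correction is a cheap guardrail: reject any model output that drifted too far from the input, since a grammar fix should stay close to the original. This is a minimal sketch, not the commenter's setup; the 0.8 threshold is an arbitrary assumption, and the model call itself is left out.

```python
import difflib

def correction_ratio(original: str, corrected: str) -> float:
    """Similarity ratio (0..1) between the input and the model's correction.

    A low ratio means the model rewrote far more than a grammar fix
    should, which is a signal to distrust the output.
    """
    return difflib.SequenceMatcher(None, original, corrected).ratio()

def accept_correction(original: str, corrected: str, threshold: float = 0.8) -> str:
    # Keep the correction only if it stayed close to the input;
    # otherwise fall back to the original text (or a larger model).
    if correction_ratio(original, corrected) >= threshold:
        return corrected
    return original
```

With a check like this, a small model can run unattended on easy fixes while the suspicious rewrites get escalated.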
Apart_Boat9666@reddit
To be honest, they can rewrite code with minor changes, add comments, and organize code. In a small context like a single function, they work absolutely fine. The Qwen models seem to be really great at it.
XhoniShollaj@reddit (OP)
Thank you for the input!
brotie@reddit
Tasks in general. There's no reason to call OpenAI to summarize the name of a chat for the sidebar, generate an optimal search query to pass to a RAG function, do basic TTS, etc. Cut out the latency and pay nothing.
yukiarimo@reddit
It's not the model size that matters, but its user.
Temp_Placeholder@reddit
Rewriting (everyone else's) reddit posts so they have better grammar and aren't full of assholery.
Derefringence@reddit
Email rewriting, translation, spell checking, simple data organization
dreamfoilcreations@reddit
I've created an app for data extraction/validation across a large number of files; it can extract information that would be hard to get with parsers, because the content of the files isn't well structured (using a 7B model).
Namarrus@reddit
Which RAG approach do you use and how does your Similarity Search find exactly the right content with such a large number of texts?
dreamfoilcreations@reddit
The files I use usually fit in the context window, but for long files I process in chunks. Usually the relevant information sits close together.
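The chunking step described above can be sketched in a few lines. This is an assumption about the approach, not the commenter's code: chunks overlap slightly so information near a boundary appears whole in at least one chunk, and each chunk would then be sent to the 7B model with the extraction prompt (the model call is omitted here).

```python
def chunk_text(text: str, max_chars: int = 4000, overlap: int = 200) -> list[str]:
    """Split a long document into overlapping character chunks.

    Each chunk is at most max_chars long; consecutive chunks share
    `overlap` characters so facts straddling a boundary are not lost.
    """
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap  # step back to create the overlap
    return chunks
```

Character counts are a crude stand-in for tokens; a real pipeline would size chunks with the model's tokenizer.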
GTHell@reddit
I can extract named entities, detect language, and classify sentiment into a structured JSON format with a 0.5B model.
That's the best use case for me so far, as I don't need to fine-tune BERT and it works across languages.
I was using Qwen2.5 0.5B Instruct, by the way.
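The fiddly part of this kind of setup is usually getting clean JSON back, since small instruct models often wrap the object in prose or code fences. A minimal sketch, assuming a prompt like the one below (the commenter's actual prompt is not shown in the thread); the call to the local model is omitted:

```python
import json
import re

# Hypothetical prompt for a small instruct model such as Qwen2.5 0.5B.
PROMPT_HEADER = (
    "Extract the named entities, the language, and the sentiment of the text.\n"
    'Reply with JSON only: {"entities": [...], "language": "...", "sentiment": "..."}\n'
    "Text: "
)

def parse_model_json(raw: str) -> dict:
    """Pull the first JSON object out of a model reply.

    Small models sometimes wrap the JSON in prose or ``` fences,
    so we grab the outermost braces instead of parsing the reply directly.
    """
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object in model output")
    return json.loads(match.group(0))
```

Retrying the request when `parse_model_json` raises is usually enough to make a 0.5B model reliable for this.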
_donau_@reddit
Converting weird dates to iso format when dateparser couldn't handle the heat
synw_@reddit
Onboarding colleagues with no GPU into local AI with 0.5B to 3B models that work even on an old potato laptop.
We now have many small models that are efficient in one area or another, which was not the case even 6 months ago. I use different models depending on the task: summarization, translation, chat with documentation or an article, code gen.
msbeaute00000001@reddit
What size of model are you using for translation and code gen?
synw_@reddit
These can run in a no-GPU environment. Like everyone here, I use bigger ones on my GPU for more complex code tasks or precise translations, but I tend to use the small ones more and more for their speed; they play nicely in my everyday workflow.
jfufufj@reddit
I've been using llama3.2 3B to generate large quantities of image prompts, works like a charm.
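Bulk generation like this usually means templating many small requests and firing them at the local model one by one. A hedged sketch of the request-building half; the subject/style lists and the instruction wording are hypothetical (the thread doesn't show the actual prompt), and the llama3.2 call itself is omitted:

```python
import random

# Hypothetical building blocks for variety across requests.
SUBJECTS = ["a lighthouse at dusk", "an old library", "a mountain village"]
STYLES = ["watercolor", "cinematic photo", "pixel art"]

def build_requests(n: int, seed: int = 0) -> list[str]:
    """Build n instructions, each asking the local model to write
    one detailed image-generation prompt. Seeded for reproducibility."""
    rng = random.Random(seed)
    return [
        f"Write one detailed image-generation prompt about {rng.choice(SUBJECTS)} "
        f"in {rng.choice(STYLES)} style. Reply with the prompt only."
        for _ in range(n)
    ]
```

A 3B model is a good fit here because each request is short, independent, and easy to regenerate if one comes out poorly.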
Content-Ad7867@reddit
Sentiment analysis, data labeling
ChengliChengbao@reddit
Summarization mostly, as I'm able to run them at huge context lengths.
quiteconfused1@reddit
.... disconnect from the internet.
p.s. "small large language models" is an oxymoron.
ThinkExtension2328@reddit
It’s not the size that counts but how you use it
XhoniShollaj@reddit (OP)
Lol fair observation 🤣
matt23458798@reddit
Many different use cases, specifically simple daily tasks
XhoniShollaj@reddit (OP)
Care to elaborate? Curious how you apply them, and whether they're reliable enough for your use case.