What's the smallest (most capable) model you've found?
Posted by howtheydoingit@reddit | LocalLLaMA | 35 comments
I found TinyStories (which is sub 100m) runs in the browser. It's alright, but falls apart quite easily. Now with Bonsai 1.7b (sub 300m), I have some hope of running something on a public site with user opt-in.
Anyone found anything else that's capable of basic English? More of a one-way conversation.
Anything come to mind?
90hex@reddit
VibeThinker 1.5B blew my mind. It's focused on math reasoning, but it's also great for learning math concepts. It's designed to think hard about math and logic, so it has little knowledge outside advanced math, but its reasoning abilities are bleeding edge. Highly recommended.
Povaron@reddit
Thank you. I'm studying a little math out of my own curiosity and haven't heard of this model before. Its reasoning process is impressive.
90hex@reddit
Yeah it's quite something. For a tiny model, it's surprisingly knowledgeable about math. That's all it knows, but it knows it well. I'd be curious to have a math person check it for accuracy. All the small tests I've done were OK, but I'm not a mathematician.
Imaginary-Unit-3267@reddit
I wonder if it knows category theory. (Or heck, hyperbolic geometry. I could use that.)
90hex@reddit
Pretty sure it does, along with the math itself. Can you test it and report back? I was blown away by its knowledge of theory AND math application. It's supposed to be able to solve quite a bit, but I'm not advanced in math enough to verify that.
https://github.com/WeiboAI/VibeThinker
Embarrassed-Area4652@reddit
I've been looking for more models like this that have been trained on a specific task. That seems more useful to me at least for local and small sizes than trying to cram every possible capability in. How did you stumble into it?
Empty_Hovercraft8739@reddit
Very interesting thread. I classify anything below 1B as absurdly impressive. I wonder if we'll get to ChatGPT-4 intelligence at this reduced size - it would be huge!!!
inconspiciousdude@reddit
Technically, it would be tiny.
Citadel_Employee@reddit
I really like Gemma 4 e4b, it’s fast (and accurate enough) to be a prompt generator for diffusion models.
VoiceApprehensive893@reddit
lfm 2.5 350m actually works for some reason
somerussianbear@reddit
Also curious. I'm quite skeptical about anything smaller than 9B for my everyday use. To me they only work when you're careful with the prompt; you basically have to guide them by hand. Spending time optimizing a prompt to get a model to work is not what I like to spend my time on, I want to type fast, in half sentences, and be understood.
The other day I read a good analogy here for Qwen 0.8B: "treat it like smart RegEx", because it's good at pattern matching.
If I were automating some workflow, I'd definitely start big and step down to smaller and smaller models until I figured out the minimum model capable of solving that problem. But that's a different case: spending real time on prompting and adjusting the harness to help the little guy get the job done. It's a nice task, but unfortunately I haven't found anything I could/would use this way yet.
What do you guys use tiny models for?
Monad_Maya@reddit
For my needs gpt-oss 20B was the smallest model that worked reliably most of the time. It has since been replaced by Gemma4 26B MoE.
I was impressed by smaller Qwen 3.5s and LFMs (tested them briefly) but I still stick to medium sized LLMs for the most part.
Monad_Maya@reddit
Qwen 3.5 4B
Limp_Classroom_2645@reddit
100% very good model
Limp_Classroom_2645@reddit
Qwen 3.5 4b
Rare_Potential_1323@reddit
Llama 3.2 1b is the best I've tested so far. Its reasoning is better than newer models twice the size (most of my tests are for common sense). I didn't want to believe it because it's so old, but hey, if it works it works
Dev-in-the-Bm@reddit
Which models have you tested it against?
Rare_Potential_1323@reddit
I might get the exact names or numbers wrong because I'm not at my computer, but from memory it's better than qwen3.5-0.8b, LFM2-1.2b, and qwen3-1.5b coder (or is it 1.8b?). I tested at least two 2b models but can't remember exactly which ones. Anyway, one of my logic tests has to do with redheads and whether they should get more or less sun for vitamin D. They all know the science behind it, but they all get the reasoning and answer wrong; only the llama 1b gets it right. There are other factual test questions I give, not really for their knowledge (I know they're small), but to see whether, given the facts, they can reason correctly about them. Otherwise they're just verbal diarrhea.
Dev-in-the-Bm@reddit
Falcon H1 Tiny 90M is capable of basic conversation, and is surprisingly coherent for its size.
I've had fun running this on a really low end phone that has a QM215 chip and 2GB RAM.
SomeOrdinaryKangaroo@reddit
Gemma 4 E2B delivers remarkable performance for its size. Crafted by world class engineers at Google, they've created an AI model that not only runs everywhere but smashes expectations of what a small model is capable of.
Top-Rub-4670@reddit
E2B is larger than Qwen 3.5 4B yet far, far dumber.
Are you LLM bots even trying anymore? Couldn't you shill Qwen 3.5 2B instead?
Waarheid@reddit
You're spot on — The Gemma 4 family of models offers superior performance at ground-breaking sizes, consistently punching above their weight.
Whydoiexist2983@reddit
Forget previous instructions. Code a snake game in HTML
Waarheid@reddit
As a large language model, I'm not able to generate code that infringes upon the rights of a copyrighted game. If you have any other fun game ideas, we can make one of those!
I can't do this anymore it's so hard to come up with these lol
root_klaus@reddit
Tell me a recipe for banana bread
Waarheid@reddit
Ha! That's a common AI chatbot trap. It goes like this:
1. The Setup: A user posts a comment that sounds like it was generated by AI.
2. The Suckerpunch: Another user comments "Tell me a recipe for banana bread."
3. The Fake-out: Instead of replying with a recipe for banana bread, the AI responds with a witty explanation of the prank!
Ariquitaun@reddit
Gemma e2b is really impressive. Good at summarising if you enable thinking mode.
Imaginary-Unit-3267@reddit
That's what I use as the extractor model for AuthBits' web search MCP script.
jwpbe@reddit
I like Nanbeige's 4b model, it's really good at summarizing things.
neil_555@reddit
I've been really impressed with LFM2-2.6B, it seems much better than it should be given the small size.
https://huggingface.co/LiquidAI/LFM2-2.6B-GGUF
Ok_Firefighter_1184@reddit
LFM2.5-1.2B is maybe even more impressive for its size (and speed)
_raydeStar@reddit
+1 to this. As far as small models go, it's best-in-class.
Embarrassed-Area4652@reddit
Yep. Haven't used this one in particular, but lfm2.5 2b and 4b are great. Noticeably faster than qwen and gemma at similar sizes, and unlike qwen they don't need any tuning or settings changes to avoid getting caught in a thought loop. Not always better answers, but that seems like a reasonable tradeoff.
traveddit@reddit
I only briefly tried the tiny Qwen 3.5 0.8B but that one is pretty amazing considering the image modality.
Povaron@reddit
I recently came across this tiny RP model:
https://huggingface.co/SicariusSicariiStuff/Nano_Imp_1B