Someone recently ran an LLM on a 1998 iMac with 32 MB of RAM. Have you pushed this boundary and found a usable LLM that also scales well on CPU?
Posted by last_llm_standing@reddit | LocalLLaMA | 13 comments
Which SLM has proven to give the most throughput, does decent reasoning, and runs fast on a 16/32 GB RAM machine, based on your experiments?
Suitable_Annual5367@reddit
Isn't Bitnet trying to solve this?
last_llm_standing@reddit (OP)
yeah, but it's nowhere near useful yet, unfortunately. Bonsai is getting attention now
pmttyji@reddit
If you're talking about speed, Ling-mini-2.0 gave me the best t/s (50+) on CPU-only inference. I'm still waiting for an updated version of this model from inclusionAI.
bailingmoe - the Ling (17B) models' speed is better now
last_llm_standing@reddit (OP)
nice! was it a quant version?
pmttyji@reddit
Yes, Q4. That link has the full details.
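For anyone wanting to reproduce a CPU-only run of a Q4 GGUF quant like the one discussed above, here is a minimal sketch using llama.cpp's `llama-cli`. The model filename is hypothetical (the actual quant name depends on the upload), and the thread count should match your physical cores:

```shell
# Hypothetical filename; substitute the actual Q4 GGUF you downloaded.
# -t  : CPU threads (typically = physical core count)
# -c  : context window size in tokens
# -p  : one-shot prompt
llama-cli -m Ling-mini-2.0-Q4_K_M.gguf \
  -t 8 -c 4096 \
  -p "Explain quantization in one sentence."
```

`llama-cli` prints tokens-per-second stats at the end of the run, which is where figures like the 50+ t/s above come from.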
last_llm_standing@reddit (OP)
thank you kind sir
TyrKiyote@reddit
This is a shotgun of a post.
There are some very small models that will run on CPU. Here is a list produced by Opus.
Good options for CPU-only character RP at small sizes:
~1-3B range (most practical):
TinyLlama 1.1B — surprisingly coherent for its size, lots of fine-tunes available
Phi-2 (2.7B) and Phi-3 Mini (3.8B) — punch well above their weight class due to training data quality
Gemma 2 2B — Google's small model, solid instruction following
Qwen2.5 1.5B / 3B — strong for size, good multilingual bonus
SmolLM2 1.7B — Hugging Face's entry, designed explicitly for on-device
Sub-1B (if the CPU is really slow):
Qwen2.5 0.5B — best-in-class at this tiny size
SmolLM 135M / 360M — functional but you'll feel the quality drop hard
BagelRedditAccountII@reddit
All of these models are pretty ancient. We are already on Qwen 3.5 (smallest = 0.6B) and Gemma 4 (smallest = 2B), with the older Embedding Gemma coming in at 308M.
However, this should be prefaced with the fact that the usefulness of ultra-small LLMs depends heavily on deployment. Namely: what is the scope of the model's responsibilities, and what harnessing is in place around it?
Ok-Type-7663@reddit
*Qwen3.5 (smallest = 0.8B): you mixed the previous gen with the new gen
BagelRedditAccountII@reddit
Thanks for pointing it out!
last_llm_standing@reddit (OP)
how in the world did it miss the LFM models?
Ok-Type-7663@reddit
2023-2024 aah ancient models
last_llm_standing@reddit (OP)
LFM? They released newer ones recently, including a thinking/reasoning one.