Someone recently ran an LLM on a 1998 iMac with 32 MB of RAM. Have you pushed this boundary and found a usable LLM that also scales well on CPU?
Posted by last_llm_standing@reddit | LocalLLaMA | 13 comments
Which SLM has proven to give the most throughput, does decent reasoning, and runs fast on a 16/32 GB RAM machine, based on your experiments?
Suitable_Annual5367@reddit
Isn't Bitnet trying to solve this?
last_llm_standing@reddit (OP)
yeah, but it's nowhere near useful yet, unfortunately. Bonsai is getting attention now
pmttyji@reddit
If you're talking about speed, Ling-mini-2.0 gave me the best t/s (50+) on CPU-only inference. I'm still waiting for an updated version of this model from inclusionAI.
bailingmoe - the Ling (17B) models' speed is better now
last_llm_standing@reddit (OP)
nice! was it a quant version?
pmttyji@reddit
Yes, Q4. That link has the full details.
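For anyone wanting to reproduce a CPU-only run of a Q4 GGUF quant like the one discussed above, here is a minimal sketch using llama.cpp's `llama-cli`. The model filename is hypothetical (the actual quant name depends on the upload), and the thread count should match your physical cores:

```shell
# Hypothetical filename; substitute the actual Q4 GGUF you downloaded.
# -t  : CPU threads (typically = physical core count)
# -c  : context window size in tokens
# -p  : one-shot prompt
llama-cli -m Ling-mini-2.0-Q4_K_M.gguf \
  -t 8 -c 4096 \
  -p "Explain quantization in one sentence."
```

`llama-cli` prints tokens-per-second stats at the end of the run, which is where figures like the 50+ t/s above come from.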
last_llm_standing@reddit (OP)
thank you kind sir
TyrKiyote@reddit
This is a shotgun of a post.
There are some very small models that will run on CPU. Here is a list produced by Opus.
Good options for CPU-only character RP at small sizes:
~1-3B range (most practical):
TinyLlama 1.1B — surprisingly coherent for its size, lots of fine-tunes available
Phi-2 (2.7B) and Phi-3 Mini (3.8B) — punch well above their weight class due to training data quality
Gemma 2 2B — Google's small model, solid instruction following
Qwen2.5 1.5B / 3B — strong for size, good multilingual bonus
SmolLM2 1.7B — Hugging Face's entry, designed explicitly for on-device
Sub-1B (if the CPU is really slow):
Qwen2.5 0.5B — best-in-class at this tiny size
SmolLM 135M / 360M — functional but you'll feel the quality drop hard
BagelRedditAccountII@reddit
All of these models are pretty ancient. We are already on Qwen 3.5 (smallest = 0.6B) and Gemma 4 (smallest = 2B), with the older Embedding Gemma coming in at 308M.
However, this should be prefaced with the fact that the usefulness of ultra-small LLMs depends heavily on deployment. Namely: what is the scope of the model's responsibilities, and what harnessing is in place around it?
Ok-Type-7663@reddit
*Qwen3.5 (smallest = 0.8B): you mixed the previous gen with the new gen
BagelRedditAccountII@reddit
Thanks for pointing it out!
last_llm_standing@reddit (OP)
how in the world did it miss the LFM models?
Ok-Type-7663@reddit
2023-2024 aah ancient models
last_llm_standing@reddit (OP)
LFM? They released newer ones recently, including a thinking/reasoning one.