What's the best collection of small models to run on 8GB RAM?
Posted by Adventurous-Gold6413@reddit | LocalLLaMA | 24 comments
Preferably different models for different use cases.
Coding (Python, Java, HTML, JS, CSS)
Math
Language (translation / learning)
Emotional support / therapy-like
Conversational
General knowledge
Instruction following
Image analysis / vision
Creative writing / world building
RAG
Thanks in advance!
OlegDoDo@reddit
Depends on your use case. For general chat and document Q&A, gemma3:4b (Google, 3.3GB) is probably the best right now: 128K context, and it runs comfortably in 8GB. qwen2.5:7b technically wants 16GB, but I've seen people run it tight on 8GB with nothing else open.
If you want no content restrictions for medical or research work, dolphin3:8b fits in 8GB too, a 4.9GB download.
All three run fine on Ollama, no GPU required. CPU-only on a regular laptop, expect 10–30 second responses, but it works.
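For reference, a minimal sketch of driving one of these from Python with the ollama client library (assumes `pip install ollama`, the Ollama server running locally, and that you've already done `ollama pull gemma3:4b`):

```python
# Minimal chat call through the ollama Python client.
# Assumes the local Ollama server is up and gemma3:4b is pulled.
import ollama

response = ollama.chat(
    model="gemma3:4b",
    messages=[
        {"role": "user", "content": "Summarize this document in three bullet points: ..."},
    ],
)
print(response["message"]["content"])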
SubstantialQuit7139@reddit
8GB VRAM or RAM?
NegotiationNo1504@reddit
RAM
Accurate_Reach4980@reddit
Has anyone tried the Liquid AI models?
Psychological_Cry135@reddit
There is no single “best” local LLM. The answer is always contextual and depends on the jobs to be done.
You can check here:
https://datasapien.com/datasapien-lab-reprwhats-the-best-local-llm/
Positive-Advance4341@reddit
Cool, what are your top 3 for 8GB RAM?
ikaganacar@reddit
Maybe Qwen3 1.7B, 4B, or 8B.
ThroatExciting6203@reddit
Honestly, Qwen2.5-Coder 1.5B and 7B are absolute beasts for coding tasks, way better than regular Qwen for anything programming related. For math I'd throw in DeepSeek-Math 7B; that thing is scary good at problem solving.
Phi-3.5-mini is solid for general knowledge and instruction following, and runs smoothly on 8GB. For creative writing, Llama 3.1 8B is still king, but you might need to quantize it pretty hard (see the sketch below).
Gemma 2 2B is surprisingly decent for conversational stuff and emotional-support type things. Translation-wise, the NLLB models are purpose-built but kinda niche; might be worth checking out Aya 8B for multilingual stuff.
Vision models are gonna eat your RAM fast, but Moondream2 (~1.9B) can squeeze in there for basic image analysis.
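If you do quantize Llama 3.1 8B down to fit, here's a sketch of loading a GGUF with llama-cpp-python (assumes `pip install llama-cpp-python`; the file path and quant level are placeholders for whatever you actually download):

```python
# Load a heavily quantized GGUF so an 8B model fits in ~8GB RAM.
# The file name (Q4_K_M here) is a placeholder; use the quant you downloaded.
from llama_cpp import Llama

llm = Llama(
    model_path="./llama-3.1-8b-instruct-Q4_K_M.gguf",  # hypothetical local file
    n_ctx=4096,    # keep the context modest; the KV cache also eats RAM
    n_threads=8,   # roughly match your CPU core count
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write a short scene set in a desert city."}]
)
print(out["choices"][0]["message"]["content"])
```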
galjoal2@reddit
Can you give more details about these code models? How good are they really? How did you test them, and can you give an example?
crantob@reddit
More than a few people prefer Qwen2.5-Coder over later models.
Good or bad varies a lot by usage patterns and problem domain.
Western_Bread6931@reddit
i use Incontinence-Mistake-Wendys-Frosty-Consequence-Nightmare-Q8.gguf
crantob@reddit
I searched for this and couldn't find it. Am I on some kind of list now? :)
Specialist_Hand6352@reddit
nanbeige4-3b
Transcontinenta1@reddit
How do you guys fancy the new Liquid AI model?
synw_@reddit
And if you have a bit of VRAM on top of the RAM, use Qwen3 30B-A3B and Qwen3 Coder 30B-A3B, Nemotron 30B-A3B, or GLM Flash.
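For that VRAM+RAM split, a minimal sketch using llama-cpp-python's `n_gpu_layers` to put part of a GGUF on the GPU and keep the rest in system RAM (the file name and layer count are placeholder assumptions you'd tune to your hardware):

```python
# Hybrid VRAM+RAM inference: offload only as many layers as fit on the GPU.
# Requires a CUDA/Metal build of llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="./qwen3-30b-a3b-Q4_K_M.gguf",  # hypothetical local file
    n_gpu_layers=20,   # tune down until it stops OOM-ing in your VRAM
    n_ctx=8192,
)
print(llm("Explain MoE routing in two sentences.", max_tokens=128)["choices"][0]["text"])
```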
Background-Ad-5398@reddit
For writing and conversation it's hard to beat Nemo 12B finetunes; Q4_K_M is what you will want for 8GB VRAM.
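Quick sanity math on why Q4_K_M is about the ceiling for a 12B model in 8GB (the bits-per-weight figure is an approximation, not an exact spec):

```python
# Back-of-envelope weight footprint for a 12B model at Q4_K_M.
params = 12.2e9          # Mistral Nemo is ~12.2B parameters
bits_per_weight = 4.8    # Q4_K_M averages roughly 4.5-4.8 bits/weight
weights_gb = params * bits_per_weight / 8 / 1e9
print(f"~{weights_gb:.1f} GB for weights")  # ~7.3 GB, leaving little room for KV cache
```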
cibernox@reddit
The best small model for me is still Qwen3-4B-Instruct-2507. For its size, IMO, it still hasn't been surpassed.
You also have Qwen3-VL 4B and 8B, but unless vision is a requirement, I still find Qwen3-4B-Instruct-2507 better at most things.
One that deserves an honorary mention is LFM2.5-1.2B-Instruct. I think it has 8B total parameters but only 1.2B active, and it flies while being quite capable. Something this good running at over 200 tk/s even on modest hardware is amazing to see.
rainbyte@reddit
LFM2.5-1.2B is their new model, and the 8B one is LFM2-8B-A1B
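Either way, the active-parameter count is what drives the speed: CPU token generation is mostly memory-bandwidth bound, so here's a rough, illustrative sketch of the throughput ceiling (all numbers are assumptions, not measurements):

```python
# Crude tokens/sec ceiling: each generated token must stream the
# active weights from memory once. All numbers are illustrative.
bandwidth_gbs = 60        # assumed dual-channel DDR5 laptop memory bandwidth
active_params = 1.2e9     # ~1.2B parameters activated per token
bytes_per_param = 0.6     # ~4.8 bits/weight at a Q4-ish quant
tok_per_sec = bandwidth_gbs * 1e9 / (active_params * bytes_per_param)
print(f"~{tok_per_sec:.0f} tok/s upper bound")  # ~83 tok/s; faster memory scales this up
```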
cosimoiaia@reddit
Mistral-3-8B, GLM-4.7-Flash, Olmo-3-7B
Grouchy-Bed-7942@reddit
This guy tested different models for coding on an 8GB VRAM GPU: https://youtu.be/m3PQd11aI_c
Terrible_Aerie_9737@reddit
If you're using Windows, use LM Studio. It will pick the models that will run on your system and even lets you adjust how much of the model, if any, you want to offload to the CPU.
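LM Studio also ships an OpenAI-compatible local server, so you can script against whatever model it has loaded; a sketch assuming the default endpoint on port 1234 and the `openai` Python client:

```python
# Talk to LM Studio's local server (start it from the Developer tab).
# Assumes the default endpoint http://localhost:1234/v1; the key is ignored.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
resp = client.chat.completions.create(
    model="local-model",  # LM Studio routes to whichever model is loaded
    messages=[{"role": "user", "content": "Hello from Python!"}],
)
print(resp.choices[0].message.content)
```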
Ztoxed@reddit
following
Reasonable_Listen888@reddit
The Qwen 2.5 500M is a 360MB model; it's not that smart, but it runs everywhere.
SlowFail2433@reddit
The small Qwens are fine still