What's the best collection of small models to run on 8GB RAM?
Posted by Adventurous-Gold6413@reddit | LocalLLaMA | 24 comments
Preferably different models for different use cases.
Coding (Python, Java, HTML, JS, CSS)
Math
Language (translation / learning)
Emotional support / therapy-like
Conversational
General knowledge
Instruction following
Image analysis / vision
Creative writing / world building
RAG
Thanks in advance!
OlegDoDo@reddit
Depends on your use case. For general chat and document Q&A, gemma3:4b (Google, 3.3GB) is probably the best right now: 128K context, and it runs comfortably in 8GB. qwen2.5:7b technically wants 16GB, but I've seen people run it tight on 8GB with nothing else open.
If you want no content restrictions for medical or research work, dolphin3:8b fits in 8GB too, a 4.9GB download.
All three run fine on Ollama, no GPU required. CPU-only on a regular laptop, expect 10–30 second responses, but it works.
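For reference, a minimal sketch of driving one of these from Python with the ollama client library (assumes `pip install ollama`, the Ollama server running locally, and that you've already done `ollama pull gemma3:4b`):

```python
# Minimal chat call through the ollama Python client.
# Assumes the local Ollama server is up and gemma3:4b is pulled.
import ollama

response = ollama.chat(
    model="gemma3:4b",
    messages=[
        {"role": "user", "content": "Summarize this document in three bullet points: ..."},
    ],
)
print(response["message"]["content"])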
SubstantialQuit7139@reddit
8GB VRAM or RAM?
NegotiationNo1504@reddit
RAM
Accurate_Reach4980@reddit
Has anyone tried the Liquid AI models?
Psychological_Cry135@reddit
There is no single “best” local LLM. The answer is always contextual and depends on the jobs to be done.
You can check here:
https://datasapien.com/datasapien-lab-reprwhats-the-best-local-llm/
Positive-Advance4341@reddit
Cool, what are your top 3 for 8GB RAM?
ikaganacar@reddit
Maybe Qwen3 1.7B, 4B, or 8B.
ThroatExciting6203@reddit
Honestly, Qwen2.5-Coder 1.5B and 7B are absolute beasts for coding tasks, way better than regular Qwen for anything programming related. For math I'd throw in DeepSeek-Math 7B; that thing is scary good at problem solving.
Phi-3.5-mini is solid for general knowledge and instruction following, and runs smoothly on 8GB. For creative writing, Llama 3.1 8B is still king, but you might need to quantize it pretty hard (see the sketch below).
Gemma 2 2B is surprisingly decent for conversational stuff and emotional-support type things. Translation-wise, the NLLB models are purpose-built but kinda niche; might be worth checking out Aya 8B for multilingual stuff.
Vision models are gonna eat your RAM fast, but Moondream2 (~1.9B) can squeeze in there for basic image analysis.
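If you do quantize Llama 3.1 8B down to fit, here's a sketch of loading a GGUF with llama-cpp-python (assumes `pip install llama-cpp-python`; the file path and quant level are placeholders for whatever you actually download):

```python
# Load a heavily quantized GGUF so an 8B model fits in ~8GB RAM.
# The file name (Q4_K_M here) is a placeholder; use the quant you downloaded.
from llama_cpp import Llama

llm = Llama(
    model_path="./llama-3.1-8b-instruct-Q4_K_M.gguf",  # hypothetical local file
    n_ctx=4096,    # keep the context modest; the KV cache also eats RAM
    n_threads=8,   # roughly match your CPU core count
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write a short scene set in a desert city."}]
)
print(out["choices"][0]["message"]["content"])
```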
galjoal2@reddit
Can you give more details about these code models? How good are they really? How did you test them, and can you give an example?
crantob@reddit
More than a few people prefer Qwen2.5-Coder over later models.
Good or bad varies a lot by usage patterns and problem domain.
Western_Bread6931@reddit
i use Incontinence-Mistake-Wendys-Frosty-Consequence-Nightmare-Q8.gguf
crantob@reddit
I searched for this and couldn't find it. Am I on some kind of list now? :)
Specialist_Hand6352@reddit
nanbeige4-3b
Transcontinenta1@reddit
How do you guys fancy the new Liquid AI model?
synw_@reddit
And if you have a bit of VRAM on top of the RAM, use Qwen3 30B-A3B and Qwen3 Coder 30B-A3B, Nemotron 30B-A3B, or GLM Flash.
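For that VRAM+RAM split, a minimal sketch using llama-cpp-python's `n_gpu_layers` to put part of a GGUF on the GPU and keep the rest in system RAM (the file name and layer count are placeholder assumptions you'd tune to your hardware):

```python
# Hybrid VRAM+RAM inference: offload only as many layers as fit on the GPU.
# Requires a CUDA/Metal build of llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="./qwen3-30b-a3b-Q4_K_M.gguf",  # hypothetical local file
    n_gpu_layers=20,   # tune down until it stops OOM-ing in your VRAM
    n_ctx=8192,
)
print(llm("Explain MoE routing in two sentences.", max_tokens=128)["choices"][0]["text"])
```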
Background-Ad-5398@reddit
For writing and conversation it's hard to beat Nemo 12B finetunes; Q4_K_M is what you will want for 8GB VRAM.
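Quick sanity math on why Q4_K_M is about the ceiling for a 12B model in 8GB (the bits-per-weight figure is an approximation, not an exact spec):

```python
# Back-of-envelope weight footprint for a 12B model at Q4_K_M.
params = 12.2e9          # Mistral Nemo is ~12.2B parameters
bits_per_weight = 4.8    # Q4_K_M averages roughly 4.5-4.8 bits/weight
weights_gb = params * bits_per_weight / 8 / 1e9
print(f"~{weights_gb:.1f} GB for weights")  # ~7.3 GB, leaving little room for KV cache
```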
cibernox@reddit
The best small model for me is still Qwen3-4B-Instruct-2507. For its size, IMO, it still hasn't been surpassed.
You also have Qwen3-VL 4B and 8B, but unless vision is a requirement, I still find Qwen3-4B-Instruct-2507 better at most things.
One that deserves an honorary mention is LFM2.5-1.2B-Instruct. I think it has 8B total parameters but only 1.2B active, and it flies while being quite capable. Something this good running at over 200 tk/s even on modest hardware is amazing to see.
rainbyte@reddit
LFM2.5-1.2B is their new model, and the 8B one is LFM2-8B-A1B
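Either way, the active-parameter count is what drives the speed: CPU token generation is mostly memory-bandwidth bound, so here's a rough, illustrative sketch of the throughput ceiling (all numbers are assumptions, not measurements):

```python
# Crude tokens/sec ceiling: each generated token must stream the
# active weights from memory once. All numbers are illustrative.
bandwidth_gbs = 60        # assumed dual-channel DDR5 laptop memory bandwidth
active_params = 1.2e9     # ~1.2B parameters activated per token
bytes_per_param = 0.6     # ~4.8 bits/weight at a Q4-ish quant
tok_per_sec = bandwidth_gbs * 1e9 / (active_params * bytes_per_param)
print(f"~{tok_per_sec:.0f} tok/s upper bound")  # ~83 tok/s; faster memory scales this up
```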
cosimoiaia@reddit
Mistral-3-8B, GLM-4.7-Flash, Olmo-3-7B
Grouchy-Bed-7942@reddit
This guy tested different models for coding on an 8GB VRAM GPU: https://youtu.be/m3PQd11aI_c
Terrible_Aerie_9737@reddit
If you're using Windows, use LM Studio. It will pick the models that will run on your system and even lets you adjust how much of the model, if any, you want to offload to the CPU.
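LM Studio also ships an OpenAI-compatible local server, so you can script against whatever model it has loaded; a sketch assuming the default endpoint on port 1234 and the `openai` Python client:

```python
# Talk to LM Studio's local server (start it from the Developer tab).
# Assumes the default endpoint http://localhost:1234/v1; the key is ignored.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
resp = client.chat.completions.create(
    model="local-model",  # LM Studio routes to whichever model is loaded
    messages=[{"role": "user", "content": "Hello from Python!"}],
)
print(resp.choices[0].message.content)
```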
Ztoxed@reddit
following
Reasonable_Listen888@reddit
The Qwen 2.5 500M is a 360MB model; it's not that smart, but it runs everywhere.
SlowFail2433@reddit
The small Qwens are fine still