AI model for 12 GB RAM / 3 GB VRAM GTX 1050
Posted by Ok-Type-7663@reddit | LocalLLaMA | 17 comments
[gemini] [chatgpt] [claude]
Old models = worst thing ever. Any good model for 12 GB RAM, 3 GB VRAM GTX 1050, Linux Mint 22.2?
WhoRoger@reddit
Granite 4 H 7B is perfect for this. Or SmolLM3 3B.
One-Pain6799@reddit
You can use Qwen3.5 2B.
Healthy-Nebula-3603@reddit
...any
Indigas11@reddit
I run Qwen3.6 35B-A3B IQ3_XXS on a laptop (i7 8th gen, 16 GB RAM + GTX 1050 4 GB VRAM).
pp 15 t/s and tg 7 t/s (approx.) with 96,000 ctx (KV cache quantized with -ctk and -ctv q4_0).
If your workflow is to give it a plan and come back later, then it is the right choice.
You can try Qwen3.5 9B, but with that I get pp 38 t/s and tg 7 t/s.
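For reference, a minimal llama-cpp-python sketch of that kind of setup (quantized KV cache plus partial GPU offload); the model filename, layer count, and prompt are placeholders, not values confirmed by the comment above:

```python
import llama_cpp
from llama_cpp import Llama

# The -ctk/-ctv q4_0 flags from the comment map to type_k/type_v here.
llm = Llama(
    model_path="qwen3.6-35b-a3b-iq3_xxs.gguf",  # placeholder filename
    n_ctx=96000,                                # the 96k context from the comment
    n_gpu_layers=8,                             # partial offload; tune for 3-4 GB VRAM
    type_k=llama_cpp.GGML_TYPE_Q4_0,            # quantized K cache (-ctk q4_0)
    type_v=llama_cpp.GGML_TYPE_Q4_0,            # quantized V cache (-ctv q4_0)
    flash_attn=True,  # llama.cpp needs flash attention for a quantized V cache
)

# "Give it a plan and come back later"-style usage.
out = llm("Write a step-by-step plan for refactoring a small Python project.",
          max_tokens=256)
print(out["choices"][0]["text"])
```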
ML-Future@reddit
For your setup I think Qwen3.5 2B IQ4_NL (1.21 GB) would be the best.
Or maybe Qwen3.5 4B IQ4_NL (2.58 GB).
HellomyfriendNine@reddit
Qwen3.5 4B is the best small model I have ever used (still lacks coding), but it's great for general reasoning and math.
sagiroth@reddit
For anything sensible you need a bare minimum of 8 GB VRAM and 32 GB RAM tbh, and that's only with MoE models, sadly.
tomByrer@reddit
A new $500 cell phone would be a better AI server than that computer....
sagiroth@reddit
On that setup I ran Qwen 35B-A3B with CPU offload to RAM at 80 t/s and 64k context. Don't think a $500 phone can do that.
MotokoAGI@reddit
If you have a DDR4 system, then Qwen3.6-36B at Q4 with the cmoe (CPU-MoE offload) option; see the sketch below.
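If "cmoe" here means llama.cpp's --cpu-moe flag (my assumption), a sketch of launching llama-server with the MoE expert tensors pinned to system RAM; the model filename and context size are placeholders:

```python
import subprocess

# Assumes a llama.cpp build whose llama-server supports --cpu-moe:
# expert tensors stay in system RAM, so only the attention/dense
# layers compete for the GTX 1050's 3 GB of VRAM.
subprocess.run([
    "llama-server",
    "-m", "qwen3.6-36b-q4_k_m.gguf",  # placeholder filename
    "--cpu-moe",                      # experts on CPU, rest offloadable
    "-ngl", "99",                     # offload all non-expert layers to GPU
    "-c", "16384",                    # placeholder context size
])
```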
knselektor@reddit
you can use https://github.com/AlexsJones/llmfit to select a few models to test for your use case
dreamai87@reddit
Bro, for you just go with Qwen3 2507 4B Instruct at Q4.
Endlesscrysis@reddit
Literally just prompt it to web-search the latest leaderboards and benchmarks. If you don't explicitly point it towards how to find recent information, it will take the lazy route and go from memory/training, which is obviously outdated.
OsmanthusBloom@reddit
I would try Gemma4 E2B, possibly even E4B. You should be able to fit these if you use llama.cpp, Q4 quants, quantized context (q8_0 or possibly q4_0 if you dare), and either skip mmproj entirely (no image input support then) or at least don't offload it to VRAM.
These are far from the best available models but probably the best you can use with your very limited hardware. Also Qwen3.5 4B might work, or some of the LiquidAI LFM models.
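To make "should be able to fit these" concrete, a back-of-the-envelope VRAM estimator: weight bytes at roughly 4.5 bits/param for Q4 quants, plus KV-cache bytes derived from GGML block sizes. The architecture numbers in the example are placeholders, not real Gemma or Qwen specs:

```python
# Rough VRAM budgeting for llama.cpp: weight bytes + KV-cache bytes.

# Bytes per element from GGML block layouts:
# q8_0 stores 32 elems in 34 bytes, q4_0 stores 32 elems in 18 bytes.
BYTES_PER_ELEM = {"f16": 2.0, "q8_0": 34 / 32, "q4_0": 18 / 32}

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, cache_type="q8_0"):
    # K and V each hold n_layers * n_ctx * n_kv_heads * head_dim elements.
    return 2 * n_layers * n_ctx * n_kv_heads * head_dim * BYTES_PER_ELEM[cache_type]

def q4_weight_bytes(n_params):
    # Q4-class quants average roughly 4.5 bits per parameter.
    return n_params * 4.5 / 8

# Hypothetical 4B-param model: 36 layers, 8 KV heads, head_dim 128.
weights = q4_weight_bytes(4e9)
kv = kv_cache_bytes(36, 8, 128, n_ctx=8192, cache_type="q4_0")
print(f"weights ~{weights / 1e9:.2f} GB, KV cache ~{kv / 1e9:.2f} GB")
```

The point of the exercise: on 3 GB of VRAM the weights alone eat most of the budget, which is why quantized context and skipping mmproj matter.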
The 1-bit Bonsai models are another option. I've successfully run the 8B model on just 2 GB VRAM; see here: https://www.reddit.com/r/LocalLLaMA/comments/1sbnf8y/running_1bit_bonsai_8b_on_2gb_vram_mx150_mobile/
NigaTroubles@reddit
Qwen3 is old ??
ABLPHA@reddit
Yes, we have Qwen3.5 and 3.6 now, and Qwen3 isn't even close to them.
1998marcom@reddit
gpt-oss 20b? Or Qwen3.5 4B (maybe with some offload), or Gemma4 E4B?