optimize qwen3 4b
Posted by No-Selection2972@reddit | LocalLLaMA | View on Reddit | 22 comments
How can I optimize Qwen3-4B-2507 for my potato PC? I heard it's the best model.
Brave-Hold-9389@reddit
You can run it with vLLM and Open WebUI to get the fastest results, but you'll need an AWQ quantization because GGUF doesn't work with vLLM. Here: https://huggingface.co/cpatonn/Qwen3-4B-Instruct-2507-AWQ-4bit
I don't know if your PC supports vLLM, though.
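For reference, a minimal sketch of that setup using vLLM's Python API (the model ID comes from the link above; the prompt and sampling settings are just placeholders):

```python
from vllm import LLM, SamplingParams

# Load the AWQ-quantized Qwen3 4B checkpoint linked above.
llm = LLM(model="cpatonn/Qwen3-4B-Instruct-2507-AWQ-4bit", quantization="awq")

params = SamplingParams(temperature=0.7, max_tokens=128)  # placeholder settings
outputs = llm.generate(["Why is the sky blue?"], params)
print(outputs[0].outputs[0].text)
```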
ANR2ME@reddit
I don't think AWQ works on an old GPU like the GT710, though.
Brave-Hold-9389@reddit
What about GPTQ?
AXYZE8@reddit
The GT710 has 0.44% of an RTX 3090's compute and 1GB/2GB of DDR3 that does 14GB/s.
I don't think it matters whether GPTQ could generate a token or not on such a GPU; his CPU with DDR4 is a speed demon compared to it.
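To put rough numbers on that: single-stream token generation is approximately memory-bandwidth-bound, so a back-of-the-envelope ceiling (all figures are loose assumptions, not measurements) looks like:

```python
# Decode speed is roughly memory bandwidth divided by the bytes read per
# token, which is about the quantized model size. Rough assumptions only.
model_bytes = 4e9 * 0.55  # ~4B params at Q4_K_M, ~0.55 bytes per weight

for name, bw in [("GT710 DDR3", 14e9), ("dual-channel DDR4-2666 CPU", 42.6e9)]:
    print(f"{name}: ~{bw / model_bytes:.0f} tok/s theoretical ceiling")
```

Real-world speeds land well below these ceilings, but the ratio between the two is the point.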
No-Selection2972@reddit (OP)
What could I buy for less than $100? It's Black Friday in Europe.
Kamal965@reddit
16GB MI50?
No-Selection2972@reddit (OP)
This + 16GB more of DDR4-2666, or a 3060 + 16GB?
Kamal965@reddit
Here's one for 99 euros. Look around some more; maybe you'll find one cheaper. If you can save up a bit more, the 32GB version is less than 50 euros more, I think? With the optimizations the community has been making for the MI50, it's definitely faster than a 3060. But you'll need to buy a fan to cool it, because it's a passively cooled card.
No-Selection2972@reddit (OP)
What can I run with the 32GB version? Also, can I DIY the fan?
No-Selection2972@reddit (OP)
Where?
AXYZE8@reddit
A used GTX 1660 6GB is less than $100. That GPU is good enough for Qwen3 4B.
If you need something even cheaper, a GTX 1060 6GB will also work fine for that model.
However, if you manage to raise your budget to $200, you can get a used RTX 3060 12GB. That GPU would let you run 12B-16B models such as Gemma3 12B, which is a lot better than any 4B model.
ANR2ME@reddit
I don't think you can buy a decent GPU at $100. You're also short on RAM, and you'll need to consider GPU wattage, since your PSU might not have enough power for a decent GPU.
If you only need to test stuff out, you can use a free service like Colab/Kaggle/Modal that gives you free compute daily/weekly/monthly.
Thechae9@reddit
For less than $100, what could I buy?
pokemonplayer2001@reddit
In response to you providing zero info, here's my answer: _____
No-Selection2972@reddit (OP)
I'm so sorry bro: GT 710, 16GB DDR4, i5-9400F, and the fastest SATA SSD.
AXYZE8@reddit
You should use CPU-only inference; that GT710 is useless in terms of compute, and you likely have the DDR3 variant, which makes it a lot slower in bandwidth too.
Sadly, the performance you see right now cannot be improved; it will be that slow no matter what you do.
You can try Granite 4 Tiny or LFM2 8B. They will be way faster, because they activate just ~1B params.
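A minimal CPU-only sketch with llama-cpp-python (the GGUF filename and thread count are assumptions; any recent GGUF loads the same way):

```python
from llama_cpp import Llama

llm = Llama(
    model_path="granite-4.0-tiny-Q4_K_M.gguf",  # hypothetical local filename
    n_gpu_layers=0,   # stay on the CPU; the GT710 would only slow things down
    n_threads=6,      # i5-9400F has 6 physical cores
    n_ctx=4096,
)
out = llm("Say hi in five words.", max_tokens=32)
print(out["choices"][0]["text"])
```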
tmvr@reddit
Not many options there. With the Q4 version you may be able to get around 10 tok/s, but that's about it.
Thechae9@reddit
Settings are still settings
pokemonplayer2001@reddit
Even better.
Vegetable-Second3998@reddit
Make sure you grab the 4-bit quantized 4B-parameter version. Adjust your context length to your actual use case (e.g., if you don't need the full 262144-token context, you can reduce it in the model settings). You may also want to enable KV cache quantization.
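Both settings together in llama-cpp-python might look like this (the path is a placeholder, and the type_k/type_v values assume llama.cpp's GGML_TYPE_Q8_0 constant, which is 8):

```python
from llama_cpp import Llama

llm = Llama(
    model_path="Qwen3-4B-Instruct-2507-Q4_K_M.gguf",  # placeholder path
    n_ctx=8192,        # shrink from the 262144 maximum to what you actually use
    flash_attn=True,   # llama.cpp needs flash attention for a quantized KV cache
    type_k=8,          # 8 == GGML_TYPE_Q8_0: quantize the K cache
    type_v=8,          # 8 == GGML_TYPE_Q8_0: quantize the V cache
)
```

A smaller context plus a q8_0 KV cache cuts memory use substantially with little quality loss, which matters most on a machine this constrained.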