Can Qwen3-Coder-Next run on a laptop with the following specifications
Posted by Itchy-News26@reddit | LocalLLaMA | View on Reddit | 15 comments
Can Qwen3-Coder-Next run on a laptop with the following specifications:
RTX 5060 8GB, 32GB RAM, Intel Core i7-14650HX
SimilarWarthog8393@reddit
Yes, at Q4 or Q3 with most experts on CPU.
kreigiron@reddit
You can use llama.cpp's llama-server with mmap enabled; it will magically pull experts in on demand for your GPU when needed. I'm using a 5060 Ti 16 GB now and it seems to be doing that.
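For reference, a minimal sketch of what such a setup can look like with llama-server. The model filename and layer counts are placeholders, not the commenter's actual settings; check your llama.cpp build's `--help` for the flags it supports.

```shell
# Sketch of an expert-offload setup with llama.cpp's llama-server.
# mmap is on by default, so weights are paged in from disk on demand.
# --n-cpu-moe keeps the MoE expert tensors of the first N layers in
# system RAM while attention/dense layers are offloaded to the GPU.
llama-server \
  -m qwen3-coder-next-q4_k_m.gguf \
  --n-gpu-layers 99 \
  --n-cpu-moe 40 \
  -c 32768 \
  --cache-type-k q8_0 --cache-type-v q8_0
```

Quantizing the KV cache to q8_0 roughly halves context memory versus f16, which matters most on 8 GB cards.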
EyeAccomplished6887@reddit
Curious how this setup has been working for you - I've got an old 6GB AMD card and was considering picking up a 5060ti for local inference.
kreigiron@reddit
I've now moved to Qwen3.5; both the 27B and 35B-A3B work and fit very well at lower quants on the 5060 Ti.
Baldur-Norddahl@reddit
Q4 with no context is ~40 GB, and OP has 8 GB VRAM + 32 GB RAM, leaving no room for the OS, context, or anything else.
Q3 could maybe load, but you would still have no space to actually use it.
Q2 would load, but is not actually useful.
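The arithmetic behind those numbers can be sketched like this. The bits-per-weight values are idealized; real K-quant GGUF files (e.g. Q4_K_M at roughly 4.8 effective bpw) come out somewhat larger than this estimate.

```python
def quant_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate model file size in GB: parameters x bits / 8.

    Ignores quantization metadata and mixed-precision tensors, so
    real GGUF files land somewhat above this figure.
    """
    return params_billion * bits_per_weight / 8

for label, bpw in [("Q4", 4.0), ("Q3", 3.0), ("Q2", 2.0)]:
    print(f"{label}: ~{quant_size_gb(80, bpw):.0f} GB")
# Q4 alone is ~40 GB -- the sum of OP's 8 GB VRAM + 32 GB RAM,
# before the OS or any KV cache gets a byte.
```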
SimilarWarthog8393@reddit
I mostly agree, but mmap makes it doable; it depends on the minimum pp and tps OP is willing to accept. I have 64 GB RAM and 8 GB VRAM on my laptop: I run OSS 120B mxfp4 with mmap at around 200 pp and 15 tps (64K context @ q8), and I can run Q5 Qwen3 Next with mmap at 400 pp and 20-25 tps with 64K context @ q8. Gotta push your hardware to its limits haha
AgilePhotograph3305@reddit
Was that a laptop or a desktop PC? Can a MacBook Pro i9 (8 GB VRAM, 64 GB RAM) run the model, or should I get a ZBook Fury laptop with an RTX 4000 8 GB VRAM and 64 GB RAM? I would prefer to stay on the Mac running Linux, granted it's cooled with the thermal pad mod and a cooling laptop stand. Thoughts?
SimilarWarthog8393@reddit
I'm biased against Apple products, but in your shoes I'd first research which option will have better performance (memory bandwidth of the VRAM, PCIe connection, etc.) and of course consider thermal throttling; if the MacBook doesn't cool itself well, you'll suffer during longer periods of inference.
AgilePhotograph3305@reddit
Thanks for your response and opinion. Cheers.
Vaddieg@reddit
and context in swap file
Baldur-Norddahl@reddit
No, Qwen3-Coder-Next is an 80B model that requires about 45 GB minimum, preferably VRAM. You could maybe load a 2-bit quantization of it, but it would be braindead and perform horribly.
Try Devstral Small 2 instead. Although even that will be slow, it will at least run.
ClimateBoss@reddit
or qwen3-coder-30b-a3b q8_0
Far_Cat9782@reddit
Slow, but it will work if you're willing to do something else while it writes code.
ForsookComparison@reddit
Can confirm a 2 bit version of a model with 3B active params being anything other than silly is science fiction right now.
InvertedVantage@reddit
A quantized model will probably fit, but it will be too slow to be useful.