Can Qwen3-Coder-Next run on a laptop with the following specifications
Posted by Itchy-News26@reddit | LocalLLaMA | View on Reddit | 15 comments
Can Qwen3-Coder-Next run on a laptop with the following specifications:
RTX 5060 8GB, 32GB RAM, Intel Core i7-14650HX
SimilarWarthog8393@reddit
Yes, at Q4 or Q3 with most experts on CPU.
kreigiron@reddit
You can use llama.cpp's llama-server with mmap enabled; it will magically pull experts in on demand for your GPU when needed. I'm using a 5060 Ti 16 GB now and it seems to be doing that.
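For reference, a minimal sketch of what such a setup can look like with llama-server. The model filename and layer counts are placeholders, not the commenter's actual settings; check your llama.cpp build's `--help` for the flags it supports.

```shell
# Sketch of an expert-offload setup with llama.cpp's llama-server.
# mmap is on by default, so weights are paged in from disk on demand.
# --n-cpu-moe keeps the MoE expert tensors of the first N layers in
# system RAM while attention/dense layers are offloaded to the GPU.
llama-server \
  -m qwen3-coder-next-q4_k_m.gguf \
  --n-gpu-layers 99 \
  --n-cpu-moe 40 \
  -c 32768 \
  --cache-type-k q8_0 --cache-type-v q8_0
```

Quantizing the KV cache to q8_0 roughly halves context memory versus f16, which matters most on 8 GB cards.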
EyeAccomplished6887@reddit
Curious how this setup has been working for you - I've got an old 6GB AMD card and was considering picking up a 5060ti for local inference.
kreigiron@reddit
I've now moved to Qwen3.5; both the 27B and 35B-A3B work and fit very well at lower quants on the 5060 Ti.
Baldur-Norddahl@reddit
Q4 with no context is ~40 GB, and OP has 8 GB VRAM + 32 GB RAM, leaving no room for the OS, context, or anything else.
Q3 could maybe load, but you would still have no space to actually use it.
Q2 would load, but is not actually useful.
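The arithmetic behind those numbers can be sketched like this. The bits-per-weight values are idealized; real K-quant GGUF files (e.g. Q4_K_M at roughly 4.8 effective bpw) come out somewhat larger than this estimate.

```python
def quant_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate model file size in GB: parameters x bits / 8.

    Ignores quantization metadata and mixed-precision tensors, so
    real GGUF files land somewhat above this figure.
    """
    return params_billion * bits_per_weight / 8

for label, bpw in [("Q4", 4.0), ("Q3", 3.0), ("Q2", 2.0)]:
    print(f"{label}: ~{quant_size_gb(80, bpw):.0f} GB")
# Q4 alone is ~40 GB -- the sum of OP's 8 GB VRAM + 32 GB RAM,
# before the OS or any KV cache gets a byte.
```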
SimilarWarthog8393@reddit
I mostly agree, but mmap makes it doable; it depends on the minimum pp and tps OP is willing to accept. I have 64 GB RAM and 8 GB VRAM on my laptop: I run OSS 120B mxfp4 with mmap at around 200 pp and 15 tps (64K context @ q8), and I can run Q5 Qwen3 Next with mmap at 400 pp and 20-25 tps with 64K context @ q8. Gotta push your hardware to its limits haha
AgilePhotograph3305@reddit
Was that a laptop or a desktop PC? Can a MacBook Pro i9 (8 GB VRAM, 64 GB RAM) run the model, or should I get a ZBook Fury laptop with an RTX 4000 8 GB VRAM and 64 GB RAM? I would prefer to stay on the Mac running Linux, granted it's cooled with the thermal pad mod and a cooling laptop stand. Thoughts?
SimilarWarthog8393@reddit
I'm biased against Apple products, but in your shoes I'd first research which option will have better performance (memory bandwidth of the VRAM, PCIe connection, etc.) and of course consider thermal throttling; if the MacBook doesn't cool itself well, you'll suffer during longer periods of inference.
AgilePhotograph3305@reddit
Thanks for your response and opinion. Cheers.
Vaddieg@reddit
and context in swap file
Baldur-Norddahl@reddit
No, Qwen3-Coder-Next is an 80B model that requires about 45 GB minimum, preferably VRAM. You could maybe load a 2-bit quantization of it, but it would be braindead and perform horribly.
Try Devstral Small 2 instead. Although even that will be slow, it will at least run.
ClimateBoss@reddit
or qwen3-coder-30b-a3b q8_0
Far_Cat9782@reddit
Slow, but it will work if you're willing to do something else while it writes code.
ForsookComparison@reddit
Can confirm a 2 bit version of a model with 3B active params being anything other than silly is science fiction right now.
InvertedVantage@reddit
A quantized model will probably fit, but it will be too slow to be useful.