A simple "hack" to speed up prompt processing for Qwen 3.5/3.6 in LM Studio

Posted by GrungeWerX@reddit | LocalLLaMA

Increase your CPU Thread Pool Size to your processor's max thread count. In LM Studio, the max is 10. I'm running an i7-12700K (20 threads), so I set mine to 20. That doubled, and in some cases nearly tripled, my prompt processing speed, and things are now flying at over 100K context. I'm still getting 25+ tok/sec at high context since I can still max out my GPU offload.
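If you're not sure what your processor's max thread count is, a minimal sketch like the one below prints the number of logical CPUs the OS reports. On an i7-12700K with Hyper-Threading enabled, that should be 20. The function name `suggested_thread_pool_size` is made up for illustration and is not part of LM Studio; you would still enter the number by hand in LM Studio's settings.

```python
import os


def suggested_thread_pool_size() -> int:
    """Return the logical CPU count reported by the OS, falling back to 1."""
    # os.cpu_count() can return None if the count cannot be determined.
    count = os.cpu_count()
    return count if count else 1


if __name__ == "__main__":
    # Hypothetical helper: prints a candidate value for the thread pool setting.
    print(f"Logical CPU threads: {suggested_thread_pool_size()}")
```

Whether the full logical count is actually the fastest setting may depend on the model and on how much of it is offloaded to the GPU, so it's worth trying a few values.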

For those interested, I'm using the UD-Q5_K_XL quants for both 3.5 and 3.6.

Sadly, it doesn't seem to help with Gemma 4 31B, and your mileage may vary with other models, but it works well with Qwen.

Hope this helps someone else out.