What tokens/sec do you get when running Qwen 3.5 27B?

Posted by thegr8anand@reddit | LocalLLaMA | 194 comments

I have a 4090 with just 32 GB of RAM. I wanted to get an idea of what speeds other users get when running 27B. I see many posts where people report X tokens/sec but don't mention the max context they use.

My setup is not optimal. I'm using LM Studio to run the models. I have tried Bartowski Q4_K_M and Unsloth Q4_K_XL, and speeds are nearly the same for both. But it depends on the context I use.

If I use a smaller context, under 50k, I get 32-38 tokens/sec. The max I can run on my setup is around 110k, and there the speed drops to 7-10 tokens/sec because I have to offload some of the layers to the CPU (54-56 of the 64 layers run on the GPU). Under 50k context, I can load all 64 layers on the GPU.
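
For anyone comparing numbers, here is a rough sketch of the arithmetic behind the figures above. It only uses the values quoted in this post (64 layers, 54-56 on GPU, 32-38 vs 7-10 tokens/sec); the function names are made up for illustration and nothing here measures or predicts anything.

```python
def gpu_fraction(gpu_layers: int, total_layers: int) -> float:
    """Share of the model's layers that stay on the GPU."""
    if total_layers <= 0 or not 0 <= gpu_layers <= total_layers:
        raise ValueError("gpu_layers must be between 0 and total_layers")
    return gpu_layers / total_layers


def slowdown_range(fast, slow):
    """Smallest and largest slowdown between two (low, high) tokens/sec ranges."""
    fast_lo, fast_hi = fast
    slow_lo, slow_hi = slow
    # Smallest slowdown: slowest "fast" run vs fastest "slow" run, and vice versa.
    return fast_lo / slow_hi, fast_hi / slow_lo


# Figures quoted in the post above.
TOTAL_LAYERS = 64
FULL_GPU_TPS = (32.0, 38.0)   # under 50k context, all layers on GPU
OFFLOAD_TPS = (7.0, 10.0)     # around 110k context, 54-56 layers on GPU

for layers in (54, 56):
    print(f"{layers}/{TOTAL_LAYERS} layers on GPU = {gpu_fraction(layers, TOTAL_LAYERS):.1%}")

low, high = slowdown_range(FULL_GPU_TPS, OFFLOAD_TPS)
print(f"slowdown: {low:.1f}x to {high:.1f}x")
```

So roughly 84-88% of the layers are still on the GPU at 110k, yet generation is about 3.2x to 5.4x slower than with everything on the GPU. When replying, it helps to include context size and GPU layer count next to the tokens/sec figure so the numbers are comparable.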