Qwen3.5-35B running well on RTX4060 Ti 16GB at 60 tok/s

Posted by Nutty_Praline404@reddit | LocalLLaMA | View on Reddit | 40 comments

Spent a bunch of time tuning llama.cpp on a Windows 11 box (i7-13700F, 64GB RAM) with an RTX 4060 Ti 16GB, trying to get the unsloth Qwen3.5-35B-A3B-UD-Q4_K_L quant running well at 64k context. I finally got it to a pretty solid place, so I wanted to share what is working for me.

models.ini entry:

[qwen3.5-35b-64k]
model = Qwen3.5-35B-A3B-UD-Q4_K_L.gguf
c = 65536
t = 6
tb = 8
n-cpu-moe = 11
b = 1024
ub = 512
parallel = 2
kv-unified = true
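Most of the VRAM budget at c = 65536 goes to the KV cache, which is why the batch sizes and n-cpu-moe offload matter. A rough sizing sketch, using the standard formula 2 (K and V) × layers × KV heads × head dim × context × bytes per element — the dimensions below are hypothetical placeholders, read the real values from the GGUF metadata of your model:

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, ctx, bytes_per_elt=2):
    """Approximate KV-cache size: one K and one V tensor per layer, fp16 by default."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx * bytes_per_elt

# Hypothetical dimensions for illustration only -- check the actual
# GGUF header of the model you are loading.
size = kv_cache_bytes(n_layers=48, n_kv_heads=8, head_dim=128, ctx=65536)
print(f"{size / 2**30:.1f} GiB")  # -> 12.0 GiB
```

Quantized KV (q8_0 or q4_0) roughly halves or quarters that, which is the usual lever when the cache alone threatens to overflow a 16GB card.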

Router start command:

llama-server.exe --models-preset models.ini --models-max 1 --host 0.0.0.0 --webui-mcp-proxy --port 8080
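Once the router is up, llama-server exposes an OpenAI-compatible API, so any client works. A minimal Python sketch — the preset name is assumed to be the section header from models.ini above, and the host/port match the start command:

```python
import json
import urllib.request

def build_payload(prompt, preset="qwen3.5-35b-64k"):
    """Build an OpenAI-style chat completion request body.

    The preset name here is the [section] name from models.ini above.
    """
    return {
        "model": preset,
        "messages": [{"role": "user", "content": prompt}],
    }

def chat(prompt, host="localhost", port=8080):
    """POST to llama-server's OpenAI-compatible chat endpoint."""
    req = urllib.request.Request(
        f"http://{host}:{port}/v1/chat/completions",
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```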

What I’m seeing now

With that preset, I’m reliably getting roughly 40–60 tok/s on many tasks, even with Docker Desktop running in the background.

The per-request timing lines in the logs back this up. So not "benchmark fantasy numbers," but real usable throughput at 64k on a 4060 Ti 16GB.
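For anyone reading their own logs: llama-server reports token counts and elapsed milliseconds per request, so throughput is just tokens divided by seconds. A trivial helper with made-up example numbers:

```python
def tok_per_s(n_tokens, ms):
    """Tokens per second from a token count and elapsed milliseconds."""
    return n_tokens / (ms / 1000.0)

# Hypothetical example: 512 generated tokens in 10,240 ms.
print(tok_per_s(512, 10_240))  # -> 50.0
```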

Other observations

I did not find a database of tuned configs for various cards, but it seems like something that would be useful to have.