qwen3.6-35b-a3b-mtp running on GTX 1060 6GB

Posted by xxvegas@reddit | LocalLLaMA

I have an old, 10-year-old Dell T5810 workstation with 32GB of DDR3(?) memory, an E5-2698 v3 (16 cores, 32 threads), and a GTX 1060 6GB that was used for mining back in the old days (it paid for itself many times over). I managed to get the model running with LM Studio on Windows(!). My settings are:

Model: unsloth qwen3.6-35B-a3b-MTP-GGUF UD Q4_K_XL

Ctx length: 131072

GPU offload 41

CPU threadpool size 16

Max concurrent 4

Number of experts 8

Number of MOE layers offloaded to CPU 41

MTP max draft 3

KV quantization both Q4_0
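
To give a rough idea of why Q4_0 for both K and V helps at a 131072 context: in ggml, a Q4_0 block stores 32 values in 18 bytes (16 bytes of 4-bit values plus a 2-byte fp16 scale), versus 64 bytes for the same values in f16. The sketch below is only an estimate for a plain attention stack. The layer count, KV head count, and head dimension are placeholder numbers I picked for illustration, not the real architecture of this model, and `kv_cache_bytes` is a made-up helper, not an LM Studio or llama.cpp function.

```python
# Q4_0 in ggml packs 32 values into one block:
# 16 bytes of 4-bit values plus a 2-byte fp16 scale = 18 bytes.
Q4_0_BYTES_PER_VALUE = 18 / 32   # 0.5625 bytes, i.e. 4.5 bits per value
F16_BYTES_PER_VALUE = 2.0


def kv_cache_bytes(n_ctx, n_attn_layers, n_kv_heads, head_dim, bytes_per_value):
    """Estimated size of the K and V caches for a standard attention stack."""
    values = 2 * n_ctx * n_attn_layers * n_kv_heads * head_dim  # 2 = K and V
    return values * bytes_per_value


# Placeholder shape for illustration only, NOT this model's real architecture.
shape = dict(n_ctx=131072, n_attn_layers=10, n_kv_heads=2, head_dim=256)

f16 = kv_cache_bytes(**shape, bytes_per_value=F16_BYTES_PER_VALUE)
q4 = kv_cache_bytes(**shape, bytes_per_value=Q4_0_BYTES_PER_VALUE)

print(f"f16:  {f16 / 2**30:.2f} GiB")
print(f"q4_0: {q4 / 2**30:.2f} GiB")
print(f"ratio q4_0 / f16: {q4 / f16:.3f}")
```

Whatever the real shape is, the ratio stays the same: the Q4_0 cache is a bit over a quarter of the f16 size, which matters a lot on a 6GB card.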

Prefill (16k): about 130-150 tps

Decode (4k): about 16 tps

Very usable for chat.
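
For a sense of what those speeds mean in wall-clock time, here is a quick back-of-the-envelope calculation. It assumes "16k" and "4k" mean 16384 and 4096 tokens and that throughput stays constant over the whole run, which is a simplification.

```python
def seconds_for(tokens, tokens_per_second):
    """Time to process `tokens` at a constant rate."""
    return tokens / tokens_per_second


# Assumption: "16k" and "4k" are 16 * 1024 and 4 * 1024 tokens.
prefill_tokens = 16 * 1024
decode_tokens = 4 * 1024

prefill_slow = seconds_for(prefill_tokens, 130)  # low end of the reported range
prefill_fast = seconds_for(prefill_tokens, 150)  # high end of the reported range
decode_time = seconds_for(decode_tokens, 16)

print(f"prefill 16k: {prefill_fast:.0f}-{prefill_slow:.0f} s")
print(f"decode 4k:   {decode_time:.0f} s")
```

So a full 16k prompt takes roughly two minutes to ingest and a 4k reply a bit over four minutes. Short chat turns finish much faster than that.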