Replace RTX 2060 12G with second RTX 5060 Ti 16G for Qwen 3.6 27B?

Posted by houchenglin@reddit | LocalLLaMA

Right now I'm running Qwen3-27B-Q4_K_M on a 2060 12G + 5060 Ti 16G with tensor split 15/7. Gen speed sits around 16.5 t/s, and prompt eval drops from 653 to 356 t/s as context grows. It works, but I'm thinking about replacing the 2060 with another 5060 Ti to get a balanced dual setup with 32GB total VRAM.

[bench] RTX 2060 12G (PCIe x16) + RTX 5060 Ti 16G (PCIe x4)

- Model: Unsloth Qwen3-27B-Q4_K_M

- PP: from 653 → 356 t/s as context grows (13K → 29.5K tokens).

- TG: flat at ~16.5 t/s

 -m Qwen3-27B-Q4_K_M.gguf -ngl 999 -ts 15,7
 -fa 1 --no-mmap -b 4096 -ub 4096
 --spec-type ngram-mod --spec-ngram-size-n 24 --draft-min 12 --draft-max 48
 -c 96000 -n 32768 -t 8 -ctk q8_0 -ctv q8_0 --parallel 1
 --temperature 0.6 --jinja --min-p 0.0 --top-k 20 --top-p 0.95
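
For anyone reading the `-ts 15,7` flag: as far as I understand it, llama.cpp treats the values as proportions, not gigabytes. Here is a rough sketch of what that means per card. The 16.0 GB weights size is a hypothetical placeholder, not a measured file size. The post does not say which card is device 0, and the real split happens at layer granularity, so treat the output as approximate.

```python
def split_fractions(tensor_split):
    """Normalize -ts style proportions into per-device fractions."""
    total = sum(tensor_split)
    return [share / total for share in tensor_split]


def weights_per_device(weights_gb, tensor_split):
    """Approximate GB of model weights that land on each device."""
    return [weights_gb * frac for frac in split_fractions(tensor_split)]


# ASSUMPTION: 16.0 GB is an illustrative weights size, not a measurement.
WEIGHTS_GB = 16.0

current = weights_per_device(WEIGHTS_GB, [15, 7])   # the -ts 15,7 setup
balanced = weights_per_device(WEIGHTS_GB, [1, 1])   # two identical cards

print("15,7 split:", [round(gb, 2) for gb in current])
print("1,1 split: ", [round(gb, 2) for gb in balanced])
```

With two identical cards the natural starting point would be an even split, leaving the same headroom for KV cache on both.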

My main question is whether the speed gain is actually worth it. One of the x16 slots on my board is only running at x4, so I'm worried the PCIe bottleneck eats most of the benefit. Anyone running dual 5060 Ti (or similar dual mid-range) for 27B+ models? What kind of gen speed are you seeing?

Also curious about the VRAM side — going from 28GB to 32GB, does that meaningfully change what models I can run, or am I still capped around 27B either way? Net cost is basically one 5060 Ti minus whatever I get for the 2060, so trying to figure out if the jump justifies it.
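
A back-of-the-envelope way to frame the 28GB vs 32GB question is to estimate how many parameters fit once some VRAM is set aside for KV cache and compute buffers. This is only a sketch under stated assumptions: about 4.85 bits per weight is an approximate average for Q4_K_M, the 6 GB reserve is a hypothetical number that depends heavily on context length and cache type, and sizes are in decimal GB.

```python
def weights_gb(params_billion, bits_per_weight):
    """Estimated weights size in GB for a quantized model."""
    # billions of params * bits per weight / 8 bits per byte = GB
    return params_billion * bits_per_weight / 8


def max_params_billion(vram_gb, reserve_gb, bits_per_weight):
    """Largest parameter count (in billions) whose weights fit in VRAM."""
    usable_gb = max(vram_gb - reserve_gb, 0)
    return usable_gb * 8 / bits_per_weight


# ASSUMPTIONS: approximate Q4_K_M average and a hypothetical reserve.
BITS_PER_WEIGHT = 4.85
RESERVE_GB = 6.0

print("27B weights (GB):", round(weights_gb(27, BITS_PER_WEIGHT), 1))
for total_vram in (28, 32):
    ceiling = max_params_billion(total_vram, RESERVE_GB, BITS_PER_WEIGHT)
    print(f"{total_vram} GB total: roughly {ceiling:.0f}B params at this quant")
```

Under these assumptions the extra 4 GB is worth roughly 6 to 7B more parameters at the same quantization, or the same model with more room for context. It does not account for the uneven 12G/16G split in the current setup, which can waste some capacity on the smaller card.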