[FOLLOW UP] Qwen3.6 27b q5_k_M MTP - 256k context - 5090

Posted by No_Mango7658@reddit | LocalLLaMA | View on Reddit | 19 comments

DUAL 5090s!!!

Absolutely amazing results with dual 5090s, basically doubling my tps.

Just ran this test and was surprised by the results.

llama-cli-mtp \
  -m ~/Downloads/Qwen3.6-27B-Q5_K_M-mtp.gguf \
  --spec-type mtp \
  --spec-draft-n-max 3 \
  --cache-type-k q8_0 \
  --cache-type-v q8_0 \
  -c 262144 \
  -ngl 99 \
  --flash-attn on \
  --verbose \
  -p "Write a short Python function that parses a CSV file."

[ Prompt: 1735.6 t/s | Generation: 127.9 t/s ]
Peak total GPU memory usage across both cards: 18 GB + 21 GB = 39 GB
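For anyone wondering what `--spec-draft-n-max 3` is doing: MTP drafts up to 3 tokens per step from the model's multi-token-prediction head, and the main model's forward pass keeps the longest prefix it agrees with, so you get several tokens per step when the drafts land. A toy sketch of that greedy accept rule (placeholder token lists for illustration, not the actual llama.cpp code):

```python
def accept_drafts(draft_tokens, verified_tokens):
    """Keep the longest prefix of the draft that the target model reproduces.

    Toy illustration of greedy speculative-decoding acceptance:
    draft_tokens come from the MTP head; verified_tokens are the target
    model's predictions over the same positions (one extra at the end).
    """
    accepted = []
    for d, v in zip(draft_tokens, verified_tokens):
        if d != v:
            break
        accepted.append(d)
    # The target model's own token at the first mismatch (or one past the
    # last accepted draft) is always valid, so every step emits >= 1 token.
    accepted.append(verified_tokens[len(accepted)])
    return accepted

print(accept_drafts([5, 7, 9], [5, 7, 9, 4]))  # all 3 drafts accepted -> [5, 7, 9, 4]
print(accept_drafts([5, 7, 9], [5, 7, 2, 4]))  # mismatch at position 2 -> [5, 7, 2]
```

When most drafts are accepted, each decoding step is one batched forward pass but emits up to 4 tokens, which is where the generation speedup comes from.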

I've done literally nothing besides putting in the second GPU and altering my llama command.

llama-cli-mtp \
  -m ~/Downloads/Qwen3.6-27B-Q5_K_M-mtp.gguf \
  --spec-type mtp \
  --spec-draft-n-max 3 \
  -c 262144 \
  -ngl 99 \
  --verbose \
  -p "Write a short Python function that parses a CSV file."

[ Prompt: 251.7 t/s | Generation: 119.4 t/s ]
Peak total GPU memory usage across both cards: 22 GB + 25 GB = 47 GB
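The lower memory in the first run lines up with the q8_0 KV cache: at 256k context the K/V cache is a big chunk of the footprint, and q8_0 stores ~1.06 bytes per element (34 bytes per 32-element block) versus 2 bytes for the default f16. Rough back-of-the-envelope arithmetic — the layer/head dimensions below are placeholders for illustration, not Qwen3.6-27B's real config:

```python
def kv_cache_bytes(ctx, n_layers, n_kv_heads, head_dim, bytes_per_elem):
    # K and V caches each hold ctx * n_kv_heads * head_dim elements per layer
    return int(2 * n_layers * ctx * n_kv_heads * head_dim * bytes_per_elem)

# Hypothetical dimensions for illustration only (NOT the real model config)
ctx, layers, kv_heads, head_dim = 262144, 48, 4, 128

f16 = kv_cache_bytes(ctx, layers, kv_heads, head_dim, 2.0)      # f16: 2 bytes/element
q8  = kv_cache_bytes(ctx, layers, kv_heads, head_dim, 34 / 32)  # q8_0: 34 bytes / 32 elements

print(f"f16 KV cache:  {f16 / 2**30:.2f} GiB")   # 24.00 GiB with these dims
print(f"q8_0 KV cache: {q8 / 2**30:.2f} GiB")    # 12.75 GiB with these dims
```

Whatever the exact dimensions, q8_0 cuts the KV cache to ~53% of f16, which is the kind of multi-GB saving you see between the two runs at this context length.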

I'll share more configurations and tests. I haven't evaluated the output quality of these runs, just the speeds.