Qwen3.6-27B at ~80 tps with 218k context window on 1x RTX 5090 served by vllm 0.19

Posted by Kindly-Cantaloupe978@reddit | LocalLLaMA | 126 comments

Qwen3.6-27B has been out for a few days, and an NVFP4 quant with MTP was posted earlier on HF: sakamakismile/Qwen3.6-27B-Text-NVFP4-MTP

You can follow the same recipe I used for Qwen3.5-27B to get ~80 tps on a single RTX 5090 with a 218k context window, using the latest vLLM 0.19 builds (vLLM 0.19.1rc1):

https://www.reddit.com/r/LocalLLaMA/comments/1sr8gyf/qwen3527b_on_rtx_5090_served_via_vllm_77_tps/
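
If you want to sanity-check the tps number on your own setup, here is a rough sketch. It is not part of the linked recipe. It assumes the model is already being served through vLLM's OpenAI-compatible server, and the base URL, prompt, and `max_tokens` value are placeholders to adjust. It times one non-streaming request end to end, so prefill time is included and the result will read a bit lower than pure decode speed.

```python
import json
import time
import urllib.request


def tokens_per_second(completion_tokens: int, elapsed_seconds: float) -> float:
    """Generated tokens divided by wall-clock seconds."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return completion_tokens / elapsed_seconds


def measure(base_url: str, model: str, prompt: str, max_tokens: int = 512) -> float:
    """Send one non-streaming completion request and return end-to-end tps."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    ).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/v1/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    start = time.perf_counter()
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    elapsed = time.perf_counter() - start
    # The server reports how many tokens it actually generated.
    return tokens_per_second(body["usage"]["completion_tokens"], elapsed)


if __name__ == "__main__":
    # Assumed values: a local server address and the model name it was started with.
    tps = measure(
        "http://localhost:8000",
        "sakamakismile/Qwen3.6-27B-Text-NVFP4-MTP",
        "Explain speculative decoding in a few paragraphs.",
    )
    print(f"{tps:.1f} tokens/s (end to end, prefill included)")
```

Run it a few times with a longer `max_tokens` to reduce the effect of prefill and warm-up on the average.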