Qwen3.5 27B running at ~65tps with DFlash speculation on 2x 3090
Posted by Kryesh@reddit | LocalLLaMA | View on Reddit | 16 comments
Opteron67@reddit
Failed: Cuda error /home/_/vllm/csrc/custom_all_reduce.cuh:455 'an illegal memory access was encountered'
AdamDhahabi@reddit
That looks very cool for multi-GPU builds on consumer mainboards, where poor PCIe bandwidth rules out tensor parallel.
marutichintan@reddit
Currently I'm running a 122B on 4x3090; I'm waiting for DFlash.
wullyfooly@reddit
Please update us on the result! Very curious about the performance.
Opteron67@reddit
160 tps at concurrency 1, 620 tps batched, on Qwen3.5 27B fp8 with dual 5090s.
Kryesh@reddit (OP)
Testing out https://huggingface.co/z-lab/Qwen3.5-27B-DFlash to see how it works - pleasantly surprised by the performance after getting ~25tps in llama.cpp.
Command: uv run vllm serve cyankiwi/Qwen3.5-27B-AWQ-4bit --speculative-config '{"method": "dflash", "model": "z-lab/Qwen3.5-27B-DFlash", "num_speculative_tokens": 8, "draft_tensor_parallel_size": 2}' --attention-backend flash_attn --max_num_seqs 4 --max-num-batched-tokens 12288 -tp 2 --gpu-memory-utilization 0.80 --max-model-len -1 --reasoning-parser qwen3 --enable-prefix-caching --enable-auto-tool-choice --tool-call-parser qwen3_coder
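If anyone wants to sanity-check the tps numbers themselves, here's a minimal sketch against vLLM's OpenAI-compatible endpoint (assumes the server from the command above is up on the default port 8000 and that you have the `openai` package installed; the prompt and `max_tokens` are arbitrary illustrative choices):

```python
import time


def tokens_per_second(completion_tokens: int, elapsed_s: float) -> float:
    """Decode throughput for a single request."""
    return completion_tokens / elapsed_s


def measure(base_url: str = "http://localhost:8000/v1",
            model: str = "cyankiwi/Qwen3.5-27B-AWQ-4bit") -> float:
    """Time one generation against a running vLLM server.

    Needs the `openai` package and a live server, so this is only
    defined here, not called at import time.
    """
    from openai import OpenAI

    client = OpenAI(base_url=base_url, api_key="EMPTY")
    start = time.perf_counter()
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Write a haiku about GPUs."}],
        max_tokens=256,
    )
    elapsed = time.perf_counter() - start
    return tokens_per_second(resp.usage.completion_tokens, elapsed)
```

Note this measures wall-clock for a single request including prefill, so it will read slightly below the steady-state decode rate.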
koljanos@reddit
That’s weird, with nvlink I can run 6 bit quant at 170k context window, at the same tps, you want my settings?
Kryesh@reddit (OP)
There are several reasons I won't get max performance on my current setup. It's a desktop, so I need VRAM for running my UI etc., and vLLM doesn't do asymmetric offloading, so the second card isn't using all of its available memory. The DFlash model is 3.5 GB, which takes up memory that could be used for context, and I don't have an NVLink bridge for faster tensor parallelism.
tomz17@reddit
--reasoning-parser qwen3 and --tool-call-parser qwen3_coder - are these correct?
kms_dev@reddit
How about concurrent requests? What's the max throughput in that case, at maximum GPU utilization?
szansky@reddit
And how's it going? Okay? Smoothly?
ReentryVehicle@reddit
How does it compare to running the official fp8/some 4bit with the built-in MTP normally? Looking at your acceptance rates it looks like anything beyond 3 tokens is a bit pointless, no?
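(For anyone wondering why long drafts stop paying off: under the standard speculative-decoding analysis, with per-token acceptance rate a and draft length k, the expected tokens produced per target-model forward pass is the geometric sum (1 - a^(k+1)) / (1 - a), which saturates fast. Quick sketch - the 0.7 acceptance rate below is an assumed illustrative value, not OP's actual number:)

```python
def expected_tokens_per_step(accept_rate: float, draft_len: int) -> float:
    """Expected tokens accepted per target forward pass in speculative
    decoding: geometric series (1 - a^(k+1)) / (1 - a), capped at k + 1."""
    a = accept_rate
    if a >= 1.0:
        return draft_len + 1
    return (1 - a ** (draft_len + 1)) / (1 - a)


# With a = 0.7, going from k = 3 to k = 8 barely helps:
for k in (1, 3, 8):
    print(k, round(expected_tokens_per_step(0.7, k), 2))
```

At a = 0.7 you get roughly 2.5 tokens/step at k = 3 versus about 3.2 at k = 8, while every extra drafted token still costs draft-model compute, which is consistent with the "beyond 3 is a bit pointless" read.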
putrasherni@reddit
What in the abracadabra is this vodoo? Love it.
roosterfareye@reddit
Vodoo?! Well if ain't Voodoo, it's Vodoo! Give me that sweet Vodoo!
-dysangel-@reddit
Jesus Chris, Patron Saint of Typos :0
https://arxiv.org/abs/2602.06036
Addyad@reddit
Niceeee