Qwen 3.5 35B on LocalAI (Strix Halo): Vulkan / ROCm

Posted by pipould@reddit | LocalLLaMA


Hey everyone! 👋

Just finished running a bunch of benchmarks on the new Qwen 3.5 35B models using LocalAI and figured I'd share the results. I was curious how Vulkan and ROCm backends stack up against each other for these two different quant/source variants.


Two model variants, each on both Vulkan and ROCm:

| Model | Type | Quant | Source |
|---|---|---|---|
| mudler/Qwen3.5-35B-A3B-APEX-GGUF:Qwen3.5-35B-A3B-APEX-I-Quality.gguf | MoE (3B active) | APEX | mudler |
| unsloth/Qwen3.5-35B-A3B-GGUF:Qwen3.5-35B-A3B-UD-Q4_K_XL.gguf | MoE (3B active) | GGUF | unsloth |

Tool: llama-benchy (via uvx), with prefix caching enabled, generation-latency mode, and adaptive prompts.

Context depths tested: 0, 4K, 8K, 16K, 32K, 65K, 100K, and up to 200K tokens.

System Environment

Lemonade Version: 10.1.0
OS: Linux-6.19.10-061910-generic (Ubuntu 25.10)
CPU: AMD Ryzen AI Max+ 395 w/ Radeon 8060S
Shared GPU memory: 118.1 GB
TDP: 85W

Backend build versions:

vulkan : b8681
rocm   : b1232
cpu    : b8681

The results

1. Qwen3.5-35B-A3B-APEX-I-Quality (mudler)

(See charts 1 & 2)

On token generation, Vulkan is the clear winner here, consistently outperforming ROCm. At zero context, Vulkan hits ~57.5 t/s compared to ROCm's ~50.0 t/s. As context grows to 100K, Vulkan maintains a healthy ~38.6 t/s while ROCm drops to ~35.7 t/s.

Prompt processing is where ROCm shows its strength, though Vulkan is very competitive. At 4K context, ROCm hits ~885 t/s while Vulkan is at ~759 t/s. The gap remains significant even at higher context depths.


2. Qwen3.5-35B-A3B-ThinkingCoder (unsloth)

(See charts 3 & 4)

This variant follows a very similar pattern. On token generation, Vulkan again takes the lead, starting at ~53.3 t/s (vs ROCm's ~46.6 t/s) and maintaining a lead even at 100K context.

Prompt processing is notably faster on ROCm, hitting ~1052 t/s at 2K context, whereas Vulkan is around ~798 t/s.
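To put those gaps in perspective, here's a quick back-of-the-envelope calculation using the figures quoted above (all numbers are straight from these runs):

```python
# Relative lead (%) of the faster backend, computed from the
# throughput figures (t/s) reported in this post.
def lead_pct(winner: float, loser: float) -> float:
    """Percentage by which the winner outpaces the loser."""
    return (winner / loser - 1) * 100

# Token generation at zero context (Vulkan leads ROCm)
print(f"APEX gen @ 0:          Vulkan +{lead_pct(57.5, 50.0):.0f}%")  # +15%
print(f"ThinkingCoder gen @ 0: Vulkan +{lead_pct(53.3, 46.6):.0f}%")  # +14%

# Prompt processing (ROCm leads Vulkan)
print(f"APEX pp @ 4K:          ROCm   +{lead_pct(885, 759):.0f}%")    # +17%
print(f"ThinkingCoder pp:      ROCm   +{lead_pct(1052, 798):.0f}%")   # +32%
```

So Vulkan's generation lead is roughly 15% for both variants, while ROCm's prompt-processing lead ranges from ~17% up to ~32%, which is why the "which backend?" answer depends on your workload.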


| Variant | Gen Speed Winner | Prompt Processing Winner |
|---|---|---|
| APEX-I-Quality | Vulkan | ROCm |
| ThinkingCoder | Vulkan | ROCm |

Big picture:

For day-to-day use, if you want the fastest response time per token, Vulkan is the way to go. If you are processing massive amounts of text in a single prompt, ROCm might give you the edge.


*Benchmarks done with llama-benchy.*