Reality Check on 50 t/s for Qwen3.5-122B-A10B on a 3,500 USD Device

Posted by kuhunaxeyive@reddit | LocalLLaMA | 67 comments

I found an optimization that reaches 51 tokens/s (48 tokens/s for very long contexts) with Qwen3.5-122B-A10B. Its author published a Bash script on GitHub that sets everything up automatically:

https://forums.developer.nvidia.com/t/qwen3-5-122b-a10b-on-single-spark-up-to-51-tok-s-v2-1-patches-quick-start-benchmark/365639/71

This optimization was implemented on the NVIDIA DGX Spark. The ASUS Ascent GX10 shares the same internal hardware (the NVIDIA GB10 Grace Blackwell Superchip); the main differences are the casing and cooling. It costs around USD 3,500 because it ships with only 1 TB of storage, which is sufficient for my use case. A generation speed of 50 tokens/s would make a model of this size practically usable. Before buying the device, though, I want to verify that my assumptions actually put it in a usable performance range.
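To make "practically usable" a bit more concrete, here is a minimal Python sketch that turns the two reported decode rates (51 and 48 tokens/s) into wall-clock generation time. The response lengths are hypothetical values I picked for illustration, and the calculation ignores prompt processing (prefill) time, so treat the results as a lower bound rather than a benchmark.

```python
# Rough sanity check: how long does generation take at a given decode rate?
# Assumption: only decode time is counted; prefill time is ignored.

def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Return the time in seconds needed to generate num_tokens."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second

# Decode rates reported in the linked thread.
RATES = {"short context": 51.0, "very long context": 48.0}

# Hypothetical response lengths, chosen only for illustration.
RESPONSE_LENGTHS = [250, 1000, 4000]

for label, rate in RATES.items():
    for length in RESPONSE_LENGTHS:
        seconds = generation_seconds(length, rate)
        print(f"{label:>17} | {length:>5} tokens at {rate:.0f} t/s: {seconds:6.1f} s")
```

Whether those times count as usable depends on the workload: interactive chat mostly needs short replies, while agentic or reasoning-heavy use produces far more tokens per request.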

My questions: