I just bought Asus Ascent : Nvidia GB10 (DGX) and It is slower than my Ryzen Ai Max

Posted by Voxandr@reddit | LocalLLaMA | View on Reddit | 43 comments

It is suppose to be 2-4x faster but i am only getting 6TK/s on Gemma4-31B . What am i doing wrong?

Config:

llama-server  --models-preset /home/dgx/models/models.ini --models-dir /home/dgx/models/ --host 0.0.0.0 --port 8080 --models-max 1 --parallel 1

model.ini:

[*]
threads = 12
flash-attn = on
mlock = off
mmap = off
fit = on
warmup = on
; batch-size = 4096
; ubatch-size = 512
cache-type-k = q8_0
cache-type-v = q8_0
jinja = true
direct-io = on
cache-prompt = true
cache-reuse = 256
cache-ram = 32768
reasoning-format = auto
n-gpu-layers = 999