StepFun 3.7 Flash - Speed Benchmark on M5 Max
Posted by Beamsters@reddit | LocalLLaMA | 11 comments
Just ran a benchmark using the llama.cpp branch that shipped on day 0.
M5 Max, 128 GB, Q4_K_S. Memory peaks at around 120+ GB, which makes things sluggish, but it's still usable once a cmd+tab finally lands.
Short contexts (< 16k) feel fast and very responsive. At 32k-64k the speed is not bad; still usable.
| PP (tokens) | TG (tokens) | B | N_KV (tokens) | T_PP (s) | S_PP (t/s) | T_TG (s) | S_TG (t/s) | T (s) | S (t/s) |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 128 | 1 | 128 | 0.000 | nan | 2.038 | 62.80 | 2.038 | 62.80 |
| 2048 | 128 | 1 | 2176 | 1.938 | 1056.65 | 2.115 | 60.52 | 4.053 | 536.88 |
| 8192 | 128 | 1 | 8320 | 9.153 | 895.01 | 2.233 | 57.32 | 11.386 | 730.71 |
| 16384 | 128 | 1 | 16512 | 22.428 | 730.52 | 2.475 | 51.71 | 24.903 | 663.05 |
| 32768 | 128 | 1 | 32896 | 64.539 | 507.73 | 2.818 | 45.43 | 67.356 | 488.39 |
| 65536 | 128 | 1 | 65664 | 178.227 | 367.71 | 3.774 | 33.92 | 182.001 | 360.79 |
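The column headers appear to follow the output format of llama.cpp's `llama-batched-bench` tool (the post doesn't name the tool, so treat that as an assumption). Each speed column is a token count divided by the matching time. For a single sequence (B = 1), N_KV = PP + TG, so S is N_KV over the total time T. A minimal Python sketch that re-derives the speeds from the rows above; `ROWS` and `derived_speeds` are hypothetical names for illustration, not part of llama.cpp:

```python
import math

# Rows transcribed from the table above: (PP, TG, N_KV, T_PP, T_TG, T), B = 1 throughout.
ROWS = [
    (0, 128, 128, 0.000, 2.038, 2.038),
    (2048, 128, 2176, 1.938, 2.115, 4.053),
    (8192, 128, 8320, 9.153, 2.233, 11.386),
    (16384, 128, 16512, 22.428, 2.475, 24.903),
    (32768, 128, 32896, 64.539, 2.818, 67.356),
    (65536, 128, 65664, 178.227, 3.774, 182.001),
]


def derived_speeds(pp, tg, n_kv, t_pp, t_tg, t):
    """Return (S_PP, S_TG, S) in tokens/s for a single-sequence (B = 1) run."""
    # With no prompt there is nothing to time, so report NaN as the table does.
    s_pp = pp / t_pp if t_pp > 0 else math.nan
    s_tg = tg / t_tg
    # For B = 1, N_KV = PP + TG: combined speed is all tokens over total time.
    s = n_kv / t
    return s_pp, s_tg, s


for row in ROWS:
    s_pp, s_tg, s = derived_speeds(*row)
    print(f"PP={row[0]:>6}  S_PP={s_pp:8.2f}  S_TG={s_tg:6.2f}  S={s:7.2f}")
```

The recomputed values match the table to within the millisecond rounding of the timings, so the table is internally consistent.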
Now the Pelican bench: a very nice one, but with quite a long hand lol

ortegaalfredo@reddit
StepFun also published their own speed benchmarks on Apple, DGX and AMD 395+ in their blog post.
LegacyRemaster@reddit
Downloading. Will test Q4_K_S on an RTX 6000 96GB + W7800 48GB.
MikeLPU@reddit
How are you going to run them together? RPC?
LegacyRemaster@reddit
I'm lazy now.
Maximum_Parking_5174@reddit
Nvidia Blackwell using Vulkan? Does that work?
Beamsters@reddit (OP)
Yeah, let's share some results.
tarruda@reddit
I think IQ4_XS would be a better choice for 128 GB. It should have similar performance to Q4_K_S while saving around 6 GB of RAM.
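To a first approximation, the memory saved by switching between two quants of the same model is the parameter count times the bits-per-weight (bpw) difference, divided by 8. A minimal Python sketch of that arithmetic. The parameter count and bpw values in the usage line are hypothetical placeholders, not the real figures for this model or these quants. Using an effective (average) bpw per file absorbs the fact that GGUF quants mix tensor types; metadata and runtime buffers such as the KV cache are ignored.

```python
def weights_gb(n_params, bpw):
    """Approximate weight storage in GB (1e9 bytes): params x bits-per-weight / 8."""
    return n_params * bpw / 8 / 1e9


def savings_gb(n_params, bpw_a, bpw_b):
    """How much smaller quant B is than quant A, in GB, for the same model."""
    return weights_gb(n_params, bpw_a) - weights_gb(n_params, bpw_b)


# HYPOTHETICAL inputs for illustration only: a 100B-parameter model and
# effective bpw of 4.5 vs 4.25. Substitute the real parameter count and the
# effective bpw of each GGUF file you are comparing.
print(f"{savings_gb(100e9, 4.5, 4.25):.2f} GB")
```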
rpkarma@reddit
Yep, which means you can enable vision!
LegacyRemaster@reddit
OK, it's fast. We'll see about long context.
Temporary-Mail-4176@reddit
Flash numbers always look hot on short prompts, but the M5 Max falls off a cliff once KV cache pressure kicks in past 8k. Prompt processing speed is what really hurts these unified-memory builds on real work, not the generation tok/s everyone screenshots. One honest 32k haystack run is worth ten more hello-world charts.
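The PP-versus-TG point can be checked against the OP's own table. A rough Python sketch that takes S_PP and S_TG from that table, reports how much of the 2k-context speed each retains at longer contexts, and estimates the wait before the first token as prompt tokens divided by S_PP. That estimate is an assumption: it ignores prompt caching and any other runtime overhead. `SPEEDS`, `retained` and `prefill_seconds` are hypothetical names for illustration.

```python
# Speeds copied from the OP's table: prompt tokens -> (S_PP, S_TG) in tokens/s.
SPEEDS = {
    2048: (1056.65, 60.52),
    8192: (895.01, 57.32),
    16384: (730.52, 51.71),
    32768: (507.73, 45.43),
    65536: (367.71, 33.92),
}


def retained(ctx, base=2048):
    """Fraction of the short-context PP and TG speed still left at `ctx` tokens."""
    pp0, tg0 = SPEEDS[base]
    pp, tg = SPEEDS[ctx]
    return pp / pp0, tg / tg0


def prefill_seconds(ctx):
    """Rough wait before the first generated token: prompt tokens / S_PP."""
    return ctx / SPEEDS[ctx][0]


for ctx in SPEEDS:
    pp_frac, tg_frac = retained(ctx)
    print(f"{ctx:>6} tokens: PP {pp_frac:6.1%}, TG {tg_frac:6.1%}, "
          f"prefill ~{prefill_seconds(ctx):6.1f} s")
```

On these numbers, PP at 64k keeps roughly 35% of its 2k speed while TG keeps about 56%, and a 64k prompt means roughly three minutes of prefill before the first token.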
sagiroth@reddit
The only reliable benchmark