StepFun 3.7 Flash - Speed Benchmark on M5 Max

Posted by Beamsters@reddit | LocalLLaMA

Just ran a benchmark using the llama.cpp branch that shipped with day-0 support.

M5 Max, 128 GB, Q4_K_S: memory peaks at around 120+ GB, which makes the system sluggish, but it's still usable once a cmd+tab finally lands.

Short contexts (< 16k) feel fast and very responsive. At 32k-64k the speed is not bad and still usable.

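| PP | TG | B | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s | T s | S t/s |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 0 | 128 | 1 | 128 | 0.000 | nan | 2.038 | 62.80 | 2.038 | 62.80 |
| 2048 | 128 | 1 | 2176 | 1.938 | 1056.65 | 2.115 | 60.52 | 4.053 | 536.88 |
| 8192 | 128 | 1 | 8320 | 9.153 | 895.01 | 2.233 | 57.32 | 11.386 | 730.71 |
| 16384 | 128 | 1 | 16512 | 22.428 | 730.52 | 2.475 | 51.71 | 24.903 | 663.05 |
| 32768 | 128 | 1 | 32896 | 64.539 | 507.73 | 2.818 | 45.43 | 67.356 | 488.39 |
| 65536 | 128 | 1 | 65664 | 178.227 | 367.71 | 3.774 | 33.92 | 182.001 | 360.79 |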
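The column names match the output of llama.cpp's `llama-batched-bench`, where PP is prompt tokens, TG is generated tokens, B is the number of parallel sequences, N_KV is the total tokens held in the KV cache, T_* are times in seconds and S_* are speeds in tokens/s. Assuming that convention (the post doesn't name the tool), the speed columns for B = 1 follow directly from the times: S_PP = PP / T_PP, S_TG = TG / T_TG and S = N_KV / T, with N_KV = PP + TG. A quick sanity-check sketch, where `speeds` and `close` are just illustrative helpers:

```python
import math

# Raw values from the table (B = 1 for every row).
# Columns: PP, TG, B, N_KV, T_PP s, T_TG s, T s
ROWS = [
    (0, 128, 1, 128, 0.000, 2.038, 2.038),
    (2048, 128, 1, 2176, 1.938, 2.115, 4.053),
    (8192, 128, 1, 8320, 9.153, 2.233, 11.386),
    (16384, 128, 1, 16512, 22.428, 2.475, 24.903),
    (32768, 128, 1, 32896, 64.539, 2.818, 67.356),
    (65536, 128, 1, 65664, 178.227, 3.774, 182.001),
]

# Reported speeds from the same table: S_PP, S_TG, S (tokens/s).
REPORTED = [
    (math.nan, 62.80, 62.80),
    (1056.65, 60.52, 536.88),
    (895.01, 57.32, 730.71),
    (730.52, 51.71, 663.05),
    (507.73, 45.43, 488.39),
    (367.71, 33.92, 360.79),
]


def speeds(pp, tg, b, n_kv, t_pp, t_tg, t):
    """Recompute (S_PP, S_TG, S) in tokens/s, assuming a single sequence (B = 1)."""
    assert b == 1, "these simple formulas only cover the B = 1 case"
    assert n_kv == pp + tg, "KV cache should hold prompt + generated tokens"
    s_pp = pp / t_pp if t_pp > 0 else math.nan  # prefill speed (nan when there is no prompt)
    s_tg = tg / t_tg  # generation speed
    s = n_kv / t  # all tokens over total wall time
    return s_pp, s_tg, s


def close(a, b, rel=1e-3):
    """Treat two NaNs as equal; otherwise compare with a relative tolerance."""
    if math.isnan(a) or math.isnan(b):
        return math.isnan(a) and math.isnan(b)
    return math.isclose(a, b, rel_tol=rel)


if __name__ == "__main__":
    for row, rep in zip(ROWS, REPORTED):
        got = speeds(*row)
        ok = all(close(g, r) for g, r in zip(got, rep))
        print(f"PP={row[0]:>6}  S_PP={got[0]:8.2f}  S_TG={got[1]:6.2f}  S={got[2]:7.2f}  matches={ok}")
    # Generation speed at 64k context relative to an empty context.
    tg_0k = speeds(*ROWS[0])[1]
    tg_64k = speeds(*ROWS[-1])[1]
    print(f"TG at 64k runs at {tg_64k / tg_0k:.0%} of TG at 0k")
```

Every recomputed speed lands within 0.1% of the posted value (the small gaps come from the times being rounded to milliseconds). It also puts the 64k slowdown in numbers: generation drops to roughly 54% of the empty-context speed (33.92 vs 62.80 t/s), and the 64k prefill alone takes about 178 s.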

Now the Pelican bench: a very nice one, but with quite a long hand lol