I have (even faster) DeepSeek V4 Pro at home

Posted by fairydreaming@reddit | LocalLLaMA | View on Reddit | 39 comments

A few days ago I posted about my DeepSeek V4 Pro at home - now it's time for an update. Yesterday I finally managed to run this model in ktransformers (sglang + kt-kernel). I followed the tutorial for DeepSeek V4 Flash and tweaked some options (NUMA, cores) for my hardware (Epyc 9374F + RTX PRO 6000 Max-Q). Then I ran llama-benchy with increasing context depth to check the performance. Results:

Depth 0:

| model                       |   test |          t/s |    peak t/s |       ttfr (ms) |    est_ppt (ms) |   e2e_ttft (ms) |
|:----------------------------|-------:|-------------:|------------:|----------------:|----------------:|----------------:|
| deepseek-ai/DeepSeek-V4-Pro |  pp512 | 39.76 ± 0.00 |             | 12878.44 ± 0.00 | 12877.59 ± 0.00 | 12878.44 ± 0.00 |
| deepseek-ai/DeepSeek-V4-Pro |   tg32 |  7.54 ± 0.00 | 8.00 ± 0.00 |                 |                 |                 |

Depth 2048:

| model                       |          test |          t/s |    peak t/s |       ttfr (ms) |    est_ppt (ms) |   e2e_ttft (ms) |
|:----------------------------|--------------:|-------------:|------------:|----------------:|----------------:|----------------:|
| deepseek-ai/DeepSeek-V4-Pro | pp512 @ d2048 | 45.13 ± 0.00 |             | 56726.85 ± 0.00 | 56725.93 ± 0.00 | 56726.85 ± 0.00 |
| deepseek-ai/DeepSeek-V4-Pro |  tg32 @ d2048 |  7.32 ± 0.00 | 8.00 ± 0.00 |                 |                 |                 |

Depth 4096:

| model                       |          test |          t/s |    peak t/s |        ttfr (ms) |     est_ppt (ms) |    e2e_ttft (ms) |
|:----------------------------|--------------:|-------------:|------------:|-----------------:|-----------------:|-----------------:|
| deepseek-ai/DeepSeek-V4-Pro | pp512 @ d4096 | 45.75 ± 0.00 |             | 100729.28 ± 0.00 | 100728.46 ± 0.00 | 100729.28 ± 0.00 |
| deepseek-ai/DeepSeek-V4-Pro |  tg32 @ d4096 |  7.29 ± 0.00 | 8.00 ± 0.00 |                  |                  |                  |

Depth 8192:

| model                       |          test |          t/s |    peak t/s |        ttfr (ms) |     est_ppt (ms) |    e2e_ttft (ms) |
|:----------------------------|--------------:|-------------:|------------:|-----------------:|-----------------:|-----------------:|
| deepseek-ai/DeepSeek-V4-Pro | pp512 @ d8192 | 45.97 ± 0.00 |             | 189354.94 ± 0.00 | 189354.03 ± 0.00 | 189354.94 ± 0.00 |
| deepseek-ai/DeepSeek-V4-Pro |  tg32 @ d8192 |  7.25 ± 0.00 | 8.00 ± 0.00 |                  |                  |                  |

Depth 16384:

| model                       |           test |          t/s |    peak t/s |        ttfr (ms) |     est_ppt (ms) |    e2e_ttft (ms) |
|:----------------------------|---------------:|-------------:|------------:|-----------------:|-----------------:|-----------------:|
| deepseek-ai/DeepSeek-V4-Pro | pp512 @ d16384 | 46.16 ± 0.00 |             | 365997.22 ± 0.00 | 365996.26 ± 0.00 | 365997.22 ± 0.00 |
| deepseek-ai/DeepSeek-V4-Pro |  tg32 @ d16384 |  7.17 ± 0.00 | 8.00 ± 0.00 |                  |                  |                  |

Depth 32768:

| model                       |           test |          t/s |    peak t/s |        ttfr (ms) |     est_ppt (ms) |    e2e_ttft (ms) |
|:----------------------------|---------------:|-------------:|------------:|-----------------:|-----------------:|-----------------:|
| deepseek-ai/DeepSeek-V4-Pro | pp512 @ d32768 | 46.18 ± 0.00 |             | 720687.13 ± 0.00 | 720685.67 ± 0.00 | 720687.13 ± 0.00 |
| deepseek-ai/DeepSeek-V4-Pro |  tg32 @ d32768 |  7.07 ± 0.00 | 8.00 ± 0.00 |                  |                  |                  |
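Side note on reading the tables: the pp512 t/s column appears to count the entire prefill (depth + 512 prompt tokens) over ttfr, which is why t/s goes up with depth while ttfr grows roughly linearly. A quick sketch to check that interpretation against the numbers above (values copied from the tables; the formula itself is my assumption, not something llama-benchy documents here):

```python
# Sanity check (assumption): reported pp512 t/s == (depth + 512) tokens / ttfr.
# All numbers are copied verbatim from the benchmark tables above.
runs = [
    # (depth, ttfr_ms, reported_tps)
    (0,     12878.44,  39.76),
    (2048,  56726.85,  45.13),
    (4096,  100729.28, 45.75),
    (8192,  189354.94, 45.97),
    (16384, 365997.22, 46.16),
    (32768, 720687.13, 46.18),
]

for depth, ttfr_ms, reported_tps in runs:
    tokens = depth + 512                      # full prefill: context depth + prompt
    tps = tokens / (ttfr_ms / 1000.0)         # tokens per second over time-to-first-response
    print(f"d{depth:>5}: computed {tps:6.2f} t/s, reported {reported_tps}")
```

For every depth the computed value matches the reported t/s to two decimal places, so the throughput numbers and the ttfr column are consistent with each other.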

During the 64k test (which took over 20 minutes) llama-benchy did not report a result even though sglang finished processing the request, so I aborted the test. I'm not sure why - possibly some kind of timeout is happening.

This all runs on the original model files - no conversion needed.