How much VRAM do we need at most to run DeepSeek V4 Flash: 175 GB or 320 GB?

Posted by 9r4n4y@reddit | LocalLLaMA

As far as I know, the weights take 160 GB, plus 9.6 GB for the maximum 1-million-token context window, plus 5 GB of overhead, which comes to about 175 GB of VRAM.
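
A quick sanity check of that sum, as a minimal sketch. The variable names are just labels I made up, and the three figures are the estimates quoted above, not measured values:

```python
# Figures quoted in the post, in GB. These are estimates, not measurements.
weights_gb = 160.0      # model weights
context_1m_gb = 9.6     # claimed cost of the full 1M-token context window
overhead_gb = 5.0       # rough allowance for runtime overhead

total_gb = weights_gb + context_1m_gb + overhead_gb
print(f"Estimated total: {total_gb:.1f} GB")
```

The exact sum is 174.6 GB, so 175 GB is that figure rounded up.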

But vLLM and other sources say "To use the full 1M context, you need 4x A100 80G", which is 320 GB of VRAM. Am I missing something?
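
To put the two numbers side by side, here is a small sketch that assumes nothing beyond the 80 GB per card and the 4 cards in the quoted recommendation, plus my own 175 GB estimate. The variable names are hypothetical:

```python
import math

gpu_vram_gb = 80        # A100 80G, from the quoted recommendation
recommended_gpus = 4    # "4x A100 80G"
estimate_gb = 175       # my own estimate, not a confirmed requirement

recommended_total_gb = gpu_vram_gb * recommended_gpus      # total VRAM across 4 cards
min_gpus_by_capacity = math.ceil(estimate_gb / gpu_vram_gb)  # cards needed on capacity alone
gap_gb = recommended_total_gb - estimate_gb                # unexplained difference

print(recommended_total_gb, min_gpus_by_capacity, gap_gb)
```

On raw capacity alone the estimate would fit on 3 cards (240 GB), leaving a 145 GB gap to the recommended 320 GB. Whether a 3-card setup is actually usable is not something the quoted sources say; it is only what the division gives.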

Sources:

  1. https://lushbinary.com/blog/deepseek-v4-self-hosting-guide-vllm-hardware-deployment/?hl=en-GB

  2. vLLM blog post on deployment