To run DeepSeek v4 flash, how much VRAM do we need at most? 175 GB or 320 GB?
Posted by 9r4n4y@reddit | LocalLLaMA | View on Reddit | 17 comments
As far as I know, the weights are 160 GB + 9.6 GB needed for the max 1-million-token window + 5 GB overhead = ~175 GB of VRAM.
But vLLM and other sources say "To use the full 1M context, you need 4x A100 80G" --> that's 320 GB of VRAM?? Am I missing something??
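Rough back-of-the-envelope math for that budget (all figures are the estimates above, not official numbers for this model):

```python
# Back-of-the-envelope VRAM budget (figures are the estimates quoted above,
# not official numbers for DeepSeek v4 flash).
weights_gb = 160.0   # model weights as stated above
kv_cache_gb = 9.6    # KV cache for the full 1M-token window
overhead_gb = 5.0    # CUDA context, activations, framework overhead

total_gb = weights_gb + kv_cache_gb + overhead_gb
print(f"Estimated VRAM needed: {total_gb:.1f} GB")  # ~174.6 GB
```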
Sources:
- https://lushbinary.com/blog/deepseek-v4-self-hosting-guide-vllm-hardware-deployment/?hl=en-GB
- the vLLM deployment blog post
Conscious_Cut_6144@reddit
The people those instructions are targeting are serving many users. For a home user, 192 GB should be plenty.
rebellioninmypants@reddit
Thanks, I think I had one of those laying around somewhere in the pantry
9r4n4y@reddit (OP)
Ohhhh, thanks :)
Expensive-Paint-9490@reddit
vLLM works best with a power-of-two number of GPUs (2^n), so 1, 2, 4, or 8.
Two A100s are just 160 GB, which isn't enough, so they advise four A100s.
But your calculation is correct; you actually need only 170-and-something GB. So two Blackwell Pro 6000s would be fine, as would four A6000s or 6000 Adas.
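A quick sketch of that reasoning, using the usual memory sizes for those cards (the budget is the rough estimate from the post, not a measured requirement):

```python
# Which power-of-two GPU combos clear a ~175 GB budget?
budget_gb = 175

gpus = {
    "A100 80G": 80,
    "RTX Pro 6000 Blackwell (96G)": 96,
    "RTX 6000 Ada (48G)": 48,
    "A6000 (48G)": 48,
}

for name, vram in gpus.items():
    for count in (1, 2, 4, 8):  # vLLM tensor parallelism wants 2^n GPUs
        if count * vram >= budget_gb:
            print(f"{count} x {name} = {count * vram} GB -> fits")
            break
# 4x A100 lands at 320 GB because 2x (160 GB) falls just short of the budget,
# while 2x Blackwell Pro 6000 or 4x A6000 / 6000 Ada land at 192 GB.
```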
rebellioninmypants@reddit
Damn that's like 3 weeks of my allowance to put together
9r4n4y@reddit (OP)
Thx :)
insanemal@reddit
Or 2X GB10
Fabulous
KURD_1_STAN@reddit
I don't understand why people are so against offloading MoE models; how bad is the performance drop, really?
Farenheith200@reddit
I wonder the same. With the engram and active-parameters concept, I believe it could run quite well with maybe 128 GB of RAM, like some Ryzen AI Max+ 395 machines have. After all, we will hardly use all the available experts over the course of a day, so the idea is that the set of experts held in memory stays fairly stable.
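A minimal sketch of the bandwidth-bound upper limit for offloaded experts, with purely illustrative numbers (nothing here is a measured figure for this model):

```python
# Very rough decode-speed ceiling when MoE experts sit in system RAM.
# All numbers below are illustrative assumptions, not measured values
# for DeepSeek v4 flash.
active_params = 12e9        # assumed active parameters per token
bytes_per_param = 1.0       # ~1 byte/param for an FP8/Q8-style quant
dram_bandwidth = 250e9      # ~250 GB/s, roughly Ryzen AI Max+ 395 class memory
hit_rate = 0.7              # fraction of active experts already resident in VRAM

# Per token we only have to stream the experts that miss the VRAM cache.
bytes_from_ram = active_params * bytes_per_param * (1 - hit_rate)
tokens_per_s = dram_bandwidth / bytes_from_ram
print(f"~{tokens_per_s:.0f} tok/s (optimistic, ignores compute and latency)")
```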
Fit-Statistician8636@reddit
175 GB, or rather 192 GB, is enough - once it is supported on consumer/workstation-class GPUs.
9r4n4y@reddit (OP)
But why did vLLM say 320 GB??
relmny@reddit
I guess because of the "To use the full 1M context..." part.
Context takes a lot of memory.
9r4n4y@reddit (OP)
Nahh, the full context needs 9.6 GB at most. vLLM stated that too.
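A quick sanity check on what that figure implies per token (the plain-MHA comparison uses placeholder dimensions, not the real model config):

```python
# What per-token KV footprint does the 9.6 GB figure imply at 1M tokens?
context_len = 1_000_000
kv_cache_gb = 9.6                               # figure quoted in the thread
bytes_per_token = kv_cache_gb * 1e9 / context_len
print(f"~{bytes_per_token:.0f} bytes of KV cache per token")      # ~9600 B/token

# For comparison, plain MHA with assumed 60 layers, 64 KV heads,
# head_dim 128, K+V, BF16 -- placeholder numbers for illustration only.
mha_bytes = 60 * 64 * 128 * 2 * 2
print(f"Plain MHA would need ~{mha_bytes / 1e6:.1f} MB per token")  # ~2.0 MB/token
```

So the 9.6 GB number only works out if the per-token cache is heavily compressed; with an uncompressed cache, long context really would eat a lot of memory.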
TheRealMasonMac@reddit
The article seems to be lazy AI-generated crud.
9r4n4y@reddit (OP)
Yeah, but the problem is the vLLM blog said the same thing.
LegacyRemaster@reddit
will be something like:
Evening_Ad6637@reddit
Well, the source doesn't seem to have handled the text, and especially the calculations, with much precision.
It also states:
I mean come on, that wasn't that hard to calculate...
I think the article was written under time pressure or something. Take it with a grain of salt.