To run DeepSeek v4 flash, how much VRAM do we need at most? 175 GB or 320 GB?
Posted by 9r4n4y@reddit | LocalLLaMA | View on Reddit | 17 comments
As far as I know, the weights are 160 GB + 9.6 GB needed for the max 1-million-token window + 5 GB overhead = ~175 GB of VRAM.
But vLLM and other sources say "To use the full 1M context, you need 4x A100 80G" --> that's 320 GB of VRAM?? Am I missing something??
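Rough back-of-the-envelope math for that budget (all figures are the estimates above, not official numbers for this model):

```python
# Back-of-the-envelope VRAM budget (figures are the estimates quoted above,
# not official numbers for DeepSeek v4 flash).
weights_gb = 160.0   # model weights as stated above
kv_cache_gb = 9.6    # KV cache for the full 1M-token window
overhead_gb = 5.0    # CUDA context, activations, framework overhead

total_gb = weights_gb + kv_cache_gb + overhead_gb
print(f"Estimated VRAM needed: {total_gb:.1f} GB")  # ~174.6 GB
```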
Sources:
- https://lushbinary.com/blog/deepseek-v4-self-hosting-guide-vllm-hardware-deployment/?hl=en-GB
- the vLLM deployment blog post
Conscious_Cut_6144@reddit
The people those instructions are targeting are serving many users. For a home user, 192 GB should be plenty.
rebellioninmypants@reddit
Thanks, I think I had one of those laying around somewhere in the pantry
9r4n4y@reddit (OP)
Ohhhh, thanks :)
Expensive-Paint-9490@reddit
vLLM works best with a power-of-two number of GPUs (2^n), so 1, 2, 4, or 8.
Two A100s are just 160 GB, which isn't enough, so they advise four A100s.
But your calculation is correct; you actually need only 170-and-something GB. So two Blackwell Pro 6000s would be fine, as would four A6000s or 6000 Adas.
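A quick sketch of that reasoning, using the usual memory sizes for those cards (the budget is the rough estimate from the post, not a measured requirement):

```python
# Which power-of-two GPU combos clear a ~175 GB budget?
budget_gb = 175

gpus = {
    "A100 80G": 80,
    "RTX Pro 6000 Blackwell (96G)": 96,
    "RTX 6000 Ada (48G)": 48,
    "A6000 (48G)": 48,
}

for name, vram in gpus.items():
    for count in (1, 2, 4, 8):  # vLLM tensor parallelism wants 2^n GPUs
        if count * vram >= budget_gb:
            print(f"{count} x {name} = {count * vram} GB -> fits")
            break
# 4x A100 lands at 320 GB because 2x (160 GB) falls just short of the budget,
# while 2x Blackwell Pro 6000 or 4x A6000 / 6000 Ada land at 192 GB.
```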
rebellioninmypants@reddit
Damn that's like 3 weeks of my allowance to put together
9r4n4y@reddit (OP)
Thx :)
insanemal@reddit
Or 2X GB10
Fabulous
KURD_1_STAN@reddit
I don't understand why people are so against offloading MoE models; how bad is the performance drop, really?
Farenheith200@reddit
I wonder the same. With the engram and active-parameters concept, I believe it could run quite well with maybe 128 GB of RAM, like some Ryzen AI Max+ 395 machines have. After all, we will hardly use all the available experts over the course of a day, so the idea is that the set of experts held in memory stays fairly stable.
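A minimal sketch of the bandwidth-bound upper limit for offloaded experts, with purely illustrative numbers (nothing here is a measured figure for this model):

```python
# Very rough decode-speed ceiling when MoE experts sit in system RAM.
# All numbers below are illustrative assumptions, not measured values
# for DeepSeek v4 flash.
active_params = 12e9        # assumed active parameters per token
bytes_per_param = 1.0       # ~1 byte/param for an FP8/Q8-style quant
dram_bandwidth = 250e9      # ~250 GB/s, roughly Ryzen AI Max+ 395 class memory
hit_rate = 0.7              # fraction of active experts already resident in VRAM

# Per token we only have to stream the experts that miss the VRAM cache.
bytes_from_ram = active_params * bytes_per_param * (1 - hit_rate)
tokens_per_s = dram_bandwidth / bytes_from_ram
print(f"~{tokens_per_s:.0f} tok/s (optimistic, ignores compute and latency)")
```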
Fit-Statistician8636@reddit
175 GB, or rather 192 GB, is enough - once it is supported on consumer/workstation-class GPUs.
9r4n4y@reddit (OP)
But why did vLLM say 320 GB??
relmny@reddit
I guess because of the "To use the full 1M context..." part.
Context takes a lot of memory.
9r4n4y@reddit (OP)
Nahh, the full context needs 9.6 GB at most. vLLM stated that too.
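A quick sanity check on what that figure implies per token (the plain-MHA comparison uses placeholder dimensions, not the real model config):

```python
# What per-token KV footprint does the 9.6 GB figure imply at 1M tokens?
context_len = 1_000_000
kv_cache_gb = 9.6                               # figure quoted in the thread
bytes_per_token = kv_cache_gb * 1e9 / context_len
print(f"~{bytes_per_token:.0f} bytes of KV cache per token")      # ~9600 B/token

# For comparison, plain MHA with assumed 60 layers, 64 KV heads,
# head_dim 128, K+V, BF16 -- placeholder numbers for illustration only.
mha_bytes = 60 * 64 * 128 * 2 * 2
print(f"Plain MHA would need ~{mha_bytes / 1e6:.1f} MB per token")  # ~2.0 MB/token
```

So the 9.6 GB number only works out if the per-token cache is heavily compressed; with an uncompressed cache, long context really would eat a lot of memory.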
TheRealMasonMac@reddit
The article seems to be lazy AI-generated crud.
9r4n4y@reddit (OP)
Yeah, but the problem is the vLLM blog said the same thing.
LegacyRemaster@reddit
will be something like:
Evening_Ad6637@reddit
Well, the source doesn't seem to have handled the text, and especially the calculations, with much precision.
It also states:
I mean come on, that wasn't that hard to calculate...
I think the article was written under time pressure or something. Take it with a grain of salt.