Deepseek v4 flash weird sizes?
Posted by WyattTheSkid@reddit | LocalLLaMA | View on Reddit | 14 comments
So I'm sure everyone is excited about the new DeepSeek release(s), but I'm a little confused about its VRAM requirements. A Q4 GGUF of it is only 120GB? While being a 284B parameter model? Does anyone understand how this is possible?
Expensive-Paint-9490@reddit
If you post a link it could be helpful. Where did you find this gguf?
WyattTheSkid@reddit (OP)
https://huggingface.co/tecaprovn/deepseek-v4-flash-gguf
Expensive-Paint-9490@reddit
It's a pruned version with 158B parameters.
WyattTheSkid@reddit (OP)
No, it's not. Hugging Face automatically reports parameter count based on file size, which proves my point further: DeepSeek V4 has a weird size scheme.
Expensive-Paint-9490@reddit
It is literally written in the file name: DeepSeekV4-Flash-158B-Q4_K_M.gguf
Monkey_1505@reddit
The quants say 158B.
Maybe it's heavily REAPed?
It doesn't really explain that anywhere I can see.
_lil41@reddit
Because DeepSeek released it with the experts at FP4 and the other params at FP8, so the mixed-precision weights give it a smaller size to begin with.
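The mixed-precision point can be sketched as back-of-envelope arithmetic. Note the 264B/20B expert/dense split below is a hypothetical illustration for the math, not DeepSeek V4-Flash's actual architecture:

```python
# Back-of-envelope: native checkpoint size of a mixed-precision MoE model.
# The 264B-expert / 20B-dense split is made up for illustration only.
def native_size_gb(expert_params_b, dense_params_b,
                   expert_bits=4.0, dense_bits=8.0):
    """Size in GB: params (in billions) * bits per weight / 8 bits per byte."""
    return (expert_params_b * expert_bits + dense_params_b * dense_bits) / 8

# A 284B model stored mostly as FP4 experts starts out far smaller than an
# all-FP16 checkpoint of the same size (284 * 16 / 8 = 568 GB).
print(native_size_gb(264, 20))  # 152.0 GB vs. 568 GB at FP16
```

So even before any GGUF quantization, the release is already close to what a 4-bit quant of a conventional FP16 checkpoint would weigh.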
Thomas-Lore@reddit
That has nothing to do with how large quants are. It just means the initial model is not much larger than fully 4-bit quant.
_lil41@reddit
I may be wrong, but I think the Q4 quants take the experts down to 2-bit, so it scales.
ImportancePitiful795@reddit
It came out with FP4 experts and FP8 for everything else. That's as small as it gets before you lobotomize it.
At this point I'm laughing at NVIDIA's tweets boasting that it runs fine while lobotomized to NVFP4 on their GB300 servers... 😂
Thomas-Lore@reddit
It seems a bit low; maybe they quantized more aggressively than is usual for Q4?
Thomas-Lore@reddit
Why is it strange? Q4 quants usually weigh around half a GB per billion params. 120GB is not far from that.
Different-Rush-2358@reddit
Actually, one question while I'm at it: is there any word on when Unsloth will release the UD quants for this model?
pmttyji@reddit
First we need llama.cpp support for the model.
No GGUFs for DeepSeek V4-Flash as yet?