Deepseek v4 flash weird sizes?
Posted by WyattTheSkid@reddit | LocalLLaMA | View on Reddit | 14 comments
So I'm sure everyone is excited about the new DeepSeek release(s), but I'm a little confused about its VRAM requirements. A Q4 GGUF of it is only 120GB? While being a 284B parameter model? Does anyone understand how this is possible?
Expensive-Paint-9490@reddit
If you post a link it could be helpful. Where did you find this gguf?
WyattTheSkid@reddit (OP)
https://huggingface.co/tecaprovn/deepseek-v4-flash-gguf
Expensive-Paint-9490@reddit
It's a pruned version with 158B parameters.
WyattTheSkid@reddit (OP)
No, it's not. Hugging Face automatically reports parameter count based on file size, which proves my point further: DeepSeek V4 has a weird size scheme.
Expensive-Paint-9490@reddit
It is literally written in the file name: DeepSeekV4-Flash-158B-Q4_K_M.gguf
Monkey_1505@reddit
The quants say 158B.
Maybe it's heavily REAPed?
It doesn't really explain that anywhere I can see.
_lil41@reddit
Because DeepSeek released it with the experts at FP4 and the other params at FP8, so the mixed-precision weights give it a smaller size to begin with.
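The mixed-precision point can be sketched as back-of-envelope arithmetic. Note the 264B/20B expert/dense split below is a hypothetical illustration for the math, not DeepSeek V4-Flash's actual architecture:

```python
# Back-of-envelope: native checkpoint size of a mixed-precision MoE model.
# The 264B-expert / 20B-dense split is made up for illustration only.
def native_size_gb(expert_params_b, dense_params_b,
                   expert_bits=4.0, dense_bits=8.0):
    """Size in GB: params (in billions) * bits per weight / 8 bits per byte."""
    return (expert_params_b * expert_bits + dense_params_b * dense_bits) / 8

# A 284B model stored mostly as FP4 experts starts out far smaller than an
# all-FP16 checkpoint of the same size (284 * 16 / 8 = 568 GB).
print(native_size_gb(264, 20))  # 152.0 GB vs. 568 GB at FP16
```

So even before any GGUF quantization, the release is already close to what a 4-bit quant of a conventional FP16 checkpoint would weigh.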
Thomas-Lore@reddit
That has nothing to do with how large quants are. It just means the initial model is not much larger than fully 4-bit quant.
_lil41@reddit
I may be wrong, but I think the Q4 quants take the experts down to 2-bit, so it scales.
ImportancePitiful795@reddit
It came out with FP4 experts and FP8 for everything else. That's as small as it gets before you lobotomize it.
At this point I'm laughing at NVIDIA's tweets boasting that it runs fine while lobotomized to NVFP4 on their GB300 servers... 😂
Thomas-Lore@reddit
It seems a bit low; maybe they quantized more aggressively than is usual for Q4?
Thomas-Lore@reddit
Why is it strange? Q4 quants usually weigh around half a GB per billion params. 120GB is not far from that.
Different-Rush-2358@reddit
Actually, one question while I'm at it: is there any word on when Unsloth will release the UD quants for this model?
pmttyji@reddit
First we need llama.cpp support for the model.
No GGUFs for DeepSeek V4-Flash as yet?