What would be the minimum requirement for Llama400B?

sebo3d@reddit

Let me put it this way: people who have absolute top-of-the-line gaming monsters won't even be able to run a Q2. Hell, they can just barely run a Q2 70B, let alone 400B.

somebrains@reddit

Yeah, that's workstation mobo density territory.

PlantFlat4056@reddit (OP)

I understand. Thanks for the info.

tonsui@reddit

If you want to try the 400B model for the least money and don't mind very slow response times, you can buy a new computer with 192GB of DDR5 RAM and run the Q2 GGUF version. For context, the approximate memory requirements of Llama 405B at different quantization levels are: Q8 ≈ 445GB, Q4 ≈ 245GB, Q2 ≈ 135GB.
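
As a rough sanity check on those figures, the sketch below converts each quoted size back into effective bits per weight and compares it against a 192GB machine. It assumes 405 billion parameters and decimal gigabytes; the function name is only illustrative, and the "fits" check ignores the OS, the KV cache, and any other overhead.

```python
# Sizes quoted above, in decimal GB, for the 405B model.
QUOTED_SIZES_GB = {"Q8": 445, "Q4": 245, "Q2": 135}
N_PARAMS = 405e9  # assumption: 405 billion parameters
RAM_GB = 192      # the machine suggested above


def effective_bits_per_weight(size_gb: float, n_params: float = N_PARAMS) -> float:
    """Bits per weight implied by a file size (metadata overhead ignored)."""
    return size_gb * 1e9 * 8 / n_params


for name, size_gb in QUOTED_SIZES_GB.items():
    bpw = effective_bits_per_weight(size_gb)
    verdict = "fits" if size_gb < RAM_GB else "does not fit"
    print(f"{name}: ~{bpw:.2f} bits/weight, {verdict} in {RAM_GB}GB")
```

Under these assumptions only the Q2 file is smaller than 192GB, and it works out to more than 2 bits per weight, which is one reason quoted file sizes run higher than a nominal "2-bit" estimate.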

somebrains@reddit

I would avoid anyone dumping a 13th or 14th gen Intel build. My local Craigslist is starting to get flooded with cheap, suspect CPU/mobo combos.

-p-e-w-@reddit

Not even close. Ignoring some fine points (such as that the model has 405 billion rather than 400 billion parameters, and that there are multiple "Q2" type quants), and assuming 400B with 2bpw, just loading the model requires 100 GB of (V)RAM. And at that point you don't even have a KV cache yet. Also, Q2 quants suck. They really do. I looked at a lot of IQ2_XXS samples from the recent llama.cpp survey, and many of them were *broken*, not just bad.
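
The 100 GB figure and the KV cache point can be sketched in a few lines. The weight calculation follows the comment directly (parameters × bits per weight ÷ 8). The KV cache part is an assumption-laden estimate: the layer count, KV head count, and head dimension below are taken from the published Llama 3.1 405B configuration, the cache is assumed to be stored in fp16, and the function names are hypothetical.

```python
def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Memory needed just to hold the weights."""
    return n_params * bits_per_weight / 8


def kv_cache_bytes(
    n_tokens: int,
    n_layers: int = 126,      # assumption: Llama 3.1 405B
    n_kv_heads: int = 8,      # assumption: grouped-query attention
    head_dim: int = 128,      # assumption
    bytes_per_value: int = 2, # assumption: fp16 cache
) -> int:
    """Memory for cached keys and values; the factor 2 covers K and V."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * n_tokens


weights_gb = weight_bytes(400e9, 2) / 1e9
kv_gb = kv_cache_bytes(8192) / 1e9
print(f"weights at 2 bpw: {weights_gb:.0f} GB")
print(f"KV cache at 8192 tokens: {kv_gb:.2f} GB")
```

With these assumed values the cache adds roughly 4 GB at an 8192-token context on top of the weights, and it grows linearly with context length.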

PlantFlat4056@reddit (OP)

I know Q2 quants suck, but I just really wanted to try the model. But oh well, thanks for the reply.

L-Acacia@reddit

You can try it through an API on a service like [together.ai](http://together.ai).

YearnMar10@reddit

It’s simple: the number of parameters in billions is (roughly) the amount of RAM in GB you need to run it at Q8. Divide for smaller quants (Q4 is about half, Q2 about a quarter) or multiply for FP16 (×2) and FP32 (×4).
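
That rule of thumb is easy to turn into a small lookup. This is a minimal sketch using nominal bytes per parameter; the table and function name are illustrative, and real quantized files usually come out somewhat larger than these nominal numbers.

```python
# Nominal bytes per parameter for each precision (rule of thumb only).
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "q8": 1.0, "q4": 0.5, "q2": 0.25}


def approx_ram_gb(params_billions: float, precision: str) -> float:
    """Approximate RAM in GB for the weights alone."""
    return params_billions * BYTES_PER_PARAM[precision]


for precision in BYTES_PER_PARAM:
    print(f"405B at {precision}: ~{approx_ram_gb(405, precision):g} GB")
```

For example, `approx_ram_gb(70, "q4")` gives 35.0, which matches the usual experience that a 4-bit 70B model needs a bit more than 35 GB once overhead is included.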

PlantFlat4056@reddit (OP)

This is great. Thanks for the tip

mrjackspade@reddit

You will not be running any version of 400B with an 8GB M2.

PlantFlat4056@reddit (OP)

Yep, got it. Thanks.

Devy9@reddit

Btw, if you check the model cards of these quantized models, you can find the memory requirements in some cases.
