
What would be the minimum requirement for Llama400B?

Posted by PlantFlat4056@reddit | LocalLLaMA | 14 comments

I have a Mac M2 with 8GB. Would it be able to run a Q2 quant with llama.cpp?


14 Comments

sebo3d@reddit

Let me put it this way. People who have absolute top-of-the-line gaming monsters won't even be able to run a Q2. Hell, they can just barely run a Q2 70B, let alone a 400B.

somebrains@reddit

Yeah, that's workstation mobo density territory.

PlantFlat4056@reddit (OP)

I understand. Thanks for the info.

tonsui@reddit

If you want to try the 400B model with the least amount of money and don't mind very slow response times, you can buy a new computer with 192GB of DDR5 RAM to run the Q2 GGUF version. For context, Llama 405B's quantization levels and their respective memory requirements are: Q8 ≈ 445GB, Q4 ≈ 245GB, Q2 ≈ 135GB.
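A rough sketch of how to read those numbers: subtract some headroom for the OS and KV cache, then see which quants still fit. The sizes are the ones quoted in the comment; the 8GB headroom default and the `quants_that_fit` helper are hypothetical, chosen only for illustration, not measured.

```python
# Approximate Llama 405B memory figures quoted in the thread (GB).
QUANT_SIZES_GB = {"Q8": 445, "Q4": 245, "Q2": 135}


def quants_that_fit(ram_gb: float, headroom_gb: float = 8.0) -> list:
    """Return the quants whose quoted size fits in ram_gb after reserving headroom_gb.

    headroom_gb is an assumed allowance for the OS and KV cache, not a measured value.
    """
    usable = ram_gb - headroom_gb
    return [name for name, size in QUANT_SIZES_GB.items() if size <= usable]


for ram in (8, 192, 256, 512):
    print(f"{ram:>3} GB RAM -> {quants_that_fit(ram) or 'nothing'}")
```

With these assumptions an 8GB machine fits nothing, 192GB fits only Q2, and Q4 needs roughly 256GB.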

somebrains@reddit

I would avoid anyone dumping a 13th- or 14th-gen Intel build. My local Craigslist is starting to get flooded with cheap, suspect CPU/mobo combos.

-p-e-w-@reddit

Not even close. Ignoring some fine points (such as that the model has 405 billion rather than 400 billion parameters, and that there are multiple "Q2"-type quants) and assuming 400B at 2 bpw, just loading the model requires 100 GB of (V)RAM. And at that point you don't even have a KV cache yet. Also, Q2 quants suck. They really do. I looked at a lot of IQ2_XXS samples from the recent llama.cpp survey, and many of them were *broken*, not just bad.
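The 100 GB figure is just parameters × bits per weight ÷ 8, and the KV cache comes on top of that. A minimal sketch of both estimates follows. The layer count, KV-head count and head dimension used below (126, 8, 128) are assumed to roughly match the published Llama 405B config; check the model's own config before relying on them. The 8192-token context and FP16 cache are also assumptions.

```python
def weight_gb(params: float, bits_per_weight: float) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                context: int, bytes_per_elem: int = 2) -> float:
    """KV cache size: one K and one V vector per KV head, per layer, per token."""
    return 2 * n_layers * n_kv_heads * head_dim * context * bytes_per_elem / 1e9


print(weight_gb(400e9, 2))   # 100.0, the figure in the comment
print(weight_gb(405e9, 2))   # 101.25 with the actual parameter count

# Assumed Llama-405B-like shape, 8192-token context, FP16 cache.
print(f"{kv_cache_gb(126, 8, 128, 8192):.2f}")  # 4.23
```

Real "Q2" GGUF quants also spend extra bits on scales, so actual files land above the pure 2 bpw number.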

PlantFlat4056@reddit (OP)

I know Q2 quants suck, but I just really wanted to try the model. But oh well, thanks for the reply.

L-Acacia@reddit

You can try it through an API on a service like [together.ai](http://together.ai).

YearnMar10@reddit

It’s that simple: the number of parameters (in billions) is roughly the amount of RAM in GB you need to run the model at Q8. Divide or multiply accordingly for other quants or for FP16/FP32.
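The rule of thumb above amounts to about 1 byte per parameter at Q8, scaled by bit width for other formats. A small sketch, assuming idealized bit widths; it ignores quantization overhead and the KV cache, which is why the per-quant figures quoted earlier in the thread come out somewhat higher. `rough_ram_gb` is a hypothetical helper.

```python
# Idealized bytes per parameter: Q8 ~ 1 byte, the rest scale by bit width.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "q8": 1.0, "q4": 0.5, "q2": 0.25}


def rough_ram_gb(params_billions: float, fmt: str) -> float:
    """Rule-of-thumb RAM in GB for the weights only."""
    return params_billions * BYTES_PER_PARAM[fmt]


for fmt in BYTES_PER_PARAM:
    print(fmt, rough_ram_gb(405, fmt))
```

For 405B this gives 1620, 810, 405, 202.5 and 101.25 GB respectively, all far beyond an 8GB M2.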

PlantFlat4056@reddit (OP)

This is great. Thanks for the tip

mrjackspade@reddit

You will not be running any version of 400B with an 8GB M2

PlantFlat4056@reddit (OP)

Yeap, got it. Thanks.

Devy9@reddit

Btw, if you check the model cards of these quantized models, you can find the memory requirements in some cases.

PlantFlat4056@reddit (OP)

Yeap, will do.