LPCAMM2: does 64 or 96GB make sense for LLMs, or will large models be too slow?
Posted by duidui232323@reddit | LocalLLaMA | 7 comments
Hello!
My next machine will have an LPCAMM2 slot, with 32GB or 64GB 8600 MT/s options and a future option of 96GB at 9600 MT/s (probably not very soon). The modules have a 128-bit bus.
Currently 64GB comes at a huge premium. Does it even make sense to get 64GB instead of 32GB, or will any model that doesn't fit in 32GB be too slow anyway? I cannot find any benchmarks online, so I guess all we can do for now is speculate.
My use cases would be coding, RAG, and a generic chatbot.
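One way to sanity-check the speed question before benchmarks exist is memory-bandwidth arithmetic: during token generation a dense model's weights are streamed from RAM roughly once per token, so tokens/s is bounded by bandwidth divided by the size of the weights. A rough sketch below, assuming the MT/s figures from the post and two example quant sizes (real throughput will be noticeably lower than these peak numbers):

```python
# Back-of-the-envelope decode-speed estimate for a 128-bit LPCAMM2 system.
# Assumption: dense decoding is memory-bandwidth bound, so tokens/s is roughly
# peak bandwidth / bytes of weights read per token.

def bandwidth_gbs(mt_per_s: float, bus_bits: int = 128) -> float:
    """Peak memory bandwidth in GB/s for a given transfer rate and bus width."""
    return mt_per_s * 1e6 * (bus_bits / 8) / 1e9

def dense_tokens_per_s(model_gb: float, bw_gbs: float) -> float:
    """Upper-bound decode speed for a dense model whose weights are read once per token."""
    return bw_gbs / model_gb

for mts, label in [(8600, "64GB option @ 8600 MT/s"), (9600, "96GB option @ 9600 MT/s")]:
    bw = bandwidth_gbs(mts)
    for size_gb in (20, 45):  # e.g. a quant that fits in 32GB vs. one that needs 64GB
        print(f"{label}: {size_gb}GB of weights ≈ {dense_tokens_per_s(size_gb, bw):.1f} tok/s "
              f"(peak bandwidth ≈ {bw:.0f} GB/s)")
```

On that math a ~45GB dense model lands around 3 tok/s at best, which is why the replies below focus on mixture-of-experts models and GPU VRAM.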
MelodicRecognition7@reddit
https://old.reddit.com/r/LocalLLaMA/comments/1rqo2s0/can_i_run_this_model_on_my_hardware/?
Bird476Shed@reddit
Model size is basically limited only by RAM. More RAM means larger/smarter models are possible.
You decide what speed is acceptable for you.
duidui232323@reddit (OP)
I'd ideally want conversational speed for coding and the chatbot. For image gen I'll get an eGPU down the line.
XccesSv2@reddit
Models larger than 32GB are too slow even on a DGX Spark or Strix Halo, and those have quad-channel RAM. If you really need local AI, try to get a workstation PC with a graphics card that has more than 32GB of VRAM instead of wasting money on normal RAM.
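For comparison with those quad-channel platforms, the same upper-bound arithmetic can be applied; the Strix Halo and DGX Spark bandwidth figures below are approximate published peak specs, not measurements from this thread:

```python
# Same decode-speed upper bound applied to wider memory buses.
platforms_gbs = {
    "LPCAMM2 128-bit @ 8600 MT/s": 137,   # the OP's laptop option (peak)
    "Strix Halo (256-bit LPDDR5X)": 256,  # approximate peak spec
    "DGX Spark": 273,                     # approximate peak spec
}
model_gb = 40  # example: a quantized dense model that no longer fits in 32GB
for name, bw in platforms_gbs.items():
    print(f"{name}: ~{bw / model_gb:.1f} tok/s upper bound for a {model_gb}GB dense model")
```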
duidui232323@reddit (OP)
For now I need a laptop, then I plan to get a workstation next year. But I would still like to run some decent local LLMs on a laptop.
ProfessionalSpend589@reddit
The extra RAM helps with context size for small models. Full context could take 10GB of RAM, or, as was the case with Gemma 4 26b A4B, tens of gigabytes until a fix is implemented (I don't know if they have fixed it yet; I'm still downloading).
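To put a number on the context-size point, the KV cache grows linearly with context length; a minimal sketch with illustrative dimensions (not the actual Gemma configuration):

```python
# Rough KV-cache size: 2 (K and V) * layers * kv_heads * head_dim
# * context length * bytes per element. Figures below are illustrative.
def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context_len: int, bytes_per_elem: int = 2) -> float:
    return 2 * layers * kv_heads * head_dim * context_len * bytes_per_elem / 1e9

# Example: a mid-size model with grouped-query attention and an fp16 cache.
print(f"{kv_cache_gb(layers=48, kv_heads=8, head_dim=128, context_len=128_000):.1f} GB")
# -> roughly 25 GB for the cache alone at 128k context, on top of the weights.
```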
Blindax@reddit
It can help for models that suffer less from RAM offloading (mixture-of-experts models), but unless you have a separate GPU with fast VRAM where most of the model's layers sit, it will still be too slow.
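The reason mixture-of-experts models tolerate slow RAM better is that only the active experts' weights have to be streamed per token, so the effective size for the bandwidth estimate is the active parameter count, not the total. A hedged sketch with illustrative figures (not any specific model):

```python
# Why MoE models tolerate slower RAM: per-token memory traffic scales with
# active parameters, not total parameters. All figures are illustrative.
bandwidth_gbs = 137      # ~128-bit LPCAMM2 @ 8600 MT/s (peak)
bytes_per_param = 0.55   # roughly a 4-bit quant with overhead

total_params_b = 100     # 100B total parameters -> ~55GB of weights in RAM
active_params_b = 6      # but only ~6B parameters touched per token

dense_like_tps = bandwidth_gbs / (total_params_b * bytes_per_param)
moe_tps = bandwidth_gbs / (active_params_b * bytes_per_param)
print(f"if every weight were read per token: ~{dense_like_tps:.1f} tok/s")
print(f"reading only the active experts:     ~{moe_tps:.1f} tok/s")
```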