LPCAMM2: does 64 or 96GB make sense for LLMs, or will large models be too slow?
Posted by duidui232323@reddit | LocalLLaMA | 7 comments
Hello!
My next machine will have an LPCAMM2 slot, with 32GB or 64GB 8600 MT/s options and a future option of 96GB at 9600 MT/s (probably not very soon). The modules have a 128-bit bus.
Currently 64GB comes at a huge premium. Does it even make sense to get 64GB instead of 32GB, or will any model that doesn't fit in 32GB be too slow anyway? I cannot find any benchmarks online, so I guess all we can do for now is speculate.
My use cases would be coding, RAG, and a generic chatbot.
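One way to sanity-check the speed question before benchmarks exist is memory-bandwidth arithmetic: during token generation a dense model's weights are streamed from RAM roughly once per token, so tokens/s is bounded by bandwidth divided by the size of the weights. A rough sketch below, assuming the MT/s figures from the post and two example quant sizes (real throughput will be noticeably lower than these peak numbers):

```python
# Back-of-the-envelope decode-speed estimate for a 128-bit LPCAMM2 system.
# Assumption: dense decoding is memory-bandwidth bound, so tokens/s is roughly
# peak bandwidth / bytes of weights read per token.

def bandwidth_gbs(mt_per_s: float, bus_bits: int = 128) -> float:
    """Peak memory bandwidth in GB/s for a given transfer rate and bus width."""
    return mt_per_s * 1e6 * (bus_bits / 8) / 1e9

def dense_tokens_per_s(model_gb: float, bw_gbs: float) -> float:
    """Upper-bound decode speed for a dense model whose weights are read once per token."""
    return bw_gbs / model_gb

for mts, label in [(8600, "64GB option @ 8600 MT/s"), (9600, "96GB option @ 9600 MT/s")]:
    bw = bandwidth_gbs(mts)
    for size_gb in (20, 45):  # e.g. a quant that fits in 32GB vs. one that needs 64GB
        print(f"{label}: {size_gb}GB of weights ≈ {dense_tokens_per_s(size_gb, bw):.1f} tok/s "
              f"(peak bandwidth ≈ {bw:.0f} GB/s)")
```

On that math a ~45GB dense model lands around 3 tok/s at best, which is why the replies below focus on mixture-of-experts models and GPU VRAM.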
MelodicRecognition7@reddit
https://old.reddit.com/r/LocalLLaMA/comments/1rqo2s0/can_i_run_this_model_on_my_hardware/?
Bird476Shed@reddit
Model size is basically limited only by RAM. More RAM means larger/smarter models are possible.
You decide what speed is acceptable for you.
duidui232323@reddit (OP)
I'd ideally want conversational speed for coding and the chatbot. For image gen I'll get an eGPU down the line.
XccesSv2@reddit
Models larger than 32GB are too slow even on a DGX Spark or Strix Halo, and those have quad-channel RAM. If you really need local AI, try to get a workstation PC with a graphics card that has more than 32GB of VRAM instead of wasting money on normal RAM.
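For comparison with those quad-channel platforms, the same upper-bound arithmetic can be applied; the Strix Halo and DGX Spark bandwidth figures below are approximate published peak specs, not measurements from this thread:

```python
# Same decode-speed upper bound applied to wider memory buses.
platforms_gbs = {
    "LPCAMM2 128-bit @ 8600 MT/s": 137,   # the OP's laptop option (peak)
    "Strix Halo (256-bit LPDDR5X)": 256,  # approximate peak spec
    "DGX Spark": 273,                     # approximate peak spec
}
model_gb = 40  # example: a quantized dense model that no longer fits in 32GB
for name, bw in platforms_gbs.items():
    print(f"{name}: ~{bw / model_gb:.1f} tok/s upper bound for a {model_gb}GB dense model")
```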
duidui232323@reddit (OP)
For now I need a laptop, then I plan to get a workstation next year. But I would still like to run some decent local LLMs on a laptop.
ProfessionalSpend589@reddit
The extra RAM helps with context size for small models. Full context could take 10GB of RAM, or, as was the case with Gemma 4 26b A4B, tens of gigabytes until a fix is implemented (I don't know if they have fixed it yet; I'm still downloading).
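To put a number on the context-size point, the KV cache grows linearly with context length; a minimal sketch with illustrative dimensions (not the actual Gemma configuration):

```python
# Rough KV-cache size: 2 (K and V) * layers * kv_heads * head_dim
# * context length * bytes per element. Figures below are illustrative.
def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context_len: int, bytes_per_elem: int = 2) -> float:
    return 2 * layers * kv_heads * head_dim * context_len * bytes_per_elem / 1e9

# Example: a mid-size model with grouped-query attention and an fp16 cache.
print(f"{kv_cache_gb(layers=48, kv_heads=8, head_dim=128, context_len=128_000):.1f} GB")
# -> roughly 25 GB for the cache alone at 128k context, on top of the weights.
```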
Blindax@reddit
It can help for models that suffer less from RAM offloading (mixture-of-experts models), but unless you have a separate GPU with fast VRAM where most of the model's layers sit, it will still be too slow.
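The reason mixture-of-experts models tolerate slow RAM better is that only the active experts' weights have to be streamed per token, so the effective size for the bandwidth estimate is the active parameter count, not the total. A hedged sketch with illustrative figures (not any specific model):

```python
# Why MoE models tolerate slower RAM: per-token memory traffic scales with
# active parameters, not total parameters. All figures are illustrative.
bandwidth_gbs = 137      # ~128-bit LPCAMM2 @ 8600 MT/s (peak)
bytes_per_param = 0.55   # roughly a 4-bit quant with overhead

total_params_b = 100     # 100B total parameters -> ~55GB of weights in RAM
active_params_b = 6      # but only ~6B parameters touched per token

dense_like_tps = bandwidth_gbs / (total_params_b * bytes_per_param)
moe_tps = bandwidth_gbs / (active_params_b * bytes_per_param)
print(f"if every weight were read per token: ~{dense_like_tps:.1f} tok/s")
print(f"reading only the active experts:     ~{moe_tps:.1f} tok/s")
```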