Does 2x Dual-Channel improve performance on models?
Posted by Fusseldieb@reddit | LocalLLaMA | View on Reddit | 20 comments
Fusseldieb@reddit (OP)
I recently bought 2x 8GB 3200MT/s sticks to replace my old single 16GB 2666MT/s stick. While it probably made a difference, I still have 2 slots left. Would it improve model speed if I added another 2 identical 8GB 3200MT/s sticks? Or isn't this a thing?
Thanks in advance.
Thellton@reddit
Generally speaking, adding more sticks won't increase your CPU's memory bandwidth beyond what those two sticks already give you, so I'd leave the other slots empty unless you desperately need the memory to run a larger model and don't mind the response time. Furthermore, it may in fact reduce bandwidth, since signal integrity issues (or something like that) can prevent all of your sticks from reaching their 3,200MT/s speed, as I understand it. The specifics are more technical than I can explain, so I'll leave the details to someone else.
Fusseldieb@reddit (OP)
Would a future upgrade to DDR5 "solve" this issue and let me get more speed, since DDR5 modules have ridiculously high MT/s from what I've seen? Of course I'd need a new PC, essentially, but I'm curious anyway.
Thanks for the explanation!
FluffnPuff_Rebirth@reddit
It will help, but even the fastest dual-channel RAM loses to very, very modest GPUs. The fastest DDR5 has a memory bandwidth of some ~60GB/s, while the Nvidia 3060's bandwidth is in the mid 300s, around six times higher. Then the likes of the 3090 are in the 900s, which is over ten times more.
Fusseldieb@reddit (OP)
If it's cheaper and still around reading speed (±5 tok/s), it could be worthwhile.
Thellton@reddit
Whether you'll be fine depends on the model's size, or on the size of its active parameters in GB (dense vs sparse model). If you want something that has reasonable speed on CPU, I'd look into OLMoE 7B. It's a very interesting Mixture of Experts model with a very comprehensive arXiv paper that explains an awful lot about the model. Some people don't like MoE models because they're not as competent as a same-size dense model, but it does come very close to its state-of-the-art dense model peers whilst having a significant speed advantage. I'm running 2133MT/s and I get 27 tokens per second with that model. The only downside is that it only has a 4k native context.
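A rough sketch of why active-parameter size matters so much on CPU: if generation is memory-bandwidth bound, every token has to read the active weights once, so bandwidth divided by active weight size gives a ceiling on tokens per second. The function name and all the example figures below (51.2 GB/s for dual-channel 3200MT/s, a 1.3B active-parameter MoE, 4 bits per weight) are assumptions for illustration, not measurements from this thread, and real speeds will be lower.

```python
def max_tokens_per_s(bandwidth_gbs: float, active_params_b: float,
                     bits_per_weight: float) -> float:
    """Upper bound on generation speed when memory bandwidth is the bottleneck.

    Assumes each generated token reads all active weights exactly once and
    ignores the KV cache, compute limits, and any overhead.
    """
    active_weights_gb = active_params_b * bits_per_weight / 8
    return bandwidth_gbs / active_weights_gb


# Assumed example: dual-channel 3200MT/s, theoretical peak of 51.2 GB/s.
bandwidth = 51.2

dense_7b = max_tokens_per_s(bandwidth, active_params_b=7.0, bits_per_weight=4)
# Assumed MoE with about 1.3B active parameters per token.
sparse_moe = max_tokens_per_s(bandwidth, active_params_b=1.3, bits_per_weight=4)

print(f"dense 7B ceiling:   {dense_7b:.1f} tok/s")
print(f"sparse MoE ceiling: {sparse_moe:.1f} tok/s")
```

The ratio between the two ceilings is simply the ratio of active parameters, which is where the MoE speed advantage comes from.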
FluffnPuff_Rebirth@reddit
Actual generation is rarely the problem imo, but having to wait for 10 minutes for the 16k token prompt to process gets old very fast. If you mostly just regenerate the same prompt, or use it as an e-mail like conversation, RAM is doable.
ChengliChengbao@reddit
It's still two channels. You're getting the same bandwidth whether you're using 2 or 4 sticks.
So no, I wouldn't expect any performance increase from the RAM bandwidth alone. However, getting 32GB of RAM would definitely allow you to run far bigger models, I'm talking up to ~22B at reasonable speeds.
Fusseldieb@reddit (OP)
So you're telling me that the computer can't use all 4 sticks at the same time, and it is limited to 32GB 6400MT/s in 2x dual channel?
Regarding the model, if I run a 4 or 5bit quantized one, a 30B should fit, no?
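A back-of-the-envelope check for the "should fit" question, as a sketch only: weight size is roughly parameters times bits per weight divided by 8. The helper names and the 4 GB reserve for the OS and context cache are hypothetical, and real quantized files usually use somewhat more bits per weight than their nominal label suggests.

```python
def weights_gb(params_b: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB."""
    return params_b * bits_per_weight / 8


def fits_in_ram(params_b: float, bits_per_weight: float,
                ram_gb: float, reserve_gb: float = 4.0) -> bool:
    """True if the weights fit after setting aside reserve_gb.

    reserve_gb is an assumed allowance for the OS and context cache.
    """
    return weights_gb(params_b, bits_per_weight) <= ram_gb - reserve_gb


for bits in (4, 5):
    size = weights_gb(30, bits)
    print(f"30B at {bits}-bit: {size:.2f} GB, "
          f"fits in 16GB: {fits_in_ram(30, bits, 16)}, "
          f"fits in 32GB: {fits_in_ram(30, bits, 32)}")
```

Under these assumptions a 30B model at 4 or 5 bits does not fit in 16GB but does fit in 32GB, with less headroom left over at 5 bits.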
maddogxsk@reddit
If you have 4 slots, you may have quad channel.
Fusseldieb@reddit (OP)
Does the 8th gen i9 support this? I'm on a laptop, specifically an ROG G703GX.
Now that people have said that it could even interfere, I'm kinda lost if this is a good idea or not lol
maddogxsk@reddit
Quad channel has been around since DDR3, so it's possible your mobo supports it.
It seems that your notebook's mobo only supports dual channel though :( (from what I could find)
Fusseldieb@reddit (OP)
Aw that's a shame!
Just_Maintenance@reddit
If you put two sticks on the same channel only one can be accessed at a time so bandwidth stays the same. You get more capacity and that's it.
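To put numbers on this, here is a minimal sketch of the usual theoretical peak formula: transfers per second times bus width in bytes times number of channels. It assumes the standard 64-bit (8-byte) channel width; the function name is made up for illustration, and real measured bandwidth is lower than these peaks.

```python
def peak_bandwidth_gbs(mt_per_s: int, channels: int, bus_bytes: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s.

    mt_per_s:  memory speed in megatransfers per second (e.g. 3200)
    channels:  number of memory channels, NOT the number of sticks
    bus_bytes: bytes moved per transfer per channel (64-bit bus assumed)
    """
    return mt_per_s * 1e6 * bus_bytes * channels / 1e9


old_single = peak_bandwidth_gbs(2666, channels=1)   # 1x 16GB stick
two_sticks = peak_bandwidth_gbs(3200, channels=2)   # 2x 8GB, one per channel
four_sticks = peak_bandwidth_gbs(3200, channels=2)  # 4x 8GB, still two channels

print(f"1 stick  @ 2666MT/s: {old_single:.1f} GB/s")   # 21.3 GB/s
print(f"2 sticks @ 3200MT/s: {two_sticks:.1f} GB/s")   # 51.2 GB/s
print(f"4 sticks @ 3200MT/s: {four_sticks:.1f} GB/s")  # 51.2 GB/s
```

The stick count never enters the formula, only the channel count does, so going from 2 to 4 sticks on a dual-channel board leaves the peak unchanged at best.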
FluffnPuff_Rebirth@reddit
When you have 2 sticks per channel on dual channel, it's like having two train lines and doubling the number of train carts on the tracks. You can haul more stuff, but the trains won't go any faster, and if the train system is a jumbled mess and poorly thought out, then adding more carts might actually make it slower.
Fusseldieb@reddit (OP)
That's a great analogy ahaha Thanks!
NickCanCode@reddit
Not only will it not speed things up, you may even get a slowdown, because consumer-grade CPUs usually give lower bandwidth when you use 2 pairs of RAM sticks instead of 1 pair. You can check the CPU specifications for the details.
For example, here is what Intel states in their spec sheet. (AMD is the same.)
Intel® processors come in four different types: Single Channel, Dual Channel, Triple Channel, and Flex Mode. Maximum supported memory speed may be lower when populating multiple DIMMs per channel on products that support multiple memory channels.
Fusseldieb@reddit (OP)
Oh that's interesting... So they're effectively capping it for consumer grade... Nasty, in a way.
Thanks!
Healthy-Nebula-3603@reddit
That's just half the speed of DDR5 at 6400MT/s ...
Fusseldieb@reddit (OP)
Unfortunately I'm not able to upgrade right now due to .. ehemm.. financial constraints. I gave myself an early present and purchased the 2x 8GB modules instead.