AMD’s new Variable Graphics Memory lets laptop users reassign their RAM to gaming
Posted by TwelveSilverSwords@reddit | hardware | View on Reddit | 90 comments
bubblesort33@reddit
So they added an existing BIOS feature that's been around for a long time to Adrenaline?
Strazdas1@reddit
There was no such BIOS feature. Some specific models had an additional feature you could enable in the BIOS, but that was up to the OEM to add and was very rare.
Proof-Most9321@reddit
in the A16 advantage edition this option does not exist in the bios.
996forever@reddit
Smart Access Memory already does that with dGPU laptops.
Proof-Most9321@reddit
What? Do you even know what SAM is for?
tsukiko@reddit
IIRC, Smart Access Memory (resizable BAR) is for allowing a larger addressing window for dedicated video RAM, and is not an interface for changing the amounts allocated for a graphics hardware sharing the main system RAM for integrated graphics.
hishnash@reddit
Or they could put in the work to have a proper unified memory model where you do not need to preemptively reserve memory for the GPU, and you can avoid copying data that is needed by both the CPU and GPU... they are doing it on consoles after all.
lightmatter501@reddit
iommu translation is a non-negligible overhead for a lot of GPU workloads. You need to have dedicated memory or you take a big performance hit. The iommu is non-negotiable.
hishnash@reddit
So long as your GPU and CPU use the same page table sizes, and developers explicitly acknowledge that mutating the memory creates concurrency issues they need to deal with, this does not have issues.
Sure, if you want the CPU and GPU to concurrently mutate memory at bit-level granularity then yes, there is a huge cost, but that is not what is needed.
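The explicit synchronization being described can be sketched with ordinary threads sharing one buffer. This is a hypothetical CPU-only stand-in for unified memory, not a real GPU API: the point is that both sides see the same data with zero copies, at the price of explicit coordination.

```python
import threading

# Shared buffer standing in for a region of unified memory that both
# the "CPU" side and the "GPU" side can see without any copying.
shared = [0] * 1024
lock = threading.Lock()  # the explicit synchronization the comment refers to

def producer():
    # Mutating shared memory requires explicit coordination.
    with lock:
        for i in range(len(shared)):
            shared[i] = i

def consumer(result):
    with lock:
        result.append(sum(shared))

result = []
t1 = threading.Thread(target=producer)
t1.start()
t1.join()  # explicit ordering: the consumer only reads after the producer finishes
t2 = threading.Thread(target=consumer, args=(result,))
t2.start()
t2.join()
print(result[0])  # sum of 0..1023 = 523776
```

No data is duplicated between producer and consumer; the lock and the `join()` calls are the "deal with it explicitly" part.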
Strazdas1@reddit
You failed the moment you expected developers to adhere to any kind of standard.
opelit@reddit
You should ask MS why they don't... 😂 Windows is shit at doing things like that. Somehow it's possible on their server chips running Linux heh 😅
wizfactor@reddit
Would’ve been cool if the Steam Deck APU also used Unified Memory via SteamOS.
trololololo2137@reddit
works fine on Intel
opelit@reddit
Idk where Intel had unified memory. They just don't use a dedicated pool, so they don't take a fixed chunk away from the CPU. But it still uses up to 50% of the RAM as shared memory, same as AMD.
TwoCylToilet@reddit
Aaaand that's how you get non-upgradable, soldered memory where the base model is 8GB and it costs 3x the actual DRAM chips' price to get a memory upgrade from the manufacturer with no alternative options. If you want unified memory, buy a console, or a MacBook, or any of the X1 handhelds.
PMARC14@reddit
Why would unified memory, a software concept, require soldered memory? You can totally do this with normal RAM if everyone works to support it. Lastly, everyone is already soldering memory for the APUs right now; I haven't seen any deployments of CAMM2 yet.
octagonaldrop6@reddit
Because RAM is typically slower than VRAM. If you are doing unified you want it as fast and low-latency as possible. This is easier if it’s soldered to the board and closer to the SOC.
PMARC14@reddit
The RAM Apple uses isn't special though, it's just normal LPDDR5X. You can accomplish the closeness with LPCAMM2, and it isn't a necessity anyway. It's not like the RAM allocated to the GPU is faster when you aren't using unified memory.
octagonaldrop6@reddit
Huh you’re right. Though it does seem to have higher bandwidth.
Apple can also get away with slower VRAM because their users aren't typically running applications where it matters. This applies to laptops as a whole, but I'm pretty sure even Apple desktops do this.
PMARC14@reddit
The higher bandwidth is from the bigger bus width on Apple products, which x86 has been slow to pick up, though hopefully all the AI stuff will change this. I know at least Strix Halo is getting a 256-bit bus, which can be fed by two CAMM modules, so I hope someone in the laptop space gets creative and centers the chip with the modules flanking it.
salgat@reddit
Even if this is the case, I personally don't mind it given the performance improvements. Many folks, including myself, buy the amount of RAM we plan to use for the life of the hardware, especially if we're performance enthusiasts. I have 64GB of memory, and have no plans to upgrade or change that until my next CPU upgrade.
monocasa@reddit
There's nothing about unified CPU/GPU memory that would require soldered in memory.
Glebun@reddit
Bandwidth.
monocasa@reddit
This is for iGPUs that already share the same physical RAM as the rest of the system, they simply carve a piece out to use as pseudo VRAM at boot time.
Glebun@reddit
Oh, makes sense
TwelveSilverSwords@reddit (OP)
LPCAMM exists.
Glebun@reddit
How does that compare to GDDR6X?
TwelveSilverSwords@reddit (OP)
An LPDDR5X-8533 LPCAMM2 module has a 128 bit memory bus and can do 136 GB/s bandwidth.
GDDR6X bandwidth depends on the bus width.
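For reference, the 136 GB/s figure falls straight out of data rate times bus width; a quick sanity check:

```python
def bandwidth_gbps(data_rate_mtps, bus_bits):
    """Peak bandwidth in GB/s: (MT/s) x (bus width in bytes) / 1000."""
    return data_rate_mtps * (bus_bits // 8) / 1000

# LPDDR5X-8533 on a 128-bit LPCAMM2 module
print(bandwidth_gbps(8533, 128))  # ~136.5 GB/s
```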
Coffee_Ops@reddit
Also reduces EMF noise which can enable higher data rates.
Obviously it's only one factor here; you can use shielded CAT7 cabling to cut noise but if you're still hooked up to a 100mbps port you aren't getting more bandwidth. But the fastest port in the world won't help you if you're using unshielded CAT5.
Glebun@reddit
So nothing (tech-wise) would stop NVIDIA from making a 4090 with socketed memory without compromising performance?
TwelveSilverSwords@reddit (OP)
Yes. The only issue is that such a hypothetical 4090 would be about double the size, because it has to host all the LPCAMM modules.
Glebun@reddit
Oh, interesting! Thanks for educating me, TIL.
SentinelOfLogic@reddit
That is not a requirement (the low end Apple chips don't have massive bandwidth anyway) and high bandwidth could also be met with more channels.
soggybiscuit93@reddit
More bandwidth could be met with more channels in a technical sense. But in a business sense, how would that work? I know Apple can do it because they're more vertically integrated, but are the APUs all gonna go to 256-bit? And then have 4 RAM slots on a laptop? And the dies are gonna get larger (and more expensive) to accommodate the extra channels?
And then those CPUs will have different sockets and motherboards from that company's standard dual channel CPU?
I'd like to see 256-bit APUs, but it just seems that using LPDDR5X is a more cost-effective way to get some more bandwidth.
Glebun@reddit
Apple's M2 (non-pro) has 100GB/s bandwidth
hishnash@reddit
Well, if you want high-bandwidth memory to feed a GPU, yes.
Same reason memory on a dGPU is soldered: getting the needed bandwidth over socketed memory takes up a LOT more space and power (10x to 100x).
TwelveSilverSwords@reddit (OP)
LPDDR5X-8533 LPCAMM can give 136 GB/s of bandwidth. A decent amount for an APU/SoC.
hishnash@reddit
Depends on the size of the GPU on said SoC. For a small GPU (Nintendo Switch level), yes.
But for larger GPUs, no. If you look at AMD's most recent laptop APU, in many situations it is memory bandwidth starved on GPU operations.
TwelveSilverSwords@reddit (OP)
Strix Point has quite a weak memory subsystem to feed the GPU.
X Elite: 1 MB GPU L2 + 3 MB GMEM, 6 MB SLC, 136 GB/s memory bandwidth.
Lunar Lake: 8 MB GPU L2, 8 MB SLC, 136 GB/s memory bandwidth.
Strix Point: 2 MB GPU L2, no SLC, 120 GB/s memory bandwidth.
SentinelOfLogic@reddit
And that would only be one module; if you had another, you could double that.
hishnash@reddit
Yes, but that is the limit; you can't go to 3 or 4, and even with 2 you're still looking at low-end GPU range.
surf_greatriver_v4@reddit
Just need to wait a while for true mass adoption
IC2Flier@reddit
People are asking for the impossible: somehow these guys want a socketed SoC that has both the APU and memory as something you can put in, I dunno, an M.2 slot.
hanotak@reddit
The way to do it is to have a large shared memory, like apple does, and then also have DIMM/CAMM2 slots like a normal laptop. That would add some complexity, breaking up DRAM into tiers like cache is, but it would allow for a best-of-both-worlds design where you have a unified memory architecture which can be expanded with traditional ram if necessary.
TwelveSilverSwords@reddit (OP)
Why are people conflating Unified Memory and on-package memory?
Flaimbot@reddit
because the average redditor is a 16yr old who knows nothing about either, outside of a few buzzwords they heard
TwelveSilverSwords@reddit (OP)
I mourn the closure of Anandtech. When I was a teen, I used to hungrily read their articles. It laid the foundation for much of my hardware knowledge.
hanotak@reddit
On-package and unified often go hand-in-hand, for the more modern SOCs, at least. Not sure why, but it probably has something to do with the benefits of a unified architecture being greater when memory access latency is as small as possible for both the CPU and GPU, which would obviously be when the memory is on-package.
SentinelOfLogic@reddit
Stop spreading disinformation. Current AMD APUs and Intel CPUs with integrated graphics use unified memory (and have for about a decade); there is also only a tiny latency benefit from putting the memory on the package.
TwelveSilverSwords@reddit (OP)
Chips&Cheese found that on-package memory in Apple SoCs doesn't give a significant reduction in latency.
Apple uses on-package memory for other benefits:
- Less motherboard complexity.
- Allows scaling to very wide buses (512-bit and beyond) more cheaply.
- Space savings.
- Minor power savings.
ILoveThisPlace@reddit
I literally just watched a video on a new tech for laptops that's already out, for screw-in RAM modules. It's awesome.
IC2Flier@reddit
link pls
ILoveThisPlace@reddit
Look up LPCAMM2 I believe it was called.
wrecklord0@reddit
And that's why I am considering paying out the ass for an M2 Ultra... because PCs are light years away in capabilities. And it sucks, because I do not care about or need the macOS ecosystem, but there are no alternatives.
TwelveSilverSwords@reddit (OP)
Unified Memory Model ≠ On-package Memory
Glebun@reddit
Not if you want to get proper bandwidth that's comparable to GDDR
TwelveSilverSwords@reddit (OP)
GDDR also comes soldered, so what's your issue?
Glebun@reddit
What do you mean? I'm saying that you can't match SOTA GPU bandwidth with socketed memory.
TwelveSilverSwords@reddit (OP)
Apple also does it, I believe.
techraito@reddit
Funnily enough the PS5 does this, too.
ejk905@reddit
This is a Windows thing, not an AMD thing. Linux has this working much better.
cesaroncalves@reddit
Amd Strix Halo
hishnash@reddit
In HW this is possible, but on Windows it is not. There are no good APIs for a shared data model between GPU and CPU for generic data flows.
Dramatic-Bill-5790@reddit
Will the 7840HS support this?
Tonybishnoi@reddit
I have a question: why do Intel iGPUs not require reserving a portion of system RAM exclusive to the GPU at all? AMD iGPUs need a minimum of 128MB (configurable in BIOS) system RAM always taken for the iGPU.
Intel has had DVMT since Core 2 days (back when the iGPU was on the motherboard). AMD's approach seems like a waste of RAM.
Ryzen 3000 series laptops (Zen+) had 2 GB allocated to the iGPU out of the box with no option to change it in BIOS and guess what? The majority of cheap laptops came with 8 gigs of RAM, effectively leaving users with 5.5GB of usable RAM.
One more question: if I set dedicated iGPU RAM to 128 MB, Windows uses "shared system memory" to store GPU data which can't fit in 128MB. Can the Radeon iGPU directly access its data in the shared system memory, or does it need the CPU to transfer it first to the "dedicated RAM" address before it can utilise the contents? Intel iGPUs seem to work without any dedicated RAM just fine.
I'm interested in learning more about this topic.
b3081a@reddit
The GPU can access shared system memory without being transferred back to dedicated RAM. However it needs to go through an additional layer of IOMMU/IOTLB address translation, which introduces additional overhead.
From my observation with OpenCL benchmarks, the IOTLB seems to cover ~16MB of shared memory pretty well. Shared memory costs ~10% bandwidth when benching large buffers, and GPU memory latency skyrockets to ~4x when doing random access at buffer sizes >16MB.
In most games the GPU doesn't do latency-sensitive random access like the CPU does, with an exception being ray tracing. You'll see a massive ray tracing performance penalty when using shared virtual memory.
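The measurement technique behind those numbers (latency jumping once the working set outgrows translation coverage) can be sketched with a simple pointer chase over a shuffled index array. This is a CPU-side illustration of the method, not the actual OpenCL benchmark; the knee shows up where the buffer exceeds a cache or TLB's reach.

```python
import random
import time

def chase_latency(size_elems, steps=100_000):
    """Average per-step time of a random pointer chase over a buffer.

    Sweeping size_elems and watching where this value jumps is how
    knees like the ~16MB IOTLB coverage mentioned above are located.
    """
    idx = list(range(size_elems))
    random.shuffle(idx)  # random permutation defeats prefetching
    i = 0
    t0 = time.perf_counter()
    for _ in range(steps):
        i = idx[i]  # each load depends on the previous one
    return (time.perf_counter() - t0) / steps

# Sweep small vs. large working sets (sizes here are illustrative).
small = chase_latency(1 << 12)
large = chase_latency(1 << 22)
```

Python's interpreter overhead swamps the actual memory latency, so treat this purely as a sketch of the dependent-load methodology; a real version would be written in C or run on the GPU itself.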
KnownDairyAcolyte@reddit
Does that mean that setting the VRAM to 1 gig or 2 gigs or whatever is more of a default preconfigured state, where the CPU can still get at it but pays the translation cost?
total_cynic@reddit
Thanks for this.
I'd intuited there was likely a performance hit, but figures for how significant it is and what kind of workload would be worst impacted are helpful.
-WingsForLife-@reddit
Yeah, my Meteor Lake laptop has 128MB preallocated (out of 16GB), and whenever I play some games (Zenless Zone Zero recently) it keeps swapping shit in and out of memory.
It either shows up as stutters if I spin the camera faster than it likes, or as pop-in (with stutters) from approaching objects.
It doesn't show up if I plug it in, but the average framerate is the same.
total_cynic@reddit
It'd be interesting to look at the 1% lows. This sounds more like different power profiles though, if it differs between battery and plugged in?
-WingsForLife-@reddit
Yeah, it could be. It's a recent buy (2 weeks), and the only thing I changed is the thermal paste.
I'll try and test out more if I ever have the time to dig in.
dj_antares@reddit
Who told you AMD doesn't? You really think 128MB would be able to run anything?
FalseAgent@reddit
setting it to 128MB is the DVMT for AMD iGPUs, all iGPUs use shared system memory.
AMD just has an additional feature that lets you reserve a portion of the memory for graphics if you need the predictability (and to force VRAM to not go into memory swap), some games/apps will behave better if they understand how much VRAM usage it should target instead of DVMT which is sort of infinite because it can dip into memory swap.
TwelveSilverSwords@reddit (OP)
I am curious how Snapdragon X Elite's iGPU works in this regard.
RedTuesdayMusic@reddit
Another convenient excuse to not retire 16GB models.
ejk905@reddit
Assigning larger portions of unified memory as dedicated GPU memory on an APU is due to Windows and app compatibility.
On the Windows side shared GPU memory has 4KB page size and is behind two-level page tables due to virtualization based security and IOMMU policy. Dedicated GPU memory can eliminate one level of indirection and the GPU vendor is free to set a more optimal page size like 64KB or larger. This helps memory latency bound situations.
On the compatibility side some apps only consider dedicated GPU memory as reported by Windows as the total GPU memory. These apps then fail/complain when this size is only 256/512MB. The only fix AMD can provide is crank up the dedicated VRAM.
Classic-Study7112@reddit
Hey RAM, sorry for the late meeting invite. Look, I'll just cut to the chase: there's been some budget cuts, and unfortunately you're being reassigned to gaming. No, it has nothing to do with your performance and it was not my decision. The executives have dictated that one person from each team has to go and you've drawn the short straw. HR is here if you have any questions.
KnownDairyAcolyte@reddit
I thought they already did this dynamically ever since the hUMA effort. Has that not been the case?
ET3D@reddit
Until not long ago (a year or two?), the fixed amount was defined in the BIOS, and extra memory was then allocated by Windows as needed. At some point drivers started to consider system memory and usage to allocate a larger contiguous block of RAM when gaming. AMD didn't advertise this. Now AMD gives users the option to configure this themselves, which is best.
nic0nicon1@reddit
+1. Don't all APUs already technically support unified system and video memory via HSA/HMM, since GCN about 10 years ago? Both kinds of memory share the same virtual address space, and with an integrated GPU, data transfer is zero-copy. Furthermore, isn't an APU a well-known (albeit slow) way to run large language models due to its "unlimited" VRAM? Why is reserving video memory still a thing? Is it due to legacy technical limitations? I find the situation extremely confusing.
iBoMbY@reddit
Couldn't you do that already about 25 years ago?
MeelyMee@reddit
What's new about that?
halotechnology@reddit
Windows already does this, just look at your task manager?
What's the point of this?
I have tested it before on Windows, and it does work!
spazturtle@reddit
When Windows does it it uses 1.1GB of RAM to use as 1GB of VRAM due to the mapping overhead and also comes with a latency penalty.
The way the driver does it doesn't have the overhead or extra latency, at the cost of it being strictly reserved for the GPU.
PMARC14@reddit
I think they are just bringing a bios/driver feature into Adrenaline so you can manage it directly. Pretty nice
nutral@reddit
It's more of a stopgap to keep games working, because otherwise all the data would go over PCIe, and that is just slow. DDR5 would also be a bottleneck.
Dual channel DDR5 5600 ~ 60-80GB/s
PCI-E 4.0 16x would have 32GB/s
PCI-E 5.0 16x has 64GB/s
an RTX 4060 has 270GB/s bandwidth
The 4090 and 7900XTX even have about 1TB/s
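Those figures all follow from data rate times bus width; a quick sketch (theoretical peaks, which is why the DDR5 number comes out above the practical 60-80 GB/s quoted above, and the 4060 figure assumes the commonly cited 17 Gbps GDDR6 on a 128-bit bus):

```python
def peak_gbps(data_rate_mtps, bus_bits):
    """Theoretical peak memory bandwidth in GB/s from data rate (MT/s) and bus width."""
    return data_rate_mtps * (bus_bits // 8) / 1000

# Dual-channel DDR5-5600: two 64-bit channels = 128 bits total
print(peak_gbps(5600, 128))   # 89.6 GB/s theoretical peak
# RTX 4060: 17 Gbps GDDR6 on a 128-bit bus
print(peak_gbps(17000, 128))  # 272 GB/s
```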
cgaWolf@reddit
Good, now stop building laptops with 16 gigs of RAM (or less).
It's 2024, and the laptop i bought 12 years ago had that.
qywuwuquq@reddit
Why is 75 the magic number? Apple SoCs can also let the GPU use up to 75 percent of all RAM. Is there a specific reason for this 75?
EETrainee@reddit
The OS still needs to reserve a portion for itself.
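A minimal sketch of such a cap (using the 75% figure from the question above; the function name is my own, not any vendor's API):

```python
def max_gpu_alloc_gb(total_ram_gb, cap=0.75):
    """Largest GPU allocation under a fractional cap; the remainder
    stays reserved for the OS and other processes."""
    return total_ram_gb * cap

print(max_gpu_alloc_gb(32))  # 24.0 GB for the GPU, leaving 8 GB for the OS
```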
ethanjscott@reddit
Nice, hopefully intel catches up