Why VRAM Can Ruin Your Linux Desktop Experience on Thin and Light Laptops
Posted by Unprotectedtxt@reddit | linux | 26 comments
If you can share your Intel GPU monitoring experiences or suggestions... I'd genuinely like to hear them. This isn't a hardware benchmark; as with most things Linux related, experiences will vary by setup, by config, by use, and otherwise. Interested in hearing others' experiences, especially where they differ.
spaceman_@reddit
This doesn't seem correct. VRAM and GTT on integrated GPUs are the same physical memory. Only the access path changes (it goes through the GTT mechanism), and that doesn't meaningfully impact GPU performance.
In fact, in the local AI community, people using Strix Halo **recommend** setting VRAM to the bare minimum (512MB) and using GTT for everything. It doesn't seem to negatively impact memory bandwidth, and local LLMs are mostly a memory bandwidth optimization game.
So what real problems were you experiencing when spilling to GTT? Decreased performance? Increased power draw? Or was the actual problem simply your system as a whole running out of RAM and spilling to swap / zram / ...?
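For anyone trying to untangle that last case: a quick way to check the swap/zram theory is to watch swap usage and the kernel's PSI memory-stall counters while the stutter happens. A minimal sketch (assuming a kernel with PSI enabled, which most distros ship):

```python
# Minimal sketch: if swap stays empty and PSI memory stalls stay near
# zero while the desktop stutters, plain RAM exhaustion is unlikely to
# be the culprit, and the GPU memory story is worth a closer look.

def read_meminfo() -> dict[str, int]:
    """Parse /proc/meminfo into {field: kB}."""
    info = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, value = line.split(":", 1)
            info[key.strip()] = int(value.split()[0])  # values are in kB
    return info

mem = read_meminfo()
swap_used_mib = (mem["SwapTotal"] - mem["SwapFree"]) // 1024
print(f"swap in use: {swap_used_mib} MiB")

with open("/proc/pressure/memory") as f:
    print(f.read().strip())  # "some"/"full" stall averages
```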
modelop@reddit
Generally GTT memory is significantly slower than VRAM, especially on AMD. Straight VRAM is indeed advantageous.
natermer@reddit
If you are comparing GTT with a discrete GPU's memory... yes. But that doesn't apply to anything iGPU related. It is going to be the same memory either way.
My understanding is that the only thing BIOS settings and the like control is the minimum amount of VRAM that gets allocated.
So if you are using a modern AMD iGPU with UMA and have a game that requires 6GB of VRAM, it doesn't matter in Linux whether the "VRAM is set to 4GB" or the "VRAM is set to 512MB". You will get the same performance and memory usage either way.
Which means that if you are putting 8GB in your BIOS settings all that is being accomplished is that you are locking away a bunch of RAM that can't be used for anything else.
From what I can tell, those settings exist and make a difference for some people because on Windows, at least on older drivers or versions, the UMA stuff wasn't fully baked when the newer AMD hardware was released. But on Linux that was never a problem. And I wouldn't be surprised if the limitations on Windows are gone now as well.
So if you are playing Windows games in Linux it might make a difference?
But even then I doubt it, since it is ultimately the Linux drivers that decide how much RAM to allocate for graphics.
Luckily we have better tools for monitoring GPU usage nowadays, so you can confirm all of this yourself. You don't have to guess, and you don't have to do back-to-back FPS testing to get a good idea of what is going on.
I like Mission Center personally. I don't leave it open all the time, but when I am playing a game or doing LLM stuff on my laptop I pop it open just to confirm things.
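If you'd rather skip the GUI, the amdgpu driver also exposes the same counters in sysfs, so you can watch where allocations land with a few lines. A minimal sketch, assuming the iGPU shows up as card0 (the mem_info_* files report bytes):

```python
from pathlib import Path

# Assumes an amdgpu device at card0; adjust the path for your system.
DEV = Path("/sys/class/drm/card0/device")

def mib(name: str) -> float:
    """Read one mem_info_* counter (in bytes) and convert to MiB."""
    return int((DEV / name).read_text()) / (1024 * 1024)

# Compare the dedicated carve-out against the GTT region.
for region in ("vram", "gtt"):
    used = mib(f"mem_info_{region}_used")
    total = mib(f"mem_info_{region}_total")
    print(f"{region.upper():4}: {used:8.1f} / {total:8.1f} MiB used")
```

Run it before and after launching a game or a model and you can see directly whether allocations spill out of the carve-out into GTT.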
shoe_gazin@reddit
Downvoted for speaking the most sense. Love reddit
Just_Maintenance@reddit
On discrete GPUs, which actually have their own physically separate memory. On integrated GPUs it's all the same physical memory.
PrefersAwkward@reddit
I've found significantly better performance in LM Studio's Vulkan mode when moving a lot of memory to the BIOS-allocated VRAM.
In fact it's so much better that it beats ROCm in my experience. IIRC, it's like 30% to 50% higher TPS with BIOS-allocated VRAM + GTT vs GTT-only, and ROCm cannot seem to compete with Vulkan due to this advantage. ROCm frustratingly doesn't seem to use the BIOS VRAM in my experience.
I can't say I've noticed a difference in gaming, but I rarely game on this machine, let alone benchmark it.
I suspect that GTT has some kind of overhead that impacts some workloads noticeably.
Unprotectedtxt@reddit (OP)
Yeah, some laptops let you upsize that performant VRAM in BIOS. Looked for it on my ThinkPad and couldn't find the option, unfortunately.
Also found this note here https://rocm.docs.amd.com/en/latest/how-to/system-optimization/rdna3-5.html:
Because memory is physically shared, there’s no performance distinction like that of discrete GPUs where dedicated VRAM is significantly faster than system memory.
PrefersAwkward@reddit
Oh interesting. Maybe I could steer ROCm to use VRAM or both and, if so, get a better experience.
natermer@reddit
Newer kernels are getting a lot more UMA settings for pre-allocating memory. Playing around with those is probably worth it.
https://docs.kernel.org/gpu/amdgpu/driver-misc.html#uma-carveout
On a side note, I have had much better luck using Vulkan interfaces than ROCm for LLM stuff. Specifically Llama.cpp. Better performance and stability.
Also, when testing LLM TPS... you have to be insanely careful with the settings. I have seen a 30-40% drop in TPS just by adding '-c 0' on certain models, despite the documentation showing that it is the default setting.
It is hard to make sure you are actually testing apples to apples.
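For what it's worth, the only way I've managed to get comparable numbers is to pin every flag explicitly instead of trusting documented defaults. A rough sketch using llama.cpp's llama-bench (the model path is a placeholder; adjust the flags for your setup):

```python
import subprocess

# Pin prompt size, generation length, offload, and repetitions explicitly,
# so two runs differ only in the variable under test (here: GPU offload).
MODEL = "model.gguf"  # placeholder path

for ngl in (0, 99):  # CPU-only vs. fully offloaded
    subprocess.run([
        "llama-bench",
        "-m", MODEL,
        "-p", "512",       # prompt tokens, fixed
        "-n", "128",       # generated tokens, fixed
        "-ngl", str(ngl),  # layers offloaded to the GPU
        "-r", "5",         # repetitions, averaged
    ], check=True)
```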
HighRelevancy@reddit
Access patterns probably matter a lot. I expect LLM work sweeps through large contiguous blocks in very predictable patterns, whereas a UI is touching whatever random area of the screen happens to be animating. Maybe it's a latency cost, and if you're doing fewer, larger transactions, raw bandwidth isn't the problem. Or maybe it's allocations that choke things up: I expect LLMs do most of theirs up front, whereas a web page is constantly allocating new chunks for whatever you're scrolling toward.
"This specific workload doesn't have performance problems with this system" has basically never been a significant data point in any modern computing problem.
Zettinator@reddit
There are some differences. The VRAM allocation is physically contiguous, and that might be important for some things like scanout buffers and video decode acceleration (depends on the GPU). But you really don't need much of it. The memory virtualization needed for GTT has very little overhead on modern systems.
cyh555@reddit
how true is this
0-pointer@reddit
0.001
± 0.0005 maybe.
MrScotchyScotch@reddit
I have the generation right after his, and the BIOS UMA option goes up to 8GB.
You can also add a kernel option to expand GTT to whatever size you want (though it's not very stable/predictable). I've used 8GB VRAM and 20GB GTT for LLMs.
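(If it saves anyone a search: the knob here is presumably the amdgpu driver's gttsize module parameter, sized in MiB, e.g. appended to the kernel command line in your bootloader config:)

```
# sketch of a kernel command line entry; 20480 MiB = 20GB of GTT
amdgpu.gttsize=20480
```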
It's really dumb that manufacturers limit the VRAM size on iGPUs.
jermygod@reddit
so.... how can that be a problem for the author with a "Ryzen 7 6850U CPU and Radeon 680M" but not a problem for my 12-year-old i5-4210u with HD4000 (or something)?
and why are there no tests, but only statements?
spaceman_@reddit
I wouldn't be surprised if this is mostly LLM vomit.
jermygod@reddit
just checked, firefox (3 tabs with youtube, reddit, and docs) takes like 270-300MB, accelerated discord ~100, and ~70 for kde, so... I don't think changing those 70MB to whatever some LXQt setup would use is gonna do anything good.
Odd_Cauliflower_8004@reddit
yep. to run models I have to shut down SDDM to recover 2GB of precious VRAM, but every time I say it I get shunned.
The question is how does Windows use only 100-200MB of VRAM tops to show the desktop?
natermer@reddit
I think some of the confusion can be attributable to things like:
https://www.tomshardware.com/software/linux/valve-engineer-shocks-linux-community-with-game-changing-vram-hack-for-8gb-gpus-breakthrough-solution-turbocharges-gaming-by-prioritizing-vram-for-games-while-background-tasks-take-a-back-seat
and related:
https://pixelcluster.github.io/VRAM-Mgmt-fixed/
These posts are talking about the performance penalty of GTT on AMD GPUs. This is very very real.
But the deal here is with lower-end dGPUs, not iGPUs.
Tools like dmemcg-booster and plasma-foreground-booster are used to control which applications get access to the dedicated on-board GPU RAM. That way you can push background applications out while you play games, so the game has as much of the fast dedicated VRAM as possible.
This can make a big difference if your GPU only has 4 or 8GB of RAM.
But for iGPUs... see the bottom of the second blog post.
So, yeah, you will need to test to see if it makes a difference for you. And it is likely to vary by application.
theschrodingerdog@reddit
Sorry, but your article contains many things that are not correct. The first comment on this post has already flagged many of them.
Since you are looking for experiences: your article mentions that for older iGPUs you should avoid GNOME or KDE. My laptop is a 16-year-old Fujitsu with an i7-3632qm and Intel HD4000 integrated graphics. It runs KDE flawlessly without any hitch.
snail1132@reddit
That laptop's only 14 years old, no?
theschrodingerdog@reddit
That's correct, I miscalculated the age.
HighRelevancy@reddit
The first comment is absolute poppycock. Which of the many things do you have issue with, besides "it works on my machine" (which 1. is very configuration-dependent as noted, and 2. is very subjective, especially given your probably lower expectations of a very old and basic machine)?
Unprotectedtxt@reddit (OP)
KDE Plasma has indeed improved a ton in efficiency in recent years. It's for sure going to vary case by case. I need to look into replacing some of the heavier Electron apps.
aloobhujiyaay@reddit
integrated GPUs + low VRAM is such an underrated bottleneck on Linux. People blame Linux, but it's often just memory pressure on iGPUs
Unprotectedtxt@reddit (OP)
Yeah, once VRAM overflows, symptoms can show up on some systems/use cases due to the overhead of the driver migrating allocations between regions under pressure, plus memory controller contention when the CPU is also busy. So this is inherently going to affect some of us more than others, or not at all. And from the discussion here so far, it seems like this is worse on AMD than on Intel drivers.