AMD GPU Linux Driver Becoming "Really Really Big" That It's Starting To Cause Problems
Posted by uria046@reddit | hardware | 39 comments
b3081a@reddit
I still don't quite agree with what Linux has been doing regarding device drivers. They're basically trying to integrate every device in the world into a single Linux source tree, so that they can make breaking changes every so often without guaranteeing a stable kernel API. They're also forcing everyone to open source their drivers in this way.
This is nice for relatively simple devices like keyboards, mice, or even modern wireless cards that implement everything in their on-device firmware. However, for a giant device like a GPU this could mean millions of lines of code for a single vendor. AMD and Intel are already quite willing to upstream their code. Even if you don't consider NVIDIA's situation, just take a look at the mess with GPUs on ARM SoCs from Broadcom and Qualcomm, and with ARM Mali.
militant_rainbow@reddit
Sorry but this comment results from a poor understanding of both proprietary drivers and the Linux kernel.
Look up what binary blobs are for proprietary drivers. If you’ve actually ever worked with a Linux kernel you would realize that it’s not forcing you to do that.
Tuna-Fish2@reddit
A lot of the kernel development community considers all proprietary Linux drivers to be GPL violations. The kernel team used to routinely break backcompat with all out of tree drivers out of principle.
Strazdas1@reddit
And they wonder why people won't use Linux...
Tuna-Fish2@reddit
Doing this strictly made Linux better, and Windows is moving towards the same policy. That is, requiring that the source code of everything that's running in kernel space is available to the kernel devs. MS does this with various certification programs; Linux can't do that because it cannot arbitrarily add license terms, so the kernel developers took the path of making it impossible to maintain out-of-tree drivers.
waitmarks@reddit
What would you propose as a solution? Make all drivers kernel modules?
Zakman--@reddit
Hybrid kernel à la Windows. Linux isn’t sustainable with an unstable kernel ABI. I can see a microkernel supplanting Linux if it doesn’t make long-term engineering decisions.
Tuna-Fish2@reddit
Linux has been sustainable with an unstable ABI for 30 years now. And while this causes pain in some instances, it has been a massive win in others.
Zakman--@reddit
It’s only now getting to the point of asking how well it can scale. It wasn’t a problem until now. The issue is how big the kernel has become and how big an attack surface it represents to proprietary drivers.
broknbottle@reddit
Those proprietary drivers should become free and truly open source, and then there won’t be any attack surface. Proprietary drivers are those developers’ issue and not the kernel’s problem.
Zakman--@reddit
Nowhere near as simple as that. Nvidia for example won't open source their drivers (probably because they believe they'd lose their competitive advantages). Not to mention that Linux changes its kernel ABI all the time so a driver that wants to keep up to date has to also be constantly updated. Microkernel benefits are too large to ignore.
Strazdas1@reddit
No GPU vendor has open-sourced their drivers. They all use lumps.
moofunk@reddit
I’ve Hurd that before.
Zakman--@reddit
Issue with Hurd is that the Mach microkernel it used was (maybe still is?) extremely slow. seL4 proved that you can have extremely fast microkernels while still being much more secure than monolithic kernels.
broknbottle@reddit
Came here to see this
reddit_equals_censor@reddit
ah yes, let's do what windows is doing.
microsoft has a great stable kernel, that is extremely stable and widely used. the whole os is extremely stable and widely used for its stability and performance. :)
/S!!!!!!!
Zakman--@reddit
The issues with Windows aren’t to do with driver development though. XNU is also a very good example of how Apple have managed to share the kernel between 3 different devices (Mac, iPhone and iPad). I don’t think the small performance advantages a monolithic kernel has (around 5-10% on microkernels, maybe close to 0 for hybrid) outweigh the massive benefits of hybrid/micro-kernels.
Netblock@reddit
I'm not sure how the kernel's developmental philosophy is related. Are you saying that being open source caused this problem?
I doubt that; the problem the OP article describes is due to AMD not putting in effort to de-duplicate code; the problem is going to exist regardless of general kernel API and source openness.
b3081a@reddit
It's not the open source part having problems. It's that they have to cram every generation into a single module within the kernel. On Windows they've split pre-Navi and post-Navi generations into two separate drivers to reduce the size, but on Linux that's not the case.
My point is that separating the larger drivers and letting vendors do their work is a better solution.
marmarama@reddit
This is absolutely nothing to do with the driver being open source or for Linux.
AMD is entirely free to split the amdgpu driver into smaller modules, e.g. a common core driver and smaller, more specific drivers that are only loaded for specific hardware. Several Linux driver families already work this way, for example many Bluetooth and WiFi drivers.
AMD has also already split off support for older Terascale and earlier GPUs (amdgpu supports GCN and later, radeon supports pre-GCN GPUs).
Carrying on adding to the single amdgpu module is just an engineering decision made by AMD. Probably because refactoring it into multiple modules is engineering time better spent on bringing up new hardware and fixing bugs, and it hasn't been an issue until now.
broknbottle@reddit
nvme-core nvme
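For anyone who wants to see that core/leaf split on their own machine, here is a minimal sketch that parses text in the /proc/modules layout and reports which modules depend on which. The sample sizes, reference counts and addresses are made up for illustration, `users_of` is a hypothetical helper name, and the kernel lists the module as nvme_core (with an underscore).

```python
# Sample text in the /proc/modules layout:
# name, size, refcount, users, state, address.
# Sizes, refcounts and addresses here are invented for illustration.
SAMPLE = """\
nvme 49152 2 - Live 0x0000000000000000
nvme_core 126976 3 nvme, Live 0x0000000000000000
"""


def users_of(modules_text):
    """Map each module name to the list of modules that depend on it."""
    result = {}
    for line in modules_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        name, used_by = fields[0], fields[3]
        if used_by == "-":
            result[name] = []
        else:
            # The field is comma-terminated; drop empty pieces and
            # bracketed markers such as "[permanent]".
            result[name] = [
                m for m in used_by.split(",") if m and not m.startswith("[")
            ]
    return result


deps = users_of(SAMPLE)
print(deps)
```

On a real Linux system the same function should work on `open("/proc/modules").read()`; the core module shows up with the hardware-specific module listed as its user.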
Netblock@reddit
They did that once before, radeon vs amdgpu (they intersect for HD7000 to Radeon 200). It's basically AMD's choice when to draw another line.
Though starting with Navi, AMD moved away from using PCI IDs for hardware-subsystem-specific driver codepaths, and to a discovery system (a table in VRAM tells the driver what it has) that kinda behaves like a LUT.
I point this out because I believe AMD is slapping in new code trees, each of them standalone from one another, for every hardware subsystem update. Tons of code duplication with little reuse.
I wonder how much fat LTO could trim off.
spazturtle@reddit
Note that the driver is still much smaller than both AMD's and Nvidia's Windows drivers. If you installed Windows 11 on a PC with a Pentium 4 CPU and 5400RPM hard drive you would also get long boot times.
Strazdas1@reddit
If you installed Windows 11 on a 5400 RPM drive you would have a very bad time. Windows versions since 10 aren't designed to run off HDDs anymore and will crap out on you.
throwawayerectpenis@reddit
Switching to Linux made me enjoy using my PC so much more. Maybe it's because I've been on Windows for over 20 years, but I absolutely love using GNOME. Going back to Windows just feels like going back in time. Of course I boot back from time to time when I want to play a game that doesn't work on Linux... but I digress.
RedTuesdayMusic@reddit
CachyOS takes me about 11s to boot compared to 7.5s on Win 10 IoT Enterprise LTSC. Non-issue.
AutonomousOrganism@reddit
TLDR: The driver is almost 6 million LOC, a large part of which is auto-generated code for each supported GPU, and it takes almost 10 seconds to load at boot on older PCs, resulting in the boot splash screen not being displayed.
picastchio@reddit
The number of lines in the source code (which is a different issue) is irrelevant to the boot time. The compiled driver module is <20MB on disk and is not loaded if there is no AMD GPU. An old system with a 5400rpm HDD can take seconds to load this 20MB file, which is the problem here for this specific boot splash program.
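A back-of-the-envelope sketch of that load time, for illustration only. The throughput, seek count and seek latency below are assumed figures, not measurements from the article, and `load_time_seconds` is a hypothetical helper.

```python
def load_time_seconds(size_mib, throughput_mib_s, seeks=0, seek_ms=0.0):
    """Estimate read time: sequential transfer plus time lost to head seeks."""
    transfer = size_mib / throughput_mib_s
    seeking = seeks * seek_ms / 1000.0
    return transfer + seeking


# Hypothetical figures for an old 5400rpm drive; none of them are measured.
MODULE_MIB = 20      # upper bound on the module size quoted above
SEQ_MIB_S = 80.0     # assumed sequential throughput
SEEK_MS = 12.0       # assumed average seek plus rotational latency

contiguous = load_time_seconds(MODULE_MIB, SEQ_MIB_S)
fragmented = load_time_seconds(MODULE_MIB, SEQ_MIB_S, seeks=300, seek_ms=SEEK_MS)
print(f"contiguous: {contiguous:.2f}s, fragmented: {fragmented:.2f}s")
```

Under these assumptions a contiguous read stays well under a second, and it is seeks on a fragmented or busy disk that push the total into seconds. The sketch ignores any decompression or driver initialization time on a slow CPU.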
Zamundaaa@reddit
If it's that slow, then taking a few seconds to load the GPU driver is the smallest of usability concerns of such a PC. Most GUI programs have executables and libraries far exceeding 20MiB nowadays.
randylush@reddit
Exactly. Optimizing 20MB because your 5400 RPM hard drive can’t handle it is like rearranging deck chairs on the Titanic.
Gl0ckn@reddit
Good guy Nvidia keeping their driver in the kernel minimal 😏
Berengal@reddit
This headline is a lot juicier than the actual story...
--viti@reddit
Now watch as everyone sweeps this under the rug cause it goes against their "AMD is perfect" narrative
Ashratt@reddit
this is just the boot splash screen not appearing in time
all modern gpu drivers are massive and have millions of lines of code
advester@reddit
10 seconds to load a driver is crazy.
Kryohi@reddit
That's on an old PC with a slow mechanical hard drive and an old cpu.
ranixon@reddit
But it isn't a really big problem. It's not causing a kernel panic, for example, and it can be solved.
Therabidmonkey@reddit
Tbh this is more of an open source vs proprietary situation. One of the recommended fixes is to just use the AMD driver.
iBoMbY@reddit
wat?