Linux 6.14 will have amdxdna! The Ryzen AI NPU driver
Posted by GreyXor@reddit | linux | View on Reddit | 65 comments
SweetBearCub@reddit
As a person who uses Linux Mint, is there an approximate schedule for when this particular kernel version might fit into Mint's release schedule?
notam00se@reddit
Mint is based on Ubuntu LTS, so out-of-the-box support might be 26.04; I doubt LMDE's move to Trixie will catch it.
KnowZeroX@reddit
Mint is now HWE by default. So when 25.10 is released, it will come with that kernel.
notam00se@reddit
ooh, nice!
SweetBearCub@reddit
Thanks!
gmes78@reddit
Assuming you're using the HWE kernel: 6.14 isn't going to release in time for Ubuntu 25.04, so you'll have to wait until Ubuntu 25.10 is released and its kernel is made available to Ubuntu LTS through the HWE kernel.
kansetsupanikku@reddit
Are you using this NPU? You can compile a custom kernel anytime you want, and perhaps you might even contribute to its development at this stage.
SweetBearCub@reddit
No, but I'm strongly considering upgrading to a system with such an NPU very soon, from my ~2021 system with a Ryzen 59xx.
I'm not sure how to compile my own kernel, but I'm willing to learn.
jmnugent@reddit
This is probably a dumb question, but is there a list somewhere of what processors or motherboards support NPUs? (I know this is a dumb question; I haven't hand-built a PC in about 20 years.) I'm assuming this is AMD Ryzen related, so any recent AMD chip?
If I was planning to buy or build a PC to take advantage of this, is there any particular thing I'd need to remember or look out for?
KnowZeroX@reddit
Wikipedia?
https://en.wikipedia.org/wiki/List_of_AMD_Ryzen_processors
Just search for NPU
jmnugent@reddit
Very helpful, thank you !
MorphiusFaydal@reddit
Mobile - the Ryzen AI chips, and some of the mid-high end Ryzen 7600/7800/8800 chips.
Desktop - none. Yet.
Irverter@reddit
The 8600G and 8700G are desktop processors that have an NPU.
And the 8700F, but only with certain Radeon GPUs.
YKS_Gaming@reddit
Missed opportunity to call it amdxdma; it's worthless AI stuff anyway.
afiefh@reddit
There are plenty of valid use cases for this "AI stuff" that are getting overshadowed by the bullshit the industry is pushing.
Noise cancelling during video calls is much better with AI. Adaptive fill for image manipulation is much better with AI. Obviously both of these can be done better by humans who were trained in that field, but when I call my mom, I'm just happy that she can hear what I'm saying without worrying about the noise in the background.
The problem is that the whole world is going crazy trying to push LLMs with the promise that they'll be able to achieve AGI. Unfortunately LLMs are not very useful yet, and nobody really knows if we can get AGI through this stuff.
YKS_Gaming@reddit
And I am willing to bet that the two examples you listed do not require an NPU, or even on device machine learning.
Irverter@reddit
Technically raytracing doesn't require a GPU, it's just so much better with a GPU.
Same logic with things on a NPU.
YKS_Gaming@reddit
Still has to undersample, dither, and smear your way out of reflections
Irverter@reddit
And? Part of the process is hardware accelerated rather than being fully in software. That's the point.
afiefh@reddit
Which part of "much better" was hard to understand?
Literally nothing requires an NPU. Everything an NPU can do, a CPU can do as well. That is the point of Turing completeness. An NPU is an accelerator for these tasks, just like a GPU is an accelerator for graphics.
Being able to do graphics cheaply and efficiently is what enabled them to be used everywhere (and originally it was shit, remember all the translucency in Vista? All the bloom effects in random places?) before people figured out how to use them well. The same thing is happening with an NPU: Right now companies are using them in crappy ways, but eventually they'll just be there in the background being used for whatever is appropriate.
Without acceleration, you are limited to very rudimentary noise cancellation or adaptive fill. With acceleration you can employ much better techniques, because the person on the phone wants to hear what you say with a 5 ms delay, not a 5 second delay.
SchighSchagh@reddit
Magic the Gathering (yes I mean the card game) is technically Turing complete. Being Turing complete, by itself, is not actually very useful I'm afraid.
afiefh@reddit
Nobody claimed it is useful, but it means that you can perform any computation that a CPU/GPU/NPU can perform using MTG. It will be slow as molasses, but that's exactly the point I'm making.
SchighSchagh@reddit
This bit is false though. Doing certain things in real time (or at least reasonable amounts of time), or with low power usage, certainly does require an NPU or other comparable accelerator.
da2Pakaveli@reddit
The Game of Life is also Turing complete.
mycall@reddit
To be fair, most people have no idea about this.
teddybrr@reddit
Nothing requires an NPU. It is there to optimize this load using as little power as possible.
MarioGamer06@reddit
Yet again, AMD shows they care about their customers. If only team Green would learn...
cAtloVeR9998@reddit
I mean, support is only landing like nearly 2 years after consumer launch. Support should have been ready a long time ago.
kansetsupanikku@reddit
Right? Imagine NVIDIA having that sort of delay for anything AI-related. The standard is indeed different.
b3081a@reddit
That's mainline support which NVIDIA never had in the first place.
AMD's out-of-tree module has been there for quite a while (https://github.com/amd/xdna-driver), and a lot of the official Xilinx XRT samples are already usable on this stack. The runtime (and recently the device kernel compilers) are fully open source, which is also much nicer than what NVIDIA has been doing so far.
SchighSchagh@reddit
Problem is, none of the AI libraries have any support for this. I don't think even ROCm supports this, let alone Tensorflow, PyTorch, ONNX, etc. RyzenAI is still Windows-only. They've vaguely mentioned that Linux version is coming at some point, but... it's vaporware. Also, on Windows RyzenAI doesn't really play well with integrated graphics and/or NPU either. It works on the newer chips I believe, but the 7040 mobile chips have been left out. It's kind of hard to see how exactly AMD is actually better than NVIDIA here in real, practical terms.
b3081a@reddit
XRT & ONNX Vitis EP should work.
spezdrinkspiss@reddit
nvidia have supported CUDA on linux ever since its release, while amd have been dragging their asses for 2 years lol
TheBrokenRail-Dev@reddit
For AI/CUDA/GPGPU tasks, NVIDIA's Linux support has always been exemplary. Mainly because that's where the money is.
Java_enjoyer07@reddit
A new native GeForce app for the Steam Deck is dropping, and NVIDIA is focusing on better Wayland integration on the Steam Deck and SteamOS. It seems like they see SteamOS/Linux as a new rising market and are starting to actually support Linux. THE YEAR OF THE LINUX DESKTOP I SWEAR THIS TIME!!!!
SeriousPlankton2000@reddit
Since I'm out of the loop: What kind of operations do these NPUs support? Is there a use case beyond emulating AI "neurons"? How do they compare to GPUs?
DGolden@reddit
High-level blurbs about the AMD XDNA architecture in particular; the high-level diagram is useful:
https://www.amd.com/en/technologies/xdna.html
https://www.anandtech.com/show/21469/amd-details-ryzen-ai-300-series-for-mobile-strix-point-with-rdna-35-igpu-xdna-2-npu/2
https://images.anandtech.com/doci/21469/AMD%202024_Tech%20Day_Vamsi%20Boppan-12.png
b3081a already linked the amd xdna-driver repo
Poking about idly for my own learning (beware I'm not really in the field, and don't have one of these devices):
That had some interesting info about what executables for the xdna NPUs actually are - https://github.com/amd/xdna-driver/blob/main/src/driver/doc/amdnpu.rst#application-binaries
that links to https://docs.amd.com/r/en-US/am020-versal-aie-ml/AIE-ML-Tile-Architecture?tocId=XpoRtpY5dV0BaKcnMFOq_w
(Note "The XDNA2 AI unit is based on the Versal 2 Dataflow processors from AMD's Xilinx FPGA division.")
Remember, AMD now owns Xilinx, and the Xilinx stuff is continuing, not killed off.
https://www.amd.com/en/corporate/xilinx-acquisition.html https://www.xilinx.com/htmldocs/xilinx2023_2/aiengine_ml_intrinsics/intrinsics/index.html
Programming them directly... may not be for the fainthearted, even if they do seem pretty well documented with open source tools, as b3081a says.
If you ARE using them for AI apps and/or want something higher level, you do seem to be steered to doing: (py)torch model -> ONNX -> (model quantization step) ONNX -> AMD's provided execution provider for ONNX, which compiles to an executable for their NPU architecture -> run on the NPU
https://pytorch.org/docs/stable/onnx.html
You could do a lot of things within ONNX's general graph paradigm, not just compile PyTorch models to it. https://onnx.ai/onnx/intro/concepts.html
But (big but), as SchighSchagh also already mentioned in response to b3081a in this thread, not all the bits seem to be out officially for Linux, at least not yet; the relevant download is currently a zip for Windows only. (Well, there's existing embedded-Linux stuff if you drill down, but I mean a relatively clear "here, install the thing" for desktop Linux like there is for desktop Windows.)
They may just do it shortly though?
https://xilinx.github.io/Vitis-AI/3.5/html/docs/workflow-third-party.html?highlight=xdna#onnx-runtime
->
https://onnxruntime.ai/docs/execution-providers/Vitis-AI-ExecutionProvider.html#runtime-options
Note that "IPU" is the same thing as "NPU": https://ryzenai.docs.amd.com/en/latest/getstartex.html
papercrane@reddit
Typically NPUs provide operations for working with matrices, things like matrix convolutions and multiplication. They can be used for other use cases that rely on working with matrices, like signal processing.
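To make the "it's all matrices" point concrete, here's a minimal NumPy sketch (mine, not from any NPU SDK) showing a 1D convolution lowered to a single matrix multiplication, an im2col-style layout that accelerators commonly use:

```python
import numpy as np

def conv1d_as_matmul(signal, kernel):
    """1D 'valid' sliding-window product expressed as one matmul."""
    k = len(kernel)
    n = len(signal) - k + 1
    # Stack every length-k window into a matrix; one matmul then does
    # every window's multiply-accumulate at once.
    windows = np.stack([signal[i:i + k] for i in range(n)])
    return windows @ kernel

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
w = np.array([1.0, 0.0, -1.0])
print(conv1d_as_matmul(x, w))  # [-2. -2. -2.]
```

(As in most ML frameworks, this is really cross-correlation; the kernel isn't flipped.)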
IAmRoot@reddit
Usually low bit width, though, so don't expect to be able to leverage such hardware for scientific computing workloads and other more general purpose matrix operations. The Ryzen NPU apparently uses 8-bit mantissas with a shared 8-bit exponent common to all matrix elements. This hardware is designed to do very low precision operations very fast with low power consumption. This means that even for matrix operations they aren't particularly useful outside of AI workloads.
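A rough sketch of what such a shared-exponent ("block floating point") scheme does to precision. The bit widths and rounding here are illustrative guesses, not the NPU's exact format:

```python
import numpy as np

def block_quantize(block, mantissa_bits=8):
    """Quantize a block of values to one shared exponent plus
    small signed integer mantissas, then dequantize."""
    # One exponent for the whole block, chosen from the largest magnitude.
    shared_exp = np.floor(np.log2(np.max(np.abs(block)) + 1e-30))
    # Scale so the largest value fits in a (mantissa_bits)-bit signed int.
    scale = 2.0 ** (shared_exp + 1 - (mantissa_bits - 1))
    lo, hi = -(2 ** (mantissa_bits - 1)), 2 ** (mantissa_bits - 1) - 1
    mantissas = np.clip(np.round(block / scale), lo, hi)
    return mantissas * scale  # dequantized values

x = np.array([0.001, 0.5, 3.2, -7.9])
# Small 0.001 collapses to 0.0 because the exponent is set by -7.9;
# that precision cliff is why this is fine for AI weights but not HPC.
print(block_quantize(x))
```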
JEDZENIE_@reddit
NPUs are old tech; phones in 2017 already had them. But as we use more AI in different workloads, we need better ways to run it efficiently. (AI itself is old as well; we've just hit a new generation that lets us build new tools with it, plus some gimmicky stuff like AI image generation.) An NPU is simply an accelerator, just like a GPU, and even though GPUs can do this stuff too, an NPU is better for mobile. If I'm not mistaken, Ryzen AI is a mobile CPU line for laptops, where power consumption matters even more due to limited battery life and thermals; it helps a lot because you can keep computation high and power low.
Things that can use this hardware accelerator include, for example, noise cancellation (during phone calls, filters, etc.), image upscalers, translators, image generation of course, and cases where AI performs better than the classic algorithms we use today, plus probably more subtle stuff I don't know about. I think you can also use it for frame generation.
Also, NPUs can't replace GPUs, because like a GPU, an NPU is meant to do one kind of thing faster, in this case stuff that benefits from neural processing. Graphical stuff benefits from AI tools, but unless you play something like AI-generative Minecraft, it can't render things like a GPU. (Of course I'm not an expert, so do extra research on that, and others feel free to correct me if I'm wrong.)
SeriousPlankton2000@reddit
Thanks. I'm old enough to have bought a 80387 FPU so I do understand what you say.
grigio@reddit
Are there any improvements on ollama?
Freyr90@reddit
What kind of API does it provide? Is there any standard already?
edparadox@reddit
And what will you use it for?
GreyXor@reddit (OP)
Accelerate Neural and AI stuff
Exactly like long ago, when graphical stuff was calculated on the CPU: as we got more and more graphical stuff, we needed the GPU. Same story, but for AI.
edparadox@reddit
I know what it is for, no need to be condescending, especially if you cannot even read a sentence.
For someone hyped by this, you certainly don't seem to know how and by what an NPU is used. Nobody uses "neural" that way.
Did you at least see the requirements of "only" running a local LLM NPU-wise?
GreyXor@reddit (OP)
Sorry if my response came across the wrong way; I wasn’t trying to be condescending. I was just drawing an analogy to help explain why NPUs are significant for AI.
Ponnystalker@reddit
OK, so NPUs calculate matrix and tensor operations, plus a lot of others, in parallel, just like a GPU.
On the other hand, GPUs usually calculate raster, rendering, shaders, etc. for graphically heavy tasks, but they can also calculate matrices in parallel just like (or similar to) NPUs.
SealProgrammer@reddit
You ok there buddy?
stevorkz@reddit
How on earth was his response condescending? I found it to be a perfectly reasonable and non offensive reply to your question.
notam00se@reddit
Intel has plugins for Gimp for their NPU work.
Eventually I assume digiKam will start supporting it for things like face detection and recognition (CPU only right now). Kdenlive might get local subtitles or AI masking. VLC just announced real-time AI subtitles processed locally, which will allow better parity with Windows and macOS NPU support.
But first it needs to be easily available in LTS distros, something both AMD and Intel need to work on.
mycall@reddit
What can AI be used for? All kinds of things.
RealASF1020@reddit
I'm looking forward to this, hoping that NPUs become something that acts as a replaceable part in a PC like a CPU does now (the socket will probably be M.2 like all the current ones are, but I could also see a lot of PCIe ones come out).
NatoBoram@reddit
I wonder if we're going to end up with NPUs next to our GPUs eventually
KishCom@reddit
Such great news, I have been waiting for this since like September. I've got a laptop with an HX370.
tisti@reddit
If I remember correctly, it's around 16 TOPS, which is not much. But if software can offload work there instead of the CPU or GPU, then all the better.
INITMalcanis@reddit
Presumably enough for some basic functions?
5c044@reddit
I run llama on an ARM SBC with 6 TOPS and 16 GB (a Rockchip RK3588); it runs fine. Just got an HX370 laptop; 50 TOPS and 64 GB of RAM should be good. I was googling how to use it under Linux and didn't find much previously. I'll wait. The 6.13 release is in two days, and the release candidate for 6.14 won't be long after that.
tisti@reddit
Don't know what the equivalent GPU would be, probably a RX480/GTX1060? Plenty of (power efficient) power for basic stuff.
bouche_bag@reddit
That's pretty good if true. I have a 1050 Ti that runs Mistral 7b well, among other things.
SweetBearCub@reddit
Try 50 TOPS, not 16
cac2573@reddit
First generation
GreyXor@reddit (OP)
Yes, and then 40 or 50 TOPS for the second generation, if I remember correctly.
mycall@reddit
50,000,000,000,000 operations per second is pretty amazing on a tiny HX 370.
GreyXor@reddit (OP)
An NPU (Neural Processing Unit) is specialized hardware designed to accelerate AI and machine learning tasks, similar to how a GPU (Graphics Processing Unit) accelerates graphical computations.
It's a Processing Unit for Neural (AI) stuff.