Linux 6.14 will have amdxdna! The Ryzen AI NPU driver
Posted by GreyXor@reddit | linux | View on Reddit | 65 comments
SweetBearCub@reddit
As a person who uses Linux Mint, is there an approximate schedule for when this particular kernel version might fit into Mint's release schedule?
notam00se@reddit
Mint is based on Ubuntu LTS, so out-of-the-box support might be 26.04; I doubt LMDE's move to Trixie will catch it.
KnowZeroX@reddit
Mint is now HWE by default. So when 25.10 is released, it will come with that kernel.
notam00se@reddit
ooh, nice!
SweetBearCub@reddit
Thanks!
gmes78@reddit
Assuming you're using the HWE kernel: 6.14 isn't going to release in time for Ubuntu 25.04, so you'll have to wait until Ubuntu 25.10 is released and its kernel is made available to Ubuntu LTS through the HWE kernel.
kansetsupanikku@reddit
Are you using this NPU? You can compile a custom kernel anytime you want, and perhaps you might even contribute to its development at this stage.
SweetBearCub@reddit
No, but I'm strongly considering upgrading to a system with such an NPU very soon, from my ~2021 system with a Ryzen 59xx.
I'm not sure how to compile my own kernel, but I'm willing to learn.
jmnugent@reddit
This is probably a dumb question, but is there a list somewhere of what processors or motherboards support NPUs? (I know this is a dumb question; I haven't hand-built a PC in about 20 years.) I'm assuming this is AMD Ryzen related, so any recent AMD chip?
If I was planning to buy or build a PC to take advantage of this, is there any particular thing I'd need to remember or look out for?
KnowZeroX@reddit
Wikipedia?
https://en.wikipedia.org/wiki/List_of_AMD_Ryzen_processors
Just search for NPU
jmnugent@reddit
Very helpful, thank you !
MorphiusFaydal@reddit
Mobile - the Ryzen AI chips, and some of the mid-high end Ryzen 7600/7800/8800 chips.
Desktop - none. Yet.
Irverter@reddit
The 8600G and 8700G are desktop processors that have an NPU.
And the 8700F, but only with certain Radeon GPUs.
YKS_Gaming@reddit
Missed opportunity to call it amdxdma; it's worthless AI stuff anyway.
afiefh@reddit
There are plenty of valid use cases for this "AI stuff" that are getting overshadowed by the bullshit the industry is pushing.
Noise cancelling during video calls is much better with AI. Adaptive fill for image manipulation is much better with AI. Obviously both of these can be done better by humans who were trained in that field, but when I call my mom, I'm just happy that she can hear what I'm saying without worrying about the noise in the background.
The problem is that the whole world is going crazy trying to push LLMs with the promise that they'll be able to achieve AGI. Unfortunately LLMs are not very useful yet, and nobody really knows if we can get AGI through this stuff.
YKS_Gaming@reddit
And I am willing to bet that the two examples you listed do not require an NPU, or even on device machine learning.
Irverter@reddit
Technically raytracing doesn't require a GPU, it's just so much better with a GPU.
Same logic with things on a NPU.
YKS_Gaming@reddit
Still has to undersample, dither, and smear your way out of reflections
Irverter@reddit
And? Part of the process is hardware accelerated rather than being fully in software. That's the point.
afiefh@reddit
Which part of "much better" was hard to understand?
Literally nothing requires an NPU. Everything an NPU can do, a CPU can do as well. That is the point of Turing completeness. An NPU is an accelerator for these tasks, just like a GPU is an accelerator for graphics.
Being able to do graphics cheaply and efficiently is what enabled them to be used everywhere (and originally it was shit, remember all the translucency in Vista? All the bloom effects in random places?) before people figured out how to use them well. The same thing is happening with an NPU: Right now companies are using them in crappy ways, but eventually they'll just be there in the background being used for whatever is appropriate.
Without acceleration, you are limited to very rudimentary noise cancellation or adaptive fill. With acceleration you can employ much better techniques, because the person on the phone wants to hear what you say with a 5 ms delay, not a 5 second delay.
SchighSchagh@reddit
Magic the Gathering (yes I mean the card game) is technically Turing complete. Being Turing complete, by itself, is not actually very useful I'm afraid.
afiefh@reddit
Nobody claimed it is useful, but it means that you can perform any computation that a CPU/GPU/NPU can perform using MTG. It will be slow as molasses, but that's exactly the point I'm making.
SchighSchagh@reddit
This bit is false though. Doing certain things in real time (or at least reasonable amounts of time), or with low power usage, certainly does require an NPU or other comparable accelerator.
da2Pakaveli@reddit
The Game of Life is also Turing complete.
mycall@reddit
To be fair, most people have no idea about this.
teddybrr@reddit
Nothing requires an NPU. It is there to optimize this load using as little power as possible.
MarioGamer06@reddit
Yet again, AMD shows they care about their customers. If only team Green would learn...
cAtloVeR9998@reddit
I mean, support is only landing like nearly 2 years after consumer launch. Support should have been ready a long time ago.
kansetsupanikku@reddit
Right? Imagine NVIDIA having that sort of delay for anything AI-related. The standard is indeed different.
b3081a@reddit
That's mainline support which NVIDIA never had in the first place.
AMD's out-of-tree module has been there for quite a while (https://github.com/amd/xdna-driver), and a lot of the official Xilinx XRT samples are already usable on this stack. The runtime (and recently the device kernel compilers) are fully open source, which is also much nicer than what NVIDIA has been doing so far.
SchighSchagh@reddit
Problem is, none of the AI libraries have any support for this. I don't think even ROCm supports this, let alone Tensorflow, PyTorch, ONNX, etc. RyzenAI is still Windows-only. They've vaguely mentioned that Linux version is coming at some point, but... it's vaporware. Also, on Windows RyzenAI doesn't really play well with integrated graphics and/or NPU either. It works on the newer chips I believe, but the 7040 mobile chips have been left out. It's kind of hard to see how exactly AMD is actually better than NVIDIA here in real, practical terms.
b3081a@reddit
XRT & ONNX Vitis EP should work.
spezdrinkspiss@reddit
nvidia have supported CUDA on linux ever since its release, while amd have been dragging their asses for 2 years lol
TheBrokenRail-Dev@reddit
For AI/CUDA/GPGPU tasks, NVIDIA's Linux support has always been exemplary. Mainly because that's where the money is.
Java_enjoyer07@reddit
A new native GeForce app for the Steam Deck is dropping, and NVIDIA is focusing on better Wayland integration on the Steam Deck and SteamOS. It seems like they see SteamOS/Linux as a new rising market and are starting to actually support Linux. THE YEAR OF THE LINUX DESKTOP I SWEAR THIS TIME!!!!
SeriousPlankton2000@reddit
Since I'm out of the loop: What kind of operations do these NPUs support? Is there a use case beyond emulating AI "neurons"? How do they compare to GPUs?
DGolden@reddit
High-level blurbs about the AMD XDNA architecture in particular; the high-level diagram is useful:
https://www.amd.com/en/technologies/xdna.html
https://www.anandtech.com/show/21469/amd-details-ryzen-ai-300-series-for-mobile-strix-point-with-rdna-35-igpu-xdna-2-npu/2
https://images.anandtech.com/doci/21469/AMD%202024_Tech%20Day_Vamsi%20Boppan-12.png
b3081a already linked the amd xdna-driver repo
Poking about idly for my own learning (beware I'm not really in the field, and don't have one of these devices):
That had some interesting info about what executables for the xdna NPUs actually are - https://github.com/amd/xdna-driver/blob/main/src/driver/doc/amdnpu.rst#application-binaries
that links to https://docs.amd.com/r/en-US/am020-versal-aie-ml/AIE-ML-Tile-Architecture?tocId=XpoRtpY5dV0BaKcnMFOq_w
(Note "The XDNA2 AI unit is based on the Versal 2 Dataflow processors from AMD's Xilinx FPGA division.")
Remember, AMD now owns Xilinx, and the Xilinx stuff is continuing, not killed off.
https://www.amd.com/en/corporate/xilinx-acquisition.html https://www.xilinx.com/htmldocs/xilinx2023_2/aiengine_ml_intrinsics/intrinsics/index.html
Programming them directly... may not be for the fainthearted, even if they do seem pretty well documented with open source tools, as b3081a says.
If you ARE using them for AI apps and/or want something higher level, you do seem to be steered to doing: (py)torch model -> ONNX -> (model quantization step) ONNX -> AMD's provided execution provider for ONNX, which compiles to an executable for their NPU architecture -> run on the NPU
https://pytorch.org/docs/stable/onnx.html
You could do a lot of things within ONNX's general graph paradigm, not just compile PyTorch models to it. https://onnx.ai/onnx/intro/concepts.html
But (big but), as SchighSchagh also already mentioned in response to b3081a in this thread, not all the bits seem to be out officially for Linux, at least not yet; the relevant download is currently a zip for Windows only. (Well, there's existing embedded-Linux stuff if you drill down, but I mean a relatively clear "here, install the thing" for desktop Linux like there is for desktop Windows.)
They may just do it shortly though?
https://xilinx.github.io/Vitis-AI/3.5/html/docs/workflow-third-party.html?highlight=xdna#onnx-runtime
->
https://onnxruntime.ai/docs/execution-providers/Vitis-AI-ExecutionProvider.html#runtime-options
Note that "IPU" is the same thing as "NPU": https://ryzenai.docs.amd.com/en/latest/getstartex.html
papercrane@reddit
Typically NPUs provide operations for working with matrices, things like matrix convolutions and multiplication. They can be used for other use cases that rely on working with matrices, like signal processing.
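To make the "it's all matrices" point concrete, here's a minimal NumPy sketch (mine, not from any NPU SDK) showing a 1D convolution lowered to a single matrix multiplication, an im2col-style layout that accelerators commonly use:

```python
import numpy as np

def conv1d_as_matmul(signal, kernel):
    """1D 'valid' sliding-window product expressed as one matmul."""
    k = len(kernel)
    n = len(signal) - k + 1
    # Stack every length-k window into a matrix; one matmul then does
    # every window's multiply-accumulate at once.
    windows = np.stack([signal[i:i + k] for i in range(n)])
    return windows @ kernel

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
w = np.array([1.0, 0.0, -1.0])
print(conv1d_as_matmul(x, w))  # [-2. -2. -2.]
```

(As in most ML frameworks, this is really cross-correlation; the kernel isn't flipped.)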
IAmRoot@reddit
Usually low bit width, though, so don't expect to be able to leverage such hardware for scientific computing workloads and other more general purpose matrix operations. The Ryzen NPU apparently uses 8-bit mantissas with a shared 8-bit exponent common to all matrix elements. This hardware is designed to do very low precision operations very fast with low power consumption. This means that even for matrix operations they aren't particularly useful outside of AI workloads.
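A rough sketch of what such a shared-exponent ("block floating point") scheme does to precision. The bit widths and rounding here are illustrative guesses, not the NPU's exact format:

```python
import numpy as np

def block_quantize(block, mantissa_bits=8):
    """Quantize a block of values to one shared exponent plus
    small signed integer mantissas, then dequantize."""
    # One exponent for the whole block, chosen from the largest magnitude.
    shared_exp = np.floor(np.log2(np.max(np.abs(block)) + 1e-30))
    # Scale so the largest value fits in a (mantissa_bits)-bit signed int.
    scale = 2.0 ** (shared_exp + 1 - (mantissa_bits - 1))
    lo, hi = -(2 ** (mantissa_bits - 1)), 2 ** (mantissa_bits - 1) - 1
    mantissas = np.clip(np.round(block / scale), lo, hi)
    return mantissas * scale  # dequantized values

x = np.array([0.001, 0.5, 3.2, -7.9])
# Small 0.001 collapses to 0.0 because the exponent is set by -7.9;
# that precision cliff is why this is fine for AI weights but not HPC.
print(block_quantize(x))
```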
JEDZENIE_@reddit
NPUs are old tech; phones in 2017 already had them. But as we use more AI in different workloads, we need better ways to run it efficiently. (AI itself is old as well; we've just hit a new generation that lets us build new tools with it, plus some gimmicky stuff like AI image generation.) An NPU is simply an accelerator, just like a GPU, and even though GPUs can do this stuff too, an NPU is better for mobile. If I'm not mistaken, Ryzen AI is a mobile CPU line for laptops, where power consumption matters even more due to limited battery life and thermals; it helps a lot because you can keep computation high and power low.
Things that can use this hardware accelerator include, for example, noise cancellation (during phone calls, filters, etc.), image upscalers, translators, image generation of course, and cases where AI performs better than the classic algorithms we use today, plus probably more subtle stuff I don't know about. I think you can also use it for frame generation.
Also, NPUs can't replace GPUs, because like a GPU, an NPU is meant to do one kind of thing faster, in this case stuff that benefits from neural processing. Graphical stuff benefits from AI tools, but unless you play something like AI-generative Minecraft, it can't render things like a GPU. (Of course I'm not an expert, so do extra research on that, and others feel free to correct me if I'm wrong.)
SeriousPlankton2000@reddit
Thanks. I'm old enough to have bought a 80387 FPU so I do understand what you say.
grigio@reddit
Are there any improvements on ollama?
Freyr90@reddit
What kind of API does it provide? Is there any standard already?
edparadox@reddit
And what will you use it for?
GreyXor@reddit (OP)
Accelerate Neural and AI stuff
Exactly like long ago, when graphical stuff was calculated on the CPU: as we got more and more graphical stuff, we needed the GPU. Same story, but for AI.
edparadox@reddit
I know what it is for, no need to be condescending, especially if you cannot even read a sentence.
For someone hyped by this, you certainly don't seem to know how and by what an NPU is used. Nobody uses "neural" that way.
Did you at least see the requirements of "only" running a local LLM NPU-wise?
GreyXor@reddit (OP)
Sorry if my response came across the wrong way; I wasn’t trying to be condescending. I was just drawing an analogy to help explain why NPUs are significant for AI.
Ponnystalker@reddit
OK, so NPUs calculate matrix and tensor operations, plus a lot of others, in parallel, just like a GPU.
On the other hand, GPUs usually calculate raster, rendering, shaders, etc. for graphically heavy tasks, but they can also calculate matrices in parallel just like (or similar to) NPUs.
SealProgrammer@reddit
You ok there buddy?
stevorkz@reddit
How on earth was his response condescending? I found it to be a perfectly reasonable and non offensive reply to your question.
notam00se@reddit
Intel has plugins for Gimp for their NPU work.
Eventually I assume digiKam will start supporting it for things like face detection and recognition (CPU only right now). Kdenlive might get local subtitles or AI masking. VLC just announced real-time AI subtitles processed locally, which will allow better parity with Windows and macOS NPU support.
But first it needs to be easily available in LTS distros, something both AMD and Intel need to work on.
mycall@reddit
What can AI be used for? All kinds of things.
RealASF1020@reddit
I'm looking forward to this, hoping that NPUs become something that acts as a replaceable part in a PC like a CPU does now (the socket will probably be M.2 like all the current ones are, but I could also see a lot of PCIe ones come out).
NatoBoram@reddit
I wonder if we're going to end up with NPUs next to our GPUs eventually
KishCom@reddit
Such great news, I have been waiting for this since like September. I've got a laptop with an HX370.
tisti@reddit
If I remember correctly, it's around 16 TOPS, which is not much. But if software can offload work there instead of the CPU or GPU, then all the better.
INITMalcanis@reddit
Presumably enough for some basic functions?
5c044@reddit
I run llama on an ARM SBC with 6 TOPS and 16 GB (a Rockchip RK3588); it runs fine. Just got an HX370 laptop; 50 TOPS and 64 GB of RAM should be good. I was googling how to use it under Linux and didn't find much previously. I'll wait. The 6.13 release is in two days, and the release candidate for 6.14 won't be long after that.
tisti@reddit
Don't know what the equivalent GPU would be, probably a RX480/GTX1060? Plenty of (power efficient) power for basic stuff.
bouche_bag@reddit
That's pretty good if true. I have a 1050 Ti that runs Mistral 7b well, among other things.
SweetBearCub@reddit
Try 50 TOPS, not 16
cac2573@reddit
First generation
GreyXor@reddit (OP)
Yes, and then 40 or 50 TOPS for the second generation, if I remember correctly.
mycall@reddit
50,000,000,000,000 operations per second is pretty amazing on a tiny HX 370.
GreyXor@reddit (OP)
An NPU (Neural Processing Unit) is specialized hardware designed to accelerate AI and machine learning tasks, similar to how a GPU (Graphics Processing Unit) accelerates graphical computations.
It's a Processing Unit for Neural (AI) stuff.