ARM or AMD CPUs for LLM / AI on Linux
Posted by Elegant_Fold_7809@reddit | LocalLLaMA | View on Reddit | 11 comments
I have two needs. The first is the usual: using an Nvidia GPU for LLMs and image generation (diffusion models). For this I know an AMD x86 CPU is good enough. But what about ARM64 CPUs? Are they compatible with the latest advancements in tech?
Second, I plan to use only the CPU with integrated graphics, on a light laptop, for text-to-speech, speech-to-text, simple text classification like BERT, maybe image tagging, and general daily use. What would you recommend for this type of stuff?
Does Linux run smoothly on ARM machines, or do I still have to wait a bit?
s101c@reddit
As for AMD, it would interest me more from the integrated graphics standpoint than from a pure CPU standpoint.
Pure CPU is slow and should be considered only if you want to run very large models at a low price (where tok/s speed doesn't matter).
For the rest of the use cases, get a dedicated GPU or an integrated GPU. Between Intel and AMD, the latter has more powerful integrated graphics, which can be a nice combo with DDR5 RAM.
Mac would still be a better choice with its unified RAM, and a discrete GPU is the best choice of them all.
Elegant_Fold_7809@reddit (OP)
But won't ROCm cause problems? Others have said that because of ROCm, even with the most powerful integrated graphics, I may not be able to run some stuff.
s101c@reddit
From what I've read (didn't have the chance to try myself), ROCm at this moment is supported by Llama.cpp, LM Studio, many of the TTS/STT backends, and ComfyUI (used for Stable Diffusion/Flux). So basically everything important.
The recent problems I have read about include installations breaking when ROCm is updated. To be sure everything keeps working, you have to 'freeze' the installation and not update any part of it. This is probably solved by containers/virtual environments.
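For example, once you've pinned a ROCm build of PyTorch in a container or venv, a quick sanity check could look like this (rough sketch, assuming a ROCm wheel of PyTorch is what's installed):

```python
import torch

# On a ROCm build of PyTorch, the "cuda" device API is backed by HIP,
# so these calls should report the AMD GPU if the stack is healthy.
print("HIP runtime:", torch.version.hip)        # None on a CPU-only or CUDA build
print("GPU visible:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
```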
Considering the amount of progress compared to just a year ago, even the current state seems to be pretty good. It works, but will require maintenance.
To make a decision, you might want to test it yourself or look up recent (only recent!) comments by people who own an AMD GPU and know what they are doing.
Elegant_Fold_7809@reddit (OP)
Thank you for such a detailed comment, it helped me a lot. Not to be too much of a bother, but can you point me in the direction of where I can read the latest information on AMD / ROCm?
Also, for TTS I plan to use F5-TTS, not sure how well it's supported on AMD. For STT, Nvidia's Canary, which I have a feeling isn't supported at all. NV-Embed-v2 is also Nvidia. Honestly the other models are not that far behind, so I wouldn't mind using them at all; the only question is whether they will actually run on AMD. I don't want to spend time configuring a machine and then realize something in my AI pipeline is broken.
You said it supports Flux, but what about Florence or JoyCaption, are those also supported?
If AMD Linux drivers are better, that's another plus. I don't game, and I pretty much only use Linux; I don't really plan to use Windows other than through a live boot.
Wrong-Historian@reddit
Just get an Intel+Nvidia laptop. It's mega. My Lenovo Yoga Pro 7 with an RTX 4060 (8GB VRAM) and a Core Ultra 9 185H with 32GB LPDDR5X-7467 does 4 T/s on Qwen2.5-32B-Instruct-IQ4, all while using less than 40W of power.
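If you want to reproduce that kind of setup, here's a rough llama-cpp-python sketch for partial offload on an 8 GB card (the model filename and layer count are just placeholders, tune n_gpu_layers to whatever fits in your VRAM):

```python
from llama_cpp import Llama

# Rough sketch: offload part of a 32B IQ4 GGUF to an 8 GB laptop GPU
# and keep the remaining layers on the CPU. Paths/values are placeholders.
llm = Llama(
    model_path="Qwen2.5-32B-Instruct-IQ4_XS.gguf",  # hypothetical local file
    n_gpu_layers=20,   # raise/lower until it fits in VRAM
    n_ctx=4096,
)
out = llm("Explain speculative decoding in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```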
poli-cya@reddit
Any reason you think Jetson laptops are coming? Was there an announcement or something?
No-Refrigerator-1672@reddit
You can do LLM inference on ARM64 just fine. There's an entire community that runs their AI on MacBooks and Mac Studios. Linux itself works perfectly on ARM64 too. However, good luck making Nvidia drivers run on ARM64; that's going to be a real challenge.
Answering the second question: AMD has the best iGPUs, seek those. However, to utilise them, you need ROCm-compatible software.
Some_Endian_FP17@reddit
ARM64 Linux on Graviton maybe, but Snapdragon X1 laptops and Apple MacBooks don't play well with Linux.
For Snapdragon you can run llama.cpp on Windows or WSL using Q4_0_4_8 quant GGUFs for accelerated CPU inference.
For MacBooks, use whatever inference runner supports Metal and MLX.
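On a Mac, for instance, mlx-lm gets you going in a few lines (sketch only; the model repo below is just an example of a 4-bit MLX conversion, swap in whatever you actually use):

```python
from mlx_lm import load, generate

# Sketch of MLX-based inference on Apple Silicon.
model, tokenizer = load("mlx-community/Qwen2.5-7B-Instruct-4bit")  # example repo
text = generate(model, tokenizer,
                prompt="Write a haiku about unified memory.",
                max_tokens=64)
print(text)
```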
Elegant_Fold_7809@reddit (OP)
Hey, thanks. For the first scenario I guess I'll go with AMD + Nvidia.
For the second scenario, do the Nvidia + ARM problems carry over to CPU-only use as well? And I can't use AMD either because of the ROCm problems, so I'm stuck with Intel?
Or do you think for the second scenario I should just spend a tiny bit more and get the cheapest Nvidia GPU with an AMD CPU?
M34L@reddit
I thought ROCm didn't really support any iGPUs and you were limited to Vulkan, or does it now?
No-Refrigerator-1672@reddit
Officially, every APU is unsupported by ROCm. However, every now and then I see tutorials on how to make it work, e.g. this one. I've never tried it myself, but given that I've seen such posts from multiple authors on multiple sites, I think it's totally doable.
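The trick those tutorials usually rely on is overriding the reported gfx target so ROCm treats the iGPU like a supported discrete card, roughly like this (sketch only, the version string depends on your APU generation and I haven't verified it on an APU myself):

```python
import os

# Must be set before the ROCm/HIP runtime initializes, i.e. before importing torch.
# "11.0.0" is just an example for gfx110x-class APUs; older RDNA2 APUs
# typically use "10.3.0".
os.environ.setdefault("HSA_OVERRIDE_GFX_VERSION", "11.0.0")

import torch
print("GPU visible:", torch.cuda.is_available())
```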