What would be the best OS to run LLMs?
Posted by Manaberryio@reddit | LocalLLaMA | 26 comments
Hi there,
I've ordered a mini PC with 128GB of RAM and the AMD AI Max 395. I intend to use it with Proxmox (like my current machine), where I run Windows for some gaming and macOS for my music library server. I also want to run LLMs on it.
Main purpose would be local agent coding and some text refining. I'm quite new and it's quite overwhelming, to be honest. Things evolve so fast I can't keep track of what works best.
- What would be the best OS for LLMs?
- What would be the best software to run LLMs?
- Any compatibility issues with my choices to be aware of (such as graphics drivers on Linux)?
Thank you for your help!
DelKarasique@reddit
Linux + vllm for maximum performance.
Windows + llama.cpp for ease of use.
bernzyman@reddit
Does vLLM on Linux still need more VRAM than llama.cpp, as is the case on Windows?
XccesSv2@reddit
It's not about Windows vs Linux; it's because vLLM handles LLMs differently. It uses more VRAM but has better throughput
bernzyman@reddit
Yes, I know that part. I simply wondered whether it runs more efficiently on a Linux setup compared to a Windows setup
Th3Sim0n@reddit
Windows + LM Studio for even easier use, for console haters
DelKarasique@reddit
llama.cpp > llama.cpp wrappers IMO
Th3Sim0n@reddit
Fully agreed, but for clickops LM Studio is just way friendlier
jwpbe@reddit
Half of the replies will be bots giving you outdated advice.
Find a user-friendly Arch Linux derivative like CachyOS or EndeavourOS and use that, so you get rolling releases.
VoiceApprehensive893@reddit
linux distro war thread
i use cachyos btw
Edenar@reddit
I have a Framework Desktop (128GB/395 Max): I first installed Ubuntu but recently switched to Fedora (native Podman, more stable, at least coming from Ubuntu 25.10).
I wouldn't use Windows for LLMs. Also, unless you want to play some esports games with kernel-level anticheat (LoL, Valorant, ...), gaming works well (Steam requires zero effort; I used the Heroic launcher for games from GOG and Epic and it was almost zero effort too).
VoiceApprehensive893@reddit
gaming on linux still requires effort
NNN_Throwaway2@reddit
I use windows with WSL and docker for vLLM. I also have a dual-boot Ubuntu install but I just don't have any reason to use it.
lemondrops9@reddit
Linux Mint + LM Studio for an easier setup, then move to llama.cpp for some extra speed.
Thunderstarer@reddit
I run my LLMs in NixOS LXCs. Ubuntu would probably be best if you're not already familiar with Nix.
Evening_Ad6637@reddit
In my personal experience, the best OSes to run LLMs are Debian, openSUSE and Artix
turtleisinnocent@reddit
TempleOS, of course. HolyC makes it fast because it all runs in ring 0.
Evening_Ad6637@reddit
:D
ImportancePitiful795@reddit
If you use W11 IoT Enterprise with Lemonade Server (a llama.cpp wrapper with FastFlowLM etc. added to it), there is absolutely no need to switch to Linux for the few % of extra perf. Just stick to Windows, play your games, run your Windows applications. No need to switch OS.
If you play BF/COD games, also stick to Windows. There is no Linux support for those games' DRM, so they become unplayable. Same applies to all EA games using EA AntiCheat (EAAC). (I refuse to play any EA games even on Windows.)
Otherwise, Linux with Lemonade or vLLM, depending on your needs. vLLM is better if you run agents, due to better concurrency performance.
Which distro? Depends. Fedora is great for workstation usage, but if you plan to run LLMs as services, or God forbid try to set up remote desktop to it, better use Ubuntu...
Unfortunately nobody here can give you a definite answer on whether AMD will add MLX support to Lemonade on Windows or only on Linux (currently AMD MLX support is in closed beta testing by the Lemonade team).
jikilan_@reddit
If purely LLMs, then Linux; if gaming, then Windows, especially if streaming with Moonlight
jikilan_@reddit
On Windows, the power profile sometimes won't go down.
XccesSv2@reddit
If you need official guides, AMD covers Ubuntu natively in their guides. That's a good start. But in my case, I used Fedora for a few months because it has ROCm integrated in its repos; now I've switched to CachyOS because their repos are even more current. They already have ROCm 7.2.2 official in their repos.
BUT: it doesn't really matter. Instead of installing everything natively, you can also use toolboxes and Docker containers, and use whatever distro you want to get vLLM or llama.cpp running.
You can also install Proxmox with an LXC container and pass through the GPU/NPU devices for an isolated LLM instance.
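To sketch the LXC route (the container ID and device paths here are hypothetical examples; recent Proxmox VE releases let you pass devices with `dev[n]:` entries, while older setups need cgroup allow rules and bind mounts):

```
# /etc/pve/lxc/200.conf  -- 200 is a made-up container ID
# Pass the AMD compute (ROCm) and render devices into the container
dev0: /dev/kfd
dev1: /dev/dri/renderD128
```

Inside the container, the ROCm userspace then sees the GPU. The render node number (renderD128) can differ per machine, so check /dev/dri on the host first.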
RG_Fusion@reddit
The best OS for running LLMs would be a Debian install of Linux, but if you're already feeling overwhelmed you should stick to Windows. You can always make the change at a later date when you're feeling comfortable. The performance loss is notable, but not game-changing.
What I operate, and view as an idealized system, is running the LLMs on a Linux server dedicated to inference. The server just accepts and responds to requests from other computers. All of my Python scripts that use LLMs live on my gaming PC, and they interact with the LLM over the local network.
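That split is easy to wire up, since llama.cpp's `llama-server` and vLLM both expose an OpenAI-compatible HTTP API. A minimal stdlib-only sketch of a client on the gaming PC (the address, port and model name are placeholders, not anything from this thread):

```python
import json
import urllib.request

def chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-compatible chat completion request for a local server."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

# Point at the inference box on the LAN (placeholder address)
req = chat_request("http://192.168.1.50:8080", "local-model", "Refine this text: ...")
print(req.full_url)  # -> http://192.168.1.50:8080/v1/chat/completions
```

Sending it with `urllib.request.urlopen(req)` returns the JSON completion; any OpenAI-style client library works the same way pointed at the local address.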
Fine_Nectarine9328@reddit
Idk about the OS; for basic tasks Linux works better. But for best performance the software is llama.cpp, no doubt
DropInternational455@reddit
Remindme! 5 days
DropInternational455@reddit
Very good question, need answer 😅