My Linux/Fedora local AI performance is trailing Windows massively. Are there specific ROCm environment variables or memory management tweaks for RDNA3 that I'm missing?
Posted by Optimal_Guava5390@reddit | LocalLLaMA | View on Reddit | 2 comments
Fedora 44 Workstation AI Performance
Issue: Sub-optimal AI throughput on 9950X3D/7900 XT (worse than Windows baseline).
1. Hardware Environment
- CPU: Ryzen 9 9950X3D (Zen 5, 16c/32t, 3D V-Cache on CCD0)
- GPU: Radeon RX 7900 XT 20GB (RDNA3, native gfx1100)
- RAM: 64GB DDR5 5600MHz
- OS: Fedora 44 (Kernel 6.19.10-300.fc44.x86_64)
- Stack: Wayland / amdgpu / ROCm (bare-metal)
2. Current AI Stack Configuration
The system runs Ollama from the CLI and a Podman-based Open WebUI; both return similar performance, with small improvements when running in the terminal.
Ollama Environment Overrides (/etc/systemd/system/ollama.service.d/override.conf):
```ini
[Service]
Environment="OLLAMA_FLASH_ATTENTION=1"
Environment="OLLAMA_KV_CACHE_TYPE=q8_0"
Environment="OLLAMA_NUM_PARALLEL=1"
Environment="OLLAMA_MAX_LOADED_MODELS=1"
Environment="OLLAMA_CONTEXT_LENGTH=8192"
```
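For anyone reproducing this, a drop-in override like the one above only takes effect after a systemd reload; a minimal sketch of applying and verifying it (assuming the unit is named `ollama`):

```shell
# Reload unit files so systemd picks up the new override.conf, then restart
sudo systemctl daemon-reload
sudo systemctl restart ollama

# Confirm the environment variables actually reached the running unit
systemctl show ollama --property=Environment
```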
Model Strategy:
- Primary Model: Gemma 4 26B (17GB)
- Target Performance: 90+ tok/s eval (GPU-resident); Windows already reaches 95-99 tok/s.
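The eval tok/s numbers being compared here can be read straight from Ollama's CLI; a quick sketch (the model tag is a placeholder for whatever `ollama list` shows locally):

```shell
# --verbose prints prompt eval rate and eval rate (tok/s) after the response;
# replace <model-tag> with the locally installed model
ollama run <model-tag> --verbose <<< "Summarize the plot of Hamlet in one paragraph."
```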
3. Applied Kernel & Hardware Tunings
- V-Cache Optimizer: Active service biasing scheduler to CCD0 (cache mode).
- CPU Driver: amd-pstate-epp with performance governor/EPP.
- Sysctl: vm.swappiness=10, vm.vfs_cache_pressure=50.
- GPU Power: Reaches ~2850MHz / ~225W+ under ROCm load.
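If the sysctl values above were set interactively, they won't survive a reboot; one way to persist them (the drop-in file name is just a convention, not required):

```shell
# Write the two VM tunings from the list above to a sysctl drop-in
printf 'vm.swappiness=10\nvm.vfs_cache_pressure=50\n' | \
    sudo tee /etc/sysctl.d/99-ai-tuning.conf

# Reload all sysctl configuration so the values apply now
sudo sysctl --system
```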
4. Known Constraints (Explicitly Not Applied)
- mitigations=off: Not applied for security reasons.
- Transparent Huge Pages (THP): Set to madvise default.
- Containers: Ollama runs bare-metal to avoid container overhead on the ROCm path.
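Before tuning further, it's worth confirming ROCm actually enumerates the card under its native target (a 7900 XT should show up as gfx1100, so `HSA_OVERRIDE_GFX_VERSION` should not be needed). A diagnostic sketch, assuming the ROCm userland tools are installed:

```shell
# The GPU agent should be listed with its native gfx target (gfx1100)
rocminfo | grep -i 'gfx'

# VRAM usage while a model is loaded — a 17GB model should be fully resident
rocm-smi --showmeminfo vram
```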
Comparison Data
| Metric | Current Result |
|---|---|
| AI Throughput (Eval) | 75.87 max tok/s (Gemma 4 26B) |
| AI Throughput (Prompt) | 2,437 tok/s |
| Geekbench 6 Multi-Core | 22,692 |
Any help or suggestions? I'm feeling more and more like I may have picked the wrong distro for AMD.
sine120@reddit
Switch from Ollama to Llama.cpp, and use Vulkan. Ollama doesn't play nice with ROCm and Vulkan is still a little faster. Ollama uses Llama.cpp under the hood, but it won't be the most up-to-date for recently released models.
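The switch being suggested here can be sketched roughly as follows, assuming a Vulkan-capable Mesa/RADV driver and a local GGUF file (the model path is a placeholder):

```shell
# Build llama.cpp with the Vulkan backend enabled
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release -j

# Run fully offloaded to the GPU: -ngl 99 offloads all layers,
# -c sets context length, --flash-attn enables flash attention
./build/bin/llama-cli -m /path/to/model-Q4.gguf -ngl 99 -c 8192 --flash-attn -p "Hello"
```

`llama-cli` prints prompt and generation tok/s at the end of each run, which makes a direct comparison against the Ollama numbers straightforward.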
Optimal_Guava5390@reddit (OP)
You win again. The llama.cpp / Vulkan combo absolutely crushed it. Holy smokes.
[ Prompt: 145.5 t/s | Generation: 118.8 t/s ] 26B Gemma 4 Q4, exact same prompt. Thank you!!