My Linux/Fedora local AI performance is trailing Windows massively. Are there specific ROCm environment variables or memory management tweaks for RDNA3 that I'm missing?
Posted by Optimal_Guava5390@reddit | LocalLLaMA | View on Reddit | 2 comments
Fedora 44 Workstation AI Performance
Issue: Sub-optimal AI throughput on 9950X3D/7900 XT (worse than Windows baseline).
1. Hardware Environment
- CPU: Ryzen 9 9950X3D (Zen 5, 16c/32t, 3D V-Cache on CCD0)
- GPU: Radeon RX 7900 XT 20GB (RDNA3, native gfx1100)
- RAM: 64GB DDR5 5600MHz
- OS: Fedora 44 (Kernel 6.19.10-300.fc44.x86_64)
- Stack: Wayland / amdgpu / ROCm (bare-metal)
2. Current AI Stack Configuration
The system runs Ollama from the CLI and a Podman-based Open WebUI; both return similar performance, with small improvements when running in the terminal.
Ollama Environment Overrides (/etc/systemd/system/ollama.service.d/override.conf):
```ini
[Service]
Environment="OLLAMA_FLASH_ATTENTION=1"
Environment="OLLAMA_KV_CACHE_TYPE=q8_0"
Environment="OLLAMA_NUM_PARALLEL=1"
Environment="OLLAMA_MAX_LOADED_MODELS=1"
Environment="OLLAMA_CONTEXT_LENGTH=8192"
```
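For anyone reproducing this, a drop-in override like the one above only takes effect after a systemd reload; a minimal sketch of applying and verifying it (assuming the unit is named `ollama`):

```shell
# Reload unit files so systemd picks up the new override.conf, then restart
sudo systemctl daemon-reload
sudo systemctl restart ollama

# Confirm the environment variables actually reached the running unit
systemctl show ollama --property=Environment
```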
Model Strategy:
- Primary Model: Gemma 4 26B (17GB)
- Target Performance: 90+ tok/s eval (GPU-resident); Windows already reaches 95-99 tok/s.
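The eval tok/s numbers being compared here can be read straight from Ollama's CLI; a quick sketch (the model tag is a placeholder for whatever `ollama list` shows locally):

```shell
# --verbose prints prompt eval rate and eval rate (tok/s) after the response;
# replace <model-tag> with the locally installed model
ollama run <model-tag> --verbose <<< "Summarize the plot of Hamlet in one paragraph."
```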
3. Applied Kernel & Hardware Tunings
- V-Cache Optimizer: Active service biasing scheduler to CCD0 (cache mode).
- CPU Driver: amd-pstate-epp with performance governor/EPP.
- Sysctl: vm.swappiness=10, vm.vfs_cache_pressure=50.
- GPU Power: Reaches ~2850MHz / ~225W+ under ROCm load.
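If the sysctl values above were set interactively, they won't survive a reboot; one way to persist them (the drop-in file name is just a convention, not required):

```shell
# Write the two VM tunings from the list above to a sysctl drop-in
printf 'vm.swappiness=10\nvm.vfs_cache_pressure=50\n' | \
    sudo tee /etc/sysctl.d/99-ai-tuning.conf

# Reload all sysctl configuration so the values apply now
sudo sysctl --system
```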
4. Known Constraints (Explicitly Not Applied)
- mitigations=off: Not applied for security reasons.
- Transparent Huge Pages (THP): Set to madvise default.
- Containers: Ollama runs bare-metal to avoid container overhead on the ROCm path.
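Before tuning further, it's worth confirming ROCm actually enumerates the card under its native target (a 7900 XT should show up as gfx1100, so `HSA_OVERRIDE_GFX_VERSION` should not be needed). A diagnostic sketch, assuming the ROCm userland tools are installed:

```shell
# The GPU agent should be listed with its native gfx target (gfx1100)
rocminfo | grep -i 'gfx'

# VRAM usage while a model is loaded — a 17GB model should be fully resident
rocm-smi --showmeminfo vram
```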
Comparison Data
| Metric | Current Result |
|---|---|
| AI Throughput (Eval) | 75.87 max tok/s (Gemma 4 26B) |
| AI Throughput (Prompt) | 2,437 tok/s |
| Geekbench 6 Multi-Core | 22,692 |
Any help or suggestions? I'm feeling more and more like I may have picked the wrong distro for AMD.
sine120@reddit
Switch from Ollama to Llama.cpp, and use Vulkan. Ollama doesn't play nice with ROCm and Vulkan is still a little faster. Ollama uses Llama.cpp under the hood, but it won't be the most up-to-date for recently released models.
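The switch being suggested here can be sketched roughly as follows, assuming a Vulkan-capable Mesa/RADV driver and a local GGUF file (the model path is a placeholder):

```shell
# Build llama.cpp with the Vulkan backend enabled
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release -j

# Run fully offloaded to the GPU: -ngl 99 offloads all layers,
# -c sets context length, --flash-attn enables flash attention
./build/bin/llama-cli -m /path/to/model-Q4.gguf -ngl 99 -c 8192 --flash-attn -p "Hello"
```

`llama-cli` prints prompt and generation tok/s at the end of each run, which makes a direct comparison against the Ollama numbers straightforward.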
Optimal_Guava5390@reddit (OP)
You win again. The llama.cpp / Vulkan combo absolutely crushed it. Holy smokes.
[ Prompt: 145.5 t/s | Generation: 118.8 t/s ] 26B Gemma 4 Q4, exact same prompt. Thank you!!