My Linux/Fedora local AI performance is trailing Windows massively. Are there specific ROCm environment variables or memory management tweaks for RDNA3 that I'm missing?

Posted by Optimal_Guava5390@reddit | LocalLLaMA


Fedora 44 Workstation AI Performance

Issue: Sub-optimal AI throughput on 9950X3D/7900 XT (worse than Windows baseline).

1. Hardware Environment

2. Current AI Stack Configuration

The system runs Ollama from the CLI and through a Podman-based Open WebUI. Both return similar performance, with small improvements in the terminal.

Ollama Environment Overrides (/etc/systemd/system/ollama.service.d/override.conf):


[Service]
Environment="OLLAMA_FLASH_ATTENTION=1"
Environment="OLLAMA_KV_CACHE_TYPE=q8_0"
Environment="OLLAMA_NUM_PARALLEL=1"
Environment="OLLAMA_MAX_LOADED_MODELS=1"
Environment="OLLAMA_CONTEXT_LENGTH=8192"
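To double-check which overrides are actually set, here is a rough Python sketch that pulls the KEY=VALUE pairs out of the Environment= lines above. The function name `parse_environment_lines` is made up for illustration, and the parsing only covers the simple quoted form used in this drop-in, not the full systemd syntax.

```python
import shlex


def parse_environment_lines(unit_text: str) -> dict[str, str]:
    """Collect KEY=VALUE pairs from Environment= lines in a systemd drop-in.

    Hypothetical helper: handles only the simple quoted assignments shown
    in this post, not every form systemd accepts.
    """
    env: dict[str, str] = {}
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line.startswith("Environment="):
            continue
        # shlex strips the double quotes around each assignment
        for assignment in shlex.split(line[len("Environment="):]):
            key, sep, value = assignment.partition("=")
            if sep:
                env[key] = value
    return env


# Contents of the override.conf quoted in the post
OVERRIDE = """\
[Service]
Environment="OLLAMA_FLASH_ATTENTION=1"
Environment="OLLAMA_KV_CACHE_TYPE=q8_0"
Environment="OLLAMA_NUM_PARALLEL=1"
Environment="OLLAMA_MAX_LOADED_MODELS=1"
Environment="OLLAMA_CONTEXT_LENGTH=8192"
"""

settings = parse_environment_lines(OVERRIDE)
for name, value in settings.items():
    print(f"{name}={value}")
```

On the real machine the text could be read from /etc/systemd/system/ollama.service.d/override.conf instead of the inline string, and `systemctl show ollama --property=Environment` should report the same values once the service has been reloaded and restarted.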

Model Strategy:

3. Applied Kernel & Hardware Tunings

4. Known Constraints (Explicitly Not Applied)

Comparison Data

| Metric | Current Result |
| --- | --- |
| AI Throughput (Eval) | 75.87 max tok/s (Gemma 4 26B) |
| AI Throughput (Prompt) | 2,437 tok/s |
| Geekbench 6 Multi-Core | 22,692 |
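For anyone wanting to compare the same numbers on Windows and Fedora, here is a minimal Python sketch of how the prompt and eval tok/s figures can be derived from Ollama's /api/generate response, which reports token counts and durations in nanoseconds. The helper names (`tokens_per_second`, `summarize`, `run_once`) are made up for illustration, and the host assumes Ollama's default local port.

```python
import json
import urllib.request

NS_PER_SECOND = 1_000_000_000


def tokens_per_second(count: int, duration_ns: int) -> float:
    """Convert a token count and a nanosecond duration to tok/s."""
    if duration_ns <= 0:
        return 0.0
    return count * NS_PER_SECOND / duration_ns


def summarize(response: dict) -> dict[str, float]:
    """Compute prompt and eval throughput from a non-streamed response."""
    return {
        "prompt_tok_s": tokens_per_second(
            response.get("prompt_eval_count", 0),
            response.get("prompt_eval_duration", 0),
        ),
        "eval_tok_s": tokens_per_second(
            response.get("eval_count", 0),
            response.get("eval_duration", 0),
        ),
    }


def run_once(model: str, prompt: str,
             host: str = "http://localhost:11434") -> dict[str, float]:
    """Send one request to a local Ollama server and return its throughput.

    Assumes the default host and port; the model tag is whatever
    `ollama list` shows on your machine.
    """
    body = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode()
    request = urllib.request.Request(
        f"{host}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return summarize(json.load(reply))
```

Calling `run_once("<your-model-tag>", "<the same prompt on both systems>")` on each OS with identical model, quantization, and context length gives a like-for-like comparison. Running `ollama run <your-model-tag> --verbose` prints similar rates without any script.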

Any help or suggestions? I feel more and more that I may have picked the wrong distro for AMD.