OCuLink dGPU for AMD: RX 7600 XT vs RX 7800 XT for LLM — worth the price gap? Also llamacpp + Vulkan vs Ollama + ROCm?

Posted by Pablo_Gates@reddit | LocalLLaMA

Planning a homelab with a GMKtec K12 (Ryzen 7 H255, 780M iGPU, OCuLink). Phase 1 runs Ollama on the 780M. Phase 2 adds an OCuLink dGPU specifically for LLM (Ollama + Open WebUI), freeing the iGPU for Frigate object detection only.

GPU choice: RX 7600 XT vs RX 7800 XT

For LLM use on home hardware, is the RX 7800 XT worth the ~€80-100 premium? My primary use case is Qwen 2.5 14B, and eventually Qwen 2.5 32B at Q4. No image generation.
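My rough back-of-envelope for whether those models fit in 16 GB (both cards ship with 16 GB VRAM). The bits-per-weight figure for a Q4_K_M-style quant is an approximation, and real GGUF files vary with the quant mix:

```python
# Rough GGUF weight-size estimate. 4.85 bits/weight is an approximation
# for Q4_K_M; actual file sizes differ by a few percent per model.
def model_gib(params_billion: float, bits_per_weight: float = 4.85) -> float:
    """Approximate quantized weight size in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30

for name, params in [("Qwen 2.5 14B", 14.7), ("Qwen 2.5 32B", 32.8)]:
    print(f"{name}: ~{model_gib(params):.1f} GiB weights, plus KV cache/overhead")
```

So as I understand it, 32B at Q4 plus context is already tight on 16 GB on either card; the 7800 XT's real edge would be memory bandwidth (624 vs 288 GB/s), which largely sets token-generation speed once the model fits.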

Stack: llamacpp + Vulkan vs Ollama + ROCm

I've seen recommendations to use llamacpp with pre-built Vulkan binaries instead of Ollama for AMD, especially with an OCuLink setup. The binaries are on the llama.cpp GitHub releases page, so no compilation is needed.
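For reference, this is the kind of invocation I'm expecting with the prebuilt binary (the model file name is a placeholder; `-ngl 99` offloads all layers to the GPU):

```shell
# Serve a model with the prebuilt Vulkan llama-server binary.
# Model file name is a placeholder -- substitute your own GGUF.
# -ngl 99: offload every layer to the dGPU; -c: context window size.
./llama-server -m ./qwen2.5-14b-instruct-q4_k_m.gguf \
    -ngl 99 -c 8192 --host 0.0.0.0 --port 8080
```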

Questions:

  1. For AMD OCuLink dGPU + Linux, is llamacpp + Vulkan noticeably better than Ollama + ROCm in practice?
  2. Any specific flags for the llamacpp Vulkan build on AMD that make a real difference? I've seen mention of a "fit flag" that simplifies layer allocation.
  3. OCuLink bandwidth: is there any measurable throughput loss for LLM inference vs a native PCIe slot? The K12's OCuLink port is PCIe 4.0 x4 (~8 GB/s).
  4. Dual GPU scenario: 780M iGPU (Frigate) + dGPU via OCuLink (Ollama) — any complications with ROCm or Vulkan seeing both devices and picking the wrong one?
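For question 4, this is what I'm planning to try, based on the documented device-selection environment variables. The device indices are guesses until I check the actual enumeration:

```shell
# Pin the LLM stack to the OCuLink dGPU and leave the 780M for Frigate.
# Index 0 is an assumption -- confirm the ordering with `rocminfo` (ROCm)
# and `vulkaninfo --summary` (Vulkan) first.
export HIP_VISIBLE_DEVICES=0          # ROCm/Ollama sees only the dGPU
export GGML_VK_VISIBLE_DEVICES=0      # llama.cpp Vulkan backend: same idea
# Alternatively, pick the device per run: ./llama-server ... -mg 0  (--main-gpu)
```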

Running Linux (Ubuntu 24.04 LTS).