AI models on RX 5500 XT (8gb vram)
Posted by Adventurous_Abies347@reddit | LocalLLaMA | View on Reddit | 4 comments
I recently installed Proxmox on my old PC for testing and created an Ubuntu server VM with GPU passthrough. I'm looking for advice on the best models to run on this setup.
Will I be able to do any training/fine-tuning, or only inference?
The rest of the hardware is a Ryzen 3 2200G and 16 GB of DDR4.
Desperate-Body-5462@reddit
With an RX 5500 XT (8 GB VRAM), you're mostly looking at inference, not training. AMD support is still behind CUDA, so you'll likely be using ROCm (if supported) or falling back to CPU/Vulkan, which can be hit or miss. For models, stick to quantized 7B or smaller (like Qwen2.5/3 7B or LLaMA 3 8B GGUFs at Q4/Q5); those should run decently. 13B might technically run but will be slow and memory-constrained. Your 16 GB of RAM is also a limiting factor, so avoid large context sizes. Fine-tuning is realistically not worth it on this setup unless you use very lightweight methods (like LoRA on CPU, which will be very slow). Overall, treat this as a solid local inference and experimentation setup, not a training rig.
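A quick back-of-envelope check of the "7B fits, 13B is tight" claim. This is a rough sketch with illustrative numbers (~4.5 bits/weight for a typical Q4_K_M quant, a Llama-3-8B-like KV shape), not measured figures:

```python
# Rough VRAM estimate for a quantized GGUF model: weights + KV cache.
# All constants below are illustrative assumptions, not measured values.

def model_vram_gb(n_params_b: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GiB for a quantized model."""
    return n_params_b * 1e9 * bits_per_weight / 8 / 1024**3

def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                ctx_len: int, bytes_per_elem: int = 2) -> float:
    """KV cache: 2 (K and V) * layers * kv_heads * head_dim * ctx * bytes."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem / 1024**3

# A 7B model at ~4.5 bits/weight (roughly Q4_K_M territory)
weights = model_vram_gb(7, 4.5)      # ≈ 3.7 GiB
# Llama-3-8B-like KV shape: 32 layers, 8 KV heads, head dim 128, 4k ctx, fp16
kv = kv_cache_gb(32, 8, 128, 4096)   # ≈ 0.5 GiB

print(f"weights ≈ {weights:.1f} GiB, KV cache ≈ {kv:.1f} GiB")
# A 13B model at the same quant is ~6.8 GiB of weights alone, leaving
# almost nothing of an 8 GB card for KV cache and compute buffers.
```

This is why larger context sizes hurt: the KV cache grows linearly with context length, and anything that doesn't fit in VRAM spills to the 16 GB of system RAM.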
Adventurous_Abies347@reddit (OP)
Okay, very helpful, thanks
Monad_Maya@reddit
Those model recommendations are very sus. Looks like an old LLM response.
Try Qwen 3.5 9B at the largest 4 bit quant you can fit - https://huggingface.co/bartowski/Qwen_Qwen3.5-9B-GGUF
If you want a larger model, then Gemma 4 26B A3B MoE - https://huggingface.co/unsloth/gemma-4-26B-A4B-it-GGUF - run this via partial offload to CPU/RAM.
You can use llama.cpp directly or LM Studio.
- https://github.com/ggml-org/llama.cpp
- https://lmstudio.ai/
No idea about fine-tuning.
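On partial offload: llama.cpp's `-ngl` (`--n-gpu-layers`) flag controls how many transformer layers go to the GPU, with the rest running on CPU/RAM. A rough sketch for picking a starting value, using hypothetical model sizes:

```python
# Sketch: estimate a starting -ngl value for llama.cpp partial offload.
# The model size, layer count, and reserve below are assumptions; the
# right answer is found by watching actual VRAM use and adjusting.

def layers_that_fit(total_size_gb: float, n_layers: int,
                    vram_gb: float, reserve_gb: float = 1.5) -> int:
    """Estimate how many layers fit: per-layer size vs (vram - reserve).

    reserve_gb leaves room for KV cache, compute buffers, and the desktop.
    """
    per_layer = total_size_gb / n_layers
    usable = max(0.0, vram_gb - reserve_gb)
    return min(n_layers, int(usable / per_layer))

# e.g. a hypothetical ~14 GB GGUF with 48 layers on an 8 GB card
ngl = layers_that_fit(14.0, 48, 8.0)
print(f"try -ngl {ngl}")  # then tune up/down based on real VRAM usage
```

For MoE models the active-parameter count is small, so even a mostly-CPU split can stay usable; the estimate above is just a starting point.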
ps5cfw@reddit
You've got barely any RAM + VRAM to do anything useful with inference; how do you expect to fine-tune something with such limited hardware?