$6k AMD AI Build (2x R9700, 64GB VRAM) - Worth it for a beginner learning fine-tuning vs. Cloud?
Posted by buenavista62@reddit | LocalLLaMA | View on Reddit | 34 comments
Planned Build:
- GPU: 2x AsRock Radeon AI PRO R9700 (2*32GB = 64GB Total VRAM)
- CPU: AMD Ryzen 9 9950X
- Motherboard: Gigabyte B850 AI TOP
- RAM: Corsair 2x48GB DDR5 6000 MHz (96GB Total)
- Storage: Lexar NM990 2TB SSD (OS/Apps) + Lexar NM1090 Pro 4TB SSD (Models/Data)
- PSU: be quiet! 1200 W
- Cooling: Corsair NAUTILUS 360 AIO
My Situation & Dilemma:
I'm new to local LLMs. My goal is to learn/study fine-tuning concepts and tinker with diffusion models. I'm torn because:
- Building: It would be a cool experience, and I'd have a powerful local machine for experimentation.
- Cloud: I could rent compute only when needed, potentially saving money upfront.
I'm aware of the current software disadvantages of ROCm compared to CUDA, but I'm betting on AMD's future improvements.
What would you do in my shoes? Is the hands-on learning experience worth the ~$6k investment, or would I be better off putting that money towards cloud credits? Do you see other advantages/disadvantages between these two options? I'm also open to alternative build suggestions at a similar or lower price point.
Any recommendations or shared experiences are highly appreciated!
Thanks in advance!
woahdudee2a@reddit
get any PC with 64 GB RAM, then if you're feeling rich buy an RTX 5090. Otherwise grab a 3090 from eBay. That's it
thebadslime@reddit
If I had 6k to buy training shit with, I would get two ASUS Ascent GX10s.
buenavista62@reddit (OP)
Isn't the memory bandwidth a bit problematic?
StardockEngineer@reddit
Are you in this to learn? Because nothing will teach you more than having a mock cloud environment.
Also, I own one and fine tune all the time. It’s plenty fast because there is a lot less shuffling of data in and out of VRAM. This means larger batch sizes, fewer gradient updates, more stable gradient estimates, better convergence.
You can also fine tune larger models.
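The batch-size point in numbers: with a fixed dataset, larger batches mean fewer optimizer steps per epoch. A minimal sketch with hypothetical sizes:

```python
# Gradient updates per epoch shrink as batch size grows (fixed dataset size)
def steps_per_epoch(n_samples, batch_size):
    return -(-n_samples // batch_size)  # ceiling division

# Hypothetical 50k-sample dataset at a few batch sizes
for bs in (8, 32, 128):
    print(f"batch {bs:>3}: {steps_per_epoch(50_000, bs)} updates/epoch")
# batch   8: 6250 updates/epoch
# batch  32: 1563 updates/epoch
# batch 128: 391 updates/epoch
```

More VRAM is what lets you push the batch size up in the first place, which is where the "fewer, more stable gradient updates" claim comes from.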
buenavista62@reddit (OP)
Sounds promising! I really wanna learn. Is the more expensive GB10 a better option than the Strix Halo 128GB counterpart?
Inference speed is not as important to me as the ability to train models. The more the better
StardockEngineer@reddit
The Strix Halo just can't be clustered today. Not for inference (heavy performance loss). Not for training. Whereas the ConnectX in the Spark-based systems is plug and play.
Watch this video. While not strictly on this topic, it does cover a lot and he connects two together. https://youtu.be/sx6ANedcIfI
thebadslime@reddit
At 273 GB/s memory bandwidth it's not great, but it's a CUDA system with a ton of RAM and TOPS. One would likely outperform the dual-GPU setup.
Defilan@reddit
Interesting build! A few thoughts:
On the AMD bet: The 64GB unified VRAM is genuinely compelling for fine-tuning since you can fit larger batch sizes and full-precision weights. But ROCm support for diffusion models specifically has been rough. Stable Diffusion and ComfyUI work, but expect to spend time debugging. Fine-tuning frameworks like axolotl have better ROCm support now, so that side is more viable.
Build vs cloud for learning: Owning hardware changes how you learn. You'll experiment more freely when there's no meter running. That said, $6k is a lot to drop before you know what workflows you'll actually use long-term.
Alternative to consider: I built something similar for around $2,400. Dual RTX 5060 Ti (32GB total VRAM), Ryzen 9 7900X, 64GB DDR5. Gets ~44 tok/s on 13B models, handles 70B quantized, and CUDA just works out of the box. Leaves you $3,500 for cloud credits when you need to scale up for actual training runs that need more VRAM.
For learning fine-tuning concepts and tinkering with diffusion, 32GB VRAM is honestly plenty. You'd only need the 64GB if you're training larger models at higher precision or running very large batch sizes. I run MicroK8s on mine with LLMKube for orchestration, which makes it easy to swap models and manage inference endpoints.
The hybrid approach might be worth considering: a cheaper local rig for daily experimentation, cloud burst for the occasional big training job. Best of both worlds without betting $6k on ROCm improving.
What size models are you planning to fine-tune?
buenavista62@reddit (OP)
Man, that might be the best advice yet! I really haven't thought about which models to fine-tune. I would start small for sure. The hybrid approach really makes sense. Once I'm experienced enough, I can still fine-tune larger models in the cloud.
Which motherboard do you use if I may ask?
Nearby_Truth9272@reddit
I'll second others on here: if you're just going to learn and practice with diffusion models and the like, and you're mostly doing inference rather than training, then maybe consider something much cheaper that is quiet and doesn't eat into your power costs. I have owned these types of computers for at least a decade -- in ML, AI, and blockchain work -- and towers can be loud. The best way to get them quiet is liquid cooling.
I did this in 2021, a similar-cost new build with my child. Very quiet, very cool, a total waste of money... though the GPUs I used for all sorts of things. All I could find at the time were dual RTX 3080 10GB cards for GPU mining; a single 3090 would have been smarter. What I am saying is, I doubt you will truly ever use more than one of those GPUs in 99% of what people do.
Both AMD and Nvidia cater to just this. For AMD, maybe consider the Corsair AI Workstation with the AMD AI Max 395, or IMHO, if you can live with ARM and won't be gaming, try a GB10. The DGX Spark at my local Microcenter is quiet and small. If you want to save $1k, look at the ASUS or MSI GB10 models. $2k-$3k will more than likely do everything you need it to do. Case in point: I can run a whole local LLM, with full TTS inputs and outputs on GPU, with MCP integrations, on 16GB of VRAM. I can run an SLM and LLM with full cognitive processing, LIDAR point cloud mapping, etc., on 8GB.
It is these crazy things that are truly driving our AI bubble and new generations of chips every year...
buenavista62@reddit (OP)
I would like to train models as well. I see your point, though. Maybe it's smarter to build something for half the price.
sleepingsysadmin@reddit
That CPU isn't enough for those two cards. It has ~24 PCIe lanes, less some for other connectivity, whereas you'd want 32 just for the GPUs, and need even more for other PCIe devices. You'll end up at something like x16/x4 or x8/x8 for those GPUs, and that'll be a performance problem in many cases.
My theorycraft recently when the R9700s were only $1500 was much like your build,
Here's the thing: how about just one card? 32GB lets you run all the small models, and the DDR5 RAM there will be enough to tackle Q4 medium-sized models like gpt-oss-120b. Let's say that's still a $4,000 computer, though. The $20/month cloud options will go a long way, and you get quality models you can't even run on that 64GB of VRAM.
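A rough back-of-the-envelope for what fits where. This assumes ~4.5 bits/weight on average for a typical Q4 GGUF quant (the exact figure varies by quant type):

```python
# Approximate weight memory for a model: params * bits_per_weight / 8
def weight_gb(n_params_billions, bits_per_weight):
    return n_params_billions * 1e9 * bits_per_weight / 8 / 1e9

# 120B-parameter model, Q4-ish quant vs. full FP16 (assumed 4.5 bits avg for Q4)
print(f"120B at ~Q4:  {weight_gb(120, 4.5):.1f} GB")   # ~67.5 GB: fits 96GB RAM, not 64GB VRAM
print(f"120B at FP16: {weight_gb(120, 16):.0f} GB")    # 240 GB: cloud territory
```

Weights alone aren't the whole story (KV cache and activations add more), but it shows why a Q4 120B model lands in system RAM rather than in 64GB of VRAM.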
buenavista62@reddit (OP)
The CPU with this motherboard can allocate PCIe Gen5 x8 to each GPU; that's the same bandwidth as PCIe Gen4 x16 per card, right? Should be enough, I guess.
But I see. Probably using the cloud is the best option economically.
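The Gen5 x8 = Gen4 x16 equivalence checks out on paper. A quick sketch using approximate usable per-lane PCIe rates (after encoding overhead):

```python
# Approximate usable PCIe bandwidth per direction, GB/s per lane
# (Gen4: 16 GT/s, Gen5: 32 GT/s, both with 128b/130b encoding)
PER_LANE_GBS = {3: 0.985, 4: 1.969, 5: 3.938}

def bandwidth_gbs(gen, lanes):
    return PER_LANE_GBS[gen] * lanes

print(f"Gen5 x8:  {bandwidth_gbs(5, 8):.1f} GB/s")   # Gen5 x8:  31.5 GB/s
print(f"Gen4 x16: {bandwidth_gbs(4, 16):.1f} GB/s")  # Gen4 x16: 31.5 GB/s
```

So per-card bandwidth is the same either way; the catch is only that both cards share the CPU's limited lane budget.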
AppearanceHeavy6724@reddit
The R9700 has relatively low memory bandwidth. That might not matter as much for fine-tuning, but it's certainly an issue for inference.
jacek2023@reddit
You won't learn anything by purchasing the gear. This is true for photography and this is true for AI. You can learn with minimal equipment. These posts are always "help me burn some money"
buenavista62@reddit (OP)
Thanks! I am also leaning towards not buying such a rig. It really feels like a "waste of money", when it's just about learning.
And sorry for the dumb question, but: what is minimal equipment for you? Is a laptop with an iGPU already minimal equipment?
Coldaine@reddit
Take all this money, and put it towards cloud computing time.
I'm not kidding. It was a pretty big hill to climb when I did it (with a math background, but nobody I knew had experience with VMs or LLMs), and I had a fair amount of gear already. But now I'm able to spin up any size VM from any provider to do anything from LoRA training to serving me gpt-oss-120b at a firehose pace for an hour.
insulaTropicalis@reddit
No, having a GPU is very important if you want to fine-tune. You definitely want one. Then, even if it's a laptop GPU, you can try whatever you want: you can fine-tune a very small model or even train a tiny one from scratch (with millions of parameters instead of billions).
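For scale, here's a rough parameter count for a tiny GPT-style model with a hypothetical config (ignoring biases, norms, and positional embeddings, so the true count is slightly higher):

```python
# Rough parameter count for a tiny GPT-style transformer (hypothetical config;
# counts only embeddings, attention projections, and MLP weights)
def gpt_params(vocab, d_model, n_layers, d_ff=None):
    d_ff = d_ff or 4 * d_model
    embed = vocab * d_model               # token embeddings (output head tied)
    attn = 4 * d_model * d_model          # Q, K, V, O projections per layer
    mlp = 2 * d_model * d_ff              # up + down projections per layer
    return embed + n_layers * (attn + mlp)

tiny = gpt_params(vocab=8000, d_model=256, n_layers=4)
print(f"{tiny / 1e6:.1f}M parameters")  # 5.2M parameters
```

A ~5M-parameter model like this trains from scratch on a laptop GPU in minutes to hours, which is exactly the scale where you can actually learn the mechanics.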
Arli_AI@reddit
I would learn on an NVIDIA GPU unless you want to learn to get things running on ROCm instead. And I disagree with others saying you don't need gear to learn. You absolutely do in order to do a lot of the more advanced stuff (abliteration, quantization, fine-tuning, etc.) that might not be widely supported and could take you a long time to even get running in the first place.
buenavista62@reddit (OP)
Thanks. Generally, I really want to learn the advanced stuff as well, and I'm not afraid of or discouraged by tweaking and hunting for solutions for a few hours until something runs.
So you would rather buy a used PC? Or maybe my suggested build with only one GPU, a cheaper motherboard, only one SSD, etc.? That would easily cut costs by 3,000 USD.
Arli_AI@reddit
I would get a RTX 5090 and as much RAM as you can afford, that's basically it.
jacek2023@reddit
I was able to "win" (gold medal) a Kaggle competition with a 2070. It was a model to identify images, back in 2019.
Smooth-Cow9084@reddit
As they say, start smaller... a 16GB NVIDIA GPU, no DDR5, a modest motherboard... Once you know how to do things, upgrade appropriately.
Prestigious_Thing797@reddit
I think this is fine, but I would probably go for a cheaper CPU/mobo setup. A few-generations-old EPYC system would give you some extra slots for more GPUs if you get deeper into it.
I have been in the space since before LLMs and while renting the cloud can be economical, I much prefer/enjoy having my own hardware. In the cloud I always worry about leaving an instance on accidentally for a long time and getting charged a zillion dollars, and locally if I need to transfer big files around I can do it over my local network or even a USB stick way faster!
The cloud is a great tool, but it requires more mental overhead/worry; having a computer sitting under a desk completely sidesteps that and makes it more fun, IMO.
Own-Lemon8708@reddit
2x RTX 8000 will get you 96GB VRAM for ~$3,500. Add a cheaper CPU and you're well under budget, with more VRAM and CUDA. Add an NVLink bridge (SLI) if you can take advantage of it, too.
Django_McFly@reddit
In virtually all hobbies in life, I would recommend that a beginner trying to figure things out (who may not even know if they enjoy it yet) not take the option that requires multiple thousands of dollars in upfront costs.
Desperate-Sir-5088@reddit
Start with an MI50 (ROCm) or 3090 (CUDA) and an EPYC board.
ImportancePitiful795@reddit
Well the above system doesn't seem to cost $6000.
Two R9700s are $2,600; the rest can be done with $1,100.
And IMHO you will be better off with a different base platform.
Either a 3960X/3970X (~$600 for the 3970X) with a board supporting four PCIe x16 slots, like the ASRock TRX40 Creator ($280), or to reduce the cost even further, a Huanaznzhi X99-F8D PLUS (the one with the 6 PCIe slots) and 2x E5-2699 v4 (220 + 8x85).
And go for 4 R9700s.
buenavista62@reddit (OP)
The above system is built on PCIe Gen5 and DDR5 RAM. That's maybe why I get to 6,000 USD. RAM prices are exploding at the moment as well, plus 200 USD for the case.
I will check your recommendations right now. The parts don't seem too easy to get, at least where I live.
But generally, should I rather aim for PCIe Gen4/DDR4 to reduce costs?
Long_comment_san@reddit
Holy shit, I wish I could blow 10k on a fun experience. I can't do it in my country's currency; it's 80 times cheaper than the dollar.
MannToots@reddit
You can learn a lot cheaper with an OpenRouter account, IMO.
buenavista62@reddit (OP)
I can only do inference with OpenRouter, right?
MannToots@reddit
Oh God no I'm dumb and missed that. Ignore me lol
buenavista62@reddit (OP)
All good :)