$6k AMD AI Build (2x R9700, 64GB VRAM) - Worth it for a beginner learning fine-tuning vs. Cloud?
Posted by buenavista62@reddit | LocalLLaMA | 48 comments
Planned Build:
- GPU: 2x AsRock Radeon AI PRO R9700 (2*32GB = 64GB Total VRAM)
- CPU: AMD Ryzen 9 9950X
- Motherboard: Gigabyte B850 AI TOP
- RAM: Corsair 2x48GB DDR5 6000 MHz (96GB Total)
- Storage: Lexar NM990 2TB SSD (OS/Apps) + Lexar NM1090 Pro 4TB SSD (Models/Data)
- PSU: be quiet! 1200 W
- Cooling: Corsair NAUTILUS 360 AIO
My Situation & Dilemma:
I'm a beginner to local LLMs. My goal is to learn/study fine-tuning concepts and tinker with diffusion models. I'm torn because:
- Building: It would be a cool experience, and I'd have a powerful local machine for experimentation.
- Cloud: I could rent compute only when needed, potentially saving money upfront.
I'm aware of the current software disadvantages of ROCm compared to CUDA, but I'm betting on AMD's future improvements.
What would you do in my shoes? Is the hands-on learning experience worth the ~$6k investment, or would I be better off putting that money towards cloud credits? Do you see other advantages/disadvantages between these two options? I'm also open to alternative build suggestions at a similar or lower price point.
Any recommendations or shared experiences are highly appreciated!
Thanks in advance!
Defilan@reddit
Interesting build! A few thoughts:
On the AMD bet: The 64GB of total VRAM is genuinely compelling for fine-tuning, since you can fit larger batch sizes and full-precision weights. But ROCm support for diffusion models specifically has been rough. Stable Diffusion and ComfyUI work, but expect to spend time debugging. Fine-tuning frameworks like axolotl have better ROCm support now, so that side is more viable.
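If it helps demystify things: the core LoRA trick those fine-tuning frameworks implement is small enough to sketch in plain NumPy. All sizes below are made up for illustration, not from any real model.

```python
import numpy as np

# LoRA sketch: instead of updating a big weight matrix W (d x d), you train
# two small matrices A (r x d) and B (d x r) and use W + (alpha/r) * B @ A.
# Toy dimensions, purely illustrative.
rng = np.random.default_rng(0)
d, r, alpha = 1024, 8, 16

W = rng.normal(size=(d, d))          # frozen "pretrained" weight
A = rng.normal(size=(r, d)) * 0.01   # trainable low-rank factor
B = np.zeros((d, r))                 # B starts at zero, so the delta starts at zero

def lora_forward(x):
    # Equivalent to x @ (W + (alpha / r) * B @ A).T,
    # without ever materializing the full d x d delta
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

x = rng.normal(size=(4, d))
# With B = 0, the LoRA model matches the frozen base model exactly
assert np.allclose(lora_forward(x), x @ W.T)

# Trainable parameters: 2*d*r for LoRA vs d*d for a full fine-tune
print(2 * d * r, d * d)  # 16384 vs 1048576, about 64x fewer
```

That parameter count gap is why LoRA fine-tunes fit in far less VRAM than full fine-tunes of the same model.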
Build vs cloud for learning: Owning hardware changes how you learn. You'll experiment more freely when there's no meter running. That said, $6k is a lot to drop before you know what workflows you'll actually use long-term.
Alternative to consider: I built something similar for around $2,400. Dual RTX 5060 Ti (32GB total VRAM), Ryzen 9 7900X, 64GB DDR5. Gets ~44 tok/s on 13B models, handles 70B quantized, and CUDA just works out of the box. Leaves you $3,500 for cloud credits when you need to scale up for actual training runs that need more VRAM.
For learning fine-tuning concepts and tinkering with diffusion, 32GB VRAM is honestly plenty. You'd only need the 64GB if you're training larger models at higher precision or running very large batch sizes. I run MicroK8s on mine with LLMKube for orchestration, which makes it easy to swap models and manage inference endpoints.
The hybrid approach might be worth considering: a cheaper local rig for daily experimentation, cloud burst for the occasional big training job. Best of both worlds without betting $6k on ROCm improving.
What size models are you planning to fine-tune?
Wide-Ad-1349@reddit
This is the way!
buenavista62@reddit (OP)
Man, that might be the best advice yet! I really haven't thought about which models to fine-tune. I would start small for sure. The hybrid approach really makes sense. Once I'm experienced enough, I can still fine-tune larger models in the cloud.
Which motherboard do you use if I may ask?
Defilan@reddit
MSI B650 Gaming Plus WiFi. Picked it for the dual PCIe 5.0 slots (x16 + x8) so both GPUs get proper bandwidth, plus it has 2.5 GbE built in which is nice for serving models over the network. Runs around $170.
Starting small is the right call. You'll learn way more iterating quickly on 7B/8B models than waiting forever for a 70B fine-tune to finish. And yeah, once you have the workflow down, spinning up an A100 on Lambda or Vast for the big jobs is painless.
Good luck with the build. Let us know how it goes!!
buenavista62@reddit (OP)
First of all: thank you so much for your support! Helps a lot.
I've checked the specs of the motherboard and it says it only supports x16 + x4? https://www.msi.com/Motherboard/B650-GAMING-PLUS-WIFI/Specification
Or am I misunderstanding something? That's why I'm targeting the Gigabyte B850 AI TOP, which should support x8/x8.
Defilan@reddit
Ah, typo on my part (my fault for using the phone lol). I meant the B850, not B650! You're right that the B650 Gaming Plus is x16/x4. The B850 AI TOP with x8/x8 is exactly what you want for balanced dual-GPU setups. Good catch and good choice on your end!
jacek2023@reddit
You won't learn anything by purchasing the gear. This is true for photography and this is true for AI. You can learn with minimal equipment. These posts are always "help me burn some money"
buenavista62@reddit (OP)
Thanks! I am also leaning towards not buying such a rig. It really feels like a "waste of money" when it's just about learning.
And sorry for the dumb question, but: what is minimal equipment for you? Is a laptop with an iGPU already minimal equipment?
insulaTropicalis@reddit
No, having a GPU is very important if you want to finetune. You definitely want one. Then, even if it's a laptop GPU, you can try whatever you want: you can finetune a very small model or even train a tiny one from scratch (with millions of parameters instead of billions).
Wide-Ad-1349@reddit
Absolutely agree. But you can learn a lot on almost any recent computer before you take that expensive plunge. I trained models from scratch on an Intel 1260P with Iris Xe. It was quite fun to push it. I think you can spend months or more on such a device and still have a lot to learn :)
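For a sense of what "training from scratch on anything" looks like, here's a toy NumPy training loop: forward pass, hand-written backprop, SGD update. It's the same shape as the real thing at a millionth the scale, and the hyperparameters are arbitrary.

```python
import numpy as np

# Toy "train from scratch" run: a 2-layer MLP learning XOR in plain NumPy.
# Nothing like an LLM in scale, but the loop has the same shape.
rng = np.random.default_rng(0)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(0, 1, (2, 8))   # hidden width 8, picked arbitrarily
b1 = np.zeros(8)
W2 = rng.normal(0, 1, (8, 1))
b2 = np.zeros(1)

lr = 0.1
for step in range(5000):
    # Forward pass
    h = np.tanh(X @ W1 + b1)
    p = h @ W2 + b2
    loss = np.mean((p - y) ** 2)

    # Backprop, written out by hand since that's the point of the exercise
    dp = 2 * (p - y) / len(X)
    dW2, db2 = h.T @ dp, dp.sum(axis=0)
    dh = (dp @ W2.T) * (1 - h ** 2)
    dW1, db1 = X.T @ dh, dh.sum(axis=0)

    # SGD update
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

print(f"final loss: {loss:.4f}")  # should end up near zero
```

Runs in a second on any laptop CPU, and scaling the same loop up is where all the real lessons start.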
Coldaine@reddit
Take all this money, and put it towards cloud computing time.
I'm not kidding. It was a pretty big hill to climb when I did it (with a math background, nobody I knew, and no experience with VMs or LLMs), and I had a fair amount of gear already. But now I can spin up any size VM from any provider to do anything from LoRA training to serving me oss120b at a firehose pace for an hour.
Arli_AI@reddit
I would learn on an Nvidia GPU unless you want to learn to get things running on ROCm instead. And I disagree with others saying you don't need gear to learn. You absolutely do in order to do a lot of the more advanced stuff (abliteration, quantization, finetuning, etc.) that might not be widely supported and could take you a long time to even get running in the first place.
buenavista62@reddit (OP)
Thanks. Generally, I really want to learn about the advanced stuff as well, and I am not afraid or discouraged by tweaking and looking for solutions for a few hours until something runs.
So you would rather buy a used PC? Or maybe my suggested build with only one GPU, a cheaper motherboard, only one SSD, etc. would easily cut the costs down by $3,000.
Arli_AI@reddit
I would get a RTX 5090 and as much RAM as you can afford, that's basically it.
jacek2023@reddit
I was able to "win" (gold medal) a Kaggle competition with a 2070. It was a model to identify images, back in 2019.
sleepingsysadmin@reddit
That CPU isn't enough for those two cards. It has ~24 PCIe lanes, minus some for other connectivity, whereas you'd want 32 lanes for just the GPUs and even more for everything else. You'll end up at x16/x4 or x8/x8 for those GPUs, and that'll be a performance problem in many cases.
My recent theorycraft, back when the R9700s were only $1,500, was much like your build.
Here's the thing: how about just one card? 32GB lets you run all the small models. The DDR5 RAM there will be enough to tackle Q4 medium-sized models like gpt-oss 120B. Let's say that's still a $4,000 computer, though. The $20/month cloud options will go a long way, and you get quality models you can't even run on that 64GB of VRAM.
Compilingthings@reddit
Running them at x8/x8 is fine for fine-tuning. PCIe will not be the bottleneck.
buenavista62@reddit (OP)
The CPU with this motherboard can allocate PCIe Gen5 x8 to each of the GPUs, which is like PCIe Gen4 x16 for each card, right? Should be enough, I guess.
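Quick sanity check on that math (assuming the standard 128b/130b line encoding used from Gen3 onward):

```python
# Per-lane throughput roughly doubles each PCIe generation, so Gen5 x8
# matches Gen4 x16 in raw bandwidth per direction.
GT_PER_S = {4: 16, 5: 32}    # transfer rate per lane, GT/s
ENCODING = 128 / 130         # 128b/130b line encoding efficiency

def bandwidth_gbs(gen, lanes):
    # GB/s per direction: GT/s * encoding efficiency / 8 bits per byte * lanes
    return GT_PER_S[gen] * ENCODING / 8 * lanes

print(bandwidth_gbs(5, 8))   # Gen5 x8:  ~31.5 GB/s
print(bandwidth_gbs(4, 16))  # Gen4 x16: ~31.5 GB/s
```

So yes, Gen5 x8 per card gives each GPU the same raw bandwidth as Gen4 x16.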
But I see. Probably using the cloud is the best option economically.
AlwaysLateToThaParty@reddit
Local hardware is a requirement for privacy but it will always be more expensive than a cloud provider. Especially if you're learning, cloud providers are better.
Mkengine@reddit
I can recommend this book if you are interested in finetuning.
buenavista62@reddit (OP)
Thanks!
xxPoLyGLoTxx@reddit
I wouldn't invest in an AMD AI rig personally. I say this as someone currently using an AMD GPU. They are much better value, but CUDA is far, far ahead. I hope I'm wrong, but Nvidia has always had better drivers and support than AMD.
ThatOneGuy4321@reddit
If you're a beginner, it's a lot more worthwhile to rent that infrastructure via cloud providers on a per-hour basis than it is to buy local hardware for it. It will take you a long time to get up to $6K in costs that way.
A top-of-the-line PC with 64GB of VRAM is unlikely to do anything that, say, a MacBook Pro with the same amount of unified memory (or 48GB, ~$2,400) can't do. A Mac Studio with 256GB of unified memory is also less expensive than your build, at $5,600.
VRAM (or unified memory) is far and away the most important spec for local inference, because you can only read a response so quickly. It's more important for the response to be accurate (more parameters) than for it to stream at high tokens/sec.
ImportancePitiful795@reddit
Well the above system doesn't seem to cost $6000.
2 R9700s are $2600, the rest can be done with $1100.
And IMHO you will be better off with a different base platform.
Either a 3960X/3970X (~$600 for the 3970X) with a board supporting four PCIe x16 slots, like the ASRock TRX40 Creation ($280), or, to reduce the cost even further, a Huanaznzhi X99-F8D PLUS (the one with the 6 PCIe slots) and 2x E5-2699 v4 (220 + 8x85).
And go for 4 R9700s.
buenavista62@reddit (OP)
The above system is built on PCIe Gen5 and DDR5 RAM. That's maybe why I get to $6,000. RAM prices are exploding at the moment as well, plus $200 for the case.
I will check your recommendations right now. The parts don't seem easy to get, at least where I live.
But generally, should I rather aim for PCIe Gen4/DDR4 to reduce costs?
Miserable-Dare5090@reddit
Ignore this, your use case is not slow-ass inference but training. You'll see many recs here from people who are either:
1. not thinking about how expensive RAM already is, because they have gobs on hand,
2. not thinking about training, and usually using models for some one-shot stuff or very small contexts,
3. running models at 5 tok/s inference and calling it "fast enough for me", or
4. wishing they had Ngreedia cards, so they push AMD, when in reality you will need CUDA to really learn.
Not trying to offend. It's just what I see commonly. People come asking for a business setup that will run huge models and go really fast, and people are recommending a Strix Halo, etc.
buenavista62@reddit (OP)
So you would rather get 2 5060 Ti than 1 Radeon AI Pro R9700, right? Don't you think that AMD can catch up in terms of software capability in the next few months?
Miserable-Dare5090@reddit
Hardware wise I think AMD GPUs are great.
It’s the software stack, scalability, and ubiquitous use of Nvidia CUDA that I base my comment on.
It's like Windows in the late '90s vs Linux. No one wanted to use Windows because it was a hellhole code monstrosity, but more than 90% of the world's computers at the time ran Windows.
woahdudee2a@reddit
Get any PC with 64GB RAM, then if you're feeling rich, buy an RTX 5090. Otherwise grab a 3090 from eBay. That's it.
thebadslime@reddit
If I had $6k to buy training shit with, I would get two ASUS Ascent GX10s.
buenavista62@reddit (OP)
Isn't the memory bandwidth a bit problematic?
StardockEngineer@reddit
Are you in this to learn? Because nothing will teach you more than having a mock cloud environment.
Also, I own one and fine tune all the time. It’s plenty fast because there is a lot less shuffling of data in and out of VRAM. This means larger batch sizes, fewer gradient updates, more stable gradient estimates, better convergence.
You can also fine tune larger models.
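The batch-size point is easy to demonstrate: accumulating gradients over microbatches and averaging gives exactly the same update as one big batch, which is why more memory means fewer, more stable updates. Toy linear model, made-up data:

```python
import numpy as np

# Gradient accumulation sketch: 4 microbatches of 8, properly weighted,
# reproduce the gradient of one full batch of 32.
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4))
y = rng.normal(size=32)
w = np.zeros(4)

def grad(Xb, yb, w):
    # Gradient of mean squared error for a linear model
    return 2 * Xb.T @ (Xb @ w - yb) / len(yb)

# One big batch of 32
g_full = grad(X, y, w)

# Same 32 examples as 4 microbatches of 8, accumulated with weight 8/32
g_acc = np.zeros(4)
for i in range(0, 32, 8):
    g_acc += grad(X[i:i+8], y[i:i+8], w) * (8 / 32)

assert np.allclose(g_full, g_acc)
```

More VRAM just lets you run bigger microbatches (or skip accumulation entirely), so each optimizer step sees a less noisy gradient estimate.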
buenavista62@reddit (OP)
Sounds promising! I really wanna learn. Is the more expensive GB10 a better option than the Strix Halo 128GB counterpart?
Inference speed is not as important to me as the ability to train models. The more the better
StardockEngineer@reddit
The Strix Halo just can't be clustered today. Not for inference (heavy performance loss). Not for training. Whereas the ConnectX in the Spark-related systems is plug and play.
Watch this video. While not strictly on this topic, it does cover a lot and he connects two together. https://youtu.be/sx6ANedcIfI
thebadslime@reddit
At 273 GB/s it's not great, but it's a CUDA system with a ton of RAM and TOPS. One would likely outperform the dual-GPU setup.
Nearby_Truth9272@reddit
I will second others on here: if you are just going to learn and practice with diffusion models and whatnot, and you are not training but only using them for inference, then maybe consider something for much less that is quiet and doesn't eat your power costs. I have owned these types of computers for at least a decade, in ML, AI, and blockchain work, and towers can be loud. The best way to get them quiet is liquid cooling.
I did this in 2021 for a new build with my child, at similar costs. Very quiet, very cool, a total waste of money, and the GPUs... I used them for all sorts of things. All I could find at the time were dual RTX 3080 10GB cards for GPU mining; a single 3090 would have been smarter. What I am saying here is, I doubt you will truly ever use more than one of those GPUs for 99% of what people do.
Both AMD and Nvidia cater to just this. For AMD, maybe consider the Corsair AI Workstation with the AMD AI Max 395, or, IMHO, if you can live with ARM and will not be gaming, try a GB10. The DGX Spark at my local Microcenter is quiet and small. If you want to save $1k, look at the ASUS or MSI GB10 models. $2k-$3k will more than likely do everything you need it to do. Case in point: I can run a whole local LLM, with full TTS inputs and outputs on GPU, with MCP integrations, on 16GB of VRAM. I can run an SLM and LLM with full cognitive processing, LIDAR point cloud mapping, etc. with 8GB.
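For reference, the back-of-envelope VRAM math behind that kind of claim. The model shape below is hypothetical (an 8B model with a Llama-3-8B-like layout), just to show the arithmetic:

```python
# Rough VRAM budgeting: quantized weights plus KV cache.
def weights_gb(params_b, bits):
    # params in billions, quantization in bits per parameter -> GB
    return params_b * bits / 8

def kv_cache_gb(layers, kv_heads, head_dim, ctx_tokens, bits=16):
    # 2 tensors per layer (K and V), fp16 by default -> GB
    return 2 * layers * kv_heads * head_dim * ctx_tokens * bits / 8 / 1e9

# Hypothetical 8B model, Q4 weights, 8k context
w = weights_gb(8, 4)                  # ~4 GB of weights
kv = kv_cache_gb(32, 8, 128, 8192)    # ~1.1 GB of KV cache
print(round(w, 1), round(kv, 2))
```

Add a couple of GB of overhead for activations and the runtime, and an 8B Q4 model with decent context does indeed fit comfortably in 16GB, with room left for TTS and friends.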
It is these crazy things that are truly driving our AI bubble and new generations of chips every year...
buenavista62@reddit (OP)
I would like to train models as well. I see your point, though. Maybe it's smarter to build something for half the price.
AppearanceHeavy6724@reddit
The R9700 has relatively low memory bandwidth. It might not be as important for fine-tuning, but it's certainly an issue for inference.
Smooth-Cow9084@reddit
As they say, start smaller: a 16GB Nvidia GPU, no DDR5, a modest motherboard. Once you know how to do things, upgrade appropriately.
Prestigious_Thing797@reddit
I think this is fine, but I would probably go for a cheaper CPU/mobo setup. A few-generations-old EPYC system would give you some extra slots for more GPUs if you get deeper into it.
I have been in the space since before LLMs and while renting the cloud can be economical, I much prefer/enjoy having my own hardware. In the cloud I always worry about leaving an instance on accidentally for a long time and getting charged a zillion dollars, and locally if I need to transfer big files around I can do it over my local network or even a USB stick way faster!
The cloud is a great tool, but it requires more mental overhead and worry, and having a computer sitting under a desk completely sidesteps that and makes it more fun IMO.
Own-Lemon8708@reddit
2x RTX 8000 will get you 96GB VRAM for ~$3,500. Add a cheaper CPU and you're well under budget with more VRAM and CUDA. Add an NVLink bridge (SLI) if you can take advantage of it too.
Django_McFly@reddit
In virtually all hobbies in life, I would recommend that a beginner trying to figure things out (who may not even know if they enjoy this yet) not take the option that requires multiple thousands of dollars in upfront costs.
Desperate-Sir-5088@reddit
Start with an MI50 (ROCm) or 3090 (CUDA) and an EPYC board.
Long_comment_san@reddit
Holy shit, I wish I could blow $10k on a fun experience. I can't do it in my country's currency; it's 80 times weaker than the dollar.
MannToots@reddit
You can learn a lot cheaper with an OpenRouter account, IMO.
buenavista62@reddit (OP)
I can only do inference with OpenRouter, right?
MannToots@reddit
Oh God no I'm dumb and missed that. Ignore me lol
buenavista62@reddit (OP)
All good :)