$3000: What GPU for my use case? Will my setup work? Dedicated machine instead?
Posted by CrayCJ@reddit | LocalLLaMA | View on Reddit | 21 comments
Hi! My PC has the following specs:
| Component | Spec |
|---|---|
| GPU (to be replaced, see below) | AMD Radeon RX 5700 XT, 8GB (PowerColor Red Devil) |
| CPU | AMD Ryzen 5 3600X, 3.8 GHz, 6-core |
| RAM | 2x 8GB, 3600 MHz, DDR4 (G.Skill Trident) |
| Motherboard | MSI B450 Gaming Pro Carbon AC |
| Power supply | Corsair HX 750 Platinum |
| Storage | 1TB – Adata SX8200 |
| Operating system | Windows 11 |
My LLM use case: academic research based on texts, and creation of teaching materials. I would like it to handle 50–100 mainly text-based PDF files, i.e. entire books: understand, search, compare, and summarize their content; assess and comment on the content across all files according to specific questions; and output quotes including exact page numbers. The LLM should also help with creating teaching materials based on a different stack of documents, mainly textbook PDFs – worksheets, dossiers, and if possible PowerPoint presentations, including pictures searched from the internet. In that regard, a certain creativity is welcome. No video, audio, or picture generation. Occasional and limited statistical work and coding. Autonomous use of the computer's UI – like Antigravity seems to be able to do – would be very helpful, but is probably not possible yet? My LLM knowledge is not very extensive yet...
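For context, the "quotes with exact page numbers" part implies keeping page metadata attached to extracted text. A minimal sketch of that idea (the sample data and naive keyword matching are purely illustrative assumptions, not a real RAG stack):

```python
# Sketch: once PDF text has been extracted per page (by any extraction
# tool), a naive search that returns quotes with exact page numbers.
# The sample "book" and the all-words-match scoring are assumptions
# for illustration only.

def find_quotes(pages: dict[int, str], query: str) -> list[tuple[int, str]]:
    """Return (page_number, sentence) pairs containing all query words."""
    words = query.lower().split()
    hits = []
    for page_no, text in sorted(pages.items()):
        for sentence in text.split("."):
            s = sentence.strip()
            if s and all(w in s.lower() for w in words):
                hits.append((page_no, s))
    return hits

book = {12: "Motivation drives learning. Feedback sustains motivation.",
        47: "Assessment without feedback rarely improves learning."}
print(find_quotes(book, "feedback learning"))
# -> [(47, 'Assessment without feedback rarely improves learning')]
```

A real pipeline would swap the keyword match for embedding search, but the page-number bookkeeping stays the same.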
My questions:
- I cannot estimate what size of LLM I need for this use case, but my current GPU is not good enough. Claude recommends an LLM size of 70B (at least 34B), considers an RTX 3090 sufficient, and is happy with my other components. I have a budget of about $3000, approximately the price of an RTX 4090/5090 here. What is your recommendation? Any specific GPU model? Do I need a better power supply?
I would also consider AMD, but Nvidia seems to be clearly preferred by this community. I don't know if a 4090 or 5090 is overkill, or whether there would be other bottlenecks with those...
- As this machine is also a hackintosh, which needs an AMD GPU, I'd like to keep my current GPU in the lower, slower PCIe 2.0 (x4) slot of my motherboard to drive the monitor, while the new GPU runs in the upper PCIe 3.0 (x16) slot for the LLM. Of course, I will run the LLM on Windows only. Is this dual-GPU setup an issue for the LLM?
- Or do you recommend building a dedicated LLM machine that hosts the LLM on the network to be accessed? That, of course, limits my budget for the GPU.
- Can you recommend a specific model? In the end-of-year thread, I've read good things about Qwen3-Coder-30B-A3B at Q4 and Q8.
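For the dedicated-machine question above, a minimal sketch of what network hosting looks like with llama.cpp's bundled `llama-server` (the model filename and server IP are placeholders, not recommendations):

```shell
# On the dedicated machine: serve a GGUF model over HTTP.
# Binding to 0.0.0.0 makes it reachable from the LAN.
llama-server -m ./qwen3-coder-30b-a3b-q4_k_m.gguf --host 0.0.0.0 --port 8080

# From any machine on the network: llama-server exposes an
# OpenAI-compatible chat endpoint.
curl http://<server-ip>:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Summarize chapter 3."}]}'
```

LM Studio and Ollama offer equivalent one-click network serving, so the dedicated-machine option does not lock you into a particular frontend.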
Many thanks for your help!
Little-Tour7453@reddit
I would say find a used Mac Studio with 128GB RAM.
CrayCJ@reddit (OP)
Huh, thanks. A used Mac Studio (M4 Max, 16-core CPU, 40-core GPU, 128 GB RAM, 1TB SSD) seems to be definitely in the ballpark of my budget (even new) and would be a complete, hassle-free, small-form-factor solution... Do you know what size of LLM I could run on that?
Little-Tour7453@reddit
I would say with 96GB of RAM you can run a Q4 90B if you squeeze it hard. MLX Swift would be the best framework for that.
But the question is: do you really need 90B? Even a 35B Qwen is pretty much head-to-head with Sonnet when it comes to reasoning.
Yeah, Mac Studios are hard to find. I've been on a waiting list for two months for a 256GB one.
CrayCJ@reddit (OP)
Thanks for all your replies! You're right, maybe I can settle for less RAM on the Mac Studio, or even a less powerful Mac Studio altogether, for my needs: the unsloth version of Qwen3.5 35B is 70GB at BF16, just 38GB at 8-bit quantisation, and 30GB at 6-bit...
The "cheapest" new Mac Studio that is actually available (I cannot find any used or older models) is an M4 Max with 14/32 CPU/GPU cores, 36GB of RAM, and 2TB, shipping in 2 months at $2300. The same with 16/40 cores and 64GB of RAM, also shipping then, is $2900, within my budget... The previously mentioned Ultra with 28/60 cores is $3900, but with the 2TB you recommend...
What do you think is the minimum amount of RAM I need?
Little-Tour7453@reddit
For 35B, 96GB of RAM would chew it raw. With less than that, you'll be in 'not enough headroom for KV cache' territory.
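A rough back-of-the-envelope sketch of where the memory goes (the layer counts and head dimensions below are generic assumptions for a 35B-class dense model, not the specs of any particular release):

```python
# Rough unified-memory/VRAM estimate for a dense model plus its KV cache.
# Architecture numbers are illustrative assumptions, not real model specs.

def model_weights_gb(params_b: float, bits_per_weight: float) -> float:
    """Weight memory in GB for params_b billion parameters."""
    return params_b * 1e9 * bits_per_weight / 8 / 1e9

def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context: int, bytes_per_val: int = 2) -> float:
    """KV cache in GB: 2 (K and V) * layers * kv_heads * head_dim * tokens."""
    return 2 * layers * kv_heads * head_dim * context * bytes_per_val / 1e9

weights = model_weights_gb(35, 4.5)   # ~Q4 with quantization overhead
cache = kv_cache_gb(layers=64, kv_heads=8, head_dim=128, context=32_768)
print(f"weights ~ {weights:.1f} GB, KV cache ~ {cache:.1f} GB")
```

Under those assumptions a Q4 35B plus a 32K fp16 KV cache lands around 28GB before runtime overhead, which is why 36GB configurations get tight and 64GB-plus gives comfortable headroom.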
My honest take: get the best setup you can afford. That way you're covered for the next 4-5 years instead of having to replace what you have in just 2.
Because I think we're in a sweet spot with local models around 30-70B. Bigger than that doesn't really add any benefit for coding, research, or tool usage locally, and as we can see with Gemma, model makers are focusing on context management rather than making bigger models.
CrayCJ@reddit (OP)
I see your longevity argument...! On another note: Why do you recommend 2TB storage instead of 1TB?
Little-Tour7453@reddit
Because you download one 35B, then another one pops up, then an Opus distill comes in, then Google releases Gemma 16373747, and you end up with dozens of models that you switch between.
This is the fun part, don’t miss it.
CrayCJ@reddit (OP)
:) Alright, I see. Thank you for all your replies and your time! I'll consider all of this. Have a great day!
Little-Tour7453@reddit
No worries. Enjoy whatever new setup you get.
Little-Tour7453@reddit
Many will say the opposite, but Apple Silicon + MLX + unified memory performs better than running models on a GPU, especially for long-running tasks where you need to stack the KV cache up.
And as a bonus, you can use Apple Intelligence for tool-using tasks on the Neural Engine, which costs almost nothing since it's standalone hardware.
Little-Tour7453@reddit
Oh, go with the 2TB SSD. You will thank me later.
Middle_Bullfrog_6173@reddit
The cheapest way to get OK inference is to add a GPU to your current system. But the low RAM and relatively slow CPU will limit what you can do. Even when your model fits fully on the GPU, you need the CPU to move data around, run the drivers, etc., and do the actual work that issues the inference calls.
I would not put a $3k GPU into that system unless you plan to upgrade soon and move the GPU over. A system upgrade with fewer bottlenecks, even with lower raw GPU performance, would be an overall better experience IMO.
CrayCJ@reddit (OP)
Thank you for your response. I understand your point regarding the bottlenecks of my current system. I think upgrading the RAM to at least 32GB would be necessary regardless of the GPU. Can you recommend a $1500-2000 GPU, so that I could afford the other necessary upgrades (RAM, maybe a new CPU and motherboard)? Do you have a decent CPU in mind? Thank you!
Middle_Bullfrog_6173@reddit
I would personally just buy either the B70 that another poster mentioned or an AMD/Nvidia GPU of similar caliber (more expensive) if you are less adventurous. Then see if you run into bottlenecks in practice or can live with your system.
The next step would be a full platform upgrade to AM5 and DDR5. I don't think it makes sense with current prices to invest in your existing platform unless you find e.g. a good deal on used memory.
CrayCJ@reddit (OP)
OK, I see: either a decent, but not overkill, GPU with my current system, or a dedicated system, which is way more expensive...
jikilan_@reddit
Try upgrading the RAM to the max.
You're being constrained by the old platform. You can get a good GPU first, then upgrade the rest later.
Based on what I see here, you are a value-build seeker. Maybe a modern Nvidia 16GB GPU that you can afford first; otherwise, just the R9700 Pro.
CrayCJ@reddit (OP)
Thank you! I hear you regarding the bottlenecks of my system. Others have also suggested the Pro R9700. Do you have a specific Nvidia GPU in mind?
TuskNaPrezydenta2020@reddit
Buy a Radeon Pro R9700 and spend the rest on getting a decent cpu/storage
CrayCJ@reddit (OP)
Thank you for recommending the R9700. I hear you (and others) regarding a RAM upgrade! Any CPU in mind? Do I need more SSD storage?
Puzzleheaded_Base302@reddit
If you plan to spend $3000 on a GPU, I would recommend the RTX PRO 4500 32GB Blackwell (not the Ada Lovelace one). It will run qwen3.5-27@q4 with up to 115K context length in LM Studio (llama.cpp). You must have 32GB of system memory, or LM Studio keeps crashing with out-of-memory errors at long context. It will give you 36 tokens/s on the dense model, with reasonably good tool-calling capability. You do not need to upgrade anything else in your PC.
Buying a 3090, 4090, or 5090 online is risky. I see a lot of scammers on eBay selling them at 1/3 of the market rate; the accounts are all created within the last few months with 0 sales. The RTX PRO 4500 can be purchased new from legitimate channels, and they are in stock. So it is the safer bet.
If speed is not a problem, the Intel Arc Pro B70 can run all the models the RTX PRO 4500 can, but at 1/3 of the token rate. It also costs only 1/3 as much, at $950.
When you evaluate options purely on token rate, note that most reviews only discuss high-concurrency use cases, because those numbers look good. Your use case is single-query with no concurrency, which is different.
CrayCJ@reddit (OP)
Thank you for your really quick (!) and comprehensive answer! Greatly appreciated, also the buying warning! I will consider the RTX PRO 4500 – what you say regarding the xx90s seems to be true here too. Also, thank you for the model recommendations!
Yes, it's about fun and our OWN thing. So, let the spice flow...
(Regarding macOS – which is irrelevant in this sub, of course – no, there are no drivers for most Nvidia cards under macOS; that's why I'd keep the current card in the slower PCIe slot. Hopefully macOS will therefore just ignore that GPU and continue running as is. I will ask this on r/hackintosh to make sure.)
Have a great day!