AI server under 5k?
Posted by Last_Bad_2687@reddit | LocalLLaMA | 58 comments
I have a framework desktop 128GB and a 3080 12GB running qwen 7b
I want to move to a proper server rack + switch but not sure how to move from desktop PC to server rack.
Any advice on what GPU/Server to get under 5k? Or at that price just stick to workstation?
Moscato359@reddit
There is nothing "proper" about server racks.
They actually are generally less efficient about cooling, due to constrained size.
The purpose of a server rack is to maximize the amount of hardware per volume, but that doesn't make sense when you have a single machine
Last_Bad_2687@reddit (OP)
Got it. So if space (rather, density) isn't a concern workstation is "better"?
cab938@reddit
Yes, for several reasons: a) rack chassis are often so thin that they are space constrained and use small fans. A small fan needs to spin faster to move the same amount of air and thus creates a ton of noise. A workstation chassis doesn't have this limitation. b) the extra room in a workstation gives you options for cards (e.g. the Max-Q RTX PRO 6000, which limits power draw to 300 W), allowing you to potentially save money or plan for near-future expansion (say, buying a 5k card and later going to a dual setup).
Workstations are bad choices if you lack space and need density, and/or you have excellent cooling available and want other data center values like high speed networking, redundancy, disaster management, etc.
Moscato359@reddit
You can have high speed networking, redundancy, and disaster management with a workstation
Moscato359@reddit
Yeah desktop is better if space is not a concern
AuditMind@reddit
Absolutely. Workstations are built exactly for this purpose.
d4nger_n00dle@reddit
Yes
alex20_202020@reddit
Just as people here start with one GPU, this person may start with one server and end up with a full rack.
expressly_ephemeral@reddit
A similar thing happened to a girlfriend of mine who got addicted to breast augmentation surgery.
the-username-is-here@reddit
You are lucky man you. Regular boating trips!
Moscato359@reddit
A rack mount is just a pc case
You can just rebuild into it in the future if you want
alex20_202020@reddit
Yes, one can place the motherboard on a table and use it that way, then put it into a case later if they want. People rarely do that, though, because they want the hardware to look nice.
Moscato359@reddit
They already have a case...
El_90@reddit
That single sff also needs a full switch ;)
Lissanro@reddit
There is no reason to use server rack unless you are putting your rig into a datacenter. Up to two GPUs, normal PC case is usually sufficient. If you plan to have more GPUs than that, mining frames work the best by providing better airflow and plenty of space for additional PSU, HDDs/SSDs and other hardware.
For GPU-only inference with up to two GPUs, usually even a gaming motherboard is fine, assuming it has a slot for the second GPU. On a low budget, the 3090 still remains one of the best options, or if you really want something newer, a 16GB card from the 5xxx series may make sense. The 5090 is currently way overpriced in most places, so my opinion is to either buy less expensive card(s) now or save up more for an RTX PRO 6000.
Also, I suggest trying Qwen3.6 35B-A3B, even if you have to offload to RAM, it may work very well. Especially if you buy an additional GPU with 16GB or 24GB VRAM.
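One reason a model like this can tolerate RAM offload is that token generation is mostly limited by how many weight bytes must be read per token. A rough sketch of that ceiling follows, assuming "A3B" means about 3B active parameters per token; the bandwidth and quantization figures are illustrative placeholders, not measurements of any particular machine.

```python
def decode_tokens_per_s_ceiling(bandwidth_gb_s, active_params_billions, bits_per_weight):
    """Rough upper bound: each generated token reads every active weight once."""
    gb_read_per_token = active_params_billions * bits_per_weight / 8
    return bandwidth_gb_s / gb_read_per_token


# Illustrative inputs only (assumptions, not benchmarks):
# 100 GB/s of memory bandwidth, 8-bit weights.
moe_ceiling = decode_tokens_per_s_ceiling(100, 3, 8)     # ~3B active parameters
dense_ceiling = decode_tokens_per_s_ceiling(100, 35, 8)  # all 35B read every token

print(f"MoE with 3B active: up to ~{moe_ceiling:.1f} tokens/s")
print(f"Dense 35B:          up to ~{dense_ceiling:.1f} tokens/s")
```

Real throughput will land below these ceilings, since the estimate ignores KV cache reads, compute time, and transfer overhead.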
Last_Bad_2687@reddit (OP)
The consensus in the comments seems to be 2x 3090 in workstation format for the budget
kmouratidis@reddit
If you can find them at decent prices! Don't pay >$1k for them. At that price, either go for single Nvidia GPU but more VRAM if you care about image/video generation, or dual R9700 for more VRAM.
FusionCow@reddit
with 5k, (assuming that the 5k includes selling the 3080 and framework), you should buy a 3090, 64gb ram, and probably a 7950x cpu. I don't even think this will quite cost 5k, but in my opinion a 4090 is not worth the jump over a 3090. either go all the way to a 5090, or go to a pro 5000 or 6000
CalligrapherFar7833@reddit
For 5k your best perf option is probably to just upgrade the 3080 to something with bigger vram
YourNightmar31@reddit
2x 5090 or 3x3090 with new psu maybe
I1lII1l@reddit
technically the truth, OP just wrote 5K, he might have meant 5K grams of gold, or 5K diamonds....
cleversmoke@reddit
In today's prices, I'm seeing 1x 5090 = 3x 3090 😭
Last_Bad_2687@reddit (OP)
So keep workstation configuration and just throw in cards right? No real benefit to server config?
vasimv@reddit
In a server you'll get ECC memory, hot-swap HDDs/SSDs and hot-swap power supplies. The first is good to have but has no real point for LLM stuff (there is no ECC on the video card anyway without losing memory and performance). The second and third are nearly useless for home builds too. Also, the noise level will be much higher.
the-username-is-here@reddit
If you want low-maintenance and compact option, DGX Spark would fit into price range. It runs 120B-class models with acceptable performance, don't think any server would match that for the money.
__JockY__@reddit
RTX 5000 Pro and a junker to put it in. You get Qwen3.6 27B FP8 with over 200k tokens of KV at BF16. It’ll do prefill at 4400 tokens/sec and inference at 80 t/s. Works perfectly with Claude cli, is fully multi-modal, and the 5000 runs quiet, cool enough, and is 300W.
There isn’t a better deal around.
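As a sanity check on why a 27B model at FP8 leaves room for a large KV cache, here is a minimal sketch of the weight-size arithmetic. It counts weights only; the 48 GB figure is the VRAM quoted for the RTX PRO 5000 elsewhere in this thread, and the KV cache size itself is not modeled because it depends on architecture details not given here.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in decimal GB."""
    return params_billions * bits_per_weight / 8


# 27B parameters at FP8 (8 bits per weight), as in the comment above.
weights = weight_memory_gb(27, 8)

# 48 GB is the card size quoted elsewhere in this thread.
headroom = 48 - weights

print(f"weights: {weights:.1f} GB")
print(f"left for KV cache and runtime overhead: {headroom:.1f} GB")
```

The same function gives a quick feel for other quantizations, e.g. `weight_memory_gb(35, 4)` for a 35B model at 4 bits per weight.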
c_pardue@reddit
i built a 64gb vram open air server for 2,500.
spec out a threadripper, eatx board, and some GPUs.
look for higher gen pcie slots, i'm on four pcie gen3 8x and it is a bottleneck of sorts.
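To put the gen3 x8 bottleneck in numbers, here is a small sketch of theoretical per-direction PCIe bandwidth. It uses the standard per-lane transfer rates and 128b/130b encoding for gen3 through gen5; real-world throughput is lower once protocol overhead is included.

```python
# Per-lane raw transfer rate in GT/s for each PCIe generation.
GT_PER_S = {3: 8.0, 4: 16.0, 5: 32.0}

# Gen3 through gen5 use 128b/130b line encoding.
ENCODING_EFFICIENCY = 128 / 130


def pcie_gb_per_s(gen: int, lanes: int) -> float:
    """Theoretical one-direction bandwidth in GB/s, before protocol overhead."""
    return GT_PER_S[gen] * ENCODING_EFFICIENCY * lanes / 8


for gen, lanes in [(3, 8), (3, 16), (4, 16), (5, 16)]:
    print(f"PCIe gen{gen} x{lanes}: {pcie_gb_per_s(gen, lanes):.2f} GB/s")
```

By this estimate a gen4 x16 slot offers four times the bandwidth of a gen3 x8 slot, which matters most when layers or tensors are split across cards.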
XN8DY8VBMU4E3DP4LXBT@reddit
This is the way
Last_Bad_2687@reddit (OP)
Sick thank you
grabber4321@reddit
3x r9700 pro should fit that budget and a bigger PSU.
AMD is not the best, but the amount of VRAM you get for the money is more than you can get from any other company.
Last_Bad_2687@reddit (OP)
ROCm? I am using ollama and want to switch to llama.cpp; this seems like the deep end lol
grabber4321@reddit
if you want the proper amount of VRAM to run good models, it's the way.
This guy runs 2x 9700: https://www.youtube.com/@donatocapitella/videos
Lots of videos on 9700 pros
Last_Bad_2687@reddit (OP)
Awesome pretty sure that's the strix halo toolbox maintainer for vlm
sooki10@reddit
Get 3 or 4 RTX PRO 2000 Blackwells; bandwidth isn't the highest, but it's acceptable.
jikilan_@reddit
Able to share any performance numbers?
DataScientist305@reddit
Just find GGUF version of models
Clear-Ad-9312@reddit
For 5k? RTX pro 5000 48GB or DGX Spark (Asus GX10).
or if you have enough pcie lanes and love to be experimental, then 4x Intel b70 cards (32GB x 4 = 128GB)
but prices are crazy, so the only realistic option is to try to double the budget for an RTX PRO 6000, which in my spreadsheets wins in terms of performance per cost. You only need one PCIe slot, and it is 96GB. Nothing else, aside from 3090s @ $1k (getting rarer) and Intel's cards, is going to be cheap enough to run a large model fast enough.
DGX Spark is pretty convincing though for the large memory needs, but it has weak token generation that MTP seems to help with.
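A spreadsheet comparison like this can be sketched in a few lines. The version below ranks cards by VRAM per dollar only, using prices that commenters quote in this thread (3090 at about $1k, 32GB 5090 at about $4K, 32GB MI50 at about $600); treat those as rough forum figures, not market data.

```python
# (name, VRAM in GB, price in USD); prices are the rough figures quoted in this thread.
cards = [
    ("RTX 3090", 24, 1000),
    ("RTX 5090", 32, 4000),
    ("MI50 32GB", 32, 600),
]


def gb_per_kilodollar(vram_gb: float, price_usd: float) -> float:
    """VRAM you get for every $1,000 spent."""
    return vram_gb * 1000 / price_usd


# Most VRAM per dollar first.
ranked = sorted(cards, key=lambda c: gb_per_kilodollar(c[1], c[2]), reverse=True)

for name, vram_gb, price_usd in ranked:
    print(f"{name}: {gb_per_kilodollar(vram_gb, price_usd):.1f} GB per $1k")
```

VRAM per dollar says nothing about speed, so a full comparison would also need a throughput column; another commenter in this thread warns that prompt processing on the MI50 would be very slow.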
lebbi@reddit
I just put the Framework motherboard into a 2U rack mount case and now it's a server
1ncehost@reddit
I built a 4x MI100 (128 gb vram total), 48 core epyc server in January for $5.5k. It is ATX form factor in a tower case. I've been happy with its performance. I run it on low power profiles so it uses about 700w full load.
I have a strix halo laptop and due to pcie constraints the gpu server isn't as fast in tg but is much faster in pp. So generally if you are planning to run large contexts go GPU but if you are planning to run small context go with one of the lpddr5 solutions. I recently saw the asus gb20 box on amazon for $3.5k which is a great option.
FearFactory2904@reddit
Look at your tower.
Say "this is my server."
Now turn it sideways.
Slide it into a rack shelf.
Now say "this is my rackmount server."
Alternatively, if you need to fit a dozen of them in the rack, the normal rackmount servers are more condensed. A cheap rackmount server is basically like a flattened-out desktop PC. An expensive rackmount server can have things most desktops can't, like over a TB of RAM, 20+ physical disks, dual CPUs and redundant power supplies, etc.
Coincidentally, all that density makes it hard to just get an old server and try to shove in multiple GPUs. I can fit more GPUs in my threadripper desktop case than I can in my poweredge rack servers.
You can get larger rackmount servers specifically made for having many enterprise GPUs, but those would cost your firstborn, a kidney, a second mortgage, and then some. Alternatively, there are also really tall rackmount cases that you can stick a normal PC motherboard and such in if you want to stick with PCIe card GPUs. But that's only more convenient than a desktop case if you have to stack a bunch of them on top of each other. Hopefully that helps give a little perspective.
no_witty_username@reddit
What you are describing is 5090 PC territory, no "proper enterprise level server" necessary.
Saraozte01@reddit
I am using a mac studio M3 Ultra @ 256gb. Works very well for inference and cost me slightly over $5k
mat_le_mat@reddit
dgx spark is pretty good
ttkciar@reddit
I'm a big fan of Supermicro servers based on either X10DRU-i+ or X10DRI-T4+, which are LGA2011-3 systems (E5 v3 and v4 Xeons). They're not fast, but they're cheap ($1K or less) which leaves you plenty of budget for your GPU(s), which is what does the heavy lifting for inference anyway. Go with the twelve-bay servers even if you're not going to fill it with hard drives, because you'll want the extra room for the GPU, the GPU's PCIe power cables, and airflow.
You should be able to find a 32GB RTX 5090 for about $4K if you shop around a bit. There are also 32GB MI50 to be had on eBay for about $600, but your prompt preprocessing time would be very long with those.
Since RAM is so expensive you'll probably want to use RDIMMs and only fill half of the server's memory channels anyway, but if you decide to fill all of the server's memory channels be warned that you will have to use LRDIMMs. Be sure to download the owners manual for the server and read about memory configuration before you order the memory.
Annual_Award1260@reddit
I have an X10DRi-T with 1TB RAM. PCIe 3.0 is kind of a deal breaker
a_beautiful_rhind@reddit
The numa and the slow procs are more of a deal breaker these days. For all that money you spend on DDR4 it will hybrid like a dog.
ttkciar@reddit
Really? PCIe-3.0 hasn't been an issue for me at all. The slowness of its DDR4-2133 is more significant.
SailbadTheSinner@reddit
What is your ultimate goal? What does the power situation look like where this is going to live? How noise tolerant are you? How does the money for this project come in (one big blob in your possession now, smaller repeating amounts like a paycheck, or infrequent larger amounts like quarterly bonuses)? That may determine if/how you scale out. For example, my AI rig is an open frame, ASRock Rack romed8-2t, 512GB, EPYC 7F52, 2x1600W power supplies and 6 3090 FEs. I have enough money in that rig that I could technically have fewer higher-VRAM GPUs in a better form factor, or a Mac ultra, or a DGX Spark cluster, but for me the money wasn’t coming in a pattern that facilitated any of that. I sunk my quarterly bonus checks from work into it over time. I wasn’t about to go into debt for the project, but I also knew I would fall behind if I didn’t get started, so my pattern was to use less expensive older technology (EPYC Rome and 3090s) and scale up over time in chunks that I could afford.
_angh_@reddit
Wouldn't going with a Mac mini be the best option here? For this budget it could be fine.
Another option, get a pc and Chinese 96gb gpu.
Last_Bad_2687@reddit (OP)
Super hard to find. Would love the 512GB Mac studio but the ones on eBay are most likely scams. I found one for "3k"
Kal-LZ@reddit
Get 2 Radeon R9700, Core Ultra 270K, Z890 board with 8x bifurcation. You will run 27-35B models with high context.
Last_Bad_2687@reddit (OP)
Is that ROCm or Vulkan? I use the strix halo toolboxes for the framework desktop but the CUDA stuff for my 3080. Not sure how Radeon cards work
etaoin314@reddit
spend the money on a rtx 5000 with 48gb of ram or if you are feeling really spicy, 3 r9700 from amd if you have the pcie lanes for it.
Last_Bad_2687@reddit (OP)
I think my mobo only has 2, it's a B450
jojotdfb@reddit
You can just buy server cases. I have a nice Rosewill 3U that I threw an N100 motherboard into with a bunch of hard drives as a NAS. You could do the same: take your current desktop build and just put it in a rack case. Add an Ikea Lack end table as the rack and you're good to go.
Last_Bad_2687@reddit (OP)
Sick! Thank you
alex20_202020@reddit
If you want "proper server rack + switch", start with selecting a rack that will fit nicely.
BTW, "switch" - what do you mean exactly? Network Switch?
Fine_Nectarine9328@reddit
Just upgrading to a bigger GPU is the best option