I Put a Datacenter GPU in My Gaming PC for £200

[-]

looktotheson@reddit

Incredibly well written blog. Thanks for sharing! Might give this a shot later this year as well

Reply

[-]

cafedude@reddit

Makes me wonder what we'll be able to get for fairly cheap when the current generation of datacenter GPUs is retired.

Reply

[-]

This topic feels like the elephant in the room. Companies with no profits are taking on enormous debt to build datacenters with gpus that have a 6 year life at 24/7 use. Now with the right cooling etc, maybe they push to 8, but even at 8 years, replacing all these gpus will almost be the same capex spend as it is to build the DCs right now. It looks to me like a 6 year long scam, because there is no way they can make a market out of what they are doing now. Divided down into per token cost, including all operating cost and sunk cost, over a 6 year period, will just be wildly too expensive for most people to use. I suspect that they are using this 6 year period and all the compute to train models which they otherwise wouldn't be able to train. After that, I bet AI is a whole lot less available to the public, and the rug will be pulled from under anyone who built up reliance on it. One massive scam. So a few people can end up with a wildly powerful AI for their own use.

Reply

[-]

stonktraders@reddit

And 6-year-old GPUs won’t be considered obsolete when they are decommissioned. Just look at how slowly NVIDIA is rolling out their products these days, how little raw performance gain there is in each generation, and how they even cut back on memory bandwidth.

Reply

[-]

etaoin314@reddit

Performance gain from Ampere to Ada = 2x. Ada to Blackwell = 2x+. That seems pretty good to me, so I am not sure what you are talking about there. Yes, the next gen is a bit slow to come out and who knows how fast Vera will actually be, but nothing I have seen suggests that it is going to underperform.

Reply

[-]

stonktraders@reddit

The 2x you are talking about comes from fp8 and fp4 support. The raw performance gain in each generation is just around 20-30% without frame gen.

Reply

[-]

TimeSalvager@reddit

...DCs are a lot more than just the hardware. When they do a hardware refresh, all the physical infrastructure that houses the hardware is still fine. The capital outlay for labor, concrete pours, power distribution and everything that isn't the hardware is substantial, but still completely valid during a hardware refresh.

Reply

[-]

Objective-Picture-72@reddit

Right, but the GPU hardware cost is like 60% of a data center's cost right now. So the 40% residual has a long useful life, but 60% of the cost is essentially depreciated in 6 years. Not sure if the major AI labs get a return on that 60% over 6 years, considering how low the pricing of AI tokens is right now.

Reply

[-]

CorpusculantCortex@reddit

Yes, but in 6 years the hardware will be more performant and cheaper. Also, for general-purpose models there are multiple companies making model-on-chip accelerators that cost a fraction as much to produce and process much faster. So once the growth phase levels out and there are stable, performant models, available hardware will adjust to reduce costs. Plus 6 years is a lot of time for revolutionary thinking in an essentially brand new sphere. Every 6 months or less a paper is published that makes models smaller, faster, and more reliable in the next generation. And the biggest thing is tokens aren't that cheap. For agentic professional flows it is serious money, so the other side of it is that users will necessarily find more efficient ways to leverage the tools than connecting MCP to Claude and saying "pull out xyz data and make me a report". That is wildly token inefficient, and long term a company can't justify spending $3 every time an internal report needs to be run. But that is how people are using it, because people who don't know how to design data and software systems are being given a tool that lets them think they can. There is no way to predict where it will go.

Reply

[-]

Objective-Picture-72@reddit

I have no idea what you're trying to say tbh

Reply

[-]

XeNo___@reddit

Datacenter hardware is also more than just the compute nodes. Think of all the networking stuff like switches, NICs, ... Just the switches (depending on the topology) can get *very* expensive very fast. All of that remains untouched when a HW refresh comes around.

Reply

[-]

thehpcdude@reddit

Absolutely do refresh switches, especially on back end and InfiniBand fabrics. Even the front end Ethernet stuff gets replaced most of the time.

Reply

[-]

Aphid_red@reddit

Given that NVidia sells its chips for $50K a piece or more, the rest of it is a rounding error.

Reply

[-]

Emotional-Dust-1367@reddit

> Divided down into per token cost, including all operating cost and sunk cost, over a 6 year period, will just be wildly too expensive for most people to use.

Do you have the math for that cost-per-token divided down? I’d love to see it because I don’t think the GPU cost is the largest thing affecting it
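For anyone who wants to run the numbers themselves, here is a minimal sketch of the amortization being argued about. The $50K GPU price and the 6-year life are figures quoted in this thread; the power draw, electricity price, throughput, and utilization below are placeholder assumptions, not measurements, and the function ignores networking, staff, and the building entirely.

```python
def cost_per_million_tokens(gpu_price_usd, life_years, power_kw,
                            usd_per_kwh, tokens_per_second, utilization):
    """Amortized GPU + electricity cost per million generated tokens.

    utilization is the fraction of wall-clock time the GPU is serving
    paying traffic (0 < utilization <= 1).
    """
    hours_of_life = life_years * 365 * 24
    capex_per_hour = gpu_price_usd / hours_of_life      # straight-line
    power_per_hour = power_kw * usd_per_kwh             # paid even when idle
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return (capex_per_hour + power_per_hour) / tokens_per_hour * 1_000_000


# Thread figures: $50K per GPU, 6-year life.
# Assumed (hypothetical): 1 kW draw, $0.10/kWh, 1000 tokens/s aggregate
# batched throughput, 35% utilization.
example = cost_per_million_tokens(50_000, 6, 1.0, 0.10, 1000, 0.35)
print(f"~${example:.2f} per million tokens under these assumptions")
```

The result swings heavily with the throughput and utilization inputs, which is why the two sides of this argument can both sound plausible.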

Reply

[-]

quantgorithm@reddit

even the efficiency of the token needs to be measured at some point. Not all tokens are equal.

Reply

[-]

Ell2509@reddit

I have only done "back of a napkin" math, so not yet; however, I was actually going to do some research and run the numbers myself. I worked for one of the "big 4" in an earlier part of my life, and am really curious to see. Lots of what these big companies, connected to AI, are doing is really shady from an accounting point of view.

Reply

[-]

Emotional-Dust-1367@reddit

It would make a fascinating YouTube video

Reply

[-]

Puzzled-Formal-9207@reddit

This is actually an interesting insight. I was actually thinking that AI shouldn't be available to the public and should only be used by institutions like NASA etc for meaningful work rather than having AI slop everywhere. Now reading that the current situation is unsustainable due to the gpu lifespan would hopefully turn this into reality. The world is meant for humans. Water is meant for humans. Not machines! Thanks for sharing.

Reply

[-]

I_Will_Eat_Your_Ears@reddit

> Now reading that the current situation is unsustainable due to the gpu lifespan

Don't believe everything you read. As many redditors have pointed out, he was completely incorrect. Capex gets depreciated, so the paper value of the GPUs will be zero, but they'll still have market value. If they choose to sell the GPUs, they can buy replacements and drop them into an existing facility. This already played out with crypto miners. It only stops if the demand for compute does.

Reply

[-]

thawizard@reddit

/r/LostRedditor

Reply

[-]

I_Will_Eat_Your_Ears@reddit

> replacing all these gpus will almost be the same capex spend as it is to build the DCs right now.

In general, technology gets better and cheaper as time goes on. When they refresh, they can drop the replacement GPUs into a facility with all the other services up and running (power, cooling, networking, a building, racks, etc). Basically, there's nothing to suggest the capex spend in 6 or 8 years will be anything close to the initial one.
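A rough sketch of the refresh argument, using the 60% GPU share and 6-year schedule claimed elsewhere in this thread. Both figures are commenters' estimates, and the price ratio of replacement GPUs to the originals is a free parameter nobody here actually knows.

```python
def refresh_fraction(gpu_share, price_ratio):
    """Refresh capex as a fraction of the original build cost.

    gpu_share:   portion of the original capex spent on GPUs (thread: ~0.6)
    price_ratio: replacement GPU cost relative to the originals (assumed)
    """
    return gpu_share * price_ratio


def straight_line_book_value(cost, life_years, age_years):
    """Paper value under straight-line depreciation; says nothing about
    what the hardware would fetch on a secondary market."""
    remaining_years = max(life_years - age_years, 0)
    return cost * remaining_years / life_years


# If replacements cost the same as the originals, the refresh is 60% of
# the first build; if they cost half as much, it is 30%.
print(refresh_fraction(0.6, 1.0), refresh_fraction(0.6, 0.5))
# Book value of a $60 GPU share after 3 and 6 years of a 6-year schedule.
print(straight_line_book_value(60, 6, 3), straight_line_book_value(60, 6, 6))
```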

Reply

[-]

PinotGroucho@reddit

No it's not a scam in the sense that the goal is to defraud people of money. It's more a civilizational "All-in" bet that whoever has a 5 year head start in AI, while at the same time sucking all the oxygen out of the room for potential competitors and having the pension funds pay for it all, wins the game

Reply

[-]

timfduffy@reddit

Frontier labs have strong margins on inference when taking depreciation into account. Heck Anthropic is about to have a profitable quarter for the first time, and profitability takes depreciation into account. IIRC the labs and hyperscalers are assuming something like 5-6 year depreciation schedules for their GPUs.

Reply

[-]

NandaVegg@reddit

For Anthropic's profitability it is "observed" by a bear analyst (though he is a permabear, he does have sharp eyes) that it took time-limited free compute offered by xAI into account, so it won't last long. However I have no doubt that API inferencing service itself is generally profitable. The same researcher says that OpenAI paid around $1.30/h for A100s in the past year, which is below market rate and they would turn profitable at that rate assuming mid-to-high average compute utilization (that at least 30-40% of capacity is always being in use by customers, 24h). The problem for both Anthropic and OpenAI is that both parties are oversubscribed to future compute obligation (OpenAI especially).
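To make the utilization point concrete, here is a small sketch of what a rented GPU-hour effectively costs per hour of paying traffic. The $1.30/h rate and the 30-40% utilization range come from the comment above; the function itself is just the standard idle-time adjustment, not anyone's actual accounting.

```python
def cost_per_busy_hour(rental_usd_per_hour, utilization):
    """Effective cost of one hour of customer-serving compute when the
    GPU is rented around the clock but only busy part of the time."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return rental_usd_per_hour / utilization


for u in (0.30, 0.40, 1.00):
    print(f"utilization {u:.0%}: ${cost_per_busy_hour(1.30, u):.2f} per busy hour")
```

Revenue per busy hour has to clear that figure before the service breaks even on compute alone.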

Reply

[-]

quantum_splicer@reddit

At end of life for the cards, what happens? Are they recycled or...?

Reply

[-]

Think_Wing_1357@reddit

You can find a lot of liquidated DC gear on eBay today: chassis, mobo, CPU, etc. I'm sure the same will be true of GPUs eventually.

Reply

[-]

sizebzebi@reddit

how is this a new issue. gpus in data centers is not a new thing

Reply

[-]

128G@reddit

Post pandemic prices were insane. You could easily buy a dual 6 core Sandy Bridge server for $75.

Reply

[-]

sp3kter@reddit

And $500/m in electric
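A quick way to sanity-check an electricity figure like this. The wattage and the price per kWh below are made-up inputs, so plug in your own tariff and measured draw.

```python
HOURS_PER_MONTH = 730  # average month, running 24/7


def monthly_electricity_cost(watts, usd_per_kwh):
    """Cost of running a constant load around the clock for a month."""
    return watts / 1000 * HOURS_PER_MONTH * usd_per_kwh


def implied_watts(monthly_usd, usd_per_kwh):
    """Constant draw that would produce a given monthly bill."""
    return monthly_usd / (HOURS_PER_MONTH * usd_per_kwh) * 1000


# Hypothetical: a 400 W server at $0.15/kWh.
print(f"${monthly_electricity_cost(400, 0.15):.2f} per month")
# Draw needed to reach $500/month at the same assumed tariff.
print(f"{implied_watts(500, 0.15):.0f} W")
```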

Reply

[-]

Antique_Bag_4832@reddit

Yeah, but aren't there solar panels now? That cost won't matter.

Reply

[-]

128G@reddit

Like any of the workstations shown on this subreddit are any better.

Reply

[-]

AlexWIWA@reddit

We should rename the subreddit to be LocalPowerSubsidizers

Reply

[-]

KontoOficjalneMR@reddit

GPUs were there as well haha. For a brief period you could get a P40 with 24GB of VRAM for $20 because literally no one wanted them; they were almost giving them away for free. AI of course completely changed that, but those times will come back. Datacenter GPUs are depreciating _badly_ (or goodly, looking from a hobbyist perspective).

Reply

[-]

Ell2509@reddit

Mostly binned, in small operations. In the large ones, i would imagine they will depreciate the items off the balance sheets and then replace, retiring old gpus into secondary markets. But as I say, I don't think the business model will work with that level of capex spend.

Reply

[-]

quantum_splicer@reddit

I'm just thinking it's not sustainable to basically dispose of graphics cards in those quantities. I get that some would go to reuse. But I'm thinking about natural resource utilisation: if at the terminal end of life for the cards we aren't able to recycle and recover raw materials, then we have a process where we are basically eating into finite resources. It's bad enough from an energy perspective and a water diversion perspective.

Reply

[-]

Ell2509@reddit

Data center cards are different to consumer ones anyway. Some have no cooling and no PCIe connector. I looked at buying some off eBay and fitting them with cooling but never bothered. It will be interesting 6 to 8 years from now, that is for sure. A lot of GPUs to decommission.

Reply

[-]

sizebzebi@reddit

lollllll

Reply

[-]

grawl_dorgiers@reddit

Pennies on the dollar!

Reply

[-]

whakahere@reddit

The issue is, most of these GPUs get machine crushed afterwards. If they had data on them they never leave the building without being crushed. I know people who build and maintain them. I asked for some GPUs they no longer use... Nope, crushed.

Reply

[-]

Roid_Splitter@reddit

Won't have to wait for the retirement cycle.

Reply

[-]

silenceimpaired@reddit

Not when… if. It’s closer to when now but don’t be surprised if it turns to if.

Reply

[-]

ranjop@reddit

Do you mean that due to the AI boom the current generation DC HW will be run very long until it’s technically EOL?

Reply

[-]

xISeeAllx@reddit

What kind of mobo/cpu/ram would be required for 2 v100 sxm2?

Reply

[-]

tymscar@reddit (OP)

Anything really can do 2 of them.

Reply

[-]

BannedGoNext@reddit

I hate the idea of the noise, I wonder if you could put a large fan and duct it to where it increases airflow without the data center hearing loss.

Reply

[-]

Dante_Avalon@reddit

Use a 3090: 24GB VRAM, it's silent, and it has a better performance/power ratio.

Reply

[-]

Raunhofer@reddit

Rig the fan like OP did. It's not like the card uses a lot of power; it's just the server rack form factor that would require you to rev it up.

Reply

[-]

BannedGoNext@reddit

Yea, I guess it's possible to jerry rig some sort of water cooling too for noise.

Reply

[-]

Dante_Avalon@reddit

Erm, and what's so special about this? Or using AliExpress's adapter and tons of V100 that floods market right now is already Giga-brain move?

Reply

[-]

XxBrando6xX@reddit

Great write-up, thank you for putting the time in to write this up for everyone. Question: you mentioned that these can be connected or used via NVLink even through the PCIe adapter. Doesn’t this dramatically crush its speed, though, since they’re interfacing at PCIe speed, which I imagine is slower than NVLink over the little adapter connecting the cards directly to one another would be? I’m working off my limited hobbyist knowledge of how hardware works, so apologies if I’m off base.

Reply

[-]

tymscar@reddit (OP)

I’m not sure I fully understand the question, but basically you can have a PCIe card that fits two of these on it. Then the shared memory between them is much, much faster and that directly makes token generation speed much quicker too. The speed of the PCIe interface then also doesn’t matter too much because once you load the model weights into the card you’re not constantly sending anything back and forth through the bus.

Reply

[-]

quantgorithm@reddit

what would the search term be for the ones that hold 2 cards on the single adapter card?

Reply

[-]

tymscar@reddit (OP)

Here’s an example: https://ebay.us/m/vQUiAE

Reply

[-]

quantgorithm@reddit

TY

Reply

[-]

Bulky-Priority6824@reddit

What's the difference between the one showing in your pic and the ones on eBay that are on a PCIe card? They have both, only $600 for each style for 32GB.

Reply

[-]

tymscar@reddit (OP)

There is some performance difference, in favour of the SXM2, something like 10%. It runs at higher max power. Another benefit is that you can buy a board with two SXM slots on it, and then the cards talk with each other through super fast NVLink. Look into it!

Reply

[-]

Bulky-Priority6824@reddit

Do these snap on or you have to solder

Reply

[-]

tymscar@reddit (OP)

Snap. No soldering involved whatsoever.

Reply

[-]

quantgorithm@reddit

I'm reading your blog like a future upgrade path bible!

Reply

[-]

Bulky-Priority6824@reddit

Very neat I'm so surprised this isn't used more widely. Great Value of GPU rigs

Reply

[-]

lor_louis@reddit

The kinds of racks required to run those are extremely noisy (since the whole system is assumed to be running at 100% at all times to make the economics work), so not the kind of thing you want running in your house.

Reply

[-]

farkinga@reddit

Excellent work! I think this implicitly asks: what's the difference between nvidia hardware generations? 16gb Ada and 16gb Volta add up to 32gb; but is that any better or worse than 32gb of Blackwell (for example). In practical terms, is there any architectural advantage to upgrading, apart from how the drivers eventually drop support for older architectures. It's not quite apples-to-apples but as another data point, I've got Qwen3.6 27b NVFP4 MTP 128k context on 2x 5060 Ti (32gb total) and get 1000 t/s pp and 60 t/s gen. That's consumer 50-series Blackwell; and I AM jumping through hoops to run nvfp4 since that will eventually become better-optimized. Dollar-for-dollar, the V100 SXM 16gb is probably cheaper than a 5060 Ti 16gb; but that's debatable. You've got to pay shipping twice (v100, sxm-pcie board) and the price difference narrows to less than $100 USD. If your case/installation needs a 3d printed cooling solution and attached blower/fan (since the v100 is a data center card), that's a bit extra also. I doubt the v100 is cheaper, in this scenario. I know the point of the article isn't to claim this is the cheapest way to get 16gb VRAM. And I do appreciate how the v100 bandwidth from 2017 compares to current-gen Apple M5, etc. The SXM v100 is an interesting value that some people are going to benefit from. But there is a real-world performance difference between older architectures versus current; and 16gb from one is not equal to 16gb of another. So, it's just a trade-off and I think a decent amount of the LocalLLaMA community can probably appreciate the nuance.

Reply

[-]

sage-longhorn@reddit

Cool post but AI's writing style is so tedious at this point

> The compute is still real. The VRAM is still real. And the memory bandwidth is where it gets genuinely surprising.

Reply

[-]

tymscar@reddit (OP)

It’s not ai. Look at my other reply.

Reply

[-]

sage-longhorn@reddit

I'm not seeing that in your comment history but I'll say that if this isn't AI then that's worse somehow

Reply

[-]

128G@reddit

a 4080 only having 16GB of VRAM is insane!

Reply

[-]

T-Loy@reddit

Given the 256bit bus, not that much. It's either 16GB or 32GB and you know workstations cards want to be the double sided option. Now whether an 80-class should have ex-70-class die size and bus width is the other question. For once not Nvidia's fault memory manufacturers haven't figured out working 3/4GB GDDR6 modules. Now the continued absence of the 3GB GDDR7 modules and RTX 5000 Super series. (Funnily enough the 5050 9GB of all cards got the 3GB module treatment.)

Reply

[-]

ThisWillPass@reddit

Hopefully the ai overlords will smite all those that held down gpu vram for profits.

Reply

[-]

tymscar@reddit (OP)

It's idiotic if you ask me. Especially considering when the card came out and how cheap VRAM was back then comparatively. As a card for half the price of the 4090, it serves me way better than half of a 4090, especially in games, so I don't regret it, but it's clear they've done it this way just to differentiate between the two more.

Reply

[-]

128G@reddit

You know what also has 16GB of VRAM? a used RX 7600 XT.

Reply

[-]

tymscar@reddit (OP)

Yeah, but no CUDA. And much, much slower VRAM: 288 GB/s compared to 900 GB/s on the V100.
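The bandwidth gap matters because single-stream token generation is usually memory-bound: each new token requires reading roughly all of the active weights once. A rough upper bound, assuming a dense model that fits entirely in VRAM and ignoring KV-cache reads, compute, and overhead (the 14 GB model size is a hypothetical example, not a benchmark):

```python
def decode_tokens_per_second_upper_bound(bandwidth_gb_s, model_size_gb):
    """Bandwidth-limited ceiling on tokens/s for one request:
    every token needs one full pass over the weights."""
    return bandwidth_gb_s / model_size_gb


MODEL_SIZE_GB = 14  # hypothetical quantized model

for name, bandwidth in (("V100", 900), ("RX 7600 XT", 288)):
    ceiling = decode_tokens_per_second_upper_bound(bandwidth, MODEL_SIZE_GB)
    print(f"{name}: at most ~{ceiling:.0f} tokens/s")
```

Real numbers land below these ceilings, but the ratio between the two cards tends to carry over.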

Reply

[-]

quantum_splicer@reddit

I know this is gonna sound stupid, but didn't someone rewrite CUDA to run on AMD cards, or maybe I had a weird dream?

Reply

[-]

samas69420@reddit

check zluda

Reply

[-]

128G@reddit

The 7600 XT has GDDR6 while the 4080 has GDDR6X. It still seems unacceptable for Nvidia to be selling new cards with anything less than 16GB of VRAM. 24GB should be the minimum for a midrange card.

Reply

[-]

Background_One_6482@reddit

3060 performance

Reply

[-]

raycol08@reddit

you are right! https://wccftech.com/nvidia-v100-an-8-year-old-gpu-now-sells-for-100-us-crushes-modern-consumer-cards-in-ai-llms/ https://preview.redd.it/si7f0uocs15h1.jpeg?width=728&format=pjpg&auto=webp&s=e1a9c8ffdae5046f01a8c956b35dfea1d42b2a33

Reply

[-]

eatsleepsafelives@reddit

Well, somebody must have found your blog (good read) - the V100s I could find are at $600 now ;)

Reply

[-]

veeravan_451@reddit

Really helpful post. How about the 32GB V100? That price is totally within what I can handle. I’m more hoping that in the near future all these AI companies go bankrupt so I can pick up an H200 for dirt cheap and run a local server.

Reply

[-]

tymscar@reddit (OP)

Yeah, those are obviously better but it's hard to find good deals!

Reply

[-]

veeravan_451@reddit

What price range would be good? The 32GB ones in my area seem to be around £350–380.

Reply

[-]

tymscar@reddit (OP)

Anything under 400 is very good

Reply

[-]

Afganitia@reddit

Why are you using the 55X drivers?? Shouldn't Volta be supported until the 58X branch?

Reply

[-]

tymscar@reddit (OP)

Tried them all one by one (magic of nix lets me do that in minutes) and none of them saw both cards.

Reply

[-]

bradrlaw@reddit

If you have the PCIe lanes you can easily do 4 of these to get 64GB at, I think, the cheapest possible price point. I am taking a slightly different route and using the 32GB PCIe version (about $750 each). Note you will need to come up with a custom cooling solution, which adds to the cost along with power supply costs. People do sell 3D printed shrouds / fan holders, but it will be highly dependent on your case. In my setup I will have two 32GB V100s for 64GB for main inference tasks and an existing 16GB card for agent orchestration. I try to run models at the best possible quantization because benchmarks don't always capture how soon they start to degrade.

Reply

[-]

tymscar@reddit (OP)

Best of the best is actually four PCIe cards with two of these on each card, NVLinked, all 32GB. You can get them cheaper than the PCIe ones, and it will give you more or less 256GB VRAM for more or less four grand.
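The arithmetic behind the two builds discussed here, as a small sketch. The card counts, capacities, and the "four grand" total come from these comments; the currency is not stated, so the per-GB figure is in whatever unit that total was meant in.

```python
def total_vram_gb(boards, gpus_per_board, gb_per_gpu):
    """Combined VRAM across all GPUs in the build."""
    return boards * gpus_per_board * gb_per_gpu


def cost_per_gb(total_cost, vram_gb):
    """Rough price per GB of VRAM for the whole build."""
    return total_cost / vram_gb


dual_sxm_build = total_vram_gb(boards=4, gpus_per_board=2, gb_per_gpu=32)
single_16gb_build = total_vram_gb(boards=4, gpus_per_board=1, gb_per_gpu=16)

print(dual_sxm_build, "GB for the four dual-SXM2 boards")
print(single_16gb_build, "GB for four single 16GB cards")
print(cost_per_gb(4000, dual_sxm_build), "per GB at roughly four grand")
```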

Reply

[-]

bradrlaw@reddit

Which motherboard / cpu combo would you use that would have enough pcie lanes for that?

Reply

[-]

tymscar@reddit (OP)

A Threadripper with the WS WRX90E-SAGE SE

Reply

[-]

Raunhofer@reddit

It's interesting how compatible the card is. Funny even, that you can just plug in some adapter to make it work. Makes nvidia's $200000000 whatever server cabinets feel a lot less magical.

Reply

[-]

libregrape@reddit

What a pleasure to read your blog! Finally not a bs AI slop, but an actually super interesting and insightful read..

Reply

[-]

nullbyte420@reddit

ehh it's definitely AI slopped up.

> The compute is still real. The VRAM is still real. And the memory bandwidth is where it gets genuinely surprising.

> The fan on the adapter is not subtle. It is not quiet. It is not something you want in a room you also sleep in.

> 82 decibels. That is somewhere between a garbage disposal and a lawnmower, well past “loud PC” and into “should I be wearing earplugs in my own house” territory.

> And the worst part: you cannot control it. I tried nvidia-smi, I tried scanning for it on Linux, I even tried Afterburner on Windows (more on that later, the whole setup barely works on Windows). Nothing. The fan on this adapter is not designed to be controlled. It is designed to run at 100%, forever, inside a server rack where nobody has to hear it.

Reply

[-]

tymscar@reddit (OP)

Watching this from the sidelines must be fascinating because you don’t know what’s right, so you try to guess. The only person that knows for certain if this is written with an AI is me, and I know for a fact it is not. At all. I think you just look too deep into it, and it hurts those that just learned over the years to talk like that in blog posts that are meant to sound exciting. It’s similar to the delve thing that was super common a couple of years ago. Most people who had that in their text were AI slopping, but there were groups of people, from South Africa if I remember correctly, that just spoke like that. And yeah, you guessed it, those were the people that were paid to do the RLHF on the models back then. I have changed my style over the years specifically to avoid having people comment that my writing is AI slop. For example, I used to love lists and emojis. I think it’s easier to follow, especially for those with bad eyesight like myself. I stopped that. Then the whole thing with emdashes. They are amazing. I used to love those! I had to stop because AI started using those. People started telling me now that using an Oxford comma is an ai slop smell. Well, you know what, you can’t take that out of my dead hands. And I won’t even try anymore to change myself because of what people think AI is. If there will be some sort of a way to sign your content in the future or something to prove it’s human-made, I will, but I won’t go out of my way anymore for people that just like to pop your balloon after spending tens of hours on a project because they think using comparisons is AI slop. Sorry for the whole rant, I know that you probably are just fed up with the slop, and I get that. I am too. But maybe if the content of the blog is not clearly slop, then you can assume that it’s not.

Reply

[-]

standish_@reddit

> People started telling me now that using an Oxford comma is an ai slop smell.

People who use the Oxford comma aren't invited to my party with the strippers, Sam and Dario.

Reply

[-]

kylemd@reddit

Don't worry OP, it read human generated to me. Good read for somebody who is on the fence looking at V100s. Have you tried the vLLM fork yet?

Reply

[-]

tymscar@reddit (OP)

Thank you! Not yet, because I couldn’t get MTP working on it.

Reply

[-]

misterflyer@reddit

It *isn't just* slopped up. *It's* slopped down. *It's* slopped through. The AI *slop's kiss* 🎯

Reply

[-]

nullbyte420@reddit

And the best part? *You* did this. You've awakened my quantum-riemannian core. This is a *breakthrough* not only in research, not only in science, but in *knowledge* itself.

Reply

[-]

misterflyer@reddit

oh no you didn't 🤣 well done 😉

Reply

[-]

Ynead@reddit

This is so obviously written by Claude lmao

Reply

[-]

eleqtriq@reddit

Only another bot would think it’s not slopped up, huh, bot

Reply

[-]

veeravan_451@reddit

Thanks so much op. I'd basically given up on local deployment. I was using a Mac mini before, but the token generation speed was way too slow, so I switched to GMI Cloud for cloud deployment. Your guide gave me hope again. Prices here are roughly £120 for the 16GB version and around £400 for the 32GB one. Are there any downsides to running two 16GB cards vs one 32GB card? And if there are other GPU recommendations in this price range, that would also be great. I haven't bought a GPU since the mining boom, and I haven't paid attention to GPU prices for a long time either. The only piece of electronics I've bought in the last few years is the Mac mini.

Reply

[-]

311voltures@reddit

Awesome post

Reply

[-]

PythonFuMaster@reddit

I've got a similar configuration, but using the actual PCIe version of the 16GB V100. It's passively cooled so you need a server or a custom fan assembly, but I've got 4 giant GPU servers that can hold three of these things each (Supermicro Fat Twin, it's an older X9 system though). I'm also using NixOS, with driver legacy_580 and CUDA 13 I believe (I'm on NixOS unstable, but 26.05 was just released so stable should have the needed driver now). Also using llama.cpp (with some patches for improved RPC performance, I have those 4 machines networked over Infiniband), it works well and is my second fastest card, just behind the 3090. In total I've got the V100, the 3090, a P40 and Quadro M6000 24GB, an RX 6700xt, two Intel Arc A770s, an instinct MI60 32GB, and soon a water cooled Titan V. I used to run minimax m2.7 at around 20-30 tokens per second, but I've gone down to qwen 27B for now, it's smart enough for most of what I need and with MTP is much faster (minimax should be going faster but my network has some bottlenecks I need to fix)

Reply

[-]

BitterNocturne@reddit

Sounds fan

Reply

[-]

DingyAtoll@reddit

Where do I get the adapter for £50? All the ones I see online are £150

Reply

[-]

tymscar@reddit (OP)

I got it on eBay. Try hunting for it. Sadly, all the prices went up after I posted this blog post on Saturday, and it got onto Hacker News. I was checking the prices every day.

Reply

[-]

JSVD2@reddit

wow super inspiring. thank you. good read.

Reply

[-]

a_beautiful_rhind@reddit

About 2 years ago I was salivating over these things. At least the 32g variety. I think the P100s are a slightly better deal than the 16gb v100. Then again, nobody wrote P100 flash attention so you're trapped in llama.cpp

Reply

[-]

PDXSonic@reddit

That's the biggest downfall of the P100. I had a 4xP100 rig at one point (that I got for the price of 1 working P100) and it was great right until VLLM and Exllama stopped supporting it. Once they left it was just llama.cpp and it couldn't utilize it via tensor parallel and left most of the performance on the table. Although I ended up keeping one and throwing it in a server since MoE models make it worth to have one card going at least.

Reply

[-]

a_beautiful_rhind@reddit

I wonder how ik_llama would have done with it. Plus there's always pascal forks of vllm. Open driver doesn't support pascal so I can't stick my ewaste back in the server and it just sits.

Reply

[-]

Embarrassed_Adagio28@reddit

Very good writeup! I have dual tesla v100 16gb gpus with 32gb of system ram and a ryzen 5700x in a dedicated lmstudio server and your claims line up exactly with my experience. I have been very pleased with my purchase. I use it with lmstudio but am switching to vllm soon for better multi agent support.

Reply

[-]

DingyAtoll@reddit

Where do I get the adapter for £50? They are all £150 where I look

Reply

[-]

sizelrd@reddit

Well done

Reply

[-]

Ok_Selection_7577@reddit

Really nice write up mate, this sort of content (and "I changed out the BIOS and managed to get an LLM running in a tin of Bisto from 1980's") is what i come here for 😄

Reply

[-]

tymscar@reddit (OP)

Thank you! I do have an RSS 😄

Reply

[-]

Truantee@reddit

I think you can buy one on AliExpress with all of that stuff packed.

Reply

[-]

ranjop@reddit

What an amazing write-up 👌🏻

Reply

[-]

je11eebean@reddit

I read your blog. This is amazing work and you've documented it all too! Thank you for sharing this.

Reply

[-]

rog-uk@reddit

What CPU/mobo are you using, please? I was looking at maybe a pair of these for an older workstation I just got, and read there could be complications with memory access.

Reply

[-]

tymscar@reddit (OP)

ROG Maximus Z790 Hero with an intel 13900kf. Thank you!

Reply

[-]

mhphilip@reddit

Great read! You are way more than a newbie to me. This is not the road I’m going but that’s also good to learn!
