Where do you all rent GPU servers for small ML / AI side projects?
Posted by Forsaken-Bobcat4065@reddit | LocalLLaMA | View on Reddit | 39 comments
I’m trying to find a GPU server for some small ML/AI side projects (LLMs and a bit of image gen, nothing super big). Ideally I’d like pay‑as‑you‑go, a decent modern GPU, good bandwidth, and a setup that’s easy to spin up and tear down without a ton of hassle.
I feel like I’ve already wasted a bunch of time comparing random providers, so I’m just gonna ask: what are you using right now that’s been working fine and not crazy expensive?
Alex_Dutton@reddit
For small LLM / image gen work, most people just spin up a GPU when needed instead of keeping anything running 24/7. Some folks also mix in simpler setups like DigitalOcean for the non-GPU parts (API, web, queues) so they don't overbuild infra for side projects.
Carl_Peterson1@reddit
Thunder Compute CEO here, we try to balance reliability with cost. Would love for you to check us out
carl_peterson1@reddit
Thunder Compute makes it easy to create cheap instances with H100s and A100s. Many users start with one-off projects and then scale to run their startup on the platform (disclaimer: I'm the CEO)
y_daniels@reddit
Do you support running custom templates? It seems like you only support ComfyUI and LLMs.
carl_peterson1@reddit
We do. You can create an instance, connect to it, and run anything you need. Base instances come pre-installed with CUDA and common packages like PyTorch, uv, etc. to make setup easier. What are you trying to run?
y_daniels@reddit
Cool. I have some automated Wan workflows running without ComfyUI. I'll send you a DM, thanks.
carl_peterson1@reddit
Perfect! I may be a bit slower to respond here, so feel free to join our Discord community where someone on our team usually responds within a few minutes
boringblobking@reddit
Is it really $1.38 for an H100? Like, the same as a RunPod dedicated instance, not shared or spot etc.?
Carl_ThunderCompute@reddit
Yes, these are not spot instances, i.e. the instance will not be preempted. The underlying servers are large (250+ CPU cores, 2TB RAM, 8 GPUs) and shared across users, but you get the full GPU, CPU, and memory allocated to your instance.
boringblobking@reddit
Nice, I might have to try this, thanks.
ThunderComputeHQ@reddit
Great, good luck with your projects!
Safe-Introduction946@reddit
Try vast.ai. It's one of the most affordable places to rent GPUs because it offers a worldwide marketplace of hosts with consumer and data center GPUs.
frentro_max@reddit
Hivenet is worth checking if you want something cheaper with 4090 or 5090 options and more stable long runs.
KFSys@reddit
For small side projects, I usually just spin up a GPU droplet on DigitalOcean. It’s not the absolute cheapest GPU out there, but the setup is really straightforward, and I don’t have to mess around with weird marketplaces or bidding systems.
For me, the nice part is you can create the machine in a couple of minutes, SSH in, run your training or experiments, then destroy it when you're done, so you’re only paying for the time it exists. Bandwidth and networking have been solid, too.
There are cheaper options like RunPod or some smaller GPU providers, but I’ve found DO easier when I just want something predictable and don’t want to spend half a day comparing specs and configs. For side projects, convenience is worth it.
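If it helps, the create / use / destroy loop with doctl looks roughly like this; the GPU size and image slugs below are placeholders, so list the real ones first:

    # find the current GPU size/image slugs (placeholders used below)
    doctl compute size list
    doctl compute image list --public

    # create, use, destroy -- you only pay while the droplet exists
    doctl compute droplet create ml-box --region nyc1 --size <gpu-size-slug> --image <gpu-image-slug> --ssh-keys <fingerprint>
    doctl compute ssh ml-box
    doctl compute droplet delete ml-box --force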
arnav_m_@reddit
Using dcompute.cloud atm; their team is pretty solid, got me an RTX 4090 for like $0.49/hr for a week. Did the job.
Micky_Haller@reddit
Deploybase is a dashboard for tracking real-time GPU and LLM pricing across cloud and inference providers. You can view performance stats and pricing history, compare side by side, and bookmark to track any changes. https://deploybase.ai
pmv143@reddit
For small side projects where you want pay as you go and don’t want to babysit instances, I’d look at serverless style GPU setups rather than raw rented boxes.
The main thing to watch is how they handle model loading and idle GPUs. A lot of providers look cheap per hour but you end up paying for warm instances sitting there.
We’ve been building a runtime focused on bursty LLM workloads where you can fully evict GPUs and restore models quickly instead of keeping them warm.
chastieplups@reddit
And? There don't seem to be any pricing comparison websites for serverless, and providers like RunPod charge twice as much.
pmv143@reddit
Yeah that’s fair. Pure $/hr can look higher for serverless. The thing to compare isn’t hourly rate though, it’s total billed GPU minutes for your actual workload. If your model is bursty and sits idle 70–80% of the time, a cheaper hourly box that stays warm can end up costing more than a higher $/hr serverless setup that actually scales to zero.
If you’re running steady 24/7 utilization, raw rented GPUs usually win. If traffic is spiky, serverless often wins. So it really depends on your usage pattern more than sticker price.
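Rough back-of-the-envelope with made-up numbers (the rates and duty cycle here are illustrative, not any provider's actual pricing):

    # hypothetical prices, not quotes from any provider
    hourly_box = 1.40      # $/hr for a dedicated GPU kept warm 24/7
    serverless = 3.00      # $/hr billed only while the GPU is actually busy
    busy_fraction = 0.25   # bursty workload: real work ~25% of the time

    hours_per_month = 24 * 30
    dedicated_cost = hourly_box * hours_per_month                   # you pay for idle too
    serverless_cost = serverless * hours_per_month * busy_fraction  # scales to zero

    print(f"dedicated:  ${dedicated_cost:.0f}/mo")   # ~$1008
    print(f"serverless: ${serverless_cost:.0f}/mo")  # ~$540
    # push busy_fraction past ~0.47 and the dedicated box wins instead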
Hector_Rvkp@reddit
I've used vast.ai, you can rent retail cards for cheap. Salad is another. Many of these cards are from dudes at home who put their card online; it's the peer-to-peer of GPUs. There are also beefier setups there. You can load a ComfyUI instance and load a queue, for example.
NoahFect@reddit
How does that work, exactly? I rent Joe Schmoe's B300, and then when I'm in the middle of a 7-day run, Joe decides he needs to use his B300 for something else, and pulls the plug on me...?
Hector_Rvkp@reddit
Hahaha. The providers have reliability scores, probably because that sort of thing has happened. What I do know from trying to find info about Salad is that a lot of hosts are guys who used to mine crypto and now put these cards on the marketplace. They monitor usage, so if you're renting one they're watching the daily income and they won't kick you out; that would also crater their own reliability score. That said, I don't know that inference / training is crash-proof, because power outages are a thing, hardware does crash, and so on. So whatever you're doing, if you're doing it properly, needs to be able to resume from where it left off, not start from scratch. One of the first signs separating good code from bad code is how it handles errors.
There are also people who make these GPUs available more professionally, btw; it's not just Brendon in Minnesota with a spare 5090 running in his garage on a rig. But Brendon is there too, along with bigger fish. Globally.
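On the resume-instead-of-restart point, a minimal checkpoint pattern in PyTorch (the paths and model are placeholders, not anyone's real training loop):

    import os
    import torch
    import torch.nn as nn

    model = nn.Linear(128, 10)  # stand-in for your real model
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    ckpt_path = "ckpt.pt"       # placeholder path
    start_epoch = 0

    # resume if a checkpoint exists, e.g. after the host died mid-run
    if os.path.exists(ckpt_path):
        ckpt = torch.load(ckpt_path, map_location="cpu")
        model.load_state_dict(ckpt["model"])
        optimizer.load_state_dict(ckpt["optimizer"])
        start_epoch = ckpt["epoch"] + 1

    for epoch in range(start_epoch, 100):
        ...  # training steps go here
        # write to a temp file then rename so a crash mid-save can't corrupt the checkpoint
        torch.save({"model": model.state_dict(),
                    "optimizer": optimizer.state_dict(),
                    "epoch": epoch}, ckpt_path + ".tmp")
        os.replace(ckpt_path + ".tmp", ckpt_path)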
paulahjort@reddit
The comparing-providers problem is the actual issue here. Time is money. Prices move daily and vary 2-3x for the same GPU depending on availability. By the time you've manually checked RunPod, Vast.ai, Lambda, and CoreWeave you're already an hour in.
For your use case (LLMs + image gen, bursty) RTX 4090 on RunPod or Vast.ai spot usually wins on price. Happy to share a quick setup if useful.
I built a CLI that queries all of them in parallel and returns a ranked list in seconds:
Run it on Claude Code:

    npm install -g terradev-mcp
    claude mcp add terradev --command terradev-mcp
For side projects the free tier covers you with one instance at a time, pay-as-you-go directly to whichever provider wins the quote. Your keys stay local, no markup.
HealthyCommunicat@reddit
Used to do RunPod, now at AWS; a 1.5TB RAM EC2 is dirt cheap.
NigaTroubles@reddit
Like, how cheap is that?
semangeIof@reddit
Cheapest EC2 instance type that fits this spec is "r6a.metal" and it is just under 11 USD/hour
abnormal_human@reddit
That's not a GPU instance.
Initial-Phase-5567@reddit
And certainly not "dirt cheap"
HealthyCommunicat@reddit
idk guys, I used to use RunPod and Vast.ai, but AWS has just been noticeably cheaper.
Initial-Phase-5567@reddit
Is performance a lot worse on CPU only?
HealthyCommunicat@reddit
Yes, but it's not for inference so I don't really care, as long as I can get what I need done. OP said ML projects, so I was assuming it's ML stuff and not just inferencing.
Initial-Phase-5567@reddit
Ah cool, makes sense
melanov85@reddit
Before you start paying for GPU servers: what hardware are you actually running locally right now? I ask because for "small ML/AI side projects, LLMs and a bit of image gen, nothing super big" you might not need to rent anything. I run quantized LLMs on CPU and finetune models on a GTX 1650 (4GB VRAM). Not a typo. With the right optimization (proper quantization, memory management, and knowing how to work within your hardware limits) you'd be surprised what consumer hardware can do.

And here's the thing people don't talk about: cloud GPU performance isn't what you think it is. Even renting a box with an RTX 6000 running a 13B model, you're dealing with network latency, shared resources, virtualization overhead, and noisy neighbors on the same node. By the time your prompt hits the GPU and the response comes back through the pipe, it's slow as hell compared to the same model running locally on modest hardware with no round trip.

The cloud GPU trap for small projects is real: you spin up an instance to run a 7B model that could've run on your own machine, you're paying per hour for something that should've been free, and it's not even faster. Then you forget to tear it down one night and wake up to a bill.

So what's your current setup? CPU, GPU, RAM? And what models/sizes are you actually trying to run? There's a good chance the answer is "just run it locally" with the right tooling, and you keep your data private and your wallet intact.
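If you want to try the local route, here's a minimal llama-cpp-python sketch (the model path is a placeholder; any 4-bit GGUF of a 7B-class model that fits your RAM will do):

    # pip install llama-cpp-python
    from llama_cpp import Llama

    llm = Llama(model_path="models/mistral-7b-instruct.Q4_K_M.gguf",  # placeholder GGUF
                n_ctx=4096,    # context window
                n_threads=8)   # match your physical cores

    out = llm("Explain quantization in two sentences.", max_tokens=128)
    print(out["choices"][0]["text"])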
LostPrune2143@reddit
The noisy neighbor problem mentioned here is real if you're on shared infrastructure. That's exactly why some providers offer dedicated GPUs where you're the only one on the hardware.
We run barrack.ai with dedicated RTX A6000s and H100s. Per-minute billing so you're not paying for idle time, no contracts, zero egress fees, and full API access with 65+ endpoints. H100s start at $1.99/hr.
Happy to give you $10 in free credits to test your workflow. DM me if interested.
qwen_next_gguf_when@reddit
Runpod. I still have some credit with them but I don't have anything to fine-tune since unsloth already killed fine-tuning. ; )
Safe-Introduction946@reddit
Vast.ai is a solid choice. There are lots of 30-series retail cards (3080/3090) often available around $0.20-$0.50/hr. ComfyUI runs fine; to reduce interruptions, filter for hosts with higher uptime or data-center tags rather than the cheapest home-host listings. I can help you build a search/filter to find more stable hosts if you want.
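For example, with the vastai CLI (the field names here are from memory and may have drifted, so double-check against "vastai search offers --help"):

    pip install vastai
    vastai set api-key YOUR_KEY
    # reliable single-GPU 3090 hosts, cheapest first
    vastai search offers 'gpu_name=RTX_3090 num_gpus=1 reliability>0.98' -o 'dph_total'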
michaelsoft__binbows@reddit
I don't run on cloud, but from when I looked, GH200s were cheap. Now at $2/hr on Lambda, less cheap than before.
Maybe check https://gpus.io/
Seems like the RTX PRO 6000 is listed there on RunPod at $0.50 each.
jacek2023@reddit
Congratulations on your awesome posts where you ask where to send money, and then another account tells you where to send it, so everyone on Reddit knows where to send money.
wbiggs205@reddit
I have a server with this company: very good price and very good support. I've been there for 6 months. This is an affiliate link:
https://app.seimaxim.com/aff.php?aff=17