NVIDIA V100 32GB for AI in 2026
Posted by mihaii@reddit | LocalLLaMA | View on Reddit | 17 comments
Hello.
I have the opportunity of buying an Nvidia V100 with 32GB for about $915 / €775. I want to use it for local LLM on premise: load up some models, use it for agentic coding with Qwen, Gemma 4, etc.
Is it a better buy than an Nvidia 3090? They are about the same price.
Stepfunction@reddit
The V100 is pretty ancient at this point, but it is still a decent GPU. Your main issue is going to be compatibility. If you go that route, be ready to compile a lot of stuff yourself.
FullstackSensei@reddit
In the long run, we're all dead.
If you're using llama.cpp or ik, you're better off compiling anyway. It takes a few minutes to set up a script to handle that by searching this very sub. From then on, you just run the script whenever you want to run a new version.
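The compile-it-yourself workflow the comments describe can be sketched as a small update-and-rebuild script. The clone path and the `70` architecture flag (targeting the V100's Volta sm_70) are my assumptions, not from the thread:

```shell
#!/usr/bin/env sh
# Sketch: pull the latest llama.cpp and rebuild it for a V100.
# Assumes git and the CUDA toolkit are already installed.
set -e
REPO="$HOME/llama.cpp"   # hypothetical checkout location
[ -d "$REPO" ] || git clone https://github.com/ggml-org/llama.cpp "$REPO"
cd "$REPO"
git pull
# GGML_CUDA enables the CUDA backend; 70 targets Volta (V100, compute capability 7.0)
cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=70
cmake --build build --config Release -j"$(nproc)"
```

Rerunning the script after a `git pull` is the whole upgrade cycle, which matches the "just run the script whenever you want a new version" workflow above.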
I have 3090s and Mi50s (32GB, like the V100). 32GB makes a much bigger difference than most think. If you're running a single GPU, you can run a 27B model at Q8 with 50-60k context all in VRAM. With the 3090, you'd have to drop to Q4 or Q5, and the results aren't even close for any serious task or anything that requires nuance.
pacman829@reddit
What's your performance on single vs dual Mi50 with Qwen 3.6 or the Gemma 4 MoE model?
Currently trying to figure out what to buy for a small local build... I was going to go triple P100 since I found a good deal, but the 32GB of RAM on the Mi50 sounds too good...
FullstackSensei@reddit
Haven't tried them. I use mine to run minimax and Qwen 3.5 397B
pacman829@reddit
I'm also curious how they're doing with those big models
FullstackSensei@reddit
I get ~30t/s with minimax Q4_K_XL fully in VRAM, and about 18t/s hybrid with 3x Mi50s + a Xeon CPU running Q8. Qwen 3.5 397B Q4 also runs at ~18t/s on 3 GPUs + CPU.
pacman829@reddit
If you do give them a go, I'd love to see how they do.
Plastic-Stress-6468@reddit
If your model+context fits in 24gb, the 3090 is significantly faster than the V100.
If your model+context goes over by even 1GB, the V100 will be faster than the 3090.
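The fits-in-VRAM tradeoff above can be roughed out with a back-of-the-envelope estimate. All the constants here (bits per weight for each quant, the layer/head figures for a hypothetical 27B model with grouped-query attention, 8-bit KV cache) are my own ballpark assumptions, not exact llama.cpp allocations:

```python
# Rough VRAM estimate for a dense model: quantized weights + KV cache.
# Every constant is a ballpark figure for illustration only.

def est_vram_gb(params_b, w_bits, n_layers, kv_heads, head_dim, ctx, kv_bytes=2):
    """Estimate total VRAM in GB for weights plus KV cache."""
    weights = params_b * w_bits / 8                                # GB of weights
    # KV cache: 2 tensors (K and V) per layer, kv_bytes per element
    kv = 2 * n_layers * kv_heads * head_dim * ctx * kv_bytes / 1e9
    return weights + kv

# Hypothetical 27B dense model (invented shape), 50k context, 8-bit KV cache
q8 = est_vram_gb(27, 8.5, 46, 4, 128, 50_000, kv_bytes=1)  # ~Q8
q4 = est_vram_gb(27, 4.8, 46, 4, 128, 50_000, kv_bytes=1)  # ~Q4/Q5
print(f"Q8: {q8:.1f} GB, Q4: {q4:.1f} GB")
```

Under these assumptions Q8 lands just over the 3090's 24GB but under the V100's 32GB, while Q4 fits on the 24GB card, which is exactly the crossover the comment describes.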
Simple_Library_2700@reddit
If you're at all interested in dense models you could buy 2 16gb v100s for quite a bit less than the price of 1 32gb v100 and run them on an interposer board and destroy a 3090 in terms of speed. Above is my result of running dense qwen3.5 on 4 v100s.
DocMadCow@reddit
What are your thoughts on mixing memory sizes on cards? Like a V100 32GB + V100 16GB?
Simple_Library_2700@reddit
A lot of inference engines don’t like it from what I can see
Mindless_Pain1860@reddit
V100 doesn’t support BF16, and its lack of INT8 support is a major drawback. A lot of quantization relies on INT8 under the hood, so this can hurt performance. The L2 cache is also too small (6 MiB); the 3090 has the same issue. As a result, attention is much less efficient on these older GPUs than on RTX 40/50 series cards or the A100/H100.
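The generational gaps in this comment can be summarized as a small feature table keyed by CUDA compute capability. This is a simplified sketch from general CUDA knowledge (check NVIDIA's documentation for the authoritative matrix); "int8_tc" here means INT8 tensor cores, which Volta lacks even though it has some scalar INT8 paths:

```python
# Simplified feature matrix by CUDA compute capability (not exhaustive).
# sm_70 = V100 (Volta), sm_75 = Turing, sm_80 = A100, sm_86 = 3090, sm_89 = 4090.
FEATURES = {
    7.0: {"bf16": False, "int8_tc": False, "fp8": False},  # V100: FP16 tensor cores only
    7.5: {"bf16": False, "int8_tc": True,  "fp8": False},  # Turing adds INT8 tensor cores
    8.0: {"bf16": True,  "int8_tc": True,  "fp8": False},  # Ampere adds native BF16
    8.6: {"bf16": True,  "int8_tc": True,  "fp8": False},  # 3090
    8.9: {"bf16": True,  "int8_tc": True,  "fp8": True},   # Ada (4090) adds FP8
}

def supports(compute_capability, feature):
    """Look up whether a given architecture generation has a feature."""
    return FEATURES[compute_capability][feature]
```

This makes the comment's point concrete: the V100 (7.0) misses both BF16 and INT8 tensor cores, which many quantized inference kernels lean on.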
mihaii@reddit (OP)
Thanks for knocking some sense into me. I already have a 4090; I was thinking about having a 2nd machine for various tests, but I will get a 3090 instead.
etaoin314@reddit
If you were thinking about putting these in the same computer to run large models, I would go with a 3090 or even another 4090. The V100 will bottleneck you pretty badly and waste a lot of the 4090's speed. The 3090 will still slow you down a little, but not nearly as much as the alternative. Also, 48GB is plenty to run anything in the 30B range at Q8 with very large contexts. Honestly, I would save up and get a second 4090; that will really fly!
Stepfunction@reddit
If you already have a 4090, then getting the 3090 and setting them both up on the same machine for an effective 48GB of VRAM will probably be the easiest path.
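Combining the two cards in llama.cpp is roughly a one-liner. The model filename is a placeholder, and the even `--tensor-split` ratio is my assumption (in practice you might weight the split toward the faster 4090):

```shell
# Sketch: serve one model across a 4090 + 3090 (48GB combined) with llama.cpp.
# -ngl 99 offloads all layers to GPU; --tensor-split divides the weights
# between the two cards in the given proportions.
./llama-server -m ./models/model-q8.gguf -ngl 99 --tensor-split 24,24 -c 32768
```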
segmond@reddit
No, not a good deal at all. I personally wouldn't buy a V100 32GB unless it's $400 or less.
FalconX88@reddit
The strength of the V100 is double-precision compute, so basically the opposite of what you want.