NVIDIA V100 32GB for AI in 2026
Posted by mihaii@reddit | LocalLLaMA | View on Reddit | 17 comments
Hello.
I have the opportunity of buying an Nvidia V100 with 32GB for about $915 / €775. I want to use it for local LLM on premise: load up some models, use it for agentic coding with Qwen, Gemma 4, etc.
Is it a better buy than an Nvidia 3090? They are about the same price.
Stepfunction@reddit
The V100 is pretty ancient at this point, but it is still a decent GPU. Your main issue is going to be compatibility. If you go that route, be ready to compile a lot of stuff yourself.
FullstackSensei@reddit
In the long run, we're all dead.
If you're using llama.cpp or ik, you're better off compiling anyway. It takes a few minutes to set up a script to handle that by searching this very sub. From then on, you just run the script whenever you want to run a new version.
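The compile-it-yourself workflow the comments describe can be sketched as a small update-and-rebuild script. The clone path and the `70` architecture flag (targeting the V100's Volta sm_70) are my assumptions, not from the thread:

```shell
#!/usr/bin/env sh
# Sketch: pull the latest llama.cpp and rebuild it for a V100.
# Assumes git and the CUDA toolkit are already installed.
set -e
REPO="$HOME/llama.cpp"   # hypothetical checkout location
[ -d "$REPO" ] || git clone https://github.com/ggml-org/llama.cpp "$REPO"
cd "$REPO"
git pull
# GGML_CUDA enables the CUDA backend; 70 targets Volta (V100, compute capability 7.0)
cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=70
cmake --build build --config Release -j"$(nproc)"
```

Rerunning the script after a `git pull` is the whole upgrade cycle, which matches the "just run the script whenever you want a new version" workflow above.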
I have 3090s and Mi50s (32GB, like the V100). 32GB makes a much bigger difference than most think. If you're running a single GPU, you can run a 27B model at Q8 with 50-60k context all in VRAM. With the 3090, you'd have to drop to Q4 or Q5, and the results aren't even close for any serious task or anything that requires nuance.
pacman829@reddit
What's your performance on single vs dual Mi50 with Qwen 3.6 or the Gemma 4 MoE model?
Currently trying to figure out what to buy for a small local build... I was going to go triple P100 since I found a good deal, but the 32GB of RAM on the Mi50 sounds too good...
FullstackSensei@reddit
Haven't tried them. I use mine to run minimax and Qwen 3.5 397B
pacman829@reddit
I'm also curious how they're doing with those big models
FullstackSensei@reddit
I get ~30t/s with minimax Q4_K_XL fully in VRAM, and about 18t/s hybrid with 3x Mi50s + a Xeon CPU running Q8. Qwen 3.5 397B Q4 also runs at ~18t/s on 3 GPUs + CPU.
pacman829@reddit
If you do give them a go, I'd love to see how they do.
Plastic-Stress-6468@reddit
If your model+context fits in 24gb, the 3090 is significantly faster than the V100.
If your model+context goes over by even 1GB, the V100 will be faster than the 3090.
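The fits-in-VRAM tradeoff above can be roughed out with a back-of-the-envelope estimate. All the constants here (bits per weight for each quant, the layer/head figures for a hypothetical 27B model with grouped-query attention, 8-bit KV cache) are my own ballpark assumptions, not exact llama.cpp allocations:

```python
# Rough VRAM estimate for a dense model: quantized weights + KV cache.
# Every constant is a ballpark figure for illustration only.

def est_vram_gb(params_b, w_bits, n_layers, kv_heads, head_dim, ctx, kv_bytes=2):
    """Estimate total VRAM in GB for weights plus KV cache."""
    weights = params_b * w_bits / 8                                # GB of weights
    # KV cache: 2 tensors (K and V) per layer, kv_bytes per element
    kv = 2 * n_layers * kv_heads * head_dim * ctx * kv_bytes / 1e9
    return weights + kv

# Hypothetical 27B dense model (invented shape), 50k context, 8-bit KV cache
q8 = est_vram_gb(27, 8.5, 46, 4, 128, 50_000, kv_bytes=1)  # ~Q8
q4 = est_vram_gb(27, 4.8, 46, 4, 128, 50_000, kv_bytes=1)  # ~Q4/Q5
print(f"Q8: {q8:.1f} GB, Q4: {q4:.1f} GB")
```

Under these assumptions Q8 lands just over the 3090's 24GB but under the V100's 32GB, while Q4 fits on the 24GB card, which is exactly the crossover the comment describes.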
Simple_Library_2700@reddit
If you're at all interested in dense models you could buy 2 16gb v100s for quite a bit less than the price of 1 32gb v100 and run them on an interposer board and destroy a 3090 in terms of speed. Above is my result of running dense qwen3.5 on 4 v100s.
DocMadCow@reddit
What are your thoughts on mixing memory sizes on cards? Like a V100 32GB + V100 16GB?
Simple_Library_2700@reddit
A lot of inference engines don’t like it from what I can see
Mindless_Pain1860@reddit
V100 doesn’t support BF16, and its lack of INT8 support is a major drawback. A lot of quantization relies on INT8 under the hood, so this can hurt performance. The L2 cache is also too small (6 MiB); the 3090 has the same issue. As a result, attention is much less efficient on these older GPUs than on RTX 40/50 series cards or the A100/H100.
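The generational gaps in this comment can be summarized as a small feature table keyed by CUDA compute capability. This is a simplified sketch from general CUDA knowledge (check NVIDIA's documentation for the authoritative matrix); "int8_tc" here means INT8 tensor cores, which Volta lacks even though it has some scalar INT8 paths:

```python
# Simplified feature matrix by CUDA compute capability (not exhaustive).
# sm_70 = V100 (Volta), sm_75 = Turing, sm_80 = A100, sm_86 = 3090, sm_89 = 4090.
FEATURES = {
    7.0: {"bf16": False, "int8_tc": False, "fp8": False},  # V100: FP16 tensor cores only
    7.5: {"bf16": False, "int8_tc": True,  "fp8": False},  # Turing adds INT8 tensor cores
    8.0: {"bf16": True,  "int8_tc": True,  "fp8": False},  # Ampere adds native BF16
    8.6: {"bf16": True,  "int8_tc": True,  "fp8": False},  # 3090
    8.9: {"bf16": True,  "int8_tc": True,  "fp8": True},   # Ada (4090) adds FP8
}

def supports(compute_capability, feature):
    """Look up whether a given architecture generation has a feature."""
    return FEATURES[compute_capability][feature]
```

This makes the comment's point concrete: the V100 (7.0) misses both BF16 and INT8 tensor cores, which many quantized inference kernels lean on.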
mihaii@reddit (OP)
Thanks for knocking some sense into me. I already have a 4090; I was thinking about having a 2nd machine for various tests, but I will get a 3090 instead.
etaoin314@reddit
If you were thinking about putting these in the same computer to run large models, I would go with a 3090 or even another 4090. The V100 will bottleneck you pretty badly and waste a lot of the 4090's speed. The 3090 will still slow you down a little, but not nearly as much as the alternative. Also, 48GB is plenty to run anything in the 30B range at Q8 with very large contexts. Honestly, I would save up and get a second 4090; that will really fly!
Stepfunction@reddit
If you already have a 4090, then getting the 3090 and setting them both up on the same machine for an effective 48GB of VRAM will probably be the easiest path.
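Combining the two cards in llama.cpp is roughly a one-liner. The model filename is a placeholder, and the even `--tensor-split` ratio is my assumption (in practice you might weight the split toward the faster 4090):

```shell
# Sketch: serve one model across a 4090 + 3090 (48GB combined) with llama.cpp.
# -ngl 99 offloads all layers to GPU; --tensor-split divides the weights
# between the two cards in the given proportions.
./llama-server -m ./models/model-q8.gguf -ngl 99 --tensor-split 24,24 -c 32768
```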
segmond@reddit
No, not a good deal at all. I personally wouldn't buy a V100 32GB unless it's $400 or less.
FalconX88@reddit
The strength of the V100 is double-precision compute, so basically the opposite of what you want.