B6000 vs H200 vs B200?
Posted by Jentano@reddit | LocalLLaMA | 10 comments
We are trying to decide which cluster is best for us.
The HGX 8x H200 is EOL and no longer available according to suppliers in Europe?
Is an HGX or DGX 8x B200 cluster the best $/token for running models like Kimi K2.6 with token counts between 20k and 200k per call? Any experiences/suggestions?
FusionCow@reddit
If you are BUYING GPUs? 8x H200. If you're doing anything cloud-based, it will always be cheaper to just use the Kimi K2.6 API from some provider; you can compare prices on OpenRouter. Otherwise, the only time and place renting GPUs is a good idea is if you're maxing out concurrency 99% of the time you're renting.
The B200 is in general not worth it over the H200, though the H200 has a pretty sizable advantage over the Pro 6000.
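The rent/buy-vs-API trade-off above is basically an amortization calculation. Here is a minimal sketch of it; every number (hardware price, power cost, throughput, API price) is a placeholder assumption, not a quote, so plug in your own figures:

```python
# Rough buy-vs-API break-even sketch. ALL constants below are
# hypothetical placeholders -- substitute real supplier quotes and
# the actual OpenRouter price for the model you plan to serve.

HARDWARE_COST_USD = 300_000              # assumed price of an 8x H200 node
POWER_HOSTING_USD_PER_YEAR = 30_000      # assumed power + colocation cost
CLUSTER_TOKENS_PER_SEC = 2_000           # assumed aggregate throughput at high concurrency
API_PRICE_USD_PER_MTOK = 2.50            # assumed blended API price per million tokens
UTILIZATION = 0.99                       # fraction of time the cluster is saturated

def cost_per_mtok_owned(years: float) -> float:
    """Amortized $/million tokens for an owned cluster over `years`."""
    seconds = years * 365 * 24 * 3600 * UTILIZATION
    tokens_millions = CLUSTER_TOKENS_PER_SEC * seconds / 1e6
    total_cost = HARDWARE_COST_USD + POWER_HOSTING_USD_PER_YEAR * years
    return total_cost / tokens_millions

for years in (1, 3, 5):
    owned = cost_per_mtok_owned(years)
    print(f"{years}y amortization: owned ~${owned:.2f}/Mtok "
          f"vs API ${API_PRICE_USD_PER_MTOK:.2f}/Mtok")
```

Under these made-up numbers, owning only beats the API after a few years of near-full utilization, which is the commenter's point: if you can't keep the cluster saturated, the API wins.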
Jentano@reddit (OP)
Yes, buying gpus.
FusionCow@reddit
Then unless you specifically need the extra VRAM of a B200, an H200 will do a perfect job.
Ok_Warning2146@reddit
Does this performance difference carry over if you only do inference?
Jentano@reddit (OP)
Yes.
Ok_Warning2146@reddit
This probably has a lot to do with the high bandwidth of the HBM memory.
For the difference between a data-center card (B200) and a workstation card (6000 Pro), you can take a look at this thread.
https://www.reddit.com/r/LocalLLaMA/comments/1sn37no/how_much_will_you_pay_for_a_pcie_nvidia_b100_b150/
Zestyclose_Law7197@reddit
Also have a look at this github:
https://github.com/voipmonitor/rtx6kpro/tree/master
The B6000 Pro needs some tweaking to make full use of its capabilities.
I kinda doubt you'll get better inference speeds with an H200.
For training, the H200 would probably decimate the B6000 Pro setup.
Zestyclose_Law7197@reddit
Have a look at https://gptrack.ai/
Not affiliated, it just seems to be one of the few claiming to deliver all sorts of setups within a reasonable time.
Ok_Warning2146@reddit
If you need FP4, then B200. Otherwise H200.
Ok_Warning2146@reddit
If you do training, then B200. Otherwise 6000.