B6000 vs H200 vs B200?
Posted by Jentano@reddit | LocalLLaMA | 10 comments
We are trying to decide which cluster is best for us.
The HGX 8x H200 is EOL and no longer available according to suppliers in Europe?
Is an HGX or DGX 8x B200 cluster the best $/token for running models like Kimi K2.6 with token counts between 20k and 200k per call? Any experiences/suggestions?
FusionCow@reddit
If you are BUYING GPUs? 8x H200. If you're doing anything cloud-based, it will always be cheaper to just use the Kimi K2.6 API from some provider; you can compare prices on OpenRouter. Otherwise, the only time and place renting GPUs is a good idea is if you're maxing out concurrency 99% of the time you're renting.
The B200 is in general not worth it over the H200, though the H200 has a pretty sizable advantage over the Pro 6000.
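The rent/buy-vs-API trade-off above is basically an amortization calculation. Here is a minimal sketch of it; every number (hardware price, power cost, throughput, API price) is a placeholder assumption, not a quote, so plug in your own figures:

```python
# Rough buy-vs-API break-even sketch. ALL constants below are
# hypothetical placeholders -- substitute real supplier quotes and
# the actual OpenRouter price for the model you plan to serve.

HARDWARE_COST_USD = 300_000              # assumed price of an 8x H200 node
POWER_HOSTING_USD_PER_YEAR = 30_000      # assumed power + colocation cost
CLUSTER_TOKENS_PER_SEC = 2_000           # assumed aggregate throughput at high concurrency
API_PRICE_USD_PER_MTOK = 2.50            # assumed blended API price per million tokens
UTILIZATION = 0.99                       # fraction of time the cluster is saturated

def cost_per_mtok_owned(years: float) -> float:
    """Amortized $/million tokens for an owned cluster over `years`."""
    seconds = years * 365 * 24 * 3600 * UTILIZATION
    tokens_millions = CLUSTER_TOKENS_PER_SEC * seconds / 1e6
    total_cost = HARDWARE_COST_USD + POWER_HOSTING_USD_PER_YEAR * years
    return total_cost / tokens_millions

for years in (1, 3, 5):
    owned = cost_per_mtok_owned(years)
    print(f"{years}y amortization: owned ~${owned:.2f}/Mtok "
          f"vs API ${API_PRICE_USD_PER_MTOK:.2f}/Mtok")
```

Under these made-up numbers, owning only beats the API after a few years of near-full utilization, which is the commenter's point: if you can't keep the cluster saturated, the API wins.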
Jentano@reddit (OP)
Yes, buying gpus.
FusionCow@reddit
Then unless you specifically need the extra VRAM of a B200, an H200 will do a perfect job.
Ok_Warning2146@reddit
Does this performance difference carry over if you only do inference?
Jentano@reddit (OP)
Yes.
Ok_Warning2146@reddit
This probably has a lot to do with the high bandwidth of the HBM memory.
For the difference between a data-center card (B200) and a workstation card (6000 Pro), you can take a look at this thread.
https://www.reddit.com/r/LocalLLaMA/comments/1sn37no/how_much_will_you_pay_for_a_pcie_nvidia_b100_b150/
Zestyclose_Law7197@reddit
Also have a look at this github:
https://github.com/voipmonitor/rtx6kpro/tree/master
The B6000 Pro needs some tweaking to make full use of its capabilities.
I kinda doubt you'll get better inference speeds with an H200.
For training, the H200 would probably decimate the B6000 Pro setup.
Zestyclose_Law7197@reddit
Have a look at https://gptrack.ai/
Not affiliated, it just seems to be one of the few claiming to deliver all sorts of setups within a reasonable time.
Ok_Warning2146@reddit
If you need FP4, then B200. Otherwise H200.
Ok_Warning2146@reddit
If you do training, then B200. Otherwise 6000.