Benchmark evidence: NVIDIA CMP 100-210 Tensor Cores firmware-locked at 5% performance. E-waste by design?

Posted by desexmachina@reddit | hardware

I've been testing the NVIDIA CMP 100-210, a Volta-based mining card with 16GB HBM2 and 640 Tensor Cores on paper. The results are... concerning.

Key Findings:

| Test | CMP 100-210 | Expected (V100) | Reality (% of expected) |
|---|---|---|---|
| FP32 matmul | 10.56 TFLOPS | \~15 TFLOPS | 70% ✓ |
| FP16 Tensor | 5.62 TFLOPS | \~118 TFLOPS | **5% ✗** |
| TF32 (Tensor) | 10.82 TFLOPS | \~15 TFLOPS | 72% ✓ |
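For anyone who wants to sanity-check the percentages, here is a minimal sketch that recomputes them from the figures in the table. The helper names are made up for illustration, and the `2 * n**3` FLOP count for an n×n matmul is the usual convention, which I'm assuming the benchmark follows; the linked gist is the authority on the actual method.

```python
# Figures copied from the table (TFLOPS). The "expected" values are the
# post's approximate V100 numbers, not measurements.
RESULTS = {
    "FP32 matmul": (10.56, 15.0),
    "FP16 Tensor": (5.62, 118.0),
    "TF32 (Tensor)": (10.82, 15.0),
}


def tflops_from_timing(n: int, seconds: float) -> float:
    """Throughput of one (n x n) @ (n x n) matmul, counted as 2*n^3 FLOPs.

    Assumption: this is the conventional FLOP count; the gist may differ.
    """
    return 2 * n**3 / seconds / 1e12


def percent_of_expected(measured: float, expected: float) -> float:
    """Measured throughput as a percentage of the expected throughput."""
    return 100.0 * measured / expected


for name, (measured, expected) in RESULTS.items():
    pct = percent_of_expected(measured, expected)
    print(f"{name:<14} {measured:>6.2f} / {expected:>5.0f} TFLOPS = {pct:.0f}%")
```

Running it gives 70%, 5%, and 72% for the three rows, matching the last column.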

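The smoking gun: FP16 is **slower** than FP32, running at only about 0.53x its throughput (5.62 vs. 10.56 TFLOPS). With working Tensor Cores, FP16 should be \~8x faster (\~118 vs. \~15 TFLOPS on a V100). The most plausible explanation is that the Tensor Cores are disabled or heavily throttled at the firmware level.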
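The FP16-vs-FP32 comparison is just a pair of ratios, so here is a small sketch of the arithmetic using the table's numbers. The variable names are mine, and the expected figures are the post's rough V100 values, so treat the result as approximate.

```python
# Measured on the CMP 100-210 and expected from a V100, in TFLOPS
# (values taken from the results table; expected ones are approximate).
fp32_measured, fp16_measured = 10.56, 5.62
fp32_expected, fp16_expected = 15.0, 118.0

# How fast FP16 is relative to FP32 in each case.
measured_ratio = fp16_measured / fp32_measured
expected_ratio = fp16_expected / fp32_expected

# How far the measured FP16 speedup falls short of the expected one.
shortfall = expected_ratio / measured_ratio

print(f"measured FP16/FP32: {measured_ratio:.2f}x")
print(f"expected FP16/FP32: {expected_ratio:.2f}x")
print(f"shortfall factor:   {shortfall:.1f}x")
```

A measured ratio below 1.0 means FP16 gets no Tensor Core benefit at all and is actually slower than plain FP32.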

The situation:

- Crypto mining is dead → no primary use case
- No display outputs → can't game
- Firmware-locked Tensor Cores → can't do efficient AI inference
- Result: e-waste by the thousands

Comparison: RTX 3060 shows normal Tensor Core behavior (FP16 is 3.3x faster than FP32). The CMP 100-210 should behave similarly if unlocked.

Petition for NVIDIA to release firmware unlock: https://c.org/CSc6HWpCVK

Full benchmark methodology & results: https://gist.github.com/synchronic1/94d6b8c2ce89cea8f616527b5d64300a

This is hardware e-waste by design. The silicon works - it's artificially crippled.