Benchmark evidence: NVIDIA CMP 100-210 Tensor Cores firmware-locked at 5% performance. E-waste by design?
Posted by desexmachina@reddit | hardware | 27 comments
I've been testing the NVIDIA CMP 100-210, a Volta-based mining card with 16GB HBM2 and 640 Tensor Cores on paper. The results are... concerning.
Key Findings:
| Test | CMP 100-210 (measured) | Expected (V100) | % of expected |
|---|---|---|---|
| FP32 matmul | 10.56 TFLOPS | ~15 TFLOPS | 70% ✓ |
| FP16 Tensor | 5.62 TFLOPS | ~118 TFLOPS | **5% ✗** |
| TF32 (Tensor) | 10.82 TFLOPS | ~15 TFLOPS | 72% ✓ |
The smoking gun: FP16 throughput is **0.43x that of FP32** - slower, not faster. With working Tensor Cores, FP16 should be ~8x faster. This is only possible if the Tensor Cores are disabled or heavily throttled at the firmware level.
The situation:

- Crypto mining is dead → no primary use case
- No display outputs → can't game
- Firmware-locked Tensor Cores → can't do efficient AI inference
- Result: e-waste by the thousands
Comparison: RTX 3060 shows normal Tensor Core behavior (FP16 is 3.3x faster than FP32). The CMP 100-210 should behave similarly if unlocked.
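For anyone who wants to sanity-check the ratio test on their own card, here's a minimal sketch (this is not the exact script from the gist - that has the full methodology - and it assumes PyTorch with a CUDA build):

```python
import time
import torch

def matmul_tflops(dtype, n=4096, iters=50):
    """Time square matmuls and return effective TFLOPS for the given dtype."""
    a = torch.randn(n, n, device="cuda", dtype=dtype)
    b = torch.randn(n, n, device="cuda", dtype=dtype)
    for _ in range(5):          # warm-up so clocks/caches settle
        a @ b
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    for _ in range(iters):
        a @ b
    torch.cuda.synchronize()
    dt = time.perf_counter() - t0
    return 2 * n**3 * iters / dt / 1e12   # one n*n matmul is ~2*n^3 FLOPs

fp32 = matmul_tflops(torch.float32)
fp16 = matmul_tflops(torch.float16)
print(f"FP32 {fp32:.2f} TFLOPS | FP16 {fp16:.2f} TFLOPS | ratio {fp16/fp32:.2f}x")
# Healthy Tensor Cores => ratio well above 1x (e.g. ~3.3x on the RTX 3060).
# A ratio below 1x, like on the CMP 100-210, points to disabled/throttled Tensor Cores.
```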
Petition for NVIDIA to release a firmware unlock: https://c.org/CSc6HWpCVK

Full benchmark methodology & results: https://gist.github.com/synchronic1/94d6b8c2ce89cea8f616527b5d64300a
This is hardware e-waste by design. The silicon works - it's artificially crippled.
ProjectPhysX@reddit
Yes, they intentionally fuse off hardware through firmware to hinder customers from using the card for anything other than its intended purpose. This sucks.
The Nvidia CMP 170HX (which is a cut-down A100 die), for example, has fused multiply-add disabled through firmware. But with a software hack, you can still make it perform well in general compute/simulation workloads.
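The gist of that kind of hack (not the actual patch, just a Python sketch of the idea, assuming the crippled path is the fma instruction itself): shadow fma() at the kernel-source level so the compiler emits a separate multiply and add instead.

```python
# Prepended to the OpenCL C source before compilation:
FMA_PATCH = (
    "#pragma OPENCL FP_CONTRACT OFF\n"           # stop the compiler re-fusing a*b+c
    "#define fma(a, b, c) ((a) * (b) + (c))\n"   # textually replace every fma() call
)

def patch_kernel_source(src: str) -> str:
    """Rewrite kernel source so fma() becomes a plain multiply + add."""
    return FMA_PATCH + src

kernel = """
__kernel void axpy(__global float* y, __global const float* x, const float a) {
    const int i = get_global_id(0);
    y[i] = fma(a, x[i], y[i]);  // expands to ((a) * (x[i]) + (y[i]))
}
"""
print(patch_kernel_source(kernel))  # feed the patched source to your OpenCL build step
```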
The AMD Radeon VII, for example, has half of its FP64 cores disabled through firmware, reducing the FP64:FP32 ratio from the native 1:2 to an artificial 1:4.
UpsetKoalaBear@reddit
If anyone knows about Nvidia’s core layouts, it’s this guy!
Check his profile.
UpsetKoalaBear@reddit
This is a pretty flawed comparison to make. Tensor cores also change from architecture to architecture.
You should be comparing it with the 20 series, which came out around the same time; Turing was a variant of Volta.
Malygos_Spellweaver@reddit
Not sure why the post is NSFW, but 5% real performance is abysmal. Doesn't make sense; nobody mines crypto anymore.
desexmachina@reddit (OP)
Regular old GGUF inference is actually near 3090 performance
Maleficent_Celery_55@reddit
They probably used binned V100s to produce it, so I wouldn't be surprised if it has something to do with the hardware rather than the firmware.
Maybe try flashing a V100 BIOS to confirm?
SuperNanoCat@reddit
Worth looking into, OP! They may have been more sophisticated with the Tensor limitations on the Volta-based cards, but the GP102-based P102-100 (sold as 5GB) actually has 10GB of VRAM on board, with the full capacity unlockable via a modded vBIOS.
desexmachina@reddit (OP)
A very sad possibility is that the tensor units are fused off at the die, or that these were binned dies with high tensor-core reject rates anyhow.
naicha15@reddit
Good luck convincing Nvidia to reverse that.
It was an intentional business decision to sell otherwise-crippled mining-specific cards. They saw what happened to the secondary market for used Pascal and AMD Polaris cards after that crypto cycle. They didn't want their future product sales undercut by artificially depressed prices on old secondary-market stock. That's why they made these available to miners in bulk at a discount, instead of full V100s or the various Turing/Ampere cards.
Also, try not writing with AI if you want people to help you.
desexmachina@reddit (OP)
True, and another likely possibility is that the tensor units are fused off at the die and useless. This whole post was mostly for jk's anyhow, and for testing an agentic workflow.
spky-dev@reddit
Silence, Claude.
steve09089@reddit
Yeah, I thought it was some kind of AI
desexmachina@reddit (OP)
GLM actually
sas41@reddit
You posted two paragraphs and you couldn't even write them yourself?
JFC, have some decorum.
desexmachina@reddit (OP)
I’m embracing the agentic era
GalvenMin@reddit
My rule of thumb: why should I care to read something that someone else clearly didn't care to write?
randomkidlol@reddit
yes, that's the point. they don't want crypto cards devaluing their datacentre cards.
yuri_hime@reddit
Good luck with that - Volta is already past end of (real) support.
I wouldn't be surprised if that feature is controlled by a fuse, like on the TU106-in-1650 GPUs, where the tensor cores are disabled or throttled to match the original GPU (which didn't have them in the first place).
ToughDefinition2591@reddit
According to the spec sheets, the bus interface is also only PCIe 1.0 x1. Not exactly great.
Beneficial_Common683@reddit
If you can hack the Falcon microcontroller or Nvidia's firmware signature check, then you can unrestrict the cores. Instead of a change.org petition, just put up a big prize for the hacker.
chippinganimal@reddit
Doubt a change.org petition would do anything; you'd probably have better luck getting in touch with an Nvidia driver developer over email, if you can find their contact info.
Pretty sure you can use it to game if you run it in a PC with integrated graphics - Windows supports using other GPUs for the actual processing. I could be wrong, but I believe that's kind of how gaming laptops without a MUX switch work.
desexmachina@reddit (OP)
I probably should try that next. The tensors work and the clocks are lowered, but the biggest nerf is that it's PCIe 1.0 x1 - and that's only an issue for model loading. The HBM2 is great for inference: 3090-like performance.
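Rough back-of-envelope on why the x1 link only hurts loading (assuming ~250 MB/s theoretical for PCIe 1.0 x1 and ~900 GB/s V100-class HBM2; the CMP's real figures may differ):

```python
# Why PCIe 1.0 x1 hurts model loading but not token generation.
pcie_gbs = 0.25    # GB/s over the host link (2.5 GT/s, 8b/10b encoding), assumed
hbm2_gbs = 900.0   # GB/s on-card, assumed V100-class HBM2

model_gb = 16      # fill all 16 GB of HBM2 with weights
print(f"load over PCIe 1.0 x1: ~{model_gb / pcie_gbs:.0f} s best case")       # ~64 s
print(f"one full weight pass from HBM2: ~{model_gb / hbm2_gbs * 1e3:.1f} ms") # ~18 ms
# Token generation re-reads the weights from HBM2 for every token, so the
# on-card bandwidth (~55 tok/s ceiling here), not the x1 link, sets the pace
# once the model is resident.
```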
twnznz@reddit
It's good for single-card inference. Don't try to build a cluster out of these.
desexmachina@reddit (OP)
For sure. An always-loaded SLM is what it would be good for, because load times are too long.
ParthProLegend@reddit
No, you are not wrong. A GPU that isn't plugged into a display gets categorised as a "Render-Only Display Device"; a connected GPU is a "Full Display Device" instead. If you have the two GPUs, you can verify that by pressing Win+R, typing dxdiag, and hitting Enter.
Source: I don't have a MUX and have been annoyed by this setup for the last 3 ¼ years. It's a headache for some things and a miracle for others. But for gaming it's annoying due to performance loss, latency, and overheating.
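If you'd rather script the check than eyeball the dxdiag window, something like this should work (hypothetical sketch; assumes dxdiag's text report lists "Card name" and "Device Type" per display device):

```python
# Dump dxdiag's text report and print each GPU's name and device type.
import os
import subprocess
import tempfile
import time

report = os.path.join(tempfile.gettempdir(), "dxdiag_report.txt")
if os.path.exists(report):
    os.remove(report)
subprocess.run(["dxdiag", "/t", report], check=True)
while not os.path.exists(report):   # dxdiag may finish writing after it returns
    time.sleep(0.5)

with open(report, encoding="utf-8", errors="ignore") as f:
    for line in f:
        if "Card name" in line or "Device Type" in line:
            print(line.strip())
```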
jmakov@reddit
Use GLM-5.1 or GPT-5.4 to reverse engineer and write your own firmware
desexmachina@reddit (OP)
There are vBIOS variants; the only issue is that Nvidia put a chip on the board specifically for this gatekeeping.