Benchmark evidence: NVIDIA CMP 100-210 Tensor Cores firmware-locked at 5% performance. E-waste by design?
Posted by desexmachina@reddit | hardware | 27 comments
I've been testing the NVIDIA CMP 100-210, a Volta-based mining card with 16GB HBM2 and 640 Tensor Cores on paper. The results are... concerning.
Key Findings:
| Test | CMP 100-210 (measured) | Expected (V100) | % of expected |
|---|---|---|---|
| FP32 matmul | 10.56 TFLOPS | ~15 TFLOPS | 70% ✓ |
| FP16 Tensor | 5.62 TFLOPS | ~118 TFLOPS | **5% ✗** |
| TF32 (Tensor) | 10.82 TFLOPS | ~15 TFLOPS | 72% ✓ |
The smoking gun: FP16 throughput is **0.43x that of FP32** - slower, not faster. With working Tensor Cores, FP16 should be ~8x faster. This is only possible if the Tensor Cores are disabled or heavily throttled at the firmware level.
The situation:

- Crypto mining is dead → no primary use case
- No display outputs → can't game
- Firmware-locked Tensor Cores → can't do efficient AI inference
- Result: e-waste by the thousands
Comparison: RTX 3060 shows normal Tensor Core behavior (FP16 is 3.3x faster than FP32). The CMP 100-210 should behave similarly if unlocked.
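For anyone who wants to sanity-check the ratio test on their own card, here's a minimal sketch (this is not the exact script from the gist - that has the full methodology - and it assumes PyTorch with a CUDA build):

```python
import time
import torch

def matmul_tflops(dtype, n=4096, iters=50):
    """Time square matmuls and return effective TFLOPS for the given dtype."""
    a = torch.randn(n, n, device="cuda", dtype=dtype)
    b = torch.randn(n, n, device="cuda", dtype=dtype)
    for _ in range(5):          # warm-up so clocks/caches settle
        a @ b
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    for _ in range(iters):
        a @ b
    torch.cuda.synchronize()
    dt = time.perf_counter() - t0
    return 2 * n**3 * iters / dt / 1e12   # one n*n matmul is ~2*n^3 FLOPs

fp32 = matmul_tflops(torch.float32)
fp16 = matmul_tflops(torch.float16)
print(f"FP32 {fp32:.2f} TFLOPS | FP16 {fp16:.2f} TFLOPS | ratio {fp16/fp32:.2f}x")
# Healthy Tensor Cores => ratio well above 1x (e.g. ~3.3x on the RTX 3060).
# A ratio below 1x, like on the CMP 100-210, points to disabled/throttled Tensor Cores.
```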
Petition for NVIDIA to release a firmware unlock: https://c.org/CSc6HWpCVK

Full benchmark methodology & results: https://gist.github.com/synchronic1/94d6b8c2ce89cea8f616527b5d64300a
This is hardware e-waste by design. The silicon works - it's artificially crippled.
ProjectPhysX@reddit
Yes, they intentionally fuse off hardware through firmware to hinder customers from using the card for anything other than its intended purpose. This sucks.
The Nvidia CMP 170HX (which is a cut-down A100 die), for example, has fused multiply-add disabled through firmware. But with a software hack, you can still make it perform well in general compute/simulation workloads.
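The gist of that kind of hack (not the actual patch, just a Python sketch of the idea, assuming the crippled path is the fma instruction itself): shadow fma() at the kernel-source level so the compiler emits a separate multiply and add instead.

```python
# Prepended to the OpenCL C source before compilation:
FMA_PATCH = (
    "#pragma OPENCL FP_CONTRACT OFF\n"           # stop the compiler re-fusing a*b+c
    "#define fma(a, b, c) ((a) * (b) + (c))\n"   # textually replace every fma() call
)

def patch_kernel_source(src: str) -> str:
    """Rewrite kernel source so fma() becomes a plain multiply + add."""
    return FMA_PATCH + src

kernel = """
__kernel void axpy(__global float* y, __global const float* x, const float a) {
    const int i = get_global_id(0);
    y[i] = fma(a, x[i], y[i]);  // expands to ((a) * (x[i]) + (y[i]))
}
"""
print(patch_kernel_source(kernel))  # feed the patched source to your OpenCL build step
```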
The AMD Radeon VII, for example, has half of its FP64 cores disabled through firmware, reducing the FP64:FP32 ratio from the native 1:2 to an artificial 1:4.
UpsetKoalaBear@reddit
If anyone knows about Nvidia’s core layouts, it’s this guy!
Check his profile.
UpsetKoalaBear@reddit
This is a pretty flawed comparison to make. Tensor cores also change from architecture to architecture.
You should be comparing it with the 20 series, which came out around the same time; Turing was a variant of Volta.
Malygos_Spellweaver@reddit
Not sure why the post is NSFW, but 5% real performance is abysmal. Doesn't make sense; nobody mines crypto anymore.
desexmachina@reddit (OP)
Regular old GGUF inference is actually near 3090 performance
Maleficent_Celery_55@reddit
They probably used binned V100s to produce it, so I wouldn't be surprised if it has something to do with the hardware rather than the firmware.
Maybe try flashing a V100 BIOS to confirm?
SuperNanoCat@reddit
Worth looking into, OP! They may have been more sophisticated with the Tensor limitations on the Volta-based cards, but the GP102-based P102-100 (sold as 5GB) actually has 10GB of VRAM on board, with the full capacity unlockable via a modded vBIOS.
desexmachina@reddit (OP)
A very sad possibility is that the tensor units are fused off at the die, or that these were binned dies with high tensor-core reject rates anyhow.
naicha15@reddit
Good luck convincing Nvidia to reverse that.
It was an intentional business decision to sell otherwise-crippled mining-specific cards. They saw what happened to the secondary market for used Pascal and AMD Polaris cards after that crypto cycle. They didn't want their future product sales undercut by artificially depressed prices on old secondary-market stock. That's why they made these available to miners in bulk at a discount, instead of full V100s or the various Turing/Ampere cards.
Also, try not writing with AI if you want people to help you.
desexmachina@reddit (OP)
True, and another likely possibility is that the tensor units are fused off at the die and useless. This whole post was mostly for jk's anyhow, and for testing an agentic workflow.
spky-dev@reddit
Silence, Claude.
steve09089@reddit
Yeah, I thought it was some kind of AI
desexmachina@reddit (OP)
GLM actually
sas41@reddit
You posted two paragraphs and you couldn't even write them yourself?
JFC, have some decorum.
desexmachina@reddit (OP)
I’m embracing the agentic era
GalvenMin@reddit
My rule of thumb: why should I care to read something that someone else clearly didn't care to write?
randomkidlol@reddit
yes, that's the point. they don't want crypto cards devaluing their datacentre cards.
yuri_hime@reddit
Good luck with that - Volta is already past end of (real) support.
I wouldn't be surprised if that feature is controlled by a fuse, like on the TU106-in-1650 GPUs, where the tensor cores are disabled or throttled to match the original GPU (which didn't have them in the first place).
ToughDefinition2591@reddit
According to the spec sheets, the bus interface is also only PCIe 1.0 x1. Not exactly great.
Beneficial_Common683@reddit
If you can hack the Falcon microcontroller or Nvidia's firmware signature check, then you can unrestrict the cores. Instead of a change.org petition, just put up a big prize for the hacker.
chippinganimal@reddit
Doubt a change.org petition would do anything; you'd probably have better luck getting in touch with an Nvidia driver developer over email, if you can find their contact info.
Pretty sure you can use it to game if you run it in a PC with integrated graphics - Windows supports using other GPUs for the actual processing. I could be wrong, but I believe that's kind of how gaming laptops without a MUX switch work.
desexmachina@reddit (OP)
I probably should try that next. The tensors work and the clocks are lowered, but the biggest nerf is that it's PCIe 1.0 x1 - and that's only an issue for model loading. The HBM2 is great for inference: 3090-like performance.
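Rough back-of-envelope on why the x1 link only hurts loading (assuming ~250 MB/s theoretical for PCIe 1.0 x1 and ~900 GB/s V100-class HBM2; the CMP's real figures may differ):

```python
# Why PCIe 1.0 x1 hurts model loading but not token generation.
pcie_gbs = 0.25    # GB/s over the host link (2.5 GT/s, 8b/10b encoding), assumed
hbm2_gbs = 900.0   # GB/s on-card, assumed V100-class HBM2

model_gb = 16      # fill all 16 GB of HBM2 with weights
print(f"load over PCIe 1.0 x1: ~{model_gb / pcie_gbs:.0f} s best case")       # ~64 s
print(f"one full weight pass from HBM2: ~{model_gb / hbm2_gbs * 1e3:.1f} ms") # ~18 ms
# Token generation re-reads the weights from HBM2 for every token, so the
# on-card bandwidth (~55 tok/s ceiling here), not the x1 link, sets the pace
# once the model is resident.
```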
twnznz@reddit
It's good for single-card inference. Don't try to build a cluster out of these.
desexmachina@reddit (OP)
For sure. An always-loaded SLM is what it would be good for, because load times are too long.
ParthProLegend@reddit
No, you are not wrong. A GPU that isn't plugged into a display gets categorised as a "Render-Only Display Device"; a connected GPU is a "Full Display Device" instead. If you have the two GPUs, you can verify that by pressing Win+R, typing dxdiag, and hitting Enter.
Source: I don't have a MUX and have been annoyed by this setup for the last 3 ¼ years. It's a headache for some things and a miracle for others. But for gaming it's annoying due to performance loss, latency, and overheating.
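If you'd rather script the check than eyeball the dxdiag window, something like this should work (hypothetical sketch; assumes dxdiag's text report lists "Card name" and "Device Type" per display device):

```python
# Dump dxdiag's text report and print each GPU's name and device type.
import os
import subprocess
import tempfile
import time

report = os.path.join(tempfile.gettempdir(), "dxdiag_report.txt")
if os.path.exists(report):
    os.remove(report)
subprocess.run(["dxdiag", "/t", report], check=True)
while not os.path.exists(report):   # dxdiag may finish writing after it returns
    time.sleep(0.5)

with open(report, encoding="utf-8", errors="ignore") as f:
    for line in f:
        if "Card name" in line or "Device Type" in line:
            print(line.strip())
```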
jmakov@reddit
Use GLM-5.1 or GPT-5.4 to reverse engineer and write your own firmware
desexmachina@reddit (OP)
There are vBIOS variants; the only issue is that Nvidia put a chip on the board specifically for this gatekeeping.