Faulty chip surface ex-factory on a Radeon RX 9070 XT: extreme hotspot temperatures and research into the causes of pitting
Posted by Flying-T@reddit | hardware | View on Reddit | 68 comments
JakeTappersCat@reddit
Very smart that nvidia removed the hot spot probe, now nobody will know if they have the same problem!
NGGKroze@reddit
It's a valid concern, true, but according to Nvidia themselves, they removed the sensor as "it was no longer accurate and no longer relevant."
Thingreenveil313@reddit
We can famously all trust Nvidia
loozerr@reddit
Have there been reports of GPUs frying or struggling to turbo?
People have it set in stone that 80c is high and 100c is an emergency, and now that we can see the hottest spot's temperature it's suddenly a problem.
Thingreenveil313@reddit
Frankly I haven't been paying much attention to the Nvidia cards besides all of the crashes, melting cables, potential fires, driver issues, and hot fixes for black screen issues (x3).
loozerr@reddit
I'm not even talking strictly Nvidia, hotspot temp measurement is just a constant source of FUD.
Thingreenveil313@reddit
Nvidia is the topic of conversation here and you're responding specifically to my comments on Nvidia not being trustworthy. I don't have any comments or opinions on GPU hotspot temps and any "FUD" surrounding them.
loozerr@reddit
Okay? The article is about hotspot temperatures and this thread about Nvidia discontinuing its monitoring.
Even the example of the pitted surface seems perfectly functional. 9070 boosts to 2970 according to spec, Igor's example managed 3154 according to the GPU-Z screenshot.
A pitted die feels wrong, but what is the actual impact of it? Similarly, seeing a 110C hotspot feels wrong, but does it matter if you are still exceeding the spec boost clocks?
VenditatioDelendaEst@reddit
Silicon is an extremely brittle material. A chip with physical flaws like that is living on borrowed time.
"A rat tail in soup feels wrong, but what is the actual impact of it? Does it matter if it was held for several minutes well over 85°C and everything that was on the rat is good and dead?"
loozerr@reddit
Based on nothing but a guess. Maybe the pits reduce the stress caused by thermal cycles.
Absolutely senseless analogy.
VenditatioDelendaEst@reddit
No. Why would you even suggest that? "Cracks grow" has been known in engineering basically as long as engineering has existed as a science.
The point is that a large edge-hotspot Δ relative to other examples of the same product indicates a QC issue, and in an absolute sense means the cooling system is less effective (because of thermal density).
Strazdas1@reddit
No, in fact AMD is the topic of discussion and some people keep injecting Nvidia into this.
Strazdas1@reddit
Current dies can run up to 115C without issues, probably more. Heck, you'll be hard-pressed to find throttling at less than 95C nowadays. People still live in a fantasy land where 70C is a high temperature rather than expected low-load working conditions.
teutorix_aleria@reddit
I guess those missing ROPs were also no longer relevant
PainterRude1394@reddit
Story about AMD defect*
nViDiA bAD aMIrIGht.
Ilktye@reddit
Also it's of course the top voted comment.
PainterRude1394@reddit
Yes, it's gotten worse as the AMD fanatics/shareholders have taken over discussions like this.
mauri9998@reddit
I seriously wonder about AMD fanatics. Are they really like this or are they making money off of their fanaticism in some way? Cuz I can't imagine ever being that devoted to a company.
Strazdas1@reddit
They are really like this. I know a few in real life. Otherwise decent fellas; start talking about hardware and they'll have an endless treasure trove of misconceptions and myths.
EKmars@reddit
I have an AMD GPU and I'm just finding them obnoxious. Double standards drive me nuts, might as well admit you have none at all.
DodecahedronSpace@reddit
When you're the leader of the pack and a piece-of-shit liar and borderline scammer, you should expect it. 🤷
bibober@reddit
Reminds me of when people at my company complained of slow Citrix sessions mid-day during high utilization periods and sent task manager screenshots to IT showing 100% CPU usage as proof. The solution from IT was disabling access to task manager. Can't prove high CPU utilization now, so the problem is solved!
Flimsy_Swordfish_415@reddit
cmon that's genius :D
AK-Brian@reddit
Just in case anyone else runs into a similarly devious admin,
wmic cpu get loadpercentage
from a command prompt can also sort of get you what you need. ;)
Flimsy_Swordfish_415@reddit
usually in these cases cmd is disabled too :)
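For anyone scripting around this anyway, the `wmic` output is just a header line followed by the value, so it's trivial to parse. A minimal sketch in Python, with a sample output string hardcoded since `wmic` only exists on Windows:

```python
def parse_wmic_load(output: str) -> int:
    """Parse the output of `wmic cpu get loadpercentage` into an int."""
    # wmic prints a header line ("LoadPercentage") followed by the value,
    # plus trailing blank lines; keep only the non-empty lines.
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    return int(lines[1])

# Sample output in the shape wmic prints it:
sample = "LoadPercentage\n97\n\n"
print(parse_wmic_load(sample))  # 97
```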
COMPUTER1313@reddit
Maybe it was a 3D chess move from IT to install crypto miners on the PCs. Can’t prove crypto mining if the users can’t pull up task manager.
taps forehead
COMPUTER1313@reddit
And if the GPU burns itself outside of the warranty period, then they have to buy another one! Marketing win!
__Rosso__@reddit
Nice whataboutism
Never understood the AMD cocksucking on Reddit. Well, I understand it for CPUs because those are GOATed, but for GPUs it's beyond me.
mrstankydanks@reddit
Reddit is a bubble. It’s still only 1/3rd the user base X has. The people here represent a small, niche group that can’t really impact wider market trends.
NuclearReactions@reddit
Gamer mentality. People ought to grow up; we are merely customers, that's it. We should be fans of good prices, great value and customer-oriented practices, not of companies.
rayquan36@reddit
How can we make this about Nvidia?
chefchef97@reddit
Comparing scenarios between the two players in a duopoly is weird to you?
rayquan36@reddit
Not weird at all, very much expected from Reddit and someone who owns AMD stock lol
noelsoraaa@reddit
Found CPUPro's alt account lol
Flying-T@reddit (OP)
With a bit of irony
dr1ppyblob@reddit
Fwiw, some AMD cards have always had issues with hotspot temps.
My 6950XT would hit 110c under heavy load. Re-pasting didn't work; what did work was PTM7950. The die itself is convex, which caused the thermal paste to pump out or become uneven. That's not a problem with PTM7950.
Optimal_Visual3291@reddit
Most 9070xt’s already use PTM7950.
Lumpy-Eggplant-2867@reddit
Huh, we posting igor again?
pashhtk27@reddit
Any idea how to mitigate high memory temperatures? Would putting extra thermal pads between the back of the PCB and the backplate work (since most cards come without any such pads on the back)?
Quatro_Leches@reddit
Seems to be the issue with AMD cards this gen; they are probably pushing GDDR6 way, way up. You really just have to make the fan curve aggressive even though it's overkill for the GPU itself, since the VRAM will be near 90c even if you're barely taxing it.
Glowing-Strelok-1986@reddit
In addition to what you suggested, some people have lowered their temperatures by building ducts to route the air from flow-through cards directly to an exhaust.
NGGKroze@reddit
We'll see how this evolves. While Igor's Lab says this is, for now, an isolated case, I've seen many reports of high hotspot and mem temps on other subs - some not as high as 113C, but others close to it (over 100C as well). It's never good for the long-term life of a GPU to run at such high temps.
amazingspiderlesbian@reddit
I wonder why the 9070 XTs have such hot memory and hotspot temps. My memory junction temps on my 5080 are about 55-60 degrees under full load. And the memory is overclocked +3000 to 36 Gbps.
justjanne@reddit
I'd bet the card igorslab has was faulty and should've been thrown out, but due to high demand was shipped anyway.
amazingspiderlesbian@reddit
I was talking about the memory temp
justjanne@reddit
Look at the screenshot, that's also fine.
You're swapping cause and effect. When comparing two different cooling solutions, you have to match hotspot temps.
For a GPU with a hotspot of 75°C, your hypothetical 10K temp gradient cooler would achieve average temps around 65°C, while this cooling solution achieves average temps around 50°C.
It's perfectly normal to have a relatively large temp gradient if the overall cooling solution is overspecced for your load. The RX 9070 XT has a TDP of 300W, but a cooler design that you'd expect for a 400W card (the architecture and size are somewhere between the RTX 4080 Super and RTX 4080 Ti). In the case of my screenshot, it used just 250W, leading to an even larger temperature gradient.
If you wanted to reduce that, you'd have to go with a vapor chamber design, but that's not really necessary for a 250-300W card. Silicon can handle 85-95°C perfectly fine, whether as constant or cycled load.
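The comparison above, as arithmetic (numbers are the illustrative ones from this comment, nothing more):

```python
# Compare two coolers at the same (matched) hotspot temperature.
hotspot_c = 75  # °C, matched across both coolers

gradient_tight_k = 10  # hypothetical cooler with a tight 10 K hotspot-to-average delta
gradient_wide_k = 25   # this cooler's wider delta (75 °C hotspot, ~50 °C average)

avg_tight_c = hotspot_c - gradient_tight_k  # average die temp with the tight gradient
avg_wide_c = hotspot_c - gradient_wide_k    # average die temp with the wide gradient

# At the same hotspot, the wider gradient yields the *lower* average temp,
# which is why a big delta alone doesn't mean the cooler is worse.
print(avg_tight_c, avg_wide_c)  # 65 50
```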
amazingspiderlesbian@reddit
https://www.techpowerup.com/review/asrock-radeon-rx-9070-xt-taichi-oc/39.html
Here is proof, since I didn't provide any. Across 8 different models the average GPU temp is mid-to-high 50s with hotspots averaging 80 degrees. A massive swing.
And memory temps averaging 90 degrees. Again, really fucking hot. In a case with other components those memory temps can easily reach 100 degrees.
Compared to the 5080 I was talking about.
https://www.techpowerup.com/review/msi-geforce-rtx-5080-vanguard-soc/39.html
Average memory temp in the 60s
justjanne@reddit
And just look at how much power they're using! Absolutely incredible.
Tbh, the stock voltage for the RX 9070 XT is far too high. I achieved the benchmark result linked above at -125mV, which is the lowest that's long term stable on my card.
As most of the GPUs in that test are OC variants, they might actually be running with an even higher voltage, making the problem even worse.
amazingspiderlesbian@reddit
No, I wasn't talking about your memory temp.
I was just talking about it in general, from the posts I see on the Radeon subreddit. Your GPU temps are very cold even with the big hotspot swing, so I wouldn't expect the memory to be very warm either. Most 9070 XTs aren't running at 40-ish degrees.
punktd0t@reddit
Nvidia doesn't show the hotspot temp at all.
amazingspiderlesbian@reddit
Yeah i was talking about the memory temp
nullusx@reddit
The Radeon chip is denser; it has more transistors per mm². Some Radeon chips are more concave than normal in my experience; might be a production issue.
NGGKroze@reddit
For the chip itself, sure, a possible explanation, but memory modules getting this high? Some say there is a contact problem between the cooler and the modules, which is a reasonable explanation, as some say they have perfectly fine temps (80-85C memory).
nullusx@reddit
The article provided doesnt talk about memory temperatures. Am I missing something?
NGGKroze@reddit
We strayed a bit and started talking about memory temps as well :D but you are right.
szczszqweqwe@reddit
Nvidia removed hotspot sensor readout on the 50x0 series.
plantsandramen@reddit
My GPU temp max is 46c, hot spot is 82c. This is during Steel Nomad benchmark. Huge variance.
amazingspiderlesbian@reddit
That's almost a 40c difference to the Hotspot. That's insane
plantsandramen@reddit
With a higher power level I can get 47c GPU vs 89 hotspot. It's definitely pretty large
cadaada@reddit
That was a problem in the last gen, right? The faulty vapor chambers too
__Rosso__@reddit
Average AMD moment I guess.
My 6750 XT's hotspot, no matter what I do, is 80-90, always 20-30c over the rest of the die.
HavocInferno@reddit
That's a pretty normal delta though, even for many Nvidia cards. Thinking as far back as Pascal at least, the full-load delta on my air-cooled cards has been 20+.
But Nvidia was smart this gen and just removed the hotspot sensor from its API. So you wouldn't even know the delta on Blackwell anymore.
bondybus@reddit
My old 4070 Ti and 4080 had a difference of 10C between hotspot and core, not as much as the 6800 that I tested before (15-20C).
ParthProLegend@reddit
Keeping the temps under 80C while losing 5-7% performance should be the norm.
Nobuga@reddit
My hotspot is always +35 degrees over GPU temp, and mem temp reaches up to 92 degrees. I find it uncomfortable.
Framed-Photo@reddit
Hopefully an outlier case, because I really want at least one line of GPU's that isn't at risk of cooking itself alive out of the box...
AK-Brian@reddit
This is a genuinely good examination and writeup; I'm really curious to know if other cards are similarly affected at the surface level, whether from PowerColor or otherwise.