Faulty chip surface ex-factory on a Radeon RX 9070 XT: extreme hotspot temperatures and research into the causes of pitting
Posted by Flying-T@reddit | hardware | View on Reddit | 68 comments
JakeTappersCat@reddit
Very smart that nvidia removed the hot spot probe, now nobody will know if they have the same problem!
NGGKroze@reddit
It's a valid concern, true, but according to Nvidia themselves, they removed the sensor as "it was no longer accurate and no longer relevant."
Thingreenveil313@reddit
We can famously all trust Nvidia
loozerr@reddit
Have there been reports of GPUs frying or struggling to turbo?
People have it set in stone that 80c is high and 100c is an emergency, and now that we can see the hottest spot's temperature it's suddenly a problem.
Thingreenveil313@reddit
Frankly I haven't been paying much attention to the Nvidia cards besides all of the crashes, melting cables, potential fires, driver issues, and hot fixes for black screen issues (x3).
loozerr@reddit
I'm not even talking strictly Nvidia, hotspot temp measurement is just a constant source of FUD.
Thingreenveil313@reddit
Nvidia is the topic of conversation here and you're responding specifically to my comments on Nvidia not being trustworthy. I don't have any comments or opinions on GPU hotspot temps and any "FUD" surrounding them.
loozerr@reddit
Okay? The article is about hotspot temperatures and this thread about Nvidia discontinuing its monitoring.
Even the example of the pitted surface seems perfectly functional. 9070 boosts to 2970 according to spec, Igor's example managed 3154 according to the GPU-Z screenshot.
A pitted die feels wrong, but what is the actual impact of it? Similarly, seeing a 110C hotspot feels wrong, but does it matter if you are still exceeding the spec boost clocks?
VenditatioDelendaEst@reddit
Silicon is an extremely brittle material. A chip with physical flaws like that is living on borrowed time.
"A rat tail in soup feels wrong, but what is the actual impact of it? Does it matter if it was held for several minutes well over 85°C and everything that was on the rat is good and dead?"
loozerr@reddit
Based on nothing but a guess. Maybe the pits reduce the stress caused by thermal cycles.
Absolutely senseless analogy.
VenditatioDelendaEst@reddit
No. Why would you even suggest that? "Cracks grow" has been known in engineering basically as long as engineering has existed as a science.
The point is that a large edge-hotspot Δ relative to other examples of the same product indicates a QC issue, and in an absolute sense means the cooling system is less effective (because of thermal density).
Strazdas1@reddit
No, in fact AMD is the topic of discussion and some people keep injecting Nvidia into this.
Strazdas1@reddit
Current dies can run up to 115C without issues, probably more. Heck, you'll be hard-pressed to find throttling at less than 95C nowadays. People still live in a fantasy land where 70C is a high temperature rather than expected low-load working conditions.
teutorix_aleria@reddit
I guess those missing ROPs were also no longer relevant
PainterRude1394@reddit
Story about AMD defect*
nViDiA bAD aMIrIGht.
Ilktye@reddit
Also it's of course the top voted comment.
PainterRude1394@reddit
Yes, it's gotten worse as the AMD fanatics/shareholders have taken over discussions like this.
mauri9998@reddit
I seriously wonder about AMD fanatics. Are they really like this or are they making money off of their fanaticism in some way? Cuz I can't imagine ever being that devoted to a company.
Strazdas1@reddit
They are really like this. I know a few in real life. Otherwise decent fellas; start talking about hardware and they'll have an endless treasure trove of misconceptions and myths.
EKmars@reddit
I have an AMD GPU and I'm just finding them obnoxious. Double standards drive me nuts, might as well admit you have none at all.
DodecahedronSpace@reddit
When you're the leader of the pack and a piece-of-shit liar and borderline scammer, you should expect it. 🤷
bibober@reddit
Reminds me of when people at my company complained of slow Citrix sessions mid-day during high utilization periods and sent task manager screenshots to IT showing 100% CPU usage as proof. The solution from IT was disabling access to task manager. Can't prove high CPU utilization now, so the problem is solved!
Flimsy_Swordfish_415@reddit
cmon that's genius :D
AK-Brian@reddit
Just in case anyone else runs into a similarly devious admin,
wmic cpu get loadpercentage
from a command prompt can also sort of get you what you need. ;)
Flimsy_Swordfish_415@reddit
usually in these cases cmd is disabled too :)
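For anyone scripting around this anyway, the `wmic` output is just a header line followed by the value, so it's trivial to parse. A minimal sketch in Python, with a sample output string hardcoded since `wmic` only exists on Windows:

```python
def parse_wmic_load(output: str) -> int:
    """Parse the output of `wmic cpu get loadpercentage` into an int."""
    # wmic prints a header line ("LoadPercentage") followed by the value,
    # plus trailing blank lines; keep only the non-empty lines.
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    return int(lines[1])

# Sample output in the shape wmic prints it:
sample = "LoadPercentage\n97\n\n"
print(parse_wmic_load(sample))  # 97
```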
COMPUTER1313@reddit
Maybe it was a 3D chess move from IT to install crypto miners on the PCs. Can’t prove crypto mining if the users can’t pull up task manager.
taps forehead
COMPUTER1313@reddit
And if the GPU burns itself outside of the warranty period, then they have to buy another one! Marketing win!
__Rosso__@reddit
Nice whataboutism
Never understood the AMD cocksucking on Reddit. Well, I understand it for CPUs because those are GOATed, but for GPUs it's beyond me.
mrstankydanks@reddit
Reddit is a bubble. It’s still only 1/3rd the user base X has. The people here represent a small, niche group that can’t really impact wider market trends.
NuclearReactions@reddit
Gamer mentality. People ought to grow up; we are merely customers, that's it. We should be fans of good prices, great value and customer-oriented practices, not of companies.
rayquan36@reddit
How can we make this about Nvidia?
chefchef97@reddit
Comparing scenarios between the two players in a duopoly is weird to you?
rayquan36@reddit
Not weird at all, very much expected from Reddit and someone who owns AMD stock lol
noelsoraaa@reddit
Found CPUPro's alt account lol
Flying-T@reddit (OP)
With a bit of irony
dr1ppyblob@reddit
Fwiw, some AMD cards have always had issues with hotspot temps.
My 6950XT would hit 110c under heavy load. Re-pasting didn't work; what did work was PTM7950. The die itself is convex, which caused the thermal paste to pump out or become uneven. That's not a problem with PTM7950.
Optimal_Visual3291@reddit
Most 9070xt’s already use PTM7950.
Lumpy-Eggplant-2867@reddit
Huh, we posting igor again?
pashhtk27@reddit
Any idea how to mitigate high memory temperatures? Would putting extra thermal pads between the back of the PCB and the backplate work (since most cards come without any such pads on the back)?
Quatro_Leches@reddit
Seems to be the issue with AMD cards this gen; they are probably pushing GDDR6 way, way up. You really just have to make the fan curve aggressive even though it's overkill for the GPU itself, since the VRAM will be near 90c even if you're barely taxing it.
Glowing-Strelok-1986@reddit
In addition to what you suggested, some people have lowered their temperatures by building ducts to route the air from flow-through cards directly to an exhaust.
NGGKroze@reddit
We'll see how this evolves. While Igor's Lab says this is, for now, an isolated case, I've seen many reports of high hotspot and mem temps on other subs - some not as high as 113C, but others close to it (over 100C as well). It's never good for the long-term life of a GPU to run at such high temps.
amazingspiderlesbian@reddit
I wonder why the 9070 XTs have such hot memory and hotspot temps. My memory junction temps on my 5080 are about 55-60 degrees under full load. And the memory is overclocked +3000 to 36 Gbps.
justjanne@reddit
I'd bet the card igorslab has was faulty and should've been thrown out, but due to high demand was shipped anyway.
amazingspiderlesbian@reddit
I was talking about the memory temp
justjanne@reddit
Look at the screenshot, that's also fine.
You're swapping cause and effect. When comparing two different cooling solutions, you have to match hotspot temps.
For a GPU with a hotspot of 75°C, your hypothetical 10K temp gradient cooler would achieve average temps around 65°C, while this cooling solution achieves average temps around 50°C.
It's perfectly normal to have a relatively large temp gradient if the overall cooling solution is overspecced for your load. The RX 9070 XT has a TDP of 300W, but a cooler design that you'd expect for a 400W card (the architecture and size are somewhere between the RTX 4080 Super and RTX 4080 Ti). In the case of my screenshot, it used just 250W, leading to an even larger temperature gradient.
If you wanted to reduce that, you'd have to go with a vapor chamber design, but that's not really necessary for a 250-300W card. Silicon can handle 85-95°C perfectly fine, whether as constant or cycled load.
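The comparison above, as arithmetic (numbers are the illustrative ones from this comment, nothing more):

```python
# Compare two coolers at the same (matched) hotspot temperature.
hotspot_c = 75  # °C, matched across both coolers

gradient_tight_k = 10  # hypothetical cooler with a tight 10 K hotspot-to-average delta
gradient_wide_k = 25   # this cooler's wider delta (75 °C hotspot, ~50 °C average)

avg_tight_c = hotspot_c - gradient_tight_k  # average die temp with the tight gradient
avg_wide_c = hotspot_c - gradient_wide_k    # average die temp with the wide gradient

# At the same hotspot, the wider gradient yields the *lower* average temp,
# which is why a big delta alone doesn't mean the cooler is worse.
print(avg_tight_c, avg_wide_c)  # 65 50
```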
amazingspiderlesbian@reddit
https://www.techpowerup.com/review/asrock-radeon-rx-9070-xt-taichi-oc/39.html
Here is proof, since I didn't provide any. Across 8 different models the average GPU temp is mid-to-high 50s with hotspots averaging 80 degrees. A massive swing.
And memory temps averaging 90 degrees. Again, really fucking hot. In a case with other components those memory temps can easily reach 100 degrees.
Compared to the 5080 I was talking about.
https://www.techpowerup.com/review/msi-geforce-rtx-5080-vanguard-soc/39.html
Average memory temp in the 60s
justjanne@reddit
And just look at how much power they're using! Absolutely incredible.
Tbh, the stock voltage for the RX 9070 XT is far too high. I achieved the benchmark result linked above at -125mV, which is the lowest that's long term stable on my card.
As most of the GPUs in that test are OC variants, they might actually be running with an even higher voltage, making the problem even worse.
amazingspiderlesbian@reddit
No, I wasn't talking about your memory temp.
I was just talking about it in general, from the posts I see on the Radeon subreddit. Your GPU temps are very cold even with the big hotspot swing, so I wouldn't expect the memory to be very warm either. Most 9070 XTs aren't running at 40-ish degrees.
punktd0t@reddit
Nvidia doesn't show the hotspot temp at all.
amazingspiderlesbian@reddit
Yeah i was talking about the memory temp
nullusx@reddit
The Radeon chip is denser; it has more transistors per mm². Some Radeon chips are more concave than normal in my experience; might be a production issue.
NGGKroze@reddit
For the chip itself, sure, a possible explanation, but memory modules getting this high? Some say there is a contact problem between the cooler and the modules, which is a reasonable explanation, as some say they have perfectly fine temps (80-85C memory).
nullusx@reddit
The article provided doesnt talk about memory temperatures. Am I missing something?
NGGKroze@reddit
We strayed a bit and started talking about memory temps as well :D but you are right.
szczszqweqwe@reddit
Nvidia removed hotspot sensor readout on the 50x0 series.
plantsandramen@reddit
My GPU temp max is 46c, hot spot is 82c. This is during Steel Nomad benchmark. Huge variance.
amazingspiderlesbian@reddit
That's almost a 40c difference to the Hotspot. That's insane
plantsandramen@reddit
With a higher power level I can get 47c GPU vs 89 hotspot. It's definitely pretty large
cadaada@reddit
That was a problem in the last gen, right? The faulty vapor chambers too
__Rosso__@reddit
Average AMD moment I guess.
My 6750 XT's hotspot, no matter what I do, is 80-90, always 20-30c over the rest of the die.
HavocInferno@reddit
That's a pretty normal delta though, even for many Nvidia cards. Thinking as far back as Pascal at least, the full-load delta on my air-cooled cards has been 20+.
But Nvidia was smart this gen and just removed the hotspot sensor from its API. So you wouldn't even know the delta on Blackwell anymore.
bondybus@reddit
My old 4070 Ti and 4080 had a difference of 10C between hotspot and core, not as much as the 6800 that I tested before (15-20C).
ParthProLegend@reddit
Keeping the temps under 80C while losing 5-7% performance should be the norm.
Nobuga@reddit
My hotspot is always +35 degrees over GPU temp, and mem temp reaches up to 92 degrees. I find it uncomfortable.
Framed-Photo@reddit
Hopefully an outlier case, because I really want at least one line of GPU's that isn't at risk of cooking itself alive out of the box...
AK-Brian@reddit
This is a genuinely good examination and writeup; I'm really curious to know if other cards are similarly affected at the surface level, whether from PowerColor or otherwise.