Super god bin 9700 pro matches 7900xtx
Posted by psychoOC@reddit | LocalLLaMA | 22 comments
I was scratching my head when I kept seeing 3,300 MHz on this card, so I decided to let her eat Geekbench before giving her the psychoOC cooling treatment. I knew it was a god bin, but I wasn't expecting her to match or beat the 7900xtx while the card is still on the blower. I ended up getting the world record outright for Navi 48 on a blower card across benchmarks. This 9700 pro is paired with a custom binned mi100 to run 72b q5 models. I'll post AI benchmark numbers after everything is done. Just thought y'all would enjoy these numbers.
MelodicRecognition7@reddit
Do heatsinks on HDDs actually help? I don't care about SSDs, but high temperatures are really bad for HDDs.
psychoOC@reddit (OP)
I run zfs/arc so my HDDs are always running. I never saw them go past 48c, but at the same time the fans inside the case run at 3k rpm, so they don't really have a chance to get hot hot. The heatsinks helped, but I don't know the temps from before. Tbf the copper heatsinks cover almost the entire HDD and weigh almost 1 pound each. Heavy.
blojayble@reddit
I am going to have 3x R9700 in my setup. I have not investigated OC/undervolt with this card. Is the potential performance/efficiency gain significant? How do I check which "bin" I have? Thanks.
psychoOC@reddit (OP)
I don't have multiple 9700s so I don't know the perfect sweet spot, I'm only going off other scores I see on the internet. But I noticed on this card that going past 225 watts made 0 difference. I set the power limit to 225 watts and undervolted it heavily. The gains were very aggressive, but that's for bursting: sustained was only a 15% increase in performance, while bursting almost punched a 40% increase over stock. This specific 9700 already had insane numbers at stock vs other 9700s tho, so I can't say if this will work for you. Overall, from stock to the 225 watt undervolt I got noticeable gains, and the fan sits at 32% while the core sits at 58c under sustained max load. The blower is dead silent. But once you pass the 300 watt mark, these cards are fire breathers with almost 0 gain.
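For anyone who wants to put a rough number on the efficiency side of those figures, here is a minimal Python sketch. It takes the 15% sustained and 40% burst gains and the 225 watt limit from the comment above. The 300 watt stock power figure is an assumption for illustration (the thread only mentions 300 watts as the point where gains flatten out), and `perf_per_watt_gain` is a hypothetical helper name, not part of any tool.

```python
def perf_per_watt_gain(perf_ratio: float, tuned_watts: float, stock_watts: float) -> float:
    """Return tuned perf-per-watt divided by stock perf-per-watt.

    perf_ratio is tuned performance relative to stock (1.15 means +15%).
    """
    if tuned_watts <= 0 or stock_watts <= 0:
        raise ValueError("power values must be positive")
    return perf_ratio * stock_watts / tuned_watts


# ASSUMPTION: stock board power of 300 W, not stated in the thread.
STOCK_WATTS_ASSUMED = 300.0
TUNED_WATTS = 225.0  # power limit quoted in the thread

sustained = perf_per_watt_gain(1.15, TUNED_WATTS, STOCK_WATTS_ASSUMED)
burst = perf_per_watt_gain(1.40, TUNED_WATTS, STOCK_WATTS_ASSUMED)

print(f"sustained perf/W vs stock: {sustained:.2f}x")
print(f"burst perf/W vs stock:     {burst:.2f}x")
```

Swap in your own measured stock power and benchmark ratios; the result is only as good as those inputs, and one card's numbers may not carry over to another.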
GoodTip7897@reddit
I have a 7900xtx to trade you for both gpus (obviously).
I really wish I had bought a 9700 instead of my 7900xtx. I might switch eventually. Dual 9700s would be the dream.
Embarrassed_Adagio28@reddit
Why? I am considering buying either 1x r9700 32gb or 2x 7900xtx for LLMs. Is there something I should know?
ROS_SDN@reddit
The 9700 has better prefill and vllm support, but worse decode.
The 7900xtx has no AWQ or FP8 support; it's a non-parallel inference machine. It also lacks ECC.
I'd disagree though. Unless you're running MoEs, and as long as you're not doing logit stability testing, 7900xtx cards are great for single-user or pipeline jobs.
2-4x r9700 would run qwen122b 10b better, or give better parallel support for qwen35b, but you'll lose on qwen27b.
The 7900xtx is a good value proposition if you know the trade-offs.
alphatrad@reddit
I own two of the XTXs and now 3 of the R9700s. The XTXs are a little bit faster at token gen, but the memory cap and their sheer Fing size is the real issue.
I took this photo when I was swapping out the cards. I could only fit TWO of the XTXs in my case, which is a VERY large server case.
The R9700s make up for the 10% drop in memory bandwidth though, and it's barely noticeable.
ROS_SDN@reddit
Look, if I could justify it I'd get 4x r9700, but 2 7900xtx fit in my case and cost as much as one r9700. They have trade-offs, and you're right, size is one.
alphatrad@reddit
I swapped to the R9700 because you can buy 3 for less than ONE 5090!!!
NVIDIA guys are always getting ripped off
GoodTip7897@reddit
More vram. I love my 7900xtx but 24gb is a bit cramped for what I do. Also 9700 is better form factor for the T7910 that I use (server card fits better than a consumer gpu). I could easily fit 2x or even 3x 9700 but 2x 7900xtx would be very annoying.
Also if you get AMD you had better love to mess around with configuration. LLMs are easy but if you want to run Flux dev or other image models pytorch is hell.
Also Ubuntu is best for performance if you want to run it solely for AI.
psychoOC@reddit (OP)
Wouldn't blame ya. After the big wait times from using vulkan because I'm running mi100/9700 on large models, I'm going to wish I ran dual 9700s as well LOL. You are not missing out on too much, the 9700 still has slow vram. The mi100 beats this 9700 by 3 fold in AI. So I'm stuck with the mi100 lyf for gaming models.
GoodTip7897@reddit
Did you overclock the vram? I was able to get mine from 1249 MHz (real clock, not DDR) to 1300 MHz.
You probably could squeeze out 5% at bare minimum. Also might want to turn ECC off if you just use it for personal use. Some people said you can squeeze out more tokens that way.
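To put that memory overclock in perspective, here is a quick Python sketch of the arithmetic. It assumes token generation is purely memory-bandwidth bound and that bandwidth scales linearly with memory clock, so the result is an optimistic ceiling and not a measured gain. `clock_uplift_pct` is a hypothetical helper name.

```python
def clock_uplift_pct(base_mhz: float, oc_mhz: float) -> float:
    """Percentage increase going from base_mhz to oc_mhz."""
    if base_mhz <= 0:
        raise ValueError("base clock must be positive")
    return (oc_mhz - base_mhz) / base_mhz * 100.0


# Clocks quoted in the comment above (real clock, not DDR).
uplift = clock_uplift_pct(1249.0, 1300.0)

# ASSUMPTION: decode speed tracks memory bandwidth one-to-one,
# so this is an upper bound on the token generation gain.
print(f"memory clock uplift: {uplift:.2f}%")
```

Under those assumptions, 1249 MHz to 1300 MHz works out to roughly a 4% ceiling on token generation.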
grunt_monkey_@reddit
What's your estimate on the uplift from turning off ECC?
GoodTip7897@reddit
Don't know... I saw it somewhere.
Obviously it has a stability risk but it shouldn't be worse than any other GPU that comes without ecc.
psychoOC@reddit (OP)
1300 is def a flex. I have not touched vram yet, still waiting on parts so I can evc2 the mi100 to finally fully unlock it. I noticed that when the mi100s crash, they enjoy ghosting me until I fully reset the pc. So I have not gotten too brave yet.
GoodTip7897@reddit
I'm curious what models you run. Kimi Dev? You had said 72B.
I wonder if the time gap makes it worth it to run qwen 3.6 or gemma 4 over some of the older dense models. But then again, there is certainly a quality that large dense models have that the smaller models can't quite match.
false79@reddit
I believe the 9700 is slower inference but bigger vram. 7900xtx is faster but capped at 24gb.
I find 32gb is not enough to run q8 models. 48gb is needed.
So q4 for 9700/7900 owners it is.
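A back-of-the-envelope way to check what fits in a given amount of vram, sketched in Python. The bits-per-weight values (about 8.5 for q8, about 4.5 for q4) and the 2 GB headroom for KV cache and runtime buffers are rough assumptions for illustration, not exact figures for any particular quant format, and `weights_gb` and `fits` are hypothetical helper names.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8.0


def fits(params_billion: float, bits_per_weight: float,
         vram_gb: float, headroom_gb: float = 2.0) -> bool:
    """True if weights plus an assumed headroom fit in vram_gb."""
    return weights_gb(params_billion, bits_per_weight) + headroom_gb <= vram_gb


# ASSUMPTION: approximate bits per weight, varies by quant format.
Q8_BPW = 8.5
Q4_BPW = 4.5

for vram in (24.0, 32.0, 48.0):
    print(f"{vram:.0f} GB: 32B q8 fits={fits(32, Q8_BPW, vram)}, "
          f"32B q4 fits={fits(32, Q4_BPW, vram)}")
```

With these assumptions a 32B model at q8 needs about 34 GB for weights alone, which is why it overflows a 32 GB card but sits comfortably in 48 GB, while q4 lands around 18 GB. Long contexts need more headroom than the 2 GB assumed here.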
GoodTip7897@reddit
Not if you could oc it to the same speeds as a 7900xtx like OP did lol. Must have won the silicon lottery.
psychoOC@reddit (OP)
Want to know something ironic? Super god bins don't benefit from overclocking. I got this world record at 225 watts, sub 1v (-88mv offset), 2600 MHz vram. These god-level bins spit out insane numbers in harmonic efficiency, and it scales with temps. But that's also the downside with god bins: the more heat/power you give them, the worse they do. 7900xtx performance at 225 watts, with ECC and 32gb vram.
GoodTip7897@reddit
So you're running mostly undervolt with stock clocks? Stock power limit, or did you dial it back?
I admit I was too presumptuous about god bins. I thought they would like to be pumped full of voltage and oc'd, but maybe that's mostly just for posting really high clocks.
I do run a decent 50mv undervolt, but I wouldn't ever think to push it to 88. I don't have any good stress testing other than LLM inference and image generation (hence why I only did a mild undervolt). I run Proxmox with the GPU passed through to an Ubuntu VM that doesn't have any GUI or display, so that's why I can't exactly fire up Furmark, 3Dmark, or OCCT.
You definitely have good silicon.
I had mostly given up on undervolting in the past because I've always lost the silicon lottery (my old 6800 and my old vega 56 never wanted to be undervolted more than -20mv). Then my 7900xtx needed a -50mv to even run at "decent" temps (still 90c hotspot) because even a repaste didn't help. (Probably the cooler is trash, but it is what it is.)
psychoOC@reddit (OP)
The boost algorithm on these cards is kinda limitless: it keeps going until it detects an instability, then backs down slightly within a few ms. If a card's burst boost can handle very high clocks with good signal integrity, it will keep pushing into that area. So you gotta undervolt and never touch static clocks or core offset. Once you hit the efficiency harmonic the card likes, it will keep bursting in that golden area. RDNA4 is pretty wild like that. I tried pushing core and vram slightly but got terrible scores, even at a 330 watt limit.
God bins are extremely rare. Only one other has been posted on the internet, an ES sample of a 9070xt, and that's it for Navi 48: mine and that ES. High to ultra bins scale with volts and forced clock speeds. For XOC, god bins kinda suck ass. They're only good in that specific harmonic area. IPC on them is insane but they overclock like dog ass. They also die super fast if they get thrown off their harmonic area.