AMD plans for FSR4 to be fully AI-based — designed to improve quality and maximize power efficiency

[-]

DktheDarkKnight@reddit

Just in time. It's possible that FSR 4 and PSSR are just the same thing and AMD was just waiting for the PS5 Pro announcement to introduce it.

[-]

From-UoM@reddit

Or Sony made it on their own like Intel did.

And now amd has to respond because their hardwate has AI upscaling for someone else instead their own.

[-]

Strazdas1@reddit

What Cerny actually meant is semi-custom, because custom would mean Sony has designed a new uarch which isnt what happened here. But Custom is good enough for average person to understand.

[-]

From-UoM@reddit

PSSR is patented by Sony from 2021

That rules out any chance of it having links with fsr 4 which started on 9-12 months ago.

[-]

Strazdas1@reddit

PSSR is also a software solution utilizing what is presumably regular tensor cores from AMD. PSSR use does not mean Sony made custom hardware design.

[-]

From-UoM@reddit

We know from ROCm rdna4 doesn't have Tensor cores equivalents.

[-]

Educational_Sink_541@reddit

The 3D audio engine on the PS5 isn’t custom hardware, it’s just some CUs fused off for dedicated audio processing.

[-]

From-UoM@reddit

The PS5 has 4 CU disabled with 36 out of 40 Usable.

The ps5 pro has 64 CUs with 4 disabled to make 60 CUs

The xbox series x also has 4 disabled with 52/56 CUs

The series s has 20/24 CUs usable

So all four consoles disabling exactly 4 CUs. So its highly likely are not fused of CUs are for dedicated processing but for just yeild rates. The xboxes doesn't have audio processing and that also disable 4 CUs.

[-]

Educational_Sink_541@reddit

This is info from digital foundry who described it as a re-engineered compute unit. So maybe it’s tacked on instead of a stock CU but it’s not like Sony is engineering bespoke silicon here.

https://www.eurogamer.net/digitalfoundry-2020-playstation-5-specs-and-tech-that-deliver-sonys-next-gen-vision

I highly doubt anything in the Pro is something Sony engineered, most likely this is something AMD came up with.

[-]

From-UoM@reddit

The Tempest Engine is effectively a re-engineered AMD GPU compute unit, stripped of its caches and relying solely on DMA transfers - just like a PS3 SPU. In turn, this opens the door to full utilisation of the CU's vector units

So they made a custom unit based off a CU.

Its not a CU fused off. Its a completely different unit. I.e A custom hardware.

And Sony makes arguably the best audio devices in the world with XM series. Ofcourse they know how to do it.

[-]

Rippthrough@reddit

AMD had software to allow you to do with on GPUs with what was effectively audio ray tracing LONG before the PS

[-]

Educational_Sink_541@reddit

They added exclusive RDNA CUs on the APU with no cache and use it for audio processing. This is not equivalent to designing a discrete audio processing unit from scratch which seemed to be what you implied originally. These are just CUs.

The Xbox basically does the same thing except it uses the CPU and does it in software via Atmos for Headphones. If I wanted to be very contrarian I would say the PS5 basically uses GPU-accelerated audio processing (tbh not all that incorrect but they did modify the CU to remove the cache so whatever).

[-]

DktheDarkKnight@reddit

AMD already mentioned it has been in development for 9 to 12 months.

[-]

capybooya@reddit

I would hope so, it would be preferable if the base architecture is fixed and can be built on for several generations to avoid having features break or left unaccellerated in not too long again.

[-]

max1001@reddit

But this also means more limited hardware support.

[-]

ShadowRomeo@reddit

Might get downvoted for this, but i honestly think we need to let go support of Pascal and Polaris, these GPU architectures are literally 8+ years old.

[-]

matkinson123@reddit

True, you wouldn't expect phones to have this sort of support. (although some of the latest ones almost do!)

[-]

Strazdas1@reddit

You wouldnt expect GPUs to have this sort of support a decade ago. Remmeber when you had to upgrade at least every 2 generations if you wanted to play new releases?

[-]

WuWaCamellya@reddit

Agreed. In a world where both DLSS and XeSS are both just strictly better in both image quality and performance AMD needs to just accept it and stop worrying about ancient tech, which they seem to finally be doing. Nothing will prevent the use of older versions of FSR on that hardware and there is no sense in holding back modern hardware artificially for the sake of GPUs from 2016.

[-]

Estbarul@reddit

It's like 4000 vs 3000 series, framegen or not

[-]

max1001@reddit

Which everyone on this sub criticized....

[-]

Dreamerlax@reddit

The "fake frames" narrative died when FSR3 dropped.

[-]

WHY_DO_I_SHOUT@reddit

Well, throwing away support for old hardware gets more acceptable when the cutoff date is further away.

[-]

Vb_33@reddit

Cards of the past can use XeSS DP4a and FSR1-3.

[-]

PointSpecialist1863@reddit

You can do AI upscaling with shaders. You only need tensor cores to reduce power consumption.

[-]

dparks1234@reddit

The market for GPUs without tensor core equivalents is shrinking. Every RTX card since 2018 has the hardware and so does every Intel dGPU. It’s just Radeon and old Pascal cards that will be missing out.

[-]

Winter_2017@reddit

More importantly, it's not included on AMD iGPUs. I'm sure that had a lot to do with the decision to make FSR a software solution.

[-]

PMARC14@reddit

Do the modern RDNA3.5 ones not have tensor accelerators onboard either? Would they then have to try and use the NPU

[-]

kyralfie@reddit

Would they then have to try and use the NPU

It's basically a separate co-processor on the die, the latency will be too high to use it for graphics.

[-]

From-UoM@reddit

Its double. There will be a latency hit but doable.

Apple does it MetalFX and the AutoSR on Qualcomm uses the NPU.

The PS5 Pro is also possiblly using an NPU or separate block as Cerny said it custom hardware.

[-]

Earthborn92@reddit

I thought it was XDNA2? Not exactly custom.

[-]

From-UoM@reddit

Could be a highly custom XDNA or self designed

300 TOPs of int8 is a lot. Much higher than Xdna2 does.

[-]

Earthborn92@reddit

Big XDNA is used in the Xilinx products. It's not like NPUs are the only application.

I think it is "custom" in the same way that the Tempest Audio ending is at most.

[-]

kyralfie@reddit

Maybe it is. Depends on the implementation really.

[-]

dahauns@reddit

it needs tight and low latency compute integration with the GPU

I don't see the issue - its not like a post process effect like upscaling needs huge amounts of context switches between GPU and NPU in a hot loop.

[-]

Elegant_Hearing3003@reddit

Sorta, everything back to like RDNA1 should support this in some fashion.

[-]

OwlProper1145@reddit

They can continue to use older versions of FSR.

[-]

ABotelho23@reddit

We've got 3 generations of FSR and their minor revisions. Older cards can continue using those.

[-]

BinaryJay@reddit

I think everyone knew this was inevitable, the surprising part is how long it's taking them to actually do it.

[-]

cuttino_mowgli@reddit

It's a very AMD move. Let the competitor have at it and have their own alternative that's very different but somewhat janky. If the competitor's tech advance that much they have to have their own competing tech that's not the alternative that they first put through.

[-]

werpu@reddit

Well they were caught with their pants down

[-]

APES2GETTER@reddit

They did nothing with their pants down for a few generations that they walked out of the stall with their pants down until someone pointed out to them that their pants were down.

[-]

Strazdas1@reddit

As late as 2022 AMD was publicly claiming AI was a mistake for Nvidia. They got caught with pants in another dimension.

[-]

bubblesort33@reddit

I'm curious how much AI training it really takes. Does Nvidia have a thousand GPUs deployed running for months at a time improving DLSS constantly? Or is it 99% manual tweaking and a couple of days training on a single server rack every few months?

[-]

Strazdas1@reddit

Does Nvidia have a thousand GPUs deployed running for months at a time improving DLSS constantly?

Yes. I think they call it DGX SuperPOD now. Nvidia claims: DLSS uses the power of NVIDIA’s supercomputers to train and regularly improve its AI model.

[-]

ResponsibleJudge3172@reddit

Nvidia has two data centers to use. Selene and another that I forgot it's name.

Both DLSS, Ray reconstruction, frame gen, etc were trained on these data centers as well as all the other graphics and non graphics projects

[-]

BinaryJay@reddit

I wonder if things would be much different today if ATI remained ATI. I have lots of good memories owning ATI cards and not so much from my post ATI experiments.

[-]

werpu@reddit

ATI drivers were often bad

[-]

Rippthrough@reddit

And in the same era Geforce drivers were downright disgraceful at times, to the point of causing hardware failure

[-]

KnownDairyAcolyte@reddit

amd bought ati because ati was on fire. The hd 2900xt was a horrendous launch. They'd have either be bought out by someone else or died off

[-]

Tuned_Out@reddit

I often wonder the same if 3dfx never went out of business.

[-]

DontReadThisHoe@reddit

Once again nvidia ahead by years

[-]

XenonJFt@reddit

We shall see. Nvidia early invested AI and reaped. We don't know what they are planting seeds to reap again. Other Than that it's been ATI and nvidia banging heads for new tech for years.

[-]

DontReadThisHoe@reddit

I think k we will definitely see something around ray reconstruction. It's been a year and it's not been updated since launch. And it's showing some serious issues. Like smearing in areas where normal denoisers don't especially on edges under movement. Star wars outlaws digital foundry analysis shows this really well. I'd love to see it improved as it does add such a better RT picture. Shame the downsides are pretty hefty

[-]

dudemanguy301@reddit

Ray Reconstruction makes me think that Neural Radiance Cache is going to fit under the DLSS umbrella.

It’s a similar technology to Spacial Hash Radiance Cache but with an ML model to learn and inference on the long path radiance.

[-]

jcm2606@reddit

Maybe, but that'd really start stretching the goals of the DLSS suite. Ray reconstruction at least makes sense because, at a fundamental level, temporal upscaling and reconstruction is part of most modern denoising techniques for raytracing. Neural radiance caching, on the other hand, is completely different to temporal upscaling and reconstruction, so trying to make it fit under the DLSS umbrella would be like trying to fit neural texture compression under the DLSS umbrella. It just doesn't make much sense because they're both unrelated to the goals of the DLSS suite and how most techniques within the suite work.

[-]

dudemanguy301@reddit

Perhaps but Reflex is already a total deviation, it doesn’t even leverage machine learning unlike the other options in the suite.

[-]

ResponsibleJudge3172@reddit

Doesn't the current support have NRC by default with SHaRC support fallback? I remember seeing this at the launch of NRC

[-]

dudemanguy301@reddit

Their pathtracing SDK allows for NRC but we haven’t seen it being used. Cyberpunk 2077 overdrive uses SHaRC.

[-]

jcm2606@reddit

Fair point.

[-]

ResponsibleJudge3172@reddit

It's more of running these models while using DLSS to hide increased frame time.

I this case, NRC falls under making DLSS fast, and thus hard to say if it will even be marketed to public unlike imge quality improvements

[-]

ecffg2010@reddit (OP)

TL;DR

the final major topic that he talked about is FSR4, FidelityFX Super Resolution 4.0. What’s particularly interesting is that FSR4 will move to being fully AI-based, and it has already been in development for nearly a year.

Full quote

Jack Huynh: On the handheld side, my number one priority is battery life. If you look at the ASUS ROG Ally or the Lenovo Legion Go, it’s just that the battery life is not there. I need multiple hours. I need to play a Wukong for three hours, not 60 minutes. This is where frame generation and interpolation [come in], so this is the FSR4 that we're adding.

Because FSR2 and FSR3 were analytical based generation. It was filter based. Now, we did that because we wanted something with a very fast time to market. What I told the team was, "Guys, that's not where the future is going." So we completely pivoted the team about 9-12 months ago to go AI based.

So now we're going AI-based frame generation, frame interpolation, and the idea is increased efficiency to maximize battery life. And then we could lock the frames per second, maybe it's 30 frames per second, or 35. My number one goal right now is to maximize battery life. I think that's the biggest complaint. I read the returns too from the retailer, where people want to be able to play these games. [End quote]

[-]

mac404@reddit

"AI-based frame generation, frame interpolation" is such a word statement. Did he mean upscaling when he said interpolation? And why call out an AI version of arguably the best part of FSR?

Assuming this means AI upscaling, I think this is good news. If they've been working on models for a year, that may explain lackluster 2.x updates a bit more at least too.

[-]

Vb_33@reddit

Isn't DLSS3 FG considered AI based?

[-]

MrPapis@reddit

I think thinking they have not atleast for some period worked on AI upscaling is foolish. They obviously needed it and they obviously coulnt make it in record time. Which is basically why they made FSR what it is. They were not thinking about AI upscaling since Nvidia came out with it and they ignored it through DLSS1 as it was trash, like FSR1 mind you. But by the time DLSS2 rolled around and some time into its devleopment they must have realized theat they needed to budget and put in the time to make it. They just hasnt had teh capacity to do so. Now everything is ramping up for them as Data center is rising along with being teh defacto CPU manufacturer for both performance and efficiency.

I just hope they release FSR4 with the 8000 series. That would really be a big move on AMD's part. But perhaps 1 year isnt enough to make AI upscaling. I hope it is though. But likely its for RDNA4(FSR4 it makes sense). but if they could come with it early that would be so big for christmas sales 2024.

[-]

bubblesort33@reddit

Yeah, and he talked about 30 or 35 fps. Intel is going the "extrapolation" route with frame generation and they've talked about that. You'll see frames that don't truly even exist on a logical level for the CPU yet. Something interpolated to 30 fps would feel atrocious, with an internal frame rate and response time of 15 fps.

I'm skeptical even extrapolation to 30 fps would feel or look good, but maybe they can surprise us.

[-]

tukatu0@reddit

I think extrapolation is probably going to have artifacts unless/until the art designers factor it minding the animations. So to speak. Eh they'll figure it out after a while. At the very least if allowed to pick my poision. I think i would take the artifacts in the form of hands suddenly teleporting during extrapolation. Instead of the artifacts of upscaling which is a smear caused by practically deleting data in the image.

[-]

mac404@reddit

Interesting, don't think I've heard anything about Intel's plans in a while. Any links you could share?

Related to your other part of the comment.. yeah. I'm not convinced either when starting from that low of a frame rate. Although maybe their thought process would be "good enough on a small screen if it gives you much better battery life." We'll have to see how well it works in practice.

[-]

bubblesort33@reddit

https://www.reddit.com/r/intel/comments/18jzivh/intel_frame_generation_technology_for_xess_could/

[-]

SchighSchagh@reddit

Did he mean upscaling when he said interpolation?

yes, upscaling is done by interpolation

[-]

mac404@reddit

I mean, kinda. But the combined phrase "frame interpolation" is consistently used to mean the same type of thing as frame generation.

[-]

darkbbr@reddit

He probably meant to say "AI-based frame generation, [which is a form of] frame interpolation"

[-]

mac404@reddit

Yeah, those are the only two ways I can read it. I definitely hope it's the latter.

[-]

ArcadeOptimist@reddit

How much ya thinking Xbox is pushing this for their rumoured handheld?

[-]

From-UoM@reddit

I wonder if rdna3 or older will get screwed over here.

Pssr isn't coming to the Ps5 with rdna2 and ps5 pro will use part of Rdna4.

He is referring to handheld APUs which have an NPU. Not clear about rdna4

Sony has yet to say if the PS5 pro has a NPU or using dedicated cores on the SM. It definitely one or the other based on the uneven TOPs to shaders ratio.

So it clearly needs dedicated AI hardware which rdna3 and older completely lack. (No rdna3 does have dedicated hardware. It uses shaders with instruction set for ML)

So Will NPUs only get it? Will rdna4 get it? Will fsr3 or xess dp4a style method (slower and inferior) be the fallback for rdna3 or older?

[-]

ShadowRomeo@reddit

Likely will be limited to RDNA 4 and above, considering that it is hardware based. And honestly i think that is the best way moving forward, yes it might screw people behind previous gen hardware, but it is something that is inevitable and delaying it is only causing more harm for future hardware buyers and is practically only holding up the development | growth of a certain product.

[-]

PointSpecialist1863@reddit

They did not mentioned hardware based just that FSR4 is AI based.

[-]

From-UoM@reddit

You know i don't think rdna4 will get this.

Everything said here is for handhelds which use APUs that have an NPU

Rdna4 went into development way before 9-12 months before this was in the work.

No leaks this of dedicated ai cores either.

And rdna4 will get replaced by udna.

Sony also says they made a custom hardware for hardware for PSSR and only said RT was from future rdna. No mention of ai hardware from future rdna.

If you add it all up this screams usage of an NPU. Which I doubt will be in an rdna4 gpu.

[-]

Ok-Transition4927@reddit

Lenovo Legion Go and ROG Ally have NPU disabled in Ryzen Z1 Extreme I think

[-]

From-UoM@reddit

The Z2 is coming early next year no?

That should have the NPU and is most likely rdna3.5 based.

This and him specifically mentioning only Handhelds.

So fsr 4.0 might be NPU only first. Then rdna4 (should it have dedicated cores)

[-]

uzzi38@reddit

We don't know if Z2 will have the NPU enabled or not. The Z1 series also has it on die but disabled.

I don't think we're looking at an NPU-only solution. Strix's NPU isn't all that powerful - 50TOPs is comparable to the lowest end RDNA3 GPU's FP16 throughput which sounds good at a glance, but actually leveraging that NPU at the same time as the GPU will incur extra latency and memory bandwidth pressure as you pass data from the iGPU to the NPU (which involves going through RAM as there's no shared cache between the two). Given that the lowest end RDNA3 GPU right now (the 7600) sports 43TFLOPs of FP16, if it can run on Strix's NPU, then it should run on every RDNA3 and RDNA4 GPU.

Also as an aside, there's nothing to indicate RDNA4 has dedicated AI acceleration blocks, all Linux enablement patches have shown enhanced WMMA support (the shader based solution AMD uses for RDNA), but no MFMA support (the AI accelerator solution AMD uses for CDNA). You also get sparsity for pretty much everything FP16 and below as well.

[-]

From-UoM@reddit

40 tops int 8 would be enough.

The rtx 3050 6GB which does Dlss and is the slowest rtx card is 60 TOPs

Also the NPU is dedicated meaning it won't effect game performance.

Meanwhile the on the rdna3 it will as it would use the shaders meaning that it would take away from game performance.

Thry could a lighter and inferior version for rdna3 like XeSS Dp4a but even that has good hit with the latest 1.3 version.

https://www.techspot.com/articles-info/2860/bench/2.png

https://www.techspot.com/articles-info/2860/bench/1.png

Native average - 50 fps

Dlss balanced on 4070 - 81

Fsr Balanced on 4070 - 77

Fsr Balanced on 7800xt - 77

Xess Quality (internal Res dlss/fsr Balanced) on 7800xt - 67

Dlss on the 4070 is 20% faster than XeSS on the 7800xt which is a lot.

[-]

uzzi38@reddit

Meanwhile the on the rdna3 it will as it would use the shaders meaning that it would take away from game performance.

You are aware that you can't use Tensor cores and shader cores at the same time, right? You don't have the register bandwidth to sustain operations on both at the same time.

If the 2060 is good enough to run the full DLSS with only 45TOPs int8 (Turing does not support sparsity according to the whitepaper) then that means that the RX7600 - with "only" 43TFLOPs FP16 - should be able to run something similar as well.

[-]

ResponsibleJudge3172@reddit

They can't be assigned together at once but Nvidia in the developer forums has confirmed that they can run simultaneously in a different WARP. Issuing for shaders then for tensor cores per clock

[-]

From-UoM@reddit

The 2060 Tops is wrong

The 2070 alone has 120 Tops in the white paper

You think the 2060 is 1/3 rd of the 2070?

The 2060 should be ~104 Tops of Int8.

[-]

TheNiebuhr@reddit

only 45TOPs int8

But it has 110 at 1800mhz, no?

[-]

imaginary_num6er@reddit

RDNA2 and older will be screwed because RDNA3 has those AI cores

[-]

From-UoM@reddit

Rdna3 doesnt have dedicated AI cores.

They have AI acceleration instruction sets on the shaders.

Why do you think Sony added their own custom hardware on the PS5 Pro?

[-]

deusXex@reddit

Tensor cores are nothing more than a set of instructions for accelerated matrix operations. The only difference from "standard" instructions (or cores if you will) is that they have added specific wide registers to increase memory throughput.

[-]

cuttino_mowgli@reddit

I wonder if rdna3 or older will get screwed over here.

Absolutely. They need to redesign their GPU

[-]

Darkstalker360@reddit

I have an 8840u handheld and it had an NPU capable of 45 TOPS, think fsr4 will work on it!

[-]

Firefox72@reddit

Going the Intel way will probably be AMD's best bet.

FSR4 for RDNA4.

FSR3 for RDNA3 and older.

[-]

From-UoM@reddit

Both versions of XeSS are ML based.

That why even XeSS Dp4a looks better than fsr

Fsr 3 is not

[-]

Firefox72@reddit

I know but that doesn't change the approach.

I'm pretty sure AMD could develop a solution that can leverage hardware based on hardware detection.

[-]

From-UoM@reddit

Have you seen how slow XeSS dp4a is on non intel cards?

Dp4a runs on the shaders which eats into game performance.

That's why both Nvidia and Intel opted for dedicated AI cores.

Fsr3 is extremely light on processing and can run on shaders. But the drawbacks are quite obvious with the worst image quality.

[-]

Firefox72@reddit

Slow is an overstatement.

XeSS runs fine on my 6700XT. Yes its not always as fast as FSR but sometimes the tradeoffs are actually worht the small FPS loss.

XeSS is very much so usable on AMD cards.

[-]

From-UoM@reddit

XeSS 1.3 isn't. Its much than before.

[-]

CatalyticDragon@reddit

AMD's plan appears to have been to seed the install base with enough NPUs and 7000 series GPUs/APUs (with WMMA instructions) to make it worth it before rolling this out.

I appreciate this approach over making it a marketing feature of your latest GPUs.

We know Sony's PSSR will be using the XDNA2 NPU in the PS5Pro and that little logic unit is already shipping in laptops and will be coming to handheld gaming devices next year. RDNA4 cards will probably launch around the same timeframe and that's when you should expect FSR4 to get an official announcement (if not launch).

As much as I like high end GPUs using such 'tricks' to push high frame rates at 4k, I am more excited by the idea of a "SteamDeck 2" having a Zen5 CPU, beefier GPU with much improved ray tracing, and an NPU for everything from upscaling to cloth simulations, all while running in tens of watts.

[-]

dj_antares@reddit

enough NPUs

High latency.

WMMA instructions

Only takes 16x16x16 and does not even accelerate int8 (same performance as FP16), and also takes 32 cycles to complete (equivalent to 512 ops per CU per cycle).

We know Sony's PSSR will be using the XDNA2 NPU

No we don't. PSSR almost certainly will work on RDNA 3.5 meaning it is not XDNA-based.

AMD has two paths going forward. Either stick with dp4a then when UDNA lands, make a MFMA version, OR they go with WMMA which may or may not be better than dp4a then replace WMMA with MFMA leaving no generic version.

[-]

aliensorsomething@reddit

AMD needs to do this or they run the risk of becoming the third choice once arc is out of the early adopter beta test phase, even intel knew this and they had their own ML upscaling solution out of the gates.

[-]

mb194dc@reddit

In other news, upscaling is shite and you're better off just turning the details down and avoiding all the shimmering, artifacting and other visual issues that come from using it.

[-]

dudemanguy301@reddit

List of things “we don’t need” according to AMD fanatics right before AMD delivered exactly that thing:

An overhaul to DX11 / OpenGL drivers
Raytracing acceleration
Upscaling
Frame generation
Machine learning

Been a harsh 4 years for the clowns out there. 😔

[-]

Much_Introduction167@reddit

Give me a good AI upscale from 720p to 4K and I'm sold. Or even more than 2x Frame Generation, that would be even better.

[-]

No_Share6895@reddit

sweet dedicated RT and AI hardware with AI reconstruction too. Especially with how slow intel is at making new gens this is great to see.