Benchmarking Nvidia's RTX Neural Texture Compression tech that can reduce VRAM usage by over 80%
Posted by RTcore@reddit | hardware | View on Reddit | 67 comments
jocnews@reddit
Where are the redditors denying that it will lead to huge FPS drops because the tech makes one of the most basic operations of game graphics overly expensive?
Vushivushi@reddit
GPU vendors obviously want to sell more compute and less VRAM, and unfortunately, with DRAM contract ASPs approaching $15/GB, the tradeoff is a must.
Even 2GB of VRAM saved could allow for a 10% larger GPU die and surely that's enough to offset the overhead. The rest goes to their margins.
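(Back-of-envelope for that claim; the $15/GB is from the comment, the die cost is a figure I'm assuming purely for illustration:)

```python
dram_price_per_gb = 15.0           # contract ASP from the comment above
vram_saved_gb = 2
bom_savings = vram_saved_gb * dram_price_per_gb   # $30 per card

assumed_die_cost = 300.0           # hypothetical figure, not from the comment
extra_die_budget = bom_savings / assumed_die_cost
print(f"{extra_die_budget:.0%}")   # 10% -> roughly the "10% larger die"
```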
FrogNoPants@reddit
It is cool, but I'll just ignore it until hardware support is widely available and inference on sample is performant enough to make it the default. Inference on load is not worth the quality loss and encode time when I can just do that offline.
Humble-Effect-4873@reddit
According to the developers' Q&A at GDC, current NTC uses FP8 on both the RTX 40 and 50 series, but the 50 series actually supports FP4. Once the next-generation 60 series is released, could NTC and DLSS suddenly announce support for FP4? The 50 series might then be able to reduce the performance loss by nearly half.
MonoShadow@reddit
Decompress on sample + DLSS does slow down the test scene by quite a bit. Decompress on load doesn't fix VRAM limitations. And the tech introduces noise into the scene for DLSS to clean up, while DLSS also runs on the same Tensor Cores as neural compression.
Nice article overall; a shame the 2000 and 3000 series weren't tested, since those cards have much slower tensor cores. Apparently it's also available to other vendors via DX Cooperative Vectors, so testing this on Intel or AMD might be interesting as well.
gorion@reddit
my test from 6 months ago:
ObjectivelyLink@reddit
That's pretty massive, no? At that point you'd rather drop settings, surely. 5.4 is a big overhead.
read_volatile@reddit
Indeed, which is why it's not recommended to use inference-on-sample on cards prior to Ada.
ObjectivelyLink@reddit
Looks like a situation where I bet this technology saves maybe the 4060 and up, and this is where we'll see the big cutoff for 2nd and 3rd series RTX, at least the lower end cards.
read_volatile@reddit
DLSSG and RR already showed they’ve been comfortable using FP8 without caring about leaving older cards behind. I’m imagining they’ll do the same thing with DLSS 5 by using NVFP4
ObjectivelyLink@reddit
Yeah, but DLSS 4.5 isn't great until you hit at least the 3080 10GB. It might work, but the cost to performance will be big. It won't save the cards that need it, like a 3070.
Devatator_@reddit
Yeah, on the GitHub repo it's written that the oldest card they tested it on was a 2000 series card; kinda interested in how that performs.
tarmacjd@reddit
It'll be interesting to see how they proceed with the different 'modes'. I only understood half of the article, but if they can find the right balance of compression it could be promising.
It bothered me when purchasing a GPU that DLSS runs so much better on the higher VRAM cards where you don’t need it. This can be part of the solution there.
MarJDB@reddit
The second they decide to give us 6GB xx60 and 8GB xx70 cards because "you don't need more anymore", I'm switching to AMD... I can just see this coming from NGreedia -_-
NoPriorThreat@reddit
if 80% compression really holds, then 6GB is effectively 30GB.
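(That's just 6 / (1 − 0.8) = 30, and it assumes the 80% applies to everything in VRAM; framebuffers, geometry, and BVHs don't compress this way, so the real effective capacity would be lower:)

```python
physical_gb = 6
reduction = 0.80                   # "over 80%" from the headline
effective_gb = physical_gb / (1 - reduction)
print(effective_gb)                # 30.0 -> "6GB is effectively 30GB"
```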
Keulapaska@reddit
a 30GB card will have 5x/3.33x the memory bandwidth of a 6GB card though.
WhoTheHeckKnowsWhy@reddit
I think this tech exists solely so that lower tier crappy cards don't straight up puke the bed like they do when their VRAM is overrun. Yeah, it will be slower, but not constantly-hitting-system-RAM-for-texture-space slow.
jenny_905@reddit
Getting these PCMR types to understand will always be a battle, they're still struggling with GDDR7 bus widths.
MarJDB@reddit
And what about current and older games? Will this work automatically on them too?
NoPriorThreat@reddit
Which older game requires more than 6GB of VRAM?
nosurprisespls@reddit
Before this argument gets to level 10, what's you all's definition of "older"?
MarJDB@reddit
"Older" is 2020 until end of '25; "current" is now until this tech releases... Honestly not even trying to argue; it would be great if it eventually gets so good it's almost "lossless" while using 80% less memory. DLSS started quite shitty and became pretty good, so we'll have to wait and see.
AnechoidalChamber@reddit
Mine is any games currently released or released in the future that won't be using NTC.
And there are plenty of current games that already bust 8GB GPUs wide open.
NoPriorThreat@reddit
2+ years
steve09089@reddit
They won't roll back because existing games that don't have this tech exist.
Though I can definitely see them using this tech as an excuse to freeze VRAM counts as is
Sopel97@reddit
I wish there was an implementation of on-sample that doesn't require STF, because it doesn't feel necessary. Though I could see DLSS/DLAA being de facto always-on, so that might not matter that much.
denoflore_ai_guy@reddit
And this is AI-relevant: a few recent papers speed up MoE models using RT cores. Wondering how this would be applicable.
pythonic_dude@reddit
A 1ms penalty means going from 100fps to 91. Wish Tom's provided actual numbers for all resolutions instead of just mentioning it in very vague terms while only giving a single graph per card.
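(The arithmetic, for anyone who wants to plug in their own numbers; a fixed per-frame cost hurts high framerates disproportionately:)

```python
def fps_after_penalty(base_fps: float, penalty_ms: float) -> float:
    """Framerate after adding a fixed per-frame cost in milliseconds."""
    return 1000.0 / (1000.0 / base_fps + penalty_ms)

print(round(fps_after_penalty(100, 1.0)))  # 91 -> the 1ms case above
print(round(fps_after_penalty(60, 2.0)))   # 54 -> the 5060 example further down
```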
EdliA@reddit
Wouldn't there be a performance gain from not choking the VRAM, though?
Olde94@reddit
I say this as someone with a 1660 Ti laptop (6GB) who has run a 1440x3440 monitor. If you hit the VRAM limit and it doesn't crash, performance doesn't take a hit, it takes a smackdown. It's a crawl at that point. I had games where the difference between medium and high was... 55fps and like 10. On my 4070S the same setting change would be 55 and 45 (relatively speaking), and even less sometimes depending on the setting tweaked. Mind you, I'm talking about medium to high, not ultra/extreme etc., where some settings go haywire and just eat resources.
pythonic_dude@reddit
I'm obviously assuming that you are within VRAM limits in both cases (by dropping the texture quality way down if needed, thanks to it not measurably affecting performance while you are within VRAM limits). Measuring against an out-of-VRAM scenario is not viable since it varies too much (by game, by scene within a game, by PCIe version...).
EdliA@reddit
I mean the cards that need this tech are the ones that are operating at the limit. The others would just not have it on at all or at much lower compression.
pythonic_dude@reddit
Every card can benefit from this tech. There are plenty of use cases for obscenely large textures that would suck the life out of even a 5090: environments, and NPCs so big they're basically environment, as the biggest example, but also basically anything close-up.
Worst case scenario, 2ms for a 5060 at 4K is going from 60fps to 54fps, for example. It's perfectly viable.
SignalButterscotch73@reddit
Still very interesting to read, but until it's in a game (and preferably works on all 3 manufacturers' cards) it doesn't exist.
dudemanguy301@reddit
Read the article? 🤷‍♂️
DerpSenpai@reddit
Qualcomm too btw
ParthProLegend@reddit
False
DerpSenpai@reddit
Previous posts talk about Qualcomm also doing this.
SignalButterscotch73@reddit
It should in theory, but to my knowledge it, like the others, has only been tested on their own hardware. Untested is unknown, just like not implemented is not existing.
xXx-c00L_BoY-xXx@reddit
You can't be serious about other manufacturers. It's up to them to develop this tech.
SignalButterscotch73@reddit
All 3 have a new compression tech in the works. For any of them to become the new standard, replacing BCn, it needs to be cross-compatible across all of them, in my opinion.
BCn was originally developed by S3 Graphics: not Nvidia, not 3dfx, not ATI, not AMD, and not Intel. Even at their height, S3 were a nobody compared to the big names, but they made the best compression algorithm and it became the standard, outlasting them in the GPU space.
N2-Ainz@reddit
NVIDIA has like 90%+ of the market, so there really wouldn't be that much of an issue if they only implemented NVIDIA's solution.
It would still be pretty bad, and I doubt they'd only implement one version, but a standard would be pretty nice.
syknetz@reddit
Consoles don't run Nvidia; Nvidia needs their shit to run on consoles if they ever hope to get developers on board.
Beautiful_Ninja@reddit
Why do people keep forgetting the Switch 2 exists? AMD is not even half the console market anymore with Xbox sales being basically non-existent at this point.
syknetz@reddit
There are about 5 times as many PS5s as Switch 2s out there. AMD is still much more than half of the "premium" gaming segment.
And the point is moot, developers won't throw away a 100M install base because they can get slightly better performance in some cases. Cases which don't include the Switch 2 here, because it falls short of the Nvidia recommendations for real-time texture decompression with its Ampere GPU.
EdliA@reddit
Doesn't matter. DLSS became popular on PC long before the others had proper upscaling. The PC gaming audience is big enough for developers to bother with it.
MonoShadow@reddit
Intel already has one, and I think AMD is developing their own. I think the idea here is that before all 3 come together on a standard, there's no reason for devs to ship 3 different versions of assets.
Seref15@reddit
I mean, they have a reason if Nvidia stops selling high-VRAM gaming GPUs. Studios will have to conform themselves to available hardware.
I feel pretty confident that the reason NTC exists is to get away with putting less memory on gaming cards so Nvidia has more available for datacenter/inference cards.
jenny_905@reddit
Intel are working on something similar.
beneficiarioinss@reddit
Just like always, other manufacturers will release equivalent techniques. Nvidia has been at the bleeding edge of gaming, and everyone else is just failing to catch up
ElectronicStretch277@reddit
AMD already announced Universal Compression, no? While Nvidia is ahead, AMD has been catching up on ML features (not game implementation, that's out of their hands) at a fairly fast pace.
EnglishBrekkie_1604@reddit
AMD's implementation is more limited IIRC: it saves on storage but not VRAM. Since Intel's technique does save VRAM, Intel actually is ahead of AMD here, like they were with upscaling.
SignalButterscotch73@reddit
On the other hand, AMD is more generous with VRAM, so they can afford to work on something more limited (which is probably also cheaper to create for their smaller software team).
EnglishBrekkie_1604@reddit
Intel is equally generous, so it's a bit of a moot point. Also, this tech will almost certainly be most useful for iGPUs (not just for the VRAM, but because it saves bandwidth), so Intel having it for their iGPUs and AMD not having it is yet another way they get mogged by Arc, somehow.
ycnz@reddit
They made plenty of claims around VRAM compression when the 20 series launched. Fuck that shit, my 2080 was not great.
GenZia@reddit
So... We are getting 9 gig 6060s @ 96-bit, after all?
beneficiarioinss@reddit
I doubt that. VRAM is crazy cheap nowadays; probably 24GB on a 6050 minimum.
gvargh@reddit
yeah 24 gigaBITs sounds realistic for nvidia
BavarianBarbarian_@reddit
Someone check the hopium supply, I think this dude just decimated our entire stock
DIYfu@reddit
From what timeline did you just come here? I wanna go there.
crshbndct@reddit
6060 6GB
6060ti 8GB
90% of older games won't run well, and newer games will require every one of Nvidia's technologies just to look half as good as those older games.
All for the sake of $20 worth of VRAM.
jsheard@reddit
That would be a pretty stupid move considering this tech requires per-game integration; even if the concept does stick, it's going to take a few generations to become the norm.
dudemanguy301@reddit
NTC is a new format. Games that use more than 12GB already exist, will continue to exist, and new ones will release before NTC is common as well. NTC for all textures is also not guaranteed; it may be leveraged more piecemeal. Lastly, inference on sample has its own performance and image quality implications that may not be desirable for all scenes or GPUs, where inference on load or inference on feedback is preferable, and those methods either save less VRAM or none at all.
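To illustrate, here's a hypothetical sketch of that per-GPU, per-scene decision; the three mode names come from the comment above, while the function, thresholds, and heuristics are entirely made up:

```python
from enum import Enum, auto

class NtcMode(Enum):
    ON_SAMPLE = auto()    # decode per texel fetch: max VRAM savings, highest cost
    ON_FEEDBACK = auto()  # decode tiles flagged by sampler feedback: middle ground
    ON_LOAD = auto()      # transcode to BCn at load: saves disk space, not VRAM

def pick_mode(fast_coop_vectors: bool, vram_headroom_gb: float) -> NtcMode:
    # Illustrative thresholds only; a real engine would profile per scene.
    if not fast_coop_vectors:
        return NtcMode.ON_LOAD      # older tensor cores: per-sample inference too slow
    if vram_headroom_gb < 1.0:
        return NtcMode.ON_SAMPLE    # VRAM-starved: accept the per-sample cost
    return NtcMode.ON_FEEDBACK
```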
CaptainMonkeyJack@reddit
This is kind of interesting.
An important thing here is that games do not necessarily have to use this uniformly across every texture. It can be a per-texture decision, and from the examples they showed it seems like it can get even more granular than that, where only the specific parts actually needed at that moment get pulled in.
The way I keep thinking about this is as a caching hierarchy. Maybe what would traditionally be something like 1TB of texture assets ends up looking more like 100GB on disk, 10GB on the GPU in a compressed form, and maybe 2GB in a more performance-oriented format for the stuff that matters most right now.
Then the job is just to move intelligently through that hierarchy: keep most of the world in the cheaper form, promote what matters, and avoid paying the cost of keeping everything in its most expensive form all the time.
That is why the caching and streaming side of this seems so interesting to me. The sampler feedback approach in the article seems like it may already be going in that direction, although the performance hit looked a bit bigger than I expected, which makes me wonder whether being a little less aggressive about evicting things would help.
I also think this gets really interesting when combined with DirectStorage-style pipelines, where assets can be streamed more directly to the GPU and decompressed there. If the assets are already much smaller before they even move through the pipeline, then that should mean less data being moved around overall, helping with speed and latency too.
And the final layer of that cache hierarchy could basically be the internet. We already have games like Flight Simulator using world data measured in petabytes, so if this kind of compression approach works well, it feels like it could either allow much more quality within the same bandwidth budget or make those kinds of huge streamed worlds far more practical in terms of internet requirements and operating cost.
That is what feels exciting here to me: not just smaller textures, but a path toward much larger and richer worlds at more reasonable install sizes, bandwidth needs, and memory budgets.
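To make the promote/evict idea concrete, a toy sketch; every name and policy here is invented (real streaming systems track residency per tile/mip, not per texture):

```python
from collections import OrderedDict

class TieredTextureCache:
    """Toy two-tier residency model: 'hot' = decoded/GPU-native,
    'warm' = resident only in a compressed (NTC-like) form."""

    def __init__(self, hot_capacity: int):
        self.hot_capacity = hot_capacity
        self.hot = OrderedDict()   # texture id -> decoded data, in LRU order
        self.warm = set()          # texture ids held only in compressed form

    def sample(self, tex_id: str) -> str:
        if tex_id in self.hot:
            self.hot.move_to_end(tex_id)   # refresh LRU position
            return "hot hit"
        if tex_id in self.warm:
            self._promote(tex_id)          # feedback says it matters: decode it
            return "promoted"
        self.warm.add(tex_id)              # stream the compressed copy from disk
        return "miss, streamed"

    def _promote(self, tex_id: str) -> None:
        if len(self.hot) >= self.hot_capacity:
            evicted, _ = self.hot.popitem(last=False)  # least recently used
            self.warm.add(evicted)         # demote: keep only the compressed copy
        self.warm.discard(tex_id)
        self.hot[tex_id] = "decoded"
```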
AnechoidalChamber@reddit
Well, I might've been partly wrong; this could perhaps save 8GB GPUs...
But first I'd like to see it tested on 8GB 20xx and 30xx GPUs like the 2070 and 3070.