Benchmarking Nvidia's RTX Neural Texture Compression tech that can reduce VRAM usage by over 80%
Posted by RTcore@reddit | hardware | View on Reddit | 67 comments
jocnews@reddit
Where are the redditors denying that it will lead to huge FPS drops because the tech makes one of the most basic operations of game graphics overly expensive?
Vushivushi@reddit
GPU vendors obviously want to sell more compute and less VRAM, and unfortunately, with DRAM contract ASPs approaching $15/GB, the tradeoff is a must.
Even 2GB of VRAM saved could allow for a 10% larger GPU die and surely that's enough to offset the overhead. The rest goes to their margins.
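(Back-of-envelope for that claim; the $15/GB is from the comment, the die cost is a figure I'm assuming purely for illustration:)

```python
dram_price_per_gb = 15.0           # contract ASP from the comment above
vram_saved_gb = 2
bom_savings = vram_saved_gb * dram_price_per_gb   # $30 per card

assumed_die_cost = 300.0           # hypothetical figure, not from the comment
extra_die_budget = bom_savings / assumed_die_cost
print(f"{extra_die_budget:.0%}")   # 10% -> roughly the "10% larger die"
```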
FrogNoPants@reddit
It is cool, but I'll just ignore it until hardware support is widely available and inference on sample is performant enough to make it the default. Inference on load is not worth the quality loss and encode time when I can just do that offline.
Humble-Effect-4873@reddit
According to the developers' Q&A at GDC, current NTC uses FP8 on both the RTX 40 and 50 series, but the 50 series actually supports FP4. Once the next-generation 60 series is released, could NTC and DLSS suddenly announce support for FP4? The 50 series might then be able to reduce the performance loss by nearly half.
MonoShadow@reddit
Decompress on sample + DLSS does slow down the test scene by quite a bit. Decompress on load doesn't fix VRAM limitations. And the tech introduces noise into the scene for DLSS to clean up, while DLSS also runs on the same Tensor Cores as neural compression.
Nice article overall; a shame the 2000 and 3000 series weren't tested, since those cards have much slower tensor cores. Apparently it's also available to other vendors via DX Cooperative Vectors, so testing this on Intel or AMD might be interesting as well.
gorion@reddit
my test from 6 months ago:
ObjectivelyLink@reddit
That's pretty massive, no? At that point you'd rather drop settings, surely. 5.4 is a big overhead.
read_volatile@reddit
Indeed, which is why it's not recommended to use inference-on-sample on cards prior to Ada.
ObjectivelyLink@reddit
Looks like a situation where I bet this technology saves maybe the 4060 and up, and this is where we'll see the big cutoff for 2nd and 3rd series RTX, at least the lower end cards.
read_volatile@reddit
DLSSG and RR already showed they’ve been comfortable using FP8 without caring about leaving older cards behind. I’m imagining they’ll do the same thing with DLSS 5 by using NVFP4
ObjectivelyLink@reddit
Yeah, but DLSS 4.5 isn't great until you hit at least the 3080 10GB. It might work, but the cost to performance will be big. It won't save the cards that need it, like a 3070.
Devatator_@reddit
Yeah, on the GitHub repo it's written that the oldest card they tested it on was a 2000 series card; kinda interested in how that performs.
tarmacjd@reddit
It'll be interesting to see how they proceed with the different 'modes'. I only understood half of the article, but if they can find the right balance of compression it could be promising.
It bothered me when purchasing a GPU that DLSS runs so much better on the higher VRAM cards where you don’t need it. This can be part of the solution there.
MarJDB@reddit
The second they decide to give us 6GB xx60 and 8GB xx70 cards because "you don't need more anymore", I'm switching to AMD... I can just see this coming from NGreedia -_-
NoPriorThreat@reddit
if 80% compression really holds, then 6GB is effectively 30GB.
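(That's just 6 / (1 − 0.8) = 30, and it assumes the 80% applies to everything in VRAM; framebuffers, geometry, and BVHs don't compress this way, so the real effective capacity would be lower:)

```python
physical_gb = 6
reduction = 0.80                   # "over 80%" from the headline
effective_gb = physical_gb / (1 - reduction)
print(effective_gb)                # 30.0 -> "6GB is effectively 30GB"
```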
Keulapaska@reddit
a 30GB card will have 5x/3.33x the memory bandwidth of a 6GB card though.
WhoTheHeckKnowsWhy@reddit
I think this tech exists solely so that lower tier crappy cards don't straight up puke the bed like they do when their VRAM is overrun. Yeah, it will be slower, but not constantly-hitting-system-RAM-for-texture-space slow.
jenny_905@reddit
Getting these PCMR types to understand will always be a battle, they're still struggling with GDDR7 bus widths.
MarJDB@reddit
And what about current and older games? Will this work automatically on them too?
NoPriorThreat@reddit
Which older game requires more than 6GB of VRAM?
nosurprisespls@reddit
Before this argument gets to level 10, what's you all's definition of "older"?
MarJDB@reddit
"Older" is 2020 until end of '25; "current" is now until this tech releases... Honestly not even trying to argue; it would be great if it eventually gets so good it's almost "lossless" while using 80% less memory. DLSS started quite shitty and became pretty good, so we'll have to wait and see.
AnechoidalChamber@reddit
Mine is any games currently released or released in the future that won't be using NTC.
And there are plenty of current games that already bust 8GB GPUs wide open.
NoPriorThreat@reddit
2+ years
steve09089@reddit
They won't roll back because existing games that don't have this tech exist.
Though I can definitely see them using this tech as an excuse to freeze VRAM counts as is
Sopel97@reddit
I wish there was an implementation of on-sample that doesn't require STF, because it doesn't feel necessary. Though I could see DLSS/DLAA being de facto always-on, so that might not matter that much.
denoflore_ai_guy@reddit
And this is AI-relevant: a few recent papers speed up MoE models using RT cores. Wondering how this would be applicable.
pythonic_dude@reddit
A 1ms penalty means going from 100fps to 91. Wish Tom's provided actual numbers for all resolutions instead of just mentioning it in very vague terms while only giving a single graph per card.
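(The arithmetic, for anyone who wants to plug in their own numbers; a fixed per-frame cost hurts high framerates disproportionately:)

```python
def fps_after_penalty(base_fps: float, penalty_ms: float) -> float:
    """Framerate after adding a fixed per-frame cost in milliseconds."""
    return 1000.0 / (1000.0 / base_fps + penalty_ms)

print(round(fps_after_penalty(100, 1.0)))  # 91 -> the 1ms case above
print(round(fps_after_penalty(60, 2.0)))   # 54 -> the 5060 example further down
```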
EdliA@reddit
Wouldn't there be a performance gain from not choking the VRAM, though?
Olde94@reddit
I say this as someone with a 1660 Ti laptop (6GB) who has run a 1440x3440 monitor. If you hit the VRAM limit and it doesn't crash, performance doesn't take a hit, it takes a smackdown. It's a crawl at that point. I had games where the difference between medium and high was... 55fps and like 10. On my 4070S the same setting change would be 55 and 45 (relatively speaking), and even less sometimes depending on the setting tweaked. Mind you, I'm talking about medium to high, not ultra/extreme etc., where some settings go haywire and just eat resources.
pythonic_dude@reddit
I'm obviously assuming that you are within VRAM limits in both cases (by dropping the texture quality way down if needed, thanks to it not measurably affecting performance while you are within VRAM limits). Measuring against an out-of-VRAM scenario is not viable since it varies too much (by game, by scene within a game, by PCIe version...).
EdliA@reddit
I mean the cards that need this tech are the ones that are operating at the limit. The others would just not have it on at all or at much lower compression.
pythonic_dude@reddit
Every card can benefit from this tech. There are plenty of use cases for obscenely large textures that would suck the life out of even a 5090: environments, and NPCs so big they're basically environment, as the biggest example, but also basically anything close-up.
Worst case scenario, 2ms for a 5060 at 4K is going from 60fps to 54fps, for example. It's perfectly viable.
SignalButterscotch73@reddit
Still very interesting to read, but until it's in a game (and preferably works on all 3 manufacturers' cards) it doesn't exist.
dudemanguy301@reddit
Read the article? 🤷‍♂️
DerpSenpai@reddit
Qualcomm too btw
ParthProLegend@reddit
False
DerpSenpai@reddit
Previous posts talk about Qualcomm also doing this.
SignalButterscotch73@reddit
It should in theory, but to my knowledge it, like the others, has only been tested on their own hardware. Untested is unknown, just like not implemented is not existing.
xXx-c00L_BoY-xXx@reddit
You can't be serious about other manufacturers. It's up to them to develop this tech.
SignalButterscotch73@reddit
All 3 have a new compression tech in the works. For any of them to become the new standard, replacing BCn, it needs to be cross-compatible across all of them, in my opinion.
BCn was originally developed by S3 Graphics: not Nvidia, not 3dfx, not ATI, not AMD, and not Intel. Even at their height, S3 were a nobody compared to the big names, but they made the best compression algorithm and it became the standard, outlasting them in the GPU space.
N2-Ainz@reddit
NVIDIA has like 90%+ of the market, so there really wouldn't be that much of an issue if they only implemented NVIDIA's solution.
It would still be pretty bad, and I doubt they'd only implement one version, but a standard would be pretty nice.
syknetz@reddit
Consoles don't run Nvidia; Nvidia needs their shit to run on consoles if they ever hope to get developers on board.
Beautiful_Ninja@reddit
Why do people keep forgetting the Switch 2 exists? AMD is not even half the console market anymore with Xbox sales being basically non-existent at this point.
syknetz@reddit
There are about 5 times as many PS5s as Switch 2s out there. AMD is still much more than half of the "premium" gaming segment.
And the point is moot, developers won't throw away a 100M install base because they can get slightly better performance in some cases. Cases which don't include the Switch 2 here, because it falls short of the Nvidia recommendations for real-time texture decompression with its Ampere GPU.
EdliA@reddit
Doesn't matter. DLSS became popular on PC long before the others had proper upscaling. The PC gaming audience is big enough for developers to bother with it.
MonoShadow@reddit
Intel already has one, and I think AMD is developing their own. I think the idea here is that before all 3 come together on a standard, there's no reason for devs to ship 3 different versions of assets.
Seref15@reddit
I mean, they have a reason if Nvidia stops selling high-VRAM gaming GPUs. Studios will have to conform themselves to available hardware.
I feel pretty confident that the reason NTC exists is to get away with putting less memory on gaming cards so Nvidia has more available for datacenter/inference cards.
jenny_905@reddit
Intel are working on something similar.
beneficiarioinss@reddit
Just like always, other manufacturers will release equivalent techniques. Nvidia has been at the bleeding edge of gaming, and everyone else is just failing to catch up
ElectronicStretch277@reddit
AMD already announced Universal Compression, no? While Nvidia is ahead, AMD has been catching up on ML features (not game implementation, that's out of their hands) at a fairly fast pace.
EnglishBrekkie_1604@reddit
AMD's implementation is more limited IIRC: it saves on storage but not VRAM. Since Intel's technique does save VRAM, Intel actually is ahead of AMD here, like they were with upscaling.
SignalButterscotch73@reddit
On the other hand, AMD is more generous with VRAM, so they can afford to work on something more limited (which is probably also cheaper to create for their smaller software team).
EnglishBrekkie_1604@reddit
Intel is equally generous, so it's a bit of a moot point. Also, this tech will almost certainly be most useful for iGPUs (not just for the VRAM, but because it saves bandwidth), so Intel having it for their iGPUs and AMD not having it is yet another way they get mogged by Arc, somehow.
ycnz@reddit
They made plenty of claims around VRAM compression when the 20 series launched. Fuck that shit, my 2080 was not great.
GenZia@reddit
So... We are getting 9 gig 6060s @ 96-bit, after all?
beneficiarioinss@reddit
I doubt that. VRAM is crazy cheap nowadays; probably 24GB on a 6050 minimum.
gvargh@reddit
yeah 24 gigaBITs sounds realistic for nvidia
BavarianBarbarian_@reddit
Someone check the hopium supply, I think this dude just decimated our entire stock
DIYfu@reddit
From what timeline did you just come here? I wanna go there.
crshbndct@reddit
6060 6GB
6060ti 8GB
90% of older games won't run well, and newer games will require every one of Nvidia's technologies just to look half as good as those older games.
All for the sake of $20 worth of VRAM.
jsheard@reddit
That would be a pretty stupid move considering this tech requires per-game integration; even if the concept does stick, it's going to take a few generations to become the norm.
dudemanguy301@reddit
NTC is a new format. Games that use more than 12GB already exist, will continue to exist, and new ones will release before NTC is common as well. NTC for all textures is also not guaranteed; it may be leveraged more piecemeal. Lastly, inference on sample has its own performance and image quality implications that may not be desirable for all scenes or GPUs, where inference on load or inference on feedback is preferable, and those methods either save less VRAM or none at all.
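To illustrate, here's a hypothetical sketch of that per-GPU, per-scene decision; the three mode names come from the comment above, while the function, thresholds, and heuristics are entirely made up:

```python
from enum import Enum, auto

class NtcMode(Enum):
    ON_SAMPLE = auto()    # decode per texel fetch: max VRAM savings, highest cost
    ON_FEEDBACK = auto()  # decode tiles flagged by sampler feedback: middle ground
    ON_LOAD = auto()      # transcode to BCn at load: saves disk space, not VRAM

def pick_mode(fast_coop_vectors: bool, vram_headroom_gb: float) -> NtcMode:
    # Illustrative thresholds only; a real engine would profile per scene.
    if not fast_coop_vectors:
        return NtcMode.ON_LOAD      # older tensor cores: per-sample inference too slow
    if vram_headroom_gb < 1.0:
        return NtcMode.ON_SAMPLE    # VRAM-starved: accept the per-sample cost
    return NtcMode.ON_FEEDBACK
```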
CaptainMonkeyJack@reddit
This is kind of interesting.
An important thing here is that games do not necessarily have to use this uniformly across every texture. It can be a per-texture decision, and from the examples they showed it seems like it can get even more granular than that, where only the specific parts actually needed at that moment get pulled in.
The way I keep thinking about this is as a caching hierarchy. Maybe what would traditionally be something like 1TB of texture assets ends up looking more like 100GB on disk, 10GB on the GPU in a compressed form, and maybe 2GB in a more performance-oriented format for the stuff that matters most right now.
Then the job is just to move intelligently through that hierarchy: keep most of the world in the cheaper form, promote what matters, and avoid paying the cost of keeping everything in its most expensive form all the time.
That is why the caching and streaming side of this seems so interesting to me. The sampler feedback approach in the article seems like it may already be going in that direction, although the performance hit looked a bit bigger than I expected, which makes me wonder whether being a little less aggressive about evicting things would help.
I also think this gets really interesting when combined with DirectStorage-style pipelines, where assets can be streamed more directly to the GPU and decompressed there. If the assets are already much smaller before they even move through the pipeline, then that should mean less data being moved around overall, helping with speed and latency too.
And the final layer of that cache hierarchy could basically be the internet. We already have games like Flight Simulator using world data measured in petabytes, so if this kind of compression approach works well, it feels like it could either allow much more quality within the same bandwidth budget or make those kinds of huge streamed worlds far more practical in terms of internet requirements and operating cost.
That is what feels exciting here to me: not just smaller textures, but a path toward much larger and richer worlds at more reasonable install sizes, bandwidth needs, and memory budgets.
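To make the promote/evict idea concrete, a toy sketch; every name and policy here is invented (real streaming systems track residency per tile/mip, not per texture):

```python
from collections import OrderedDict

class TieredTextureCache:
    """Toy two-tier residency model: 'hot' = decoded/GPU-native,
    'warm' = resident only in a compressed (NTC-like) form."""

    def __init__(self, hot_capacity: int):
        self.hot_capacity = hot_capacity
        self.hot = OrderedDict()   # texture id -> decoded data, in LRU order
        self.warm = set()          # texture ids held only in compressed form

    def sample(self, tex_id: str) -> str:
        if tex_id in self.hot:
            self.hot.move_to_end(tex_id)   # refresh LRU position
            return "hot hit"
        if tex_id in self.warm:
            self._promote(tex_id)          # feedback says it matters: decode it
            return "promoted"
        self.warm.add(tex_id)              # stream the compressed copy from disk
        return "miss, streamed"

    def _promote(self, tex_id: str) -> None:
        if len(self.hot) >= self.hot_capacity:
            evicted, _ = self.hot.popitem(last=False)  # least recently used
            self.warm.add(evicted)         # demote: keep only the compressed copy
        self.warm.discard(tex_id)
        self.hot[tex_id] = "decoded"
```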
AnechoidalChamber@reddit
Well, I might've been partly wrong; this could perhaps save 8GB GPUs...
But first I'd like to see it tested on 8GB 20xx and 30xx GPUs like the 2070 and 3070.