Arseny Kapoulkine - Measuring acceleration structures
Posted by ga_st@reddit | hardware | View on Reddit | 3 comments
Noble00_@reddit
Thanks for sharing this, really interesting.
It's really interesting to see AMD's software improvements across AMDVLK releases. On a 7900 GRE, BLAS size dropped by ~55% from 2024.Q3 to 2025.Q1, though it does seem any remaining gains for RDNA2/3 are going to be limited.
What's also interesting to note: while the theoretical BLAS sizes for each driver aren't wildly different from the measured values, on RDNA4 what is expected to be 9.2 bytes/triangle in theory is actually ~48 bytes/triangle measured. Kapoulkine does discuss why, and perhaps, as with the improvements across AMDVLK releases, later drivers will take better advantage of RDNA4 over time.
Another thing I found interesting is that Xe2 (B580) and RDNA4 (9070 XT) do not seem far apart in RT acceleration structure size (47.9 vs 45 bytes/triangle). Hoping Kapoulkine may be interested in doing a writeup for Intel as well:
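To put the gap in numbers, here is a rough sketch of the arithmetic using only the figures quoted above. The function names are made up for illustration and are not from Kapoulkine's post or any driver API.

```python
def bytes_per_triangle(blas_size_bytes: float, triangle_count: int) -> float:
    """Average BLAS footprint per triangle (hypothetical helper)."""
    if triangle_count <= 0:
        raise ValueError("triangle_count must be positive")
    return blas_size_bytes / triangle_count


def overhead_ratio(measured: float, theoretical: float) -> float:
    """How many times larger the measured size is than the theoretical one."""
    return measured / theoretical


def size_after_reduction(old_size: float, reduction_pct: float) -> float:
    """Size remaining after shrinking old_size by reduction_pct percent."""
    return old_size * (1.0 - reduction_pct / 100.0)


# RDNA4: ~48 bytes/triangle measured vs 9.2 bytes/triangle in theory.
rdna4_ratio = overhead_ratio(48.0, 9.2)
print(f"RDNA4 measured vs theoretical: {rdna4_ratio:.1f}x")

# 7900 GRE: a ~55% reduction leaves about 45% of the original BLAS size.
remaining = size_after_reduction(100.0, 55.0)
print(f"BLAS size remaining after the AMDVLK updates: {remaining:.0f}%")
```

So the measured RDNA4 figure is roughly five times the theoretical one, which is the headroom a future driver could in principle go after.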
ga_st@reddit (OP)
Also worth noting is that the data/memory layout differs between vendors, so ray traversal and intersection routines that are optimized for one specific vendor will not perform as well on another.
OutlandishnessOk11@reddit
A theoretical 9.2 bytes/triangle, that is crazy. Blackwell is getting pretty close for real-world content, some good improvement there.