PandaMoniumHUN@reddit
Nice write-up, but you should probably add to the article that arguably the proper fix is to just slap the flat qualifier on the shader variable to disable interpolation, if your GLSL version supports it (I've seen that you mentioned you target version 1.20, which does not).
HighRelevancy@reddit
> If the CPU decides the adjacency integer is 238, it'll write 238.0f and the shader will read 238.0f, cast it back to 238

I'm not going to say I foresaw the exact nature of the bug, but this sentence immediately triggered my eyebrows. Floats are cursed and I would never assume they're going to be exactly anything, ever, in any context.
gurebu@reddit
Doesn’t this teach us “don’t cast bitmasks or any other non-continuous function input to float unless you want to be sad” more than about what gpus do?
Otis_Inf@reddit
GPUs also treat 'ints' in shaders as floats, so that won't help you, I'm afraid
jacenat@reddit
I am not so sure. Yes, they are represented by floats, but math might be done on them differently because they are labeled as ints. Like, I can see the GPU doing what OP did implicitly for certain operations to prevent undershoots (or overshoots in other scenarios), or skipping some float-only operations entirely.
So it might help. It really depends on the specifics.
Otis_Inf@reddit
Nope. https://gamedev.net/forums/topic/696946-normalized-unsigned-integers-vs-floats-as-vertex-data/#post-5379884
But hey, according to the downvotes, I'm wrong, and so is apparently MJP (who knows a thing or two about shader programming ;) )
In theory you could be right, if the HLSL -> intermediate -> GPU asm compiler takes it into account and analyzes whether it can make assumptions based on the fact that a variable is an int. In practice... why would it, as the hw is targeted towards float processing anyway.
jacenat@reddit
I mean since you could skip calculations on ints that wouldn't help, you might save something on that end. But GPUs are probably still in a very "just throw cycles at the problem across many lanes" mindset where this doesn't really make that much of a difference anyway.
rogual@reddit (OP)
It's a good rule of thumb, although floats have more than enough precision to exactly represent the integers 0-255, so if you're not doing any actual math on the values, just casting to float and back, the casting itself won't cause a bug.
In this case, the bad assumption that I had was that if each vertex of a triangle had the same value for a given attribute, every fragment in the triangle would also have that same value. I'd never realized before now that this isn't true.
Even without the casting, I thought it was an interesting, counterintuitive fact about GPU interpolation.
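A quick way to convince yourself of the exact-round-trip claim, using Python's struct module to emulate 32-bit floats (the helper name to_f32 is mine, just for illustration):

```python
import struct

def to_f32(x):
    """Round a Python double to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]

# Every integer in 0..255 (in fact up to 2**24) survives the round trip:
assert all(int(to_f32(i)) == i for i in range(256))
assert int(to_f32(2**24)) == 2**24

# Past 2**24 the gap between adjacent floats exceeds 1, and it breaks:
assert int(to_f32(2**24 + 1)) == 2**24   # 16777217 rounds to 16777216
```

So the cast itself really is lossless for this data; the trouble only starts once arithmetic happens to the value in between.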
Kered13@reddit
But you were doing math on it, because of the interpolation. It doesn't matter if all points "should" return the same value, as soon as you start doing math on it you cannot assume you're going to get exact results. Most arithmetical identities fail on floating point numbers, and you cannot assume that the compiler will simplify the arithmetic.
wnoise@reddit
Most one or two argument identities hold, suitably generalized to include NaN and signed zero. They only really start failing once you get to three arguments, i.e. (a+b) + c vs a + (b+c).
Kered13@reddit
In this case OP was implicitly relying on the distributive property, which absolutely does not hold in floating points.
wnoise@reddit
Yes, and distributing over 3 summands, which also has the rebracketing problem.
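To see the three-argument failure concretely, here's a quick sketch (plain Python doubles, but the same effect exists at any IEEE width):

```python
a, b, c = 1e20, -1e20, 1.0

left = (a + b) + c    # a + b cancels exactly to 0.0, then + 1.0 -> 1.0
right = a + (b + c)   # b + c rounds back to -1e20 (1.0 is far below one
                      # ulp of 1e20, which is ~16384), so the 1.0 vanishes

assert left == 1.0
assert right == 0.0
```

Rebracketing a sum really can change the answer by an entire unit, which is exactly the kind of slack the interpolation math exploits.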
gurebu@reddit
Isn't that another argument to not use the vertex buffer for that? I mean your value clearly is per face rather than per vertex. I'm genuinely curious, I'm not a career graphics dev (and unaware if there even are ways to pass per-face ints or even floats that are as performant), but just the general formulation of the problem you're solving says the tools are wrong.
rogual@reddit (OP)
Well, the value is per-instance. Each sand tile is one instance, all in the same instance buffer. If I wasn't using instancing, the adjacency value could be a uniform, but with instancing, a uniform has the same value for every instance in the batch, so I can't use a uniform for this.
The value varies per-instance, so I put it in the instance data buffer. I don't know where else I could put it.
And the only thing you can put in the instance data buffer is floats, 20 of them per instance. (This is a limitation of the bgfx library I'm using, not a graphics API limitation.)
I'm totally not an expert either, just learning as I go! There could well be a better solution. I just don't know what it would be.
umtala@reddit
It's probably like this to support WebGL v1 which only supports floats IIRC. WebGL v2 supports integers.
jacenat@reddit
I was gonna ask "why the obsession with floats" but this seems a very reasonable decision. Coding GPUs (especially ones only supporting old standards) is just *special*.
MaleficentCaptain114@reddit
You can run into this with vertex UVs too. Even if every value you pass in is in the range [0, 1], you can still get UVs in the fragment shader that are larger than 1.
If you just care about the raw bits, and know each vert has the same value, you can also use flat interpolation. That will pass through the value from the first vertex in the tri.
gurebu@reddit
Don't get me wrong, I was also unaware of the projection stuff in the interpolation, but even if it wasn't there, the math for the basic barycentric interpolation would be u * 255.0 + v * 255.0 + w * 255.0, where u, v and w are arbitrary positive floats that add up to 1.0, and the result is by no means guaranteed to be 255.0 exactly.
rogual@reddit (OP)
That's a good point! I guess I still don't fully understand the bug, then.
GhostPilotdev@reddit
Fair point, but the real lesson is that GPUs will silently do something "reasonable" with your garbage input instead of telling you it's garbage. That's the part that actually burns you at 2am.
hongooi@reddit
It's floating point.
It's never floating point.
It was floating point.
roflpotato@reddit
floating point, the lupus of programming
Ok-Tie545@reddit
It’s a floating point issue about 0.00000000000000001% of the time
throwaway131072@reddit
The rest of the time it's DNS
wrosecrans@reddit
Yup, the issue was that the Domain of the Numbering System included a fractional part.
(Note, floating point numbers will also break domain name system lookups if you need to return "I have a hostname A record that matches your query.... to within an arbitrary epsilon.")
cdsmith@reddit
It could... but then you can't use JavaScript at all, for example, and you can't do a bunch of things on GPUs, and... a bunch of other stuff. Sometimes you use what's available, not wish for the exact tool you want.
max123246@reddit
Ugh I always forget that JavaScript's number is a float by default. True insanity
kaoD@reddit
You made me realize this is why texel centers sit at (0.5,0.5)
ack_error@reddit
I think that's more about size invariance than rounding. For bilinear interpolation and rasterization it's actually more convenient to have pixel/texel centers at integer coordinates instead of half-integer coordinates. But doing that has the horrible side effect of making the precise bounds for the texture dependent upon the texture size. Direct3D 9 did this with its clip space coordinates and the half pixel offset made a mess of projection matrices -- quite a lot of games got it subtly wrong leading to artifacts.
kaoD@reddit
Not sure if I follow.
ack_error@reddit
With half-integer pixel/texel centers, the exact full area of a texture is (0,0) - (1,1) when sampled and (-1,-1)-(1,1) in normalized clip coordinates for rendering. This is always the same regardless of the size of the texture.
With integer pixel centers, the exact full area is offset by half a pixel. For Direct3D 9, it's (-1 - 1/w, -1 + 1/h) - (1 - 1/w, 1 + 1/h), because the centers are offset up/left and the clip coordinate system is bottom-up. This is annoying because it makes your projection matrix dependent upon the viewport size and is tricky to get just right. If you forget about this it puts you at exactly between pixels/texels where 2D blits can get all sorts of artifacts from numerical roundoff, fill convention rules, indeterminate nearest neighbor sampling, and max blur on bilinear 1:1 blits.
This also applies to integer texel centers, but I'm not sure if any APIs used that. AFAIK D3D9 was the only mainstream graphics API to use integer centers for rasterization and all APIs used half-integer texel positioning.
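Spelling out the size dependence with a couple of toy functions (the names are mine, purely illustrative, following the D3D9 formula above):

```python
def halfinteger_center_bounds(w, h):
    # Half-integer pixel centers: the full render target spans
    # (-1,-1)-(1,1) in clip space regardless of its size.
    return (-1.0, -1.0, 1.0, 1.0)

def d3d9_integer_center_bounds(w, h):
    # D3D9-style integer pixel centers: the half-pixel offset shifts the
    # exact bounds by 1/w and 1/h, so they depend on the viewport size.
    return (-1.0 - 1.0 / w, -1.0 + 1.0 / h, 1.0 - 1.0 / w, 1.0 + 1.0 / h)

# Same bounds for every size with half-integer centers...
assert halfinteger_center_bounds(640, 480) == halfinteger_center_bounds(64, 64)
# ...but size-dependent bounds with integer centers:
assert d3d9_integer_center_bounds(640, 480) != d3d9_integer_center_bounds(64, 64)
```

Which is why a projection matrix that bakes in the D3D9 offset has to know the viewport size, and silently breaks when that size changes.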
ProfessionalLimp3089@reddit
This is the part AI tools make harder to notice you're missing. When I'm debugging something low-level I've hit before, I have mental hooks for it. When I let a model explain it to me the first time, I get the answer but not the hooks. And then the next time it comes up I still have to ask the model. It's the difference between owning the knowledge and just having access to it. The gap doesn't show up until you're in a production incident at 2am with no internet.
CodyDuncan1260@reddit
Would love it if you crossposted this to r/GtaphicsProgramming. Very fun read.
bugrit@reddit
You might have misspelled that
CodyDuncan1260@reddit
lol, I certainly did.
Oh, typing on a cell phone. You never seem to succeed me.
AdUnlucky9870@reddit
this is the kind of post i come to r/programming for. everyone's arguing about frameworks and nobody actually knows how the hardware works underneath
happyscrappy@reddit
To add to what the other poster said, this also teaches us not to make integers from floats using truncation (casts). Use rounding. In C/C++ use roundf() always. But I don't know what shader languages offer.
From a correctness perspective best to put the +0.5f into the shader instead of the CPU code that writes the values. But I think I can admit to myself I'd put it in the CPU code too because the shader code runs so many times (per pixel).
Interesting bug.
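A minimal sketch of the truncation-vs-rounding point; the starting value is just an illustrative stand-in, one float32 ulp below 255, for the kind of value interpolation can hand you when every vertex held exactly 255.0:

```python
# One float32 ulp below 255 -- an illustrative stand-in for the kind of
# value interpolation can produce when every vertex held exactly 255.0.
v = 254.99998474121094

assert int(v) == 254          # truncation (a plain cast) drops a whole unit
assert int(v + 0.5) == 255    # +0.5 then truncate rounds to nearest instead
```

The +0.5 trick tolerates up to half a unit of interpolation error in either direction, which is why it fixes this class of bug.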
ack_error@reddit
It's frustrating how much less convenient it is in C and C++ to round instead of truncate when converting float to int, despite rounding often being the more stable choice. lroundf(), for instance, is a library function that sets errno, and depending on floating point optimization and strictness settings can vary from slow to horribly slow despite CPUs often having native conversion instructions for it.
I would recommend lrintf() instead of lroundf() btw, since the former can map to native round-to-nearest-even, with potential benefits both in performance and in being unbiased.
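The three behaviors being discussed can be sketched in Python: int() truncates like a C cast, round() uses the half-to-even tie rule that lrintf() maps to under the default rounding mode, and a small helper (my own, hypothetical) mimics lroundf()'s half-away-from-zero:

```python
import math

def lround(x):
    """Half-away-from-zero rounding, mimicking C's lroundf() (helper name mine)."""
    return int(math.floor(x + 0.5)) if x >= 0 else int(math.ceil(x - 0.5))

assert int(2.7) == 2       # int() truncates toward zero, like a C cast
assert int(-2.7) == -2

assert round(2.5) == 2     # round() is half-to-even, like lrintf() under
assert round(3.5) == 4     # the default FE_TONEAREST rounding mode

assert lround(2.5) == 3    # lroundf()-style ties go away from zero
assert lround(-2.5) == -3
```

The half-to-even rule is what makes lrintf() unbiased over many values: ties don't all drift in the same direction the way they do with half-away-from-zero.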
happyscrappy@reddit
Frustrating that there isn't a C way to do it using C's rudimentary generics support. Because having to change function calls when types change is annoying.
Although you can always make your own non-standard calls.
Would be great if there was a version which doesn't raise inexact signals.
I'll try out lrintf() thanks for the tip.
StrawberryLiva@reddit
This actually bit me years ago when I was writing a shader that used a flag value interpolated across a triangle — spent an embarrassing amount of time wondering why my conditionals were misfiring before someone pointed out that "same value at all vertices" does not mean "same value at all fragments." Feels obvious in hindsight but it's one of those things nobody tells you upfront.
TheOneAndOnlyRandom@reddit
Is there a reason you couldn't just disable the interpolation in the shader?
rogual@reddit (OP)
I'm supporting GLSL 1.2, which as far as I know doesn't let you disable interpolation.
(Perhaps I shouldn't support such old systems anymore, but dropping those is a separate piece of work. I just needed to get the bug fixed.)
flip314@reddit
Does it support non-perspective interpolation?
alphadester@reddit
GPU interpolation behavior is one of those things that bites you once and you never forget it. the hardware doing the interpolation per-fragment rather than just at vertices is such a fundamental part of how rasterization works but it's genuinely not obvious until you see it break
TexZK@reddit
If you're emulating integers with floats, you'd better always round, floor, or ceil the result IMHO. It's up to the compiler to optimize, yet it retains the intention.
rogual@reddit (OP)
This is one of those bugs that taught me something, so I did this writeup. I hope it's interesting. I tried to write it like a murder mystery, showing you the bug first and then dropping clues until the reveal, so maybe if you're into graphics programming you'll go "aha!" at some point and figure it out before you get to the end.
Otis_Inf@reddit
great writeup!
Rare-Mastodon-8377@reddit
such a nice write up, I enjoyed it thoroughly thank you :)
radarsat1@reddit
Oh my god, nice write up and good catch. I'm not sure I would have figured this out.
intersystemsdev@reddit
It was really interesting! Thanks!)
MeasurementSuperb562@reddit
But once you do math on them, or go beyond 2^24 (where floats stop representing integers exactly), floats can introduce problems. Ints are safer for integer data unless you specifically need a float.
Necessary-Summer-348@reddit
GPUs are weirdly good at things they weren't designed for. What'd you learn?
max123246@reddit
This was about rendering graphics though?
mr_birkenblatt@reddit
Turn off interpolation if you want precise values (and also always index into x+0.5 on the GPU)
happyscrappy@reddit
What you mention is what footnote 6 covers, I think. Isn't that what flat does? He says why he doesn't use it.
mr_birkenblatt@reddit
Oh I didn't look through the footnotes. Thanks for pointing that out
dukey@reddit
Floating point accuracy does vary a little between vendors. It also depends on exactly how the driver compiles the shader. For example fused multiply and add (FMA) has different precision to just doing (A*B)+C.
tsegelke@reddit
I really enjoyed reading this. Thanks!