PandaMoniumHUN@reddit
Nice write-up, but you should probably add to the article that arguably the proper fix is to just slap the flat qualifier on the shader variable to disable interpolation, if your GLSL version supports it (I've seen that you mentioned you target version 1.20, which does not).
HighRelevancy@reddit
> If the CPU decides the adjacency integer is 238, it'll write 238.0f and the shader will read 238.0f, cast it back to 238

I'm not going to say I foresaw the exact nature of the bug, but this sentence immediately triggered my eyebrows. Floats are cursed and I would never assume they're going to be exactly anything, ever, in any context.
gurebu@reddit
Doesn’t this teach us “don’t cast bitmasks or any other non-continuous function input to float unless you want to be sad” more than about what gpus do?
Otis_Inf@reddit
GPUs also treat 'ints' in shaders as floats, so that won't help you, I'm afraid
jacenat@reddit
I am not so sure. Yes, they are represented by floats, but math might be done on them differently because they are labeled as ints. Like, I can see the GPU doing what OP did implicitly for certain operations to prevent undershoots (or overshoots in other scenarios), or skipping some float-only operations entirely.
So it might help. It really depends on the specifics.
Otis_Inf@reddit
Nope. https://gamedev.net/forums/topic/696946-normalized-unsigned-integers-vs-floats-as-vertex-data/#post-5379884
But hey, according to the downvotes, I'm wrong, and so is apparently MJP (who knows a thing or two about shader programming ;) )
In theory you could be right, if the HLSL -> intermediate -> GPU asm compiler takes it into account and analyzes whether it can make assumptions based on the fact that a variable is an int. In practice... why would it, as the hw is targeted towards float processing anyway.
jacenat@reddit
I mean since you could skip calculations on ints that wouldn't help, you might save something on that end. But GPUs are probably still in a very "just throw cycles at the problem across many lanes" mindset where this doesn't really make that much of a difference anyway.
rogual@reddit (OP)
It's a good rule of thumb, although floats have more than enough precision to exactly represent the integers 0-255, so if you're not doing any actual math on the values, just casting to float and back, the casting itself won't cause a bug.
In this case, the bad assumption that I had was that if each vertex of a triangle had the same value for a given attribute, every fragment in the triangle would also have that same value. I'd never realized before now that this isn't true.
Even without the casting, I thought it was an interesting, counterintuitive fact about GPU interpolation.
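A quick way to convince yourself of the exact-round-trip claim, using Python's struct module to emulate 32-bit floats (the helper name to_f32 is mine, just for illustration):

```python
import struct

def to_f32(x):
    """Round a Python double to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]

# Every integer in 0..255 (in fact up to 2**24) survives the round trip:
assert all(int(to_f32(i)) == i for i in range(256))
assert int(to_f32(2**24)) == 2**24

# Past 2**24 the gap between adjacent floats exceeds 1, and it breaks:
assert int(to_f32(2**24 + 1)) == 2**24   # 16777217 rounds to 16777216
```

So the cast itself really is lossless for this data; the trouble only starts once arithmetic happens to the value in between.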
Kered13@reddit
But you were doing math on it, because of the interpolation. It doesn't matter if all points "should" return the same value, as soon as you start doing math on it you cannot assume you're going to get exact results. Most arithmetical identities fail on floating point numbers, and you cannot assume that the compiler will simplify the arithmetic.
wnoise@reddit
Most one or two argument identities hold, suitably generalized to include NaN and signed zero. They only really start failing once you get to three arguments, i.e. (a+b) + c vs a + (b+c).
Kered13@reddit
In this case OP was implicitly relying on the distributive property, which absolutely does not hold in floating points.
wnoise@reddit
Yes, and distributing over 3 summands, which also has the rebracketing problem.
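To see the three-argument failure concretely, here's a quick sketch (plain Python doubles, but the same effect exists at any IEEE width):

```python
a, b, c = 1e20, -1e20, 1.0

left = (a + b) + c    # a + b cancels exactly to 0.0, then + 1.0 -> 1.0
right = a + (b + c)   # b + c rounds back to -1e20 (1.0 is far below one
                      # ulp of 1e20, which is ~16384), so the 1.0 vanishes

assert left == 1.0
assert right == 0.0
```

Rebracketing a sum really can change the answer by an entire unit, which is exactly the kind of slack the interpolation math exploits.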
gurebu@reddit
Isn't that another argument to not use the vertex buffer for that? I mean your value clearly is per face rather than per vertex. I'm genuinely curious, I'm not a career graphics dev (and unaware if there even are ways to pass per-face ints or even floats that are as performant), but just the general formulation of the problem you're solving says the tools are wrong.
rogual@reddit (OP)
Well, the value is per-instance. Each sand tile is one instance, all in the same instance buffer. If I wasn't using instancing, the adjacency value could be a uniform, but with instancing, a uniform has the same value for every instance in the batch, so I can't use a uniform for this.
The value varies per-instance, so I put it in the instance data buffer. I don't know where else I could put it.
And the only thing you can put in the instance data buffer is floats, 20 of them per instance. (This is a limitation of the bgfx library I'm using, not a graphics API limitation.)
I'm totally not an expert either, just learning as I go! There could well be a better solution. I just don't know what it would be.
umtala@reddit
It's probably like this to support WebGL v1 which only supports floats IIRC. WebGL v2 supports integers.
jacenat@reddit
I was gonna ask "why the obsession with floats" but this seems a very reasonable decision. Coding GPUs (especially ones only supporting old standards) is just *special*.
MaleficentCaptain114@reddit
You can run into this with vertex UVs too. Even if every value you pass in is in the range [0, 1], you can still get UVs in the fragment shader that are larger than 1.
If you just care about the raw bits, and know each vert has the same value, you can also use flat interpolation. That will pass through the value from the first vertex in the tri.
gurebu@reddit
Don't get me wrong, I was also unaware of the projection stuff in the interpolation, but even if it wasn't there, the math for the basic barycentric interpolation would be u * 255.0 + v * 255.0 + w * 255.0, where u, v and w are arbitrary positive floats that add up to 1.0, and the result is by no means guaranteed to be 255.0 exactly.
rogual@reddit (OP)
That's a good point! I guess I still don't fully understand the bug, then.
GhostPilotdev@reddit
Fair point, but the real lesson is that GPUs will silently do something "reasonable" with your garbage input instead of telling you it's garbage. That's the part that actually burns you at 2am.
hongooi@reddit
It's floating point.
It's never floating point.
It was floating point.
roflpotato@reddit
floating point, the lupus of programming
Ok-Tie545@reddit
It’s a floating point issue about 0.00000000000000001% of the time
throwaway131072@reddit
The rest of the time it's DNS
wrosecrans@reddit
Yup, the issue was that the Domain of the Numbering System included a fractional part.
(Note, floating point numbers will also break domain name system lookups if you need to return "I have a hostname A record that matches your query.... to within an arbitrary epsilon.")
cdsmith@reddit
It could... but then you can't use JavaScript at all, for example, and you can't do a bunch of things on GPUs, and... a bunch of other stuff. Sometimes you use what's available, not wish for the exact tool you want.
max123246@reddit
Ugh I always forget that JavaScript's number is a float by default. True insanity
kaoD@reddit
You made me realize this is why texel centers sit at (0.5,0.5)
ack_error@reddit
I think that's more about size invariance than rounding. For bilinear interpolation and rasterization it's actually more convenient to have pixel/texel centers at integer coordinates instead of half-integer coordinates. But doing that has the horrible side effect of making the precise bounds for the texture dependent upon the texture size. Direct3D 9 did this with its clip space coordinates and the half pixel offset made a mess of projection matrices -- quite a lot of games got it subtly wrong leading to artifacts.
kaoD@reddit
Not sure if I follow.
ack_error@reddit
With half-integer pixel/texel centers, the exact full area of a texture is (0,0) - (1,1) when sampled and (-1,-1)-(1,1) in normalized clip coordinates for rendering. This is always the same regardless of the size of the texture.
With integer pixel centers, the exact full area is offset by half a pixel. For Direct3D 9, it's (-1 - 1/w, -1 + 1/h) - (1 - 1/w, 1 + 1/h), because the centers are offset up/left and the clip coordinate system is bottom-up. This is annoying because it makes your projection matrix dependent upon the viewport size and is tricky to get just right. If you forget about this it puts you at exactly between pixels/texels where 2D blits can get all sorts of artifacts from numerical roundoff, fill convention rules, indeterminate nearest neighbor sampling, and max blur on bilinear 1:1 blits.
This also applies to integer texel centers, but I'm not sure if any APIs used that. AFAIK D3D9 was the only mainstream graphics API to use integer centers for rasterization and all APIs used half-integer texel positioning.
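Spelling out the size dependence with a couple of toy functions (the names are mine, purely illustrative, following the D3D9 formula above):

```python
def halfinteger_center_bounds(w, h):
    # Half-integer pixel centers: the full render target spans
    # (-1,-1)-(1,1) in clip space regardless of its size.
    return (-1.0, -1.0, 1.0, 1.0)

def d3d9_integer_center_bounds(w, h):
    # D3D9-style integer pixel centers: the half-pixel offset shifts the
    # exact bounds by 1/w and 1/h, so they depend on the viewport size.
    return (-1.0 - 1.0 / w, -1.0 + 1.0 / h, 1.0 - 1.0 / w, 1.0 + 1.0 / h)

# Same bounds for every size with half-integer centers...
assert halfinteger_center_bounds(640, 480) == halfinteger_center_bounds(64, 64)
# ...but size-dependent bounds with integer centers:
assert d3d9_integer_center_bounds(640, 480) != d3d9_integer_center_bounds(64, 64)
```

Which is why a projection matrix that bakes in the D3D9 offset has to know the viewport size, and silently breaks when that size changes.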
ProfessionalLimp3089@reddit
This is the part AI tools make harder to notice you're missing. When I'm debugging something low-level I've hit before, I have mental hooks for it. When I let a model explain it to me the first time, I get the answer but not the hooks. And then the next time it comes up I still have to ask the model. It's the difference between owning the knowledge and just having access to it. The gap doesn't show up until you're in a production incident at 2am with no internet.
CodyDuncan1260@reddit
Would love it if you crossposted this to r/GtaphicsProgramming. Very fun read.
bugrit@reddit
You might have misspelled that
CodyDuncan1260@reddit
lol, I certainly did.
Oh, typing on a cell phone. You never seem to succeed me.
AdUnlucky9870@reddit
this is the kind of post i come to r/programming for. everyone's arguing about frameworks and nobody actually knows how the hardware works underneath
happyscrappy@reddit
To add to what the other poster said, this also teaches us not to make integers from floats using truncation (casts). Use rounding. In C/C++ use roundf() always. But I don't know what shader languages offer.
From a correctness perspective best to put the +0.5f into the shader instead of the CPU code that writes the values. But I think I can admit to myself I'd put it in the CPU code too because the shader code runs so many times (per pixel).
Interesting bug.
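A minimal sketch of the truncation-vs-rounding point; the starting value is just an illustrative stand-in, one float32 ulp below 255, for the kind of value interpolation can hand you when every vertex held exactly 255.0:

```python
# One float32 ulp below 255 -- an illustrative stand-in for the kind of
# value interpolation can produce when every vertex held exactly 255.0.
v = 254.99998474121094

assert int(v) == 254          # truncation (a plain cast) drops a whole unit
assert int(v + 0.5) == 255    # +0.5 then truncate rounds to nearest instead
```

The +0.5 trick tolerates up to half a unit of interpolation error in either direction, which is why it fixes this class of bug.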
ack_error@reddit
It's frustrating how much less convenient it is in C and C++ to round instead of truncate when converting float to int, despite rounding often being the more stable choice. lroundf(), for instance, is a library function that sets errno, and depending on floating point optimization and strictness settings can vary from slow to horribly slow despite CPUs often having native conversion instructions for it.
I would recommend lrintf() instead of lroundf() btw, since the former can map to native round-to-nearest-even, with potential benefits both in performance and in being unbiased.
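The three behaviors being discussed can be sketched in Python: int() truncates like a C cast, round() uses the half-to-even tie rule that lrintf() maps to under the default rounding mode, and a small helper (my own, hypothetical) mimics lroundf()'s half-away-from-zero:

```python
import math

def lround(x):
    """Half-away-from-zero rounding, mimicking C's lroundf() (helper name mine)."""
    return int(math.floor(x + 0.5)) if x >= 0 else int(math.ceil(x - 0.5))

assert int(2.7) == 2       # int() truncates toward zero, like a C cast
assert int(-2.7) == -2

assert round(2.5) == 2     # round() is half-to-even, like lrintf() under
assert round(3.5) == 4     # the default FE_TONEAREST rounding mode

assert lround(2.5) == 3    # lroundf()-style ties go away from zero
assert lround(-2.5) == -3
```

The half-to-even rule is what makes lrintf() unbiased over many values: ties don't all drift in the same direction the way they do with half-away-from-zero.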
happyscrappy@reddit
Frustrating that there isn't a C way to do it using C's rudimentary generics support. Because having to change function calls when types change is annoying.
Although you can always make your own non-standard calls.
Would be great if there was a version which doesn't raise inexact signals.
I'll try out lrintf() thanks for the tip.
StrawberryLiva@reddit
This actually bit me years ago when I was writing a shader that used a flag value interpolated across a triangle — spent an embarrassing amount of time wondering why my conditionals were misfiring before someone pointed out that "same value at all vertices" does not mean "same value at all fragments." Feels obvious in hindsight but it's one of those things nobody tells you upfront.
TheOneAndOnlyRandom@reddit
Is there a reason you couldn't just disable the interpolation in the shader?
rogual@reddit (OP)
I'm supporting GLSL 1.2, which as far as I know doesn't let you disable interpolation.
(Perhaps I shouldn't support such old systems anymore, but dropping those is a separate piece of work. I just needed to get the bug fixed.)
flip314@reddit
Does it support non-perspective interpolation?
alphadester@reddit
GPU interpolation behavior is one of those things that bites you once and you never forget it. the hardware doing the interpolation per-fragment rather than just at vertices is such a fundamental part of how rasterization works but it's genuinely not obvious until you see it break
TexZK@reddit
If you're emulating integers with floats, you'd better always round, floor, or ceil the result IMHO. It's up to the compiler to optimize, yet it retains the intention.
rogual@reddit (OP)
This is one of those bugs that taught me something, so I did this writeup. I hope it's interesting. I tried to write it like a murder mystery, showing you the bug first and then dropping clues until the reveal, so maybe if you're into graphics programming you'll go "aha!" at some point and figure it out before you get to the end.
Otis_Inf@reddit
great writeup!
Rare-Mastodon-8377@reddit
such a nice write up, I enjoyed it thoroughly thank you :)
radarsat1@reddit
Oh my god, nice write up and good catch. I'm not sure I would have figured this out.
intersystemsdev@reddit
It was really interesting! Thanks!)
MeasurementSuperb562@reddit
But once you do math on them, or go beyond 2^24 (where floats stop representing integers exactly), floats can introduce problems. Ints are safer for integer data unless you specifically need a float.
Necessary-Summer-348@reddit
GPUs are weirdly good at things they weren't designed for. What'd you learn?
max123246@reddit
This was about rendering graphics though?
mr_birkenblatt@reddit
Turn off interpolation if you want precise values (and also always index into x+0.5 on the GPU)
happyscrappy@reddit
What you mention is what footnote 6 covers, I think. Isn't that what flat does? He says why he doesn't use it.
mr_birkenblatt@reddit
Oh I didn't look through the footnotes. Thanks for pointing that out
dukey@reddit
Floating point accuracy does vary a little between vendors. It also depends on exactly how the driver compiles the shader. For example fused multiply and add (FMA) has different precision to just doing (A*B)+C.
tsegelke@reddit
I really enjoyed reading this. Thanks!