Still don't get why this is such a major issue, when you could write some kind of static analyser that detects 'incorrectness'. Vulkan for example prioritises speed over error management and handling bad values without exploding, unless you add the validation layer which is recommended for development. Why can't there simply be an optional 'validation layer' for C++? It can be as slow as it likes, and it can be removed from the final production compiled version of the software.
There are tools like UBsan and fuzzers, but there are two really hard parts about this:
1. A lot of UB questions are undecidable. Whether a loop is infinite is the Halting problem in the general case, for example. Involve user input and you can run into Schrodinger's UB.
2. UB optimizations like to assume UB won't happen, so you cannot trivially guard against it. This takes the problem and elevates it above other correctness problems because to assert against problems you have to learn an annoying amount of trivia about your tools.
Reasonable defaults could alleviate most of the pain, but good luck when this is how a lot of the arguments go: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=30475
> UB optimizations like to assume UB won't happen,
To be clear, a correct C/C++ program is *defined* as never invoking UB.
> so you cannot trivially guard against it.
You can if you don't invoke UB in order to check for it. Comparing a pointer to null before dereferencing it is always safe and won't be compiled out. The problem with your link is that it intentionally overflows an integer in order to check if an integer will overflow. That's obviously not going to work. You have to rewrite the check in such a way that this does not happen. Correct examples are found in the comment in that link.
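For what it's worth, here is a minimal sketch of what "rewrite the check so it doesn't overflow" can look like. The names `add_would_overflow` and `checked_add` are made up for illustration; the idea is to compare against `INT_MAX`/`INT_MIN` before adding, instead of adding first and inspecting the result afterwards.

```cpp
#include <cassert>
#include <climits>

// Hypothetical helper: true when a + b would not fit in an int.
// The test itself never overflows: it only subtracts a positive b
// from INT_MAX, or a negative b from INT_MIN.
bool add_would_overflow(int a, int b) {
    if (b > 0) return a > INT_MAX - b;
    if (b < 0) return a < INT_MIN - b;
    return false;
}

// Hypothetical wrapper: the assert fires in development builds and
// disappears when compiled with -DNDEBUG.
int checked_add(int a, int b) {
    assert(!add_would_overflow(a, b));
    return a + b;
}
```

Because the comparison happens before the addition, there is no undefined behaviour for the optimizer to reason from, so the check stays in the generated code.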
>This takes the problem and elevates it above other correctness problems because to assert against problems you have to learn an annoying amount of trivia about your tools.
See here. I'm aware of how you're expected to do this, and it's past my line of caring since I didn't agree to have this problem in the first place. I care about my work, not about becoming a rules lawyer. Just give me Java so I can languish in philosophically coherent mediocrity instead.
>Comparing a pointer to null before dereferencing it is always safe and won't be compiled out.
I'm not saying you're wrong, but there is so much Brownian motion around this issue that I'm afraid I don't believe you. I'm sure some compilers, at least, respect null checks as policy.
> I'm aware of how you're expected to check for overflow in GCC, and it's past my line of caring since I didn't agree to have this problem in the first place. I care about my work, not about becoming a rules lawyer.
Well unfortunately part of caring about your work is understanding the rules of the programming language you choose. There are languages that can automatically escalate arithmetic to BigNum libraries for you that may work better for you.
> > Comparing a pointer to null before dereferencing it is always safe and won't be compiled out.
> I'm not saying you're wrong, but there is so much Brownian motion and wackiness around UB that I'm afraid I don't believe you. I'm sure some compilers, at least, respect null checks as policy, which would be a reason to prefer them.
They are right, null checks are perfectly defined.
The issue is that you must not tell the compiler the pointer is definitely not null by dereferencing it *before* you do the check, which sometimes happens in real source code.
If you do that, the compiler will see the null check as a duplicated check and may well remove it, just like it would remove a redundant load or store.
Unlike other UB this is at least easy enough to detect in the source code itself.
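A small sketch of the two orderings described above, with made-up function names. The first version dereferences before checking, so a compiler is allowed to treat the check as redundant; the second checks first and is fully defined.

```cpp
// Problematic ordering: the dereference comes first, so the compiler
// may assume p is not null and drop the branch below.
int first_or_default_bad(const int* p) {
    int value = *p;               // undefined behaviour if p is null
    if (p == nullptr) return -1;  // may be removed as a redundant check
    return value;
}

// Well-defined ordering: check, then dereference.
int first_or_default(const int* p) {
    if (p == nullptr) return -1;
    return *p;
}
```

Only the second function is safe to call with a null pointer.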
>Well unfortunately part of caring about your work is understanding the rules of the programming language you choose.
Then part of the programming language's job is to have reasonable rules, and that's not what the standards bodies did with C/C++. This cannot be a one-way street if you want me to take your point of view seriously. This is why I work on my own languages. See again: Checking signed integer overflow requiring the memorization of yet more trivia, either through compiler flags or intrinsics. Sorry, but my headspace is filled with much more important things.
Please don't pretend this problem can be solved through education, either, because most people won't get experience working in compilers. Note: I'm not actually asking for a solution here because I don't expect the standards committees to change course, but I would appreciate a fairer attitude toward people who use these kinds of tools, so these mistakes won't repeat in the future.
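For readers wondering what the "intrinsics" route looks like, here is a sketch that assumes GCC or Clang. `__builtin_add_overflow` is a compiler extension, not standard C++, and `try_add` is a hypothetical wrapper name.

```cpp
#include <climits>
#include <optional>

// Assumes GCC or Clang: __builtin_add_overflow stores the wrapped sum
// in *result and returns true when the mathematical sum did not fit.
std::optional<int> try_add(int a, int b) {
    int result = 0;
    if (__builtin_add_overflow(a, b, &result)) {
        return std::nullopt;  // e.g. try_add(INT_MAX, 1)
    }
    return result;
}
```

Whether this counts as "trivia" is exactly what the thread is arguing about. The sketch only shows that the check exists and is short once you know its name.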
I'm not trying to argue just from the standard, you're right that this is a lame direction of argument.
But the C++ language designers are sort of in a bind. There are programmers who want to be able to use a language where a simple addition in the language translates to a simple `ADD eax, 2` in the CPU.
That requires either treating overflow as impossible unless told otherwise (the current 'UB' definition) or always checking for overflow to provide a strict definition of what happens afterwards (which will be slower).
They could have made something like `-fwrapv` the default with some kind of "fast addition" as an override instead, but that's not what most programmers wanted back then when CPU cycles were expensive.
I think nowadays there are languages which can come close to best of both worlds, where you get the "safe" arithmetic until you do enough conditional checks on your variables that the compiler can prove "fast" arithmetic is OK. I think it's Wuffs I'm thinking of? But again, that ship has mostly sailed for C and C++.
I'm not going to tell you I never get confused with UB either, but I do honestly believe that most of those rules make intuitive sense as to when they'd be defined behaviors and when they wouldn't.
> or always checking for overflow to provide a strict definition of what happens afterwards (which will be slower).
Only slower in some meaningless way that doesn't matter, because it pretends that it's not actually defined in the real world, when it fucking is.
And to be clear: Having a null check *after* dereferencing a pointer is completely useless, in all cases. If the pointer is not null, the check does nothing. If the pointer is null, you've probably segfaulted before reaching the null check. It's like having air bags that deploy five seconds after a crash. So the compiler is completely correct to remove it.
It would be a reasonable place to have a warning that the code is unreachable, and therefore likely not what the programmer meant to write. I think compilers are capable of doing this.
> Still don't get why this is such a major issue, when you could write some kind of static analyser that detects 'incorrectness'
Because you can’t do that without writing an entirely separate and incompatible language.
> or create some kind of 'layer' that intercepts everything and checks for correctness.
Because you can’t do that without losing an order of magnitude performance, and then crashing your program in production, at which point why bother with C or C++?
C in particular is a bit shovelling language. Even if it were much slower, people who need to do bit shovelling would still have a good reason to use it. That is: Some domains just seem to prefer using bytes as their fundamental unit. I know some audio guys who feel this way, for example, although they couldn't afford the performance hit, so whatever.
As I said, it doesn't matter if the validation layer is an order of magnitude slower, because it would only be necessary during development. It can be removed from the shipped production build. This is exactly what Vulkan developers do today: the Vulkan API on its own does not offer any validation, so if you feed it garbage, it will carry on until it crashes. But you can use a validation layer that intercepts each call and checks it for correctness, so you get the sanity checks during development and ditch them in the final application for performance reasons.
If we can do that with Vulkan, I see no reason why we couldn't do that with C++ in general.
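The closest everyday analogue in C++ is probably `assert` plus the sanitizers (GCC and Clang accept `-fsanitize=undefined,address`). A rough sketch of the "checks in development, nothing in production" idea, with a hypothetical helper name:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: a bounds-checked read. In a development build
// the asserts abort on bad input; compiled with -DNDEBUG they expand
// to nothing, so the shipped build pays no cost for them.
int at_checked(const int* data, std::size_t size, std::size_t index) {
    assert(data != nullptr && "null buffer");
    assert(index < size && "index out of range");
    return data[index];
}
```

Unlike a Vulkan validation layer, this only catches what the programmer thought to assert, and it only fires on inputs that actually occur during testing.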
Which is exactly why every other language has C bindings in some form (cython, linked libs, etc). C is a big ol' "fuck you I do what I want" and it's incredibly good at its job.
It's incredibly terrible at it. Zig is better
C is lol you used UB fuck you I won't even tell you. You better have the entire standard memorized and know in depth what this particular compiler and version does on their particular platform. Good luck.
It's not just the fact that "C compilers produce fast code", it's also things like OS interactions.
There's a really good answer by alexis king here: [https://langdev.stackexchange.com/questions/3233/why-do-common-rust-packages-depend-on-c-code](https://langdev.stackexchange.com/questions/3233/why-do-common-rust-packages-depend-on-c-code)
This article is about problems caused by undefined behavior. It is not about C giving programmers the freedom to Thelma and Louise themselves. Unintended accelerations are a design flaw. The ability to drive yourself over a cliff, if you choose, is not.
They’ll get with it, or they’ll become unsalvageable. Time will tell.
At the end of the day, performance doesn’t mean shit if the program doesn’t even work right.
I do care very much about performance—I’ve been doing low-level high-performance work for my whole career, a lot of it in C and C++. But I also care a lot about correctness, and I’m *happier* using languages like Haskell and Rust, where I don’t need to waste so much time debugging against a hostile compiler.
Yeah, the discourse has some tendencies in a direction of
* "language X is going to eat C's lunch"
* C/C++ users are still eating well, so they interpret it as more hot air, while
* languages like C#, Java and Go are hardly failed languages.
GC languages aren't suited for everything C/C++ is used for, but we can expect Rust to eat into that segment, and likely more languages in the future.
So while leaving the mainstream takes forever for a language, it's hardly impossible that C's place in the mainstream decades in the future will be more similar to that of Fortran and Assembly: Something abstracted over by higher level languages, and otherwise being consigned to certain niches or legacy, like cobol.
That's... Unlikely to happen unless Windows and Unix-based systems stop being so dominant. Cobol has become obscure because the systems that use it are obscure even though they're still quite common in the wild. It is not obscure just because better languages came around.
The linux kernel has some bits in Rust now, I'd expect that to grow over time, and be possible for other OS-es as well. Even then, kernel programming is what I'd call a niche anyway.
There's also work being done to offer alternatives to some services and libraries, like sudo-rs, rustls, ntpd-rs. Linux too has changed over the years with systemd and wayland; it'll likely continue to evolve over the decades to come.
As for Windows, my impression is more that that's been the land of C# and CLR for a good while now.
Not to mention a lot of people are fine these days with barely a desktop environment and a browser. The backend's always been heterogenous in terms of languages, and much as we may think it sucks, I think we interact more with React frontends and clients than we do C/C++ frontends these days.
So as someone who started this millennium with Windows ME and then moved to linux, I don't know what I'll be using in 2050, but I _do_ expect it to have some significant differences from what I'm using now. And it's not hard at all to imagine that I'll be using fewer C/C++ programs than I am today (and I am already using fewer of them than I was a couple of decades ago).
The problem I see is that these systems are still built on C-like assumptions. I can't read about Linux syscalls without the manpages assuming I'm using them through libc, so I have to go read musl's source to infer how they work. I can't find concrete documentation on type sizes because of old portability concerns, so I have to default to assuming 8 bytes because I'm targeting 64 bit systems. I have to regularly convert my strings into null terminated strings to talk to the system. Apple just treats libc as your interface with the OS, so you can never escape malloc in that ecosystem, which puts a hard stop on using virtual memory spaces for anything interesting.
Abstracting over these issues might solve them sometimes, but obscuring the machine quickly leads to problems like that one GTA game taking forever to load its online mode because of strlen nonsense. To actually do what we're talking about you need to ditch as many compatibility shims as possible, or else you damage your ability to have enough control to reason about what it's doing. What was once just a design decision will become the future tech priest's sacred scripture, and C's insistence on vaguely treating every machine like it's a PDP-11, and all the mental gymnastics that followed, is an excellent example of how poorly that can go.
Yeah, I don't exactly see us having systems without libc in the foreseeable future; more that a possible (not certain!) decline of C/C++ as general-purpose user-facing programming languages will take something on the order of decades to forever to be "done".
Leaving the mainstream doesn't mean dead & gone either; perl monks, delphi … oracles? and others will speak up and remind proggit that they're still around from time to time. It just means they're somewhere in the single digits in [surveys like stackoverflow's](https://survey.stackoverflow.co/2023/#most-popular-technologies-language). C's already dropped out of the top 10 for professional developers; I expect it to continue moving in the direction of being something for specialists.
There is something to be said for [leaving pretending to be a PDP-11 behind](https://queue.acm.org/detail.cfm?id=3212479), but actually replacing libc is a bigger task than the scope of my original comment, which is more in the direction of writing C/C++ directly becoming rarer. Like [MS writing Surface UEFI in Rust](https://techcommunity.microsoft.com/t5/surface-it-pro-blog/surface-uefi-evolution-in-boot-security-amp-device-management-to/ba-p/4159998).
If you don't understand C, then the examples of C-like assumptions I gave become utterly mystifying to the next generation. That's horrible because it moves us farther from understanding our machines and gives the next generation true cause to hate us for cursing them with esoteric magic they aren't able to understand. I don't want C to die as long as we depend on the world it built.
It won't die like that, but it might exist in a state of undeath. I would kind of assume that smart people are interested in an alternative to libc (that isn't the jvm or even beam), but I'm entirely unaware of any serious contender.
Now that /u/steveklabnik1 has shown up in the thread I'm reminded that Oxide computing has some interesting opinions about directions computers might move in. I think we have to be optimistic enough on informatics' behalf to not believe that its future will be wasted on a field that is forever just building a faster donkey.
> I would kind of assume that smart people are interested in an alternative to libc (that isn't the jvm or even beam), but I'm entirely unaware of any serious contender.
So, the thing is this: operating systems need to provide an API to user programs. Here are three relevant examples of what that API is:
* Linux: the kernel exposes system calls directly.
* Most (all?) non-Linux UNIXes: you are not allowed to call the kernel directly. libc is the user-space API.
* Windows: you are not allowed to call the kernel directly. You can call wrappers through Ntdll.dll. For convenience to C programmers, a libc equivalent, crt, is also provided.
So, on Linux and on Windows, an "alternative to libc" is something that makes sense: heck, Windows already ships one! The tough part is convincing other UNIXes to adopt it. And since a new libc would have zero programs written for it, you'd need to port *everything* over if your goal is truly a libc-free system. And that's a lot of work.
I see a lot of FUD against C that pushes people away from it. Some deserved, some very silly. Manual memory management is not the bugbear people think, for example, because the oft-trodden arguments are strawmen that reinforce themselves when people see that, yes, if you do manual memory management that way your life will be pain. It is difficult to talk about this in public spaces without people knee-jerking to the strawman, and so the unjustified hatred spreads. It almost seems better to not talk about it so there will be less cause for people to burn the effigy, and then perhaps in a generation when we're all tired there will be another chance.
> As for Windows, my impression is more that that's been the land of C# and CLR for a good while now.
Windows has actually shipped Rust in the kernel. In many ways, it is adopting Rust at the lowest levels faster than Linux has.
> Correctness
Claiming that the correct behavior for overflow is wrapping around is a hell of a take.
In fact, overflow in arithmetic operations is such a common vector of vulnerabilities that there's a dedicated `calloc` allocation function to avoid having the user multiply the number of elements by the size of an element themselves, because if the multiplication wraps around, the allocation will succeed, but be significantly smaller than the number of elements would lead one to believe.
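To make the failure mode concrete, here is a sketch of the guard a hand-rolled array allocation needs. `alloc_array` is a hypothetical name; the division-based test is one common way to detect that `count * size` would wrap before doing the multiplication.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: returns nullptr instead of a too-small buffer
// when count * size does not fit in size_t.
void* alloc_array(std::size_t count, std::size_t size) {
    if (size != 0 && count > SIZE_MAX / size) {
        return nullptr;  // the product would wrap around
    }
    return std::malloc(count * size);
}
```

Without the guard, `std::malloc(count * size)` with a wrapped product can succeed and hand back a buffer far smaller than `count` elements. The caller is still responsible for `std::free`.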
Modulo arithmetic has some nice properties -- such as `a - b + c` being equal to its infinite-bitwidth result if `a - b` overflows and `+ c` overflows "back" -- but it's also so full of gotchas that it's not a model one really wants to program against.
I wish instead for panic (exception, abort, etc...) on overflow, unless explicitly opted for, but unfortunately no compiler has good code generation for this, ... so performance-wise it lags behind, so nobody wants it. Even Rust, which uses panic on overflow in Debug by default, uses wrap-around on overflow in Release by default, because LLVM generates so poorly performing code (because its overflow-check code was developed to be a debugging aid).
> C and C++ Prioritize Performance over Correctness
I wish.
I mean, if I have to juggle buzzing chainsaws, at the very least I'd want to make sure the risk I take is worth something. Like extreme performance. Unfortunately, it's not so.
Beyond the myriad paper cuts, there are also plenty of dubious design decisions in both languages.
For example, NUL-terminated strings are poor, performance wise. But every standard library function handling strings in C requires NUL-terminated ones. There's no option to pass the size even if you know it. So not only does it blow-up in your face if the string is not NUL-terminated, but even if it is, it'll just be slower than it could be. Would you like to lose, or to lose?
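As a rough illustration of what carrying the length buys you, here is a sketch with a hypothetical `Slice` type. It leans on `memcmp`, which does take an explicit size, so the comparison can bail out on a length mismatch and never has to scan for a terminator.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical length-carrying view; data need not be NUL-terminated.
struct Slice {
    const char* data;
    std::size_t size;
};

// Equal only if the lengths match and the bytes match. A length
// mismatch is rejected without touching the bytes at all.
bool slice_equal(Slice a, Slice b) {
    return a.size == b.size && std::memcmp(a.data, b.data, a.size) == 0;
}
```

A `Slice` can also point into the middle of a larger buffer, which a NUL-terminated string cannot do without copying.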
And C++ isn't better. On paper, it promises that "You Don't Pay For What You Don't Use". In practice, v-tables and RTTI clog binaries even when unused. The standard sets and maps guarantee that elements are stable in memory, forcing them into dog-slow implementations, for a feature barely anybody uses, and even those who do only use it rarely.
You can write high-performance code in both languages -- though I'll note that serious implementations regularly go down to assembly/compiler built-ins -- but you regularly have to fight the languages, or eschew their standard libraries, to do so.
Meh.
Overflow is not a hard take to make. While wrapping does cause problems, and big ones if you're not careful, it is better to just choose SOMETHING that will reliably happen so you can work backwards to understand the problem when you're debugging. Saturation would be fine, too, but leaving it undefined is really, really silly.
Oh, I definitely agree that UB is the worst.
Unspecified Value is pretty nice because it means that any static/dynamic analysis tool identifying an overflow _knows_ it's a bug, and can flag it with 100% certainty. Rust's take there -- Unspecified Value which is either wrapping or panic -- is pretty good, since you still get a very reliable behavior.
Anything _other_ than wrapping, however, reduces optimization opportunities. Remember when I noted with wrapping an overflow could happen and be "reversed"? This means wrapping (modulo arithmetic) preserves the commutativity/associativity of addition & subtraction, ie with wrapping `x + 5 - 3 == x + 2`. If instead you choose other behaviors: panicking, saturation, poisoning, you name it, ... then this is no longer the case at the edges. If `x` is an `int` and `x = INT_MAX - 4`, then `x + 5` overflows when `x + 2` doesn't, and you can't go back from there.
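The `x = INT_MAX - 4` case can be checked directly. This is a sketch with made-up helper names. The wrapping version does its arithmetic in `unsigned`, where wraparound is defined; the conversion back to `int` is modular as of C++20 and implementation-defined before that, so treat that part as an assumption about your compiler.

```cpp
#include <climits>

// Wrapping add: computed in unsigned arithmetic (mod 2^N), then
// converted back to int.
int wrapping_add(int a, int b) {
    return static_cast<int>(static_cast<unsigned>(a) +
                            static_cast<unsigned>(b));
}

// Saturating add: clamps to INT_MAX / INT_MIN instead of wrapping.
int saturating_add(int a, int b) {
    if (b > 0 && a > INT_MAX - b) return INT_MAX;
    if (b < 0 && a < INT_MIN - b) return INT_MIN;
    return a + b;
}
```

With `x = INT_MAX - 4`, `wrapping_add(wrapping_add(x, 5), -3)` comes out as `INT_MAX - 2`, the same as `x + 2`. The saturating version gives `INT_MAX - 3` for the same sequence, because the clamp at `INT_MAX` loses the information needed to "go back".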
Maybe at the HW level, it'd be possible to implement "overflow counters". `x + 5` overflows (positively) so you note `+1` in the counter, and then on applying `- 3`, you're back to 0, so all is good again. This would preserve commutativity/associativity while still detecting overflowed value. Problem is, multiplication & division won't play ball... so it's not clear it's worth it.
Integer overflow is a surprisingly tricky problem, outside the realm of "big integers".
> But every standard library function handling strings in C requires NUL-terminated ones. There's no option to pass the size even if you know it.
If you're using C++, `std::string_view` and its helper functions solve this. Last I looked the set of functions is not as complete as those for `std::string`, but you can augment them with your own.
However you always end up hitting a point where you have to pass that `std::string_view` to a C function, and since `std::string_view` is not guaranteed to be null terminated, you have to allocate and copy a new string.
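The copy described above tends to look something like this sketch. `c_api_length` is a stand-in for any C function that expects a NUL-terminated string, and `call_with_view` is a hypothetical wrapper.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>

// Stand-in for a C API that requires a NUL-terminated string.
std::size_t c_api_length(const char* s) {
    return std::strlen(s);
}

// A string_view may point into the middle of a larger buffer, so it
// cannot be passed as-is: copy it into a std::string, whose c_str()
// is guaranteed to be NUL-terminated.
std::size_t call_with_view(std::string_view sv) {
    std::string owned(sv);  // allocates and copies
    return c_api_length(owned.c_str());
}
```

Passing `sv.data()` directly would compile, but for a view of the first five characters of `"hello world"` the C function would read on past the view until it found a terminator.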
As a former Bell Labs guy these discussions always crack me up.
We've had "safe-ish" languages since the 1960s, see PL/I. We've also had safer hardware architectures, e.g. Harvard vs von Neumann machines.
Simple, cheap and fast wins in the marketplace vs complex, expensive and slow. And safety isn't even on the radar.
I'll also add that you can write safe and correct C/C++ code if you want. It just isn't mandatory.
I'll even agree it's not a priority given standard libraries and compilers, as well as the standards themselves. And I'll suggest that is a notable omission.
C was created in an era where people specified the behaviors of their programs and used logic to verify their programs on paper before they opened a text editor. People today still do that, just, very few of them. Most people performing programming are actually just doing 1/3 of the job, the coding, and starting from an unspecified program and immediately modifying it. When Leslie Lamport dies his corpse will immediately begin uncontrollably spinning. It remains the best way to create safe programs.
C was created in an era where people had limited access to computers. You had to make your time on the computer count and it was essential that every minute on the computer was well-spent. Not to mention that you had nothing close to modern IDEs and static analysis. You can still do all that, but it's not as beneficial as it used to be.
Of course it's still beneficial, you get correct programs at the other end that actually work, rather than programs that randomly fail because the authors of the software never knew what they were building in the first place, started coding, and then immediately started changing what they coded. Of course people run foul of issues with undefined behavior in their language of choice and with optimizers butchering their code -- half of them never even bothered to think about the types they were incorrectly using in the first place. If people actually figured out what they needed to do and modeled what they thought was correct their programs would be correct much more often.
Even in languages where half the point is limiting undefined behavior I see the same problems in programs all over the map. Every single operating system I've seen written in Rust has exposed all of its memory to userspace, or provided access to uninitialized memory, or indeterminate memory, and so forth. TockOS provided access to all memory, Redox provided access to freed memory, so on and so forth, SSDD. People are not taking the time to determine what should be done, are not taking the time to model anything to give them confidence their ideas about what should be done result in a correct program, and immediately start modifying what they ended up with afterwards.
You write safe C/C++ programs, or Rust programs, or Java programs, and programs in effectively any language, by knowing what you need to do, verifying what you need to do is correct, and then writing your program. It's like Chefs in a restaurant trying to mix a batter while it's already in the oven, without even knowing what dessert the customer wanted in the first place. Yeah, their customers are going to notice something is very wrong with their order, and it would be beneficial to stop doing those things so that what hits their customer's plate is correct.
You actually still do need to make your time on the computer count -- recently at a job I fixed the same remote code execution vulnerability in a Python SAAS in the front end and their API layer. The coders couldn't be bothered to spend enough time thinking to realize that passing untrusted data to a shell was a terrible idea. People aren't programming anymore, they're just coding themselves straight off the rails and barely thinking along the way.
I never said it isn’t beneficial. I said it’s not as beneficial.
Suppose someone offered you a million dollars to write correct software without doing all the modeling you usually do. Would you send untrusted input to a shell? Of course you wouldn’t. No competent person would ever do that no matter how little modeling they did.
Also, the great thing about being in the modern world is that automated, foundational solutions exist, so you don't have to do the work on paper, your computer can use the semantics of the language and annotations to automatically search for a proof of correctness for you.
> How would you write safe C/C++?
Keep them simple. Real, _real_ simple. And if you're extremely diligent _and_ lucky, _maybe_ it will be free from undefined behaviour. For instance, I haven't checked, but there is a chance that perhaps the following program might be safe:
    int main()
    {
        return 0;
    }
It's also the fastest C/C++ program I know of. Speed meets safety! Maybe.
---
Okay, real talk, the answer is still simplicity, but also a crapton of property based tests that we run under every sanitiser we can get our hands on.
> a crapton of property based tests that we run under every sanitiser we can get our hands on.
imagine the joy if we could turn those into compile time errors, massively reducing toolchain complexity and deferring of responsibilities
Too bad it’s literally not possible, courtesy of the halting problem. If you want a memory safe language, you must either have a fat runtime with some form of garbage collection, or a form of static analysis (such as a type system) that necessarily limits the expressive power of the language and does so quite severely*. The latter is the Rust way and it makes sense in many contexts, but because of the necessary conservativeness of the static analysis, it will always require escape hatches (in the form of unsafe parts of the language which are simply not automatically analyzed).
(*) It’s also possible to cheat here a bit and offload the static analysis part to a human instead of an algorithm (see proof assistants, dependent types, etc.), but that’s hardly something you can expect now or in the future from even very experienced programmers.
Whenever someone mentions the halting problem like this you can basically just assume they're wrong, because without fail they are wrong every time in practice. The halting problem has to do with statements about all programs: we rarely care about what's true or not for all programs. We have automated, foundational tools (you get a proof of correctness as output) that exist right now that are free to use, for the C programming language. These folks say things like "need a type system" because they know very little about programming and assume incorrect things like C not having a type system. C both has a type system and even implicitly requires sub types in the model. You don't have to limit your use of the C language in any way for the tool to work, you add annotations to your code that describe the features you want to use, from ownership types, dependent types, to whatever else you want to model and have the tool search for a proof for. These people talk a lot and know little. It's the worst of reddit programming discourse in a single comment, so confidently wrong about everything they assert.
okay, since we are dropping the implied /s, i'd much rather prefer 99% of all code to be provably safe at compile time and then being able to search for unsafety in only a few lines of, as you rightfully pointed out, inevitable unsafe code.
Practically, most application code is even 100% safe unless you're doing hardware or optimisations. I will take that amount of surface reduction any day.
C# is a good compromise. You can still do unsafe things if you must… but you have a lot of control over the code and the default is quite optimal. It also can do runtime profile guided optimization that’s fast in .net 8.0.
Yeah there are still times where it wouldn’t be ideal like VLDBs (it doesn’t like hundreds of GB of managed memory) but even gamedev can be done - you just need to be mindful of the GC or stack alloc/pool everything so you don’t get frame stuttering. And obviously system programming is out.
> I don’t even think most compiler writers (even the major ones) really know what volatile is actually supposed to do according to the standard.
The compiler writers are on the standards committee.
That doesn’t mean they understand what the standard actually means as written. From what I could find in the standard, volatile is almost but not quite meaningless if you go solely off of the standard without any further guarantees from your particular compiler.
Everyone has heard the old adage “it says the value could change behind the scene from a driver or something”, but the standard doesn’t guarantee that at all. First of all, if you never actually read the *address* of a volatile variable (like “volatile int my_global;”), you can pretty much optimize it/cache it to your heart’s content. And even if you do read the address somewhere, the standard still probably allows it to be optimized away unless it’s exposed externally through IO.
The only really strong thing the standard says about volatile is that doing something with a volatile location counts as doing IO for the purposes of avoiding UB infinite loops (pure infinite loops are UB).
I’ve definitely seen cases where someone reports a bug with volatile, someone disagrees that it is actually a bug, but then the behavior actually does get fixed years later…
I think the problem is that volatile is expected to do a lot and we really need more explicit mechanisms for this stuff. Most recently, C++ got in trouble for trying to remove things like compound assignment operators on volatile variables (like &=), but it turns out that such volatile operations have been used in embedded code to force the compiler to use a single instruction for the read/write, as some hardware cares about that. Now, that’s obviously not required by the standard, but is it implied if you attach a few more seemingly benign requirements to volatile, like allowing values to change behind the scenes? People at least seem to think so.
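For context, this is roughly the kind of code at stake, as a sketch with a hypothetical `set_bits` helper. It spells out the read and the write separately, which is what a compound `reg |= mask` means as far as the standard is concerned. Whether a compiler fuses either form into a single read-modify-write instruction is up to the compiler and the target, not something the sketch can promise.

```cpp
#include <cstdint>

// Hypothetical helper for a memory-mapped register: one volatile read
// followed by one volatile write, written out explicitly.
void set_bits(volatile std::uint32_t& reg, std::uint32_t mask) {
    std::uint32_t value = reg;  // volatile read
    reg = value | mask;         // volatile write
}
```

On real hardware `reg` would refer to a device address from the vendor's headers. Any `volatile` object works for trying the function out.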
> Simple, cheap and fast wins in the marketplace vs complex, expensive and slow. And safety isn't even on the radar.
I'm 100% **not** trying to proselytize Rust, here... I just don't know what other example to use. The dichotomy you're presenting isn't as inevitable as you seem to think, anymore. There now *seem* to be options for people that want performant languages that also give us better assurances about correctness than C or C++.
The Rust *compiler* is slow (I mean... so is the C++ one, for that matter), but it's a performant language. Nor is it an "expensive" language. In many ways, it's a dramatically less complicated language than C++... though certainly not every way.
I don't want to keep going, cuz I really don't want to sound like a Rust partisan, but the amount of computing resources available to programmers has absolutely exploded since the Bell Labs days, and some of those resources can be devoted to things like ensuring memory safety or program correctness... or at least warning the programmer they might be invoking undefined behavior
>I'll also add that you can write safe and correct C/C++ code if you want
It would be bad if you couldn't write correct code at all. But in the case of C and C++, wanting is far from enough to write safe and correct code.
>Simple, cheap and fast wins in the marketplace vs complex, expensive and slow.
You can probably always find a way how the "winning" solution is either cheaper, simpler or faster, but that neither means that you could predict which thing wins by that metric, nor which of those aspects is currently most important or how to interpret them. Is it faster to develop or faster to run? Cheaper to build or cheaper to maintain? Will that metric even be stable for that application?
> I'll also add that you can write safe and correct C/C++ code if you want.
But can you really? Without a lot of fuzzers, static analysis and live testing on users?
By banning arbitrary pointer arithmetic and use, paired with some kind of garbage collection scheme or some static analysis like region pointers or something more esoteric, what's your point?
UB used to be a compromise to allow portability. Today portability concerns are much less important, and yet the sacrifices we pay to UB have only grown. Something isn't right here, and it's funny how the end result of these kinds of arguments always leads to making it harder for people to understand their machines. I want to assume incompetence, and yet I have encountered so much gnashing of teeth when I try to talk about the problems, so it feels like malice to me anymore.
I agree with the article in general, however I think your comment is categorically false based on every other conversation / experience about C++ I've ever had (though literally speaking; I guess it's my opinion). By extension I feel as though the argument of "non-portable to optimizable" shifts in the article are false (or rather, disingenuous) dichotomies.
I can write completely non-portable code both with and without massive amounts of UB. Always have; always will.
With massive UB, I'm expecting the compiler to "know what I mean", and do a decent (and decently performing) thing anyway. With minimal / unexpected UB, I expect the compiler to use the assumptions that the standard lets it make, to codegen good machine code.
In either case; on non-portability-- I can write non-portable code that only GCC understands, but will still do the right thing; and from what is UB and what isn't UB will optimize away correctly. E.g., last statement in a parenthesized block; C99 array designators in C++, things that GCC explicitly says "we know it's officially UB, but screw that."
Similarly speaking; the conclusion that is implied at the end of the section of the over-arching article, aka, "the committee put up 'strong resistance'" indicates that UB, and rather, incorrect decisions about it, are issues of the governance model that WG21 has, which is, (AFAIK, non-algorithmic, and loosely defined) consensus. Which I feel generally plagues WG21 and many decisions about the language.
^(N.b. algorithmic consensus is usually reasonable but there are exceptions. Consensus by way of feeling, is not, because humans are emotional, irrational creatures; by which arguments of consensus result in wars of attrition and will.)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=30475
This is the kind of nonsense I'm talking about that starts to feel malicious after you see so much of it. When people hide behind the standard to push their own agendas, even if well intentioned, that defeats the trust that a standard is supposed to provide. It weaponizes the standard.
I'm firmly in the camp of writing UB code is fine if you know your toolchain and how it behaves, which is necessary to do some jobs. The C abstract machine is insufficient when you care about the implementation details of a real machine. This, however, makes it very difficult to talk to people or ask questions about your situation because of knee-jerk "that's undefined behavior" reactions. No attempt will be made, usually, because the standard provides an easy way to short-circuit the effort required to think about the problem, hence what I meant about these arguments making it harder for people to understand their machines. It is not an appropriate response to people asking for help, and yet it is lauded. Since I'm in rarified air anyway, I've gone all the way on just making my own languages to deal with part of the problem.
Here's a similar situation that also causes harm: I've gone through some really bad things in my life, and they can put me into doom spirals sometimes. I cannot talk about any of these things with most people I know because I will always be recommended to a therapist and then the conversation will end. For some people, sure, that's a good response, but asking me in particular to trust a stranger with something like this doesn't work. I've been fired by therapists for being too difficult before, and it really hurts when I can't even pay someone to listen to something important I have to say. What I want is a friend, but I never get a friend because a side effect of therapy is that one of the most important functions of friendship got professionalized in my society.
What feels malicious isn't UB or therapy, but the way their proponents have worked to monopolize the conversation so no competing concerns can be voiced.
I agree with the general sentiment you have in this comment, but for your example, I'm utterly confused and can't disagree more.
Do you think that the way that felix-gcc is acting in that thread is appropriate?
Integer overflow being UB is not some agenda push, and there was longstanding precedent for keeping with that status quo. GCC, as shown, even lets you treat it safely via `-fwrapv`... they went out of their way to give you something the standard did not. Further, realistically, the major compilers are likely to support it because of shared history. I can't find it but even MSVC has an equivalent option now, according to this [mozilla bug thread about enabling fwrapv](https://bugzilla.mozilla.org/show_bug.cgi?id=1031653).
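For anyone who wants wrapping behavior in one spot without turning on `-fwrapv` globally, here is a rough sketch. `wrapping_add` is a hypothetical helper name, and the sketch assumes the conversion back to `int` is modular: that is guaranteed since C++20 and implementation-defined before that, though mainstream compilers document it as modular.

```cpp
#include <cassert>
#include <climits>

// Hypothetical helper: adds two ints with two's-complement wraparound,
// roughly what `-fwrapv` makes a plain `a + b` do.
// The addition happens in `unsigned`, where wraparound is defined.
// Assumption: converting the out-of-range result back to `int` is modular
// (guaranteed since C++20, implementation-defined before that).
inline int wrapping_add(int a, int b) {
    return static_cast<int>(static_cast<unsigned>(a) + static_cast<unsigned>(b));
}
```

The trade-off is that the wrapping is opt-in per call site, so every other signed addition keeps the optimizer's no-overflow assumption.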
I get this from the perspective of "maybe the standard and by extension WG21 should decide to change this from undefined, to {unspecified, implementation-defined, the-new erroneous, or even defined}, behavior," but pushing for agendas is unlikely to be at issue here. There are _real_ optimizations that would be missed. Ones that _real_ people rely on.
The problem in this particular case isn't that "it's undefined," or "it's not," or "there are contradictory use cases," but rather "the standard only defines the abstract machine." The best they can do is _suggest_ that implementers have an escape hatch. They can't even suggest the name, the format, or even that it's a compiler flag, because it assumes that your compiler has flags!
This is another issue of the WG21 governance model. The ISO standard is a theoretical specification of an abstract machine and compiler with at best suggestions for implementers. There's no reference implementation, like there is for almost every other standardized language (yes, not necessarily _ISO_ standardized, but still).
The issue is Andrew Pinski wanting to prioritize speed improvements over real world security and correctness concerns. A lot of the software they're talking about was written by people who learned C before the C89 spec was published and the prevailing interpretation was hammered out in the 90s. The C89 rationale talks about how much care went into trying to preserve the traditional spirit of how C was written in the 70s and 80s, which is why undefined behavior had such a clunky definition. It was meant to allow users, compiler vendors, and platform vendors to compromise on situations that you can't reason about a priori, but instead it turned into a political fight with such creative strawmen as nasal demons that were used to pretend the standard existed in a vacuum. Compiler vendors are allowed to handle UB however they want, especially after C99, but instead of helping frustrated programmers whose expectations were being broken without warning the vendors chose to pretend the standard had the power to justify their agendas. In another world those compiler vendors might have chosen correctness over speed, and you would see Felix driven to exasperation because perfectly fine code became too slow to be worth using.
Compiler flags are nice, but if people don't realize when or why they are necessary, then they won't be used. Trivia is not a decent solution for engineering problems. In this case in particular we're talking about a large pile of legacy code that suddenly needed to march to the drumbeat of people like Andrew Pinski who did not care that they were trampling on a stable ecosystem with their hubris.
Say what you want about Felix's conduct, but he was not the one who found himself with the power to reshape the world.
I'm going to write this a bit out of order, because it matters...
I think there's a misunderstanding of where UB-based optimizations occur. Generally speaking, they occur at the IR level, based on patterns that come up. There is no distinction to the optimizer what form of UB it is, or even if it is UB at all or rather some UB and some defined behavior mixing together that provides for a pattern that is otherwise "impossible" so thus elided.
The use of "nasal demons" is not a strawman for a political fight, but rather a colorful analogy telling new users why such should be avoided. Telling people "we decided to optimize on performance, but optimization happens at a level below the AST so we can't pick and choose which things to throw away and which to not, so if you do these otherwise valid syntactic constructs, user-beware." I got tired even typing that out.
> Compiler flags are nice, but if people don't realize when or why they are necessary, then they won't be used. Trivia is not a decent solution for engineering problems.
Treating them as trivia is nonsensical. There's flag after flag added over the years particularly to enable security features, that the standard could not care less one way or the other if they were implemented. Not only that, but at the time, "performance at the cost of security" was still the majority opinion. Are you to tell me that you'd rather have those not exist?
> In this case in particular we're talking about a large pile of legacy code that suddenly needed to march to the drumbeat of people like Andrew Pinski who did not care that they were trampling on a stable ecosystem with their hubris.
1. Standard says "this thing is undefined, user beware"
2. Majority consensus is "performance over everything else"
3. Vendor implementers implement pattern detection and optimization that
4. It hits, among other things, what is stated in #1
5. There's no reasonable way to simply change one specific default without affecting other optimizations and #2 is still in effect
6. Guy complains on the bug tracker with horrendous demeanor about something he doesn't fully understand and he's in the opposite group to that of the majority opinion
> The issue is Andrew Pinski wanting to prioritize speed improvements over real world security and correctness concerns.
I don't see how you can blame Pinski in this way. If you had voting in that thread, way back when, Felix would be downvoted to hell. _Now_, opinions have changed to some extent, though still not black and white. But you can't judge Pinski of then by today's standards. It is not Pinski's agenda, it is the known majority view of that time. Hell, there's known UB optimizations that occur, that have been proven to be 100% meaningless because the only time such IR exists is during that triggered UB, and there are other preferable options (security based and otherwise).
You're complaining about compiler implementers when you should be asking "why has no one sponsored a compiler that follows the standard, where all undefined behavior now has reasonable and secure defaults?"
The answer is that the users, in majority, say, "we'd rather be faster." The C++ staff of the past 3 organizations I've been employed at, which have several hundred dedicated C++ engineering staff in total, will wholeheartedly agree-- "faster matters more than security." Sure, that's not _everyone_... but that not _one_ of them would say the inverse is telling.
> why has no one sponsored a compiler that follows the standard, where all undefined behavior now has reasonable and secure defaults?
That's the important question. Who pays the actual compiler implementers? Can we UB-haters just out-pay them? Despite having much debate on this topic, I've never seen a concrete answer on the financial ground.
I'm willing to pay maybe $1k a year to get rid of the UBs so how many people like me can make a sufficiently-sized team change mind? How many are needed to challenge the standard committee?
With all the verbal bashing on UB it seems an anti-UB startup could get enough funding to be viable. Or it could just be a small C99 compiler funded on Patreon. It could be the internet's bias though.
Look into TCC. It's a small enough compiler that it likely does not make many bizarre assumptions about how it can rearrange code. I do not know if it has anything like register allocation, however.
Looked into it before. It skips IR entirely so anything that looks beyond the "cursor" will be un-ergonomic, which unfortunately includes register allocation.
The same goes for the currently-popular separated AST-IR design. Compiler-side people like u/13steinj keep complaining about it and there is real merit in that argument. There has been too big a gap between AST and IR and I'd probably jump into their boat if I had to emit a sensible-looking warning about `if(x+100<x)` from inside LLVM.
I've been exploring non-AST-IR architectures for years now. Guess I'll continue on that direction.
Man I wouldn't call myself "compiler side," I just understand that there's more than one side to this debate and demonizing any of them doesn't help anyone.
At a very real level, it's a matter of money and performance. Generally speaking people don't use tcc because it's mostly a proof of concept. It's so tiny (hah) that it realistically performs few if any optimizations that people not only want, but rely on.
Put it this way-- there was a long period of time where opinion was you have to write ASM by hand for decent performance. Over the years compilers got better, at the cost of _now_ pissing some people off optimization-wise. You ever work on FPGAs? High level synthesis is still at the stage where many say it's not worth bothering, but I'm sure it eventually will. No one will care about "security" there, and it will be the same cycle all over.
I'm just tired of the rust- and "ub bad full stop"- jerk. Put up or shut up (not you two individually, collectively). Convince the US government to actually _use_ Rust rather than bullshit with some press document. Crowdsource a fork of llvm / Clang (such as Sean Baxter's Circle which among other things intends memory safety).
cpp2/cppfront-- bullshit. Won't ever come to pass. Same for Carbon. The "safety!" community is a vocal minority, despite you all not realizing it.
> Who pays the actual compiler implementers?
The names are public; they are generally paid by the companies that employ them.
> Can we UB-haters just out-pay them?... I'm willing to pay maybe $1k a year to get rid of the UBs so how many people like me can make a sufficiently-sized team change mind?
Possibly, but I doubt that (most of) such people are willing to _actually_ cough up the cash.
> How many are needed to challenge the standard committee?
This is a very, very different question. IIRC they are on the order of 200 people now? Some do, some don't, like (various or all) UB. So, 200 people in the worst case... but there's much more work for members to do than what it takes for you and other UB haters to (pardon the expression) bitch and moan on the internet.
> It could be the internet's bias though.
Absolutely is.
You are still missing the historical context of what these things mean. You cannot just look at the prevailing interpretation of how things work today to understand an argument that is more than 30 years old now. C existed long, long before it was standardized, and people came into the 90s with well established expectations about how it should work. I'm not saying there weren't arguments over style or how things should work, which was one of the motivations for C89, but it was an established culture. C89 was not meant to bodge established practices, but instead to accommodate them. The interpretation of "you cannot invoke UB, ever" is a very modern idea that was the result of a decade of political arguments over what C89's description of UB actually meant. "A well formed C program" does not mean "a legal C program". It means you do not need more than the C abstract machine to execute it, and nothing more. This is an important point because while portability was C's killer feature, not everyone needed or wanted it. There was no problem with targeting a specific machine, and compiler flags and extensions are an example of this still being an established use case for the language today. This kind of thing is tradition in C.
You mention reasonable defaults, and that is exactly my problem with compiler flags: An annoying number of things are opt-out, and it feels similar to Microsoft and Google playing with dark patterns to stop people from opting out of data collection. Until you're bitten by aliasing rules, are you going to understand why the Linux kernel had to strong-arm GCC into adding the `-fno-strict-aliasing` flag? Odds are, like you say, most people will defer to thinking faster is better, especially if they accept the prevailing interpretations of these things, but that's exactly the point I've been trying to get across to you:
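For readers who haven't been bitten by the aliasing rules yet, here is a small sketch of the classic case. `float_bits` is a hypothetical helper, and the expected bit pattern in the note below assumes IEEE 754 single-precision floats, which the standard does not require.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes a 32-bit float");

// Reading a float's bits through a `std::uint32_t*` obtained with
// reinterpret_cast violates the aliasing rules; that is the kind of code
// `-fno-strict-aliasing` exists to tolerate.
// Copying the bytes instead is well-defined with or without the flag.
inline std::uint32_t float_bits(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Under the IEEE 754 assumption, `float_bits(1.0f)` gives `0x3F800000`. Compilers typically turn the `memcpy` into a plain register move, so the defined version usually costs nothing.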
Felix and the peers he talked about were all part of that historical group that came in expecting their traditions to be respected by C89's leniency, and now today I cannot seem to get you to understand why the failure to compromise on the importance of the abstract machine feels like such a betrayal to people who come from those traditions. Andrew Pinski, in this case, refused to back down on the damage his work was causing to find a safer way to roll it out, and that was in spite of how questionable the optimizations actually were. Just a safe roll-out was all that was required, and instead Felix felt he had to defend the ticket from Andrew Pinski's judgment to have a chance at getting it resolved decently. THAT is the problem I want you to understand. I'm not trying to change your personal feelings about how UB affects your work or anything like that. I just want you to understand that people do have legitimate cause to push against views like yours being the only correct one. Problem domains, hardware requirements, personal styles, they all affect how this plays out, and C's legacy as a portable language means these things need more respect than they usually get from the Standarati.
As a last remark, I care a lot about performance, too, but my approach does not depend on the compiler being sufficiently smart. I avoid stdlibs like the plague because the single biggest performance sink in my work is one-size-fits-all solutions. I solve the problem by writing the simplest code I can that only addresses what I know I need. This keeps everything lean and makes it trivial to rewrite things as many times as I need to get the performance I want. I follow the philosophy that it is easy to outsmart the compiler when you know more about the situation than the compiler does, and so I do not actually value optimizing compilers very highly beyond the basic things, like register allocation and instruction combining. It is probably easier for me to say "i don't care about doing the wrong thing faster" than it is for other people, which is why my point is to get you to understand why my side of the argument exists, not to convert you to my side.
In that thread, there is no evidence of any real optimizations that real people rely on. On the contrary, it has been repeatedly demonstrated that the optimization has negligible, even negative, impact. No one has bothered to justify this optimization *to the users* before implementing it, silently, behind everyone's backs.
Having an option to disable it is not enough. In any other line of job there is a need to get consent of real users before introducing such an incompatible change. The GCC maintainers only get away with *their behavior* due to C/C++ compiler writers being a small community and having a de facto monopoly.
> there is no evidence of any real optimizations that real people rely on. On the contrary, it has been repeatedly demonstrated that the optimization has negligible, even negative, impact.
For the sake of argument, let's say that this is true (which I can guarantee you I have seen private cases where it is not),
> No one has bothered to justify this optimization to the users before implementing it, silently, behind everyone's backs.
What the hell does this mean? You want the compiler vendor to say "fuck you, WG21, we're going to pick and choose what to follow in your standard?"
Undefined behavior _optimizations_ don't occur at the AST level, they occur at the IR level. It is impossible to tell what the original input code was. You can't complain that GCC is _following the damned standard_ and marking it as undefined behavior. Other, non-integer-overflow undefined behavior can end up producing similar or even identical IR.
> Having an option to disable it is not enough. In any other line of job there is a need to get consent of real users before introducing such an incompatible change.
What the hell do you mean "incompatible change!?" You're not even making any sense. Coming into line with the standard is not an incompatible change. Introducing an optimization on some IR patterns that happen to match those produced by explicitly undefined behavior is not an incompatible change. The only possible way you could claim it's an incompatible change is if the standard _changed_, and even then, you can't blame _GCC_. Maybe you can blame the committee and call all of them stupid, but not the vendor just doing what they need to do to stay in compliance.
> The GCC maintainers only get away with their behavior due to C/C++ compiler writers being a small community and having a de facto monopoly.
This is _actually_ categorically false. GCC does not have a monopoly. GCC is part of a duopoly on most Unix platforms, and MSVC generally has the monopoly on Windows. They "get away with it" because it's what is set out to them by the standard, which is, for better or worse, a _legal_ document that _government bodies rely on_ as much as it is a programming language specification.
If a compiler decides not to follow the standard, it would not be used (legal, ISO-type standard or not).
Nonsense.
The standard only stated that signed integer overflow is UB and can be utilized. Both sides of this particular argument are following it to the word.
Optimizing away `if(x+100<x)` is an incompatible change at common sense level. It is a very noticeably different behavior. The standard allowing it doesn't make it *compatible*, it's at most a weak excuse when sales apologizes to the customer.
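For reference, here is a sketch of the check under dispute next to the forms that stay meaningful under the standard's rules. The function names are hypothetical. The unsigned version relies on wraparound being defined for unsigned types; the signed version avoids the overflowing addition entirely.

```cpp
#include <cassert>
#include <climits>

// Unsigned arithmetic wraps modulo 2^N by definition, so this comparison
// is a real overflow test and the compiler has to keep it.
inline bool wraps_when_adding_100(unsigned x) {
    return x + 100u < x;
}

// Signed version of the same question, asked without overflowing:
// compare against the limit before doing any addition.
inline bool overflows_when_adding_100(int x) {
    return x > INT_MAX - 100;
}
```

With a signed `x`, the original `x + 100 < x` can only be true after an overflow has already happened, which is why a conforming optimizer is allowed to fold it to `false`.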
When the entire industry can push the same agenda ignoring resistance, it is a monopoly, whether or not they appear competing. Especially when they can influence government. Unpaid volunteers or not, this is not much better than a steel or oil trust.
> incompatible change at common sense level.
This is the _true_ nonsense. We're engineers. Do not appeal to authority, do not appeal
> When the entire industry can push the same agenda ignoring resistance, it is a monopoly, whether or not they appear competing.
You don't know what a monopoly is...
Being an engineer isn't an excuse to elude common sense, pal. You screw people up, they screw you back. That's why responsible engineers avoid incompatible change.
UB has nothing to do with portability. UB is UB regardless of the platform. Portability is handled by implementation defined behavior, which is a separate category.
I am not a fan of your philosophy about these things. You seem to care about the standard more than you care about why these ideas were important back in the day, which requires understanding multiple perspectives. Academics cared about Formal Language theory, yes, and they cared about optimizations, but vendors and users had their own competing priorities that cannot be dismissed if you want to be part of this argument in good faith. C's killer feature was that if you were aware of the language's bespoke limitations, then you could treat it as a portable assembly language during the Cambrian explosion of ISAs. Vendors wanted to have their own C compilers because that was a selling point for their hardware: You could upgrade without rewriting everything. Dennis Ritchie recognized this was one of C's biggest strengths in the market, which is why the C89 rationale pays so much respect to the spirit of how C was used in those days. To only recognize the academic point of view is a betrayal of the agreements that allowed C to build the world we enjoy today, and no one who works for a living really gives that much of a damn about ivory towers.
Please follow John Regehr's advice and spend time with real engineers so you can learn from them.
> UB has nothing to do with portability. UB is UB regardless of the platform.
The main reason some behaviour is left undefined is to allow different platforms to make different choices (e.g. with regards to performance). The two are definitely linked.
No, the main reason to leave behavior undefined is that providing a defined behavior, even a platform defined behavior, would incur a runtime cost. For example, if array out-of-bounds errors must result in a runtime error, then array accesses must be checked, which incurs a runtime cost. Likewise for dereferencing a null pointer, every pointer would have to be checked against null before dereferencing at runtime, which would be expensive. These behaviors are left undefined so that the compiler can always output optimized code based on the *assumption* that the programmer is correct.
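A minimal sketch of that trade-off, using `std::vector`. `get_unchecked` and `get_checked` are hypothetical names; the point is only that giving every index a defined result means paying for a comparison on every access.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Unchecked access: no extra branch, but an out-of-range index is UB.
inline int get_unchecked(const std::vector<int>& v, std::size_t i) {
    return v[i];
}

// Checked access: always pays for one comparison, and in exchange every
// index has a defined result (false, with `out` left untouched).
inline bool get_checked(const std::vector<int>& v, std::size_t i, int& out) {
    if (i >= v.size()) {
        return false;
    }
    out = v[i];
    return true;
}
```

The standard library draws the same line itself: `operator[]` is the unchecked path and `at()` is the checked one that throws `std::out_of_range`.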
What you are describing is once again implementation defined behavior. It is completely different from undefined behavior.
There is no RAII in C, which makes it way easier to leak through inattention. There are also no std containers, so you must reinvent every wheel you use, which can introduce errors. For average use, you control the memory more closely than in C++, which makes C code the bane of security.
The main problem with UB is that some UB tends to work correctly for quite a while, then retroactively fail after a compiler update. People write UB without realizing it then suddenly they wake up and the code fails.
It's like the compiler had a shady deal with the programmer to overlook some minor bad behavior. Then suddenly the compiler goes back on the deal and starts accusing the programmer for something they did decades ago. Technically the programmer broke the rules but it still feels like a dick move for the compiler.
Compilers could generate a mandatory warning if they did any optimization that depends on UB, the message of which should give an option to disable that optimization.
> Compilers could generate a mandatory warning if they did any optimization that depends on UB, the message of which should give an option to disable that optimization.
The problem here is that UB is simply the result of an optimization pass expecting certain invariants to be upheld, while the user's code violates those invariants. If compilers did as you suggest, then even this simple, completely safe, and UB-free Rust function would result in four separate warnings:
```rust
pub fn foo(a: &mut i32, b: &mut i32) -> i32 {
    *a = 5;
    *b = 6;
    *a
}
```
It would be undefined behaviour for both of those references to be null, so each dereference would emit a warning that the optimizer is assuming they're not. Additionally, it would be undefined behaviour for those references to alias, so there would be a warning when the optimizer replaces the second dereference of `a` with the constant `5`, because that's only valid if they don't alias.
What should really be done, is for the front-end of the compiler to check as much of the user's code as it can, and enforce that it doesn't violate the invariants required by the optimizer.
That's an invalid example with many logic holes:
- Rust actually ensures a and b are not null and cannot alias. It's *defined behavior*, DB.
- NULL dereference does not have to be UB. It's just LLVM assuming it. One can just as easily define it as an actual memory access to address 0. That's how it worked before LLVM.
- The same goes for aliasing. It's yet another LLVM artifact. Just don't let the compiler touch memory accesses and everything will be fully defined. My suggestion only involves a warning when they actually move `*b` to before `*a` or optimize `*a` to 5.
> Rust actually ensures a and b are not null and cannot alias. It's defined behavior, DB.
The `rustc` front-end does, but in LLVM-IR it's just a pointer with some flags set stating which invariants it's allowed to assume. Clang also
> NULL dereference does not have to be UB. It's just LLVM assuming it. One can just as easily define it as an actual memory access to address 0. That's how it worked before LLVM.
True, but LLVM will be optimizing accesses around these pointers under its assumption that accessing address 0 will never happen.
> The same goes for aliasing. It's yet another LLVM artifact. Just don't let the compiler touch memory accesses and everything will be fully defined. My suggestion only involves a warning when they actually move *b to before *a or optimize *a to 5.
But that's exactly what LLVM produces:
```asm
foo:
    mov dword ptr [rdi], 5
    mov dword ptr [rsi], 6
    mov eax, 5
    ret
```
That last `*a` is replaced with a constant `5`, which is only valid when the pointers don't alias. By your own admission, this perfectly safe, UB-free code should emit a warning. Almost every single reference in Rust is marked `noalias`, so you would end up getting warnings all over the place.
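For contrast, here is a sketch of the same function with plain C++ pointers, where nothing promises the two don't alias. `foo_may_alias` is a hypothetical name. Two `int*` parameters may legally point at the same object, so the final read cannot be folded to `5`.

```cpp
#include <cassert>

// Same shape as the Rust `foo`, but with plain C++ pointers.
// `a` and `b` are allowed to point at the same int, so the compiler has
// to reload `*a` after the store through `b`.
inline int foo_may_alias(int* a, int* b) {
    *a = 5;
    *b = 6;
    return *a;
}
```

Called with two distinct ints this returns `5`; called with the same address twice it returns `6`. The non-standard `__restrict` qualifier offered by the major compilers makes roughly the promise that `noalias` does, at which point passing the same address twice becomes the caller's UB.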
I guess I missed your points in my first reply. Are you talking on the practical side that if UB-related warnings were added to LLVM, it will do collateral damage to Rust where it's less of an issue? I agree with that. I'm just making wild suggestions to a hypothetical, idealized compiler.
For your assembly example, yes I do *want* a warning that `*a` became `5` when LLVM generates it compiling my .cpp file dating back to 2015, or 2005. I won't want it to warn on Rust.
> Compilers could generate a mandatory warning if they did any optimization that depends on UB, the message of which should give an option to disable that optimization.
The problem here is that every single optimization depends on UB. Optimizations, by definition, change the runtime of your code which in turn could influence data races as one or more participants of the race are now faster. The solution is to simply assume that data races can never happen.
Out of bounds reads are another UB area which underpins tons of optimizations. Without assuming that you can say goodbye to inlining, dead code elimination, alias analysis, and much more. Basically everything that affects memory (stack or heap) in any way.
So in the end you would probably get a warning for every single optimization making the whole thing pointless.
> The problem here is that every single optimization depends on UB
No, because:
1. Many optimisations don't depend on UB (so not "every single", just "some").
2. The correct wording is not "depends on UB", it's "depends on the assumption that UB is never invoked by the programmer".
> So in the end you would probably get a warning for every single optimization making the whole thing pointless.
Once again, no. The answer is a little more nuanced than that.
The problem is that WG<whatever> is stuffed with compiler authors. Compiler authors care more about performance than correctness. Since they dominate (in numbers) the Standards working group WG<whatever>, every time a proposal comes up to make something Implementation Defined they lobby as a single large voting bloc to make it Undefined Behaviour instead.
This results in modern compilers having a worse expected-behaviour model than previous compilers (and previous versions of the same compilers), which results in **silent** unrecoverable errors.
I've used different C compilers since the 90s, (Borland, Watcom, Intel, gcc then egcs then gcc again, clang, and a ton of different ones for various embedded devices such as motorola 68xxx chips and z80).
All the big wins in optimisation (C) were, frankly, already gained well before 2010. Since then we have had **tiny** gains, but at the cost of expected behaviour.
Take a look at this example from the article:
    #include <stdio.h>
    int f(int x) {
        if(x+100 < x)
            printf("overflow\n");
        return x+100;
    }
Previous optimising compilers (including previous gcc versions) would emit the instructions for that code as-is, which results in the correct behaviour on both 1s complement and 2s complement machines. However, modern compilers don't emit the check at all, which is funny, because *there is no scenario in which eliminating that check results in expected behaviour, but many scenarios in which emitting that check results in expected behaviour!*
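For comparison, a sketch of how the same intent can be written so that the check itself never overflows, which is the form the standard guarantees will survive optimisation. The names `add100_would_overflow` and `f_checked` are made up for this example and are not from the article.

```c
#include <limits.h>
#include <stdbool.h>

/* True if x + 100 would not fit in an int. The sum is never evaluated,
 * so the test itself cannot overflow. */
bool add100_would_overflow(int x) {
    return x > INT_MAX - 100;
}

/* Hypothetical variant of f: stores x + 100 in *out and returns true,
 * or returns false (leaving *out untouched) when the sum would overflow. */
bool f_checked(int x, int *out) {
    if (add100_would_overflow(x))
        return false;
    *out = x + 100;
    return true;
}
```

The cost is that the programmer has to know to compare against `INT_MAX - 100` up front instead of testing the result afterwards.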
Let's look at a more real-world example, one where upgrading GCC led to the Linux kernel being built with an exploitable vulnerability:
    struct mystruct_t *mystruct = NULL;
    ...
    mystruct->field = "some value";
    ...
    if (mystruct == NULL) {
        printf ("oops\n");
    }
There are three possible modern "optimisations" of the above code:
1. Remove the dereferencing of `mystruct`.
2. Remove the null-check.
3. Remove both.
What GCC did was #2 - removing the NULL-check. *This is broken behaviour*, which was not done in the previous versions of GCC. The performance difference between keeping the NULL-check and removing it is so small it's not even measurable!
Here's the problem - the compiler authors are bordering on malicious, for no measurable performance gains.
Look at that snippet again and ask yourself, is it not better to see `oops` printed after the fact than not at all? After all, that code getting called in some unit-test is going to cause the test to fail if the `oops` is printed, but with the optimisation the test is instead passed because the compiler eliminated the NULL-check.
When you write a piece of code like above, it doesn't matter that you did the NULL-check after the UB, you still want the code for that to be emitted, not eliminated.
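For reference, the ordering that nobody in this argument disputes is the one where the test comes first. A minimal sketch, assuming a one-member layout for `mystruct_t` (the snippet above only shows `->field`) and a hypothetical helper name:

```c
#include <stddef.h>

/* Assumed layout: the original snippet only shows that a `field` member exists. */
struct mystruct_t {
    const char *field;
};

/* Returns 0 on success, -1 if mystruct is NULL.
 * The pointer is tested before it is ever dereferenced. */
int set_field(struct mystruct_t *mystruct, const char *value) {
    if (mystruct == NULL)
        return -1;
    mystruct->field = value;
    return 0;
}
```

Here the compiler has nothing to infer the pointer's non-nullness from before the check, so the check has to stay.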
This is just **one** example of an optimisation that has no real impact on performance but has a large impact on correctness. The compiler authors are taking the view that, if anything in the execution path is UB, then the entire execution path can be discarded, including code that may try to detect UB after the fact.
It's actually worse in C++, because time-traveling UB is allowed in C++, and it is **NOT** allowed in C.
If you're programming in C, the impact of UB is altogether a lot lower than in C++. It also doesn't help that C++ has many times more categories of UB than C.
>Look at that snippet again and ask yourself, is it not better to see `oops` printed after the fact than not at all?
NO!
The only scenario in which "oops" is printed is if mystruct is null. If mystruct is null, your code is broken because you dereferenced null EARLIER. Honest question: what do you want the program to do when you dereference the null mystruct? I want it to crash. I certainly don't want it to attempt to assign "some value" to some arbitrary point in memory and then continue to execute.
But again, in either case, it's bad code. It's broken code. You wrote broken code, and then got mad that the optimizer optimized it a particular way. There's no correct way to optimize this broken code. Don't write to NULL. If you dereference a pointer, it must not be NULL. There's no good reason to check after the fact, the error has already occurred, the program has already failed, you can't dereference NULL. If you dereference a pointer, you are telling the compiler "hey, at this point, it's safe to assume this pointer is not NULL".
The optimizer *assumes* that the programmer isn't a moron. The optimizer assumes that you knew what you were doing when you wrote that shitty code. You dereferenced a pointer, you, the human, MUST KNOW that the pointer isn't NULL. It's trying to work with you. If it seems it's malicious, it's because you wrote broken code. Stop writing broken code.
> If you dereference a pointer, you are telling the compiler "hey, at this point, it's safe to assume this pointer is not NULL".
Right, and if this was in a multithreaded context, then some other thread can set the value to non-NULL before the dereference, and a different thread can set it back to NULL before the NULL check.
After all, the compiler is supposed to trust the programmer, right?
That would be a data race, which is also undefined behavior. Even Rust won't let you write code like that; it will force you to use some synchronization mechanism. And if you use those synchronization mechanisms in C++, then your null check won't get optimized out because the compiler knows that it is legal for the pointer value to change.
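A rough sketch of what that looks like with C11 atomics (C++ has the equivalent `std::atomic`). The global `shared` and both function names are hypothetical. Every atomic load is its own observation of the pointer, so the check on the loaded value is a real run-time test.

```c
#include <stdatomic.h>
#include <stddef.h>

struct mystruct_t {
    const char *field;
};

/* Hypothetical shared pointer that other threads may change.
 * Static storage, so it starts out as a null pointer. */
static _Atomic(struct mystruct_t *) shared;

/* Makes a new object (or NULL) visible to readers. */
void publish(struct mystruct_t *p) {
    atomic_store(&shared, p);
}

/* Takes one atomic snapshot of the pointer and checks that snapshot
 * before dereferencing it. */
const char *read_field(void) {
    struct mystruct_t *snap = atomic_load(&shared);
    if (snap == NULL)
        return NULL;
    return snap->field;
}
```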
> I want it to crash. I certainly don't want it to attempt to assign "some value" to some arbitrary point in memory and then continue to execute.
That's the point - it *doesn't crash* **AND** it doesn't print out oops.
This is a real example from the Linux kernel that resulted in a vulnerability because the code neither crashed nor performed the null-check.
If the over-eager optimiser had not removed the null-check, there would be no vulnerability because testing would have revealed the error.
But the CODE ITSELF WAS WRONG. You cannot correctly optimize incorrect code. There is nothing you can do that would lead to "intended" behavior in this situation, optimized or not, because the "intended" behavior is nonsensical.
You are absolutely correct that it would be better for error *detection* if the "over-eager" optimizer hadn't removed such a check, but it wouldn't have stopped the *error* because the *error* *already occurred* because the CODE IS WRONG. Why didn't they disable optimizations while testing their wrongly written code? I'm not blaming the optimizer for doing its job.
I, frankly, don't care what happens to wrong code. Programmer issue, not optimizer issue. The optimizer did the correct thing given the information.
> But the CODE ITSELF WAS WRONG.
So? Using uninitialised variables "IS WRONG" too, yet the compiler issues a warning.
> Why didn't they disable optimizations while testing their wrongly written code?
It was an opt-out in a new version of gcc. They didn't do anything but upgrade the compiler to get a security vulnerability.
> > "Daddy, go faster"
>>
>> "No daddy, that's too fast!"
That's not what happened. I think you are perhaps not familiar with the circumstances, and the resulting changes that were made to the build to prevent similar incorrectness being propagated by GCC.
>Using uninitialised variables "IS WRONG" too, yet the compiler issues a warning.
Yes, because that's the best thing to do in that situation. Because, contrary to your feelings, compilers aren't trying to act maliciously when they do decide to "remove" UB, they are trying to act smart. Smarter than the typical programmer. I **regularly** ask my compiler to do improper type casting/punning, technically UB, and it happily obliges without warning or removing any of my code. It's not trying to hurt me, it's trying to help me.
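As an aside on the punning, there is a fully defined way to get the same effect by copying bytes, and compilers typically turn it into the same single move. A sketch, assuming a 32-bit `float`; the function name is made up:

```c
#include <stdint.h>
#include <string.h>

/* Returns the object representation of a float as a 32-bit integer.
 * Copying bytes with memcpy is defined behaviour, unlike reading the
 * float through a uint32_t pointer. Assumes float is 32 bits wide. */
uint32_t float_bits(float f) {
    uint32_t u;
    _Static_assert(sizeof(u) == sizeof(f), "float must be 32 bits");
    memcpy(&u, &f, sizeof(u));
    return u;
}
```

On a platform with IEEE 754 single-precision floats, `float_bits(1.0f)` is `0x3f800000`.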
You seem shocked by UB. You seem surprised by it. It seems to be an issue for you. I'm trying to show you a perspective by which it becomes obvious when and how UB is treated so that you can be less shocked by it, and produce fewer errors as a programmer. And that perspective is the one that the committee itself takes: When UB is encountered, the optimizer does not assume your code is wrong and try to correct it. It assumes your code is right, and adjusts according to that premise.
The compiler's job is to warn. The optimizer's job is to optimize. Yes, sure, gcc did a naughty because it should have warned/errored on such an obvious mistake. I don't use gcc and I don't consider what gcc does to be indicative of C++ or UB as a whole.
>
>That's not what happened.
That's EXACTLY what happened. Optimizations are, literally, asking daddy to go faster. Deleting necessary null checks was Daddy going too fast. The "resulting changes" were to disable that particular optimization.
To you! You wrote these words:
>if anything in the execution path is UB, then the entire execution path can be discarded
You wrote that in the context of the exact optimization we are discussing now. Except, in the situation we are discussing, literally nothing is discarded as a result of being UB. Not the dereference itself, not the code before it, not the code after it, and not the null check.
What is actually happening is that the compiler sees the dereference and says "the only possible way for this to happen is for this pointer not to be null". There is no consideration of the form "this is null and thus UB". None.
In that exact same context you also wrote:
>Here's the problem - the compiler authors are bordering on malicious, for no measurable performance gains.
and
>This is just **one** example of an optimisation that has no real impact on performance but has a large impact on correctness
But it's not malicious, and your description of the optimization is just flat out incorrect. This optimization has literally no bearing on correctness because the code was NOT CORRECT to begin with. The optimizer cannot fix wrong code!
>In fact, until recently (about 15 years ago) I had your exact perspective.
I'm not sure you even understand my perspective honestly, so, [X].
> Except, in the situation we are discussing, literally nothing is discarded as a result of being UB. Not the dereference itself, not the code before it, not the code after it,
Just to be clear, you are claiming that the compiler is *emitting the NULL-check*?
No, I am not. The NULL-check is not emitted. I am claiming that the *reason* it is not emitted is not a consequence of "undefined behavior in the execution path".
That's what I'm trying to get at. When you describe the handling of UB incorrectly, it sounds insane and incorrect. When you say
>
In the context of the two examples you present, you make them sound insane and incorrect. And they would be, if this quote here had anything to do with it. But it doesn't. In neither of the two examples does this sentence apply. The weird thing to me is that at the very start of that comment you were absolutely correct. You said:
>2. The correct wording is not "depends on UB", it's "depends on the assumption that UB is never invoked by the programmer".
And this is exactly the case. Accessing mystruct's field is UB if mystruct is NULL, and since the program is assumed to be correct, we must assume that mystruct is not NULL the moment we dereference it.
I don't want to put words in your mouth, but your quote and descriptions give me the impression that you think the compiler sees the NULL dereference, calls it UB, and then goes "ah, I can do whatever I want now" and proceeds to discard ... whatever it wants (up to and including the entire execution path). But that's not what it sees. It sees the *possibility* of UB *were* mystruct to be NULL, and comes to the perfectly valid conclusion that mystruct mustn't be NULL. And *since* mystruct is not NULL, the check for it is redundant. The check is omitted *not* because of "undefined behavior," but because of perfectly defined behavior: it is *just* as logically unnecessary as the second check in "if ((p!=NULL)&&(p!=NULL))".
Now, should the compiler have emitted a better warning in this specific case (where the NULL is provable)? Uhh, yes. I think mine does. But *every* time it removes a NULL check though? IDK, maybe, I don't think so, and I've explained why elsewhere: correct functions have checks, and a smart optimizer should attempt to remove them when it makes sense to do so. Should the compiler have an option to throw an error instead? Yeah, I'm down for that. Should compiler vendors have better documentation about how they handle the various forms of UB? Oh my god, yes, I'm 100000% in agreement with you there, including the idea that much UB should be instead moved to Implementation defined.
But allllll those shoulds come prior to the optimizer. The optimization being discussed is logically sound, and does not turn correct code into an incorrect program, and your insinuation that it does is wrong.
>In neither of the two examples does this sentence apply.
To put my money where my mouth is, I'll explain the first example too. Here's the example:
    int f(int x) {
        if(x+100 < x)
            printf("overflow\n");
        return x+100;
    }
Your quote *makes it sound* like the print statement is discarded because it lies in the path of UB in the overflow. But that's just not true. The overflow is UB, the compiler decides to read this *as if it were a mathematical expression, not a system-and-type-dependent-one* and concludes that x+100 is, of course, never less than x. So this is a false statement.
if(false){print} is not undefined behavior, and can be removed for obvious reasons.
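For contrast, the same test on an unsigned value is a different story, because unsigned arithmetic is defined to wrap around and the comparison can genuinely be true. A small sketch (the function name is made up for illustration):

```c
#include <stdbool.h>

/* Unsigned arithmetic wraps modulo 2^N by definition, so this comparison
 * is a real question that has to be answered at run time. */
bool add100_wraps(unsigned x) {
    return x + 100u < x;
}
```

The compiler cannot read this one "mathematically", since the language itself says the sum wraps.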
>The compiler authors are taking the view that, if anything in the execution path is UB, then the entire execution path can be discarded, including code that may try to detect UB after the fact.
This is not what is happening in your example. It just isn't, period. There is no UB. The entire point of the optimizer is that it assumes that UB *does not occur.* The optimizer is not looking at your code and saying "this is wrong, so we can just do whatever we want with it" and then deleting shit at random.
It's looking at it and saying "hmm, this must be right. If it's right, what can we assume? Oh, well, we use this pointer here, so we can assume it's not null." There's a fucking HUGE difference between these two perspectives. The optimizer is not *deleting* undefined behavior. It's deleting *redundancies* based on assumptions that certain kinds of UB are not being invoked. These assumptions are okay to make because the alternative is that the code is ***OTHERWISE wrong***. The assumption is that the code is right.
> The optimizer is not deleting undefined behavior.
Did you perhaps intend to post this to some other comment? I didn't say that.
Most of what you posted as well does not make sense in the context of my post.
I mean, it's not verbatim, but I think it's a fair paraphrasing of:
> if anything in the execution path is UB, then the entire execution path can be discarded
I am sorry if I have misinterpreted this somehow.
> if anything in the execution path is UB, then the entire execution path can be discarded
>
> I am sorry if I have misinterpreted this somehow.
That's actually what the C++ standards document says, which is why I said it: any UB in an execution path renders the entire execution path as UB, *including everything prior to the UB!*.
C is a little bit better in this regard, as UB in an execution path only makes further nodes on that path UB[1].
[1] From the standards document: "Note 3 to entry: Any other behavior during execution of a program is only affected as a direct consequence of the concrete behavior that occurs when encountering the erroneous or non-portable program construct or data. In particular, all observable behavior (5.1.2.4) appears as specified in this document when it happens before an operation with undefined behavior in the execution of the program."
Good comment, and I agree that current compilers have made UB a completely different thing than it used to be.
However, I think it's a bit unfair to look at a single example where it goes wrong to judge how smart or stupid the compiler is. Optimization consists of a large number of small transformations applied to the program, and it can easily be the case that each of them makes sense in isolation and that together they save significant amounts of time on average, yet they still end up looking obtuse or downright evil in some examples. For example, removing the check of null above won't save much time, but the same rule might remove an if test in the innermost loop in a different program, and then it would be worth it.
> For example, removing the check of null above won't save much time, but the same rule might remove an if test in the innermost loop in a different program, and then it would be worth it.
Well, yes, that's the theory. Maybe it works out that way.
In *my* reality, any time a null-check is eliminated, that's an error that should be reported. The programmer made an error that the compiler detected, and at the point of detection the compiler still holds a reference to the source line, so the compiler *should report it!*
Would this result in a large number of warnings in existing programs? Sure, but at least the program still runs with the same behaviour. The intent of the warning is to let the programmer know that they did something that has no effect, which is almost certainly not what they intended.
Programmers don't go around writing lines with no effect just for shits and giggles - if they were warned about lines with no effect, they'd go back and check their assumptions.
The problem still goes back to the standards committee, in which the compiler vendor representatives always vote for *more* UB. Right now, you can get rid of almost all UB by changing the wording to read Implementation Defined instead.
This doesn't mean that the compiler vendor has to change their compiler, but it *does* mean that they have to document what they do. Instead of responding with a catchall "You wrote UB, so we can do anything" *after* the fact, they have to document it on a case-by-case basis *before* the fact, with "This is how a dereference to a NULL-pointer is handled".
There was a time, way back when, when the C compilers behaved in an expected and consistent manner.
When C89 was being drafted, the intention behind Undefined Behaviour was "We don't know what all the existing compilers do for this case, so we leave it as undefined".
When C1x/C2x was in draft, the reasoning for Undefined Behaviour changed to *"Whatever the compiler does, it's within spec"*.
Also note that the original draft for the first C standard listed behaviour that is considered UB. For the most part, C89 and C99 vendors used that list as an exhaustive set of options to take with UB.
Now, the compiler vendors spend time in the committee meetings arguing that, not only is that list not exhaustive, it's also not representative of what UB is.
>The intent of the warning is to let the programmer know that they did something that has no effect, which is almost certainly not what they intended.
Says who?
If I have some small helper function, and I want it to be correct, I write the function with a check.
But then when I want to use that function later, say in a tight loop, I want the compiler to hoist that check out, if possible.
I don't want to write two versions of this function, I want the compiler optimizer to do its job and optimize it. Without warning me.
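To make that concrete, a rough sketch of the shape I mean. Both function names are invented for the example.

```c
#include <stddef.h>

/* Small helper that is safe on its own: a NULL pointer yields 0. */
static inline int value_or_zero(const int *p) {
    return p != NULL ? *p : 0;
}

/* Calls the helper n times with the same pointer. Once the helper is
 * inlined, the p != NULL test no longer depends on the loop variable,
 * so an optimizer is free to evaluate it once, outside the loop. */
long sum_repeated(const int *p, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += value_or_zero(p);
    return total;
}
```

One source-level helper, one check written once, and the loop version is left to the optimizer.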
It's kinda the same problem as autocorrect vs. spellcheck. I'd be much more interested in "UB does not happen" techniques if they were suggestions and not so silent. It would be neat, actually, to verify optimizations and then watch a cascade of new optimizations be suggested.
And what makes it worse is that talking about these problems often leads to arguments like this https://gcc.gnu.org/bugzilla/show_bug.cgi?id=30475 where the standard is used as a shield for an agenda with no real world considerations. If nasal demons were the agenda, then the standard could be used to argue for punishing people with nasal demons.
But isn’t it possible to go in the opposite direction?
Doesn’t rust avoid UB by giving programmers the ability to prove to the compiler that the optimisations can be done safely rather than having to assume they are leading to UB?
Though yeah, I get that the obsession with safety in Rust probably does lead to it leaving performance on the table that C or C++ can get through UB.
Rust is generally as fast as C and C++, sometimes slower, sometimes faster, but always in the same ballpark.
"does UB help optimization" is a tricky question. Sometimes, it can! There is a thread on /r/rust right now where some Rust is slower than similar C++, due to more robust integer overflow checking. However, there are other ways to write that code that makes the difference disappear, that do not require the Rust to use unsafe.
Furthermore, Rust's safety sometimes makes writing more optimized code more tractable in the real world, due to the compiler helping you get the details right. Famously, Mozilla tried adding multithreading to a tricky part of Firefox, and failed to do so twice in C++ before succeeding with it in Rust.
Also, many optimizations do not rely on UB to happen. Inlining, for example, is called "the mother of all optimizations," because it often unlocks many other optimizations. And it doesn't rely on UB to work. Well, in most languages anyway: you can fall afoul of UB with inlining in C and C++, now that I think about it. Sigh.
> The solution is to simply assume that data races can never happen.
That’s the lazy solution. The actual solution is a better language which surfaces that property statically.
> Out of bounds reads are another UB area which underpins tons of optimizations. Without assuming that you can say goodbye to inlining, dead code elimination, alias analysis, and much more.
All of that is nonsense, even fucking Go can do inlining and DCE. Bounds checks *can* prevent other optimisations, and the compiler is not always able to elide them, but IME that’s a rare case rather than a common one.
A somewhat sane middle ground might be to see those kinds of checks as constraints. "I am assuming this value might be erroneous at runtime, so the compiler should respect that." If a compiler can detect when people write their own memcpy implementations, then it seems trivial to detect this kind of thing. I dunno. Haven't thought about it.
> For instance in C/C++ dereferencing a null pointer is UB, so when the compiler sees a dereference it can infer and propagate a “pointer is non-null” constraint, thus removing other null checks.
But isn't this the complaint? That the compiler is assuming that errors cannot happen, instead of warning?
In this particular example (compiler infers that ptr cannot be NULL due to dereference), shouldn't every NULL check that is eliminated result in a warning, regardless of who/what/how that "pointer is not-null constraint" got created?
After all, anytime the compiler tells you "this line is being removed because it has no effect", the programmer goes and checks their assumptions, *because* they assumed it had an effect.
The compiler does this for `if (myUint < 0)`, it does this for unused variables, etc. What's so special about NULL-checks that causes silent elimination?
> That’s the lazy solution. The actual solution is a better language which surfaces that property statically.
This comment doesn't make any sense to me. Imagine the following exchange:
Alice: *"My brother has a mental health problem that affects our relationship."*
Bob: *"Maybe convince him to see a therapist."*
/u/masklinn: *"That's a lazy solution. The actual solution is to get a new brother."*
I mean, it's not even a non sequitur, in the same way that an answer is *"Not Even Wrong"*.
First, the important optimizations don't depend on that much UB. Common sub-expressions and loop strength weakening are fine on variables that never got their address taken. Register allocation and instruction selection are also fine if one doesn't reorder memory access.
Also, it's still useful to start with 100 optimization warnings on every single C file we write. It's not that hard to disable warnings on UBs we're aware of when writing the code. And the compiler feels less bad after one reads and consents to its terms. And any newly-added term will become obvious after updates.
Wait, how do out-of-bounds reads affect inlining? The only thing that comes to mind is that most of inlining's value comes from register allocation, which does make assumptions about when and how values can be accessed.
They're talking about compilers reordering memory access. It can be useful quite often but it does depend on race conditions being UB.
The consequence is that when a race condition is intentional (e.g. letting the camera capture race the renderer), one has to use dedicated atomic intrinsics. Not as bad as the NULL one or the int overflow one.
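A minimal sketch of that with C11 `<stdatomic.h>`, assuming the shared state is just a frame number. The variable and function names are hypothetical. Relaxed ordering says "I only need the read and the write themselves to be well defined, not any ordering with other memory".

```c
#include <stdatomic.h>

/* Hypothetical shared slot: the capture thread writes the newest frame
 * number, the render thread reads whatever happens to be there. */
static atomic_uint latest_frame;

/* Called from the capture side. */
void capture_publish(unsigned frame) {
    atomic_store_explicit(&latest_frame, frame, memory_order_relaxed);
}

/* Called from the render side; may see an older or a newer frame number,
 * but never a torn value and never undefined behaviour. */
unsigned render_peek(void) {
    return atomic_load_explicit(&latest_frame, memory_order_relaxed);
}
```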
Data races are kind of a weird place to argue about this. It's kinda in the "well duh you can't tell me what will happen" class of UB, which is distinct from something like integer overflow where someone often can tell you exactly what would happen. I think a lot of the arguments about UB come from the term being too broad and containing too many different flavors.
This is just false. There are some optimizations that do rely on UB, but there are also just as many, if not more which do not require any UB at all.
With this line of thinking, one would arrive at the conclusion that rust, which focuses on correctness and avoiding UB, would be one of the slowest languages ever. Yet, it's as fast as c and c++ where UB is around every corner.
I really wish programmers would move away from this idea that there's a roving "UB" optimization pass and that compiler developers go out of their way to add more "UB" passes with each new version just to break your code.
There was never a deal to overlook your shady behavior as a coder. If you used UB and it happened to work, it was a happy accident. When it doesn't work in the next version, it's still a happy accident. Both times the compiler writers are assuming you're not using UB.
Once you violate that assumption, no one knows where or how it will then pop up later in the interplay of hundreds of passes that transform and optimize your source code, least of all the compiler. Running dead code elimination 3 times instead of 2 times, or even just reordering the same optimization passes, may be all that's needed for UB to turn into an actual problem in the binary.
> Compilers could generate a mandatory warning if they did any optimization that depends on UB, the message of which should give an option to disable that optimization.
Are you ready to see thousands of lines of warnings even at `-O0`? Because that's what you're talking about. A great many important optimizations work only because they can assume UB does not occur in the code.
By the time the optimizers are running, it may be difficult or impossible to validate that UB wasn't actually present in the actual source code, so they'd have to show you the warning even if your code was actually just fine.
I wrote my own SSA compiler in the 2008s before LLVM was a thing. Yes, compilers can be finicky. And no, if you didn't make outrageous assumptions like NULL never being accessed in the first place, it would never get to the point where a 3rd DCE pass could break something. The fact that `-fno-delete-null-pointer-checks` works is solid evidence.
On an excessively optimizing compiler, I WOULD expect to see thousands of lines of warnings at -O0. That would demonstrate just how overzealous it is and push people to a better alternative. It's sad, considering how every BLAS author still has to write assembly after thousands of correctness-breaking tradeoffs. It's sadder, considering how little of the AI money went to the underlying BLAS, and how little of the BLAS money went to motivating compiler writers to better support them.
> By the time the optimizers are running, it may be difficult or impossible to validate that UB wasn't actually present in the actual source code, so they'd have to show you the warning even if your code was actually just fine.
I didn't ask to find UB. That's undecidable. I just want a warning if any pass that depends on UB-not-existing did anything. It's doable in LLVM today.
OK but the issue is that a general solution is precisely what is required for what people are asking for (that the compiler should need to be able to tell if a loop halts rather than simply assuming it does).
Sure, there are specific cases the compiler can prove a loop does or does not halt, but proving this in general is undecidable and better left to the programmer.
The halting problem only applies to general code. It's possible to prove termination for subsets of the language (that's for example what languages with nondivergence effects do) or prove termination manually (what some proof assistants require you to do if their algorithms can't automatically prove it).
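As a tiny illustration of the manual route, here is Euclid's algorithm with its termination argument spelled out. This is only a sketch: the runtime `assert` stands in for what a proof assistant would check statically.

```c
#include <assert.h>

/* Euclid's algorithm. The loop terminates because b is a non-negative
 * integer that strictly decreases on every iteration (a % b < b). */
unsigned gcd(unsigned a, unsigned b) {
    while (b != 0) {
        unsigned next = a % b;
        assert(next < b); /* the decreasing measure, checked at run time */
        a = b;
        b = next;
    }
    return a;
}
```

No general halting oracle is needed here. One decreasing quantity with a lower bound is enough for this particular loop.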
The issue isn't strictly that it will elide infinite loops, but rather that it will make optimizations based on the assumption that an infinite loop won't happen. If you write a loop without side effects it's potentially legal for the compiler to assume the condition is false and propagate that assumption elsewhere, causing a situation that defies reasonable debugging attempts. I'm not an expert on this particular example of UB, but this algebraic way of thinking about code is what people like me actually rail against. It is not disciplined enough.
> It is not disciplined enough.
Disagree, it is very disciplined. The issue is that actual programmers are not theorem-provers, and thus do not want to code against an abstract machine using increasingly more powerful ad-hoc provers as compilers.
I don't see why you think we disagree. Should the tool not meet the nature of its users halfway?
Either way, look into optimization errors with pointer provenance. It is surprisingly difficult to axiomatize optimizations well enough to know that optimizations that are provably correct in isolation are correct when applied together. These kinds of issues with "autocorrect"-style optimizations are why I'm doing my own independent research into manual optimization techniques, and another conversation in this thread has me interested in "spellcheck"-style optimizations that would let you see a live refactoring of your code. I'm also interested in SAT-based hyper-optimization techniques that use a model of the target machine to verify the bitwise equivalents of data transformations.
The halting problem doesn't say that proving termination is impossible in some cases; in fact the wiki page mentions simple cases that can be proven. I'm not a Computer Scientist in the official academic sense but I'm not sure I buy this argument or the others ITT dismissing the relevance of the Halting Problem here, namely that deciding whether a given loop is infinite is indeed "very difficult" because "no general algorithm exists that solves the halting problem for all possible program–input pairs". That is not exclusive of an algorithm that solves for *some* program-input pairs, which is exactly what compilers do.
I have the impression that r/programming thinks the Halting problem impossibility proof says more than it actually does. From the page you linked:
> A key part of the formal statement of the problem is a mathematical definition of a computer and program, usually via a Turing machine. The proof then shows, for any program f that might determine whether programs halt, that a "pathological" program g exists for which f makes an incorrect determination. Specifically, g is the program that, when called with some input, passes its own source and its input to f and does the opposite of what f predicts g will do. The behavior of f on g shows undecidability as it means no program f will solve the halting problem in every possible case.
As you can see, this involves a self-referential situation where the program that's being analysed is itself allowed to run the analyser program. It's in the same spirit as analysing the truth value of statements like "this statement is false". Such self-referential situations are surely very rare in real programs, so the proof that a general solution to the Halting problem is impossible has no impact on the feasibility of automated halting analysis on any program you would encounter in practice.
That doesn't mean writing the analyser wouldn't be very difficult, but that difficulty wouldn't have anything to do with this impossibility proof.
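To make the "some program-input pairs" point concrete, here is a minimal sketch of a partial termination checker. Everything in it is hypothetical: `CountedLoop` models only loops of the shape `for (i = start; i < limit; i += step) {}` over mathematical integers (overflow is deliberately ignored), and `classify` is allowed to answer `Unknown`. It is not how any real compiler does it, just an illustration that a sound analyser can decide easy cases and give up on the rest without contradicting the impossibility proof.

```cpp
#include <cstdint>

// Hypothetical model of `for (i = start; i < limit; i += step) {}`.
// Reasoning is over mathematical integers; overflow is ignored on purpose.
struct CountedLoop {
    std::int64_t start;
    std::int64_t limit;
    std::int64_t step;
};

// Three-valued answer: the analyser may refuse to decide.
enum class Verdict { Terminates, Diverges, Unknown };

Verdict classify(const CountedLoop& loop) {
    if (loop.start >= loop.limit) {
        return Verdict::Terminates;  // condition is false on entry, body never runs
    }
    if (loop.step > 0) {
        return Verdict::Terminates;  // i strictly increases toward limit
    }
    if (loop.step == 0) {
        return Verdict::Diverges;    // i never changes, condition stays true
    }
    // Negative step: the outcome depends on wraparound behaviour that this
    // model does not cover, so the honest answer is "don't know".
    return Verdict::Unknown;
}
```

For example, `classify({0, 10, 1})` gives `Verdict::Terminates`, `classify({0, 10, 0})` gives `Verdict::Diverges`, and `classify({0, 10, -1})` gives `Verdict::Unknown`.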
A lot of people don't have much concept of the difference between general cases and specific cases. Here the difference isn't so important, because the problem from the programmer's perspective is that figuring out whether your assumptions involve an infinite loop is potentially non-trivial. That means the compiler's assumption that an infinite loop (without side effects) won't happen might unexpectedly conflict with how you're thinking about what you're doing. We are talking about the general case because we're talking about a tool that's trying to make generalized assumptions.
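A hedged illustration of how non-trivial this can get: the bare loop `while (n != 1) n = (n % 2 == 0) ? n / 2 : 3 * n + 1;` has no side effects, and whether it terminates for every positive `n` is the open Collatz conjecture. The sketch below is one way to sidestep the question by giving the loop an explicit budget, so termination never depends on what the programmer or the compiler assumes. The function name `collatz_steps` and the `max_steps` parameter are made up for this example.

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical helper: counts Collatz steps from n down to 1.
// Returns std::nullopt if the budget runs out, if 3n + 1 would wrap,
// or if n is 0 (which never reaches 1).
std::optional<std::uint64_t> collatz_steps(std::uint64_t n, std::uint64_t max_steps) {
    if (n == 0) {
        return std::nullopt;
    }
    std::uint64_t steps = 0;
    while (n != 1) {
        if (steps == max_steps) {
            return std::nullopt;  // budget exhausted: report it instead of spinning
        }
        if (n % 2 == 0) {
            n /= 2;
        } else {
            if (n > (std::numeric_limits<std::uint64_t>::max() - 1) / 3) {
                return std::nullopt;  // 3n + 1 would not fit in 64 bits
            }
            n = 3 * n + 1;
        }
        ++steps;
    }
    return steps;
}
```

With this shape the loop provably exits after at most `max_steps` iterations, so the caller has to handle the "did not finish" case explicitly rather than rely on an assumption about termination.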
You don't have to solve the general problem for a compiler though.
It's trivial to deduce `for(;;);` as an infinite loop *as a special case* and it's only a little bit un-ergonomic for a compiler to detect and avoid dropping that (need to do AST matching and propagate that to IR).
Back then, an infinite loop was the only way to guarantee a visible behavior change when debugging a crashing CUDA kernel. It was hugely annoying when the compiler dropped it at random.
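For what it's worth, on the host side of C++ a common way to get a "hang here" loop the optimizer has to keep is to read a `volatile` flag on every iteration, since volatile accesses count as observable behavior. Below is a minimal sketch under that assumption. The names `g_release_hang` and `hang_until_released` are hypothetical, the `max_polls` cap exists only so the example can be exercised without actually hanging, and nothing here claims to describe how a CUDA device compiler treats such loops.

```cpp
// Hypothetical debugging flag: a debugger (or other code) sets it to true
// to let the program continue.
volatile bool g_release_hang = false;

// Spins until g_release_hang becomes true and returns the number of polls.
// max_polls == 0 means "no cap"; a non-zero cap is only for demonstration.
unsigned long hang_until_released(unsigned long max_polls) {
    unsigned long polls = 0;
    while (!g_release_hang) {  // volatile read on every iteration
        ++polls;
        if (max_polls != 0 && polls >= max_polls) {
            break;             // demo-only escape hatch
        }
    }
    return polls;
}
```

In a real debugging session you would call `hang_until_released(0)`, attach a debugger, inspect the state, and then set `g_release_hang` to `true` to resume.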
Great projects have been created with C and C++; nothing is perfect, and history proves it. That should not surprise those of us who have spent more than 25 years in the programming field.
> Or, if they were worried about not rejecting old programs, they could insert a zero initialization
Automatic zero-initialization does not fix non-initialization bugs. Zero is not automatically a safe value. In fact, I have seen cases where incorrect zero-initialization masked errors, making them harder to detect and debug. The only safe handling is to fail, either at compile time or at runtime; however, this is not always possible or desirable.
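A small sketch of the "fail at runtime instead of silently using zero" option, using `std::optional`. The `Config` and `ZeroedConfig` structs and the `effective_timeout` function are invented for illustration; the only library behavior relied on is that `std::optional::value()` throws `std::bad_optional_access` when the optional is empty.

```cpp
#include <optional>

// Hypothetical config where "never set" must stay distinguishable from 0.
struct Config {
    std::optional<int> timeout_ms;  // empty until someone assigns it
};

// Zero-initialized variant for comparison: a forgotten assignment and a
// deliberate 0 look exactly the same.
struct ZeroedConfig {
    int timeout_ms = 0;
};

// Fails loudly if the field was never set:
// value() throws std::bad_optional_access on an empty optional.
int effective_timeout(const Config& cfg) {
    return cfg.timeout_ms.value();
}
```

With `ZeroedConfig`, forgetting to set the timeout yields a plausible-looking `0` that may only misbehave much later. With `Config`, the same mistake throws at the first use, which is the runtime version of "fail" described above.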