It’s very good for bit manipulation and exact packing.
What I do think is that the "it can't be negative" benefits are less good than we think.
For example, we might think that we sanitized input by saying a size is unsigned: only valid values now, right? Except most likely anything above, say, 2^48 is probably wrong. Not to mention it's easy to accidentally overflow if we do calculations.
If we use signed, we *start thinking* about limits, because there is such an obvious lower limit. But that also helps us start thinking about the *upper* limits, which are just as important to think about for correctness. So often that "can't be negative" guarantee comes back to haunt you.
This is especially true for a language like C3 with contracts, where it’s trivial to add both boundaries to the contract if you’re adding one. This both documents (contracts are lifted into docs) the valid range and enforces it.
If you write a ring buffer, how do you make sure that calculated offsets wrap correctly?
The naive solution is this: index = (start + offset) % length;
This works as long as offset is positive.
It would work even with a negative offset if the % operator was modulo of floored division. I wonder if the C3 developers considered that option (truncated division is almost never what you want when the divisor is negative).
Or, hear me out, offset is unsigned, start is unsigned and length is unsigned, and you ensure they have sizes suitable for their purpose to avoid boundary effects, and you flag operations involving mixed types with your linter as errors. The solution is consistency, understanding what's happening under the hood, and letting tooling flag issues for you.
It's literally always a potential source of vulns to mix data types and sizes. Signed is not gonna save you - it has weird boundary behaviors too and would just result in different bugs in different circumstances. I mean if start and offset are a smaller integer (in terms of bit width) than length you're still gonna have a bad time.
Why not? If you expect a number to never be negative, it's best to ensure it's impossible for it to be negative in the type system. Invalid states should ideally be unrepresentable
I don't get what that has to do with my comment? How does a language implicitly converting between signed and unsigned numbers when you use both in the same operation instead of requiring you to explicitly convert the numbers to the same type beforehand prevent you from unintentionally providing a representation for invalid states‽‽‽
How does a language implicitly converting between signed and unsigned numbers when you use both in the same operation
Because oftentimes languages will let you omit the type of a variable when defining it, and if the inferred variable switches types because you changed one of the types of the variables used in the calculation, it can change the validity of the resulting code
The article is about language design. It mentioned mixed-type integer operations being a problem, giving the following C example:
uint a = 0;
int b = -1;
if (a > b) { … }
The comment I originally replied to suggested linting against that. I suggested that instead the language shouldn't have implemented it in the first place.
I still do not understand how this would give rise to a representation for illegal states.
I'll be honest, I got a couple of different conversations mixed up in my last reply and misremembered the context
To be clear, your original comment was asking why a language should have unsigned and signed at all, correct? That is the point of the article, after all
Assuming this is accurate: what do you do about smaller integer types? u8? u16? Do you just make those signed integers and pay the cost of the larger type even if all you need is up to the number 240?
To be clear, your original comment was asking why a language should have unsigned and signed at all, correct?
Ah, sorry for the confusion. No. My comment was replying to the comment it was replying to and quoting. It is obvious that a language targeting low-level control wants unsigned types; even in Java (which only has u16 in this regard) this is annoying.
If we're allowed to remove the ability for the code to use negative offsets when the premise assumes it's a necessary feature, then my fix would be for nobody to program anything, thus minimizing any and all bugs entirely.
The problem is that operations such as "unsigned - 1" gives a signed result. So it's very, very easy to introduce signed into an unsigned world. Conversely, no basic operation on signed ints creates an unsigned result. Thus it's better to always stay signed.
Back when ANSI C was first being made, one of the improvements was to allow external functions whose name was more than (IIRC) 6 characters on all standards-compliant compilers.
Which, weirdly, was a big step forward. At the time there existed old architectures where the linker only resolved the first 6 characters.
But ... technically, the standard at the time just required that at least one function name longer than 6 characters be allowed. If a program had two functions with longer names, the compiler would be allowed to reject it.
If you're talking about some Microsoft stuff, then no, it was Microsoft trying to optimize Windows to run on anything. Remember, it was literally 1984, you couldn't just waste memory on function names willy-nilly.
There are functions in Windows like CopyRect, which is literally just a memcpy with a fixed size, that exist only because they make the calling code smaller.
And all the standard library functions from that era indeed have six-letter names at most. It’s not a coincidence that strlen, memcpy, printf, etc all have six letters.
So I just checked and they fixed the precedence of the bitwise operators, nice. Which means they don't care about expression-level compatibility with C or C++ anyway, unlike some languages whose names start with J.
The ideal solution would be to have a type like Index<123> which will give you a 123-element type with wrapping add, making your code index += offset. Where offset should probably be a different type (Offset<123>?), also allowing for negatives. I'm sure the mathematicians have a fancy name for that kind of construct.
Yeah but rings aren't necessarily finite, also we do have sensible multiplication and addition on those things so it's at least a field. I remembered that Galois fields exist but checking wikipedia they have to have p^k elements, where p is prime and k an integer so they don't match this one.
Fwiw, invertible addition and associative multiplication makes a ring. If the multiplication is invertible (except for 0), then it's a field. Basically ring is +-, field is +-/
Yeah that makes sense as multiplication is just repeated addition and inverses are hard.
...and I guess I just assumed that integer division would be mathematically sensible in the presence of wrapping, that's probably not the case and the reason why finite fields must have p^k members.
Yes. Usually it would be called a residue class and if its size is a prime, multiplication actually does have a well-defined multiplicative inverse for every element.
A finite ring then, or a residue class ring (that’s what it’s called in German).
Multiplication and (invertible) addition are what makes a ring. A field also needs a definition of division. To be precise, a ring consists of a set and two operations. It is an abelian group with regards to the addition operation (i.e. every element is invertible) and a semigroup with regards to the multiplication operation (i.e. there is not necessarily an identity element and elements don’t have to be invertible). A commutative unital ring (i.e. a ring where multiplication is commutative and has an identity element) is a field, if and only if every element except the null element has a multiplicative inverse.
I don't think the thing they're describing is very well modeled by a ring.
If we ignore the "type of offset is different than type of index" thing then I think it's just a cyclic group, but with that then I'm not sure. Maybe something something actions?
Yeah I think actions are the way to go, in particular finite Torsors / Principal Homogeneous Spaces (over a finite group) seem like a good fit. We have some index set with (invertible) "offset" operations such that any index can be reached by a suitable offset from any other one, and for each element there is exactly one offset that doesn't do anything.
How are negative offsets getting presented to these methods? You are in control of how data gets into the system, so sanitise it to make sure it's not nonsense.
Regarding the comparison. I say it's about time we dropped the promotion to unsigned and fixed the intrinsic comparison operators for distinct types. Like this:
bool operator < (signed int a, unsigned int b) noexcept {
    if (a < 0)
        return true;
    if (b > INT_MAX)
        return true;
    return unsigned(a) < b;
}
But ... that's a lot of extra code that almost always just does the last line. One big advantage of the weird, bad way is that it's a single fast opcode.
IMHO, languages should be designed so that even when compiled in DEBUG mode, they are still fast. We shouldn't use the (expensive) optimizer as a crutch to work around language problems.
I say this as a compiler engineer working in high-performance computing.
It doesn’t matter how fast the code is if it’s wrong.
If bounds checking is what it costs to make the code work right, then that’s just what it costs. The way to avoid the cost isn’t to drop bounds checking, but to give the programmer more expressive tools for showing the compiler when bounds checking is unnecessary, such as flow typing, or more precise integer range types, or replacing indices altogether, with iterators that are correct by construction.
If bounds checking is what it costs to make the code work right
But it can't. The only thing bounds checking can do is detect that something is wrong by catching the out-of-bounds access and crashing/throwing an exception. It can't magically fix the code which calculated the wrong index.
Yes, that’s fair. In “work right” I am implicitly including properties like memory safety. If you have a bad index, with bounds checking at least you won’t go scribbling over random memory, but in principle this should never be necessary for internally generated indices.
You can do better by pushing all checking to the system boundaries. You use a foreign key type instead of an arbitrary primitive index, like an integer in an array or a string in a hash. You can parse an integer into an array key by bounds-checking it, or serialise an array key to an integer for free, but they are distinct types.
Now any operation on a container like array.find(value) or hash.keys() that would return indices returns keys into that specific container, like Optional<Key<array>> or Set<Key<hash>> respectively. This is a limited case of dependent typing. An operation such as hash.insert(new_key, new_value) or array.insert_after(old_key, value) that modifies the container also implicitly provides the caller with the ability to cast between old and new indices, automatically avoiding rechecking when possible. In this case, old keys of hash are a subset of new keys, so casting is free, while old array keys get a range check and offset as appropriate.
This is exactly what C3 had built in. It has now been removed, because there is no need to support it after picking signed by default. Trying to compare signed and unsigned is just an error (unless the unsigned can be safely widened into the signed type).
Let me take your poster child example and turn it into C++....
for (uint x = 10; x >= 0; x--) // Infinite loop!
{ ... }
https://godbolt.org/z/4YjEr1rnq
<source>: In function 'void silly()':
<source>:3:27: warning: comparison of unsigned expression in '>= 0' is always true [-Wtype-limits]
3 | for (unsigned x = 10; x >= 0; x--);
| ~~^~~~
Compiler returned: 0
The obvious solution is of course to support a for (uint x = 10..0) or foreach x in range(10, 0), like most other modern languages do.
If you are going to make a new programming language, why not learn from all the mistakes and improvements of the last 50ish years? C for-loops are the way they are because they are trivial to compile to machine code. We aren't running our compilers on an 8086 any more, having it do some extra work to unburden the programmer is fine!
Or you can just do for (uint x = 10; x > 0; ) { x--; }. Nicely reflects the symmetry between forward and backward iteration (decrement before use vs. increment after use).
Swift abandoned C-style for loops altogether. You iterate over a sequence (whether that’s an array or range), and instead of the buggy for (uint x = 10; x >= 0; x--), you just use for x in 10...0.
Happy to see this – signed by default makes everything so much easier. Unsigned numbers typically only create problems.
Of course the average programmer will suggest overcomplicating your language and adding 900 different extra syntax features instead of just using signed sizes.
If you want simple, just use big ints. Those can’t ever over/underflow. If you cannot, you can use fixed-size integers and join the compiler in believing they don’t exist (because it would be UB otherwise). Alternatively, if you want to use signed over unsigned to reduce footguns, why not do due diligence and consider that some arithmetic operations may be erroneous and add basic validation?
I did not tell you to implement bigint, but rather to just use it, because unlike fixed-sized int, bigints do not over/underflow so they are less erroneous, and therefore more simple to use - they can even be implemented using tagged pointers to reduce heap allocation. But of course most don’t use them because of unnecessary performance costs.
Consistent unsigned types for offset, start, and length make boundary checks cleaner. Letting linters enforce this feels like a solid way to reduce off-by-one chaos without overcomplicating things.
TBH, there is very little downside to using signed pointers (other than the pain of switching), which would make them all the same type again.
This would actually be more correct, as many CPU architectures already specify that memory addresses are signed, with rules on how 32-bit pointers get sign-extended to 64 bits, and often placing rules that bit 47 (or whatever the highest bit is) must be the same as bit 63.
On most operating systems, kernel pointers always have the upper bit set (aka, they are negative) and it's always cleared for userspace pointers (aka positive).
The bits are not “set for compatibility with signed”
They're set to intentionally use the upper range of the address space. It is incorrect to think of kernel-space pointers as "signed". They're called "negative memory space" just as nomenclature. It has literally nothing to do with signed compatibility.
User space losing the upper range of the address space also has literally nothing to do with signed/unsigned compatibility.
Kernel developers use explicit casts when they are targeting signed or unsigned arithmetic.
It is stupid to point at kernel pointers as some “gotcha” of signed v unsigned.
Sorry, I should have been more clear.
I'm not bringing it up as a "gotcha", to say that pointers should have been signed all along, or to say that kernel pointers are signed (or negative).
I'm bringing it up because the one thing that could theoretically break with signed pointers is a buffer that crosses the boundary between 0x7fff0000 and 0x80000000, as signed overflow is undefined (in C/C++).
But this historic convention conveniently allows us to side-step that problem, as no memory allocations will ever cross that boundary.
Doing unsigned comparisons wrong is programmer error / PEBKAC.
Unsigned should be the default for everything not vice versa.
Indices should ALWAYS be unsigned.
Index bounds checks on unsigned can/should wrap, and require only one branch check, not two. etc
If this is somehow a problem in your codebase, use a static analyzer. Stop (for the love of god) writing raw indexed for loops; if you need one, internalize while (i --> 0) etc. instead.
And ideally fire / ban any half-trained programmer who doesn't understand unsigned correctly from introducing bugs and screwing up your code.
Static analysis warnings, because subtracting two unsigned integers like that is inherently dangerous and can/will fail without clamping, making sure to take the min/max, etc
Note that this will still be a bug WITH SIGNED if you aren’t abs-ing that result and were expecting that area() should always be positive.
This is also a bit of a silly example b/c geometric area operations on coordinates are often (albeit not always) operating in real space and/or on coord systems that should be signed.
Again I did not say that you should never use signed, I said you should use unsigned by default and very carefully consider when/where you need to change that.
Area() if anything is a good example of where having the concept of signed-ness in your type systems IS a good idea.
It is generally NOT at all a good idea to write an area function that can return negative integers or floating numbers.
You should handle that correctly in the implementation.
Better richer type systems + constraints are always better than not.
If you could do this stating that eg area is a generic function that takes (R, R) -> UR, where R is any real non infinite non NaN value, and UR is an unsigned / not negative subset of the same, would be a good extension of / to an ideal type system. Ditto having the concept of normalized / non normalized values and so on and so forth.
There are to be clear hazards involved with unsigned integers.
The extra bit of precision (and loss thereof in conversion to signed) is a major hazard if you aren’t aware of and spec around this. eg further restricted integer value ranges, which mind you are easier to test/assert on unsigned than signed integers.
Subtraction operations period are another massive hazard and need to be clearly thought about and restricted particularly when it comes to indices and ranges etc.
All of these unsigned errors ARE ALSO SIGNED ERRORS, and will just have the “fun” advantage of introducing potential runtime failures everywhere with bounds checks etc, courtesy of allowing types (integer types are just intent annotations) that can be negative, when they shouldn’t be negative, that can/will blow up somewhere.
Using unsigned to annotate “this type should never be negative”, running bounds checks, and aggressively contract programming (ie asserts) everything, with restricted integer range limits, is the correct way to do all of this.
And again this is one of several safety hazards / footguns that should be used to keep uncareful programmers out of systems programming.
Yes, sure, unsigned integers break - somewhat - very well understood features of mathematical number systems and common operations.
So does having finite integer (nvm floating point) types, in general
And unsigned integer behavior IS a well understood, clearly specified hardware behavior (on any sane compiler - note that this obviously does not blanket apply to C/C++ compilers). With convenient properties. That works well IF you understand and use it correctly.
TL;DR: yes, better compilers, type systems + static analysis are the correct fix here.
No, removing unsigned from cases where it makes sense is not.
I am going to quite frankly fully, flatly disagree with Stroustrup, and heck, even Sutter etc. here.
If you care about this (unsigned AND footguns) just use rust. Which adds all kinds of performance degrading runtime checks in at least debug builds (IIRC) to help prevent programmers from hurting themselves. And yes does use u64 everywhere for indices. Which is correct.
This is (of course) a range error, but why? It is not because v is subscripted by the negative integer -2. The subscript to vector::operator[] is an unsigned value so that's not possible. Instead, -2 is the valid subscript 4294967294 which just happens to be too large for that vector. It's a run-time error (subscript too large). Compilers should warn, but since there is no type error, not every compiler does.
Then fix your compiler. I swear
Integer promotion may kill a lot of people, but it also helps a lot of people meet deadlines, so it's impossible to say if it's bad or not.
I do want to mention an alternative: panicking on overflow.
It has... consequences.
It makes a lot of "trivial" compiler optimizations a lot harder to prove correct: overflowing operations are not commutative and associative on the whole range of their inputs, unlike wrapping ones, so the optimizer needs to prove that the input ranges allow for it.
It's incompatible with SIMD, in general, which typically implements wrapping semantics.
So all in all, a whole lot of performance dings left & right.
And of course, it's still all "run-time" detection. So the code passes the test-suite and only fails in production. Erf.
But... honestly, all the alternatives to this problem kinda suck, and panicking may just be the least worse :/
There are two approaches to casts: one is to liberally sprinkle them all over the codebase with the idea that “it’s an explicit conversion, so it’s obvious what happens”. The other is to minimize casts, only using them to signal that something out of the ordinary is happening: “here be dragons”.
I think this ignores the third option, which is that you use a non-truncating cast (e.g. Rust's .into()) which is only implemented for the types where the signed representation is capable of representing all values in the unsigned representation.
I prefer to lock down the code in such a way that an as cast where information is lost is not allowed.
So i32 as u32 is not allowed. Use cast_unsigned, and all of a sudden we no longer run the risk that the code silently keeps compiling when someone replaces the u32 with a u16.
There are 2 places where I allow as casts: in const, when you upcast both to a common size (because try_into is not const) to ensure you don't truncate, and the other is to take a pointer to an fn.
All the rest is try_into() and then you handle it explicitly. Even unwrap is preferred over silent eating of bits.
Explicitness is a choice, and your reward for that is a better system.
What I would want is for there to be a conversion for usize to u64 when ptr size is 64 and vice versa.
A type system can solve this problem. To paraphrase some articles: don't validate that the operand or result will not error, ensure it can't error through the type system.
Also, I would consider it an invalid state should signed/unsigned logic result in ub or wrong values. That should be prevented beforehand.
Yeah, but if I have a u64 and I need to index then I use try_into() and either handle it if I care about failure or expect() if I don't. Either way, there's no truncation.
This. For some reason, newer programming languages keep falling into the same pitfalls that older programming languages did, then use them as a defense for repeating their flaws, as if someone were forcing them to do so.
For the most part, it's the implicit conversions which can change the value. They're insidious. I usually compile with options to catch these, though they're so common that functions in library headers often trigger them.
I've always thought that the way Scala distinguishes between casts as possibly-widening type assertions (:) and casts as possibly-unsafe conversions (cast) made a lot of sense.
I am very much in the 'model reality' camp. I have no indexable things that have anything before index 0, so I have no need for a signed index. And it hardly ever comes up anyway, at least in my Rust code, because there are so few places where I would do an indexed loop. The most common one is to get a slice of the actually used part of a vector, but that's using a length, not an index.
Nice - reminds me of the good old days. I actually had a job interview where someone asked me to implement strcpy(char*, char*, unsigned int). I was out of focus and thought he wanted to see if I could handle plain C pointers and memory. However, the pitfall of any memcpy/strcpy is when there is an overlap between the "copy from" and "copy to" buffers, as you may run over the buffer you're copying from before reading it into the "copy to" buffer...
Glad the world is starting to see the light of signed by default. The use case for unsigned because it can be >INT_MAX seems different from the use case where unsigned must be >= 0, and I wonder if there could be any value in separating the two, i.e. a way of enforcing integer ranges at the semantic type level, rather than actually using unsigned values. Clearly not in C3's wheelhouse, but I wonder if any language has done that.
References are functionally non-null pointers. Non-null smart pointers are not in the standard library because they are essentially incompatible with non-destructive moves.
Checked on Compiler Explorer, and no compiler can get rid of the second idiv. I feel confident enough not to make a benchmark because idiv has high latency, but maybe it doesn't matter; you never really know with modern architectures.
I do the following that just gets turned into a conditional move
auto index2 = (start + offset) % length;
index = index2 >= 0 ? index2 : index2 + length;
markand67@reddit
Signed ints have their flaws too. Nothing is perfect. Unsigned ints have some benefits that we use a lot.
For example: seqid++ (as long as wrapping does not create issues), and widths (int width: is negative width allowed?).
cheese_karate@reddit
So, it seems to me, that he has no clue whatsoever, that C3 is a bad language, and no one should be using it.
All hail the Rust!
lizardhistorian@reddit
Provide signed overload declarations with no definition.
0x564A00@reddit
Why should the language define them in the first place
garnet420@reddit
Assuming we're talking C, unsigned - 1 is still unsigned.
vytah@reddit
But it's defined to use truncated division, in both C and C++ (it used to be implementation-defined in C, but they changed it in C99).
prehensilemullet@reddit
It used to be implementation defined? Wow, that’s almost as insane as having data type sizes be implementation defined
vytah@reddit
I mean, C doesn't even guarantee the number of bits in a byte, or that an array can have 40000 elements, so...
lizardhistorian@reddit
DSPs can easily have 32 bit bytes. Back in the day many had 24b.
rsclient@reddit
Let's talk about the length of function names!
Kirides@reddit
Wait, is that the reason for Ordinal function exports?
Sharlinator@reddit
And all the standard library functions from that era indeed have six-letter names at most. It’s not a coincidence that strlen, memcpy, printf, etc all have six letters.
valarauca14@reddit
signed integers aren't guaranteed to have 2's complement layout until C23 (ISO/IEC 9899:2024).
lizardhistorian@reddit
No sign-magnitude machines exist today outside of a museum.
vip17@reddit
that's why the `div` function exists to avoid implementation-defined behavior when you need truncated division
max123246@reddit
The landscape of hardware for CPUs was very much the wild west back then. There was no standardization
prehensilemullet@reddit
Fortran is even older but from skimming some Google search results, its solutions to these problems seem a bit more elegant than the way C does it
WHY_DO_I_SHOUT@reddit
These guys are creating a brand new language though. They could make their % operator different.
vytah@reddit
So I just checked and they fixed the precedence of the bitwise operators, nice. Which means they don't care about expression-level compatibility with C or C++ anyway, unlike some languages whose names start with J.
barsoap@reddit
The ideal solution would be to have a type like
`Index<123>`, which will give you a 123-element type with wrapping add, making your code `index += offset`. Where `offset` should probably be a different type (`Offset<123>`?), also allowing for negatives. I'm sure the mathematicians have a fancy name for that kind of construct. Good ole design by wishful thinking.
Schmittfried@reddit
Of course. Not incidentally, it’s called a ring.
barsoap@reddit
Yeah but rings aren't necessarily finite, also we do have sensible multiplication and addition on those things so it's at least a field. I remembered that Galois fields exist but checking wikipedia they have to have p^k elements, where p is prime and k an integer so they don't match this one.
PhilipTrettner@reddit
Fwiw, invertible addition and associative multiplication makes a ring. If the multiplication is invertible (except for 0), then it's a field. Basically ring is +-, field is +-/
barsoap@reddit
Yeah that makes sense as multiplication is just repeated addition and inverses are hard.
...and I guess I just assumed that integer division would be mathematically sensible in the presence of wrapping, that's probably not the case and the reason why finite fields must have p^k members.
So it's just a finite ring? That easy?
Schmittfried@reddit
Yes. Usually it would be called a residue class and if its size is a prime, multiplication actually does have a well-defined multiplicative inverse for every element.
Schmittfried@reddit
A finite ring then, or a residue class ring (that’s what it’s called in German).
Multiplication and (invertible) addition are what makes a ring. A field also needs a definition of division. To be precise, a ring consists of a set and two operations. It is an abelian group with regards to the addition operation (i.e. every element is invertible) and a semigroup with regards to the multiplication operation (i.e. there is not necessarily an identity element and elements don’t have to be invertible). A commutative unital ring (i.e. a ring where multiplication is commutative and has an identity element) is a field, if and only if every element except the null element has a multiplicative inverse.
philh@reddit
I don't think the thing they're describing is very well modeled by a ring.
If we ignore the "type of offset is different than type of index" thing then I think it's just a cyclic group, but with that then I'm not sure. Maybe something something actions?
SV-97@reddit
Yeah I think actions are the way to go, in particular finite Torsors / Principal Homogeneous Spaces (over a finite group) seem like a good fit. We have some index set with (invertible) "offset" operations such that any index can be reached by a suitable offset from any other one, and for each element there is exactly one offset that doesn't do anything.
Plank_With_A_Nail_In@reddit
How are negative offsets getting presented to these methods? You are in control of how data gets into the system so sanitise it to make sure its not nonsense.
GameCounter@reddit
With respect to the quote, it actually doesn't always work for positive offset, because of the pesky thing we like to forget about: overflow.
angelicosphosphoros@reddit
I don't know why anybody would think that. It is actually generator of footguns.
Proper solution is to disallow mixing signed and unsigned integers in an expression.
EfOpenSource@reddit
But but but math library code becomes less beautiful!
AverageHot2647@reddit
And more correct. Which seems like the more important thing for math 😛
Tringi@reddit
Regarding the comparison. I say it's about time we dropped the promotion to unsigned and fixed the intrinsic comparison operators for distinct types. Like this:
rsclient@reddit
But ... that's a lot of extra code that almost always just does the last line. One big advantage of the weird, bad way is that it's a single fast opcode.
IMHO, languages should be designed so that even when compiled in DEBUG mode, they are still fast. We shouldn't use the (expensive) optimizer as a crutch to work around language problems.
(says the person programming in Python :-) )
evincarofautumn@reddit
I say this as a compiler engineer working in high-performance computing.
It doesn’t matter how fast the code is if it’s wrong.
If bounds checking is what it costs to make the code work right, then that’s just what it costs. The way to avoid the cost isn’t to drop bounds checking, but to give the programmer more expressive tools for showing the compiler when bounds checking is unnecessary, such as flow typing, or more precise integer range types, or replacing indices altogether, with iterators that are correct by construction.
carrottread@reddit
But it can't. The only thing bounds checking can do is to detect what something is wrong for accessing out of bounds and crash/throw exception. It can't magically fix the code which calculated wrong index.
evincarofautumn@reddit
Yes, that’s fair. In “work right” I am implicitly including properties like memory safety. If you have a bad index, with bounds checking at least you won’t go scribbling over random memory, but in principle this should never be necessary for internally generated indices.
You can do better by pushing all checking to the system boundaries. You use a foreign key type instead of an arbitrary primitive index, like an integer in an array or a string in a hash. You can parse an integer into an array key by bounds-checking it, or serialise an array key to an integer for free, but they are distinct types.
Now any operation on a container like
`array.find(value)` or `hash.keys()` that would return indices returns keys into that specific container, like `Optional<Key<array>>` or `Set<Key<hash>>` respectively. This is a limited case of dependent typing. An operation such as `hash.insert(new_key, new_value)` or `array.insert_after(old_key, value)` that modifies the container also implicitly provides the caller with the ability to cast between old and new indices, automatically avoiding rechecking when possible. In this case, old keys of `hash` are a subset of new keys, so casting is free, while old `array` keys get a range check and offset as appropriate.
angelicosphosphoros@reddit
It is not really that bad. It can be compiled into 3-4 operations in machine code.
Also, it is a rare case. Most comparisons are between integers with the same signedness.
Tringi@reddit
What a great proof of concept! I'll borrow your godbolt link for my papers page, if you don't mind.
Also switching the compiler to MSVC and seeing the output is just sad.
angelicosphosphoros@reddit
I don't mind, you can use it.
Nuoji@reddit (OP)
This was exactly what C3 had built in. It was removed now, because there is no need to support it after picking signed by default. Trying to compare signed and unsigned is just an error (unless the unsigned can be safely widened into the signed type)
angelicosphosphoros@reddit
It is insane to remove a good thing. Making signed default doesn't change this.
Nuoji@reddit (OP)
If comparisons between unsigned and signed are no longer allowed, then they have no place.
RumbuncTheRadiant@reddit
Let me take your poster child example and turn it into C++....
for (uint x = 10; x >= 0; x--) // Infinite loop!
{ ... }
https://godbolt.org/z/4YjEr1rnq
What problem?
KittensInc@reddit
The obvious solution is of course to support a `for (uint x = 10..0)` or `foreach x in range(10, 0)`, like most other modern languages do. If you are going to make a new programming language, why not learn from all the mistakes and improvements of the last 50ish years? C for-loops are the way they are because they are trivial to compile to machine code. We aren't running our compilers on an 8086 any more; having it do some extra work to unburden the programmer is fine!
BjarneStarsoup@reddit
Or you can just do `for (uint x = 10; x > 0; ) { x--; }`. Nicely reflects the symmetry between forward and backward iteration (decrement before use vs. increment after use).
bzbub2@reddit
I think the loop example is a hook to get your attention, but many more bugs are sneaking in behind automatic conversions
chucker23n@reddit
This.
Swift abandoned C-style for loops altogether. You iterate over a sequence (whether that's an array or range), and instead of the buggy `for (uint x = 10; x >= 0; x--)`, you just use `for x in 10...0`.
tav_stuff@reddit
Happy to see this – signed by default makes everything so much easier. Unsigned numbers typically only create problems.
Of course the average programmer will suggest overcomplicating your language and adding 900 different extra syntax features instead of just using signed sizes
MindlessU@reddit
If you want simple, just use big ints. Those can’t ever over/underflow. If you cannot, you can use fixed-size integers and join the compiler in believing they don’t exist (because it would be UB otherwise). Alternatively, if you want to use signed over unsigned to reduce footguns, why not do due diligence and consider that some arithmetic operations may be erroneous and add basic validation?
tav_stuff@reddit
> If you want simple use big ints
What the fuck bruv. Simple? Have you implemented big ints before?
MindlessU@reddit
I did not tell you to implement bigint, but rather to just use it, because unlike fixed-sized int, bigints do not over/underflow so they are less erroneous, and therefore more simple to use - they can even be implemented using tagged pointers to reduce heap allocation. But of course most don’t use them because of unnecessary performance costs.
tav_stuff@reddit
I don’t think you know what simplicity actually means. Simplicity != Easy
MindlessU@reddit
I think you should elaborate on what you mean, because you haven't rebutted any of the facts I brought up.
Electrical-Rise-5433@reddit
Consistent unsigned types for offset, start, and length make boundary checks cleaner. Letting linters enforce this feels like a solid way to reduce off-by-one chaos without overcomplicating things.
phire@reddit
TBH, there is very little downside to using signed pointers (other than the pain of switching), which would make them all the same type again.
This would actually be more correct, as many CPU architectures already specify that memory addresses are signed, with rules on how 32-bit pointers get sign-extended to 64 bits, and often placing rules that bit 47 (or whatever the highest bit is) must be the same as bit 63.
On most operating systems, kernel pointers always have the upper bit set (aka, they are negative) and it's always cleared for userspace pointers (aka positive).
EfOpenSource@reddit
Erm? Have you looked at the reasons that the upper bits are set at kernel level addresses? It has nothing to do with signedness.
phire@reddit
Doesn't matter why that convention started.
Point is that it's compatible with signed pointers, especially for userspace stuff.
EfOpenSource@reddit
The bits are not “set for compatibility with signed”
They’re set for intentionally getting upper range of address space. It is incorrect to think of kernel space pointers as “signed”. They’re called “negative memory space” just as a nomenclature. It is literally nothing to do with signed compatibility.
Address space losing the upper range being set for user space also has literally nothing to do with signed/unsigned compatibility.
Kernel developers use explicit casts when they are targeting signed or unsigned arithmetic.
It is stupid to point at kernel pointers as some “gotcha” of signed v unsigned.
phire@reddit
Sorry, I should have been more clear.
I'm not bringing it up as a "gotcha", to say that pointers should have been signed all along, or to say that kernel pointers are signed (or negative).
I'm bringing it up because the one thing that could theoretically break with signed pointers is a buffer that crosses the boundary between `0x7fff0000` and `0x80000000`, as signed overflow is undefined (in C/C++). But this historic convention conveniently allows us to side-step that problem, as no memory allocations will ever cross that boundary.
zapporian@reddit
Unbelievably dumb / wrong take.
Doing unsigned comparisons wrong is programmer error / PEBKAC.
Unsigned should be the default for everything not vice versa.
Indices should ALWAYS be unsigned.
Index bounds checks on unsigned can/should wrap, and require only one branch check, not two. etc
If this is somehow a problem on your codebase, use a static analyzer. Stop (for the love of god) writing raw indexed for loops (if you need to, internalize `while (i --> 0)` etc. instead)
and ideally fire / ban any half-trained programmer who doesn't understand unsigned correctly, from introducing bugs and screwing up your code
max123246@reddit
If area takes 2 unsigned integers, what do you propose should happen if height2 is greater than height1?
EfOpenSource@reddit
I am actually almost flabbergasted that this is presented as a real argument.
This is some shopping network level shit right here.
angelicosphosphoros@reddit
Just write it like this: `area(absdiff(height1, height2), absdiff(length1, length2))`
zapporian@reddit
Static analysis warnings, because subtracting two unsigned integers like that is inherently dangerous and can/will fail without clamping, making sure to take the min/max, etc
Note that this will still be a bug WITH SIGNED if you aren’t abs-ing that result and were expecting that area() should always be positive.
This is also a bit of a silly example b/c geometric area operations on coordinates is often (albeit not always) operating in real space and/or on coord systems that should be signed.
Again I did not say that you should never use signed, I said you should use unsigned by default and very carefully consider when/where you need to change that.
Area() if anything is a good example of where having the concept of signed-ness in your type systems IS a good idea.
It is generally NOT at all a good idea to write an area function that can return negative integers or floating numbers.
You should handle that correctly in the implementation.
Better richer type systems + constraints are always better than not.
If you could do this stating that eg area is a generic function that takes (R, R) -> UR, where R is any real non infinite non NaN value, and UR is an unsigned / not negative subset of the same, would be a good extension of / to an ideal type system. Ditto having the concept of normalized / non normalized values and so on and so forth.
There are to be clear hazards involved with unsigned integers.
The extra bit of precision (and loss thereof in conversion to signed) is a major hazard if you aren’t aware of and spec around this. eg further restricted integer value ranges, which mind you are easier to test/assert on unsigned than signed integers.
Subtraction operations period are another massive hazard and need to be clearly thought about and restricted particularly when it comes to indices and ranges etc.
All of these unsigned errors ARE ALSO SIGNED ERRORS, and will just have the “fun” advantage of introducing potential runtime failures everywhere with bounds checks etc, courtesy of allowing types (integer types are just intent annotations) that can be negative, when they shouldn’t be negative, that can/will blow up somewhere.
Using unsigned to annotate “this type should never be negative”, running bounds checks, and aggressively contract programming (ie asserts) everything, with restricted integer range limits, is the correct way to do all of this.
And again this is one of several safety hazards / footguns that should be used to keep uncareful programmers out of systems programming.
Yes, sure, unsigned integers break - somewhat - very well understood features of mathematical number systems and common operations.
So does having finite integer (nvm floating point) types, in general
And unsigned integer behavior IS a well understood, clearly specified hardware behavior (on any sane compiler - note that this obviously does not blanket apply to c/c++ compilers). With convenient properties. That works well IF you understand and use it correctly.
TL;DR: yes, better compilers, type systems + static analysis are the correct fix here.
No removing unsigned from cases where this makes sense does not.
I am going to quite frankly fully, flatly disagree with Stroustrup, and heck even Sutter etc here.
If you care about this (unsigned AND footguns) just use rust. Which adds all kinds of performance degrading runtime checks in at least debug builds (IIRC) to help prevent programmers from hurting themselves. And yes does use u64 everywhere for indices. Which is correct.
KrazyKirby99999@reddit
Read Subscripts and sizes should be signed by Bjarne Stroustrup
https://open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1428r0.pdf
matthieum@reddit
So, who do we have in the signed size camp:
`gsl::span` had a signed size!). Nah, it's probably all hogwash, not even worth considering...
CrossFloss@reddit
Bringing up someone who always chooses the wrong defaults for his language is not as good an argument as you think...
valarauca14@reddit
This is a rather funny paper
Then fix your compiler. I swear
matthieum@reddit
I do want to mention an alternative: panicking on overflow.
It has... consequences.
It makes a lot of "trivial" compiler optimizations a lot harder to prove correct: overflowing operations are not commutative and associative on the whole range of their inputs, unlike wrapping ones, so the optimizer needs to prove that the input ranges allow for it.
It's incompatible with SIMD, in general, which typically implements wrapping semantics.
So all in all, a whole lot of performance dings left & right.
And of course, it's still all "run-time" detection. So the code passes the test-suite and only fails in production. Erf.
But... honestly, all the alternatives to this problem kinda suck, and panicking may just be the least worse :/
CJKay93@reddit
I think this ignores the third option, which is that you use a non-truncating cast (e.g. Rust's `.into()`) which is only implemented for the types where the signed representation is capable of representing all values in the unsigned representation.
CramNBL@reddit
You never had to use a u64 and also use it to index into a vec (usize)? Or convert to/from f64?
.into() is preferred but let's not pretend the author doesn't have a point that there's a lot of casting between numeric types in Rust code.
AnnoyedVelociraptor@reddit
I prefer to lock down the code in such a way that an as cast where information is lost is not allowed.
So i32 as u32 is not allowed. Use cast_unsigned, and all of a sudden we no longer run the risk that, when someone replaces the u32 with a u16, the code silently keeps working.
There are 2 places where I allow as casts: in const, when you upcast both to a common size (because try_into is not const) to ensure you don't truncate, and the other is to take a pointer to an fn.
All the rest is try_into() and then you handle it explicitly. Even unwrap is preferred over silent eating of bits.
Explicitness is a choice, and your reward for that is a better system.
What I would want is for there to be a conversion for usize to u64 when ptr size is 64 and vice versa.
Iggyhopper@reddit
I like this approach and was going to add:
A type system can solve this problem. To paraphrase some articles: don't validate that the operand or result will not error; ensure it can't error through the type system.
Also, I would consider it an invalid state should signed/unsigned logic result in ub or wrong values. That should be prevented beforehand.
CJKay93@reddit
Yeah, but if I have a `u64` and I need to index then I use `try_into()` and either handle it if I care about failure or `expect()` if I don't. Either way, there's no truncation.
MindlessU@reddit
This, for some reason newer programming languages keep falling into the same pitfalls that older programming languages did, then use them as defense for repeating their flaws like someone is forcing them to do so.
RRumpleTeazzer@reddit
laughs in rust, where u16->u32 is into() and u64->u32 is try_into().
chucker23n@reddit
But isn’t that good? It makes it clear that the latter is dangerous in a way the former isn’t.
RRumpleTeazzer@reddit
yes, "laughs in X" means it is not a problem at all in X.
chucker23n@reddit
Oh, I thought you were saying you felt Rust messed it up.
effarig42@reddit
For the most part, it's the implicit conversions which could change the value. They're insidious. I usually compile with options to catch these, though they're so common that often functions in library headers trigger it.
clhodapp@reddit
I've always thought that the way Scala distinguishes between casts as possibly-widening type assertions (
`:`) and casts as possibly-unsafe conversions (`cast`) made a lot of sense.
Dean_Roddey@reddit
I am very much in the 'model reality' camp. I have no indexable things that have anything before index 0, so I have no need for a signed index. And it hardly ever comes up anyway, at least in my Rust code, because there are so few places where I would do an indexed loop. The most common one is to get a slice of the actually used part of a vector, but that's using a length, not an index.
Ok_Issue_6675@reddit
Nice - reminds me of the good old days. I actually had a job interview where someone asked me to implement strcpy(char*, char*, unsigned int). I was out of focus and thought he wanted to see if I could handle plain C pointers and memory. However, the pitfall of any memcpy/strcpy is when there is an overlap between the "copy from" and "copy to" buffers, as you may run over the buffer you're copying from before reading it into the "copy to" buffer....
sammymammy2@reddit
Just make u63 and u31 instead of u64 and u32
angelicosphosphoros@reddit
Agree.
mascotbeaver104@reddit
Glad the world is starting to see the light of signed by default. It seems the use case for unsigned because it can be >INT_MAX seems different than the use case where unsigned must be >0, and I wonder if there could be any value found in separating the two, i.e. a way of enforcing integer ranges at the semantic type level, rather than actually using unsigned values. Clearly not in C3s wheelhouse but I wonder if any language has done that.
TheSkiGeek@reddit
You can make custom types in C++ that do things like this. A common one that isn’t in the standard library is not-allowed-to-be-null pointers.
Kered13@reddit
References are functionally non-null pointers. Non-null smart pointers are not in the standard library because they are essentially incompatible with non-destructive moves.
ppppppla@reddit
Checked on compiler explorer and all compilers cannot get rid of the second idiv. Of course I feel confident enough not to make a benchmark, because idiv has high latency, but maybe it doesn't matter; you never actually know with modern architectures.
I do the following that just gets turned into a conditional move
https://godbolt.org/z/6z8fehWq8
axilmar@reddit
It is no fault to use unsigned for sizes.
The problem is bounds checking.
A signed number can also be invalid as a size.
Programming languages should not allow indexing without proof that the index is valid.
I.e. an operation of type
where 'y' has not been ensured to be within range should yield one of the following: