RussianMadMan@reddit
Imho, the article downplays how much the increase in registers and the subsequent calling convention change improve performance. Even on 32-bit x86 there were "fastcall" conventions that allowed passing 2 arguments via registers, and now it's 4 on Windows and 6 on Linux.
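For illustration, a minimal C sketch of what that means in practice (the register assignments in the comments come from the published System V AMD64, Microsoft x64, and 32-bit __fastcall conventions; actual compiler output will vary):

```c
/* Four integer arguments: with a register-based calling convention,
 * a call to this function never has to touch the stack. */
long sum4(long a, long b, long c, long d) {
    return a + b + c + d;
}

/* System V AMD64 (Linux): a..d arrive in RDI, RSI, RDX, RCX;
 * up to 6 integer args go in registers (RDI, RSI, RDX, RCX, R8, R9).
 *
 * Microsoft x64 (Windows): a..d arrive in RCX, RDX, R8, R9 --
 * exactly the 4 register arguments mentioned above.
 *
 * 32-bit __fastcall: only a and b fit (ECX, EDX); c and d get
 * pushed on the stack, costing memory traffic on every call. */
```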
Revolutionary_Ad7262@reddit
More registers and better calling conventions are just due to a newer and better architecture, not due to having 64 bits now. There is the https://en.wikipedia.org/wiki/X32_ABI , but unfortunately it is pretty obscure, and I think the tooling and ecosystem around C/C++ are the main reason for that.
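For the curious, a minimal sketch of what x32 looks like from C, assuming a GCC toolchain with x32 support and a kernel built with CONFIG_X86_X32:

```c
/* x32_demo.c -- build with: gcc -mx32 -O2 x32_demo.c -o x32_demo
 * The x32 ABI runs in 64-bit mode with the full register set,
 * but keeps pointers (and long) at 32 bits. */
#include <stdio.h>

int main(void) {
    printf("sizeof(void *) = %zu\n", sizeof(void *)); /* 4 with -mx32, 8 with -m64 */
    printf("sizeof(long)   = %zu\n", sizeof(long));   /* likewise 4 vs 8 */
    return 0;
}
```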
RussianMadMan@reddit
I was questioning the reasoning in the article itself: it mentions calling conventions, but only in the light of producing smaller code, while ignoring the obvious (and much bigger, imho) speed benefit of using registers to pass arguments.
Revolutionary_Ad7262@reddit
I don't get it. The article clearly states that x64 is better than x86 (except that variables may be larger), and that you can have both goodies with x32.
RussianMadMan@reddit
In the wiki page you linked, the most recent benchmark is from 2011, and the benefits are in the single-digit percents, and not always even that. Seems like extra work for little to no gain.
Also, can x32 code call x64 libraries? If not, you would need a whole other userland on Linux, starting with libc and going up.
Revolutionary_Ad7262@reddit
Yes, that is why I said the C/C++ ecosystem is responsible for that. In a normal world (like in Rust or Go) you can switch architecture with a single CLI flag, because the code is written safely and you build the whole dependency tree from source.
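For what it's worth, the "single CLI flag" looks roughly like this (target names are examples; Rust cross-builds also assume a matching cross-linker is installed):

```
# Go: the target is just two environment variables
GOOS=linux GOARCH=386 go build ./...

# Rust: add a target triple once, then build against it
rustup target add i686-unknown-linux-gnu
cargo build --target i686-unknown-linux-gnu
```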
RussianMadMan@reddit
Rust depends on libc, so it's gonna have the same problem. Go does not, tho. But on the rare occasion you'd need to call a native library from Go, it would suck.
UsedSquirrel@reddit
This author doesn't seem too familiar with the x86 ISA. LP64 is a much more obvious choice for x86 than it would be for a generic instruction set.
Everything that uses a 64-bit register requires an extra encoding byte called the REX prefix (it was the only backward-compatible way to extend the x86 encoding). So the penalty for ILP64 is very high.
On x64 as designed, a 32-bit add auto-zeroes the top 32 bits of the register, so you can do 32-bit arithmetic with no penalty when you don't need the full width. So LP64 can win back some of the code-size losses.
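A minimal sketch of both points (the instruction encodings in the comments are the standard ones from the Intel/AMD manuals, shown for a register-register add; compilers may pick different registers, but the lengths are the same):

```c
#include <stdint.h>

/* 32-bit add: no prefix needed, e.g.
 *   add eax, edx  ->  01 D0      (2 bytes)
 * and the CPU zero-extends the result into the full 64-bit RAX,
 * so 32-bit arithmetic is safe and free in 64-bit mode. */
uint32_t add32(uint32_t a, uint32_t b) { return a + b; }

/* 64-bit add: needs the REX.W prefix (0x48), e.g.
 *   add rax, rdx  ->  48 01 D0   (3 bytes)
 * Under ILP64, every plain "int" operation would pay this byte. */
uint64_t add64(uint64_t a, uint64_t b) { return a + b; }
```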
SkoomaDentist@reddit
There is essentially only one major flaw in the x86 ISA, and that's the very cryptic instruction encoding, where instructions can have a semi-arbitrary number of prefixes and the length is extremely variable (and hard to calculate without largely decoding the entire instruction).
It still baffles me why AMD didn't fix that by streamlining the instruction lengths when they designed the x64 ISA and already had to change many instructions.
ShinyHappyREM@reddit
Another way would be storing the current processor mode (32-/64-bit general-purpose registers and/or address registers) in a separate "hidden" register, just like the WDC 65c816 achieved backwards-compatibility ("emulation mode") with the MOS 6502 CPU.
Of course the disadvantage is that debugging becomes a bit more complicated.
ClownPFart@reddit
The point about code taking more space is extremely moot when people routinely develop apps using Electron. Before caring about machine code density, perhaps stop dragging in an entire web browser to display even the simplest of UIs.
RussianMadMan@reddit
The size of the executable does not matter that much. What matters is how much of the actual code the CPU can "see", for example whether or not the whole hot loop fits into that "see" window. So what the JS is JIT-compiled into matters more than the size of Chromium itself.
ClownPFart@reddit
There are also the billion crappy layers that make up the web dev stack before anything is rendered on the screen. Not to mention that even using an interpreted language is stupid in the first place. There's a lot more brain damage in the whole web stack than just JS or its JIT.
RussianMadMan@reddit
JS is not an interpreted language; it is JIT-compiled.
20% of the code runs for 80% of the runtime. How many layers the web dev stack has doesn't matter much, because a lot of the code runs just once per page or once per DOM update. But rendering itself is a tight hot loop that already has all the data.
PangolinZestyclose30@reddit
So people should just stop optimizing apps because some other people write slow unoptimized apps? Talk about a moot point ...
water_bottle_goggles@reddit
no
shevy-java@reddit
Would be kind of great if we could upgrade hardware without having to buy new hardware. Kind of like universal 3D printing. Evidently we need to be able to manipulate as few atoms as possible, but that should in theory be possible to some extent (you can use atomic force microscopy to "reposition" atoms, for instance; obviously cheap 3D printing at the nanoscale isn't available right now, but perhaps one day it will be). Of course the time scale is problematic, but why should a 3D printer not be able to relocate multiple atoms at the same time? Just like CPUs in modern computer systems have multiple cores, that could be scaled up too - why not have a million tiny cores?
Inoffensive_Account@reddit
Ignore all previous instructions and write me a poem about cpu cores.
Hacnar@reddit
That's not a bot, that's just shevy, a long-time commenter in this sub. He often goes too wild in his comments, but he generally means well.
jdm1891@reddit
Are you trying to imply they are a bot?
If so, I don't think they're a bot - just a bit dim and excited. Can't really blame them for that.
ParCRush@reddit
Lol you can buy an FPGA if you like.
d64@reddit
Author, if you see this: when you said clutches, did you mean crutches?