The gold standard of optimization: A look under the hood of RollerCoaster Tycoon
Posted by fagnerbrack@reddit | programming | View on Reddit | 53 comments
happyscrappy@reddit
There's no way to know if the code actually replaced multiplies with bitshifts as even an assembler is very likely to optimize the instructions emitted for multiplies in this way. Trying to make all your strides powers of two is a conscious optimization though and was incredibly common before the 32-bit era. On 8-bit systems it was essential, on 16-bit more like very helpful.
There is another example of a dedication to assembly in the 32-bit, more forgotten now. That is WriteNow, which was a word processor for the Macintosh (68K). It was written all in assembly. Every bit. The company had to make special efforts to find assembly programmers to maintain it as the original programmers ceased to work on it. They used to have tee shirts with assembly language programming problems as recruiting tools.
The word processor was ported to NeXTStep also, as NeXT started out as 68K. But after that it never went anywhere else because it was all in assembly and 68K just petered out. It couldn't be ported to other platforms without more effort than it was worth.
https://en.wikipedia.org/wiki/WriteNow
chasetheusername@reddit
Do you have a source for that? I'd consider an assembler broken if it did that, also never saw an optimization level or similar that could be configured in any assembler.
happyscrappy@reddit
You can go try out ARM's assemblers for their 32 bit platforms. It changes instructions all the time. As long as all the effects are the same. Some types of instructions don't even exist on ARM, they are pseudo-ops for other things. Their assembler really got moving with Thumb (their compressed instruction set) because there are more compressed versions of some instructions than others and some duplicate encodings are removed.
They are only carrying on what other architectures and assemblers have done over time. 68K assemblers would change ADD with an A register as a destination to ADDA because ADD to an A register was illegal. Also write ADD #6, D0 and it would be assembled to ADDQ #6, D0 because it was a smaller encoding.
And if you think about it you can work out why it is like this. Think, why wouldn't you just write ADDQ instead of ADD in the first place? That's totally find if you are writing literally ADDQ #8, D0. But assemblers are symbolic. What if you write ADDQ #(DVAL/4), D0? It assembles the first time because DVAL is 32. But what happens when DVAL becomes 36? Now DVAL/8 is 9 and you can't ADDQ 9 to something. So you always write ADD #(DVAL/4), D0 and every time it fits in an ADDQ the assembler generates an ADDQ and when it doesn't it becomes an ADD. You are still free to write ADDQ.
RISC-V goes even further, it would make ADD #8, D0 (note, such an instruction does not exist, it's an example) an illegal encoding since there is already an encoding for this operation. So without the optimizing assembler writing assembler would be much more difficult.
On MIPS the default assembler mode (IIRC) was that you wrote your code in order and then the assembler would rearrange to put instructions in the delay slot (MIPS would run the next instruction after a branch regardless of whether the branch was taken or not).
You can look up RISC-Vs pseudoinstruction lla if you want. Or la, which turns into lla or lga, neither of which are instructions either. Better yet, look up their far branches transformation. It shows the same as ADD/ADDQ for 68K. If the branch fits in a single instruction it emits a single instruction. Otherwise it reverses the branch and emits 2 instructions. Their example:
https://zju-os.github.io/doc/spec/riscv-asm.pdf
See page 42 for more transformations.
For 68K, ARM and RISC-V I don't know there are any settings for optimization for the assembler. As I mentioned I think the MIPS assembler did have a setting to turn off the delay slot automatic rearrangement.
chasetheusername@reddit
but we're talking about x86 assemblers in the late 90's
jwp1987@reddit
I'm assuming they meant compiler. Assembly translates almost directly to machine code, so I don't see how it would switch out instructions.
chasetheusername@reddit
Sure, C compilers back then did that, but roller coaster tycoon was largely written in assembler, so the usage of bit shift instructions was likely intended in the original source, and the consequently the first sentence doesn't make sense then.
itix@reddit
68k assembly was very nicea and it was the first and last assembly language I learnt properly.
jwp1987@reddit
There are quite a few other optimisations that x86 compilers can do that aren't mentioned:
XORing a register with itself sets it to zero and is faster than loading a zero value.
LEA is often used for certain multiples that aren't a power of two.
Adding combinations of LEA and bit shifts can make arbitrary integer multiples.
You can also make multiples that aren't a power of two with repeated additions.
This is basically multiplying by a fraction that approximates the one you want with one that has a denominator of a power of two.
However, improving rendering speed and algorithms that often operate over large amounts of data are usually bigger performance increases than writing in assembly.
For the developers that are still relatively new: PROFILE YOUR CODE FIRST BEFORE MICRO OPTIMISING.
BusEquivalent9605@reddit
played it as a kid. learned how this was written a few years ago. still blows my mind
chhuang@reddit
Maybe we should do this again, overly optimized software that should require only some bits and electricity. I have been playing some old games and just realized how good we had it
Gogo202@reddit
People say this shit, but none of you would be willing to pay triple for the development cost. Indie games are already almost never profitable. Small studios are barely profitable and larger ones will never give a shit about optimisations, because the next game in the series must be released in the following year
BuzLightbeerOfBarCmd@reddit
[laughs in Rockstar]
Many big studios are not releasing a game a year. More like 1-2 per decade.
ChocomelP@reddit
Triple is very generous for writing in assembly versus whatever Unity or Unreal Engine use.
Ameisen@reddit
And there's no reason to think that that assembly would be better than the result of C++ compilation.
wonkytalky@reddit
A compiler's work can result in something incredibly efficient, especially if you're aware of how it'll treat certain design patterns.
SyntheticDuckFlavour@reddit
Look, I'm not asking software devs to write code close to the metal. All I'm asking is use good tools that already do this for you, like building stuff natively, rather than leaning on bloatware libraries.
WaitForItTheMongols@reddit
Factorio. Extremely well optimized (with a purpose made engine) and a policy of treating every bug as an actual problem worth solving. Perfect example.
UnrealHallucinator@reddit
Do it. Be the change you want to see! Could start by writing ciphers. PRESENT and Katan32 should take no more than like 100 lines in the language of your choice. Most programmers don't even know assembly.
Glynny69@reddit
Same here. Learning Chris Sawyer wrote that whole thing in assembly as a solo dev is just unreal. The fact that it still holds up today says everything.
janisozaur@reddit
Sigh, another repost. It was last posted about month ago https://www.reddit.com/r/programming/s/vrP9xlgUaS
And it's still full of inaccuracies, half-truths, missing citations.
I'm openrct2 developer, we have rewritten entire RCT from assembly into C and then C++. The article sources its "findings" on our code, but nowhere mentions that. We have made some changes to the original to improve its performance.
No amount of assembly saves you from writing bubble sorts or not caching linked list nodes.
Assembly is not some magic performance dust. You can still use it to write bad code.
Ameisen@reddit
I'd argue that most people who write large amounts of assembly do.
pelirodri@reddit
What do you do?
cgaWolf@reddit
They write bad assembly, didn't you read their post?
BuzLightbeerOfBarCmd@reddit
That's only some of what they do. Perhaps they sometimes write bad C or Python as well.
kagato87@reddit
The whole point of the higher level languages is for the better code.
Imagine trying to do some of the gymnastics class inheritance offers by design, in assembly. Heck even in C it'd be a giant pain.
yellow_leadbetter@reddit
It would be great to see a more accurate post like this from an openrct2 developer like yourself, if you had time
Niwrats@reddit
a rather basic writeup. it's not like design and code are entirely separate in modern games either, there is feedback.
if you look into PS1 games, you often see more than just bit shifts; shifts mixed with additions or substractions are common, and can get you eg *5 without the multiplication.
empire314@reddit
why wouldn't a cpu be optimized to do basic arithmetic efficiently?
aanzeijar@reddit
Yes, every standard compiler will do every trick you can think of as an assembly coder and then some. The notion that an assembly coder will write so much faster stuff is 99% of the time complete horseshit.
And for that one situation where you can make a low level improvement over C++, you can still have inline assembly in C++.
happyscrappy@reddit
They were. By 1990 it was pretty common for compilers to optimize this stuff. And the common for them to optimize fixed integer divisors into reciprocal multiplies with shifts too. You see this technique get highlighted in "one weird programming trick" youtube videos from time to time.
Really though, processors started to have fast multipliers (4 or 8 cycles instead of 32) around the same time compilers got better at optimizations. So many of the optimizations become pessimizations.
LowerEntropy@reddit
CPU/compiler design was changing rapidly in the late 90s. Chris Sawyer obviously loved and was very proficient in assembly, but most people abandoned writing everything in asembly at the same time.
Not sure about this, but I would also guess Railroad Tycoon was based on code from Transport Tycoon, which came out in 1994. On the Pentium integer multiplication was 10x slower than bitshifts or addition/subtraction, but that already changed in 1997 when the Pentium II came out, where multiplications became 1 cycle.
Compilers also caught up and became very good at figuring out how to order instructions or when to prefer bitshifts over multiplications.
FyreWulff@reddit
Yeah, people kinda forget that in the 90s your computer was obsolete within 2 years or so. There wasn't milking your hardware for a decade like you can do these days.
Dyolf_Knip@reddit
Yeah, but Mel still won't use them.
awj@reddit
Mel’s principals are so strong even his software refuses to cheat.
flumphit@reddit
My respect for Mel remains boundless.
Niwrats@reddit
i'm talking about machine code here, so possibly compiled from a higher level language already. i don't know the story behind the cpu though.
Comfortable_Relief62@reddit
Generic multiplication is extremely expensive and also isn’t a native operation on older CPU designs
AyrA_ch@reddit
RISC CPUs usually lack the more complex mathematical operations, and compilers back then weren't as good at optimizing some routines as they are now. Even shaving off just a few CPU cycles can result in big FPS gains if the function is called often enough within the render pipeline.
Dragon_yum@reddit
I love reading about old game hacks on how they made them. Blows my mind how smart some people are.
BizarroMax@reddit
Bit shifting has been a standard C compiler optimization for … a long time.
yellow_leadbetter@reddit
This article seems like dogshit.
Ameisen@reddit
Yeah.
Bullshit. If the integer is
unsigned, and you multiply or divide by a power-of-two, it will absolutely convert it to a shift.If it's
signed, then it won't... because that is potentially incorrect.xanhast@reddit
using a re-implementation to draw conclusions from the original is dumb af. this reads like an ai essay, well written and confidently wrong.
MeBadNeedMoneyNow@reddit
This is just a tldr of a video that I saw the other day
mughinn@reddit
The video explicitly uses the blog post as a source lol
MeBadNeedMoneyNow@reddit
When's it my turn to cash in on these five low-level quirks of a 30 year old game?
QuarterCarat@reddit
No one’s getting rich off these.
MeBadNeedMoneyNow@reddit
Someone's getting ad money and it's not me.
lemondragon@reddit
Looks like the blog was there earlier than the video, going by the date in the url.
Rellek_@reddit
I can't wait to dig into this, but I did a quick ctrl+f and I kind of expected to see a mention of Transport Tycoon Deluxe, or OpenTTD. I'd be curious to see what took original shape in that game, and what was improved along the way. As a youngin' I was obsessed with TTD. Then when RCT came along, the similarity made my head spin in a way that only happens when you're that young and impressionable. Sawyer was my hero.
floodyberry@reddit
the article is unfortunately very basic and, if i had to guess, just the author browsing the openrct2 source and looking for a few things that stand out to write about
Rellek_@reddit
Totally fair! Was meant to be an observation rather than a criticism. I enjoyed the read!
Krillo90@reddit
This is such a nice intuitive way to explain bit shifting.