Too much Discussion of the XOR swap trick

[-]

HighRelevancy@reddit

The compiler has seen straight through both swaps

Here's the secret that the government doesn't want you to know (/s): the compiler will almost always see through anything clever you're doing and generate the same code anyway. Provided that your high level algorithm decisions are correct, programming in the plainest and most obvious terms usually generates the ideal code and has the additional benefit that other people (and AI tools if you're into that) can read your code even before their morning coffee. Programming "tricks" are almost always bad.

[-]

christian_regin@reddit

> the compiler will almost always see through anything clever you're doing and generate the same code anyway

This is only true for very simple stuff. Of course, the compiler is getting smarter, but it veeeeeery far from perfect.

[-]

HighRelevancy@reddit

"Simple" is a very vague definition
No, not really. You're probably still wrong for any reasonable definition of simple. Compilers are capable of large scale optimisations in many cases. But they're also limited in that optimisations must still do the same observable effects your code said. They won't refactor a bad design for you. See: "Provided that your high level algorithm decisions are correct".
I did say "of this scale".

Sometimes this can feel untrue. C++ looooves a good footgun in regard to point 2, it's not always obvious what can have "observable effects". But I promise you this holds true the majority of the time.

If you have specific counterexamples I'd love to talk about it.

[-]

christian_regin@reddit

One optimization that used to not be done (and still isn't on msvc) is

bool foo(bool a, bool b) {  
   return a && b;
}

branches instead of being optimized to a & b

[-]

HighRelevancy@reddit

Those aren't strictly equivalent operations. Granted a bool is supposed to be represented as a 0 or 1 but in practice sometimes that doesn't work out. Might be something uninitialised, or a deserialisation that's too trusting, for example. Not optimising that gives you more predictable outcomes in the face of UB, specifically that if A is truty and B is truthy, then A && B will be truthy (whereas A & B might not be).

[-]

foxsimile@reddit

ಠ_ಠ

But muh bithacks

[-]

trmetroidmaniac@reddit

I was expecting this post to touch on the XOR linked list trick. That can actually be useful, no?

[-]

imachug@reddit

As far as I'm aware, XOR lists are quite slow because the CPU cannot automatically prefetch the data before dereference early enough, so you encounter memory access stalls more often. And in a scenario where you need to conserve memory, you'd probably use something different from a linked list anyway.

[-]

juhotuho10@reddit

All linked lists have massive performance overhead by the very nature of not begin in contiguous memory with predictable access patterns

In my opinion they should never be used over simple arrays

[-]

imachug@reddit

Eeh, I don't think there are any absolutes in technology.

Linked lists require only fixed-size allocations and thus behave more predictably when allocating data, compared to what happens with arrays -- is there still reserved space? is the space after the array in memory free or do you need to memcpy data? does your allocator still have memory or does it need to call mmap? what about the allocator in the kernel?

[-]

slaymaker1907@reddit

Linked lists also have the advantage that you can embed them directly in the data type if you need to maintain a global list of the objects. Besides performance, it’s also nice since it guarantees that if the object gets allocated, you will be able to place it in the global list even if you go OOM.

[-]

Ameisen@reddit

A pointer to an element in a list isn't invalidated by insertion.

A pointer to an element in an array/vector absolutely can be invalidated by insertion.

[-]

rysto32@reddit

Not really, no. It’s been a long time since saving one pointer from a linked list node is a significant amount of memory.

[-]

Zahand@reddit

The importance of optimization depends largely on the level at which you program and the resources available to you. I'm hypocritical, since I often fall into the same habits, but I am not a fan of how we frequently ignore compute and storage requirements. Knowing about and understanding these low level optimizations is a valuable skill imo.

This 'resource-blind' mindset has led (anecdotally) to a noticeable decline in software quality. We see it in mobile apps that hog system memory for simple tasks, websites that have ballooned in size due to unoptimized assets and redundant frameworks, and modern games that require massive storage footprints while still struggling with performance. By treating hardware as an infinite resource, we've inadvertently traded sleek, performant software for bloated applications that frustrate the end-user

[-]

simonask_@reddit

It’s valuable to know about these things, but it’s also very easy to miss the forest for the trees.

Linked lists are almost never the correct tool for the job, and when they are, 8 bytes per node is completely irrelevant. If it’s not, your lists are either too long (killing performance that way), or you have too many of them (in which case you can save another 8 bytes by using a flat array). Or both.

[-]

moefh@reddit

You're thinking only about high-end systems like x64 and ARM64.

I know it's not discussed too much here, but a lot of us work with MCUs where memory is measured in kilobytes instead of gigabytes, and the CPUs don't need data caches because the main memory is SRAM, which has very low latency.

In these systems, using linked lists is way more viable because there are no cache misses to kill performance. And since there's so little RAM, saving a few bytes per node is very relevant.

[-]

simonask_@reddit

I’m willing to bet that 95% of uses of linked lists are because they are the easiest data structure to use in C, and that’s the main reason MCU programming uses them at all, despite the fact that they are objectively worse in almost every way. If you were using a better language, I don’t think you’d be spending that extra pointer per entry on that.

[-]

moefh@reddit

"Objectively worse in every way" is meaningless when you don't know the problem domain. Power consumption and code size are actual things we have to deal with, not (just) memory use.

You can easily use an array instead of a linked list, and then the product's battery life is cut in half because of all the memory copying the code is doing all the time.

Or you can easily use a fancier data structure that doesn't need to copy memory around, and then the code doesn't fit in your 2KB ROM anymore.

It's really easy to think people are lazy and stupid when you haven't tried to solve the problems they're solving.

[-]

simonask_@reddit

I don’t know why you’ve chosen this particular hill to die on, but you have every right to use linked lists as much as you want. Nothing about arrays mean that you have to do any copying, though.

[-]

moefh@reddit

What a weird comment.

I'm just showing how different constraints lead to different solutions. I guess it's hard to be convinced when you've never been hit with a real hardware limitation.

[-]

n7tr34@reddit

Yeah 100%. The vast majority of deployed computers out there are embedded MCUs with minimal resources. Far more common than general purpose CPUs.

You can buy new up-to-date designs equipped with single-digit kilobytes of SRAM. These are not relics of ages past :)

[-]

PancAshAsh@reddit

With modern CPUs and compilers that isn't necessarily even an optimization, is the problem.

[-]

hardware2win@reddit

It is still better to have slow sw than no sw

[-]

Kered13@reddit

And the use case is being and to iterate forwards or backwards, but having no need to stay iterating from the middle of the list, is very unusual. In fact linked lists in general are much slower than arrays, unless you need jump straight into the middle of the lost and perform insertions and deletions.

[-]

sweetno@reddit

It can, but how often it would?

[-]

somebodddy@reddit

No need for XOR just to swap two variables. Just do this:

write(open("swapfile", O_WRONLY | O_CREAT | O_SYNC), &a, sizeof(a));
a = b;
read(open("swapfile", O_RDONLY), &b, sizeof(b));

https://onlinegdb.com/CQCUUOMiR

[-]

minecon1776@reddit

Use a VHF antenna and point it at the moon. Send a's value in binary, then do a = b. Then when the signal returns 2 seconds later, read it into b.

[-]

somebodddy@reddit

How would you do that without additional variables?

[-]

DogsAreAnimals@reddit

Nice, but this won't work well for distributed systems. That's why I use dns txt records for the tmp space.

[-]

TTachyon@reddit

So if you have a spare register, there is no reason to use XOR swap — three moves and three XORs are the same cost, and moves are clearer. The only situation where XOR swap wins is when you are genuinely out of spare registers and need to avoid spilling a value to memory.

I'm not convinced that xor would win here. Using xors here adds a data dependency between the used registers. Reading/writing to memory doesn't, and things can be executed in parallel. It might be easier to write, sure, but from a perf point of view, not having the dependency is usually faster.

[-]

andynzor@reddit

It has to faithfully emit the XOR operations, reloading from memory after each store, because each write through one pointer might be changing the value that the other pointer sees.

If you insist on using C where pointer aliasing gets on your way, I suggest you use Jens Gustedt's type-generic swap macro. There is no behavior hidden behind an API, so the compiler can see and happily optimize it away - no XOR tricks needed. I have manually verified the assembly output and the two registers just swap places in the emitted code in simple cases. Zero-copy.

[1] https://gustedt.wordpress.com/2010/10/23/a-generic-swap-implementation/

[-]

TexZK@reddit

I remember it as a ^= b ^= a ^= b. Anyway, this was obsolete, if not outright slower than a trivial register swap, already 25+ years ago.