Branch-Avoidant Quicksort in C - faster than std::sort and pdqsort
Posted by chkas@reddit | programming | View on Reddit | 14 comments
Posted by chkas@reddit | programming | View on Reddit | 14 comments
vip17@reddit
Modern SIMD sort are much faster
chkas@reddit (OP)
These projects are x86-only (AVX/AVX-512) and don't work on ARM or other architectures.
vip17@reddit
The idea is the same, it's easy to port to other architectures including ARM. And there are already implementations for them:
chkas@reddit (OP)
Thanks for the links, but I’m looking for actual, ready-to-use C/C++ code. Most of these projects are just complex wrappers, very theoretical, or full of pseudocode. Do you have any examples that use direct SIMD intrinsics (ARM NEON) for sorting, so I can compare it to my Branchless-Quicksort?
vip17@reddit
Google Highway is a cross-platform vector library that you can already use. And on a mac you can check out https://github.com/davidesantangelo/vsort
chkas@reddit (OP)
I just tested the library: it is indeed very fast for integers, but it uses an internal Radix Sort. Since this is a non-comparison sort, it feels like cheating in a direct performance comparison.
Furthermore, when using
double, the library fails to sort the array correctly and produces an 'ERROR ORDER', which makes it unreliable.vip17@reddit
do you mean vsort? Because Google Highway uses sorting network for its implementation, similar to Intel sort
chkas@reddit (OP)
I used the repository you sent me the link for.
Furthermore, while it performs well for float32, it relies on Apple’s GCD for parallelization, which makes a fair benchmark against my single - threaded Quicksort unfair. I have also developed a multi-threaded version of my Quicksort that is twice as fast for floats (0.24s vs. 0.50s) compared to the vsort version.
PlusLoquat1482@reddit
Nice example of the gap between “same big-O” and “same performance.”
The branch-avoidant partitioning and sorting networks for small partitions are doing the interesting work here. I’d be careful calling it broadly faster than
std::sort/pdqsortwithout more distributions and data types, but for random integers this is a very plausible win.programming-ModTeam@reddit
No content written mostly by an LLM. If you don't want to write it, we don't want to read it.
funny_falcon@reddit
heap_sort is not complete! Only “heapify” part is implemented. Popping one-by-one max element from the heap is completely missed.
I’ve already seen this mistake in other article. Looks like they copy from same broken implementation.
chkas@reddit (OP)
It's not broken:
funny_falcon@reddit
Oh, I see: I misread the code. Both parts (heaping and popping) are bolted into the single loop.
Excuse me for wrong claims.
itix@reddit
I wonder if code duplication could be avoided with Duff's device.