Over-engineering 5x Faster Set Intersections in SVE2, AVX-512, & NEON
Posted by ashvar@reddit | programming | View on Reddit | 9 comments
Dragdu@reddit
Have you tried VP2INTERSECT on the new Zens? It is supposed to be down to 1 cycle.
ashvar@reddit (OP)
I’ve heard the rumors and seen some interesting comments on the Zen5 teardown thread on HN, but I’m not sure if any such CPUs are actually available for testing. I use SPR and Zen4 on AWS; those are the newest x86 chips they provide. Do you know a better place to get them?
Dragdu@reddit
Dunno if you can get them in clouds yet, I just know that some of my friends have bought them for personal use and have been drooling over them ever since :v
ashvar@reddit (OP)
Nice! I’ve just asked on Twitter if anyone I know has Zen5. Would be very interesting to compare! I think I can avoid at least 2 jumps if this instruction is indeed so fast. If anyone here is eager to try, would love to experiment together 🤗
camel-cdr-@reddit
Since your benchmark only accumulates the count, have you tried replacing the emulated vp2intersect with something like a simple cmplt, which gives the wrong result but would let you estimate the performance? This shouldn't change branching or memory access behavior.
ashvar@reddit (OP)
The emulation is performed with exactly that kind of comparison, or do you mean something else?
camel-cdr-@reddit
I mean replace vp2intersect with a single comparison, or any other single operation that has similar throughput and latency to vp2intersect on Zen5.
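For anyone following along, here is a rough sketch of the proxy idea in portable scalar C. It is not the code from the post: `BLOCK`, `kernel_all_pairs`, `kernel_proxy_lt`, and `intersect_count` are hypothetical names, and a 4-element block stands in for a 16-lane AVX-512 register. It assumes both inputs are sorted, contain unique values, and have lengths that are multiples of `BLOCK`. The point is only that the outer loop advances based on the block maxima, never on the kernel's result, so swapping the kernel leaves branching and memory access unchanged.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK 4 /* hypothetical block width; a real AVX-512 kernel would use 16 x u32 */

/* A kernel returns a bitmask: bit i is set when a[i] counts as "matched" in b. */
typedef uint32_t (*block_kernel_t)(const uint32_t *a, const uint32_t *b);

/* Scalar model of the emulated intersect: all-pairs equality comparisons. */
static uint32_t kernel_all_pairs(const uint32_t *a, const uint32_t *b) {
    uint32_t mask = 0;
    for (int i = 0; i < BLOCK; ++i)
        for (int j = 0; j < BLOCK; ++j)
            if (a[i] == b[j]) mask |= 1u << i;
    return mask;
}

/* Proxy kernel: one lane-wise less-than. Deliberately wrong as an
   intersection, useful only as a stand-in for a cheap single instruction. */
static uint32_t kernel_proxy_lt(const uint32_t *a, const uint32_t *b) {
    uint32_t mask = 0;
    for (int i = 0; i < BLOCK; ++i)
        if (a[i] < b[i]) mask |= 1u << i;
    return mask;
}

/* Count set bits by clearing the lowest set bit until none remain. */
static unsigned popcount32(uint32_t x) {
    unsigned c = 0;
    while (x) { x &= x - 1; ++c; }
    return c;
}

/* Block-wise walk over two sorted arrays of unique values.
   Which block advances depends only on the data, not on the kernel. */
static size_t intersect_count(const uint32_t *a, size_t na,
                              const uint32_t *b, size_t nb,
                              block_kernel_t kernel) {
    size_t i = 0, j = 0, count = 0;
    while (i + BLOCK <= na && j + BLOCK <= nb) {
        count += popcount32(kernel(a + i, b + j));
        uint32_t a_max = a[i + BLOCK - 1];
        uint32_t b_max = b[j + BLOCK - 1];
        if (a_max <= b_max) i += BLOCK; /* block of a is exhausted */
        if (b_max <= a_max) j += BLOCK; /* block of b is exhausted */
    }
    return count;
}
```

Timing `intersect_count` once with each kernel would give a rough upper bound on what a fast native instruction could buy. The count returned by the proxy is meaningless and should be discarded.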