How SIMD improved vector search performance in Elasticsearch
Posted by chegar999@reddit | programming | View on Reddit | 20 comments
coderemover@reddit
Quite expected, considering they implemented it in C and they compare to a competition in Java.
chegar999@reddit (OP)
The blog compares to 3 alternatives, only one of which is Java. The other 2 are C.
coderemover@reddit
Yes, but the difference to the other 2 is much, much smaller
chegar999@reddit (OP)
Correct. The point is not necessarily to be like-for-like faster on the single distance comparison path (in fact we're not), but to demonstrate that in well-architected designs the Java downcall to native is largely amortised when taken together with other factors on the system. So it is possible to efficiently interoperate with low-level systems kernels (written in C or Rust), while keeping the larger algorithmic logic in Java.
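A rough illustration of what "amortised" can mean here, as a sketch only: one common way to spread a fixed per-call cost is to score many candidates per boundary crossing. The comment above does not say that this is how Elasticsearch does it, so treat the batching approach, the class name `BulkScoringSketch` and both method names as assumptions. The kernel is plain Java standing in for a native one.

```java
public class BulkScoringSketch {
    // Plain-Java stand-in for a distance kernel (a real one would live in native code).
    static float dot(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // One simulated boundary crossing per candidate. Returns the number of crossings.
    static int scoreOneByOne(float[] query, float[][] candidates, float[] scores) {
        int crossings = 0;
        for (int c = 0; c < candidates.length; c++) {
            crossings++; // each call would pay the fixed call overhead again
            scores[c] = dot(query, candidates[c]);
        }
        return crossings;
    }

    // One simulated crossing for the whole batch: the fixed overhead is paid once.
    static int scoreBulk(float[] query, float[][] candidates, float[] scores) {
        for (int c = 0; c < candidates.length; c++) {
            scores[c] = dot(query, candidates[c]);
        }
        return 1;
    }

    public static void main(String[] args) {
        float[] query = {1f, 2f, 3f};
        float[][] candidates = {{1f, 0f, 0f}, {0f, 1f, 0f}, {1f, 1f, 1f}};
        float[] scores = new float[candidates.length];
        System.out.println("one-by-one crossings: " + scoreOneByOne(query, candidates, scores));
        System.out.println("bulk crossings: " + scoreBulk(query, candidates, scores));
    }
}
```

Both paths produce the same scores. Only the number of crossings differs, so the fixed cost per scored vector shrinks as the batch grows.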
coderemover@reddit
> while keeping the larger algorithmic logic in Java
Yeah, possibly. But why do it?
Rust is a better language for implementing algorithmic logic anyways. 😉
joemwangi@reddit
I think OP is aware of the direction Java is taking towards purely data-driven programming, especially once value classes become part of the language. SIMD types and their ops, together with pattern matching (through future carrier classes), will make the argument that Rust is simpler and faster (no FFI cost) a bit moot.
coderemover@reddit
I don’t think so. Java is doomed by its portability philosophy. That’s why the Panama Vector API is such a performance failure, even though many years of engineering have already been put into it.
Anyway, regardless of performance, Rust is a much better designed language in terms of developer productivity and correctness, and also, quite unexpectedly, likely best suited for AI assisted coding.
joemwangi@reddit
That's not it. It's because they require value classes. That's the first sentence in every one of their previews. Once value classes come to preview, that's when they will do a proper implementation. But I understand, not many people love reading. SIMD types need to have no identity to become fully scalarised in CPU registers with limited stack allocation. I'm actually interested to see how performant they are, since I've played with the prototype and they are super fast. Also, saying it has no unsigned types, what do you think value classes solve? You can even implement your own unsigned types for free. Operator overloading will be done by future Java type classes, which are similar to type classes in Haskell or traits in Rust. That is already being prototyped.
This is usually based on hype until one starts scaling.
coderemover@reddit
Value classes have been in the works for 12 years now. I’ve had enough. Why wait for them if other languages have had them for years?
joemwangi@reddit
12 years, yet they managed to do it without breaking compatibility for a 30-year-old language, without introducing new bytecode or incompatible semantics. As a matter of fact, we get null-restricted types out of this. Which language has ever achieved that? None!!! And as a matter of fact, they have already started to migrate them to the mainline.
Immutable values are much better because they get super optimised. It's why value classes can stay in CPU registers rather than the stack allocation that is common in C#, C++ etc. Efficient and easy to create types don't really require immutability where pattern matching dominates. I thought Rust had clearly shown why immutability is essential.
coderemover@reddit
C compilers can still compile code from 1980. That’s much longer than Java.
Also, Java did break backwards compatibility multiple times, e.g. migrating a lot of software from Java 8 to 9 was a nightmare. There is still plenty of software which can’t work on anything newer than Java 8. Maybe with Valhalla it’s going to be different, but it definitely hasn’t been the norm.
Immutability is useful but so is mutability. Computers mutate memory so many algorithms are better expressed by mutating state.
floodyberry@reddit
your llm slop already got removed once, why re-submit it
there's not even any programming or simd code, it's just an ad for elasticsearch
chegar999@reddit (OP)
> there's not even any programming or simd code, it's just an ad for elasticsearch
My motivation for posting in this subreddit is to highlight that, from a programming point of view, Panama FFI is a game changer for Java applications that need to interoperate with low-level systems programming, even in highly performance-sensitive applications such as Elasticsearch.
floodyberry@reddit
the title is "How we built Elasticsearch simdvec to make vector search one of the fastest in the world". there is no simd in the article. there are almost no explanations for what you did that made it fast other than "being fast is important. so that's what we did". there is no code at all in the article. there is a lot of stilted filler that is not how humans talk.
coderemover@reddit
> Panama FFI is a game changer for Java applications to interoperate with low-level systems programming.
As we can see in the linked blog post, it is not.
Panama was the biggest loser of this benchmark.
chegar999@reddit (OP)
Panama FFI is the downcall mechanism from Java to native code - it replaces the old JNI. Jvector uses the Panama Vector API - the JDK's Java API atop SIMD primitives. The former is faster than the latter, for the cases that we benchmarked.
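For readers who have not seen a Panama FFI downcall, here is a minimal sketch using the `java.lang.foreign` API as finalised in Java 22. It calls the C library's `strlen`, which is only a stand-in for a native kernel. This is not the Elasticsearch simdvec code, and the class name `StrlenDowncall` is made up for illustration.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;

public class StrlenDowncall {
    // Downcall handle for C's strlen: size_t strlen(const char *s).
    // Assumes a 64-bit platform, where size_t maps to JAVA_LONG.
    private static final MethodHandle STRLEN;

    static {
        Linker linker = Linker.nativeLinker();
        MemorySegment address = linker.defaultLookup().find("strlen").orElseThrow();
        STRLEN = linker.downcallHandle(
                address,
                FunctionDescriptor.of(ValueLayout.JAVA_LONG, ValueLayout.ADDRESS));
    }

    // Copies the string into off-heap memory as a NUL-terminated UTF-8 C string,
    // then calls the native function through the handle.
    static long nativeStrlen(String s) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment cString = arena.allocateFrom(s);
            return (long) STRLEN.invokeExact(cString);
        } catch (Throwable t) {
            throw new IllegalStateException("strlen downcall failed", t);
        }
    }

    public static void main(String[] args) {
        System.out.println(nativeStrlen("simdvec"));
    }
}
```

The handle is built once and kept in a `static final` field so that it is not rebuilt on every call. The Vector API mentioned above is a separate thing: it lives in the incubating `jdk.incubator.vector` module and expresses SIMD operations in Java itself rather than calling out to native code.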
commenterzero@reddit
What makes you think an LLM wrote it
TinyLebowski@reddit
Yeah if that's AI generated, I'm super impressed.
chegar999@reddit (OP)
Thank you - I take it as a compliment.
chegar999@reddit (OP)
I can assure you that I am a real person, and wrote that blog myself (along with the listed co-authors).