Built a High-Performance Key-Value Datastore in Pure Java
Posted by theuntamed000@reddit | programming | 11 comments
Hello everyone, I am excited to share a small milestone: a project I have been working on in my free time on weekends for the past 2 years.
DataStore4J is a key-value datastore written entirely in Java and inspired by Google's LevelDB. It's still under development.
I’ve published some benchmark results. The performance is on par with LevelDB, and for comparison I also included Facebook's RocksDB (which is a different beast altogether).
I’ve also written some documentation on the internals of the DB.
The aim was to get it to a performance level comparable with LevelDB.
I learned a lot from this project, from database internals to Java's concurrency to using JMH for benchmarks and Jimfs for testing.
I’m the sole developer on this, so I’m sure I’ve misused Java in places, missed edge cases, or even missed obvious bugs. I'd love to hear any feedback and issues from those who've tried it out.
Thank you all.
noswag15@reddit
I was looking for something similar to this, but with support for streams instead of byte arrays. I checked RocksDB, but it seems to expect both the key and the value to be byte[]. From the README of this project, this library seems similar. Does stream support exist, or is it planned for the future?
A library like this could be very useful as a temporary storage/cache for large files and blobs (potentially downloaded from external sources) but if they first have to be eagerly read into memory as byte[] before being stored in the cache, it may not work well.
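One possible workaround with a byte[]-only API, as a rough sketch: split the stream into fixed-size chunks and store each chunk under a derived key, so only one chunk is held in memory at a time. `ByteStore` and `MapStore` below are hypothetical stand-ins, not the API of DataStore4J or RocksDB, and the `id#index` key layout is only an assumption for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

final class ChunkedValues {
    /** Hypothetical byte[]-only store; stands in for any KV library. */
    interface ByteStore {
        void put(byte[] key, byte[] value);
        byte[] get(byte[] key);
    }

    /** In-memory stand-in used only to make the sketch runnable. */
    static final class MapStore implements ByteStore {
        private final Map<String, byte[]> map = new HashMap<>();
        @Override public void put(byte[] key, byte[] value) {
            map.put(new String(key, StandardCharsets.UTF_8), value);
        }
        @Override public byte[] get(byte[] key) {
            return map.get(new String(key, StandardCharsets.UTF_8));
        }
    }

    private static byte[] key(String id, String suffix) {
        return (id + "#" + suffix).getBytes(StandardCharsets.UTF_8);
    }

    /** Stores the stream as chunks of at most chunkSize bytes; returns the chunk count. */
    static int putStream(ByteStore store, String id, InputStream in, int chunkSize)
            throws IOException {
        byte[] buf = new byte[chunkSize];
        int count = 0;
        int n;
        // readNBytes (Java 9+) fills the buffer unless the stream ends first.
        while ((n = in.readNBytes(buf, 0, chunkSize)) > 0) {
            store.put(key(id, Integer.toString(count++)), Arrays.copyOf(buf, n));
        }
        // The count is written last, so a missing count marks an incomplete upload.
        store.put(key(id, "count"), Integer.toString(count).getBytes(StandardCharsets.UTF_8));
        return count;
    }

    /** Copies all chunks to out in order; returns false if the value is absent or incomplete. */
    static boolean copyTo(ByteStore store, String id, OutputStream out) throws IOException {
        byte[] countBytes = store.get(key(id, "count"));
        if (countBytes == null) return false;
        int count = Integer.parseInt(new String(countBytes, StandardCharsets.UTF_8));
        for (int i = 0; i < count; i++) {
            out.write(store.get(key(id, Integer.toString(i))));
        }
        return true;
    }
}
```

With this layout a 10-byte stream and a chunk size of 4 ends up as three chunk entries plus one count entry. Chunks left behind by an interrupted upload would still need cleanup, which this sketch does not attempt.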
Familiar-Level-261@reddit
Have you heard of file systems? Those can stream AND are KV stores!
noswag15@reddit
I'm not sure what you're implying here. Of course I know of filesystems. What I'm looking for is a way to store files downloaded from external systems and have them indexed by some id. Think of user profile images, for example. The advantage of putting them in a key-value store like this is that values can be memory mapped, so if memory is available, reads will be as fast as reading from memory, and the system takes care of swapping content. Furthermore, if the key-value store supports size- or TTL-based eviction, I don't have to worry about cleaning up files. Essentially, what I'm looking for is an LRU cache that serves content as fast as possible when memory is available and otherwise falls back to disk, handling swapping and eviction of both keys and value chunks.
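To make the eviction part concrete, here is a minimal in-heap sketch of an LRU cache bounded by total value size. It assumes a `LinkedHashMap` in access order and a hypothetical class name `ByteBudgetLruCache`; it covers only the size-based eviction idea, not the disk fallback or memory mapping described above.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

final class ByteBudgetLruCache {
    private final long maxBytes;
    private long currentBytes;
    // accessOrder=true: iteration runs from least to most recently used.
    private final LinkedHashMap<String, byte[]> entries =
            new LinkedHashMap<>(16, 0.75f, true);

    ByteBudgetLruCache(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    /** Returns the value and marks the key as most recently used. */
    synchronized byte[] get(String key) {
        return entries.get(key);
    }

    /** Inserts the value, then evicts least recently used entries until under budget. */
    synchronized void put(String key, byte[] value) {
        byte[] old = entries.put(key, value);
        if (old != null) currentBytes -= old.length;
        currentBytes += value.length;
        Iterator<Map.Entry<String, byte[]>> it = entries.entrySet().iterator();
        while (currentBytes > maxBytes && it.hasNext()) {
            Map.Entry<String, byte[]> eldest = it.next();
            currentBytes -= eldest.getValue().length;
            it.remove();
        }
    }

    /** Membership check that does not change the recency order. */
    synchronized boolean contains(String key) {
        return entries.containsKey(key);
    }

    synchronized long sizeInBytes() {
        return currentBytes;
    }
}
```

Note that a single value larger than the budget evicts everything, including itself. A real cache would probably reject such values up front.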
Familiar-Level-261@reddit
In the context of what I was specifically replying to (I'm not denying other useful cases for KV), namely temp download files:
Indexed by some id: that's a file name.
Memory mapped: you can just mmap a file.
Cleanup: if you remove a file while still holding an FD, it will exist until the FD is closed or the app exits, i.e. auto cleanup.
If you want persistence, TTL is also very easy to do on disk files.
The point is that using a DB for simple, temporary stuff is massive overkill, as you're essentially making a worse file system.
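For reference, this is roughly what the plain-file approach looks like from Java, as a sketch with a hypothetical class name. `FileChannel.map` gives a memory-mapped view, and `StandardOpenOption.DELETE_ON_CLOSE` asks the JDK to remove the file when the channel is closed. The JDK documents that deletion as best effort, and whether the name disappears right after opening or only on close depends on the platform.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MappedTempFile {
    /** Maps the whole file read-only and decodes it as UTF-8. */
    static String readMapped(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            return StandardCharsets.UTF_8.decode(buf).toString();
        }
    }

    /** Opens a scratch file that the JDK removes (best effort) once the channel is closed. */
    static FileChannel openScratch(Path file) throws IOException {
        return FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE,
                StandardOpenOption.DELETE_ON_CLOSE);
    }
}
```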
noswag15@reddit
Well that's the point. I don't want to wait for JVM exit for file cleanup. I don't have infinite disk storage so I need to evict disk files after they reach a certain maxTotalSize.
Well yeah, I can implement everything by hand if there's no other option (which is exactly what I ended up doing). But if a library can do it for me, and it's lightweight enough, why wouldn't I use it?
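A hand-rolled version of the maxTotalSize eviction could look something like the sketch below. It is an illustration under assumptions, not what I actually wrote: the class and method names are hypothetical, it treats last-modified time as the age of an entry, and it ignores subdirectories and concurrent writers.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class DirectoryEvictor {
    private static final class FileInfo {
        final Path path;
        final long size;
        final long modifiedMillis;
        FileInfo(Path path, long size, long modifiedMillis) {
            this.path = path;
            this.size = size;
            this.modifiedMillis = modifiedMillis;
        }
    }

    /**
     * Deletes the least recently modified regular files in dir until the
     * total size is at most maxTotalBytes. Returns the number of files deleted.
     */
    static int evictOldest(Path dir, long maxTotalBytes) throws IOException {
        List<FileInfo> infos = new ArrayList<>();
        long total = 0;
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                BasicFileAttributes a = Files.readAttributes(p, BasicFileAttributes.class);
                if (a.isRegularFile()) {
                    infos.add(new FileInfo(p, a.size(), a.lastModifiedTime().toMillis()));
                    total += a.size();
                }
            }
        }
        // Oldest first, so the stalest files are removed before newer ones.
        infos.sort(Comparator.comparingLong((FileInfo i) -> i.modifiedMillis));
        int deleted = 0;
        for (FileInfo info : infos) {
            if (total <= maxTotalBytes) break;
            Files.deleteIfExists(info.path);
            total -= info.size;
            deleted++;
        }
        return deleted;
    }
}
```

Running this from a scheduled task, or after every write, gives size-based cleanup without waiting for JVM exit.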
theuntamed000@reddit (OP)
Hmm, you really have a nice use case there.
I don't know if I am going to support streams in the near future, but what if a stream breaks in the middle of a data transfer? We would want to discard it, which is a transactional property: it's all or nothing, with nothing in between.
So I might add transactional support first, as it's a widely required feature.
But currently the focus is still on increasing performance and concurrent throughput.
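To illustrate the all-or-nothing idea with plain files (this is not how DataStore4J works or will work, just a sketch with a hypothetical class name): copy the stream to a temporary file first and rename it into place only after the copy succeeds. If the stream breaks midway, the temporary file is discarded and the target never appears.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class AtomicStreamWriter {
    /**
     * Copies the stream to a temp file next to the target, then renames it
     * into place. A failed copy leaves no partial file behind.
     */
    static void write(InputStream in, Path target) throws IOException {
        Path dir = target.toAbsolutePath().getParent();
        // Same directory as the target, so the rename stays on one file system.
        Path tmp = Files.createTempFile(dir, "incoming-", ".part");
        try {
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            // Throws AtomicMoveNotSupportedException if the file system cannot do it.
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        } finally {
            // No-op after a successful move; removes the partial file on failure.
            Files.deleteIfExists(tmp);
        }
    }
}
```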
noswag15@reddit
makes sense. I'll keep an eye on the project. thanks.
psychelic_patch@reddit
Hey man, I'm also writing databases. I'm not using Java, but feel free to reach out. I'm using paper and benchmarking a lot of behaviors beforehand. I probably won't be testing your app, but who knows, maybe we can still help each other. I'd mainly have some questions about your choices here. Truly inspiring work, keep it up!
theuntamed000@reddit (OP)
Thanks man,
would like to hear your questions
HosseinKakavand@reddit
Impressive work. To build trust, add a JMH suite with pinned CPU and warmed JVM, and publish p50, p95, p99 plus tail latency. Compare to RocksDB, Chronicle Map, and MapDB with identical durability. Document crash consistency, WAL or checkpoints, compaction, file layout, and GC vs off heap. A YCSB profile and failure testing would make the story compelling.
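As a small illustration of the percentile reporting, the sketch below computes nearest-rank percentiles from raw latency samples. The class and method names are hypothetical, and the timing loop is deliberately naive (no warmup, no CPU pinning, no protection against dead-code elimination), so it is not a substitute for a JMH run.

```java
import java.util.Arrays;

final class LatencyPercentiles {
    /** Times each call of op with System.nanoTime; a naive loop, not a JMH replacement. */
    static long[] measureNanos(Runnable op, int iterations) {
        long[] samples = new long[iterations];
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            op.run();
            samples[i] = System.nanoTime() - start;
        }
        return samples;
    }

    /**
     * Nearest-rank percentile: the smallest sample such that at least
     * p percent of all samples are less than or equal to it.
     */
    static long percentile(long[] samples, double p) {
        if (samples.length == 0) throw new IllegalArgumentException("no samples");
        if (p <= 0 || p > 100) throw new IllegalArgumentException("p must be in (0, 100]");
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[rank - 1];
    }
}
```

Calling `percentile(samples, 50)`, `percentile(samples, 95)` and `percentile(samples, 99)` on the same sample array gives the p50, p95 and p99 figures.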
We’re experimenting with a backend infra builder, think Loveable but for your infra. In the prototype, you can: describe your app → get a recommended stack + Terraform, and managed infra. Would appreciate feedback (even the harsh stuff) https://reliable.luthersystemsapp.com
theuntamed000@reddit (OP)
Hmm, yeah, I'll think about that.