FSCache - I created a new lightweight software for file caching on our home servers

Posted by Meisgoot312@reddit | linux | 28 comments

Hey everyone!

tl;dr: FSCache is a lightweight Linux FUSE caching layer that can cache files from any existing filesystem.

You might have seen me from my Plex post here. Since then, a few people have reached out to ask if I could make this library generic. After a few days of refactoring the codebase and testing non-stop, I've finally reached a point where I can present it as a new binary, FSCache. One of my core principles while developing it was that it "just works" with minimal effort. I would love to get some feedback and bug reports. My dream is to eventually see it installable with a good ole apt command. Now that it's generic, it makes sense to post in r/Linux.

In my homelab journey, I wanted a simple file caching tool that 1) mounted on an existing filesystem, 2) was filesystem agnostic, and 3) had some rules I could tune. Unfortunately, existing solutions had too much "churn" for me to do what I wanted: bcache only works on new filesystems, MergerFS requires tiering and custom scripts, lvmcache isn't really compatible with SnapRAID, etc. There was no perfect solution.

That's why I created FSCache: three lines of config edits and you run it. FSCache works by FUSE overmounting: it sits on top of ANY number of existing filesystems and caches files to another drive (e.g. an SSD cache) based on a set of rules. At the moment it has two modes: prefetch mode, which is basically a generic cacher, and plex-episode-prediction mode, which handles Plex-specific setups. Once a file has been copied into the cache, the cached copy is delivered to the requester instead of the backing file. The requesting software has zero awareness of what's happening.
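To make the overmount idea concrete, here is a minimal Rust sketch of the lookup a cache-aware open handler could perform. It is not FSCache's code: `resolve`, `cache_root`, and `backing_root` are hypothetical names, and it assumes `backing_root` is a path that still reaches the underlying files (which, under a real overmount, would need some way of reaching the directory beneath the mount).

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: pick the real file that should serve a request for
/// `rel`, a path relative to the mount point. A copy in the cache tree wins;
/// otherwise the request falls through to the backing tree. An open handler
/// would then open the returned path and hand its descriptor to the caller.
pub fn resolve(rel: &Path, cache_root: &Path, backing_root: &Path) -> PathBuf {
    let cached = cache_root.join(rel);
    if cached.is_file() {
        cached
    } else {
        backing_root.join(rel)
    }
}
```

Because the requester only ever sees the mount path, swapping which real file backs it is invisible to the requesting software.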

There are two run commands for FSCache: fscache start --config and fscache watch. start launches the caching daemon and can be set up as a service. fscache watch opens a GUI that attaches to the daemon; think of it as top or nvidia-smi if you've used those before.

The generic cacher works with any rule you set up. For example, if you have a game drive that people access often, you can have it cache the hit file plus neighboring files, cache only the files that are hit, or even cache the entire parent folder and all its subdirectories.
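The three rule styles above could be modeled roughly as follows. The `Rule` enum and `files_to_cache` function are hypothetical and do not reflect FSCache's actual config format; "neighbors" is assumed here to mean the next files in sorted order within the same folder.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical rule set mirroring the three behaviours described above.
pub enum Rule {
    /// Cache only the file that was opened.
    HitOnly,
    /// Cache the opened file plus up to `n` files sorted after it in the same folder.
    HitPlusNeighbors(usize),
    /// Cache everything under the opened file's parent folder, recursively.
    ParentRecursive,
}

/// Return the files a rule would queue for caching after `hit` was opened.
pub fn files_to_cache(rule: &Rule, hit: &Path) -> io::Result<Vec<PathBuf>> {
    let dir = hit.parent().unwrap_or(Path::new("."));
    match rule {
        Rule::HitOnly => Ok(vec![hit.to_path_buf()]),
        Rule::HitPlusNeighbors(n) => {
            let mut siblings = list_files(dir)?;
            siblings.sort();
            // Take the hit file and the next `n` siblings in sorted order.
            let pos = siblings.iter().position(|p| p.as_path() == hit);
            match pos {
                Some(i) => Ok(siblings.into_iter().skip(i).take(*n + 1).collect()),
                None => Ok(vec![hit.to_path_buf()]),
            }
        }
        Rule::ParentRecursive => {
            let mut all = Vec::new();
            walk(dir, &mut all)?;
            all.sort();
            Ok(all)
        }
    }
}

/// Regular files directly inside `dir` (no recursion).
fn list_files(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut files = Vec::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_file() {
            files.push(path);
        }
    }
    Ok(files)
}

/// Depth-first collection of every regular file under `dir`.
fn walk(dir: &Path, out: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            walk(&path, out)?;
        } else if path.is_file() {
            out.push(path);
        }
    }
    Ok(())
}
```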

The Plex cacher intercepts I/O and has special integrations for Plex-specific file access. Its logic ignores scans and focuses only on real sessions. There may be some misses, and I would love to see bug reports for these, since they are very hard to chase down.
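The post doesn't say how plex-episode-prediction mode decides what to cache. Purely as a hypothetical illustration, a predictor could parse the common `SxxEyy` naming convention and look for the next episode in the same folder. `parse_episode` and `predict_next` are made-up names, and season rollovers (last episode of one season to the first of the next) are ignored for brevity.

```rust
/// Hypothetical: extract (season, episode) from a file name containing an
/// `SxxEyy` tag, case-insensitively, e.g. "Show.S01E05.mkv" -> (1, 5).
pub fn parse_episode(name: &str) -> Option<(u32, u32)> {
    let upper = name.to_ascii_uppercase();
    let bytes = upper.as_bytes();
    for i in 0..bytes.len() {
        if bytes[i] != b'S' {
            continue;
        }
        // Digits right after 'S' are the season number.
        let s_start = i + 1;
        let s_end = s_start + bytes[s_start..].iter().take_while(|b| b.is_ascii_digit()).count();
        if s_end == s_start || s_end >= bytes.len() || bytes[s_end] != b'E' {
            continue;
        }
        // Digits right after 'E' are the episode number.
        let e_start = s_end + 1;
        let e_end = e_start + bytes[e_start..].iter().take_while(|b| b.is_ascii_digit()).count();
        if e_end == e_start {
            continue;
        }
        if let (Ok(season), Ok(episode)) = (upper[s_start..s_end].parse(), upper[e_start..e_end].parse()) {
            return Some((season, episode));
        }
    }
    None
}

/// Pick the sibling whose tag is the next episode of the same season, if any.
pub fn predict_next<'a>(current: &str, siblings: &[&'a str]) -> Option<&'a str> {
    let (season, episode) = parse_episode(current)?;
    siblings
        .iter()
        .copied()
        .find(|name| parse_episode(name) == Some((season, episode + 1)))
}
```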

This tool is still in development, so please report any bugs you run into. I have done my own testing and there are extensive system-level tests in the codebase, but there is only so much testing I can do alone.

Big thanks to u/trapexit, author of MergerFS. He gave me some comments about my original code and inspired me to use FUSE via MergerFS. I look forward to more conversations!

As always, be careful. This tool was built to be non-destructive and read-only (outside of the cache), and it is heavily tested (including E2E tests), but as with all filesystem operations, please be cautious with software that is still in development.
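One way a cacher can stay non-destructive is to only ever read from the backing side and write exclusively under the cache directory, publishing each copy with a rename so a half-written file is never served. This is a minimal sketch assuming both trees are plain directories; `promote` and the `.partial` suffix are hypothetical, not FSCache's implementation.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical: copy `rel` from the backing tree into the cache tree.
/// The backing file is only ever opened for reading. The copy is written to a
/// temporary `.partial` name beside its final path and renamed into place once
/// complete; because both names share a directory (and so a filesystem), the
/// rename is atomic and readers never see a half-written cache file.
pub fn promote(rel: &Path, backing_root: &Path, cache_root: &Path) -> io::Result<PathBuf> {
    let src = backing_root.join(rel);
    let dst = cache_root.join(rel);
    if let Some(parent) = dst.parent() {
        fs::create_dir_all(parent)?;
    }
    let name = dst
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "path has no file name"))?;
    let mut tmp_name = name.to_os_string();
    tmp_name.push(".partial");
    let tmp = dst.with_file_name(&tmp_name);
    fs::copy(&src, &tmp)?; // reads src, writes only under cache_root
    fs::rename(&tmp, &dst)?;
    Ok(dst)
}
```

If the copy fails partway, only a stray `.partial` file in the cache is left behind; the backing file is never touched.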

https://github.com/DudeCmonMan/fscache

A bit of background on myself

I'm a homelab enthusiast, and I'm lucky enough to enjoy the monotony of working on a server that provides for other people. I'm a software engineer with a background in hardware and embedded systems, so this kind of stuff is fun for me. My career and my hobbies are directly aligned, and I'm blessed to find comfort in messing with servers.

I generally write in Python, but I've recently moved to Rust and will probably use Rust exclusively going forward. It's good to be back to compiled binaries. I've come full circle: from C++ as my "native language" to C#, to Python, even VBA, and now back to a compiled language with Rust. Being language agnostic is great, especially in the age of AI.

I've worked on a ton of codebases, but this is my first open-source one that I want to share with the world.

For the more technical

FSCache uses these main layers:
FUSE -> Action Engine (event emitter) -> Preset Integration + SQLite Cache Database

FUSE is the fundamental underlying magic here: it lets us implement a filesystem in userspace and intercept the file handles it serves. It IS magic.
The Action Engine emits events based on the handles we see in FUSE, so that higher-level components share a common abstraction whose events they can handle.
Preset Integration is where we apply all of our custom logic: the prefetcher, the plex-episode-predictor, etc.
All of our caching logic and storage is handled in the SQLite cache database.
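A rough, standard-library-only sketch of the FUSE -> Action Engine -> preset flow described above. The `FsEvent`, `Preset`, `CacheHits`, and `ActionEngine` names are assumptions for illustration; the SQLite layer is left out, and presets here simply return the paths they want cached.

```rust
use std::path::PathBuf;

/// Hypothetical events the FUSE layer could hand to the Action Engine.
#[derive(Debug, Clone, PartialEq)]
pub enum FsEvent {
    Opened(PathBuf),
    Released(PathBuf),
}

/// A preset reacts to events and answers with the files it wants cached.
pub trait Preset {
    fn on_event(&mut self, event: &FsEvent) -> Vec<PathBuf>;
}

/// Example preset: request caching of whatever was just opened.
pub struct CacheHits;

impl Preset for CacheHits {
    fn on_event(&mut self, event: &FsEvent) -> Vec<PathBuf> {
        match event {
            FsEvent::Opened(path) => vec![path.clone()],
            FsEvent::Released(_) => Vec::new(),
        }
    }
}

/// Minimal synchronous emitter: fan each event out to every registered preset
/// and collect their cache requests in registration order.
#[derive(Default)]
pub struct ActionEngine {
    presets: Vec<Box<dyn Preset>>,
}

impl ActionEngine {
    pub fn register(&mut self, preset: Box<dyn Preset>) {
        self.presets.push(preset);
    }

    pub fn emit(&mut self, event: &FsEvent) -> Vec<PathBuf> {
        let mut requests = Vec::new();
        for preset in self.presets.iter_mut() {
            requests.extend(preset.on_event(event));
        }
        requests
    }
}
```

Keeping presets behind a trait means the FUSE layer never needs to know which integration (generic prefetch or Plex) is active; a real daemon would likely also deduplicate requests against its cache database before copying anything.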

pastelfemby@reddit

I have to critique the naming somewhat: FS-Cache is an existing kernel module for arbitrarily caching files from existing filesystems. In documentation it's often already shortened, sans the hyphen, to fscache.

It's not a great idea to make a vibe-coded app with essentially the same name, aiming for similar functionality to something that already exists in-kernel.

Also, to that extent, the daemon used with FS-Cache works great for Samba, NFS, etc., to the point that even Red Hat advises its use.

Meisgoot312@reddit (OP)

You're right, someone else brought that up too and I agree with the sentiment. For the moment I'm keeping the name FSCache until I have a more stable codebase, then I'll rename it to CacheFS. I just prefer fleshing out features a bit more to make the core software competent before I work on branding; it has already changed from plex-hot-cache to fscache and might change a few more times.

The difference from the kernel caching layer is that FS-Cache is made for network filesystems and is fundamentally not very user-accessible. My software mounts on ANY filesystem, ANY directory. It doesn't care whether your directory is backed by CIFS, ext4, XFS, etc.; it only cares that it sees a filesystem with files. After mounting, you define the rules for WHEN files are cached onto another drive (even memory, if you want to get fancy with /dev/shm). It's a very generic mounting procedure and one command: "./fscache start".

Meisgoot312@reddit (OP)

I don’t understand what you’re trying to say. Is there something wrong with my .gitignore?

mykesx@reddit

Claude. You aren’t open about using it, and people should know about the quality and/or laziness of the development of the software.

Meisgoot312@reddit (OP)

Implying that AI makes a dev lazy is a gross mischaracterization. Take a look at the commit history: you can tell how much thought went into something by the decisions a dev makes. Whether to include testing, what features to add, what safety guards, how deep the integration with other software is, what options to add in the config, incorporating feedback, etc.

If you treat every use of AI the same way everywhere, you’re missing out. I have no problem saying that I use AI, I use it for work and don’t know of a single SWE that doesn’t have some sort of AI assist (I’m not making the claim that there aren’t). I just don’t think that calling it out is anything special. As you can see from your own post, since everything is open source, it’s in the code. If I tried to hide it, I would’ve just removed the folder from the commit manually.

AI magnifies. If you’re a garbage dev who doesn’t think about the system, it makes more garbage. If you’re quite good at what you do, it makes you go fast and it makes you 10x.

mykesx@reddit

Stage 1 of AI slop spammers is to go on the defense, arguing they’re good programmers or what not.

Stage 2 is childish personal attacks.

Meisgoot312@reddit (OP)

I didn’t personally attack you, I’m making a statement that AI magnifies, I’m not saying you’re a garbage dev. I’m saying garbage devs generate garbage code (with or without AI for that matter). I’m not interested in getting personal, you can check my comment history. I’ve gotten worse comments. I think you misunderstood my comment.

mykesx@reddit

You haven’t gotten to stage 2 yet.

I posted the truth. People should know about software made with little care, little testing (unit tests don’t count), and a project that is unlikely to be maintained for long.

Meisgoot312@reddit (OP)

Seems like you’re there already. I used a tool to help me turn an idea into software and that alone told you everything you need to know about me? My project, my motivations, my laziness?

I’m excited to be creating something useful for the community, there is a lot of thought put into it. I’ve reached out to experts and have gotten opinions, collected feedback, and have spent many hours thinking about edge cases.

This suffers from the same problem as a lot of the other solutions I mentioned. I love bcache, lvmcache, MergerFS, and this looks really cool! The problem is that I have 100 TB of data and I can't easily switch filesystems. My program attaches onto ANY existing filesystem to provide a cache.

Bcachefs is very cool though; it's kind of like an advanced LVM. Thanks for the reference, I didn't know about it, only bcache.

ElvishJerricco@reddit

You should post performance metrics. Hard to know if it's useful if there's no info about what improvements you'll actually achieve. My initial impression was that FUSE might be too much overhead to make caching worth it in a lot of cases, but I'd have to see numbers to know.

Meisgoot312@reddit (OP)

That's a good idea. I'll work on capturing some metrics, but the FUSE implementation is actually lightweight: the only things that are substantially intercepted are the open calls and the resulting file handles. The actual data is delivered through normal read() syscalls on an fd; FUSE just hands out the fd, either from the cache or from the backing file. I'm sure there could be some improvements, and I can already think of a few, but I expect most of the performance gains to come from the difference between an SSD and an HDD.
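For anyone wanting to gather the kind of numbers requested above, a simple first pass is to time a sequential read of the same file through the FUSE mount and directly from the underlying drives. This is a hypothetical helper, not part of FSCache; results will be skewed by the kernel page cache unless it is dropped between runs (for example by writing 3 to /proc/sys/vm/drop_caches as root).

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;
use std::time::{Duration, Instant};

/// Read `path` sequentially to EOF and return (bytes read, elapsed time).
pub fn time_sequential_read(path: &Path) -> io::Result<(u64, Duration)> {
    let mut file = File::open(path)?;
    let mut buf = vec![0u8; 1 << 20]; // 1 MiB per read() call
    let mut total: u64 = 0;
    let start = Instant::now();
    loop {
        match file.read(&mut buf) {
            Ok(0) => break, // EOF
            Ok(n) => total += n as u64,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok((total, start.elapsed()))
}

/// Convert a (bytes, duration) pair into MiB/s.
pub fn mib_per_sec(bytes: u64, elapsed: Duration) -> f64 {
    let secs = elapsed.as_secs_f64();
    if secs == 0.0 {
        return f64::INFINITY;
    }
    bytes as f64 / (1024.0 * 1024.0) / secs
}
```

Comparing `mib_per_sec` for a file read via the mount (cache hit), via the mount (cache miss), and straight from the HDD would separate FUSE overhead from the SSD-versus-HDD difference.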