FSCache - I created new lightweight software for file caching on our home servers
Posted by Meisgoot312@reddit | linux | 28 comments
Hey everyone!
tl;dr fscache - Lightweight Linux FUSE caching software that caches any existing FS.
You might have seen me from my Plex post here. Since then, a few people have reached out to ask if I could make this library generic. After spending a few days refactoring the codebase and testing non-stop, I've finally gotten to a point where I can present it as a new binary, FSCache. One of the core principles I had when developing this was that "it just works" with minimum effort. I would love to get some feedback and bug reports. My dream is to eventually see this on any ole apt command. Now that it's generic, it makes sense to post in r/Linux.
In my homelab journey, I wanted simple file caching software that 1) mounted on an existing filesystem, 2) was filesystem agnostic, and 3) had some rules I could tune. Unfortunately, existing solutions had too much "churn" for me to truly do what I wanted. Bcache only works on new filesystems, MergerFS requires tiering and custom scripts, lvmcache is not really compatible with SnapRAID, etc. There was no perfect solution.
That's why I created FSCache: edit 3 lines of config and execute. The benefit of FSCache is that it works using FUSE overmounting: it sits on top of ANY number of existing filesystems and lets you cache files to another drive (an SSD cache) based on a set of rules. At the moment it has two modes: prefetch mode, which is basically just a generic cacher, and plex-episode-prediction mode, which handles Plex-specific setups. When a file is moved into the cache, the cached file is delivered to the requestor instead of the backing file. The requesting software has zero awareness of what's happening.
There are run commands for FSCache. There is fscache start --config

The generic cacher works with any rule you set up. For example, if you have a game drive that people access quite often, you can set it up to cache the hit file plus neighboring files, to cache hits only, or even to cache the entire parent folder plus all subdirectories.
The Plex cacher intercepts I/O and has special integrations that cache Plex-specific file I/O. The specific logic is to ignore scans and only focus on real sessions. There may be some misses, and I would love to see bug reports for these, since it's very hard to chase these issues down.
This tool is still in development, so please report any bugs you might see. I have done testing myself and have extensive system-level tests in the codebase, but there is only so much testing I can do alone.
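The three rule styles described above could be sketched roughly as follows. This is only an illustration in Python (the project itself is written in Rust); the function name `files_to_cache` and the mode names `"hit"`, `"neighbors"`, and `"tree"` are made up for this sketch and are not FSCache's real config keys.

```python
from pathlib import Path


def files_to_cache(hit: Path, mode: str) -> list[Path]:
    """Return the files one rule would pull into the cache for a single hit.

    Hypothetical modes (not FSCache's actual option names):
      "hit"       -> only the file that was accessed
      "neighbors" -> the hit plus the other files in the same folder
      "tree"      -> every file under the hit's parent folder, recursively
    """
    parent = hit.parent
    if mode == "hit":
        return [hit]
    if mode == "neighbors":
        return sorted(p for p in parent.iterdir() if p.is_file())
    if mode == "tree":
        return sorted(p for p in parent.rglob("*") if p.is_file())
    raise ValueError(f"unknown mode: {mode}")
```

A real cacher would then copy the returned files to the cache drive; the sketch only decides which files qualify.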
Big thanks to u/trapexit, author of MergerFS. He gave me some comments about my original code and inspired me to use FUSE via MergerFS. I look forward to more conversations!
As always, be careful. This tool was built to be non-destructive, heavily tested (incl. E2E tests), and read-only (outside of the cache), but as with all FS operations, please be careful with software in development.
https://github.com/DudeCmonMan/fscache
A bit of background on myself
I'm a homelab enthusiast, and I'm lucky enough to enjoy the monotony of working on a server that provides services to people. I'm a Software Engineer with a background in hardware and embedded systems, so this kind of stuff is fun for me. My career and my hobbies are directly aligned, and I am blessed that I find comfort in messing with servers.
I generally write in Python, but I've recently moved to Rust and will probably be using Rust completely going forward. It's good to be back to compiled binaries. I've come full circle from C++ as my "native language" to C#, to Python, even VBA, and now back to a compiled language, Rust. Being language agnostic is great, especially in the age of AI.
I've worked on a ton of codebases, but this is my first open-source one that I want to share with the world.
For the more technical
FSCache uses these main layers:
FUSE -> Action Engine (event emitter) -> Preset Integration + SQLite Cache Database
- FUSE is the fundamental underlying magic here: it lets us handle filesystem calls from userspace. It IS magic.
- The Action Engine emits events based on the handles we have in FUSE, so that higher-level libraries share a common abstraction for handling events.
- Preset Integration is where we apply all of our custom logic: prefetcher, plex-episode-predictor, etc.
- All of our caching logic and storage is handled in the SQLite cache database.
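To make the layering concrete, here is a minimal sketch of an event emitter feeding a preset that records cached paths in SQLite. It is written in Python for brevity (the project is Rust), and every name in it (`ActionEngine`, `CacheDb`, `prefetch_preset`, the `cached` table) is an assumption for illustration, not FSCache's actual code or schema.

```python
import sqlite3
from collections import defaultdict
from typing import Callable


class ActionEngine:
    """Tiny event emitter: the FUSE layer calls emit(), presets subscribe via on()."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[str], None]]] = defaultdict(list)

    def on(self, event: str, handler: Callable[[str], None]) -> None:
        self._handlers[event].append(handler)

    def emit(self, event: str, path: str) -> None:
        for handler in self._handlers[event]:
            handler(path)


class CacheDb:
    """Records which paths are cached (hypothetical one-table schema)."""

    def __init__(self) -> None:
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE cached (path TEXT PRIMARY KEY)")

    def mark_cached(self, path: str) -> None:
        self.conn.execute("INSERT OR IGNORE INTO cached (path) VALUES (?)", (path,))

    def is_cached(self, path: str) -> bool:
        row = self.conn.execute(
            "SELECT 1 FROM cached WHERE path = ?", (path,)
        ).fetchone()
        return row is not None


def prefetch_preset(db: CacheDb) -> Callable[[str], None]:
    """Preset handler: a real one would copy the file to the SSD before recording it."""

    def on_open(path: str) -> None:
        db.mark_cached(path)

    return on_open


# Wiring: FUSE open() -> engine.emit("open", path) -> preset -> database.
engine = ActionEngine()
db = CacheDb()
engine.on("open", prefetch_preset(db))
engine.emit("open", "/media/show/e01.mkv")
```

The point of the middle layer is that presets only see events such as `"open"` and never touch FUSE directly.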
pastelfemby@reddit
I have to critique the naming some. FS-Cache is an existing kernel module for arbitrarily caching files from existing filesystems. In documents it's often already shortened, sans the hyphen, to fscache.
It's not a great idea to make a vibe coded app with essentially the same name, aiming for similar functionality to something that already exists in-kernel.
Also to that extent, the daemon used with FS-Cache works great for Samba, NFS, etc., to the point that even Red Hat advises its use.
Meisgoot312@reddit (OP)
You're right - someone else brought that up too and I agree with the sentiment. For the moment, I'm keeping it FSCache until I have a more stable codebase, then I'll rename it to CacheFS. It's just I prefer fleshing out features a bit more to make the core software competent before I work on branding - it's already changed from plex-hot-cache to fscache and might change a few more times.
The difference from the kernel caching layer is that the kernel layer is made for network drives and is fundamentally not really user accessible. My software mounts on ANY filesystem, ANY directory. It doesn't care if your directory is backed by cifs, ext4, xfs, etc. It only cares that it sees a filesystem with files. After mounting, you can define the rules for WHEN files are cached onto another drive (even memory if you want to get fancy with /dev/shm). It's a very generic mounting procedure and one command, "./fscache start".
For example, if you're watching a TV series from your drives, sure, it's caching as it's streaming, but what about the next 5 episodes? With my software, the next 5 are on an SSD, ready to go from a local drive instead of SMB.
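One way the "next 5 episodes" idea could work is sketched below in Python. It assumes episodes live in one folder and carry an `SxxEyy` tag in the filename; both are assumptions for illustration, and `next_episodes` is a hypothetical helper, not the predictor FSCache actually ships.

```python
import re
from pathlib import Path
from typing import Optional

# Matches tags such as S01E03 anywhere in a filename (assumed naming scheme).
EPISODE_RE = re.compile(r"S(\d+)E(\d+)", re.IGNORECASE)


def episode_key(path: Path) -> Optional[tuple[int, int]]:
    """Return (season, episode) parsed from the filename, or None if absent."""
    match = EPISODE_RE.search(path.name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def next_episodes(current: Path, count: int = 5) -> list[Path]:
    """List up to `count` files in the same folder that come after `current`."""
    current_key = episode_key(current)
    if current_key is None:
        return []
    later = []
    for sibling in current.parent.iterdir():
        key = episode_key(sibling)
        if sibling.is_file() and key is not None and key > current_key:
            later.append((key, sibling))
    later.sort()
    return [path for _, path in later[:count]]
```

The returned paths are the candidates a prefetcher would copy to the SSD while the current episode is still playing.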
Ok-Anywhere-9416@reddit
Sorry for the dumb question, but I'm unsure: can I use it with Samba too? 🤔 It just caches files I think
pastelfemby@reddit
You could just use cachefilesd for Samba/NFS, which is extremely battle tested, to the point that even Red Hat promotes its use.
Meisgoot312@reddit (OP)
Yeah you can! This is what I do for my Plex server. I have my Plex backing on Samba mounted via cifs, then I target that mount with fscache. This is exactly my setup and it works great 👍
AsheLevethian@reddit
Honestly we should ban those low effort posts. When I visit this sub I want to see actual developments, not something vibe coded in a weekend that will leak your data somehow, worse than Microslop bullshit.
mykesx@reddit
From .gitignore
/target
/build.sh
/dist
/claude-plans
/.claude
/sandbox
Meisgoot312@reddit (OP)
I don’t understand what you’re trying to say, is there something wrong with my .gitignore?
mykesx@reddit
Claude. You aren’t open about using it, and people should know about the quality and/or laziness of the development of the software.
Meisgoot312@reddit (OP)
Implying that AI makes a dev lazy is a gross mischaracterization. Take a look at the commit history: you can tell how much thought went into something by the decisions the dev makes. Whether to include testing, what features to add, what safety guards, how deep the integration with software is, what options to add in the config, incorporating feedback, etc.
If you treat every use of AI the same way everywhere, you’re missing out. I have no problem saying that I use AI, I use it for work and don’t know of a single SWE that doesn’t have some sort of AI assist (I’m not making the claim that there aren’t). I just don’t think that calling it out is anything special. As you can see from your own post, since everything is open source, it’s in the code. If I tried to hide it, I would’ve just removed the folder from the commit manually.
AI magnifies. If you’re a garbage dev that doesn’t think about the system, it makes more garbage. If you’re quite good at what you do, it makes you go fast and it makes you 10x.
mykesx@reddit
Stage 1 of AI slop spammers is to go on the defense, arguing they’re good programmers or whatnot.
Stage 2 is childish personal attacks.
Meisgoot312@reddit (OP)
I didn’t personally attack you, I’m making a statement that AI magnifies, I’m not saying you’re a garbage dev. I’m saying garbage devs generate garbage code (with or without AI for that matter). I’m not interested in getting personal, you can check my comment history. I’ve gotten worse comments. I think you misunderstood my comment.
mykesx@reddit
You haven’t gotten to stage 2 yet.
I posted the truth. People should know about software made with little care, little testing (unit tests don’t count), and a project that is unlikely to be maintained for long.
Meisgoot312@reddit (OP)
Seems like you’re there already. I used a tool to help me turn an idea into software and that alone told you everything you need to know about me? My project, my motivations, my laziness?
I’m excited to be creating something useful for the community, there is a lot of thought put into it. I’ve reached out to experts and have gotten opinions, collected feedback, and have spent many hours thinking about edge cases.
Again, the truth is literally in the GitHub page, it’s fully open source, read the code as you wish.
mykesx@reddit
Turn off AI and make a useful program. You’re not proving you can.
Meisgoot312@reddit (OP)
I will say though, that if this is the core of the issue, I can add an edit to my post saying it was created with the help of Claude. I'm putting my money where my mouth is, I don't think it's anything special, but if you think it matters to people, I'll make an amendment in the post to make it obvious. My original plex post actually had a statement in it about AI, but nobody seemed to care since the software itself was useful.
Dwedit@reddit
What's overmounting?
trapexit@reddit
He's kinda misstating things. It's nothing to do with FUSE. It's just that he takes an open file descriptor to the underlying directory (and therefore mount) at startup and holds that reference to access files within it after his software mounts over top of it. It's a standard ability in Linux/Unix.
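The held-descriptor trick can be sketched with plain `openat`-style calls. The Python below (Linux only) is a minimal illustration, not FSCache's code: the helper names are hypothetical, and the actual mount step is left out because it needs privileges. The idea is that a directory descriptor opened before the mount keeps resolving to the original files afterwards.

```python
import os


def open_backing_dir(mount_point: str) -> int:
    """Open the directory BEFORE mounting over it and keep this fd around."""
    return os.open(mount_point, os.O_RDONLY | os.O_DIRECTORY)


def open_backing_file(dir_fd: int, relative_path: str) -> int:
    """Open a file relative to the held directory fd (openat under the hood).

    Lookups start from the directory the fd refers to, not from the path
    name, so a filesystem mounted on top of that path later is bypassed.
    """
    return os.open(relative_path, os.O_RDONLY, dir_fd=dir_fd)
```

Usage would be `dir_fd = open_backing_dir("/mnt/media")` at startup, then `open_backing_file(dir_fd, "show/e01.mkv")` whenever the backing copy is needed; the path `/mnt/media` is just an example.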
Meisgoot312@reddit (OP)
My apologies, I need to fix that terminology in my head, I keep saying mount-over, which turns into overmount. Yes, he's right - I basically mount a new filesystem on top of an existing directory and handle VFS requests via FUSE. This allows me to implement custom FS functionality when a process accesses that mount path 👍
Dwedit@reddit
Wait, Linux lets you mount to a non-empty directory, and still allows access to the underlying files inside that directory? That's crazy.
Meisgoot312@reddit (OP)
I know right, there's so much you can do with that - it's crazy
Immediate_Bus1667@reddit
So, would this help with networked game drives? Like, if I installed all my Steam games in a CephFS pool and run FSCache on the nodes connecting to the game drives, could I get closer to local NVMe performance?
_mb@reddit
Love the watch/top-like function, makes it easy to see direct results quickly.
Meisgoot312@reddit (OP)
I appreciate it! It actually came out of frustration, since I wanted a nicer way to look at what was going on Lol
Kangie@reddit
neat project, but bcachefs can just do this for you transparently and performantly.
Meisgoot312@reddit (OP)
This suffers from the same problem as a lot of the other solutions I mentioned. I love bcache, lvmcache, MergerFS, and this looks really cool! The problem is that I have 100 TBs of data and I can't easily switch filesystems. My program attaches onto ANY existing filesystem to provide a cache.
Bcachefs is very cool though, it's kind of like an advanced LVM. Thanks for the reference, I didn't know about it - only bcache.
ElvishJerricco@reddit
You should post performance metrics. Hard to know if it's useful if there's no info about what improvements you'll actually achieve. My initial impression was that FUSE might be too much overhead to make caching worth it in a lot of cases, but I'd have to see numbers to know.
Meisgoot312@reddit (OP)
That's a good idea - I'll work on capturing some metrics, but the FUSE implementation is actually lightweight: the only things that are majorly intercepted are the open and accessed handles. The actual delivery of data is through the normal means of read() syscalls on an fd; FUSE just delivers the fd either from cache or from backing. I'm sure there could be some improvements, and I can already think of a few, but I expect most of the performance gains to come from the difference between an SSD and an HDD.
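The "deliver the fd from cache or from backing" decision described here might look roughly like this. It is a Python sketch under assumptions: `open_for_read`, `cache_root`, and `backing_root` are hypothetical names, and a real FUSE handler would also track the fd per open handle and close it on release.

```python
import os


def open_for_read(relative_path: str, cache_root: str, backing_root: str) -> int:
    """Return a read-only fd, preferring the cached copy when one exists.

    Only this open step chooses a source; later reads are ordinary
    reads on whichever fd was returned.
    """
    cached = os.path.join(cache_root, relative_path)
    if os.path.isfile(cached):
        return os.open(cached, os.O_RDONLY)
    return os.open(os.path.join(backing_root, relative_path), os.O_RDONLY)
```

Because the choice happens once per open, the per-read overhead is whatever FUSE itself adds, which is the part benchmarks would need to measure.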