shrinking filesystems still feels way too painful in 2026
Posted by DahliaDevsiantBop@reddit | linuxadmin | 42 comments
ran into this again today and just need a sanity check from other linux admins.
we have a few linux boxes on ec2 and some bare metal that run data-heavy services. one job went sideways during a patch/cleanup window and dumped a bunch of temp data/logs. disk usage got high, so the volume got expanded to keep things from falling over.
cleanup finished later and actual usage dropped way back down.
so now we have a big mostly-empty volume sitting there.
growing the thing was easy. shrinking it back down is where everything gets annoying.
with xfs, there’s no shrink. with ext4, you’re basically looking at unmounting and doing it carefully. in practice that usually turns into:
- new smaller volume
- rsync data over
- stop services
- final sync
- swap mounts/uuids
- pray the old app doesn’t hate you
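For anyone who has not done the dance before, the steps above can be written down as a dry-run plan. This is a minimal sketch, assuming ext4 on the new volume and a systemd-managed service; the device name, mount points and service name are hypothetical placeholders, and the function only returns command strings rather than running anything.

```python
import shlex


def migration_plan(new_dev, old_mount, new_mount, service):
    """Return the shell commands for a copy-and-swap migration, in order.

    Nothing is executed here; all arguments are hypothetical placeholders.
    """
    # Trailing slashes make rsync copy the directory contents, not the directory itself.
    src = old_mount.rstrip("/") + "/"
    dst = new_mount.rstrip("/") + "/"
    rsync = ["rsync", "-aHAX", "--numeric-ids", "--delete", src, dst]
    steps = [
        ["mkfs.ext4", new_dev],            # new smaller volume
        ["mkdir", "-p", new_mount],
        ["mount", new_dev, new_mount],
        rsync,                             # bulk copy while services still run
        ["systemctl", "stop", service],    # stop writers
        rsync,                             # final sync picks up the delta
        ["umount", old_mount],
        ["umount", new_mount],
        ["mount", new_dev, old_mount],     # new volume takes over the old path
        ["systemctl", "start", service],
    ]
    return [shlex.join(step) for step in steps]


if __name__ == "__main__":
    for cmd in migration_plan("/dev/nvme2n1", "/data", "/mnt/new", "myapp.service"):
        print(cmd)
```

The sketch leaves out the /etc/fstab (or UUID) update and any rollback path, which is where most of the praying happens.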
monitoring/cost tools can tell us “hey, you’re wasting storage,” but from the linux side the answer is usually “yeah, and i’d rather waste storage than break a stable system.”
how are people handling this now?
do you just accept that live filesystems are mostly a one-way street, or has anyone found a cleaner way to reclaim space without doing the whole migration dance?
bitcraft@reddit
Shrinking a filesystem is very rare. These days it’s usually a matter of changing the config and rebuilding the image. If you run into this problem often, maybe it’s better to fix your process to avoid needing it.
orev@reddit
They pretty clearly outlined a decent use-case on what happened, so saying they need to completely rearchitect their whole system because of a one-off issue isn’t helpful at all. Neither is the fantasy that every single system can be reduced down to a container/image that is fully idempotent with never a need to login to it interactively.
Sure, it’s possible for people running at huge scale with hundreds of clones of the same system, but it’s not realistic for more dynamic and varied environments.
We need to stop acting like everything can be solved by moving to a fully automated, read-only, devops style approach, and stop giving advice as if that’s the only way to do things.
alive1@reddit
Look, no matter your architecture, treating systems like unique unicorns that can't just be shot in the head and reprovisioned on the spot is an anti-pattern. If you are at that level of babying a system, you can probably afford the extra disk storage or have the engineering capability to right-size your storage to begin with. Having to shrink your filesystem presumes so many anti-patterns that nobody competent enough to implement it will need the feature in the first place. This is the reality of why shrinking filesystems is so rare.
orev@reddit
As I said, there are situations where the “just reprovision it” idea works, but there are many more where it just doesn’t. Usually people who think that it’s the only way also conveniently ignore all the other ancillary systems that need to be there to support the “main” system. While it might make sense for the “main” system to be set up this way (fully automated, etc.), it’s just not the reality that *every* system is like this. It’s just not possible given most companies’ pressures and (lack of) resources.
wossack@reddit
we came from AIX with jfs2 filesystems (a long time ago) and we had a lot of (bad in retrospect) processes around the dynamic growing/shrinking of filesystems that it provides.
We changed to Red Hat & xfs >10 years ago, and we were initially concerned alright, but we all adapted very quickly - it rarely comes up these days.
ksquires1988@reddit
I miss very little about AIX, one exception being the ability to shrink filesystems on the fly
rmeman@reddit
No it's not. I do it routinely with zfs, remove vdevs, add vdevs. It automatically rebalances. I think btrfs can do it too
Hotshot55@reddit
If you're routinely shrinking filesystems, then all that really means is you're routinely over-provisioning your infrastructure.
oracleofnonsense@reddit
Zfs/btrfs — these are also relatively rare compared to the use of xfs and ext3/4.
kai_ekael@reddit
I smell lack of LVM. Too bad.
Cracknel@reddit
LVM doesn't help if the filesystem doesn't support shrinking 😅
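To make the ordering concrete, here is a rough sketch of what an LVM-backed shrink looks like when the filesystem does support it. It assumes ext4 on a logical volume and only builds command strings; the LV path, mount point and size are hypothetical, and the xfs branch simply reflects the "no shrink" point made in this thread.

```python
import shlex


def lv_shrink_plan(fstype, lv_path, mount_point, new_size):
    """Return commands to shrink a logical volume together with its filesystem.

    Only builds strings; nothing is run. Arguments are hypothetical placeholders.
    """
    if fstype == "xfs":
        # As noted in the thread: no shrink on xfs, so LVM alone cannot help.
        raise ValueError("xfs cannot be shrunk; migrate to a new, smaller volume instead")
    if fstype != "ext4":
        raise ValueError(f"this sketch only covers ext4, got {fstype!r}")
    return [
        shlex.join(["umount", mount_point]),  # ext4 can only be shrunk offline
        # --resizefs shrinks the filesystem before the LV, so the LV is never cut below it.
        shlex.join(["lvreduce", "--resizefs", "--size", new_size, lv_path]),
        shlex.join(["mount", lv_path, mount_point]),
    ]
```

Calling `lv_shrink_plan("ext4", "/dev/vg0/data", "/data", "100G")` returns three commands; passing `"xfs"` raises instead, which is the whole problem.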
user3872465@reddit
There should never be a reason to shrink your FS.
Start small:
If you have a big enough service it's not usable anymore:
kai_ekael@reddit
'Never be a reason' except when there is.
Yes, and there should never be bugs in software..... /S
user3872465@reddit
Sure, and when there is a reason, there is a way.
Create new, copy, delete old.
Not that complicated.
mgedmin@reddit
Back before Linux supported anti-aliased fonts, Linux forums were full of advice of the sort "anti-aliased fonts are bad you don't need them".
When Xft got introduced and Linux suddenly could do anti-aliased fonts, those posts suddenly stopped appearing.
nut-sack@reddit
Observability system. Someone creates a high cardinality metric, chews up 3TB before you can make the team kill it. Now you either delete the metrics, or let them expire out. What do you do about that extra space?
Your alternative was to just let it run out of space and impact observability for the whole org, or you grow it to buy you some time.
Hotshot55@reddit
Are you saying the solution to a bad data collection rule is just to limit how much space it can waste? It's still going to max out the allowed usage and cause a problem either way.
user3872465@reddit
Such a system should have been a staging system. Further, it should have the data stored on a separate volume from its root.
In which case you delete and give it a new, smaller volume.
If it was production because you tested in prod, well, then you leave it at 3TB because you tend to observe more in the future, where you'll probably need that space anyway.
Or: Create new, copy, delete.
Though personally I would just let the system run into its limit; since it's on a different volume, you can look at it and fix it, and it shouldn't impact much.
Unnamed-3891@reddit
Newsflash: sometimes a need for space that used to be valid ceases to be
user3872465@reddit
Or better: if it ceases to be, you can delete it entirely
Unnamed-3891@reddit
Isn’t it great how you've foreseen all possible circumstances and deemed it so for all of us. Thank you, m’lord.
user3872465@reddit
Anytime. Else you get so much legacy stuff and so much extra work for yourself.
I mean you can do whatever you wanna do ofc. But my point is a matter of best practice.
Unnamed-3891@reddit
Totally going to be planning datastore migrations after a single large file or directory is deleted 🤦‍♂️
user3872465@reddit
How big are your files or directories in your DB that you would entertain Shrinking a volume to be worth it? 500TB?
Generally your DB or fileservers have a percent-per-year growth. Sure, you delete some stuff sometimes, but that should be caught up by the growth of the system at large.
Unless you use your production system to migrate a file/DB X times the size of the system itself, in which case you are doing something wrong and using the wrong tool.
chkno@reddit
If data needs to be stored, it goes in some data-storage service. Local filesystems are temporary, ephemeral things that last for the duration of an instance. Every software update creates fresh instances. No pets.
autogyrophilia@reddit
Shrinking filesystems should be a last resort.
Trim the filesystem and use thin provisioning.
eltear1@reddit
Can you explain your sentence "trim the filesystem"?
meditonsin@reddit
Trim is a mechanism to let the underlying hardware know which blocks are not used by the filesystem anymore. This originally came about for SSDs, so their wear leveling knows which blocks it can re-use, but it's also useful for virtual disks.
Without trim, thin provisioned disks will likely eventually grow into their allocated maximum size. With trim, they only take up actually utilized space.
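A toy model may make the difference easier to see. This is purely illustrative Python, not any real hypervisor or filesystem API; the class and method names are made up for the example.

```python
class ThinDisk:
    """Toy model of a thin-provisioned virtual disk (not a real hypervisor API)."""

    def __init__(self, size_blocks):
        self.size_blocks = size_blocks  # size the guest sees
        self.backed = set()             # blocks the host has actually allocated

    def write(self, block):
        if not 0 <= block < self.size_blocks:
            raise IndexError("block outside the virtual disk")
        self.backed.add(block)          # first write allocates backing storage

    def discard(self, block):
        self.backed.discard(block)      # trim/unmap lets the host release it


class ToyFS:
    """Toy filesystem that tracks which blocks hold live data."""

    def __init__(self, disk):
        self.disk = disk
        self.in_use = set()

    def create(self, blocks):
        for b in blocks:
            self.disk.write(b)
            self.in_use.add(b)

    def delete(self, blocks):
        # Deleting only updates filesystem bookkeeping; the disk is not told.
        self.in_use.difference_update(blocks)

    def trim(self):
        # Report every block that is backed by the host but no longer in use.
        for b in self.disk.backed - self.in_use:
            self.disk.discard(b)


disk = ThinDisk(1000)
fs = ToyFS(disk)
fs.create(range(800))        # the runaway job fills 800 blocks
fs.delete(range(100, 800))   # cleanup frees 700 of them
before = len(disk.backed)    # host still backs all 800 blocks
fs.trim()
after = len(disk.backed)     # host now backs only the 100 live blocks
```

The guest-visible size (`size_blocks`) never changes in either case; only the host-side allocation does.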
deeseearr@reddit
You can run fstrim on a filesystem to mark any unused blocks as UNMAPped. It's the same command you would use to clean up an SSD, but if you are using virtual disks in VMware and have them set up correctly with thin provisioning enabled (please RTFM for everything else that you need to do) then that will also tell the ESX host that it can release the unmapped space, shrinking the file which backs that virtual disk on the host.
You won't see any difference on the VM, as it still sees the full size being available, but through the magic of thin provisioning the actual space taken up on the ESX server will be reduced.
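If you want to script it, something like the following would do. A minimal sketch: the helper names are hypothetical, the mount point is whatever you pass in, and actually running fstrim needs root plus a storage stack that passes discards through.

```python
import subprocess


def fstrim_command(mount_point, verbose=True):
    """Build the fstrim invocation for one mounted filesystem."""
    cmd = ["fstrim"]
    if verbose:
        cmd.append("--verbose")  # makes fstrim report how many bytes it discarded
    cmd.append(mount_point)
    return cmd


def run_fstrim(mount_point):
    """Run fstrim (needs root) and return its output; raises if it fails."""
    result = subprocess.run(
        fstrim_command(mount_point), check=True, capture_output=True, text=True
    )
    return result.stdout.strip()
```

Many distributions also ship a periodic fstrim.timer systemd unit, which may already cover this without any scripting.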
dougs1965@reddit
Logs on a separate FS on the original setup before you start live operations.
Normal operation.
Whoops, log FS suddenly got huge for some reason.
Make new empty small file system.
Stop services, unmount old log FS, mount new empty FS, start services. Total downtime probably quite short.
Clear up the mess on the old log FS at leisure, archive what you need and throw away the rest.
Dangle76@reddit
Tbh I’d even say offload the logs into a central log storage system and use logrotate; that way you never have to do any of this
peteShaped@reddit
100% this - you should try to avoid storing more than "last N days/weeks" logs on any local system
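logrotate is the usual tool for this, but the retention rule itself is simple enough to sketch. The following is only an illustration of "keep the last N days", with a hypothetical function name and a `*.log*` naming pattern assumed for the files; it defaults to a dry run.

```python
import time
from pathlib import Path


def prune_old_logs(log_dir, keep_days, now=None, dry_run=True):
    """Return log files older than keep_days; delete them unless dry_run."""
    now = time.time() if now is None else now
    cutoff = now - keep_days * 86400  # seconds per day
    stale = []
    for path in sorted(Path(log_dir).glob("*.log*")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            stale.append(path)
            if not dry_run:
                path.unlink()
    return stale
```

Running it first with `dry_run=True` shows what would go before anything is removed.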
Rubenel@reddit
Good points.
alexkey@reddit
No one wanted to address shrinking storage because in the past the mentality was that storage only keeps getting cheaper, so no big deal if some wastage happens. Look where we are now in 2026 :)
miscdebris1123@reddit
Trim on thin provisioning and move on.
MightyBigMinus@reddit
the 'volume got expanded' as in the EBS volume?
just nuke the instance and start over, that's the whole point of ec2
netvora@reddit
xfs not supporting shrink has probably wasted more storage across infra teams than people want to admit honestly
we had one analytics box grow during a reindex months ago and now it’s sitting there mostly empty because nobody wants to schedule downtime just to move data between volumes again. technically fixable, practically everyone keeps postponing it forever
have you looked at any of the newer storage tools for this stuff or still keeping it manual?
DahliaDevsiantBop@reddit (OP)
still mostly manual on our side tbh. if it’s important enough we do the usual migrate/swap process during maintenance windows
what are you guys using?
PrimalPettalStash@reddit
a guy on our infra team was talking about something called Datafy a few weeks ago after getting stuck doing another late night storage migration for an old postgres box
haven’t used it myself yet so can’t really vouch for it, but from what he explained the whole point was avoiding the usual “new volume + rsync + maintenance window” mess when reclaiming space from oversized disks
might be worth looking into at least
netvora@reddit
been trying Datafy recently for some lower-risk systems
mostly because we got tired of doing the same copy-data-to-new-volume routine every time we wanted to reclaim space. still testing carefully but so far it’s been a lot less annoying than the old way
wezelboy@reddit
Better to build a temporary filesystem and symlink to it than to grow one in this case.
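A rough sketch of that approach, assuming the scratch filesystem is already created and mounted, and that whatever writes to the directory has been stopped first. The function name and paths are hypothetical.

```python
import os
import shutil


def redirect_to_scratch(hot_dir, scratch_dir):
    """Move hot_dir onto scratch storage and leave a symlink at the old path.

    Returns the new real location. Callers should stop writers first.
    """
    hot_dir = os.path.abspath(hot_dir)
    if os.path.islink(hot_dir):
        raise ValueError(f"{hot_dir} is already a symlink")
    target = os.path.join(os.path.abspath(scratch_dir), os.path.basename(hot_dir))
    shutil.move(hot_dir, target)   # copies across filesystems, then removes the source
    os.symlink(target, hot_dir)    # the old path keeps working for the application
    return target
```

Once the spike is over, the reverse is removing the symlink, moving back whatever still matters, and deleting the scratch volume, so the original filesystem never has to grow.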
swing-line@reddit
Yeah, now throw LUKS into the mix. Good luck.