shrinking filesystems still feels way too painful in 2026

Posted by DahliaDevsiantBop@reddit | linuxadmin | 42 comments

ran into this again today and just need a sanity check from other linux admins.

we have a few linux boxes on ec2 and some bare metal that run data-heavy services. one job went sideways during a patch/cleanup window and dumped a bunch of temp data/logs. disk usage got high, so the volume got expanded to keep things from falling over.

cleanup finished later and actual usage dropped way back down.

so now we have a big mostly-empty volume sitting there.

growing the thing was easy. shrinking it back down is where everything gets annoying.

with xfs, there’s no shrink. with ext4, you’re basically looking at unmounting and doing it carefully. in practice that usually turns into:

new smaller volume
rsync data over
stop services
final sync
swap mounts/uuids
pray the old app doesn’t hate you
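
For anyone following along, the steps above can be sketched as a small plan builder. This is only a sketch under assumptions: the mount points, device name, and service name are hypothetical, it only builds and prints the commands rather than running them, and it leaves the fstab/UUID update as a manual step.

```python
import shlex


def migration_plan(old_mount, new_device, staging_mount, services):
    """Build the ordered commands for a copy-then-cutover migration.

    Nothing is executed here; the caller reviews the plan first.
    All paths, devices, and service names are supplied by the caller.
    """
    # -a archive, -H hard links, -A ACLs, -X xattrs; --delete mirrors removals
    sync = ["rsync", "-aHAX", "--delete", f"{old_mount}/", f"{staging_mount}/"]

    plan = [sync]                                          # bulk copy, services still up
    plan += [["systemctl", "stop", s] for s in services]   # quiesce writers
    plan.append(sync)                                      # final sync of the last changes
    plan.append(["umount", staging_mount])                 # release the new volume
    plan.append(["umount", old_mount])                     # release the old volume
    plan.append(["mount", new_device, old_mount])          # new volume takes the old path
    plan += [["systemctl", "start", s] for s in services]
    return plan


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    for cmd in migration_plan("/data", "/dev/xvdg", "/mnt/new", ["myapp"]):
        print(shlex.join(cmd))
```

Updating /etc/fstab (or the UUID reference) is deliberately not in the plan, since that is the part most worth checking by hand.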

monitoring/cost tools can tell us “hey, you’re wasting storage,” but from the linux side the answer is usually “yeah, and i’d rather waste storage than break a stable system.”

how are people handling this now?

do you just accept that live filesystems are mostly a one-way street, or has anyone found a cleaner way to reclaim space without doing the whole migration dance?

bitcraft@reddit

Shrinking a filesystem is very rare. These days it’s usually a matter of changing the config and rebuilding the image. If you run into this problem often, maybe it’s better to fix your process to avoid needing it.

orev@reddit

They pretty clearly outlined a decent use-case on what happened, so saying they need to completely rearchitect their whole system because of a one-off issue isn’t helpful at all. Neither is the fantasy that every single system can be reduced down to a container/image that is fully idempotent with never a need to login to it interactively.

Sure, it’s possible for people running at huge scale with hundreds of clones of the same system, but it’s not realistic for more dynamic and varied environments.

We need to stop acting like everything can be solved by moving to a fully automated, read-only, devops style approach, and stop giving advice as if that’s the only way to do things.

alive1@reddit

Look, no matter your architecture, treating systems like unique unicorns that can't just be shot in the head and reprovisioned on the spot is an anti-pattern. If you are at that level of babying a system, you can probably afford the extra disk storage or have the engineering capability to right-size your storage to begin with. Having to shrink your filesystem presumes so many anti-patterns that nobody competent enough to implement it will need the feature in the first place. This is the reality of why shrinking filesystems is so rare.

orev@reddit

As I said, there are situations where the “just reprovision it” idea works, but there are many more where it just doesn’t. Usually people who think that it’s the only way also conveniently ignore all the other ancillary systems that need to be there to support the “main” system. While it might make sense for the “main” system to be set up this way (fully automated, etc.), it’s just not the reality that *every* system is like this. It’s just not possible given most companies’ pressures and (lack of) resources.

wossack@reddit

We came from AIX with jfs2 filesystems (a long time ago), and we had a lot of (bad, in retrospect) processes built around the dynamic growing and shrinking of filesystems that it provides.

We changed to Red Hat & xfs more than 10 years ago, and we were initially concerned, but we all adapted very quickly. It rarely comes up these days.

ksquires1988@reddit

I miss very little about AIX, one exception being the ability to shrink filesystems on the fly.

rmeman@reddit

No it's not. I do it routinely with zfs, remove vdevs, add vdevs. It automatically rebalances. I think btrfs can do it too

Hotshot55@reddit

If you're routinely shrinking filesystems, then all that really means is you're routinely over-provisioning your infrastructure.

oracleofnonsense@reddit

Zfs/btrfs — these are also relatively rare compared to the use of xfs and ext3/4.

kai_ekael@reddit

I smell lack of LVM. Too bad.

Cracknel@reddit

LVM doesn't help if the filesystem doesn't support shrinking 😅
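
To make that concrete, here is a rough lookup of which common filesystems can shrink at all. It is a sketch, not an authoritative matrix: the table and the function name are made up for illustration, and it only encodes the general behaviour (ext2/3/4 shrink offline via resize2fs, btrfs can shrink while mounted, xfs has no general-purpose shrink).

```python
# Illustrative table; the keys and labels are this example's own convention.
SHRINK_SUPPORT = {
    "ext2": "offline",   # resize2fs can shrink, but only while unmounted
    "ext3": "offline",
    "ext4": "offline",
    "btrfs": "online",   # `btrfs filesystem resize` can shrink a mounted fs
    "xfs": "none",       # xfs_growfs grows; no general-purpose shrink
}


def lv_shrink_advice(fstype):
    """Say what has to happen before the logical volume can be reduced."""
    support = SHRINK_SUPPORT.get(fstype, "unknown")
    if support == "online":
        return "shrink the filesystem while mounted, then reduce the LV"
    if support == "offline":
        return "unmount, fsck, shrink the filesystem, then reduce the LV"
    if support == "none":
        return "no in-place shrink: create a smaller LV and copy the data"
    return "unknown filesystem: check its documentation before touching the LV"


if __name__ == "__main__":
    for fs in ("ext4", "xfs", "btrfs"):
        print(f"{fs}: {lv_shrink_advice(fs)}")
```

Either way the order matters: the filesystem has to shrink first, and only then the block device underneath it.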

user3872465@reddit

There should never be a reason to shrink your FS.

Start small:

Increase size

If the service gets so big that it's not usable anymore:

Migrate and Divide it.

'Never be a reason' except when there is.

Yes, and there should never be bugs in software..... /S

Sure, and when there is a reason, there is a way.

Create new, copy, delete old.

Not that complicated.

mgedmin@reddit

Back before Linux supported anti-aliased fonts, Linux forums were full of advice of the sort "anti-aliased fonts are bad you don't need them".

When Xft got introduced and Linux suddenly could do anti-aliased fonts, those posts suddenly stopped appearing.

nut-sack@reddit

Observability system. Someone creates a high cardinality metric, chews up 3TB before you can make the team kill it. Now you either delete the metrics, or let them expire out. What do you do about that extra space?
Your alternative was to just let it run out of space and impact observability for the whole org, or you grow it to buy you some time.

chews up 3TB before you can make the team kill it. Now you either delete the metrics, or let them expire out. What do you do about that extra space?

Are you saying the solution to a bad data collection rule is just to limit how much space it can waste? It's still going to max out the allowed usage and cause a problem either way.

Such a system should have been a staging system. Further, it should have its data stored on a separate volume from its root.

In which case you delete it and provision a new, smaller volume.

If it was production because you tested in prod, well, then you leave it at 3TB, because you tend to observe more in the future, so you will probably need that space anyway.

Or: Create new, copy, delete.

Though personally I would just let the system run into its limit. Since it's on a different volume, you can look at it and fix it, and it shouldn't impact much.

Unnamed-3891@reddit

Newsflash: sometimes a need for space that used to be valid ceases to be.

Migrate it. If it's that old, you should redo and reinstall what you did anyway.

Or better: if the need ceases to exist, you can delete it entirely.

Isn’t it great how you foresaw all possible circumstances and deemed it so for all of us. Thank you, m’lord.

Anytime. Else you get so much legacy stuff and so much extra work for yourself.

I mean you can do whatever you wanna do ofc. But my point is a matter of best practice.

Totally going to be planning datastore migrations after a single large file or directory is deleted 🤦‍♂️

How big are your files or directories in your DB that you would entertain Shrinking a volume to be worth it? 500TB?

Generally your DB or file servers have some percentage of growth per year. Sure, you delete some stuff sometimes, but that should be made up for by the growth of the system at large.

Unless you use your production system to migrate a file/DB several times the size of the system itself, in which case you are doing something wrong and using the wrong tool.
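
As a rough illustration of the growth argument above, this sketch estimates how long steady compound growth takes to refill a volume. The function name and the numbers in the usage line are hypothetical, and it assumes a constant yearly growth rate, which real systems rarely have.

```python
import math


def years_until_refilled(used_tb, capacity_tb, annual_growth):
    """Years for `used_tb` to reach `capacity_tb` at a constant yearly growth rate.

    annual_growth is a fraction, e.g. 0.2 for 20% per year.
    Solves used * (1 + g) ** years == capacity for years.
    """
    if used_tb <= 0 or capacity_tb < used_tb or annual_growth <= 0:
        raise ValueError("need 0 < used <= capacity and a positive growth rate")
    return math.log(capacity_tb / used_tb) / math.log(1 + annual_growth)


if __name__ == "__main__":
    # Hypothetical: 1 TB in use on a volume grown to 4 TB, 20% growth per year.
    print(f"{years_until_refilled(1, 4, 0.2):.1f} years")
```

If the answer comes out at many years, that is the case where the empty space really is just waste.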

chkno@reddit

If data needs to be stored, it goes in some data-storage service. Local filesystems are temporary, ephemeral things that last for the duration of an instance. Every software update creates fresh instances. No pets.

autogyrophilia@reddit

Shrinking filesystems should be a last resort.

Trim the filesystem and use thin provisioning.

eltear1@reddit

Can you explain your sentence trim the filesystem ?

meditonsin@reddit

Trim is a mechanism to let the underlying hardware know which blocks are not used by the filesystem anymore. This originally came about for SSDs, so their wear leveling knows which blocks it can re-use, but it's also useful for virtual disks.

Without trim, thin provisioned disks will likely eventually grow into their allocated maximum size. With trim, they only take up actually utilized space.
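
A toy model of that behaviour, in case it helps. This is not how any real hypervisor or storage array is implemented; the class and block numbers are invented purely to show why deleting files alone does not give space back to the host.

```python
class ThinDisk:
    """Toy thin-provisioned disk: real space is only used for written blocks."""

    def __init__(self, virtual_blocks):
        self.virtual_blocks = virtual_blocks   # size the guest sees
        self.backed = set()                    # blocks consuming space on the host

    def write(self, block):
        if not 0 <= block < self.virtual_blocks:
            raise IndexError("block outside the virtual disk")
        self.backed.add(block)                 # first write allocates backing space

    def trim(self, blocks):
        # The guest tells the host these blocks are no longer in use.
        self.backed.difference_update(blocks)

    @property
    def backed_size(self):
        return len(self.backed)


def demo():
    disk = ThinDisk(virtual_blocks=1000)
    for block in range(800):                   # runaway job fills most of the disk
        disk.write(block)
    # The guest deletes the files in blocks 100..799, but the host is not told.
    after_delete = disk.backed_size
    disk.trim(range(100, 800))                 # trim reports the freed blocks
    after_trim = disk.backed_size
    return after_delete, after_trim


if __name__ == "__main__":
    print(demo())
```

The guest-visible size stays at 1000 blocks throughout; only the host-side usage changes.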

deeseearr@reddit

You can run fstrim on a filesystem to mark any unused blocks as UNMAPped. It's the same command you would use to clean up an SSD, but if you are using virtual disks in vmware and have them set up correctly with thin provisioning enabled (Please RTFM for everything else that you need to do) then that will also tell the ESX host that it can release the unmapped space, shrinking the file which backs that virtual disk on the host.

You won't see any difference on the VM, as it still sees the full size being available, but through the magic of thin provisioning the actual space taken up on the ESX server will be reduced.
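
If you want to script it, a thin wrapper is enough. This sketch assumes util-linux fstrim is on the PATH, that it runs as root, and that the storage stack underneath actually passes discards through; the mount point in the usage line is hypothetical.

```python
import subprocess


def fstrim_command(mountpoint):
    """Build the fstrim invocation; -v makes it report how much was trimmed."""
    return ["fstrim", "-v", mountpoint]


def run_fstrim(mountpoint):
    """Run fstrim on one mounted filesystem and return its report line.

    Raises subprocess.CalledProcessError if fstrim fails, for example when
    the device does not support discard.
    """
    result = subprocess.run(
        fstrim_command(mountpoint),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Hypothetical mount point; needs root.
    print(run_fstrim("/data"))
```

Many distros also ship a periodic fstrim timer, so check whether one is already enabled before adding your own job.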

dougs1965@reddit

Logs on a separate FS on the original setup before you start live operations.
Normal operation.
Whoops, log FS suddenly got huge for some reason.
Make new empty small file system.
Stop services, unmount old log FS, mount new empty FS, start services. Total downtime probably quite short.
Clear up the mess on the old log FS at leisure, archive what you need and throw away the rest.

Dangle76@reddit

Tbh I’d even say offload the logs into a central log storage system and use logrotate; that way you never have to do any of this.

peteShaped@reddit

100% this - you should try to avoid storing more than "last N days/weeks" logs on any local system
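
logrotate is the usual tool for this, but as a sketch of the "last N days" idea, this lists the files that fall outside the retention window. The function name and the directory in the usage line are hypothetical, and it only reports candidates rather than deleting anything.

```python
import time
from pathlib import Path


def logs_older_than(log_dir, days, now=None):
    """Return files directly inside log_dir whose mtime is older than `days`."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400                # seconds in a day
    return sorted(
        path
        for path in Path(log_dir).iterdir()
        if path.is_file() and path.stat().st_mtime < cutoff
    )


if __name__ == "__main__":
    # Hypothetical directory; review the list before removing anything.
    for path in logs_older_than("/var/log/myapp", days=14):
        print(path)
```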

Rubenel@reddit

Good points.

alexkey@reddit

No one wanted to address shrinking storage because in the past the mentality was that storage only keeps getting cheaper, so no big deal if some wastage happens. Look where we are now in 2026 :)

miscdebris1123@reddit

Trim on thin provisioning and move on.

MightyBigMinus@reddit

the 'volume got expanded' as in the EBS volume?

just nuke the instance and start over, that's the whole point of ec2

netvora@reddit

xfs not supporting shrink has probably wasted more storage across infra teams than people want to admit honestly

we had one analytics box grow during a reindex months ago and now it’s sitting there mostly empty because nobody wants to schedule downtime just to move data between volumes again. technically fixable, practically everyone keeps postponing it forever

have you looked at any of the newer storage tools for this stuff or still keeping it manual?

DahliaDevsiantBop@reddit (OP)

still mostly manual on our side tbh. if it’s important enough we do the usual migrate/swap process during maintenance windows

what are you guys using?

PrimalPettalStash@reddit

a guy on our infra team was talking about something called Datafy a few weeks ago after getting stuck doing another late night storage migration for an old postgres box

haven’t used it myself yet so can’t really vouch for it, but from what he explained the whole point was avoiding the usual “new volume + rsync + maintenance window” mess when reclaiming space from oversized disks

might be worth looking into at least

been trying Datafy recently for some lower-risk systems

mostly because we got tired of doing the same copy-data-to-new-volume routine every time we wanted to reclaim space. still testing carefully but so far it’s been a lot less annoying than the old way

wezelboy@reddit

Better to build a temporary filesystem and symlink to it than to grow one in this case.
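
A minimal sketch of that approach, assuming the app's temp directory is empty (or absent) and the scratch filesystem is already mounted. The function name and both paths are hypothetical.

```python
from pathlib import Path


def redirect_to_scratch(app_tmp, scratch_dir):
    """Replace an empty temp directory with a symlink to a scratch location.

    Refuses to touch a directory that still has content, so nothing is lost.
    """
    app_tmp, scratch_dir = Path(app_tmp), Path(scratch_dir)
    scratch_dir.mkdir(parents=True, exist_ok=True)

    if app_tmp.is_symlink():
        raise FileExistsError(f"{app_tmp} is already a symlink")
    if app_tmp.exists():
        if any(app_tmp.iterdir()):
            raise OSError(f"{app_tmp} is not empty; move its contents first")
        app_tmp.rmdir()                        # remove the empty original

    app_tmp.symlink_to(scratch_dir, target_is_directory=True)
    return app_tmp


if __name__ == "__main__":
    # Hypothetical paths: the scratch volume is mounted at /mnt/scratch.
    redirect_to_scratch("/srv/myapp/tmp", "/mnt/scratch/myapp-tmp")
```

When the job is done, remove the symlink, recreate the directory, and detach the scratch volume. The main filesystem never grew.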

swing-line@reddit

Yeah, now throw LUKS into the mix. Good luck.