What are you using for multi-petabyte backup targets?
Posted by cezaryd@reddit | sysadmin | View on Reddit | 15 comments
We’ve been working on backup storage for large environments for years, and one problem keeps coming back:
Traditional backup targets don’t scale well beyond a certain point — performance drops and cost per TB goes up fast.
I’ve seen deployments with 100+ nodes and ~100+ PB of logical backup data in a single grid that avoid many of those bottlenecks, which made me curious how others are approaching this at scale.
Curious how others here are handling:
- multi-petabyte backup targets
- backup performance vs cost trade-offs
Would really appreciate feedback (or criticism) from people running large environments.
Kumorigoe@reddit
Sorry, it seems this comment or thread has violated a sub-reddit rule and has been removed by a moderator.
Do Not Conduct Marketing Operations Within This Community.
Your content may be better suited for our companion sub-reddit: /r/SysAdminBlogs
If you wish to appeal this action please don't hesitate to message the moderation team.
chefkoch_@reddit
What kind of data do you need to backup?
cezaryd@reddit (OP)
For example all backups of a large bank.
thortgot@reddit
Literally no bank in the world is 100 PB
Sab159@reddit
Smells like next post you'll try to sell us something
gumbrilla@reddit
Well, they are only curious, not genuinely curious... are you sure?
Evil-Bosse@reddit
That's why I store all of my multi petabyte backups on floppy disks, I have 200 dedicated staff that just run around replacing floppies, so I can have full daily backups of my Linux ISO hoard
elatllat@reddit
Backblaze does 17:3 data:parity shards on 1,000 PB
NASA did 4 PB via sneakernet
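The 17:3 shard ratio mentioned above implies a fairly small storage overhead compared with plain replication. A back-of-envelope sketch (the 17+3 split is from the comment; everything else is illustrative arithmetic, not Backblaze's actual implementation):

```python
# Raw-capacity overhead of a 17-data / 3-parity erasure-coded layout,
# compared with simple 3x full-copy replication.

DATA_SHARDS = 17
PARITY_SHARDS = 3

def raw_bytes_needed(logical_bytes: float) -> float:
    """Raw capacity consumed to store logical_bytes with 17+3 erasure coding."""
    return logical_bytes * (DATA_SHARDS + PARITY_SHARDS) / DATA_SHARDS

overhead = (DATA_SHARDS + PARITY_SHARDS) / DATA_SHARDS  # ~1.18x raw space
replication = 3.0  # 3x replication for comparison

# A 17+3 stripe survives the loss of any 3 shards at ~1.18x raw space,
# versus 3x raw space for triple replication (which survives 2 losses).
print(f"erasure overhead: {overhead:.3f}x, replication: {replication:.1f}x")
```

This is why erasure coding, not replication, is the usual choice once a backup target crosses into the multi-petabyte range: the durability-per-raw-terabyte trade-off is dramatically better.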
rsync can read just the modification time of every file, read only the changed files, and send only the changed parts of those files.
rsync can be wrapped in a move-detection script, but it's better to just use btrfs/zfs/postgresql/etc. that do CoW/WAL/etc.
C39J@reddit
Account with one comment and a ChatGPT-written post about something relatively specialised... Yeah, there's gonna be a second new account come along that suggests some random abstract service that nobody's ever heard of.
petr_bena@reddit
probably vibecoded as well
cezaryd@reddit (OP)
wrong guess
cezaryd@reddit (OP)
It is not an abstract service; the technology is deployed with 100+ node grids.
C39J@reddit
Hahaha wow didn't even deploy the secondary account 🤣
cezaryd@reddit (OP)
No, I am interested in what architectures/products are deployed to address these challenges.
Skrunky@reddit
This is very clearly astroturfing.