What are you using for multi-petabyte backup targets?
Posted by cezaryd@reddit | sysadmin | View on Reddit | 15 comments
We’ve been working on backup storage for large environments for years, and one problem keeps coming back:
Traditional backup targets don’t scale well beyond a certain point — performance drops and cost per TB goes up fast.
I’ve seen deployments with 100+ nodes and ~100+ PB of logical backup data in a single grid that avoid many of those bottlenecks, which made me curious how others are approaching this at scale.
Curious how others here are handling:
- multi-petabyte backup targets
- backup performance vs cost trade-offs
Would really appreciate feedback (or criticism) from people running large environments.
Kumorigoe@reddit
Sorry, it seems this comment or thread has violated a sub-reddit rule and has been removed by a moderator.
Do Not Conduct Marketing Operations Within This Community.
Your content may be better suited for our companion sub-reddit: /r/SysAdminBlogs
If you wish to appeal this action please don't hesitate to message the moderation team.
chefkoch_@reddit
What kind of data do you need to backup?
cezaryd@reddit (OP)
For example all backups of a large bank.
thortgot@reddit
Literally no bank in the world is 100 PB
Sab159@reddit
Smells like next post you'll try to sell us something
gumbrilla@reddit
Well, they are only curious, not genuinely curious... are you sure?
Evil-Bosse@reddit
That's why I store all of my multi petabyte backups on floppy disks, I have 200 dedicated staff that just run around replacing floppies, so I can have full daily backups of my Linux ISO hoard
elatllat@reddit
Backblaze does 17:3 data:parity shards on 1,000 PB
NASA did 4 PB via sneakernet
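The 17:3 shard ratio mentioned above implies a fairly small storage overhead compared with plain replication. A back-of-envelope sketch (the 17+3 split is from the comment; everything else is illustrative arithmetic, not Backblaze's actual implementation):

```python
# Raw-capacity overhead of a 17-data / 3-parity erasure-coded layout,
# compared with simple 3x full-copy replication.

DATA_SHARDS = 17
PARITY_SHARDS = 3

def raw_bytes_needed(logical_bytes: float) -> float:
    """Raw capacity consumed to store logical_bytes with 17+3 erasure coding."""
    return logical_bytes * (DATA_SHARDS + PARITY_SHARDS) / DATA_SHARDS

overhead = (DATA_SHARDS + PARITY_SHARDS) / DATA_SHARDS  # ~1.18x raw space
replication = 3.0  # 3x replication for comparison

# A 17+3 stripe survives the loss of any 3 shards at ~1.18x raw space,
# versus 3x raw space for triple replication (which survives 2 losses).
print(f"erasure overhead: {overhead:.3f}x, replication: {replication:.1f}x")
```

This is why erasure coding, not replication, is the usual choice once a backup target crosses into the multi-petabyte range: the durability-per-raw-terabyte trade-off is dramatically better.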
rsync can read just the modification time of every file, read only the changed files, and send only the changed parts of those files.
rsync can be wrapped in a move-detection script, but it's better to just use btrfs/zfs/postgresql/etc. that do CoW/WAL/etc.
C39J@reddit
Account with one comment and a ChatGPT-written post about something relatively specialised... Yeah, there's gonna be a second new account come along that suggests some random abstract service that nobody's ever heard of.
petr_bena@reddit
probably vibecoded as well
cezaryd@reddit (OP)
wrong guess
cezaryd@reddit (OP)
It is not an abstract service; the technology is deployed with 100+ node grids.
C39J@reddit
Hahaha wow didn't even deploy the secondary account 🤣
cezaryd@reddit (OP)
No, I am interested in what architectures/products are deployed to address these challenges.
Skrunky@reddit
This is very clearly astroturfing.