Replacing duplicate files with hard links to save space?

Posted by Zarquan314@reddit | sysadmin

Whenever I move from one computer to another, I copy the important directories from my home folder to a backup location. This is separate from my standard backup solution and acts as a sort-of snapshot of that computer at the point I stopped using it, which has been very useful. However, these folders often contain backups of previous computers, some of which have been unpacked and placed in the correct location on the computer I am moving away from.

For example, I looked through my backup and found 7 different copies of my entire music library. Most of the songs are exact copies, with some being added over time.

This hasn't been a problem so far, because storage sizes were increasing faster than my backups were (see XKCD 1718). But I've noticed that this trend has slowed down or stopped, so I want to go through the many generations of old computer backups and do something about the duplicate data.

My thinking is that it would be nice to have something that replaces identical copies of files with read-only hard links. That way, everything is still where I expect it in the directory tree, but there aren't a bunch of copies taking up actual disk space. And making the files read-only prevents me from accidentally changing my "historical records". Is there a utility that can do that for me so I don't have to do it manually?

Or is there a better solution?
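
To make the "manual" version concrete, here is a minimal Python sketch of what I have in mind. It is an illustration under assumptions, not a tested tool: it assumes a POSIX-style filesystem, assumes everything lives on filesystems that support hard links, and the function names (`file_digest`, `dedupe_with_hard_links`) and the `.dedupe-tmp` suffix are made up for this example. It groups files by device and size, compares SHA-256 digests within each group, replaces later duplicates with hard links to the first copy, and clears the write bits on the linked files.

```python
import hashlib
import os
import stat
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def dedupe_with_hard_links(root):
    """Replace identical regular files under root with read-only hard links.

    Returns the number of files that were replaced by a link.
    (Hypothetical helper written for this post, not an existing utility.)
    """
    # Group candidates by (device, size): hard links cannot cross
    # filesystems, and files of different sizes cannot be identical.
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            # Skip symlinks, special files, and empty files.
            if stat.S_ISREG(st.st_mode) and st.st_size > 0:
                groups[(st.st_dev, st.st_size)].append(path)

    replaced = 0
    for paths in groups.values():
        if len(paths) < 2:
            continue
        keeper_for = {}   # digest -> first path seen with that content
        linked = set()    # keepers that now have at least one duplicate linked
        for path in sorted(paths):
            keeper = keeper_for.setdefault(file_digest(path), path)
            if keeper == path:
                continue
            linked.add(keeper)
            if os.path.samefile(keeper, path):
                continue  # already the same inode, nothing to do
            # Link under a temporary name, then rename over the duplicate,
            # so the path never disappears if something fails midway.
            tmp = path + ".dedupe-tmp"
            os.link(keeper, tmp)
            os.replace(tmp, path)
            replaced += 1
        for keeper in linked:
            # Permissions live on the inode, so this makes every link read-only.
            mode = stat.S_IMODE(os.stat(keeper).st_mode)
            os.chmod(keeper, mode & ~0o222)
    return replaced
```

Calling `dedupe_with_hard_links("/path/to/backups")` (placeholder path) would process one backup tree. One caveat with this approach: hard-linked files share a single inode, so the duplicates also end up sharing one modification time, owner, and permission set, which may matter if the timestamps are part of the "historical record".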

EDIT: I posted this earlier, but accidentally had the wrong title, so I deleted my first post and replaced it.