How to improve record keeping / querying of archived data?

Posted by MESltd@reddit | sysadmin

Hi all, I am looking for some advice on how we can improve our data archiving and restore processes. My main question is: how do people maintain records of what data they have stored?

---------

TLDR - Our current approach of scanning drive directory structures and writing the output to html isn't fit for purpose when it comes to searching for archived files. Looking for advice on an alternative method that would let end users search more efficiently and see what data is available to them in older projects.

---------

Currently we have 25 hard disks, storing approximately 120TB of data. These disks are duplicated, so we have 25 hard disks on site in a fire safe and a further 25 duplicate hard disks off site in a fire safe.

To record what is on each disk, we use an application called Snap2HTML which scans the drive and creates a navigable html file containing files and folders stored on the disk. If a user wants to request data to be restored, they go through these html files searching for what they need, then provide us with the hard disk number and path to the file(s) they want restored.

We have been experiencing some problems with hard disks failing to read when we come to restore data. When that happens, we are left hoping the paired off site disk is fine, so that we can restore the requested data from it and rebuild the on site disk.
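For anyone suggesting ways to catch this earlier: below is a minimal sketch of how the two copies of a disk could be compared, assuming both can be mounted and read. It is not something we run today. The function names (`hash_file`, `build_manifest`, `compare_manifests`) are made up for illustration; only the Python standard library is used.

```python
import hashlib
import os


def hash_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            rel_path = os.path.relpath(full_path, root)
            manifest[rel_path] = hash_file(full_path)
    return manifest


def compare_manifests(first, second):
    """Return sorted paths that are missing from one side or differ in content."""
    all_paths = set(first) | set(second)
    return sorted(p for p in all_paths if first.get(p) != second.get(p))
```

An empty list from `compare_manifests` would mean both copies hold the same files with the same contents. Note that an unreadable file raises `OSError` in this sketch rather than being skipped, which is arguably what you want when checking disk health.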

To get around this, we are planning to assess different cloud providers and store this data with them instead of relying on our hard disks. We also want to improve how we document the archived files and make it easier for users to search our archive records. I am looking for something that would work for both us and our users. Ideally it would be some form of database, but I don't have much faith in our users being comfortable writing search queries beyond filling in a text box with a file or project name.
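To make the "database plus a single text box" idea concrete, here is a rough sketch of what I have in mind, assuming a SQLite file as the catalogue. The table layout, the function names (`build_index`, `search`) and the disk label format are all hypothetical, and this is not a replacement we have tested.

```python
import os
import sqlite3


def build_index(conn, disk_label, root):
    """Walk one disk and record every file in the catalogue. Returns the file count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files "
        "(disk TEXT, path TEXT, name TEXT, size INTEGER, mtime REAL)"
    )
    rows = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            try:
                info = os.stat(full_path)
            except OSError:
                continue  # skip entries that cannot be read
            rel_path = os.path.relpath(full_path, root)
            rows.append((disk_label, rel_path, name, info.st_size, info.st_mtime))
    conn.executemany("INSERT INTO files VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def search(conn, text, limit=50):
    """Substring match on the path, which is all a single text box needs.

    SQLite's LIKE is case-insensitive for ASCII by default. Any % or _ the
    user types is treated as a wildcard in this sketch.
    """
    cursor = conn.execute(
        "SELECT disk, path, size FROM files "
        "WHERE path LIKE ? ORDER BY disk, path LIMIT ?",
        ("%" + text + "%", limit),
    )
    return cursor.fetchall()
```

A user typing a project or file name would get back the disk label and path, which is the same information they currently dig out of the html files by hand. Whether a plain substring match scales to an archive of our size is something I can't vouch for.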

This data isn't needed for disaster recovery or regulatory reasons. It is stored purely in case an old piece of work, report or file would be useful for a new or ongoing piece of work.

Thanks