HIRING: Scrape 300,000 PDFs and Archive to 128 GB VERBATIM Discs

Posted by Atronem@reddit | Python

Budget: $700 plus the cost of required materials

We are looking for someone to extract approximately 300,000 book titles from AbeBooks.com, using specific filter parameters that we will provide.

Once the title list is compiled, the corresponding PDF files should be retrieved from the Wayback Machine or Anna’s Archive, where available. The estimated total storage requirement is about 4 TB. Data will be held on a dedicated server during collection and then transferred to 128 GB Verbatim or Panasonic optical discs for long-term preservation.
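
For a rough sense of the materials involved, the figures above can be turned into a disc count and an average file size. This is only a back-of-the-envelope sketch: it assumes decimal units (1 TB = 10^12 bytes, 1 GB = 10^9 bytes), and the `reserve_fraction` parameter is a hypothetical allowance for file-system overhead or spare space, not a figure from this post.

```python
import math

# Figures taken from the post, interpreted in decimal units (an assumption).
TOTAL_BYTES = 4 * 10**12      # about 4 TB of PDFs
DISC_BYTES = 128 * 10**9      # nominal capacity of one 128 GB disc
FILE_COUNT = 300_000          # approximate number of titles


def discs_needed(total_bytes, disc_bytes, reserve_fraction=0.0):
    """Return how many discs are needed to hold total_bytes.

    reserve_fraction is a hypothetical share of each disc left unused
    (file-system overhead, spare room); 0.0 means the full nominal capacity.
    """
    usable = disc_bytes * (1 - reserve_fraction)
    return math.ceil(total_bytes / usable)


def average_file_mb(total_bytes, file_count):
    """Return the mean file size in megabytes (10**6 bytes)."""
    return total_bytes / file_count / 10**6


if __name__ == "__main__":
    print("discs at full capacity:", discs_needed(TOTAL_BYTES, DISC_BYTES))
    print("discs with 10% reserve:", discs_needed(TOTAL_BYTES, DISC_BYTES, 0.10))
    print("average PDF size (MB):", round(average_file_mb(TOTAL_BYTES, FILE_COUNT), 1))
```

Under these assumptions a single copy of the archive needs at least 32 discs, and any redundant second copy doubles that, which matters for the materials part of the budget.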

The objective is to ensure the archive’s readability and transferability for at least 100 years, relying solely on commercially available hardware and systems.
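
One way to check readability over time is to store a checksum manifest alongside the files and re-verify it after each transfer or disc burn. The sketch below is an illustration only, not part of the stated requirements: the function names (`sha256_of`, `build_manifest`, `verify`) and the choice of SHA-256 are assumptions made for the example, and it uses nothing beyond the Python standard library.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large PDFs are never loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root, POSIX style) to its SHA-256 hex digest."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify(root, manifest):
    """Return the relative paths that are missing or whose contents have changed."""
    root = Path(root)
    failed = []
    for relative, expected in manifest.items():
        target = root / relative
        if not target.is_file() or sha256_of(target) != expected:
            failed.append(relative)
    return failed
```

In use, `build_manifest` would be run once on the server copy, and `verify` would be run against each burned disc (or each later migration) with that saved manifest; an empty list means every file still matches.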