Looking for a tool to split up large PDF files.
Posted by WizardOfIF@reddit | sysadmin | 10 comments
I have to work around two applications with some interesting limits. The first one produces a PDF file that is over 2 GB in size. The second one will only ingest files smaller than 250 MB. I need to get the excessively large PDF into the restricted long term storage application.
For text files I just break them up into 100 MB files, import them piece by piece, and then stitch them back together in the long term storage.
I'm looking for a tool that I can call from a script and have it either break apart a PDF file by byte size adjusted for page breaks if possible or by page count if necessary.
Has anyone found a good tool that could handle this? The recombining of files is handled natively by the long term storage. It will piece PDF files together but it isn't able to break them apart, not programmatically anyway.
Mysterious-Guess-690@reddit
For a quick no-install web option, HugMyPDF (hugmypdf.com) has a Split PDF tool where you can split by page range or extract specific pages. Might not handle 2GB files directly in browser, but could be handy for smaller chunks once you've done an initial size-based split via script.
For your scripting use case specifically, pdftk or qpdf via the CLI would be your best bet: both are free and scriptable. Note that they split by page range rather than by byte size, so you would pick a page count that keeps each chunk under the limit. You could also run each chunk through HugMyPDF's compress tool afterward to shrink it further below the 250 MB limit. 👍
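A minimal sketch of the qpdf route, assuming the qpdf binary is installed and on the PATH. It uses qpdf's `--split-pages=n` option, which groups output by page count, so `pages_per_chunk` would need tuning until every output file lands under 250 MB. The function names and file names are hypothetical.

```python
import subprocess

def qpdf_split_command(src, out_pattern, pages_per_chunk):
    """Build a qpdf command that writes pages_per_chunk pages per output file.

    qpdf replaces %d in out_pattern with the page range of each chunk.
    """
    if pages_per_chunk < 1:
        raise ValueError("pages_per_chunk must be at least 1")
    return ["qpdf", f"--split-pages={pages_per_chunk}", src, out_pattern]

def split_with_qpdf(src, out_pattern, pages_per_chunk):
    # Needs the qpdf binary; check=True raises CalledProcessError on any
    # non-zero exit status.
    subprocess.run(qpdf_split_command(src, out_pattern, pages_per_chunk), check=True)
```

A call such as `split_with_qpdf("big.pdf", "part-%d.pdf", 200)` would then be followed by a size check on each output before import.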
C_pyne@reddit
Pdf24
whitoreo@reddit
Arj will split up a file to an arbitrary size and can be used to re-combine the file after transport.
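The same split-and-rejoin idea can be sketched in plain Python, with no Arj dependency. This is only an illustration with hypothetical names: the parts are raw byte slices, not valid PDFs on their own, so it helps only if the receiving side can concatenate arbitrary binary parts.

```python
from pathlib import Path

def split_bytes(src, out_dir, chunk_size=100 * 1024 * 1024):
    """Write src as numbered parts of at most chunk_size bytes; return the part paths."""
    src = Path(src)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    parts = []
    with src.open("rb") as f:
        index = 1
        while True:
            data = f.read(chunk_size)
            if not data:
                break
            part = out_dir / f"{src.name}.part{index:04d}"
            part.write_bytes(data)
            parts.append(part)
            index += 1
    return parts

def join_bytes(parts, dest):
    """Concatenate the parts, in the order given, to rebuild the original file."""
    with open(dest, "wb") as out:
        for part in parts:
            out.write(Path(part).read_bytes())
```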
itskdog@reddit
Another shout for PDFsam. If you don't have Adobe licences that's the best tool I've found for tweaking PDFs.
nico851@reddit
Not sure if applicable for your case, but I normally just open the file and then selectively print pages like 1-5, 6-10 and so on.
Adam_Kearn@reddit
First I would try contacting the people who made the 2nd tool to see if the limit can be removed….
For now I would just create a PowerShell script that would “print to PDF” the first X pages, where X is chosen so the output stays under the file size limit.
Then print the remaining pages the same way, appending “_part01” to each file name and incrementing the number.
Only-An-Egg@reddit
r/techsupport
timpkmn89@reddit
That would be complicated for PDFs because of how they are structured, especially if there are embedded fonts/images that are repeated between pages.
Splitting by page count, however, could trivially be done with a few lines of Python code.
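A sketch of what those few lines might look like, assuming the third-party pypdf package (`pip install pypdf`). The function names and the `_partNN` naming are hypothetical, and memory use on a 2 GB input has not been tested here.

```python
from pathlib import Path

def page_ranges(total_pages, pages_per_chunk):
    """Return half-open (start, stop) page index ranges covering all pages."""
    if pages_per_chunk < 1:
        raise ValueError("pages_per_chunk must be at least 1")
    return [(start, min(start + pages_per_chunk, total_pages))
            for start in range(0, total_pages, pages_per_chunk)]

def split_pdf_by_pages(src, out_dir, pages_per_chunk):
    # pypdf is imported here so page_ranges stays usable without it.
    from pypdf import PdfReader, PdfWriter

    src = Path(src)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    reader = PdfReader(str(src))
    outputs = []
    ranges = page_ranges(len(reader.pages), pages_per_chunk)
    for n, (start, stop) in enumerate(ranges, 1):
        writer = PdfWriter()
        for i in range(start, stop):
            writer.add_page(reader.pages[i])
        out_path = out_dir / f"{src.stem}_part{n:02d}.pdf"
        with out_path.open("wb") as f:
            writer.write(f)
        outputs.append(out_path)
    return outputs
```

Because fonts and images shared between pages get copied into every chunk that uses them, the chunk sizes will not add up to the original size, so each output still needs a size check against the limit.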
Ferro_Giconi@reddit
I've never tried this on excessively large PDFs, but I do like this tool for quick and easy PDF page manipulation and splitting in smaller files. It may be worth a shot to see if it can handle the large PDFs you have. https://pdfsam.org/
One of the bullet points in their marketing is - Split PDF files visually selecting pages to split at, or split at given bookmarks level or **in files of a given size**
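For a scripted take on splitting into files of a given size, one possible approach is a greedy grouping that measures each candidate chunk instead of summing per-page sizes, since shared fonts and images make page sizes non-additive. This is a sketch, not how PDFsam does it: `measure` is a hypothetical caller-supplied function that returns the byte size of a PDF holding pages `[start, stop)`, for example by writing that range to a temporary file.

```python
def chunks_under_limit(total_pages, measure, limit_bytes):
    """Greedily group pages so that measure(start, stop) <= limit_bytes per group.

    Returns a list of half-open (start, stop) page index ranges.
    """
    chunks = []
    start = 0
    while start < total_pages:
        stop = start + 1
        if measure(start, stop) > limit_bytes:
            raise ValueError(f"page {start} alone exceeds the limit")
        # Extend the chunk while adding the next page still fits.
        while stop < total_pages and measure(start, stop + 1) <= limit_bytes:
            stop += 1
        chunks.append((start, stop))
        start = stop
    return chunks
```

This calls `measure` once per page boundary, which is slow if every call writes a file. A binary search per chunk would cut the number of calls, assuming size grows with page count.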
cirquefan@reddit
PDFSam is great BUT you have to be very careful when installing to DEselect the "premium" features, otherwise you're installing a trial version of their paid product. Not that it's not a good deal, because it is, but if you just need/want the free functionality you need to be careful when you install.