PDF Utilities


There have been plenty of times I’ve had to work with PDFs, whether that’s pulling out certain pages to sign and adding them back, or creating PDFs from scratch out of images. The common advice is to use one of the many websites offering tools for this, but privacy concerns and waiting on uploads make them pretty inconvenient. I’ve found a couple of command-line tools that make working with PDFs almost painless.


Img2pdf

Ever needed to combine a bunch of images into a PDF? There are different ways of doing this. ImageMagick, for example, can convert images into PDFs, but it’s a lossy process and images can come out blurry. For a lossless conversion, I found img2pdf. It produces crisp images with a simple command while still offering a lot of flexibility. I recommend checking out its -h output, which goes into an incredible amount of detail about what you can do.

 img2pdf img1.png img2.jpg -o output.pdf
or, for every PNG in the current directory:
 img2pdf *.png -o output.pdf
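
One thing to watch with the wildcard form: the shell expands *.png in alphabetical order, so page10.png lands before page2.png and the pages can come out shuffled. Here’s a minimal sketch of one workaround, assuming GNU sort (for its -V option) and file names without newlines. natural_order is a made-up helper name, not part of img2pdf.

```shell
# Hypothetical helper: print the given file names one per line in
# natural (version) order, so page2.png sorts before page10.png.
# Assumes GNU sort for -V and file names that contain no newlines.
natural_order() {
    printf '%s\n' "$@" | sort -V
}

# Usage sketch (bash only, since mapfile is a bash builtin):
#   mapfile -t pages < <(natural_order *.png)
#   img2pdf "${pages[@]}" -o output.pdf
```

Zero-padding the file names (page01.png, page02.png, ...) avoids the problem entirely if you control how the images are named.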

PDF Toolkit

PDF Toolkit, or pdftk, is a brilliant command-line toolkit for working with PDFs. I’ve used it for many things, even adding index metadata (bookmarks) to a PDF that didn’t have any. There are also many more features, like rotating pages and password-protecting files, that I haven’t gotten around to trying yet. I’ll link to an article below that covers all that.

Removing Pages

 pdftk input.pdf cat [pages-to-keep] output output.pdf
For example, to remove pages 2 and 6 we list everything except those two:
 pdftk input.pdf cat 1 3-5 7-end output output.pdf
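
Working out the ranges to keep by hand gets tedious when several pages need to go. Here’s a rough sketch of a helper that does the arithmetic, assuming you already know the total page count. keep_ranges is a name I made up for illustration, not something pdftk provides.

```shell
# Hypothetical helper: given the total page count followed by the pages
# to drop, print the ranges to keep in the form pdftk's cat expects.
# Usage: keep_ranges TOTAL PAGE_TO_DROP...
keep_ranges() {
    total=$1
    shift
    drop=" $* "     # padded with spaces so " 2 " never matches inside " 12 "
    ranges=""
    start=""
    page=1
    while [ "$page" -le "$total" ]; do
        case "$drop" in
            *" $page "*)
                # Dropped page: close the run of kept pages, if one is open.
                if [ -n "$start" ]; then
                    end=$((page - 1))
                    if [ "$start" -eq "$end" ]; then
                        ranges="$ranges $start"
                    else
                        ranges="$ranges $start-$end"
                    fi
                    start=""
                fi
                ;;
            *)
                # Kept page: open a new run if none is open.
                if [ -z "$start" ]; then
                    start=$page
                fi
                ;;
        esac
        page=$((page + 1))
    done
    # Close the final run, which always ends on the last page.
    if [ -n "$start" ]; then
        if [ "$start" -eq "$total" ]; then
            ranges="$ranges $start"
        else
            ranges="$ranges $start-$total"
        fi
    fi
    echo "${ranges# }"
}

# Usage sketch for a 7-page file (the substitution is left unquoted on
# purpose so each range becomes its own argument):
#   pdftk input.pdf cat $(keep_ranges 7 2 6) output output.pdf
```

For a 7-page document, keep_ranges 7 2 6 prints 1 3-5 7, which selects the same pages as the hand-written 1 3-5 7-end.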

Combining PDFs

pdftk input1.pdf input2.pdf cat output output.pdf

Splitting a PDF

This splits the input PDF into individual single-page PDF files.

pdftk input.pdf burst
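
By default burst names the pages pg_0001.pdf, pg_0002.pdf and so on, and also writes a doc_data.txt metadata report. pdftk accepts a printf-style pattern after output to change the names. Here’s a small sketch that keeps the pages together in their own directory. burst_into and the page_%03d.pdf pattern are my own choices for illustration.

```shell
# Hypothetical wrapper: burst a PDF into its own directory.
# Usage: burst_into DIR INPUT.pdf
burst_into() {
    dir=$1
    input=$2
    mkdir -p "$dir" || return 1
    # pdftk fills in %03d with the page number: page_001.pdf, page_002.pdf, ...
    pdftk "$input" burst output "$dir/page_%03d.pdf"
}

# Usage sketch:
#   burst_into pages input.pdf
```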

Pulling Metadata

Updating metadata with pdftk means first extracting the existing information from our chosen PDF. Then we add or remove what we want and write the edited file back into the PDF.

Data Dump

pdftk input.pdf \
data_dump \
output metadata.txt
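
The dump is plain text with one key per line, including a NumberOfPages entry, which is handy when working out page ranges for cat. Here’s a minimal sketch for pulling that number out of the file. page_count is a hypothetical helper name.

```shell
# Hypothetical helper: read pdftk dump text on stdin and print the
# value of its "NumberOfPages: N" line.
page_count() {
    sed -n 's/^NumberOfPages: *//p'
}

# Usage sketch, reading the file produced by the dump step:
#   page_count < metadata.txt
```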

Adding Bookmarks

Adding bookmarks to our metadata file follows a simple pattern. Each bookmark starts with a BookmarkBegin marker, followed by BookmarkTitle, which is the text shown to the user in the index. BookmarkLevel determines whether the bookmark is a parent (level 1) or a child (level 2 and deeper). Lastly, BookmarkPageNumber is the page the bookmark links to. Here’s an example:

BookmarkBegin
BookmarkTitle: Chapter One
BookmarkLevel: 1
BookmarkPageNumber: 2
BookmarkBegin
BookmarkTitle: First Point
BookmarkLevel: 2
BookmarkPageNumber: 4
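
Typing these records out for a long table of contents invites typos, so here’s a small sketch that prints one record in the format above. bookmark is a made-up helper name, and it does no validation of the level or page number.

```shell
# Hypothetical helper: print one bookmark record.
# Usage: bookmark TITLE LEVEL PAGE
bookmark() {
    printf 'BookmarkBegin\nBookmarkTitle: %s\nBookmarkLevel: %s\nBookmarkPageNumber: %s\n' \
        "$1" "$2" "$3"
}

# Usage sketch, reproducing the two records from the example:
#   bookmark "Chapter One" 1 2
#   bookmark "First Point" 2 4
```

The output can be pasted straight into metadata.txt.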

Updating Metadata

Once we’re done, we can write the edited metadata file, bookmarks included, back into the PDF. The result is saved as a new file, so the original is left untouched.

pdftk input.pdf \
update_info metadata.txt \
output output.pdf
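
To check that the bookmarks actually made it in, one option is to dump the new file and count its bookmark records. Here’s a rough sketch. count_bookmarks is a hypothetical helper, and it assumes every record starts with a BookmarkBegin line as in the example earlier.

```shell
# Hypothetical helper: count bookmark records in pdftk dump text on stdin.
count_bookmarks() {
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so the exit status is masked to keep this usable under set -e.
    grep -c '^BookmarkBegin' || true
}

# Usage sketch (with no output argument the dump goes to stdout):
#   pdftk output.pdf data_dump | count_bookmarks
```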

Links