There have been plenty of times I’ve had to work on PDFs, whether taking out certain pages to sign and adding them back, or creating PDFs from scratch with images. Common advice points you to websites with loads of different tools for this, but privacy concerns, combined with having to wait for uploads, make them pretty inconvenient. I’ve found a couple of command line tools which make working with PDFs almost painless.
Ever needed to combine a bunch of images into a PDF? There are different ways of doing this. For example, ImageMagick can convert images into PDFs, but it’s a lossy process and the images can come out blurry. For a lossless conversion, I found img2pdf. It produces crisp images with a simple command while still offering a lot of flexibility. I recommend checking out its -h output, which goes into an incredible amount of detail about what you can do.
img2pdf img1.png img2.jpg -o output.pdf
or, with a glob:
img2pdf *.png -o output.pdf
PDF Toolkit, or pdftk, is a brilliant suite of command line tools for working with PDFs in the terminal. I’ve used it for many things, even adding index metadata (bookmarks) to a PDF that didn’t have any. There are also many more features, like rotating pages and password protecting files, that I haven’t gotten around to trying yet. I’ll link to an article below that covers all that.
To remove pages, use the cat operation and list only the pages you want to keep:
pdftk input.pdf cat [pages-to-keep] output output.pdf
For example, to remove pages 2 and 6:
pdftk input.pdf cat 1 3-5 7-end output output.pdf
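Working out the ranges to keep by hand gets tedious when several pages have to go. As a rough sketch, the small shell function below builds the range list for you. Note that `keep_ranges` is a hypothetical helper of my own naming, not part of pdftk, and it assumes you already know the total page count.

```shell
# keep_ranges TOTAL [PAGE...]
# Hypothetical helper (not part of pdftk): prints the page ranges to KEEP
# when the listed pages are removed from a document with TOTAL pages.

# Prints "N" for a single page or "A-B" for a run of pages.
_fmt_range() {
    if [ "$1" -eq "$2" ]; then echo "$1"; else echo "$1-$2"; fi
}

keep_ranges() {
    total=$1
    shift
    out=""
    start=""
    page=1
    while [ "$page" -le "$total" ]; do
        # Check whether the current page is in the removal list.
        drop=0
        for p in "$@"; do
            if [ "$p" -eq "$page" ]; then drop=1; fi
        done
        if [ "$drop" -eq 0 ]; then
            # Open a new run of kept pages if none is open, then extend it.
            if [ -z "$start" ]; then start=$page; fi
            last=$page
        elif [ -n "$start" ]; then
            # A removed page closes the current run.
            out="$out $(_fmt_range "$start" "$last")"
            start=""
        fi
        page=$((page + 1))
    done
    # Close the final run, if any.
    if [ -n "$start" ]; then out="$out $(_fmt_range "$start" "$last")"; fi
    echo "${out# }"
}

# Example (left commented out so sourcing this file has no side effects):
# pdftk input.pdf cat $(keep_ranges 7 2 6) output output.pdf
```

The command substitution is deliberately left unquoted in the example so that each range reaches pdftk as a separate argument.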
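To merge PDFs, list the input files in order and use cat again. Without an operation, pdftk expects only a single input file:
pdftk input1.pdf input2.pdf cat output output.pdf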
The burst operation splits an input PDF into individual single-page PDF files.
pdftk input.pdf burst
Updating metadata with pdftk requires first extracting the existing information from our chosen PDF into a text file. Then we add or remove what we want in that file before applying it back to the PDF.
pdftk input.pdf dump_data output metadata.txt
Adding bookmarks to the metadata file follows a simple pattern. Each bookmark starts with a BookmarkBegin line. BookmarkTitle is the text shown to the reader in the index, BookmarkLevel sets the nesting depth (1 for a top-level entry, 2 for a child of the preceding level 1 entry, and so on), and BookmarkPageNumber is the page the bookmark links to. Here’s an example:
BookmarkBegin
BookmarkTitle: Chapter One
BookmarkLevel: 1
BookmarkPageNumber: 2
BookmarkBegin
BookmarkTitle: First Point
BookmarkLevel: 2
BookmarkPageNumber: 4
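Typing these records by hand is error-prone for a long index, so here is a minimal sketch of a shell function that prints one record in the same format. The name `bookmark` and its argument order are my own choices for illustration, not anything pdftk provides.

```shell
# bookmark TITLE LEVEL PAGE
# Hypothetical helper (not part of pdftk): prints a single bookmark record
# in the layout shown above, ready to append to the metadata file.
bookmark() {
    printf 'BookmarkBegin\nBookmarkTitle: %s\nBookmarkLevel: %s\nBookmarkPageNumber: %s\n' \
        "$1" "$2" "$3"
}

# Example (left commented out so sourcing this file has no side effects):
# bookmark "Chapter One" 1 2 >> metadata.txt
# bookmark "First Point" 2 4 >> metadata.txt
```

Appending with `>>` keeps whatever was already dumped into the metadata file, and the order of the calls is the order of the entries in the file.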
Once we’re done, we can apply the edited metadata file, bookmarks included, to the original PDF. The result is written to a new file.
pdftk input.pdf update_info metadata.txt output output.pdf