mat2 - a CLI tool for PDFs everyone should have nowadays

rado84

Nowadays most PDF files are made in Windows by a Windows program. Windows spies on you, and the programs for it take advantage of the status quo and also spy on you via the PDF files. As a result, a PDF can balloon in size because of the hidden metadata included in it. In the past few days I saw 2 PDF files of 20+ pages that tried to kill my soul (I think they were meant as presentations of something, but I didn't dig into them bc reading them wasn't my job, fixing them was) and one of them had the astronomical size of... 87.7 MiB!!! At that size a few PDF readers for Linux simply crashed attempting to load it, and 3 different browsers completely froze until they had fully loaded the file. All because of the spyware metadata in it. And, as if all that wasn't enough, some older printer models spat out errors and refused to print the document.

But all these 11 years with Linux taught me one thing:
And if at first you don't succeed
Then dust yourself off and try again
You can dust it off and try again, try again
(the above quote is from a song: Aaliyah feat. Timbaland - Try Again)

So I didn't give up and found another way - the small but very powerful and easy to use CLI tool named mat2. In Arch Linux it's called mat2, in other distros the package name might be a little different.

In principle, the syntax to clean all the hidden data out of a PDF file couldn't be simpler than this:
Code:
mat2 filename.pdf
It will read the file, wipe out ALL the hidden metadata it finds and write the result to a new file named filename.cleaned.pdf; the original file is left as it was. For a single PDF file this syntax is OK. But if you work with LOTS and LOTS of PDF files a day, it WILL become annoying very quickly, so I wrote a loop script that doesn't require you to copy and paste any names every time a spying PDF comes your way:
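If you want to see what is actually hiding in a file before wiping it, mat2 has a --show option that lists the metadata it can detect without changing the file. A small sketch, assuming a reasonably recent mat2; the function name inspect_and_clean is made up for this example and is not part of mat2:

```shell
# inspect_and_clean is a hypothetical helper name, not something mat2 ships.
inspect_and_clean() {
    local file="$1"
    mat2 --show "$file"          # list detectable metadata, change nothing
    mat2 "$file" || return 1     # write the cleaned copy next to the original
    # mat2 puts ".cleaned" before the extension: report.pdf -> report.cleaned.pdf
    echo "${file%.*}.cleaned.${file##*.}"
}
```

Called as inspect_and_clean filename.pdf, it prints the metadata report first and the name of the cleaned copy last.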

Code:
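      # Clean every PDF in the current directory.
      # mat2 writes name.cleaned.pdf next to each original.
      for name in *.pdf; do
        [ -e "$name" ] || continue                        # no PDFs here
        case "$name" in *.cleaned.pdf) continue ;; esac   # skip files that are already cleaned
        mat2 "$name"
      done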

You can put that in a file and save it as (example) fixpdf.sh (and ofc make it executable: chmod +x fixpdf.sh, otherwise it won't work). cd (change directory) to where the PDF files are and call it either by alias or directly: /path/to/fixpdf.sh. No file name is needed, the loop goes through every PDF in the current directory.
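The loop above always goes through every PDF in the directory. If you'd also like to be able to pass specific file names, a variant could look like the sketch below. The function name clean_pdfs is my own choice, and skipping *.cleaned.pdf assumes mat2's default output naming:

```shell
#!/bin/bash
# Sketch of a fixpdf.sh variant: clean the PDFs named on the command line,
# or every PDF in the current directory when no names are given.
clean_pdfs() {
    # With no arguments, fall back to all PDFs in the current directory.
    [ "$#" -gt 0 ] || set -- *.pdf
    for name in "$@"; do
        [ -f "$name" ] || continue                        # unmatched glob or missing file
        case "$name" in *.cleaned.pdf) continue ;; esac   # skip mat2's own output
        mat2 "$name"
    done
}

clean_pdfs "$@"
```

With this version both /path/to/fixpdf.sh and /path/to/fixpdf.sh filename.pdf work, and running it twice in the same directory doesn't clean the already cleaned copies again.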

Guess what happened to that ginormous presentation after I cleaned it with this tool? From 87.7 MiB it became just 14.3 MiB, which every single program (+ the browsers) loaded instantly without crashing or freezing.
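Going from 87.7 MiB to 14.3 MiB means the file got just under 84% smaller. To check the same thing for your own files, something like this could work. It is only a sketch: size_saved is a hypothetical helper name, and it compares plain byte counts from wc -c:

```shell
# size_saved is a hypothetical helper: prints by how many percent the
# cleaned file is smaller than the original (integer, rounded down).
size_saved() {
    local before after
    [ -f "$1" ] && [ -f "$2" ] || return 1   # both files must exist
    before=$(( $(wc -c < "$1") ))            # size of the original in bytes
    after=$(( $(wc -c < "$2") ))             # size of the cleaned copy in bytes
    [ "$before" -gt 0 ] || return 1          # avoid dividing by zero
    echo $(( (before - after) * 100 / before ))
}
```

Usage: size_saved filename.pdf filename.cleaned.pdf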

P.S. If you can't find mat2 for your distro, here's a little... "magic" that most people don't know: mostly only the package names are different. The contents inside (paths, binaries, language files and so on) are largely the same for all distros. So, if you can't find it for your distro, you can always download it from the Arch repo: https://archlinux.org/packages/extra/any/mat2/ unpack it somewhere and then copy the unpacked /usr to your root. One caveat: mat2 is a Python program, so its files sit in a directory tied to Arch's Python version, and your distro's Python has to look in that same place for this to work. The chances that you already have the dependencies installed are very high, even with Mint. In case you don't, you'll have to check the dependencies listed in the link, install them from your distro's repo and then download mat2 from Arch.
 

