Cracking Password Protected PDF Files
Password cracking means discovering a password by trying candidates automatically, typically with scripts and dictionaries. Many tools for this are available on Kali Linux and Parrot OS; however, as a programmer, you shouldn’t rely only on existing tools but should be able to write your own scripts when the occasion calls for it. Many programming languages would work for this, but I’m personally fond of Python, so that’s what I’ll use. In this tutorial, I’ll show you how to write a Python script to crack a password-protected PDF file.
Creating a Password protected PDF file
If you’re using Ubuntu, creating a password-protected PDF file is easy. For this example, we’ll create a text document in LibreOffice and then go to:
File > Export as > Export as PDF
Here, choose the settings you want for almost everything (such as image compression, etc…) and once chosen, go to the Security Tab.
Then click on the “Set Passwords” button.
Then enter the passwords you want and save the PDF in the location of your choosing.
I named my file secret.pdf. If I try to open it, the PDF viewer asks me for a password.
Ok, so our document is password protected!
Next thing, we need a dictionary!
SECLISTS
A dictionary is basically a file of potential passwords. Instead of typing one password after another by hand, we use scripts and dictionaries containing many passwords to automate the task. The computer tries each password in the dictionary file one by one, and if the right password is found, we break out of the loop.
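The loop described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the script built later in this tutorial: `dictionary_attack` and `is_correct` are hypothetical names, and the check here simply compares against a known string instead of opening a file.

```python
def dictionary_attack(candidates, is_correct):
    """Return the first candidate accepted by is_correct, or None."""
    for candidate in candidates:
        candidate = candidate.strip()  # drop trailing newlines and spaces
        if is_correct(candidate):
            return candidate  # stop at the first hit
    return None


# Hypothetical stand-in for "does this password open the file?"
def is_correct(password):
    return password == "tooru"


wordlist = ["123456\n", "password\n", "tooru\n", "qwerty\n"]
found = dictionary_attack(wordlist, is_correct)
print(found)
```

Returning from the function plays the same role as breaking out of the loop: no further passwords are tried once one works.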
All penetration testers need to have SecLists on their computer!
When cracking passwords, you absolutely need SecLists too! Why? Because it contains a very large collection of password lists, many of them real-world passwords leaked from breached databases.
SecLists can be found at https://github.com/danielmiessler/SecLists.
Let’s get hold of SecLists using the following command:
wget -c https://github.com/danielmiessler/SecLists/archive/master.zip -O SecList.zip \
  && unzip SecList.zip \
  && rm -f SecList.zip
I have, for the purpose of this script, downloaded it into my Downloads folder.
The passwords can be found in:
SecLists-master/Passwords/
However, the one I’m going to use is a leaked password list called rockyou.txt which you can find in:
SecLists-master/Passwords/Leaked-Databases/rockyou.txt.tar.gz
To unzip it, type:
cd SecLists-master/Passwords/Leaked-Databases/
tar -xf rockyou.txt.tar.gz
Then I’m going to copy the file I want into the Downloads folder (I’m not typing long paths).
cp rockyou.txt ~/Downloads/rockyou.txt
So we need to note down the path to our dictionary. In my case, it’s:
/home/kalyani/Downloads/rockyou.txt
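Before pointing a script at a large wordlist, it helps to decide how the file will be read. The sketch below is one possible approach, not part of the original script: `iter_passwords` is a hypothetical helper, and the choice of `latin-1` is an assumption made because leaked lists such as rockyou.txt are known to contain bytes that are not valid UTF-8, which can make a plain `open(path, 'r')` fail partway through. The demo runs on a throwaway file rather than the real list.

```python
import os
import tempfile


def iter_passwords(path, encoding="latin-1"):
    """Yield one candidate password per non-empty line of the file."""
    with open(path, "r", encoding=encoding) as handle:
        for line in handle:
            word = line.rstrip("\r\n")  # remove the line ending only
            if word:  # skip blank lines
                yield word


# Demo file: one blank line and one byte (0xF1) that is invalid UTF-8.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"123456\n\nni\xf1o\r\ntooru\n")

demo_words = list(iter_passwords(tmp.name))
os.remove(tmp.name)
print(demo_words)
```

Because the function is a generator, the file is read line by line and never loaded into memory all at once. Note that for non-ASCII entries, the decoded text is only a best guess at what the original password looked like.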
Python Script
So let’s start writing the Python script.
First I’m going to create a .py file.
nano pdfcracker.py
Let’s next install the following:
pip install PyPDF2
Why? Because Python needs a library that can read PDF files, and PyPDF2 provides one!
Now let’s create the logic behind our script before we attempt to write it.
So:
Step 0: import the required modules
Step 1: Ask the user where the pdf is located and assign it a name
Step 2: Ask the user where the dictionary file is located and assign it a name
Step 3: Open the dictionary file in read mode and strip newlines and other surrounding whitespace from each line. Then take the first word in the dictionary and try it on the PDF file, then take the second word and try it on the PDF file, etc…
Step 4: Create a TRY-EXCEPT situation. In it, you tell our pdf reader to open the pdf file and put password one in it and try to open it. If it opens, then we print that we found the password and break out of the loop!
Step 5: In the event that password one was not the correct password, we want it to go to password two, etc… In my case, I like to print that the system tried password one but that it did not work. I like to do this so that I know what it’s doing (like it hasn’t frozen or crashed).
The Code
Ok, so we’ve got an outline; let’s start writing our code:
Step 0: import the required modules
import PyPDF2
Step 1: Ask the user where the pdf is located and assign it a name
pdfFilePath = input("Where is your password Protected PDF located? ")
Step 2: Ask the user where the dictionary file is located and assign it a name
dictionaryPath = input("Where is your dictionary file located? ")
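One thing to watch with `input()`: it returns the text exactly as typed, so a path such as `~/Downloads/rockyou.txt` is not expanded the way a shell would expand it. Below is a minimal sketch of a helper that handles this, assuming you want the script to stop early when a file is missing. `resolve_path` is a hypothetical name and is not part of the original script.

```python
import os
import tempfile


def resolve_path(raw):
    """Expand '~' and return an absolute path, or raise if no such file."""
    path = os.path.abspath(os.path.expanduser(raw.strip()))
    if not os.path.isfile(path):
        raise FileNotFoundError(f"No such file: {path}")
    return path


# Demo with a throwaway file so the example runs anywhere.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    pass

resolved = resolve_path("  " + tmp.name + "\n")  # stray whitespace is ignored
os.remove(tmp.name)

try:
    resolve_path(tmp.name)  # the file was just removed
    missing_raised = False
except FileNotFoundError:
    missing_raised = True

print(resolved, missing_raised)
```

In the script, this could wrap each prompt, for example `resolve_path(input("Where is your dictionary file located? "))`.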
Step 3: Open the dictionary file in read mode and strip newlines and other surrounding whitespace from each line. Then take the first word in the dictionary and try it on the PDF file, then take the second word and try it on the PDF file, etc…
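# latin-1 accepts every byte, so entries in rockyou.txt that are not valid UTF-8 do not crash the loop
with open(dictionaryPath, 'r', encoding='latin-1') as pb:
    for password in pb:
        password = password.strip()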
Step 4: Create a TRY-EXCEPT situation. In it, you tell our pdf reader to open the pdf file and put password one in it and try to open it. If it opens, then we print that we found the password and break out of the loop!
        try:
            collect = PyPDF2.PdfReader(pdfFilePath, False, password)
            print("THE PASSWORD TO THE PDF IS: ", password)
            break
Step 5: In the event that password one was not the correct password, we want it to go to password two, etc… In my case, I like to print that the system tried password one but that it did not work. I like to do this so that I know what it’s doing (like it hasn’t frozen or crashed).
        except Exception:
            # Exception (not a bare except) so that Ctrl+C can still stop the script
            print("I'm attempting the password ", password, "--------> Did not Work")
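Printing every failed attempt works well for a short dictionary. With a list the size of rockyou.txt, which holds millions of entries, that much terminal output can slow the run down noticeably. One possible alternative, sketched here with a hypothetical `report_progress` helper, is to print a status line only every so many attempts:

```python
def report_progress(attempt_number, password, every=1000):
    """Return a status line every `every` attempts, otherwise None."""
    if attempt_number % every == 0:
        return f"Tried {attempt_number} passwords so far, last one: {password}"
    return None


# Demo: with every=2, only attempts 2 and 4 produce a message.
for attempt_number, password in enumerate(["a", "b", "c", "d"], start=1):
    message = report_progress(attempt_number, password, every=2)
    if message:
        print(message)
```

The script still shows that it has not frozen, just with far fewer lines on screen.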
The whole code would look like this:
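import PyPDF2

pdfFilePath = input("Where is your password Protected PDF located? ")
dictionaryPath = input("Where is your dictionary file located? ")

# latin-1 accepts every byte, so entries in rockyou.txt that are not valid UTF-8 do not crash the loop
with open(dictionaryPath, 'r', encoding='latin-1') as pb:
    for password in pb:
        password = password.strip()
        try:
            # Arguments: file path, strict=False, candidate password
            collect = PyPDF2.PdfReader(pdfFilePath, False, password)
            print("THE PASSWORD TO THE PDF IS: ", password)
            break
        except Exception:
            # Exception (not a bare except) so that Ctrl+C can still stop the script
            print("I'm attempting the password ", password, "--------> Did not Work")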
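The script above re-opens and re-parses the PDF for every candidate. A possible variation, sketched below, opens the file once and calls the reader's `decrypt` method for each password instead. This assumes a PyPDF2 release (2.x or 3.x) in which `PdfReader.decrypt` returns a falsy value for a wrong password; `crack_pdf` and `first_match` are hypothetical names, and PyPDF2 is imported inside the function so the demo at the bottom runs even where the library is not installed.

```python
def first_match(candidates, accepts):
    """Return the first stripped candidate for which accepts() is true."""
    for candidate in candidates:
        candidate = candidate.strip()
        if accepts(candidate):
            return candidate
    return None


def crack_pdf(pdf_path, candidates):
    """Return the first password that decrypts the PDF, or None."""
    import PyPDF2  # imported lazily; requires `pip install PyPDF2`

    reader = PyPDF2.PdfReader(pdf_path)  # parse the file only once
    if not reader.is_encrypted:
        raise ValueError("This PDF is not password protected")
    # Assumption: decrypt() returns 0 (falsy) when the password is wrong.
    return first_match(candidates, lambda pw: bool(reader.decrypt(pw)))


# Demo of the search logic with a stand-in check instead of a real PDF.
demo_result = first_match(["123456\n", "tooru\n"], lambda pw: pw == "tooru")
print(demo_result)
```

With a real file, the call would look like `crack_pdf("secret.pdf", open("dict.txt"))`. Whether this is actually faster depends on the PDF and the library version, so treat it as something to measure rather than a guaranteed improvement.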
The rockyou.txt is the password file you’d use in real life but for the sake of showing you, I created a second dictionary file called dict.txt (in the Downloads folder) with only like 5 or 6 passwords so that you can see what the output is like.
Next type:
python3 pdfcracker.py
The output shows a “Did not Work” line for each failed attempt, followed by the password that opened the PDF.
And indeed, my password was tooru!
Happy Coding!!!