lessthan100 (Supremacy Member; joined Dec 18, 2011; 5,503 messages; reaction score 996)
https://www.sentinelone.com/blog/malicious-pdfs-revealing-techniques-behind-attacks/
Malicious PDFs | Revealing the Techniques Behind the Attacks
Most of us are no strangers to phishing attempts, and over the years we’ve kept you informed about the latest tricks used by attackers in the epidemic of phishing and spear-phishing campaigns that plague, in particular, email users. Like other files that can come as attachments or links in an email, PDF files have received their fair share of attention from threat actors, too. In this post, we’ll take you on a tour of the technical aspects behind malicious PDF files: what they are, how they work, and how we can protect ourselves from them.
How Do PDF Files Execute Code?
Regular readers of the SentinelOne blog will be familiar with the idea of malicious Office attachments that run VBA code from macros or use DDE to deliver attacks, but it is less well known how PDFs can execute code.
In some kinds of malicious PDF attacks, the PDF reader itself contains a vulnerability or flaw that allows a file to execute malicious code. Remember that PDF readers aren’t just applications like Adobe Reader and Adobe Acrobat. Most browsers contain a built-in PDF reader engine that can also be targeted. In other cases, attackers might leverage AcroForms or XFA Forms, interactive form technologies that support embedded scripting and were intended to add useful features to a standard PDF document.
“One of the easiest and most powerful ways to customize PDF files is by using JavaScript.” (Adobe)
To get a better understanding of how such attacks work, let’s look at a typical PDF file structure. We can safely open a PDF file in a plain text editor to inspect its contents. At first glance, the raw contents might look indecipherable.
However, with a bit of knowledge of PDF file structure, we can start to decode this without too much trouble. The body of a PDF file is made up of numbered “objects”. Each one begins with the object number, a generation number and the “obj” keyword, as we can see at lines 3 and 19 of the file, which mark the start of the first two objects:
1 0 obj
2 0 obj
The end of each object is signalled with the keyword endobj, as seen at lines 18 and 24 for Object 1 and Object 2, respectively.
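To make this layout concrete, here is a minimal Python sketch that scans raw PDF bytes for the “N G obj … endobj” pattern and maps each (object number, generation number) pair to its body. The function name list_objects is our own. It assumes objects are stored uncompressed in the file body; a real parser would follow the cross-reference table instead, and objects packed inside compressed object streams (/ObjStm, introduced in PDF 1.5) will not be found this way.

```python
import re

# "<object number> <generation number> obj" ... "endobj"; the body is matched
# non-greedily so each match stops at the first endobj that follows.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)endobj", re.DOTALL)


def list_objects(pdf_bytes: bytes) -> dict:
    """Map (object number, generation number) to the raw bytes of each object body."""
    return {
        (int(m.group(1)), int(m.group(2))): m.group(3)
        for m in OBJ_RE.finditer(pdf_bytes)
    }
```

For example, with open("sample.pdf", "rb") as fh: print(sorted(list_objects(fh.read()))) would list the object numbers in a file, where sample.pdf is a placeholder name.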
Object 2 immediately offers us some clues. It contains a dictionary (delimited by the chevrons << and >>), and that dictionary has a /JS (JavaScript) entry whose value is an indirect reference to Object 1 (object number 1, generation 0, with R marking the reference):
/JS 1 0 R
This tells us that the “garbage” code in Object 1 between the keywords stream (line 8) and endstream (line 15) is actually a JavaScript stream. Even better, Object 1’s dictionary is kind enough to tell us how to decode it: line 6 specifies a /Filter of /FlateDecode, which means the stream is zlib/deflate-compressed. That is all we need to write a quick-and-dirty Python script that decompresses the stream into plain JavaScript.
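A minimal sketch of such a script follows. It is our own illustration, not the original script, and it assumes that each /JS value is an indirect reference like /JS 1 0 R, that the referenced object holds a single stream, and that FlateDecode is the only filter applied (chained filters, such as ASCIIHexDecode followed by FlateDecode, are not handled). The names extract_js, OBJ_RE, JS_REF_RE and STREAM_RE are hypothetical, and the output is decoded as Latin-1 for simplicity.

```python
import re
import zlib

# Indirect objects: "<num> <gen> obj" ... "endobj".
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)endobj", re.DOTALL)
# A /JS entry whose value is an indirect reference, e.g. "/JS 1 0 R".
JS_REF_RE = re.compile(rb"/JS\s*(\d+)\s+(\d+)\s+R")
# Raw stream data between the "stream" and "endstream" keywords.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\n?endstream", re.DOTALL)


def extract_js(pdf_bytes: bytes) -> list[str]:
    """Return the JavaScript text of every stream referenced by a /JS entry."""
    objects = {
        (int(m.group(1)), int(m.group(2))): m.group(3)
        for m in OBJ_RE.finditer(pdf_bytes)
    }
    scripts = []
    for body in objects.values():
        for ref in JS_REF_RE.finditer(body):
            # Resolve "/JS <num> <gen> R" to the referenced object's body.
            target = objects.get((int(ref.group(1)), int(ref.group(2))))
            if target is None:
                continue
            stream = STREAM_RE.search(target)
            if stream is None:
                continue
            data = stream.group(1)
            # The stream dictionary (with its /Filter entry) precedes "stream".
            if b"/FlateDecode" in target[: stream.start()]:
                # decompressobj() tolerates trailing bytes after the zlib data,
                # such as an extra end-of-line before "endstream".
                data = zlib.decompressobj().decompress(data)
            scripts.append(data.decode("latin-1"))
    return scripts
```

Usage: with open("sample.pdf", "rb") as fh: print(extract_js(fh.read())), where sample.pdf is a placeholder for the file under analysis.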
Cleaning Up the Code
Our Python script churns out the JavaScript perfectly, but not exactly beautifully.
As we’ve pointed out before, one thing you need to get used to when doing this kind of work is tidying up code to make it easier to work on. In this case, we ran the output through a beautifier (or prettifier) in Sublime Text.
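If you don’t have an editor plugin to hand, a crude re-indenter is easy to sketch in Python. This is not a real JavaScript beautifier, since dedicated tools actually parse the language; it simply breaks lines after ;, { and } and indents by brace depth, while copying single- and double-quoted strings verbatim. The function name naive_beautify is hypothetical, and template literals, regular-expression literals, comments and for(;;) headers will be mangled.

```python
def naive_beautify(js: str, indent: str = "    ") -> str:
    """Crude re-indenter for minified JavaScript (not a real parser)."""
    out: list[str] = []
    buf = ""
    depth = 0
    quote = None  # quote character of the string literal we are inside, if any
    i = 0

    def flush() -> None:
        # Emit the buffered text (if any) as one line at the current depth.
        nonlocal buf
        text = buf.strip()
        if text:
            out.append(indent * depth + text)
        buf = ""

    while i < len(js):
        ch = js[i]
        if quote:
            buf += ch
            if ch == "\\" and i + 1 < len(js):
                i += 1
                buf += js[i]  # keep the escaped character inside the string
            elif ch == quote:
                quote = None
        elif ch in "\"'":
            quote = ch
            buf += ch
        elif ch == "{":
            buf += ch
            flush()
            depth += 1
        elif ch == "}":
            flush()
            depth = max(depth - 1, 0)
            buf = "}"  # left open so "}else{" or "};" stay on one line
        elif ch == ";":
            buf += ch
            flush()
        else:
            buf += ch
        i += 1
    flush()
    return "\n".join(out)
```

For instance, naive_beautify('function f(){var a="x;y";if(a){return a;}}') puts each statement on its own line, indented by nesting level, and leaves the semicolon inside the string alone.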
Now we can read the JavaScript and determine whether it’s malicious. In this case, the code appears to be contacting a domain called “readnotify.com”. Making callbacks (“phoning home”) without user consent shows at least a lack of concern for user privacy. For people working in journalism or in politically sensitive areas, this could be a serious issue, as this kind of callback can reveal the user’s IP address, operating system and browser version to a remote server.
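Once the script is readable, a quick first pass is to pull out every absolute URL it contains and list the hostnames, since those are where any callback would go. The sketch below is our own; it assumes the URLs appear as plain http:// or https:// strings and will miss anything assembled at run time by string concatenation or further encoding. The function name callback_hosts is hypothetical, and the test input uses a made-up example.com URL rather than the real sample’s code.

```python
import re
from urllib.parse import urlparse

# Absolute http(s) URLs, stopping at whitespace, quotes, angle brackets or ')'.
URL_RE = re.compile(r"""https?://[^\s'"<>)]+""", re.IGNORECASE)


def callback_hosts(js: str) -> set[str]:
    """Return the set of hostnames found in absolute http(s) URLs in js."""
    hosts = set()
    for match in URL_RE.finditer(js):
        host = urlparse(match.group(0)).hostname  # urlparse lowercases this
        if host:
            hosts.add(host)
    return hosts
```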
More Malicious JavaScript
Compressed streams aren’t the only way PDF files can contain obfuscated code. Here’s another sample that looks rather more worrying when we look up its hash on VirusTotal:
19ac1c943d8d9e7b71404b29ac15f37cd230a463003445b47441dc443d616afd
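VirusTotal can be searched by file hash, and the 64-character hex string above has the length of a SHA-256 digest. A minimal way to compute that hash for a local sample, without opening it in a PDF reader, is Python’s standard hashlib module; the function name sha256_of_file is our own.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the SHA-256 of a file as a lowercase hex string."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in fixed-size chunks so large samples need not fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling sha256_of_file("sample.pdf") (a placeholder path) gives a string you can paste into VirusTotal’s search box; on Linux and macOS, sha256sum or shasum -a 256 give the same result.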
As the image from VT makes clear, this is some kind of trojan that’s exploiting CVE-2018-4993. Let’s open it up and take a look inside.
This is a very small file. There are only four objects, but the one that interests us is Object 3, and in particular the value of its /AA (additional actions) dictionary key. Note that this contains a child dictionary with the key /O. That’s important because, in a page’s additional-actions dictionary, /O specifies an action to perform when the page is opened, which for the first page means as soon as the document is displayed. The value of this key is itself another dictionary containing /JS, indicating yet again some encoded JavaScript.
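Because automatic actions like this are such a common delivery mechanism, it can pay to triage a suspicious file by counting the name tokens involved before opening it at all. The sketch below does a plain byte search for a handful of names we picked for illustration (/AA, /OpenAction, /JS, /JavaScript, /Launch and /GoToR); the list is not exhaustive, and names obfuscated with PDF’s #xx hex escapes (for example /J#53 for /JS) or hidden inside compressed streams will slip through. The function name count_suspicious_names is hypothetical.

```python
import re

# Name tokens that commonly indicate automatic actions or embedded script.
SUSPICIOUS_NAMES = ("/AA", "/OpenAction", "/JS", "/JavaScript", "/Launch", "/GoToR")


def count_suspicious_names(pdf_bytes: bytes) -> dict:
    """Count occurrences of each name in SUSPICIOUS_NAMES in the raw file bytes."""
    counts = {}
    for name in SUSPICIOUS_NAMES:
        # The lookahead stops "/JS" from also matching a longer name like "/JSx".
        pattern = re.escape(name.encode()) + rb"(?![A-Za-z0-9])"
        counts[name] = len(re.findall(pattern, pdf_bytes))
    return counts
```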
Unlike our previous file, however, this one does not specify a filter. Luckily, the value of /JS is clearly recognisable as octal encoding: each character is written as a backslash followed by three octal digits (0 to 7), so \164 is the letter “t” (1 × 64 + 6 × 8 + 4 = 116, the ASCII code for “t”). The best thing about octal escapes is that we don’t need to roll up our Python sleeves to interpret them; the shell’s printf command decodes them directly on the command line.
As the printf output shows, the octal escapes decode to the same kind of JavaScript call that we saw in the previous example, leveraging the this.submitForm() function.
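printf is the quickest route, but if you want the same decoding inside a Python triage script, a small helper can handle octal escapes along with the other escape sequences the PDF specification allows in literal strings (\n, \r, \t, \b, \f, \(, \), \\, and a backslash before a line break, which acts as a line continuation). The sketch below is our own; the function name decode_pdf_literal is hypothetical, and it treats each decoded byte as a Latin-1 character, which is fine for plain ASCII JavaScript.

```python
import re

# Single-character escapes defined for PDF literal strings.
SIMPLE_ESCAPES = {"n": "\n", "r": "\r", "t": "\t", "b": "\b", "f": "\f",
                  "(": "(", ")": ")", "\\": "\\"}

# A backslash followed by 1-3 octal digits, a line break, or any other character.
ESCAPE_RE = re.compile(r"\\([0-7]{1,3}|\r\n|\r|\n|.)", re.DOTALL)


def decode_pdf_literal(s: str) -> str:
    """Decode the escape sequences found inside a PDF literal string (...)."""
    def replace(match: re.Match) -> str:
        token = match.group(1)
        if token[0] in "01234567":
            # \ddd: octal character code; overflow above 255 is ignored.
            return chr(int(token, 8) & 0xFF)
        if token in ("\r\n", "\r", "\n"):
            return ""  # backslash at end of line: line continuation
        # Known escapes map to their character; otherwise the backslash is dropped.
        return SIMPLE_ESCAPES.get(token, token)
    return ESCAPE_RE.sub(replace, s)
```

For example, decode_pdf_literal(r"\164\150\151\163") returns "this".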
Going back to the /AA dictionary in the PDF, note the two lines which specify:
/S /GoToR
This code issues the “Go To Remote” action, telling the reader application to jump to a destination in another file, the one specified under the /F key.
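To pull that target out programmatically, here is a rough sketch that looks for innermost << … >> dictionaries containing /S /GoToR and returns the literal-string value of their /F entry. It is our own illustration and only covers the simple case: /F given as a literal string (not a file specification dictionary or a hex string), with no nested dictionaries inside the action dictionary. The function name gotor_targets is hypothetical.

```python
import re

# Innermost dictionaries: "<<" ... ">>" with no nested "<<" or ">>" inside.
INNER_DICT_RE = re.compile(rb"<<((?:(?!<<|>>).)*)>>", re.DOTALL)
# The action type entry "/S /GoToR".
GOTOR_RE = re.compile(rb"/S\s*/GoToR(?![A-Za-z0-9])")
# An /F entry whose value is a simple literal string without parentheses.
FILE_RE = re.compile(rb"/F\s*\(([^)]*)\)")


def gotor_targets(pdf_bytes: bytes) -> list:
    """Return the /F targets of simple GoToR action dictionaries."""
    targets = []
    for m in INNER_DICT_RE.finditer(pdf_bytes):
        body = m.group(1)
        if GOTOR_RE.search(body):
            f = FILE_RE.search(body)
            if f:
                targets.append(f.group(1).decode("latin-1"))
    return targets
```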



