Din know PDF so dangerous one can actually execute malicious code!

lessthan100 · Aug 28, 2021

https://www.sentinelone.com/blog/malicious-pdfs-revealing-techniques-behind-attacks/

Malicious PDFs | Revealing the Techniques Behind the Attacks

Most of us are no strangers to phishing attempts, and over the years we’ve kept you informed about the latest tricks used by attackers in the epidemic of phishing and spear-phishing campaigns that plague, in particular, email users. Like other files that can come as attachments or links in an email, PDF files have received their fair share of attention from threat actors, too. In this post, we’ll take you on a tour of the technical aspects behind malicious PDF files: what they are, how they work, and how we can protect ourselves from them.

How Do PDF Files Execute Code?

Regular readers of the SentinelOne blog will be familiar with the idea of malicious Office attachments that run VBA code from Macros or use DDE to deliver attacks, but not so well-known is how PDFs can execute code.

In some kinds of malicious PDF attacks, the PDF reader itself contains a vulnerability or flaw that allows a file to execute malicious code. Remember that PDF readers aren’t just applications like Adobe Reader and Adobe Acrobat. Most browsers contain a built-in PDF reader engine that can also be targeted. In other cases, attackers might leverage AcroForms or XFA Forms, scripting technologies used in PDF creation that were intended to add useful, interactive features to a standard PDF document.

“One of the easiest and most powerful ways to customize PDF files is by using JavaScript.” (Adobe)

To get a better understanding of how such attacks work, let’s look at a typical PDF file structure. We can safely open a PDF file in a plain text editor to inspect its contents. At first glance, it might look indecipherable:

However, with a bit of knowledge of PDF file structure, we can start to see how to decode this without too much trouble. The body or contents of a PDF file are listed as numbered “objects”. These begin with the object’s index number, a generation number and the “obj” keyword, as we can see at lines 3 and 19, which show the start of the definitions for the first two objects in the file:

1 0 obj
2 0 obj

The end of each object is signalled with the keyword endobj, as seen at lines 18 and 24 for Object 1 and Object 2, respectively.
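To make the object layout concrete, here is a minimal Python sketch that lists the objects in a file's raw bytes by looking for the "obj" and "endobj" keywords described above. It is only an illustration: the function name list_objects and the two-object sample are hypothetical, and a plain regular expression like this will be fooled by binary stream data that happens to contain those keywords, which a real PDF parser handles properly.

```python
import re

# Matches "<index> <generation> obj ... endobj"; DOTALL lets the body span lines.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.DOTALL)


def list_objects(pdf_bytes):
    """Return (index, generation, body) for each object found in the raw bytes."""
    return [
        (int(m.group(1)), int(m.group(2)), m.group(3).strip())
        for m in OBJ_RE.finditer(pdf_bytes)
    ]


# Hypothetical two-object sample, made up for illustration only.
sample = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /Length 0 >>\nendobj\n"
    b"2 0 obj\n<< /JS 1 0 R >>\nendobj\n"
)

for index, generation, body in list_objects(sample):
    print(index, generation, body)
```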

Object 2 immediately offers us some clues. We can see that it contains a dictionary (signalled by the chevrons << and >>). The dictionary has an entry for a JavaScript stream and a reference to Object 1:

JS 1 0 R

This tells us that the “garbage” code in Object 1 between the keywords stream (line 8) and endstream (line 15) is actually a JavaScript stream. Even better, Object 1’s dictionary is kind enough to tell us how to decode it. Line 6 specifies a “filter” of value “FlateDecode”. We can now write a quick-and-dirty Python script that decompresses the stream into plain JavaScript:
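The article's own script is not reproduced in this thread, so what follows is only a sketch of the same idea, under stated assumptions. It pulls out whatever sits between the stream and endstream keywords and inflates it with Python's standard zlib module, since FlateDecode is zlib/deflate compression. The function name inflate_streams and the sample object are hypothetical, and a careful tool would use the /Length entry in the object's dictionary rather than searching for the endstream keyword.

```python
import re
import zlib

# Capture everything between the "stream" keyword's newline and "endstream".
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def inflate_streams(pdf_bytes):
    """Yield the decompressed contents of every stream that inflates cleanly."""
    for match in STREAM_RE.finditer(pdf_bytes):
        try:
            # decompressobj ignores the newline left over before "endstream".
            yield zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            # Not a Flate stream (or damaged); skip it in this sketch.
            continue


# Hypothetical sample: a harmless script compressed the way a PDF would store it.
js = b"app.alert('hello');"
sample = (
    b"1 0 obj\n<< /Filter /FlateDecode >>\nstream\n"
    + zlib.compress(js)
    + b"\nendstream\nendobj\n"
)

for script in inflate_streams(sample):
    print(script.decode("latin-1"))
```

To try it on a real file, read the file in binary mode, for example open(path, "rb").read(), and pass the bytes to inflate_streams.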

Cleaning Up the Code

Our Python script churns out the JavaScript perfectly but not exactly beautifully:

As we’ve pointed out before, one thing you need to get used to when doing this kind of work is tidying up code to make it easier to work on. Here’s the same code after running it through a beautifier or prettifier in Sublime Text:

Now we can read the JavaScript and determine if it’s malicious or not. In this case, the code appears to be contacting a domain called “readnotify.com”. Making callbacks (“phoning home”) without user consent shows at least a lack of concern for user privacy. For people working in journalism or in politically-sensitive areas this could be a serious issue, as this kind of callback can reveal the user’s IP address, operating system and browser version to a remote server.

More Malicious JavaScript

Compressed streams aren’t the only way PDF files can contain obfuscated code. Here’s another that looks a bit more of a worry when we look at its hash on VirusTotal:

19ac1c943d8d9e7b71404b29ac15f37cd230a463003445b47441dc443d616afd

As the image from VT makes clear, this is some kind of trojan that’s exploiting CVE-2018-4993. Let’s open it up and take a look inside.

This is a very small file. There are only four objects, but the one that interests us is Object 3 and the value for the dictionary key /AA. Note that this contains a child dictionary with key name /O. That’s important because the /O key specifies actions that should occur when a document is opened. And the value of this key is itself another dictionary containing /JS, indicating yet again some encoded JavaScript.

Unlike our previous file, however, this one does not specify a filter. Luckily, the value of “JS” is clearly recognisable as octal encoding. Octal (or “oct”) uses three digits between 0 and 7 to specify a single value. The best thing about oct is we don’t need to roll up our Python sleeves to interpret it; we can just print it out directly on the command line:
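The article does this step with printf on the command line. For anyone who prefers Python after all, here is a rough equivalent, assuming the string uses backslash escapes of exactly three octal digits as described above. The function name decode_octal_escapes and the encoded sample are made up for illustration; the sample is not the string from the malicious file.

```python
import re


def decode_octal_escapes(text):
    """Replace each backslash-plus-three-octal-digits escape with its character."""
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), text)


# Hypothetical sample, not taken from the file discussed in the article.
encoded = r"\164\150\151\163\056\163\165\142\155\151\164\106\157\162\155"

print(decode_octal_escapes(encoded))  # prints: this.submitForm
```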

As printf shows, the octals represent the same kind of JavaScript call that we saw in the previous example, leveraging the this.submitForm() function.

Going back to the /AA dictionary in the PDF, note the two lines which specify

/S /GoToR

This code issues the “Go To Remote” action, telling the reader application to jump to the destination specified under the /F key.

lessthan100 · Aug 28, 2021

Stealing Credentials with an SMB Attack

We can use cURL to grab the headers from that IP address to see what we can learn.

Looks like we need some authentication to get past the server, and that’s exactly where the danger lies for Windows users. If the attacker has set up the remote file as an SMB share, then the crafted PDF’s attempt to jump to that location will cause an exchange between the user’s machine and the attacker’s server in which the user’s NTLM credentials are leaked.

This happens because when a user tries to access SMB shared files, Windows sends the user name and a hashed password to automatically try to log in. Although the hashed password is not the user’s actual password, the leaked credentials can be used to set up SMB Relay attacks and, if the password is not particularly strong, the plain-text version can easily be recovered from the hash by automated password-cracking tools.

Let’s see what VT makes of the IP address.

This host has a reputation as malicious, so there’s a good chance that this PDF file is, as suspected, trying to capture the user’s NTLM credentials.

Another Day, Another Callback

In January this year, another kind of callback flaw was spotted in XFA forms. XFA (also known as “Adobe LiveCycle”) was introduced by Adobe in PDF v1.5 and allows PDFs to dynamically resize fields within a document, among other things. Unfortunately, XFA also lends itself to misuse. As explained in this POC, a stream can contain an xml-stylesheet that can also be used to initiate a direct connection to a remote server or SMB share.

In this stream, the reader will parse the URL and immediately attempt a connection. Although there are no known cases of this method being used in the wild to date, the researcher tested it against Adobe Acrobat Reader DC, version 19.010.20069.

Protecting Against PDF Attacks

It’s impossible to tell whether a PDF file contains a credential-stealing callback or malicious JavaScript before opening it, unless you actually inspect it in the ways we’ve shown here. Of course, for most users and most use cases, that’s not a practical solution.
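As a rough first pass at that kind of inspection, the sketch below counts a handful of PDF key names in a file's raw bytes. /JS, /AA, /GoToR and XFA come from the examples above; /JavaScript and /OpenAction are additions of ours, and the whole list is only a starting point. The function name triage and the sample object are hypothetical. A clean result proves nothing, because names can be hex-escaped and objects can sit inside compressed streams where a byte search never sees them.

```python
import re

# Key names worth a closer look; an illustrative list, not an exhaustive one.
RISKY_KEYS = [b"/JS", b"/JavaScript", b"/AA", b"/OpenAction", b"/GoToR", b"/XFA"]


def triage(pdf_bytes):
    """Count occurrences of each key, ignoring longer names that share a prefix."""
    hits = {}
    for key in RISKY_KEYS:
        # The lookahead stops "/JS" from also matching "/JavaScript".
        pattern = re.escape(key) + rb"(?![A-Za-z0-9])"
        count = len(re.findall(pattern, pdf_bytes))
        if count:
            hits[key.decode()] = count
    return hits


# Hypothetical object shaped like the /AA example discussed earlier.
sample = b"3 0 obj\n<< /AA << /O << /S /GoToR /F (x) /JS (y) >> >> >>\nendobj\n"

print(triage(sample))
```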

There are, however, a couple of things you can do on the user-side. Most readers and browsers will have some form of JavaScript control. In Adobe’s Acrobat Reader DC, for example, you can disable Acrobat JavaScript in the Preferences and manage access to URLs. Similarly, with a bit of effort, users can also customize how Windows handles NTLM.

While these mitigations are “nice to have” and certainly worth considering, bear in mind that these features were added, just like MS Office Macros, to improve usability and productivity. Therefore, be sure that you’re not disabling some functionality that is an important part of your own or your organization’s workflow.

For enterprise situations, you should ensure you have a good EDR security solution that offers both full visibility into your network traffic, including encrypted communications, and comprehensive Firewall control. Of course, these days, behavioral AI detection is a must-have to properly protect your network and assets from all attacks, including malicious PDFs. SentinelOne customers can, in addition, scan PDF documents before they are accessed with our Nexus Embedded SDK.

Conclusion

Leveraging malicious PDFs is a great tactic for threat actors as there’s no way for the user to be aware of what code the PDF runs as it opens. Both the file format and file readers have a long history of exposed and, later, patched flaws. Because of the useful, dynamic features included in the document format, it’s reasonable to assume further flaws will be exposed and exploited by adversaries. With the ever-increasing tide of phishing and social engineering tactics targeting users, it’s vital that you remain vigilant about the dangers of PDFs and deploy a Next Gen security solution to prevent attacks.

motorcyclenumber · Aug 28, 2021

yah nowadays no need macros le. macros is super passe.

lessthan100 · Aug 28, 2021

Din know nowadays open PDF files also can tio virus

SpecialKeyboardService · Aug 29, 2021

My fren got warned moi b4.
Just manipulate some settings , the exe. will pop out from behind the pdf format.

Pdf are like girls, you think they are nice and after you merry, they show u the true colors

lessthan100 · Aug 29, 2021

A gentle reminder to all - remember to scan your PDF attachment before open it

Machiavel · Aug 29, 2021

Wa …. Sibei long

zzzzzzz · Aug 29, 2021

Got one edmwer kana
@articland05
:frown:

Shalomp · Aug 29, 2021

lessthan100 said:
A gentle reminder to all - remember to scan your PDF attachment before open it

Very good advice

matrix05 · Aug 29, 2021

Jia lat, this only for Win10 or Android also ?

ngsteve · Aug 29, 2021

lessthan100 said:
https://www.sentinelone.com/blog/malicious-pdfs-revealing-techniques-behind-attacks/

Malicious PDFs | Revealing the Techniques Behind the Attacks

Warning! Noob qns here, so there is java script code hidden in the pdf file.....
But how does the pdf reader itself interpret it and why would it be allowed to execute these java script code on the computer?? assessing files on the computer wouldn,t there be some form of permission to disallow it ??
Why is the pdf reader allowed to execute java script code hidden in yr pdf on yr pc then?? Can u provide explanation.....noob here Thanks

lessthan100 · Aug 29, 2021

ngsteve said:
Warning! Noob qns here , so there is java script code hidden in the pdf file.....
But how does the pdf reader itself interpret it and why would it be allowed to execute these java script code on the computer?? assessing files on the computer wouldn,t there be some form of permission to disallow it ??
Why is the pdf reader allowed to execute java script code hidden in yr pdf on yr pc then?? Can u provide explanation.....noob here Thanks

By default ADOBE ACROBAT reader allows javascript to execute (don't ask me why they included this kind of thing). You need to disable it yourself (I bet a lot of ppl don't know this).
Not to mention browser allow PDF files to be open or view via 3rd party PDF reader. Sometimes this 3rd party PDF reader may have security loophole that allows malicious code to execute you also wun know

Take a look at this
https://helpx.adobe.com/acrobat/using/javascripts-pdfs-security-risk.html
Disable JavaScript execution in PDF documents for firefox browser

Load about:config in the web browser's address bar.
Confirm that you will be careful to proceed.
Use the search at the top to find pdfjs.enableScripting.
Set the preference to FALSE with a click on the toggle button at the end of the line.

nubitol · Aug 29, 2021

wah, first time know this, so dangerous

whatheheck · Aug 29, 2021

SpecialKeyboardService said:
My fren got warned moi b4.
Just manipulate some settings , the exe. will pop out from behind the pdf format.
Pdf are like girls, you think they are nice and after you merry, they show u the true colors

Huh, not after you merry is show "is this colour healthy or not". After marry then she show true colours :s11:

lessthan100 · Aug 29, 2021

bump up for reminder to all

danny8x8 · Aug 29, 2021

Heng I don't use Adobe or browser to open pdf files..... should be safe right? no?

xcodes · Aug 29, 2021

TS, not limited to PDF files ... haha ... :s13:

PaperRay · Aug 29, 2021

hmm thanks.

in summary, the execute of js code embedded within pdf

orpisia · Aug 29, 2021

What the heck

So long to read

What the tldr

Pocoyoz · Aug 29, 2021

antivirus scanner can catch anot?

Sent from Pocoyo Fan Club
