Detection of malicious PDF files and directions for enhancements: A state-of-the-art survey

Nir Nissim, Aviad Cohen, Chanan Glezer, Yuval Elovici

Research output: Contribution to journal › Review article › peer-review

75 Scopus citations


Initial penetration is one of the first steps of an Advanced Persistent Threat (APT) attack, and it is considered one of the most significant means of initiating cyber-attacks aimed at organizations. Such an attack usually results in the loss of sensitive and confidential information. Because email communication is an integral part of daily business operations, APT attackers frequently leverage email as an attack vector for initial penetration of the targeted organization. Email allows the attacker to deliver malicious attachments or links to malicious websites. Attackers usually use social engineering to make the recipient open the malicious email, open the attachment, or click a link. Existing defensive solutions within organizations prevent executables from entering organizational networks via email; therefore, recent APT attacks tend to attach non-executable files (PDF, MS Office, etc.), which are widely used in organizations and mistakenly considered less suspicious or less likely to be malicious. This article surveys existing academic methods for the detection of malicious PDF files. The article outlines an Active Learning framework and highlights the correlation between structural incompatibility of PDF files and their likelihood of maliciousness. Finally, we provide comparisons, insights, and conclusions, as well as avenues for future research to enhance the detection of malicious PDFs.

Original language: English
Pages (from-to): 246-266
Number of pages: 21
Journal: Computers and Security
State: Published - 3 Feb 2015


  • APT
  • Cyber-attack
  • Detection
  • Email
  • Malicious code
  • Malware
  • Organizations
  • PDF

