Examining PDF Files with PDFid and PDF-Parser from the Command Line to Understand Their Potential Security Risks and Internal Structure

Rohit Ray
Aug 27, 2023

Hello Everyone!! I hope you’re all having a good time. Welcome back to my blog. Today, I will walk through how I examined a Portable Document Format (PDF) file with two command-line tools, PDFid and PDF-Parser, to understand its potential security risks and internal structure.

In today’s digital landscape, the widespread use of Portable Document Format (PDF) files has become integral to communication and information sharing. However, this convenience also presents potential security challenges, as PDF files can harbor hidden risks and vulnerabilities that malicious actors might exploit. As the need for comprehensive cybersecurity measures grows, it becomes crucial to examine PDF files thoroughly to identify any potential security risks and understand their internal structure.

This exploration often relies on specialized tools built for in-depth analysis. Two such tools, “PDFid” and “PDF-Parser”, offer valuable insights into the nature of a PDF file, its potential security threats, and its underlying architecture. What sets these tools apart is their command-line interface, which lets users interact with them directly through text commands. This gives security professionals and researchers a high degree of control and flexibility when investigating the intricacies of PDF files.

In this context, this article delves into examining PDF files using the “PDFid” and “PDF-Parser” tools via the command-line interface. By leveraging these tools, security analysts can better understand the potential security risks associated with a given PDF file, such as embedded scripts, links, and metadata. Furthermore, exploring a PDF file’s internal structure provides insights into how the file is constructed, shedding light on potential vulnerabilities that could be exploited.

Throughout this article, we will explore the capabilities of PDFid and PDF-Parser, highlighting their significance in cybersecurity. By emphasizing the importance of command-line tools in scrutinizing PDF files, we aim to equip readers with the knowledge and tools necessary to bolster their defenses against potential security threats lurking within PDF documents.

If you’re ready, let’s get started with our task.

Let’s Begin:

An Overview of PDF and PDF Analysis

PDF: PDF (Portable Document Format) is a file format that captures all of the components of a printed document as an electronic image that can be viewed, navigated, printed, or forwarded to others. PDF files are very suitable for publications that require the original graphic appearance to be preserved online, such as magazine articles, product brochures, or flyers.

Overview of Dummy PDF File

PDF Analysis: PDF analysis is the process of examining PDF files to detect viruses, malicious executable code, JavaScript code, clickable actions, and open actions, using tools such as “pdfid” and “pdf-parser” in Kali Linux.

Adobe PDF documents can carry viruses and dangerous executable code. Multimedia content, hyperlinks, JavaScript code, clickable actions, and system commands are all common places for malware to hide. The attack is launched when users open the file, or when they interact with the embedded content after opening it.

Overview of PDF File Analysis

These files contain both static elements, such as text and images, and dynamic elements, such as JavaScript, forms, and actions. While these components make a document more comprehensible, functional, and pleasing to the eye, they can also be abused to perform malicious activities.

Objectives of PDF Analysis

PDF analysis serves a crucial role in cybersecurity and digital forensics by thoroughly examining PDF files to uncover potential security risks, vulnerabilities, and hidden information. The objectives of PDF analysis are multifaceted and encompass a range of goals aimed at enhancing security measures and understanding the internal structure of PDF files. Here are some valuable points regarding the objectives of PDF analysis:

Identifying Malicious Content: One of the primary objectives of PDF analysis is to detect and identify any malicious content within the PDF file, such as embedded malware, malicious scripts, or hidden executables. This helps prevent users from unknowingly opening or interacting with compromised files.

Uncovering Exploitable Vulnerabilities: PDF analysis aims to uncover vulnerabilities within the PDF file format itself or within specific PDF viewers that could be exploited by attackers. By identifying these vulnerabilities, security professionals can work to patch or mitigate them effectively.

Understanding Metadata: PDF files often contain metadata such as author information, creation dates, and editing history. Analyzing this metadata can provide insights into the origin of the file, its history, and potentially any unauthorized modifications.

Detecting Phishing and Social Engineering: PDF analysis can help in detecting phishing attempts and social engineering tactics embedded within PDF files. This includes deceptive links, fake forms, or content designed to trick users into revealing sensitive information.

Validating Authenticity: Verifying the authenticity of a PDF file is another objective of analysis. This involves confirming whether a document has been tampered with, ensuring the integrity of digital signatures, and verifying the legitimacy of the file’s source.

Reconstructing Document History: For forensic investigations, PDF analysis can aid in reconstructing the document’s history, including changes made over time. This information can be crucial in legal cases or investigations.

Understanding Obfuscation Techniques: Attackers often employ obfuscation techniques to hide malicious content within PDF files. PDF analysis aims to identify and decipher these obfuscation methods, revealing the true intent of the file.

Extracting Embedded Objects: PDF files can contain embedded objects such as images, fonts, and multimedia elements. The analysis seeks to extract and examine these objects for any potential security implications.

Analyzing JavaScript and Interactive Elements: Many PDF files contain JavaScript code and interactive elements. PDF analysis helps in assessing the behavior of these elements to ensure they are not being used for malicious purposes.

Enhancing Cyber Defense: Ultimately, the objective of PDF analysis is to bolster cybersecurity defenses. By understanding the various threats and vulnerabilities associated with PDF files, organizations can implement better security measures and educate users on safe practices.

Developing Signature and Pattern Databases: The analysis process contributes to building signature and pattern databases for known threats. These databases can be used for proactive threat detection and prevention.

Contributing to Research and Knowledge: PDF analysis generates insights into evolving attack vectors and techniques. This knowledge aids in ongoing research efforts and the development of more advanced analysis tools.
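
To make the “Understanding Metadata” objective a bit more concrete, here is a minimal Python sketch (my own illustration, not part of pdfid or pdf-parser) that pulls a few common Info-dictionary entries out of raw PDF bytes and parses the PDF date format (D:YYYYMMDDHHmmSS followed by an optional UTC offset). It assumes the values are simple literal strings; escaped parentheses, hex strings, and XMP metadata streams are not handled, and the function names and the author/producer values in the usage example are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

# Literal-string values of common Info-dictionary keys. Simplification: strings
# containing escaped parentheses or hex strings (<...>) are not handled.
INFO_RE = re.compile(
    rb"/(Title|Author|Creator|Producer|CreationDate|ModDate)\s*\(([^)]*)\)")

# PDF date string: D:YYYYMMDDHHmmSSOHH'mm' (every field after the year is optional).
DATE_RE = re.compile(
    r"(?:D:)?(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?([Zz+-])?(\d{2})?'?(\d{2})?'?")


def extract_metadata(data: bytes) -> dict[str, str]:
    """Return {key: value} for the Info entries found in the raw bytes."""
    return {m.group(1).decode("ascii"): m.group(2).decode("latin-1")
            for m in INFO_RE.finditer(data)}


def parse_pdf_date(value: str) -> datetime:
    """Convert a PDF date string into a datetime (naive if no offset is given)."""
    m = DATE_RE.match(value)
    if m is None:
        raise ValueError(f"not a PDF date: {value!r}")
    year, month, day, hour, minute, second, sign, tz_h, tz_m = m.groups()
    tz = None
    if sign in ("Z", "z"):
        tz = timezone.utc
    elif sign in ("+", "-"):
        offset = timedelta(hours=int(tz_h or 0), minutes=int(tz_m or 0))
        tz = timezone(offset if sign == "+" else -offset)
    return datetime(int(year), int(month or 1), int(day or 1), int(hour or 0),
                    int(minute or 0), int(second or 0), tzinfo=tz)


# Usage with made-up values:
info = b"<< /Author (Alice) /Producer (ExampleWriter 1.0) /CreationDate (D:20230827101500+05'30') >>"
meta = extract_metadata(info)
print(meta["Author"], parse_pdf_date(meta["CreationDate"]).isoformat())
```

A creation date that is later than the modification date, or a producer string that does not match the claimed authoring tool, is the kind of inconsistency this objective is about.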

In summary, PDF analysis aims to ensure the security, authenticity, and integrity of PDF files while shedding light on potential risks and vulnerabilities. By achieving these objectives, organizations and individuals can better protect themselves from cyber threats associated with PDF documents.

Methods and Procedures

Before we open a PDF file, it must be examined properly to determine whether it is malicious. Many tools can analyze a PDF file, but the two available in today’s lab are “pdfid” and “pdf-parser”, both command-line tools that run in Kali Linux.

PDFid: This tool is not a PDF parser; instead, it scans a file for specific PDF keywords, allowing you to identify PDF documents that contain JavaScript or execute an action when opened. PDFid also handles name obfuscation.

Simplicity is a fundamental design criterion for this tool. As its author, Didier Stevens, explains, fully parsing a PDF document requires a very complex program, which is bound to contain numerous (security) flaws. To avoid being exploited, he kept PDFid as simple as possible (it is even simpler than pdf-parser).
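
To illustrate the idea behind PDFid’s keyword scan, the following Python sketch counts the same keywords in a file’s raw bytes and decodes #xx hex escapes in names, which is the kind of name obfuscation PDFid handles (for example, /J#61vaScript for /JavaScript). This is a simplified illustration written for this post, not PDFid’s actual code: the function names are hypothetical, the /Colors > 2^24 check is left out, and it does not decompress streams, so objects hidden inside an object stream (/ObjStm) are not counted.

```python
import re
from collections import Counter

# Keywords PDFid lists in its default report ("/Colors > 2^24" omitted here).
KEYWORDS = ["obj", "endobj", "stream", "endstream", "xref", "trailer",
            "startxref", "/Page", "/Encrypt", "/ObjStm", "/JS", "/JavaScript",
            "/AA", "/OpenAction", "/AcroForm", "/JBIG2Decode", "/RichMedia",
            "/Launch", "/EmbeddedFile", "/XFA"]

# A PDF name starts with "/" and runs until whitespace or a delimiter character.
NAME_RE = re.compile(rb"/[^\s/<>\[\]()%{}]+")
# Bare structural keywords, not part of longer words (e.g. "xref" in "startxref").
WORD_RE = re.compile(
    rb"(?<![A-Za-z])(endobj|obj|endstream|stream|startxref|xref|trailer)(?![A-Za-z])")
# "#xx" hex escapes inside names, a common obfuscation trick.
HEX_ESCAPE_RE = re.compile(rb"#([0-9A-Fa-f]{2})")


def normalize_name(raw: bytes) -> tuple[str, bool]:
    """Decode #xx escapes and report whether the name was obfuscated."""
    decoded = HEX_ESCAPE_RE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)
    return decoded.decode("latin-1"), decoded != raw


def pdfid_like_counts(data: bytes) -> tuple[Counter, int]:
    """Count PDFid-style keywords; also return how many names used hex escapes."""
    counts = Counter({keyword: 0 for keyword in KEYWORDS})
    obfuscated = 0
    for m in WORD_RE.finditer(data):
        counts[m.group(1).decode("ascii")] += 1
    for m in NAME_RE.finditer(data):
        name, was_escaped = normalize_name(m.group(0))
        if was_escaped:
            obfuscated += 1
        if name in counts:
            counts[name] += 1
    return counts, obfuscated


# Tiny hand-made sample: an /OpenAction pointing at an obfuscated JavaScript action.
SAMPLE = (b"%PDF-1.4\n"
          b"1 0 obj\n<< /Type /Catalog /OpenAction 2 0 R >>\nendobj\n"
          b"2 0 obj\n<< /S /J#61vaScript /JS (app.alert(1)) >>\nendobj\n"
          b"trailer\n<< /Root 1 0 R >>\nstartxref\n0\n%%EOF\n")

counts, obfuscated = pdfid_like_counts(SAMPLE)
print(counts["/JavaScript"], counts["/OpenAction"], obfuscated)  # → 1 1 1
```

Note that /J#61vaScript is still counted as /JavaScript; a naive text search for “/JavaScript” would have missed it.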

In the screenshots below, two PDF documents, “normal.pdf” and “malicious.pdf”, are examined with the pdfid tool.

Screenshot of Normal pdf analysis using pdfid tool
Screenshot of Malicious pdf analysis using pdfid tool

As Didier Stevens notes, almost every PDF document contains the first seven keywords (obj through startxref) and, to a lesser extent, stream and endstream. A few PDF documents do not include an xref or trailer, but they are uncommon, and this alone is not an indication of a malicious PDF document.
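
To see where those structural keywords live, this Python sketch builds a bare-bones, single-page PDF by hand: numbered objects between obj and endobj, a cross-reference (xref) table of byte offsets, a trailer pointing at the catalog, and startxref giving the byte offset of the xref table. The object contents are a minimal example chosen for this post, not taken from the lab files.

```python
def minimal_pdf() -> bytes:
    """Assemble a tiny one-page PDF, computing the xref offsets as we go."""
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] /Resources << >> >>",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset where "N 0 obj" starts
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"

    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry 0 is always the free-list head
    for off in offsets:
        out += b"%010d 00000 n \n" % off  # each entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos  # "%%%%" renders as "%%"
    return bytes(out)


print(minimal_pdf().decode("ascii"))
```

A reader jumps to the offset after startxref, reads the xref table, and uses it to locate each object, which is why these keywords appear in almost every well-formed PDF.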

From the above analysis, the PDF file “normal.pdf” contains 12 pages and an object stream (/ObjStm), which can conceal objects from a keyword scan, but it has no other risky components such as clickable actions or JavaScript code, so it appears to be a malware-free document.

The investigation, on the other hand, reveals that the PDF file “malicious.pdf” has only one page yet contains JavaScript, which is a common sign of a malicious document, as well as an open action and an embedded file. Based on these pdfid results, the file “malicious.pdf” very likely contains malware.
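
The reasoning above can be turned into a quick triage rule. The sketch below parses pdfid’s plain-text report and lists the risky keywords that have a non-zero count. It assumes each report line consists of a keyword, whitespace, a count, and optionally the number of obfuscated occurrences in parentheses (for example 1(1)); the function names are hypothetical, and the sample report is made up for illustration, not the output shown in the screenshots. A non-zero count is a reason to look closer, not proof of malware.

```python
import re

# pdfid keywords that usually deserve a closer look when their count is non-zero.
RISKY_KEYWORDS = ["/JS", "/JavaScript", "/AA", "/OpenAction", "/AcroForm",
                  "/JBIG2Decode", "/RichMedia", "/Launch", "/EmbeddedFile", "/XFA"]

# Assumed line shape: keyword, whitespace, count, optional "(n)" obfuscation count.
# Lines that do not fit (the header, "/Colors > 2^24") are simply skipped.
LINE_RE = re.compile(r"^\s*(/?\w+)\s+(\d+)(?:\((\d+)\))?\s*$")


def parse_pdfid_report(text: str) -> dict[str, tuple[int, int]]:
    """Map keyword -> (count, obfuscated count)."""
    counts = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            counts[m.group(1)] = (int(m.group(2)), int(m.group(3) or 0))
    return counts


def risky_keywords(counts: dict[str, tuple[int, int]]) -> list[str]:
    """Return the risky keywords that occur at least once."""
    return [k for k in RISKY_KEYWORDS if counts.get(k, (0, 0))[0] > 0]


# Made-up report excerpt for illustration only.
SAMPLE_REPORT = """\
 PDF Header: %PDF-1.4
 obj                    9
 endobj                 9
 /Page                  1
 /JS                    1
 /JavaScript            1(1)
 /OpenAction            1
 /Launch                0
 /EmbeddedFile          1
"""

print(risky_keywords(parse_pdfid_report(SAMPLE_REPORT)))
# → ['/JS', '/JavaScript', '/OpenAction', '/EmbeddedFile']
```

Run against the normal file’s report, the same rule would return an empty list, matching the conclusion drawn above.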

PDF-Parser: This tool parses a PDF document to identify the fundamental elements of the file being analyzed; it does not render the document. Its author describes the parser’s code as crude and not a textbook example of a PDF parser, but it gets the job done.
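
To show roughly what “parsing into elements” means, the Python sketch below splits raw PDF bytes into indirect objects (N G obj ... endobj) and searches their contents for a keyword, similar in spirit to searching indirect objects with pdf-parser. It is a regex-based simplification written for this post, not pdf-parser’s code: the function names are hypothetical, it also searches stream data, and objects stored inside compressed object streams (/ObjStm) are invisible to it.

```python
import re

# "N G obj ... endobj" with a non-greedy body; re.S lets "." span newlines.
OBJ_RE = re.compile(rb"(?<!\d)(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj\b", re.S)


def list_objects(data: bytes) -> list[tuple[int, int, bytes]]:
    """Return (object number, generation, body) for each indirect object."""
    return [(int(m.group(1)), int(m.group(2)), m.group(3))
            for m in OBJ_RE.finditer(data)]


def search_objects(data: bytes, needle: bytes) -> list[tuple[int, int]]:
    """Return the (number, generation) of objects whose body contains needle."""
    return [(num, gen) for num, gen, body in list_objects(data) if needle in body]


SAMPLE = (b"%PDF-1.4\n"
          b"1 0 obj\n<< /Type /Catalog /OpenAction 2 0 R >>\nendobj\n"
          b"2 0 obj\n<< /S /JavaScript /JS (app.alert(1)) >>\nendobj\n"
          b"3 0 obj\n<< /Type /EmbeddedFile /Length 5 >>\nstream\nhello\nendstream\nendobj\n")

for keyword in (b"/OpenAction", b"/JavaScript", b"/EmbeddedFile"):
    print(keyword.decode(), search_objects(SAMPLE, keyword))
```

Following the chain this way (catalog → /OpenAction → JavaScript object) is exactly how an analyst moves from “pdfid says there is JavaScript” to “this is the object that runs it”.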

In the screenshot below, the PDF document “Normal.pdf” is analyzed with the pdf-parser tool.

Screenshot of Normal pdf analysis using pdf-parser tool

The screenshots above show no malicious JavaScript code, clickable action, or open action inside this PDF document.

The image below shows some of the information acquired by processing the document “Malicious.pdf” that must be considered during analysis.

In the above screenshot, pdf-parser reveals an embedded file named “eciar-dropper.doc”.

Screenshot of Malicious pdf analysis using pdf-parser tool

The above screenshot shows malicious JavaScript code in the PDF file “Malicious.pdf”. The document is therefore likely harmful and should not be opened on our system, because doing so could compromise it.
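
Once pdf-parser has pointed at a suspicious object, such as the embedded file or the JavaScript, its stream can be decoded for inspection without ever opening the PDF in a viewer. This Python sketch is a simplified stand-in for that step, written for this post (the function name is hypothetical): it takes one object’s body, cuts out the bytes between stream and endstream, and inflates them with zlib when the dictionary names /FlateDecode. Other filters and /DecodeParms predictors are not handled, and the end of the stream is located with endstream rather than /Length.

```python
import re
import zlib

# The stream keyword follows the dictionary and is followed by CRLF or LF.
STREAM_START_RE = re.compile(rb">>\s*stream(?:\r\n|\n)")


def decode_object_stream(obj_body: bytes) -> bytes:
    """Return the (decoded) stream data of a single object body."""
    start = STREAM_START_RE.search(obj_body)
    end = obj_body.rfind(b"endstream")
    if start is None or end < start.end():
        raise ValueError("no stream found in object body")
    dictionary = obj_body[: start.start()]
    raw = obj_body[start.end(): end]
    if b"/FlateDecode" in dictionary:
        # A decompressobj tolerates the end-of-line bytes before "endstream":
        # anything after the end of the zlib data goes to .unused_data.
        return zlib.decompressobj().decompress(raw)
    if b"/Filter" in dictionary:
        raise NotImplementedError("only /FlateDecode is handled in this sketch")
    # Unfiltered data: drop the end-of-line marker that precedes "endstream".
    for eol in (b"\r\n", b"\n", b"\r"):
        if raw.endswith(eol):
            return raw[: -len(eol)]
    return raw


# Usage: a compressed JavaScript payload, built here purely for illustration.
payload = b"app.alert('hi');"
compressed = zlib.compress(payload)
body = (b"<< /Filter /FlateDecode /Length %d >>\nstream\n" % len(compressed)
        + compressed + b"\nendstream\n")
print(decode_object_stream(body))  # → b"app.alert('hi');"
```

The decoded bytes can then be saved to disk and examined with other tools, rather than letting a PDF reader execute them.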

Findings

Of the two PDF files inspected in this lab report, “Normal.pdf” was not found to be suspicious, while “Malicious.pdf” was found to be suspicious, containing malicious JavaScript code, an open action, a clickable action, and embedded files when analyzed with pdfid. Further examination with pdf-parser broke the PDF files down into their component objects.

Conclusion

Before opening a PDF file, we should first inspect it and make sure it is malware-free. In this study, we used the pdfid tool to determine whether the two PDF files were malicious, and then used pdf-parser to extract the objects inside the PDF file and display them in a readable form.

That’s all for this blog. I hope you guys enjoyed this form of learning. ❤

Until then, keep learning, keep exploring, and keep hacking!

References

1. Contributor, T. (2010) What Is Portable Document Format (PDF)? - Definition From Whatis.Com [online] available from <https://whatis.techtarget.com/definition/Portable-Document-Format-PDF> [9 June 2021]

2. Can Pdfs Contain Viruses? 4 Critical Things To Watch Out For | Data Overhaulers (n.d.) available from <https://dataoverhaulers.com/can-pdfs-contain-viruses/> [9 June 2021]

3. Stevens, D. (n.d.) PDF Tools [online] available from <https://blog.didierstevens.com/programs/pdf-tools/> [9 June 2021]

You can follow me on Social Media:

LinkedIn: https://www.linkedin.com/in/rohit-ray-19284b232/

GitHub: https://github.com/rohit273

Twitter: https://twitter.com/RHittttt

Instagram: https://www.instagram.com/ro_hit.exe/

Please follow and subscribe for more awesome upcoming blogs.