Course Repository for University of Cincinnati Malware Analysis Class (CS7038)


Analyzing the Attack With Basic Tools

This lecture picks up where the previous one left off: we analyze the malicious PDF, containing the exploit, that we created last time. I walk through a cursory analysis of the PDF, noting my observations and describing the conclusions they lead me to. I then use these observations to hand-write a Python-based parser for the malware file, demonstrating a key skill within Malware Analysis & Reverse Engineering: the building of decoder tools. Following this, I use more powerful community tools to deconstruct the malicious PDF and eventually identify and extract the sub-component within the PDF that hides the Metasploit backdoor delivered to the target.
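A cursory first pass of this kind can be approximated by counting PDF name keywords that often deserve a closer look. The following is a minimal sketch, not the procedure used in the lecture; the `count_keywords` helper, the keyword list, and the sample bytes are all illustrative assumptions.

```python
# Hypothetical triage helper: count PDF keywords that commonly merit a closer look.
# The keyword list is an assumption for illustration, not an exhaustive set.
SUSPICIOUS_KEYWORDS = [b"/OpenAction", b"/Launch", b"/JavaScript", b"/EmbeddedFile"]


def count_keywords(data, keywords=SUSPICIOUS_KEYWORDS):
    """Return a dict mapping each keyword to its number of occurrences in data."""
    return {kw.decode("ascii"): data.count(kw) for kw in keywords}


# Made-up sample bytes standing in for the contents of a real PDF file
sample = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /OpenAction 2 0 R >>\nendobj\n"
    b"2 0 obj\n<< /S /JavaScript /JS (app.alert(1)) >>\nendobj\n"
)

counts = count_keywords(sample)
print(counts)
```

A plain byte search like this misses names obfuscated with `#xx` hex escapes, so a zero count does not mean the file is clean.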

Slides: lecture-w02-2.pdf (PDF)

Video: CS7038: Wk02.2 - Analyzing the Attack With Basic Tools (YouTube)

Link to Didier Stevens’ pdf-parser.py

The code for the quick PDF decoder I wrote in the lecture (not as good as pdf-parser.py):

import sys

# First, open evil.pdf in binary mode; a PDF holds raw bytes, not text
file_handle = open("evil.pdf", "rb")

# Next, read all bytes of data from evil.pdf into memory (inefficient for large files)
stream = file_handle.read()
file_handle.close()

# We want to parse the PDF into an organized data structure
pdf_file = {
  'pdf_id': b'',
  'obj': []
}

# Extract the %PDF-1.N marker
pdf_file["pdf_id"] = stream[0:8]

# Search the document for the first defined PDF object
# (this simple search only matches objects with generation number 0)
obj_i = stream.find(b' 0 obj')

# While we have an object identified, run the loop body
while obj_i != -1:
  # Define "object start" as the byte immediately following the 0x0a byte following the obj tag
  obj_start = obj_i + len(b' 0 obj') + 1

  # Define "object end" as the byte immediately preceding the next "endobj" marker
  obj_end = stream.find(b'endobj', obj_start)

  # Stop if the object is never closed (for example, in a truncated file)
  if obj_end == -1:
    break

  # Copy the data between obj_start and obj_end
  obj_data = stream[obj_start:obj_end]

  # Build a new "object descriptor" and insert it into the 'obj' sub-component of
  # the pdf_file dict initialized above
  pdf_file["obj"].append({'length': obj_end - obj_start, 'data': obj_data})

  # Search for the next occurrence of an obj marker (or will return -1 if none found)
  obj_i = stream.find(b' 0 obj', obj_end)

# Display the data structure to the user
print(pdf_file)
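
Objects recovered this way often wrap their payload in a `stream` ... `endstream` section, which may be compressed with the /FlateDecode filter (zlib). As a hedged sketch of a possible next step, the hypothetical helper `extract_stream` below pulls the stream body out of one object's data and tries to inflate it. It assumes a single, unencrypted stream with no chained filters, and it locates the body by keyword search rather than by the /Length entry, so it is no substitute for pdf-parser.py.

```python
import zlib


def extract_stream(obj_data):
    """Return the (possibly inflated) stream body of one PDF object, or None."""
    start = obj_data.find(b"stream")
    if start == -1:
        return None  # this object carries no stream
    header = obj_data[:start]  # the object's dictionary precedes the keyword
    start += len(b"stream")

    # The stream keyword is followed by CRLF or LF before the data begins
    if obj_data[start:start + 2] == b"\r\n":
        start += 2
    elif obj_data[start:start + 1] == b"\n":
        start += 1

    end = obj_data.find(b"endstream", start)
    if end == -1:
        return None  # unterminated stream
    body = obj_data[start:end]

    if b"/FlateDecode" in header:
        try:
            # decompressobj tolerates the end-of-line bytes left before endstream
            return zlib.decompressobj().decompress(body)
        except zlib.error:
            return body  # leave undecodable data untouched for manual review
    return body


# Made-up object data standing in for one entry of the parsed object list
payload = b"MZ fake embedded payload"
obj = b"<< /Filter /FlateDecode >>\nstream\n" + zlib.compress(payload) + b"\nendstream\n"
decoded = extract_stream(obj)
print(decoded)
```

Assuming the file was read as bytes, the same helper could be applied to the `data` field of each descriptor in `pdf_file["obj"]` to look for an embedded executable.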