HW03: Static Analysis Utility
In the Week 04 lectures, you were introduced to static analysis and shown a demonstration of a utility that extracts static analysis data from malware samples using tools on the REMnux VM.
Your assignment for HW03 is to take the source code that I began in Lecture Wk04.2 and add an additional analysis to the program that extracts useful data from the artifact(s).
You will also write an accompanying report that analyzes two or more malware samples and explains why the extracted information is significant.
The Python program is available here: metadata_import.py
As you may recall, the Python code that I’ve written already collects the following data from each sample, stores it in a global object within the script, and finally commits it to the database:
- MD5, SHA-1, SHA-256 hashes
- File type (as reported by exiftool / “file magic”)
- File size (in bytes)
- File names
- Compile Time
- Creation Time
- Modify Time
- Company Name
- File Description
- List of sections (if it is a PE32-type file)
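The first few items in this list need nothing beyond the Python standard library. The sketch below shows one way to compute the three digests, the size, and the file name of a sample in a single pass. It is not taken from metadata_import.py: the helper name `basic_metadata` and its dictionary keys are hypothetical, so align them with the keys the script actually uses. The exiftool-derived fields (file type, timestamps, company name, and so on) are not covered here.

```python
import hashlib
import os


def basic_metadata(path: str) -> dict:
    """Return the digests, size, and file name of one sample as a dict.

    Hypothetical helper: the keys below are illustrative and may not match
    the field names used in metadata_import.py.
    """
    md5, sha1, sha256 = hashlib.md5(), hashlib.sha1(), hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in 64 KiB chunks so large samples are never fully loaded into memory.
        for chunk in iter(lambda: fh.read(65536), b""):
            md5.update(chunk)
            sha1.update(chunk)
            sha256.update(chunk)
    return {
        "md5": md5.hexdigest(),
        "sha1": sha1.hexdigest(),
        "sha256": sha256.hexdigest(),
        "size": os.path.getsize(path),       # file size in bytes
        "filenames": [os.path.basename(path)],
    }
```

A dict shaped like this could then be committed with pymongo's `Collection.insert_one`, although the in-class script may organize and store its fields differently.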
You will use a ZIP file containing malware that I provide to you as your experimental set for this homework. This file is available here: Malware_Bundle_HW03.zip (Password: infected7038)
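If you would rather unpack the bundle from Python than with `unzip` or `7z`, the standard `zipfile` module accepts the password as bytes. This is a minimal sketch with a hypothetical helper name, `extract_bundle`. Note that `zipfile` only decrypts the legacy PKWARE (ZipCrypto) scheme, so if the bundle turns out to be AES-encrypted this will fail and a command-line tool is needed instead. Run it only inside the analysis VM, since it writes live samples to disk.

```python
import zipfile
from pathlib import Path
from typing import List


def extract_bundle(zip_path: str, dest: str, password: bytes) -> List[Path]:
    """Extract every member of a password-protected ZIP and return the file paths.

    Hypothetical helper. zipfile handles only ZipCrypto decryption; AES-encrypted
    members raise an error. The returned paths assume ordinary relative member
    names (zipfile sanitizes absolute or ".." names when extracting).
    """
    out = Path(dest)
    out.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        # pwd is used for encrypted members and ignored for unencrypted ones.
        zf.extractall(out, pwd=password)
        # Skip directory entries; keep only the extracted sample files.
        return [out / name for name in zf.namelist() if not name.endswith("/")]
```

For example, `extract_bundle("Malware_Bundle_HW03.zip", "samples", b"infected7038")` unpacks the bundle into `samples/`, and each returned path can then be fed to the per-sample analysis.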
- Using the metadata_import.py script that I’ve provided, the lecture notes from Week04.2, and the ZIP file of malware, rebuild the MongoDB database from the provided samples, following the in-class demo.
- Identify information or characteristics that are available in multiple malware samples. A portion of your grade depends on the “difficulty level” here. If you simply extract an additional data point from exiftool’s output, or check whether a particular piece of static content exists, that will earn at most a B equivalent on the assignment. For a higher grade, research the structure of the file type being analyzed and use your tools to extract format-specific information. Some examples of acceptable challenges are:
  - Enumerate the encodings used in a PDF
  - Calculate the sizes and locations of sections within a PE32 executable
  - Parse & extract CSS style sheets from HTML
  - Enumerate the symbols a PE32 executable imports from one or more particular Windows system DLLs
- Document the characteristic you chose, why it is significant, and which malware artifacts you used to develop the extraction technique.
- Document the code you added to the Python script. Either use comments and references to lines in the documentation, or document it directly in your report.
- Give a list of the MD5, SHA-1, or SHA-256 digests (be consistent, though; don’t mix digest types) of all malware samples that your custom analysis identified and from which it extracted at least some data.
- Use mongodump to dump the contents of the collection into a BSON output file.
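Of the challenge examples listed above, the PE32 section layout can be read with only `struct` by following the PE/COFF header chain (DOS header, then `e_lfanew` at offset 0x3C, then the COFF file header, then the section table). This is a minimal sketch under stated assumptions: `pe_sections` and `Section` are hypothetical names, it works on the whole file in memory, and it checks nothing beyond the two signatures, so deliberately malformed headers in real malware may make it raise `struct.error`. The `pefile` library, if it is installed on your VM, handles such cases more robustly.

```python
import struct
from typing import List, NamedTuple


class Section(NamedTuple):
    name: str
    virtual_address: int  # RVA of the section once loaded
    virtual_size: int     # size of the section in memory
    raw_offset: int       # PointerToRawData: file offset of the section data
    raw_size: int         # SizeOfRawData: bytes the section occupies on disk


def pe_sections(data: bytes) -> List[Section]:
    """Parse the section table of a PE file from its raw bytes (stdlib only)."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew at offset 0x3C holds the file offset of the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # The 20-byte COFF file header follows the 4-byte signature.
    (_machine, num_sections, _timestamp, _symtab, _nsyms,
     opt_header_size, _chars) = struct.unpack_from("<HHIIIHH", data, pe_offset + 4)
    # The section table begins immediately after the optional header.
    table = pe_offset + 4 + 20 + opt_header_size
    sections = []
    for i in range(num_sections):
        # Each section header is 40 bytes; only the first five fields are kept.
        (name, vsize, vaddr, raw_size, raw_ptr,
         *_rest) = struct.unpack_from("<8sIIIIIIHHI", data, table + 40 * i)
        sections.append(Section(name.rstrip(b"\0").decode("latin-1"),
                                vaddr, vsize, raw_ptr, raw_size))
    return sections
```

Calling `pe_sections` on a sample's bytes lists every section with its RVA, virtual size, file offset, and on-disk size. A section whose raw size is 0 but whose virtual size is large is often a hint that a packer will unpack code into it at run time.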
Example of the mongodump step, using the database and collection names from class:
    bash$ mkdir mongo-out
    bash$ cd mongo-out
    bash$ mongodump -d cs7038 -c malware
    connected to: 127.0.0.1
    Sun Feb 5 22:49:53.120 DATABASE: cs7038 to dump/cs7038
    Sun Feb 5 22:49:53.121 cs7038.malware to dump/cs7038/malware.bson
    Sun Feb 5 22:49:53.129 7820 objects
    Sun Feb 5 22:49:53.129 Metadata for cs7038.malware to dump/cs7038/malware.metadata.json
    bash$
This creates the files dump/cs7038/malware.bson and dump/cs7038/malware.metadata.json inside mongo-out. These are your BSON and JSON files.
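Before packaging the dump, a quick sanity check is to count the documents in malware.bson and compare the result with the object count mongodump printed. A .bson dump is a concatenation of BSON documents, each prefixed by its total length as a little-endian 32-bit integer (the length includes those 4 bytes and a trailing NUL byte), so counting needs only `struct`. The helper name `count_bson_documents` is hypothetical, and the sketch assumes an uncompressed dump, not one written with `--gzip`. MongoDB's `bsondump` tool can also convert the file to JSON if you want to inspect the records themselves.

```python
import struct


def count_bson_documents(path: str) -> int:
    """Count the top-level documents in an uncompressed mongodump .bson file."""
    count = 0
    with open(path, "rb") as fh:
        while True:
            header = fh.read(4)
            if not header:
                break  # clean end of file
            if len(header) < 4:
                raise ValueError("truncated length prefix")
            # Total document length, including these 4 bytes and the trailing NUL.
            (length,) = struct.unpack("<i", header)
            if length < 5:
                raise ValueError(f"invalid BSON document length {length}")
            body = fh.read(length - 4)
            # A complete document must be the full length and end with a NUL byte.
            if len(body) != length - 4 or body[-1:] != b"\x00":
                raise ValueError("truncated or malformed document")
            count += 1
    return count
```

Run against `dump/cs7038/malware.bson`, the result should equal the "objects" count in mongodump's output for your collection.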
You’ll submit a report (PDF preferred), plus supporting code, artifacts, and binary data, in a ZIP file. You do not need to submit the malware samples themselves; instead, include the digest values that uniquely identify the samples significant to your analysis. Also include the BSON and JSON file(s) generated by the mongodump operation in your ZIP file.
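One way to assemble the submission ZIP with a digest list but without the samples is sketched below. `build_submission` and the `digests.txt` file name are hypothetical. The sketch uses SHA-256 for every sample to respect the rule against mixing digest types, and it stores each supporting file under its base name only, so rename any files whose names would collide.

```python
import hashlib
import zipfile
from pathlib import Path
from typing import Iterable


def build_submission(zip_path: str, files: Iterable[str], samples: Iterable[str]) -> None:
    """Write a submission ZIP holding `files` plus a digests.txt for `samples`.

    Hypothetical helper: samples are hashed but never added to the archive.
    """
    # One digest type (SHA-256) for every sample, sorted for a stable listing.
    digests = sorted(
        hashlib.sha256(Path(s).read_bytes()).hexdigest() for s in samples
    )
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for f in files:
            # Store under the base name, e.g. "malware.bson" rather than the full path.
            zf.write(f, arcname=Path(f).name)
        zf.writestr("digests.txt", "\n".join(digests) + "\n")
```

The `files` list would typically contain the report, the modified metadata_import.py, and the two mongodump outputs, while `samples` holds the paths of the samples your analysis matched.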