CS6038/CS5138 Malware Analysis, UC

Course content for UC Malware Analysis

4 February 2020

EXE File Analysis Lecture 1

by Coleman Kane

This lecture walked through a number of purpose-built utilities that are common to most Linux systems and are installed on the Kali VM image I provided to all of you in class. We use these as building blocks toward the main view of Ghidra, which can be overwhelming to a first-time user. I often find it very helpful to start with repeatable examples of these single-purpose or limited-purpose tools, each of which demonstrates a recipe for getting a specific piece of data. Following these up with a multi-purpose environment such as Ghidra then shows how stitching these data sets together provides a lot of power.

Hexdump

I begin with the hexdump utility. Some great examples and documentation are available here.

My favorite invocation of hexdump uses the -C option. This produces a hexadecimal dump 16 bytes per line, with the file offset of each line on the left and a preview of the raw text on the right (unprintable characters are replaced with a .). This lets you see the numeric representation of the data alongside its text rendering, which makes human-readable content and other patterns easier to spot in a dense view.

hexdump -C filename.exe
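As a rough illustration of how the -C layout is put together, here is a minimal Python sketch that approximates it. The function name hexdump_c is our own; the sketch assumes "printable" means ASCII 0x20 to 0x7e, and it does not implement the * lines that real hexdump uses to collapse repeated rows.

```python
def hexdump_c(data: bytes) -> str:
    """Approximate the canonical `hexdump -C` layout (no '*' squeezing)."""
    lines = []
    for off in range(0, len(data), 16):
        chunk = data[off:off + 16]
        # Two groups of 8 hex bytes, separated by an extra space
        left = " ".join(f"{b:02x}" for b in chunk[:8])
        right = " ".join(f"{b:02x}" for b in chunk[8:])
        hex_area = f"{left}  {right}" if right else left
        # Printable ASCII is shown as-is; everything else becomes '.'
        text = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        # Offset column, hex area padded to a full row, then the text preview
        lines.append(f"{off:08x}  {hex_area:<48}  |{text}|")
    # hexdump -C finishes with a line holding the total length in hex
    lines.append(f"{len(data):08x}")
    return "\n".join(lines)
```

For example, print(hexdump_c(open("filename.exe", "rb").read())) should produce output comparable to hexdump -C filename.exe for files without long runs of repeated rows.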

Less

The GNU less utility is used a lot at the beginning of this lecture to control the output of the other commands, allowing me to page through the data (for example, hexdump -C filename.exe | less). The tool is similar to the better-known more command, which has a variant on Windows systems, too. I happen to favor less, and a common slogan is “less is more”, which conveys both that less was written as a replacement for more and that less offers features beyond what more provides.

During the lecture, I use the spacebar to page through the file; PageUp and PageDown are supported as well. To search through the entire buffer, type the / character followed by the text you want to find, then press Enter (n jumps to the next match). I use this feature in the lecture to skip around the file by numeric address, for example by searching hexdump output for an offset such as 00000400.

Some more documentation here: Unix Less Command: 10 Tips for Effective Navigation

File

The file command is included in pretty much every Linux and BSD variant. It is built around libmagic, a library that identifies file types by testing for known byte patterns and structures at specific offsets, using rules stored in a “magic database”.

In the lecture, I use the following to print a brief summary of the file type and intended platform:

file filename.exe
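To show the kind of fixed-offset checks that make this possible, here is a small Python sketch that guesses a Windows EXE's type from its headers. It is not libmagic's actual rule set; the function name identify, the short machine/subsystem tables, and the output wording are our own assumptions, while the offsets (MZ at 0, e_lfanew at 0x3C, the PE signature, Machine, optional-header Magic, and Subsystem) follow the PE/COFF layout.

```python
import struct

# A few IMAGE_FILE_MACHINE_* values from the PE/COFF specification
MACHINES = {0x014C: "Intel 80386", 0x8664: "x86-64"}
SUBSYSTEMS = {2: "GUI", 3: "console"}

def identify(data: bytes) -> str:
    """Very rough, libmagic-style guess built from magic values at fixed offsets."""
    try:
        if data[:2] != b"MZ":          # DOS/Windows executables start with "MZ"
            return "data"
        # e_lfanew: 32-bit little-endian offset of the PE header, stored at 0x3C
        (pe_off,) = struct.unpack_from("<I", data, 0x3C)
        if data[pe_off:pe_off + 4] != b"PE\0\0":
            return "MS-DOS executable"
        (machine,) = struct.unpack_from("<H", data, pe_off + 4)
        opt = pe_off + 24              # 4-byte signature + 20-byte COFF header
        (magic,) = struct.unpack_from("<H", data, opt)
        (subsystem,) = struct.unpack_from("<H", data, opt + 68)
    except struct.error:               # header fields run past the end of the data
        return "data (truncated header)"
    kind = {0x10B: "PE32", 0x20B: "PE32+"}.get(magic, "PE")
    arch = MACHINES.get(machine, f"machine 0x{machine:04x}")
    sub = SUBSYSTEMS.get(subsystem, f"subsystem {subsystem}")
    return f"{kind} executable ({sub}) {arch}"
```

Running identify(open("filename.exe", "rb").read()) on a 32-bit console program should give something close to the first part of file's own description.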

ExifTool

The ExifTool utility is a Perl library and command-line application written by Phil Harvey. It was originally designed to extract the EXIF metadata (camera, location, and other details) embedded in image files, but the author decided that this concept was broadly applicable to many more file formats than images.

This tool unfortunately did not get installed on the Kali VM, so you may wish to install it now with the following command:

apt install -y exiftool

In the lecture, I use this to dump out even more metadata about the EXE than file provides, such as the compilation timestamp, the version of the linker that was used (linkers organize compiled objects into OS-native executable files), and the minimum compatible versions of the Windows OS. It is important to note that this last item does not guarantee that the EXE will work with every DLL from that version of Windows; it mainly indicates that the Windows kernel will understand the file layout.
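Several of these values live at fixed offsets in the PE headers. The Python sketch below reads the COFF TimeDateStamp, the linker version, and the OS and subsystem version fields; the function name and dictionary keys are our own, and ExifTool's labels may differ. Keep in mind that the timestamp is just a number the build tools (or a malware author) wrote into the file, so it can be zeroed or forged.

```python
import struct
from datetime import datetime, timezone

def pe_version_info(data: bytes) -> dict:
    """Pull a few PE header fields similar to what ExifTool reports."""
    (pe_off,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_off:pe_off + 4] != b"PE\0\0":
        raise ValueError("not a PE file")
    # COFF header starts at pe_off + 4; TimeDateStamp is 4 bytes into it
    (timestamp,) = struct.unpack_from("<I", data, pe_off + 8)
    opt = pe_off + 24  # start of the optional header
    major_linker, minor_linker = struct.unpack_from("<BB", data, opt + 2)
    os_major, os_minor = struct.unpack_from("<HH", data, opt + 40)
    sub_major, sub_minor = struct.unpack_from("<HH", data, opt + 48)
    return {
        # Seconds since the Unix epoch, interpreted as UTC
        "compiled": datetime.fromtimestamp(timestamp, tz=timezone.utc),
        "linker_version": f"{major_linker}.{minor_linker}",
        "os_version": f"{os_major}.{os_minor}",
        "subsystem_version": f"{sub_major}.{sub_minor}",
    }
```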

Objdump

The objdump utility is part of the binutils package, which is a bundle of tools used in Linux/UNIX systems for working with many core binary file types. The objdump utility is designed to be a full metadata analysis and reporting tool for executable files. On most systems, the objdump utility can be extended to support analysis of multiple executable file types through the installation of additional packages or modules. This is the case on our Kali VM, which gives us the ability to use objdump to analyze native Linux binaries, as well as native Windows binaries.

In the lecture, I demonstrate using it to perform the following analyses.

Use -f to dump the terse basic file header metadata:

objdump -f filename.exe
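One of the values objdump -f prints is the start address. For a PE file we assume this corresponds to ImageBase plus AddressOfEntryPoint from the optional header; the sketch below computes that sum, handling the different ImageBase widths of PE32 and PE32+. The function name is our own.

```python
import struct

def start_address(data: bytes) -> int:
    """ImageBase + AddressOfEntryPoint: where execution begins once the image is loaded."""
    (pe_off,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_off:pe_off + 4] != b"PE\0\0":
        raise ValueError("not a PE file")
    opt = pe_off + 24  # start of the optional header
    (magic,) = struct.unpack_from("<H", data, opt)
    (entry_rva,) = struct.unpack_from("<I", data, opt + 16)
    if magic == 0x10B:    # PE32: 32-bit ImageBase at optional-header offset 28
        (image_base,) = struct.unpack_from("<I", data, opt + 28)
    elif magic == 0x20B:  # PE32+: 64-bit ImageBase at optional-header offset 24
        (image_base,) = struct.unpack_from("<Q", data, opt + 24)
    else:
        raise ValueError(f"unknown optional header magic 0x{magic:x}")
    return image_base + entry_rva
```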

Use -x to dump all of the header information objdump can find. This lists every section of the executable and, where objdump can interpret a section's metadata, reports that too. Using this, we navigate to the Sections: part of the output and take a deep dive into how this table tells the Windows kernel to map each section of the file from disk into system RAM. It is important to note that each section records both its location and size in the file and its address and size in memory, so the in-memory layout of any file is free to deviate from its on-disk layout.

objdump -x filename.exe
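To make the disk-versus-memory mapping concrete, here is a Python sketch that parses the PE section table and translates an in-memory relative virtual address (RVA) into a file offset. The names Section, read_sections, and rva_to_offset are our own, and the translation is simplified: it ignores details such as file and section alignment rounding that the real loader applies.

```python
import struct
from typing import List, NamedTuple, Optional

class Section(NamedTuple):
    name: str
    virtual_address: int  # RVA where the loader maps the section in memory
    virtual_size: int     # size of the section in memory
    raw_offset: int       # PointerToRawData: where its bytes start on disk
    raw_size: int         # SizeOfRawData: how many bytes are stored on disk
    characteristics: int  # IMAGE_SCN_* flags (code/data, read/write/execute)

def read_sections(data: bytes) -> List[Section]:
    """Parse the PE section table (the data behind the Sections: list)."""
    (pe_off,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_off:pe_off + 4] != b"PE\0\0":
        raise ValueError("not a PE file")
    (count,) = struct.unpack_from("<H", data, pe_off + 6)      # NumberOfSections
    (opt_size,) = struct.unpack_from("<H", data, pe_off + 20)  # SizeOfOptionalHeader
    table = pe_off + 24 + opt_size  # section headers follow the optional header
    sections = []
    for i in range(count):
        # 40-byte header: Name, VirtualSize, VirtualAddress, SizeOfRawData,
        # PointerToRawData, 12 bytes of relocation/line-number fields, Characteristics
        name, vsize, va, rsize, roff, chars = struct.unpack_from(
            "<8sIIII12xI", data, table + 40 * i)
        sections.append(Section(name.rstrip(b"\0").decode("ascii", "replace"),
                                va, vsize, roff, rsize, chars))
    return sections

def rva_to_offset(sections: List[Section], rva: int) -> Optional[int]:
    """Translate an in-memory RVA to a file offset, or None if it has no bytes on disk."""
    for s in sections:
        if s.virtual_address <= rva < s.virtual_address + max(s.virtual_size, s.raw_size):
            delta = rva - s.virtual_address
            # Memory past SizeOfRawData is zero-filled by the loader, not read from disk
            return s.raw_offset + delta if delta < s.raw_size else None
    return None
```

A section whose virtual size is much larger than its raw size, or whose bytes are absent on disk entirely, is exactly the kind of layout difference this part of the objdump -x output reveals.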

Disassembly and Source Listing

Using the -d/-D and -S arguments, objdump can be told to disassemble the file and, in the case of -S, to interleave the corresponding C or C++ source code with the disassembly when debugging information and the source are available. The -D option disassembles all sections of the file, while the -d option limits disassembly to sections that are marked as code in the section headers. -D can be helpful if the file attempts to conceal some of its code on disk inside data marked non-executable (to be changed at run time), since -d would skip those sections.

objdump -d filename.exe
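The difference between -d and -D comes down to the Characteristics flags in each section header. The Python sketch below splits a list of (name, characteristics) pairs into the sections -d would cover and those only -D adds, and also flags writable-and-executable sections. Treating a section as code when either IMAGE_SCN_CNT_CODE or IMAGE_SCN_MEM_EXECUTE is set is our assumption about how objdump decides, and the function names are our own.

```python
from typing import Iterable, Tuple

IMAGE_SCN_CNT_CODE = 0x00000020     # section contains executable code
IMAGE_SCN_MEM_EXECUTE = 0x20000000  # section may be executed
IMAGE_SCN_MEM_WRITE = 0x80000000    # section may be written

def is_code(characteristics: int) -> bool:
    # Assumption: a section counts as code if either flag is set
    return bool(characteristics & (IMAGE_SCN_CNT_CODE | IMAGE_SCN_MEM_EXECUTE))

def disassembly_plan(sections: Iterable[Tuple[str, int]]):
    """Return (covered by -d, added only by -D, writable+executable) section names."""
    sections = list(sections)
    with_d = [name for name, ch in sections if is_code(ch)]
    only_with_D = [name for name, ch in sections if not is_code(ch)]
    # W+X sections are a common sign of code that is unpacked or modified at run time
    wx = [name for name, ch in sections
          if ch & IMAGE_SCN_MEM_EXECUTE and ch & IMAGE_SCN_MEM_WRITE]
    return with_d, only_with_D, wx
```

For example, with a hypothetical section list of .text (0x60000020), .rdata (0x40000040), .data (0xC0000040), and .packed (0xE0000040), the plan would send .text and .packed to -d, leave .rdata and .data for -D only, and flag .packed as writable and executable.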

Strings

The strings tool is also part of the binutils package. This utility scans the file from beginning to end and reports every run of printable characters that meets a minimum length (4 by default). This catches strings stored using standard conventions, such as the C convention of human-readable characters followed by the \0 (NUL) byte (\x00). The strings utility can be told to report only longer strings, and it can also search for other character encodings, such as the UTF-16 that is common on Windows.

To show only strings that are at least 6 characters long (from lecture):

strings -n 6 filename.exe
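The core of what strings does for single-byte text can be sketched in a few lines of Python: collect runs of printable bytes and keep the ones at least min_len long. The function name is our own; counting tab as printable mirrors GNU strings, but other details of the real tool (such as its handling of 8-bit characters) are not reproduced.

```python
from typing import List

def ascii_strings(data: bytes, min_len: int = 4) -> List[str]:
    """Rough equivalent of `strings -n min_len`: runs of printable ASCII."""
    found, run = [], bytearray()
    for b in data:
        if 0x20 <= b < 0x7F or b == 0x09:  # printable ASCII, plus tab
            run.append(b)
            continue
        # Any non-printable byte (often the \0 terminator) ends the current run
        if len(run) >= min_len:
            found.append(run.decode("ascii"))
        run.clear()
    if len(run) >= min_len:  # a run that reaches end-of-file also counts
        found.append(run.decode("ascii"))
    return found
```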

To show any UTF-16 “Little Endian” strings, again with a minimum length of 6 characters. This is very handy for Windows binaries, as many of them store their string contents in UTF-16:

strings -n 6 -e l filename.exe
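For Windows "wide" strings, each ASCII-range character is stored as two bytes: the character followed by 0x00. The Python sketch below approximates -e l with a regular expression for that pattern; restricting it to ASCII-range characters is our simplification, and the function name is our own.

```python
import re
from typing import List

def utf16le_strings(data: bytes, min_len: int = 4) -> List[str]:
    """Rough equivalent of `strings -e l -n min_len` for ASCII-range text."""
    # Each character is a printable byte (or tab) followed by a 0x00 high byte
    pattern = re.compile(rb"(?:[\x20-\x7e\t]\x00){%d,}" % min_len)
    return [m.group().decode("utf-16-le") for m in pattern.finditer(data)]
```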

If you want strings to also report the offset (within the file on disk) of each string, you may use the -t x option, which will report this offset from the beginning of the file in hexadecimal.

strings -n 6 -e l -t x filename.exe
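The offset that -t x reports is simply where each match begins in the file. The Python sketch below combines both encodings from above with that offset; the function name and the encoding codes are our own (chosen to echo -e s and -e l), and the 7-character right-aligned offset column is an assumption about strings' output formatting.

```python
import re

def strings_t_x(data: bytes, min_len: int = 4, encoding: str = "s") -> str:
    """Sketch of `strings -t x -n min_len [-e l]`: hex file offset, then the string."""
    if encoding == "s":    # single-byte ASCII runs
        pattern, codec = rb"[\x20-\x7e\t]{%d,}" % min_len, "ascii"
    elif encoding == "l":  # 16-bit little-endian, ASCII range (high byte 0x00)
        pattern, codec = rb"(?:[\x20-\x7e\t]\x00){%d,}" % min_len, "utf-16-le"
    else:
        raise ValueError("only 's' and 'l' are sketched here")
    lines = []
    for m in re.finditer(pattern, data):
        # m.start() is the byte offset from the beginning of the file
        lines.append(f"{m.start():7x} {m.group().decode(codec)}")
    return "\n".join(lines)
```

With the offset in hand, you can jump straight to the same spot in hexdump -C output, for example by searching for it in less.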


tags: malware lecture c x86 x86-64 asm cfg ghidra