EXE File Analysis Lecture 1
by Coleman Kane
This lecture walked through a number of purpose-built utilities common to most Linux systems, and installed on the Kali VM image I provided to all of you students in class. We use these as building blocks toward the main view of Ghidra, which can be overwhelming to the initial observer. Often I find it is very helpful to have these repeatable examples of the single or limited use tools to demonstrate recipes for getting specific data, and then follow these up with introducing a multi-purpose environment such as Ghidra that can demonstrate how stitching these data sets together can provide a lot of power.
Hexdump
I begin with the hexdump utility. Some great examples and documentation are available here.
My favorite invocation of hexdump uses the -C option. This gives a 16-byte-wide hexadecimal dump, as well as a preview of the raw text (with unprintable characters sanitized) on the right. You can see the numeric representation of each byte and, alongside it, scan the raw data for human-readable content or other patterns that are easier to spot in a denser viewport.
hexdump -C filename.exe
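To make the layout of that output concrete, here is a minimal Python sketch that formats bytes in roughly the same way. It only approximates hexdump -C (for example, it does not collapse repeated rows into a single * line), and the function name hexdump_c is made up for this illustration.

```python
def hexdump_c(data: bytes, width: int = 16) -> str:
    """Format bytes roughly the way `hexdump -C` does."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        # Two groups of eight hex bytes, padded so short rows still line up.
        left = " ".join(f"{b:02x}" for b in chunk[:8])
        right = " ".join(f"{b:02x}" for b in chunk[8:])
        # Printable ASCII is shown as-is; every other byte becomes '.'.
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {left:<23}  {right:<23}  |{text}|")
    # The last line is the total number of bytes, as an offset.
    lines.append(f"{len(data):08x}")
    return "\n".join(lines)


# The first bytes of a Windows EXE are "MZ"; the rest here is zero padding.
print(hexdump_c(b"MZ\x90\x00" + bytes(12)))
```

The left column is the offset of the row within the file, which is the number I search for when jumping around the dump.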
Less
The GNU less utility is used a lot in the beginning of this lecture to control the output of the other commands, allowing me to page through the data. The tool is similar to the better-known more command, which has a variant on Windows systems, too. I happen to favor less, and a common slogan is “less is more”, which communicates both that less is a rewrite of more and that less has additional features beyond what more offers.
During the lecture, I use the spacebar to page through the file, and PageUp/Down are supported as well. Additionally, if I want to search through the entire buffer, I can type the / character, followed by the text I want to search for. I use this feature in the lecture, and it is what enables me to skip around the file using numeric addresses.
Some more documentation here: Unix Less Command: 10 Tips for Effective Navigation
File
The file command is included with pretty much every Linux and BSD variant. It is built around libmagic, a library that performs metadata analysis based upon arbitrary file structure information stored in a “magic database”.
In the lecture, I use the following to dump out a brief list of file type and intended platform:
file filename.exe
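The idea of a magic rule can be sketched in a few lines of Python. This is not how libmagic is implemented, and its real database contains far more rules than this; the sketch only shows one check of the kind such a database encodes for Windows executables: an "MZ" marker at offset 0, and a 32-bit offset at 0x3C that points at a "PE\0\0" signature. The function name looks_like_pe is hypothetical.

```python
import struct


def looks_like_pe(data: bytes) -> bool:
    """Return True if data has an MZ header that points at a PE signature."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    # e_lfanew: little-endian 32-bit file offset of the PE header, at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    return data[pe_offset:pe_offset + 4] == b"PE\x00\x00"
```

To try it on a real file, read the bytes first, for example with open("filename.exe", "rb").read().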
ExifTool
The ExifTool utility is a Perl framework written by Phil Harvey. It was originally designed to extract the EXIF content embedded in image files (camera, location, and other metadata), and the author decided that this concept was broadly applicable to many more file formats than images.
This tool unfortunately did not get installed on the Kali VM, so you may wish to install it now with the following command:
apt install -y exiftool
In the lecture, I use this to dump out even more metadata about the EXE than file provides, such as the compilation timestamp, the version of the linker that I used (linkers organize compiled objects into OS-native executable files), and the minimum compatible version of the Windows OS. It is important to note that this last item does not guarantee that the EXE is compatible with all DLLs from that version of Windows, only that the Windows kernel will understand the file layout.
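These three values live at fixed positions in the PE headers, so they can be read directly. The following is a minimal Python sketch, assuming the standard PE/COFF layout (timestamp 4 bytes into the COFF header, linker version 2 bytes into the optional header, operating system version 40 bytes into it). It is not how ExifTool itself is written, and pe_summary and its dictionary keys are names invented for this example.

```python
import struct
from datetime import datetime, timezone


def pe_summary(data: bytes) -> dict:
    """Pull a few header fields out of the raw bytes of a PE file."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("missing MZ header")
    # e_lfanew at 0x3C holds the file offset of the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")

    coff = pe_offset + 4   # 20-byte COFF file header
    optional = coff + 20   # the optional header follows it
    (timestamp,) = struct.unpack_from("<I", data, coff + 4)
    linker_major, linker_minor = struct.unpack_from("<BB", data, optional + 2)
    os_major, os_minor = struct.unpack_from("<HH", data, optional + 40)
    return {
        # Seconds since 1970-01-01 UTC, as recorded by the linker.
        "compiled": datetime.fromtimestamp(timestamp, tz=timezone.utc),
        "linker_version": f"{linker_major}.{linker_minor}",
        "min_os_version": f"{os_major}.{os_minor}",
    }
```

Comparing these values against the exiftool output for the same file is a quick way to convince yourself where the tool gets its numbers.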
Objdump
The objdump utility is part of the binutils package, which is a bundle of tools used in Linux/UNIX systems for working with many core binary file types. The objdump utility is designed to be a full metadata analysis and reporting tool for executable files. On most systems, it can be extended to support analysis of multiple executable file types through the installation of additional packages or modules. This is the case on our Kali VM, which gives us the ability to use objdump to analyze native Linux binaries, as well as native Windows binaries.
In the lecture, I demonstrate using it to perform the following analyses.
Use -f to dump the terse basic file header metadata:
objdump -f filename.exe
Use -x to dump a verbose list of the executable file structure. This reports all of the sections of the executable, and where objdump can interpret the section metadata, it reports that out too. Using this, we navigate to the Sections: part of the output, and we deep dive on how this explains to the Windows kernel how to organize the sections of the file from disk into system RAM. It is important to note that this grants the flexibility for the in-memory organization of any file to deviate from the on-disk layout.
objdump -x filename.exe
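The disk-versus-memory distinction comes straight from the section table, where every entry records both a file offset and a virtual address. Here is a minimal Python sketch that reads that table, assuming the standard 40-byte PE section header layout. It is an illustration rather than a reimplementation of objdump, and pe_sections and its dictionary keys are hypothetical names.

```python
import struct

IMAGE_SCN_MEM_EXECUTE = 0x20000000  # section may be executed once loaded


def pe_sections(data: bytes) -> list:
    """Return one dict per entry in a PE file's section table."""
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    coff = pe_offset + 4
    (count,) = struct.unpack_from("<H", data, coff + 2)      # NumberOfSections
    (opt_size,) = struct.unpack_from("<H", data, coff + 16)  # SizeOfOptionalHeader
    table = coff + 20 + opt_size  # section table follows the optional header

    sections = []
    for i in range(count):
        (name, vsize, vaddr, raw_size, raw_ptr,
         _relocs, _lines, _nrelocs, _nlines, flags) = struct.unpack_from(
            "<8sIIIIIIHHI", data, table + 40 * i)
        sections.append({
            "name": name.rstrip(b"\x00").decode("ascii", "replace"),
            "file_offset": raw_ptr,     # where the bytes sit on disk
            "file_size": raw_size,
            "virtual_address": vaddr,   # where they land, relative to the image base
            "virtual_size": vsize,
            "executable": bool(flags & IMAGE_SCN_MEM_EXECUTE),
        })
    return sections
```

When file_offset and virtual_address differ for a section, or virtual_size is larger than file_size, you are looking at exactly the deviation between the on-disk and in-memory layouts described above.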
Disassembly and Source Listing
Using the -d/-D and -S arguments, objdump can be told to disassemble the file and, in the case of -S, can also display C or C++ source code if it is available. The -d option limits disassembly to sections that are marked executable in the file header, while the -D option disassembles all sections of the file. The latter can be helpful if the file attempts to conceal some of its code on disk inside data marked non-executable (to be changed at run-time).
objdump -d filename.exe
Strings
The strings tool is also part of the binutils package. This utility scans the file from beginning to end and attempts to discover strings that would be encoded using standard conventions, such as a sequence of human-readable characters followed by the \0 (NULL) byte (\x00). The strings utility can be told to report only longer strings, and it can also identify a number of different string encodings, such as the UTF-16 that is popular on Windows.
To show only 6-byte or greater strings (from lecture):
strings -n 6 filename.exe
To show UTF-16 “Little Endian” strings, again with a minimum length of 6 (very handy for Windows binaries, many of which contain UTF-16 strings):
strings -n 6 -e l filename.exe
If you want strings to also report the offset (within the file on disk) of each string, you may use the -t x option, which will report this offset from the beginning of the file in hexadecimal.
strings -n 6 -e l -t x filename.exe
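The scan that strings performs can be approximated in Python with a regular expression. This is a simplified sketch, not the binutils implementation: it treats only bytes 0x20 through 0x7e as printable, and in UTF-16 mode it only recognises characters whose high byte is zero. The function name find_strings and its parameters are made up for this example, loosely mirroring the -n, -e l, and -t x options.

```python
import re


def find_strings(data: bytes, min_len: int = 6, utf16le: bool = False):
    """Yield (offset, text) for each run of at least min_len printable characters."""
    if utf16le:
        # Each character is a printable byte followed by a zero high byte.
        pattern = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
        encoding = "utf-16-le"
    else:
        pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
        encoding = "ascii"
    for match in pattern.finditer(data):
        # match.start() is the offset from the beginning of the data.
        yield match.start(), match.group().decode(encoding)


blob = b"\x00\x01hello world\x01\x02" + "Windows".encode("utf-16-le") + b"\x00\x00"
for offset, text in find_strings(blob):
    print(f"{offset:x} {text}")
for offset, text in find_strings(blob, utf16le=True):
    print(f"{offset:x} {text}")
```

Running the scan twice, once per encoding, mirrors running strings with and without -e l on the same file.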