CS6038/CS5138 Malware Analysis, UC

Course content for UC Malware Analysis

24 January 2020

Static Analysis of Compromised VM

by Coleman Kane

Though host forensic analysis is often its own subject space, it is a vital component of malware analysis. After all, part of understanding malware is understanding how it behaves on a running system. This week, we will review some of the data sources in a Windows environment where we expect to find evidence of malware and its actions. We will also go over some of the tools available to help analyze an environment after malware has run within it.

Common Data Sources for Malware Analysis

On a Windows system, there are a number of data sets that will contain evidence of malware. SANS has a fairly good poster that discusses them all: SANS Windows Forensic Analysis poster. However, we will drill down into some specific elements (some of which aren’t captured in the poster):

  1. Windows Registry
  2. NTFS Master File Table
  3. Windows Event Log
  4. Browser profiles
  5. System Memory

Windows Registry

This article on Microsoft’s website discusses the Windows Registry. In short, you can think of the Windows Registry as a big database of nested groups of key/value pairs, which store settings and metadata about the software, processes, and users of the system. In modern Windows systems, the registry is divided into distinct “hives” of data. The linked documentation lists each of them and contains much more valuable detail. Each hive is given a “hive key” to identify it on the system (for example, HKEY_LOCAL_MACHINE\SOFTWARE or HKEY_CURRENT_USER).

In effect, any system-wide configuration change you can make in the Windows GUI will be stored somewhere in here, from the list of documents in your recent MS Office history to the changes you make to a network interface. Thus, the Windows registry is a great place to find evidence that malware or an intruder has changed the system configuration for their own nefarious purposes.

Unfortunately, the registry files on disk are binary database files and therefore don’t lend themselves to direct investigation by a human analyst. On a Windows system, you may use the reg.exe command-line tool (which you may have gotten a preview of in Lab 2), or the GUI regedit.exe tool.

The following documentation gives examples on using both of these methods to extract the binary content of the Windows Registry into a human-readable (and machine parseable) form:

In addition to the key/value pairs, the Windows registry maintains data type information for each value, as well as permissions and a last-written timestamp on each key, which can be a forensic gold mine if you don’t know what you’re looking for but you are certain when it happened.
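
Once registry content has been exported to text, it can be searched programmatically. The following is a minimal sketch, assuming the export is in the regedit .reg text format (real exports are typically UTF-16 text, so they would be read with `open(path, encoding="utf-16")`). The names `parse_reg_text` and `autorun_entries` are hypothetical, and focusing on the `Run` key is only an illustrative choice of a commonly abused autostart location.

```python
import re

# "[HKEY_...\Some\Key]" opens a new key section.
SECTION_RE = re.compile(r"^\[(.+)\]$")
# '"Name"=data' or '@=data' (the key's default value).
VALUE_RE = re.compile(r'^(@|"(?:[^"\\]|\\.)*")=(.*)$')

# Small inline sample in .reg syntax (backslashes in string data are doubled).
SAMPLE = r'''Windows Registry Editor Version 5.00

[HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Run]
"Updater"="C:\\Users\\IEUser\\AppData\\updater.exe"
"Enabled"=dword:00000001
'''


def parse_reg_text(text):
    """Return {key_path: {value_name: raw_data_string}} from .reg text."""
    keys = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):
            continue  # blank line or comment
        section = SECTION_RE.match(line)
        if section:
            current = keys.setdefault(section.group(1), {})
            continue
        value = VALUE_RE.match(line)
        if value and current is not None:
            name = value.group(1)
            name = "(Default)" if name == "@" else name[1:-1]
            current[name] = value.group(2)
        # Anything else (the version header, continuation lines of long
        # hex values) is skipped in this sketch.
    return keys


def autorun_entries(keys):
    """Keep only keys whose path ends in ...\\CurrentVersion\\Run."""
    suffix = r"\microsoft\windows\currentversion\run"
    return {path: values for path, values in keys.items()
            if path.lower().endswith(suffix)}
```

Calling `autorun_entries(parse_reg_text(SAMPLE))` returns the single `Run` key from the sample together with its two values. The data strings are kept exactly as written in the export, so string values still carry their quotes and doubled backslashes.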

NTFS Master File Table

The NTFS file system manages its contents through a directory that’s stored alongside the file content data. This directory maintains file names, directory locations, timestamps, and other information about the files. Like the registry hives described above, the NTFS filesystem itself is a code-optimized binary database of arbitrary files across the system, and manual analysis is prohibitively difficult. Windows does offer various capabilities for searching through the Windows Search feature, and some common methods from the command line are as follows:

dir /s C:\

Or a trick to identify directories that might be protected or have a system-specific purpose:

attrib /s /d C:\ > all-files-and-dirs.txt
attrib /s C:\ > all-files-no-dirs.txt
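
A similar recursive listing can be produced from Python, which is convenient when working against a mounted copy of the disk. This is a minimal sketch using only the standard library; the function name `list_tree` is hypothetical, and the timestamps come from the operating system's file API rather than from the MFT itself.

```python
import os
from datetime import datetime, timezone


def list_tree(root):
    """Yield (modified_time_utc, size_bytes, path) for every file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path)
            except OSError:
                continue  # unreadable or vanished file; skip it
            modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
            yield modified, info.st_size, path
```

Sorting the result, for example `sorted(list_tree(root))`, orders the files by modification time, which is often a more useful view than path order when looking for recently dropped files.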

There are also a number of tools that can extract more details from the MFT by accessing the device directly. As the files on disk are organized into chunks, the NTFS Master File Table helps inform the system about where to find the different chunks of each file spread across the disk. Thus, a file doesn’t necessarily live in one place on disk, but may be stored piecemeal all over the disk, requiring special tools to reconstruct it in the event that file recovery from a raw disk copy is needed.

The following documentation describes the NTFS filesystem layout, and contains further documentation specifically on the MFT within it:

We specifically drill into the Master File Table, as it stores the metadata about the files on disk. So, ignoring the content of files for a moment, it can provide a log of filesystem activity performed.

A tool that quickly collects and tabulates the content of the MFT is Mft2Csv.

Using the above tool, we can collect a summary of the MFT in the current VM to disk, one row per file:

Mft2Csv.exe /Volume:c: /ExtractResident:1 /OutputPath:\\VBOXSVR\sharedfolder\ /TimeZone:-5 /OutputFormat:all /ScanSlack:1

Additionally, a common alternative format is the log2timeline format, which lists all of the timestamped events as single rows (so, multiple rows per file), to give you a timeline to review filesystem activity events using timestamp information rather than file path information:

Mft2Csv.exe /Volume:c: /ExtractResident:1 /OutputPath:\\VBOXSVR\sharedfolder\ /TimeZone:-5 /OutputFormat:l2t /ScanSlack:1
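
A timeline in this format lends itself to filtering by time window. The sketch below assumes the output is comma-separated with the standard log2timeline CSV header row (columns including `date`, `time`, `MACB`, and `filename`, with dates written as MM/DD/YYYY); verify this against the actual Mft2Csv output before relying on it. The function name `events_between` is hypothetical.

```python
import csv
from datetime import datetime


def events_between(lines, start, end):
    """Return sorted (timestamp, MACB, filename) tuples within [start, end].

    `lines` is any iterable of text lines, such as an open file object.
    """
    hits = []
    for row in csv.DictReader(lines):
        stamp = datetime.strptime(row["date"] + " " + row["time"],
                                  "%m/%d/%Y %H:%M:%S")
        if start <= stamp <= end:
            hits.append((stamp, row["MACB"], row["filename"]))
    return sorted(hits)
```

For example, passing an open file and the bounds `datetime(2020, 1, 15)` and `datetime(2020, 1, 16)` returns only the filesystem events recorded on 15 January 2020.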

Windows Event Logs

Any program or service running in Windows that needs to report status, failure, progress, or any other information to the OS will typically report this using the Windows Event Log subsystem. Occasionally, you’ll have some applications that write to flat log files on disk, but the prevailing data store for event logs is the Windows Event Log.

The following page introduces the analyst to Windows Event Log and also documents some of the event types that will be helpful in analyzing activity on a system.

The Windows Event Log is a general-purpose event logging system for Windows. Much as you can write arbitrary logs into /var/log on a Linux system, any application can write its event logs into Windows. Microsoft released a useful forensic monitoring utility named Sysmon, which monitors the system once installed, and reports events into the log about system activity as it occurs.

Similarly, all manner of other applications installed on Microsoft Windows may do the same, such as IIS, Microsoft Office, and others.
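
Event records can be rendered as XML (for instance with the built-in wevtutil tool, which is not covered in this lecture), and in that form they are simple to process. The following is a minimal sketch, assuming each record is a single `<Event>` element in the standard Windows event XML namespace; the function name `summarize_event` is hypothetical.

```python
import xml.etree.ElementTree as ET

# XML namespace used by Windows event records.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}


def summarize_event(xml_text):
    """Extract the event ID, timestamp, and named EventData fields."""
    event = ET.fromstring(xml_text)
    system = event.find("e:System", NS)
    data = {
        item.get("Name"): (item.text or "")
        for item in event.findall("e:EventData/e:Data", NS)
    }
    return {
        "event_id": int(system.find("e:EventID", NS).text),
        "time": system.find("e:TimeCreated", NS).get("SystemTime"),
        "data": data,
    }
```

Applied to a Sysmon process-creation record (event ID 1), the `data` dictionary would contain fields such as `Image` and `CommandLine`, which can then be filtered or sorted like any other Python data.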

Browser Profiles

Browser profiles are managed by each browser in its own application-specific location. In addition to maintaining history, web browsers will also store cookies, account information, form submission data, and other information that helps improve the user experience. Additionally, things like browser extensions and plugins are also managed in these locations, so any browser-based malware would be installed in these areas.

For Chrome:

C:\Users\<username>\AppData\Local\Google\Chrome\User Data\Default
C:\Users\<username>\AppData\Local\Google\Chrome\User Data\Default\Cache

For Firefox:

C:\Users\<username>\AppData\Roaming\Mozilla\Firefox\Profiles\xxxxxxxx.default
C:\Users\<username>\AppData\Local\Mozilla\Firefox\Profiles\<profile folder>\cache2

For MS Edge:

C:\Users\<username>\AppData\Local\Packages\<package name>\AC\MicrosoftEdge\User\Default\Favorites
C:\Users\<username>\AppData\Local\Packages\<package name>\AC\MicrosoftEdge\User\Default\Recovery
C:\Users\<username>\AppData\Local\Packages\<package name>\AC\MicrosoftEdge\User\Default\DataStore
C:\Users\<username>\AppData\Local\Microsoft\Windows\WebCache

While not necessarily key to malware analysis, analyzing this information can be helpful in learning where malware came from. As email defense tooling has become more advanced, convincing the user to download malware via their web browser (and to grant consent) has become a significantly more popular delivery technique.

The company Foxton Forensics offers a free browser history collection tool.

Another one that is Firefox/Mozilla-focused and is more exhaustive is Dumpzilla.
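
Much of this profile data can also be read directly. As a minimal sketch: Chrome keeps its browsing history in a SQLite database file named History inside the profile directory listed above, with a `urls` table whose `last_visit_time` column counts microseconds since 1601-01-01 UTC. The function names here are hypothetical, and the query should be run against a copy of the file, since a running browser keeps the original locked.

```python
import sqlite3
from datetime import datetime, timedelta

# Chrome timestamps count microseconds from this epoch (UTC).
WEBKIT_EPOCH = datetime(1601, 1, 1)


def webkit_to_datetime(microseconds):
    """Convert a Chrome/WebKit timestamp to a naive UTC datetime."""
    return WEBKIT_EPOCH + timedelta(microseconds=microseconds)


def recent_urls(history_db_path, limit=20):
    """Return (last_visit_utc, url, title) tuples, most recent first."""
    conn = sqlite3.connect(history_db_path)
    try:
        rows = conn.execute(
            "SELECT url, title, last_visit_time FROM urls "
            "ORDER BY last_visit_time DESC LIMIT ?",
            (limit,),
        ).fetchall()
    finally:
        conn.close()
    return [(webkit_to_datetime(ts), url, title) for url, title, ts in rows]
```

Sorting by the most recent visit first makes it easier to line browser activity up against the time a suspicious file first appeared on disk.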

System Memory

Finally, system memory is present on every system and, in the end, malware needs to be decoded into a machine-readable form in memory in order for it to be effective. For this reason, collection and analysis of system memory remains an important malware analysis technique.

For Virtual Machines, there are really three approaches to collecting system memory. Collection inside of the VM is possible in two ways, and will enable the collection of memory as the OS and applications see it. Collection of memory using the VM hypervisor will enable collection of memory transparently to the OS (and thus, any rootkit), at the expense of memory not being readily organized in the view the OS or Application has.

To collect system memory, we will make use of the Rekall Memory Forensic Framework, published by Google. This tool provides numerous capabilities for live memory analysis and collection. On the Windows system, we will use the winpmem tool (available here) to collect memory images:

winpmem-2.1.post4.exe --output memdump.aff4 --format raw

This will export a raw memory dump into the archive named memdump.aff4. This file format is a specialized ZIP64 format file, and the version of unzip on your Kali installations should be able to extract the PhysicalMemory file from inside the archive:

unzip memdump.aff4 PhysicalMemory
mv PhysicalMemory memdump1.raw
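
The same extraction can be scripted. This is a minimal sketch, assuming (as described above) that the archive is readable as a ZIP64 file and contains a member named PhysicalMemory; whether Python's zipfile module handles every AFF4 variant has not been verified here. The function name `extract_member` is hypothetical.

```python
import shutil
import zipfile


def extract_member(archive_path, member, dest_path):
    """Copy one member of a ZIP-format archive to dest_path, streaming it.

    Streaming in 1 MiB chunks avoids loading a multi-gigabyte memory
    image into RAM all at once.
    """
    with zipfile.ZipFile(archive_path) as archive:
        with archive.open(member) as src, open(dest_path, "wb") as dst:
            shutil.copyfileobj(src, dst, length=1024 * 1024)
```

Calling `extract_member("memdump.aff4", "PhysicalMemory", "memdump1.raw")` mirrors the two shell commands above.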

Additionally, the volatility suite provides some tools for offline memory analysis:

Using the PhysicalMemory file described above, volatility can be used to analyze it using various plugins. One common example is using volatility to report the type of OS:

volatility -f PhysicalMemory imageinfo
Volatility Foundation Volatility Framework 2.6
INFO    : volatility.debug    : Determining profile based on KDBG search...
          Suggested Profile(s) : Win7SP1x86_23418, Win7SP0x86, Win7SP1x86_24000, Win7SP1x86
                     AS Layer1 : IA32PagedMemoryPae (Kernel AS)
                     AS Layer2 : FileAddressSpace (/mnt/PhysicalMemory)
                      PAE type : PAE
                           DTB : 0x185000L
                          KDBG : 0x82766c28L
          Number of Processors : 2
     Image Type (Service Pack) : 1
                KPCR for CPU 0 : 0x82767c00L
                KPCR for CPU 1 : 0x807c1000L
             KUSER_SHARED_DATA : 0xffdf0000L
           Image date and time : 2020-01-26 04:08:14 UTC+0000
     Image local date and time : 2020-01-25 23:08:14 -0500

Getting the cmd.exe command history buffer (most recently typed commands):

volatility -f /mnt/PhysicalMemory --profile=Win7SP1x86 cmdscan
Volatility Foundation Volatility Framework 2.6
**************************************************
CommandProcess: conhost.exe Pid: 3600
CommandHistory: 0x1effe8 Application: powershell.exe Flags: Allocated, Reset
CommandCount: 10 LastAdded: 9 LastDisplayed: 9
FirstCommand: 0 CommandCountMax: 50
ProcessHandle: 0x5c
Cmd #0 @ 0x1e8b20: cd \\VBOXSVR\
Cmd #1 @ 0x1f5260: cd \\VBOXSVR\vmshare\
Cmd #2 @ 0x1eefe8: dir
Cmd #3 @ 0x1f5298: .\winpmem-2.1.post4.exe 
Cmd #4 @ 0x1ccd70: .\winpmem-2.1.post4.exe --help
Cmd #5 @ 0x1f13c0: .\winpmem-2.1.post4.exe --output winpdump.dmp --format raw --pagefile c:\pagefile.sys
Cmd #6 @ 0x1ef008: dir
Cmd #7 @ 0x1cbd28: del pmemdump.dmp
Cmd #8 @ 0x1f10e0: del .\winpdump.dmp
Cmd #9 @ 0x1e5ac8: .\winpmem-2.1.post4.exe --output winpdump.dmp --format raw
Cmd #36 @ 0x1b00c4: ???
Cmd #37 @ 0x1edb40: ????
**************************************************
CommandProcess: conhost.exe Pid: 3600
CommandHistory: 0x1f1118 Application: winpmem-2.1.post4.exe Flags: Allocated
CommandCount: 0 LastAdded: -1 LastDisplayed: -1
FirstCommand: 0 CommandCountMax: 50
ProcessHandle: 0x8c
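
When this output is saved to a file, the recovered commands can be pulled out for review or comparison across memory images. The following is a minimal sketch that relies only on the line layout visible in the output above; the function name `parse_cmdscan` is hypothetical.

```python
import re

# "CommandHistory: 0x... Application: powershell.exe Flags: ..."
APP_RE = re.compile(r"^CommandHistory: \S+ Application: (\S+)")
# "Cmd #5 @ 0x1f13c0: <command text>"
CMD_RE = re.compile(r"^Cmd #(\d+) @ (0x[0-9a-fA-F]+): (.*)$")


def parse_cmdscan(text):
    """Return a list of (application, command_index, command_text) tuples."""
    results = []
    application = None
    for line in text.splitlines():
        app_match = APP_RE.match(line)
        if app_match:
            application = app_match.group(1)
            continue
        cmd_match = CMD_RE.match(line)
        if cmd_match:
            results.append(
                (application, int(cmd_match.group(1)), cmd_match.group(3).strip())
            )
    return results
```

Entries whose text is only question marks, like the last two in the first history block above, did not decode to readable text and can be filtered out after parsing.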

Look in memory for IE history artifacts:

volatility -f /mnt/PhysicalMemory --profile=Win7SP1x86 iehistory
...
**************************************************
Process: 1820 explorer.exe
Cache type "URL " at 0x30a5480
Record length: 0x100
Location: :2020011320200120: IEUser@http://www.bing.com/
Last modified: 2020-01-15 17:42:00 UTC+0000
Last accessed: 2020-01-26 03:35:04 UTC+0000
File Offset: 0x100, Data Offset: 0x0, Data Length: 0x0
**************************************************
Process: 1820 explorer.exe
Cache type "URL " at 0x30a5580
Record length: 0x100
Location: :2020011320200120: IEUser@https://nmap.org/download.html
Last modified: 2020-01-15 17:41:02 UTC+0000
Last accessed: 2020-01-26 03:35:04 UTC+0000
File Offset: 0x100, Data Offset: 0x0, Data Length: 0x0
**************************************************
Process: 1820 explorer.exe
Cache type "URL " at 0x30a5680
Record length: 0x100
Location: :2020011320200120: IEUser@http://www.bing.com/search?q=netcat+download+windows&go=Submit+Query&qs=ds&form=QBLH
Last modified: 2020-01-15 17:40:08 UTC+0000
Last accessed: 2020-01-26 03:35:04 UTC+0000
File Offset: 0x100, Data Offset: 0x0, Data Length: 0x0
**************************************************
...
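
With many records, it helps to group the visited URLs by host. This minimal sketch depends only on the `Location:` line layout shown in the output above (a user name, an `@`, then the URL); the function name `visited_hosts` is hypothetical.

```python
import re
from urllib.parse import urlsplit

# "Location: :2020011320200120: IEUser@http://www.bing.com/"
LOCATION_RE = re.compile(r"^Location: .*?@(\S+)$")


def visited_hosts(text):
    """Return {hostname: [urls...]} for every Location line in the output."""
    hosts = {}
    for line in text.splitlines():
        match = LOCATION_RE.match(line.strip())
        if match:
            url = match.group(1)
            hosts.setdefault(urlsplit(url).hostname, []).append(url)
    return hosts
```

For the three records above, this yields two hosts: `www.bing.com` with two URLs, and `nmap.org` with one.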

VirtualBox Memory Collection

The VirtualBox environment itself can dump the contents of virtual RAM to a file, for analysis as well. VirtualBox will store this as an ELF-format “Core dump” file, which can help facilitate analysis of it using common debugging tools.

Chapter 8 of the VirtualBox documentation covers using the VBoxManage debugvm command to perform various debugging-related analyses. One of these is exporting RAM to a core dump file:

VBoxManage debugvm <vmname> dumpvmcore --filename=<filename.dmp>
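
Before handing the dump to other tools, it can be worth confirming that the file really is an ELF core file. The following is a minimal sketch based on the generic ELF header layout (magic bytes, class, byte order, then the 16-bit `e_type` field at offset 16); it does not interpret any VirtualBox-specific contents, and the function name `describe_elf_core` is hypothetical.

```python
import struct

ET_CORE = 4  # e_type value for core files in the ELF specification


def describe_elf_core(header):
    """Inspect the first bytes of a file; return None if not an ELF core."""
    if len(header) < 18 or header[:4] != b"\x7fELF":
        return None
    bits = {1: 32, 2: 64}.get(header[4])          # EI_CLASS
    byte_order = {1: "<", 2: ">"}.get(header[5])  # EI_DATA
    if bits is None or byte_order is None:
        return None
    (e_type,) = struct.unpack_from(byte_order + "H", header, 16)
    if e_type != ET_CORE:
        return None
    return {"bits": bits, "little_endian": byte_order == "<"}
```

To use it, read the first few bytes of the dump in binary mode, for example `describe_elf_core(open(path, "rb").read(64))`, and check that the result is not `None`.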

Volatility’s documentation on GitHub explains how to use this core dump to perform analysis with volatility:

Plaso “super timeline”

Plaso is a feature-rich tool built by the author of log2timeline, which was very popular for a long time. That tool was originally written in Perl; the author later decided to refactor and update it substantially, and at the same time rewrite it in Python, a more modern and popular language.

Plaso performs a very exhaustive analysis, and also requires that you either export a full copy of the disk image, or that you reboot the system into Linux and mount the NTFS partition somewhere accessible so that it can perform its analysis. It is very thorough, and therefore can take hours to complete a single job. For example, it will go through all of the files on a Windows system, and if it identifies files that would contain significant information to add to a timeline, it will dive into those to extract data. Some examples are archive inspectors, browser metadata inspectors, syslog analyzers, and SQLite database analyzers.

Today, it consists of multiple tools, with psteal and log2timeline being the primary front-end tools, and the preference leaning toward psteal as the newer front-end. You can read about all of these tools in the Plaso documentation.

The tool Mft2Csv discussed earlier is capable of exporting into the l2t format, as well as the bodyfile format, both of which are compatible with the log2timeline tool.

Though this is a popular utility, I won’t be walking through its full functionality with the class, due to the length of time it can take and the cost that can incur if examples or labs have to be redone. We may end up working with some of the output, or running the tool on some limited examples. That being said, I do strongly recommend reading about it.


tags: malware - lecture