Malware Taxonomy Discussion
by Coleman Kane
Please watch the video, read the slides provided at the following link, and then read the rest of the content on this page. I am providing additional information here that adds to the original version of this lecture:
I’ll provide a direct link to the video URL here, as well: CS7038: Wk03.1 - Malware Taxonomy & Terminology
The above video lecture discusses the nomenclature we will be using in this class when referring to types of malware and their capabilities. As there is no consistent authoritative source for these terms at this time, be advised that the terms I define may be used “incorrectly” in material sourced from outside this class.
A great example is the commonly used term “Trojan”, which was originally chosen to describe a program which masquerades as a legitimate (and desirable) piece of software, for the purpose of getting your permission to run or install it onto your system. Once you’ve granted this, it secretly installs a malicious program onto your computer, while you believe you are actually installing or running something else.
The term refers to the Trojan Horse of Greek legend, which was presented to Troy after a lengthy and fruitless siege by the Greeks, who wanted to conquer the then-independent city. The legend goes on to state that a small force from the Greek army hid inside the large wooden horse, while the rest of the army pretended to abandon the siege. The Trojans captured the horse as a trophy of their victory over the Greeks, and brought it inside their walled city. Once night fell, the soldiers hidden inside the horse broke out, and opened the city up to the rest of the Greek forces, who had doubled back on their retreat. Troy, now defenseless, fell due to Greek ingenuity and the novel strategy of the Trojan Horse.
The term has become so widely used largely because a similar strategy is highly successful in cyber operations. Unfortunately, it has become so widely used that many prominent companies and specialists will often use it as a synonym for any malicious backdoor or running program masking its true identity. This is in contrast to the original narrative, where the “Trojan Horse” is actually the carrier. Drawing an analogy to the tale, the malware it delivers would be akin to one or more Greek soldiers hiding inside the Trojan Horse, opening access from your computer to the rest of the cyber operators, wherever they happen to be operating from.
In our class, the term “Trojan” will have a limited meaning to describe the carrier of a malicious tool, such as a backdoor, where this carrier is trying to pretend it is something else to its target.
Additions to the terms list
Contrasted with Trojan Horses, often malware will install itself such that it runs with an application
name, Windows Service name, or UNIX daemon name that attempts to convince a user or admin that it is
innocuous in nature. Good examples of this are variants of
svchost.exe in Windows, and of common daemon names on
UNIX systems. At first glance, neither of these may raise an eyebrow. However, that is what an adversary
would want, and this is a great example of hiding in plain sight. When malware exhibits this behavior,
we will refer to this as a Masquerading Process. I consider it distinct from Trojan (Horses), because it
does not necessarily have to have been delivered onto the system by the user. If the adversary exploits a
weakness in an Internet-facing webserver, and gets the malware installed on the system as a Masquerading
Process, I would not consider that any “Trojan Horse” functionality has been employed at all, as the
intrusion was successful without misleading the human targets into doing the adversary’s work. By analogy, had the
Greeks successfully broken down the city walls, they never would have needed to build the Horse in the first place.
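One simple way to hunt for a Masquerading Process is to check whether a well-known name is running from the directory you would expect. The Python sketch below is a minimal illustration, not a complete detector. The `EXPECTED_DIRS` table and the `is_suspect` function are hypothetical names invented for this example, and the single entry assumes the legitimate svchost.exe lives in C:\Windows\System32 (64-bit Windows also ships a copy under SysWOW64, which this simplified table would flag).

```python
from pathlib import PureWindowsPath

# Hypothetical allow-list: process name -> directory it is expected to run from.
# Assumption for this sketch; a real table would come from a known-good baseline.
EXPECTED_DIRS = {
    "svchost.exe": r"C:\Windows\System32",
}


def is_suspect(image_path: str) -> bool:
    """Return True when a well-known name runs from an unexpected directory."""
    path = PureWindowsPath(image_path)
    expected = EXPECTED_DIRS.get(path.name.lower())
    if expected is None:
        # The name is not in the table, so this check has nothing to say.
        return False
    # PureWindowsPath compares case-insensitively, matching Windows behavior.
    return path.parent != PureWindowsPath(expected)
```

With this table, a copy at `C:\Users\Public\svchost.exe` is reported as suspect, while the System32 path is not. A lookalike name with a swapped character would pass untouched, so name-similarity checks would have to be a separate step.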
Remote Access Tool
This item is actually defined in the original lecture, but I don’t consider my original description adequate now. To narrow it down, a Remote Access Tool will contain backdoor functionality. In addition to this, however, I expect it to also contain some sort of built-in and dedicated remote file management capability, such that the channel used for communication can also be used for the upload and download of arbitrary files. I would also expect it to provide real-time remote monitoring of end-user activity on the system, such as a view of the target user’s desktop on a Windows or macOS system. There are a near-infinite number of other capabilities that could further be offered, but I’d consider the above to be the bare minimum for a tool to be classified as a “RAT”, a more general-purpose class of malware, versus some other dedicated-purpose nomenclature.
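The bare-minimum bar described above can be written down as a simple set test. This is only a sketch of the classification rule used in this class. The capability labels (`backdoor`, `file_transfer`, `live_monitoring`) and the `is_rat` function are hypothetical names chosen for the example, not terms from any standard.

```python
from typing import Iterable

# Minimum capabilities this class requires before calling a tool a "RAT".
# The label strings are assumptions made for this sketch.
RAT_MINIMUM = frozenset({"backdoor", "file_transfer", "live_monitoring"})


def is_rat(capabilities: Iterable[str]) -> bool:
    """True when a sample has every capability in the minimum set."""
    return RAT_MINIMUM <= set(capabilities)
```

A sample with all three capabilities plus extras (say, keylogging) still qualifies, while a backdoor that can only move files does not. It would be described by a narrower term instead.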
Living Off the Land
Living Off the Land is a term used to describe when malware utilizes already-installed software to implement some or all of its functionality. As Microsoft has improved the capability of interfaces such as WMI, PowerShell, Windows Subsystem for Linux, and others, this technique has become more popular, as adversaries can rely upon Microsoft-licensed code to do a lot more of the heavy lifting. The benefit is often two-fold:
- Less code needs to be authored in-house by malware authors
- Genuine software processes distributed by trusted vendors will do the work, which is less likely to be caught as malicious activity.
A common use of this: once an adversary gets a backdoor installed onto a system, they may resort to these
techniques to fill gaps in the backdoor or RAT’s capabilities. For instance, they may use
an already-installed utility for modifying the Windows Registry, or
crontab on a UNIX system to provide a mechanism for restarting the
backdoor channel when it gets disconnected.
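From the defender’s side, this kind of persistence can be reviewed by listing what a crontab would actually run. The following Python sketch is a hedged example that assumes the per-user crontab format (five schedule fields followed by a command, or an `@reboot`-style shorthand followed by a command). The function name `crontab_commands` is made up for this illustration, and system-wide crontabs, which add a user field, are not handled.

```python
from typing import List


def crontab_commands(crontab_text: str) -> List[str]:
    """Extract the command portion of each job line in a user crontab."""
    commands = []
    for raw in crontab_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        # "@reboot"-style shorthand uses one schedule field; otherwise five.
        schedule_fields = 1 if line.startswith("@") else 5
        parts = line.split(None, schedule_fields)
        # Lines with too few fields (such as VAR=value settings) are skipped.
        if len(parts) == schedule_fields + 1:
            commands.append(parts[-1])
    return commands
```

Feeding it the output of `crontab -l` yields one command string per job, which an analyst can then compare against what is expected on that host. An entry that relaunches an unfamiliar binary at every reboot would be worth a closer look.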
Some great documentation is available here:
- LOLBAS GitHub Repo (Windows-focused)
- GTFOBins (Linux-focused, many would likely work on macOS too)
An emerging term that you will encounter is “Fileless” or “File-less” malware. The term is often used to describe attacks that employ much of the existing software on a system to execute malware, mostly in memory. This can be seen as a very specific subset of Living Off the Land techniques, where the existing infrastructure of the OS is used even further to establish the system compromise in a manner intended to evade forensic analysis. While it is named fileless, the truth is that it is merely file-limited. It is truly difficult to get away from files altogether, because most underlying operating systems rely upon file-like objects at a low level to get data into or out of the system.
Here are some vendors’ documentation of the techniques. Be advised that different vendors have differing opinions on what constitutes fileless, so look out for some contradictions and try to read these while identifying the common traits:
McAfee provides the following diagram, which we will use to discuss the attack:
In the above “fileless” attack, you can clearly see that the user received an email (which is a file, of sorts, that
lives in the user’s inbox), and this contained a link to an HTML page on a malicious website, which contains an embedded
tag to play an Adobe Flash animation. The HTML page and the Adobe Flash (SWF) animation are both files, as well, with the
SWF being a small compiled application that executes within the Adobe Flash context, inside of the user’s web browser. The
Flash animation then executes PowerShell to download and execute a payload entirely within memory. While no file is
written to disk during this step, a file hosted on a remote server is fetched by the PowerShell process, and stored entirely
in memory. Additional tools are referenced, like
mimikatz (a popular password stealer), that are hosted remotely
as distinct files, and downloaded to the local system into RAM. Some of this RAM may additionally be swapped to disk in
pagefile.sys, or a compromised system may be hibernated to create a
hiberfil.sys that may also contain useful forensic
artifacts. Likewise, if an HTTP or HTTPS proxy is employed by the target, there is an opportunity for all of the downloaded
files to have been archived locally, either by forensic data retention, or possibly by caching logic intended to speed up
repeated requests.
So, in the above McAfee example, the attack they describe as fileless really seems to consist of no fewer than five files, but limits their storage to volatile memory on the host. The expectation is that, in a panic, the target may simply power off all compromised systems in reaction to discovering malicious activity, destroying a lot of the evidence in the process.
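Because memory contents can end up in a page file, a hibernation file, or a captured memory image, one basic forensic step is to search an acquired copy for known byte strings. The sketch below is a minimal Python example under stated assumptions: `find_marker` is a hypothetical helper written for this page, the marker to search for is whatever indicator the analyst chooses, and any compression or encoding a given file format applies is ignored. It reads in chunks and keeps a small overlap, so a marker that straddles two chunks is still found.

```python
import io
from typing import List


def find_marker(stream: io.BufferedIOBase, marker: bytes,
                chunk_size: int = 1 << 20) -> List[int]:
    """Return every byte offset at which `marker` occurs in a binary stream."""
    if not marker:
        raise ValueError("marker must not be empty")
    overlap = len(marker) - 1   # bytes carried over between chunks
    offsets: List[int] = []
    tail = b""                  # end of the previous window
    position = 0                # stream offset where the current window starts
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        window = tail + chunk
        start = 0
        while (hit := window.find(marker, start)) != -1:
            offsets.append(position + hit)
            start = hit + 1
        # The tail is shorter than the marker, so no match is counted twice.
        tail = window[-overlap:] if overlap else b""
        position += len(window) - len(tail)
    return offsets
```

It could be called as `find_marker(open(path, "rb"), b"mimikatz")` on a forensic copy, where `path` and the marker are illustrative choices. A hit only shows that the bytes are present at that offset. Interpreting it still requires proper memory-analysis tooling.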