Malware Research Online
by Coleman Kane
This lecture discusses researching malware online, and provides some resources for doing so.
In my opinion, the best resource available for educating oneself on security incidents and attacks is the APTnotes archive:
This archive keeps track of a large number of publicly-available reports that detail some major cyber incidents over the years, going back to 2008. It is definitely not comprehensive, but still provides a well-curated supply of decent cyber reporting.
Note that APT is an abbreviation for “Advanced Persistent Threat”. It denotes a subset of cyber operators that are considered to be more focused on long-term access operations, and that are often associated with custom tools and advanced techniques, frequently adapting to an unfamiliar environment in the moment.
The ThreatMiner project has built a nice user interface to it, as well as to other cyber security reporting:
Additionally, I take a deep dive into malware analysis reports published by security research firms for two cyber threats:
2016 - OilRig
This is an alleged Iranian threat actor that launches complex targeted attacks. According to the source, it has been tracked since 2015.
- Iranian Threat Agent OilRig Delivers Digitally Signed Malware, Impersonates University of Oxford _ ClearSky Cybersecurity
2014 - Operations Clandestine Fox and Double Tap
This is alleged to have been carried out by a Chinese threat group with ties going back at least a few years as of the publication of the report. The malware analysis in Operation Double Tap describes the relationships connecting the Spring 2014 attacks and the Fall 2014 attacks.
Slides: lecture-w03-2.pdf (PDF)
Symantec Nitro Attacks Report
Below is a link to another report, published by Symantec in 2011, which I feel contains a good amount of malware analysis. It describes a group that used the Poison Ivy RAT heavily around that time.
Report Analysis (Transparent Tribe)
As an example, we will take one of the reports from the APTnotes section, entitled Transparent Tribe: Evolution analysis, part 1, by Kaspersky. In APTnotes.csv, you will be able to find this report by scrolling down to roughly row #546 (future updates may change the row number).
| Filename | Title | Source | Box.com URL | Report SHA-1 | Date Published | Year |
|---|---|---|---|---|---|---|
| Kaspersky_Transparent-Tribe-Evolution-analysis-part1(08-20-2020) | Transparent Tribe: Evolution analysis, part 1 | Kaspersky | https://app.box.com/s/ujm0zncu4yslx1tvu6aes0qzm5nhvjyg | 87ab4c2ff18e568da5932e17ea24c76d9a467938 | 08/20/2020 | 2020 |
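Rather than scrolling to a row number that may shift, the same lookup can be scripted. The following is a minimal sketch, not part of the APTnotes project: it searches an inline one-row stand-in for APTnotes.csv by title. The column names are assumptions modeled on the table above (the real file's header may differ), and the hash column is named `SHA-1` in the sample because the 40-hex-digit value has the length of a SHA-1 digest. `find_reports` is a hypothetical helper name.

```python
import csv
import io

# Inline stand-in for APTnotes.csv so the example runs offline.
# Header names are assumptions based on the table above.
SAMPLE_CSV = """Filename,Title,Source,Box.com URL,SHA-1,Date Published,Year
Kaspersky_Transparent-Tribe-Evolution-analysis-part1(08-20-2020),"Transparent Tribe: Evolution analysis, part 1",Kaspersky,https://app.box.com/s/ujm0zncu4yslx1tvu6aes0qzm5nhvjyg,87ab4c2ff18e568da5932e17ea24c76d9a467938,08/20/2020,2020
"""


def find_reports(csv_text, keyword):
    """Return every row whose Title contains keyword (case-insensitive)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if keyword.lower() in row["Title"].lower()]


matches = find_reports(SAMPLE_CSV, "transparent tribe")
for row in matches:
    # Print the publication date, the publisher, and the archived PDF link
    print(row["Date Published"], row["Source"], row["Box.com URL"])
```

To run this against a downloaded copy of the archive, read the file's text and pass it to `find_reports` in place of `SAMPLE_CSV`, adjusting the column names to match the actual header.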
From the above, you can get a direct link to an archived copy of the original PDF report by using the Box.com URL. This report follows a commonly used format, and its sections are organized as follows:
- Background & key findings
- Deep Dive on “Crimson Server” Malware (including describing old and new variants)
- Deep Dive on USBWorm Malware
- IoC list
This report discusses a campaign that was current in 2020. The analysis is a deep dive into a few samples of two types of malware that were being used by a group known as “Transparent Tribe”, each covered in its own section. Sometimes adversaries use tools they have developed themselves, while other times they use whatever tools others can supply to them. Reports typically discuss this, to help clarify whether malware samples derived from the same code can or cannot be tied to a specific adversary (in this case, “Transparent Tribe”).
Toward the end of the document is a list of IoCs, which stands for Indicators of Compromise. This final section contains a list of data points that you may be able to use, possibly individually but typically in combination, to search for evidence of the same or a similar intrusion in another network.
Ultimately, the layout of this report is designed to communicate two things that were highlighted by earlier material discussing malware analysis. First and foremost, the report communicates the behavior and content of the malware components, so that you can monitor and scan an environment (given the right tools, of course) for evidence that the same activity is occurring or has occurred in the past. Second, the report provides additional context, which can direct you toward other published material with similar reporting and evidence to expand your search.
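As a rough illustration of searching an environment with an IoC list, the sketch below scans a few log lines for a small set of indicators. Everything in it is hypothetical: the indicators use address and domain ranges reserved for documentation rather than values from the Transparent Tribe report, and the log format and the `scan_lines` helper are invented for the example.

```python
import re

# Hypothetical indicators for illustration only; a real list would be copied
# from a report's IoC section. 192.0.2.10 and example.test are reserved for
# documentation and do not come from any report.
IOCS = {
    "ip": {"192.0.2.10"},
    "domain": {"bad.example.test"},
    "md5": {"d41d8cd98f00b204e9800998ecf8427e"},  # MD5 of an empty file
}

# Hypothetical log lines in an invented format
LOG = [
    "2020-08-21 10:02:11 ALLOW tcp 10.0.0.5 -> 192.0.2.10:443",
    "2020-08-21 10:02:15 ALLOW tcp 10.0.0.5 -> 192.0.2.100:443",
    "2020-08-21 10:03:40 DNS query bad.example.test from 10.0.0.7",
]


def scan_lines(lines, iocs):
    """Yield (line_number, ioc_type, indicator) for each indicator found."""
    for number, line in enumerate(lines, start=1):
        lowered = line.lower()
        for ioc_type, values in iocs.items():
            for value in sorted(values):
                # Require token boundaries so that 192.0.2.10 does not
                # match inside 192.0.2.100 or a longer domain name
                pattern = (r"(?<![\w.-])" + re.escape(value.lower())
                           + r"(?!\.?[\w-])")
                if re.search(pattern, lowered):
                    yield number, ioc_type, value


hits = list(scan_lines(LOG, IOCS))
for number, ioc_type, value in hits:
    print(f"line {number}: {ioc_type} indicator {value}")

# Seeing more than one kind of indicator is stronger evidence than one alone
types_seen = {ioc_type for _, ioc_type, _ in hits}
```

A single matching IP address can easily be a coincidence (shared hosting, reassigned addresses), which is why the number of distinct indicator types seen is tracked separately from the raw hits.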
File hashes / checksums as unique file identifiers
One data type that you will encounter a lot is the file hash. This is frequently used as a globally unique identifier for file contents, in effect providing a computable value that can verify that the contents of a file I am looking at are identical to the contents of a file that another person is looking at. The following articles describe this practice and some of the mathematics behind it, as well as standard practices:
Common industry practice is to identify all discovered files (malware and otherwise) in reporting by their checksum values. The most commonly used algorithms are MD5, SHA-1, and SHA-256. SHA-256 is gradually being adopted as the industry standard, as it is the most collision-resistant of the three (practical collision attacks have been demonstrated against both MD5 and SHA-1), while its cost to compute remains roughly comparable.
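To make this concrete, here is a minimal sketch that computes all three checksums of a file in a single pass using Python's standard `hashlib` module. The helper name `file_digests` and the temporary file contents are made up for the example.

```python
import hashlib
import os
import tempfile


def file_digests(path, chunk_size=65536):
    """Return the MD5, SHA-1, and SHA-256 hex digests of the file at path."""
    hashers = {name: hashlib.new(name) for name in ("md5", "sha1", "sha256")}
    with open(path, "rb") as handle:
        # Read in chunks so that large samples do not have to fit in memory
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


# Demonstrate on a harmless temporary file
data = b"not actually malware\n"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
digests = file_digests(tmp.name)
os.remove(tmp.name)

for name, value in digests.items():
    # Each hex character encodes 4 bits of the digest
    print(f"{name:7} {len(value) * 4:3} bits  {value}")
```

The digest length is a quick way to guess which algorithm produced a checksum in a report: 32 hex characters for MD5, 40 for SHA-1, and 64 for SHA-256.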
VirusTotal is a service that is owned by Google, but started as an independent operation. Initially, it was centered around a cloud-hosted platform that would accept malware samples for upload from anywhere, provide some brief metadata, and then scan the suspect file with many different anti-virus products, reporting the signature name from each product that matched the file. You could also get an idea of how many others had uploaded the file and, sometimes, an indication of whether the file was part of a known good software package.
Over time, the site collected a large number of samples, and the maintainers realized that it would be beneficial to expand the service to give analysts a window into the vast malware library that had accumulated. Today, the service offers an enormous amount of data for free, and paid customers get access to even more features that allow for exploring the library in detail, searching using a complex query language, and more.
Using the “Transparent Tribe” report discussed in the previous section, we can scroll down to the IoC section and use some of the reported data points to explore VirusTotal’s data set. Let’s start with this particular IP address:
On VirusTotal’s search page, we can put the IP address into the search field and query the entire site for any information it has about it. The following link will take you directly to the results:
This reveals a new page with four tabs, plus a score that has been generated by querying the IP address across a number of internet reputation registries. The four tabs are:
- Detection: The results of querying a bad/clean score from multiple registries and AV products available for the particular data being queried.
- Details: Specific details about the queried item. This will change significantly across different pieces of data, and consists of the reporting from whatever set of analysis tools are available for the data type. In this case, it is an IP address, which means that VirusTotal has used WHOIS to look up the public registration data for the IP. It has also queried Google for the IP, and lists the top results returned from that search.
- Relations: Lists relationships that the data point has to other data points that are also stored in VirusTotal’s database.
- Community: Some registered users of VirusTotal can publish insights as community members about a particular datapoint. These comments and assessments are recorded in the Community tab, where they can be viewed publicly even by unauthenticated users, to help in making an assessment, as well as help provide further reading about a particular threat.
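The same IP address lookup can also be done programmatically. The sketch below assumes VirusTotal's public API v3, which serves IP address reports at `/api/v3/ip_addresses/{ip}` and authenticates with an `x-apikey` header; the helper names, the placeholder key, the documentation-range IP address `192.0.2.1`, and the sample response are all made up for illustration. No network request is sent unless you call `fetch_ip_report` yourself with your own API key.

```python
import json
import urllib.request

API_ROOT = "https://www.virustotal.com/api/v3"


def build_ip_request(ip_address, api_key):
    """Build (but do not send) an API v3 request for an IP address report."""
    return urllib.request.Request(
        f"{API_ROOT}/ip_addresses/{ip_address}",
        headers={"x-apikey": api_key},
    )


def fetch_ip_report(ip_address, api_key):
    """Send the request and decode the JSON report (requires network access)."""
    request = build_ip_request(ip_address, api_key)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def summarize_detection(report):
    """Condense a report's last_analysis_stats into 'malicious/total'."""
    stats = report["data"]["attributes"]["last_analysis_stats"]
    return f"{stats.get('malicious', 0)}/{sum(stats.values())}"


# Hypothetical response fragment, trimmed to the one field used above
SAMPLE_REPORT = {"data": {"attributes": {"last_analysis_stats": {
    "harmless": 60, "malicious": 5, "suspicious": 1,
    "undetected": 22, "timeout": 0,
}}}}

request = build_ip_request("192.0.2.1", "YOUR_API_KEY")
print(request.full_url)
print(summarize_detection(SAMPLE_REPORT))
```

The summary here is only a rough analogue of the score shown on the Detection tab; the web page may compute its score differently.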
VirusTotal also offers a paid service that provides the ability to download copies of malware samples, see more details and analysis that are not made available to the public, and search the library with multiple advanced search capabilities.
NSRL (National Software Reference Library)
The National Software Reference Library is a project supported by NIST that keeps track of the file checksums of (largely) “known good” software in a large, authoritative national database, offered as a free public service.
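A common use of such a reference set is to filter known files out of a larger collection, so that analysis time is spent on whatever remains. The sketch below assumes the comma-separated layout of the legacy `NSRLFile.txt` export (newer NSRL releases may be distributed in a different format) and uses a one-row stand-in for the data. The helper names and the `dropper.exe` entry with its hash are hypothetical.

```python
import csv
import io

# One-row stand-in, assuming the header layout of the legacy NSRLFile.txt.
# The row describes an empty file, whose SHA-1 and MD5 are well known.
SAMPLE_RDS = '''"SHA-1","MD5","CRC32","FileName","FileSize","ProductCode","OpSystemCode","SpecialCode"
"DA39A3EE5E6B4B0D3255BFEF95601890AFD80709","D41D8CD98F00B204E9800998ECF8427E","00000000","empty.txt",0,1,"WIN",""
'''


def load_known_sha1(rds_text):
    """Collect the SHA-1 column into a lowercase set for fast lookups."""
    reader = csv.DictReader(io.StringIO(rds_text))
    return {row["SHA-1"].lower() for row in reader}


def unknown_files(observed, known_sha1):
    """Return the names of observed files whose SHA-1 is not in the set."""
    return sorted(
        name for name, sha1 in observed.items()
        if sha1.lower() not in known_sha1
    )


known = load_known_sha1(SAMPLE_RDS)

# Hypothetical files found on a system, mapped to their SHA-1 values
observed = {
    "empty.txt": "da39a3ee5e6b4b0d3255bfef95601890afd80709",
    "dropper.exe": "0123456789abcdef0123456789abcdef01234567",
}
remaining = unknown_files(observed, known)
print(remaining)
```

A file that is absent from the reference set is merely unknown, not necessarily malicious, and since the library is only largely made up of known good software, a match is not proof that a file is benign.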
VirusShare is a semi-public archive of (largely) malware samples, managed as a community research project. It can often be a good place to find malware samples that have already been identified in the community, to learn from by analyzing them. Samples can be searched for and downloaded after registering for a free account. Additionally, this project seeds BitTorrent downloads of ZIP files containing the raw samples, and publishes lists of checksums identifying their contents. The BitTorrent links for these are also available to registered users.
VirusShare also periodically publishes noteworthy samples that it archives on its Twitter account.
Some projects exist to provide library interfaces to this dataset as well, such as this one:
MalShare is another semi-public repository that offers a privately-curated library of malware, and some basic analysis of it, to registered users. Registration is available at this link. The registration process is easier than VirusShare's, and more information is provided in a more polished user interface. These are independent projects, so their sample libraries largely differ as well.
The MalShare Project offers a lot of libraries and open source tools on their GitHub page:
theZoo (github project)
theZoo goes a bit further than the above services in terms of providing a community-supplied malware library. This project is designed to be self-hosted, while still maintaining malware samples within the distributed library. The idea is that an engineer or analyst forks the GitHub repository for their own use, while still tracking the public branch on GitHub. When a community member wants to share samples back into the public community library, the traditional GitHub pull request and merge workflow is used for that purpose.
Main project page: thezoo.morirt.com