Course Repository for University of Cincinnati Malware Analysis Class (CS7038)

Static Analyzers (Yara, vscan, ClamAV)

While the prior lectures focused on static analysis methods, this lecture introduces three tools/platforms that take your analysis output as their input.

I dive deep into The Yara Project, which will be more of a focus in this class than the others. I demonstrate its C API as well as the Python API that has been built for it. I also contrast it with the other two applications, ClamAV and vscan.

Slides: lecture-w05-1.pdf (PDF)

Video: CS7038: Wk05.1 - Static Analyzers and Yara Experiments

Summaries of each topic follow:


We scratch the surface of ClamAV, and demonstrate building two ClamAV rules for the evil.pdf from Week 02.

In particular, I demonstrate the use of ClamAV’s sigtool to add hash digests to a database of “known bad” files, named evil_pdf.hdb:

bash$ sigtool --md5 evil.pdf >> evil_pdf.hdb
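To make the contents of the hash database concrete, here is a minimal Python sketch of the same step. It assumes the usual .hdb record layout of MD5 digest, file size, and name separated by colons; the helper names hdb_line and append_to_hdb are hypothetical and not part of ClamAV.

```python
import hashlib
import os


def hdb_line(path):
    """Build one 'md5:size:name' record, the layout assumed for .hdb files."""
    with open(path, "rb") as f:
        data = f.read()
    digest = hashlib.md5(data).hexdigest()
    return f"{digest}:{len(data)}:{os.path.basename(path)}"


def append_to_hdb(sample_path, hdb_path):
    """Append the record to the database, like '>>' in the shell command."""
    with open(hdb_path, "a") as db:
        db.write(hdb_line(sample_path) + "\n")
```

Calling append_to_hdb("evil.pdf", "evil_pdf.hdb") should leave a record comparable to the one sigtool writes, though sigtool remains the authoritative tool for producing these files.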

Here is a link to an HDB of all of the hashes for all 10000 samples I provided in the ZIP: test_samples.hdb

Following that, I analyzed the PDF and found the following JavaScript:

this.exportDataObject({ cName: "template", nLaunch: 0 });

I also found the following PDF data that’s part of a Launch command:

/F(cmd.exe)/D(c:\\windows\\system32)/P(/Q /C %HOMEDRIVE%&cd %HOMEPATH%&
(if exist "Desktop\\template.pdf" (cd "Desktop"))&
(if exist "My Documents\\template.pdf" (cd "My Documents"))&
(if exist "Documents\\template.pdf" (cd "Documents"))&
(if exist "Escritorio\\template.pdf" (cd "Escritorio"))&
(if exist "Mis Documentos\\template.pdf" (cd "Mis Documentos"))&
(start template.pdf)

To view the encrypted content please tick the "Do not show this message again" box and press Open.

I extract the following text:

please tick the "Do not

And convert it into hexadecimal:

bash$ echo -n 'please tick the "Do not' | sigtool --hex-dump
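The same conversion can be cross-checked without sigtool. This short Python sketch hex-encodes the extracted text; the variable names are illustrative only.

```python
# The text fragment extracted from the PDF's Launch command
text = 'please tick the "Do not'

# Encode as ASCII bytes, then render each byte as two hex digits
hex_signature = text.encode("ascii").hex()

print(hex_signature)
# prints 706c65617365207469636b207468652022446f206e6f74
```

This is the same byte sequence that appears as $b in the yara rule further down.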

And finally generate the following ClamAV signature which I write into evil_pdf.ldb
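As a rough sketch of how a logical signature (.ldb) line is laid out, the Python below joins the fields Name;TargetDescriptionBlock;LogicalExpression;Subsig0. The signature name Evil_PDF_Sketch, the choice of Target:0 (any file type), and the helper make_ldb_signature are all assumptions made for illustration, and the result may differ from the signature written in class.

```python
def make_ldb_signature(name, target, hex_subsigs):
    """Join fields as Name;Target:N;LogicalExpression;Subsig0;Subsig1;..."""
    # Logical expression: OR together every subsignature index, e.g. "0|1"
    expression = "|".join(str(i) for i in range(len(hex_subsigs)))
    return ";".join([name, f"Target:{target}", expression, *hex_subsigs])


# Hex form of the text fragment extracted from the PDF
subsig = 'please tick the "Do not'.encode("ascii").hex()

# Hypothetical name and target; not taken from the lecture
ldb_line = make_ldb_signature("Evil_PDF_Sketch", 0, [subsig])
print(ldb_line)
```

A line like this would be saved into evil_pdf.ldb and loaded with clamscan's -d option to test it against the sample.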



We also demonstrate creating some yara rules based upon evil.pdf.

From our analysis of the above PDF and JavaScript, we created the following yara rule to identify it:

rule evil_pdf_rule {
  meta:
    author = "Coleman Kane"
    revision = 12
    description = "Detect evil.pdf sample from Week2 lecture"

  strings:
    $a = "\"Do not show this message again\"" nocase
    $r = /if exist.*template\.pdf/
    $b = { 706c65617365207469636b207468652022446f206e6f74 }
    $pt1 = "start " nocase
    $pt2 = "cd " nocase
    $pt3 = "exist " nocase
    $pt4 = "cmd.exe" nocase

  condition:
    $a or $b or $r or 2 of ($pt*)
}

The rule is available here: evil_pdf.yar

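The above rule will fire if any one of the following conditions holds:

$a: the quoted text "Do not show this message again" is present (case-insensitive)

$b: the bytes for please tick the "Do not are present

$r: the regular expression if exist.*template\.pdf matches

Any two of the $pt strings ("start ", "cd ", "exist ", "cmd.exe") are present (case-insensitive)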
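To make the rule's condition concrete, here is a rough pure-Python emulation of its logic. It does not use libyara and is only a sketch for reasoning about which inputs match; the function name rule_would_fire is hypothetical.

```python
import re

# Counterparts of the rule's strings; the nocase ones are compared lowercased
QUOTED_A = b'"do not show this message again"'
HEX_B = bytes.fromhex("706c65617365207469636b207468652022446f206e6f74")
REGEX_R = re.compile(rb"if exist.*template\.pdf")
PARTS = [b"start ", b"cd ", b"exist ", b"cmd.exe"]


def rule_would_fire(data: bytes) -> bool:
    """Approximate: $a or $b or $r or 2 of ($pt*)."""
    lowered = data.lower()
    a = QUOTED_A in lowered                      # $a, case-insensitive
    b = HEX_B in data                            # $b, exact bytes
    r = REGEX_R.search(data) is not None         # $r, case-sensitive regex
    pt_hits = sum(1 for p in PARTS if p in lowered)  # distinct $pt* strings seen
    return a or b or r or pt_hits >= 2
```

For example, rule_would_fire(b"CMD.EXE /c start x") is True because two of the $pt strings appear, while a buffer containing only "cmd.exe" does not match.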

Also demonstrated in class were the C and Python APIs for libyara. In both cases, I generated a rule file looking for the words “no” and “yes” in the user input, and each time a user enters a line of text, the text is scanned with the yara rule. Program flow is controlled based upon which rules fired on the input text.


yara_chat.c - the libyara C API implementation

yara_chat.py - the Python implementation of yara_chat.c
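The overall flow can be sketched as follows with the yara-python bindings. This is not the course's yara_chat.py: the rule names said_yes and said_no, the fullword modifier, and the next_action helper are assumptions made for illustration. The yara module is imported inside chat() so that the dispatch logic can be read and exercised on its own.

```python
RULES_SOURCE = """
rule said_yes { strings: $y = "yes" nocase fullword condition: $y }
rule said_no  { strings: $n = "no" nocase fullword condition: $n }
"""


def next_action(fired):
    """Choose program flow from the set of rule names that fired."""
    if "said_yes" in fired and "said_no" in fired:
        return "ambiguous"
    if "said_yes" in fired:
        return "continue"
    if "said_no" in fired:
        return "quit"
    return "unknown"


def chat():
    import yara  # yara-python; needs libyara installed

    rules = yara.compile(source=RULES_SOURCE)
    while True:
        try:
            line = input("Keep going? ")
        except EOFError:
            break
        # Scan the typed line and collect the names of the rules that matched
        fired = {match.rule for match in rules.match(data=line.encode())}
        action = next_action(fired)
        if action == "quit":
            print("Goodbye.")
            break
        if action == "continue":
            print("Okay, continuing.")
        else:
            print("Please answer yes or no.")
```

Calling chat() starts the prompt loop; each line of input is scanned, and the set of fired rule names decides whether the loop continues, exits, or asks again.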

To build the C program, first make sure you have libyara and its headers installed. Then:

gcc -o yara_chat yara_chat.c -lyara

And then: