Static Analyzers (Yara, vscan, ClamAV)
While the prior lectures focused on some static analysis methods, this lecture introduces three tools that take the output of your analysis as their input.
I go into depth on The Yara Project, which will be more of a focus in this class than the others. I demonstrate its C API as well as the Python API that has been built for it. I also contrast it with the other two applications, ClamAV and vscan.
Slides: lecture-w05-1.pdf (PDF)
Video: CS7038: Wk05.1 - Static Analyzers and Yara Experiments
Summary of the lecture:
We scratch the surface of ClamAV, and demonstrate building two ClamAV rules for the evil.pdf from Week 02.
In particular, I demonstrate the use of ClamAV’s sigtool to add hash digests to a database of “known bad” files, named evil_pdf.hdb:
bash$ sigtool --md5 evil.pdf >> evil_pdf.hdb
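To make the content of an .hdb entry concrete, here is a minimal Python sketch that builds a line of the form hash:size:name for one file. It assumes that three-field layout (MD5 hex digest, size in bytes, file name) is what sigtool --md5 writes; the function name hdb_entry is hypothetical and is not part of ClamAV.

```python
import hashlib
import os


def hdb_entry(path):
    """Return an .hdb-style line: <md5 hex>:<size in bytes>:<file name>."""
    md5 = hashlib.md5()
    size = 0
    with open(path, "rb") as handle:
        # Read in chunks so large samples do not have to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            md5.update(chunk)
            size += len(chunk)
    # Assumption: only the base name of the file goes in the third field.
    return "{}:{}:{}".format(md5.hexdigest(), size, os.path.basename(path))
```

Calling hdb_entry("evil.pdf") and appending the result to evil_pdf.hdb should be roughly equivalent to the sigtool command above, under the stated assumptions.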
Here is a link to an HDB of all of the hashes for all 10000 samples I provided in the ZIP: test_samples.hdb
Following that, I analyzed the PDF and found the following JavaScript:
this.exportDataObject({ cName: "template", nLaunch: 0 });
I also found the following PDF data that’s part of a Launch command:
/F(cmd.exe)/D(c:\\windows\\system32)/P(/Q /C %HOMEDRIVE%&cd %HOMEPATH%&
(if exist "Desktop\\template.pdf" (cd "Desktop"))&
(if exist "My Documents\\template.pdf" (cd "My Documents"))&
(if exist "Documents\\template.pdf" (cd "Documents"))&
(if exist "Escritorio\\template.pdf" (cd "Escritorio"))&
(if exist "Mis Documentos\\template.pdf" (cd "Mis Documentos"))&
(start template.pdf)
To view the encrypted content please tick the "Do not show this message again" box and press Open.
I extract the following text:
please tick the "Do not
And convert it into hexadecimal:
bash$ echo -n 'please tick the "Do not' | sigtool --hex-dump
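The same conversion can be checked without sigtool. This short Python sketch hex-encodes the extracted text as ASCII bytes; the variable names are only illustrative.

```python
# The text extracted from the PDF, exactly as passed to echo -n above
# (no trailing newline).
text = 'please tick the "Do not'

# Encode as ASCII bytes, then render each byte as two lowercase hex digits.
hex_signature = text.encode("ascii").hex()

print(hex_signature)
# prints 706c65617365207469636b207468652022446f206e6f74
```

This is the same byte sequence used for the $b string in the yara rule later in these notes.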
And finally generate the following ClamAV signature which I write into evil_pdf.ldb
We go through and demonstrate creating some yara rules based upon the evil.pdf as well.
From our analysis of the PDF and JavaScript above, we created the following yara rule to identify it:
rule evil_pdf_rule {
    meta:
        author = "Coleman Kane"
        revision = 12
        description = "Detect evil.pdf sample from Week2 lecture"
    strings:
        $a = "\"Do not show this message again\"" nocase
        $r = /if exist.*template\.pdf/
        $b = { 706c65617365207469636b207468652022446f206e6f74 }
        $pt1 = "start " nocase
        $pt2 = "cd " nocase
        $pt3 = "exist " nocase
        $pt4 = "cmd.exe" nocase
    condition:
        $a or $b or $r or 2 of ($pt*)
}
The rule is available here: evil_pdf.yar
The above rule fires if any one of the following conditions holds:
- If $a is found in the file content
- If a match to pattern $r is found in the file content
- If a byte sequence matching $b is found in the file content
- If any two (implied “or more”) of $pt1, $pt2, $pt3, or $pt4 are found in the file content
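The condition logic listed above can be mimicked in plain Python to see how the four cases combine. This is only an illustrative emulation using the standard library, not how yara evaluates rules; the function name rule_would_fire is hypothetical, and lowercasing the input is a crude stand-in for the nocase modifier.

```python
import re

# Stand-ins for the rule's strings (lowercase, because the input is lowercased).
PT_STRINGS = [b"start ", b"cd ", b"exist ", b"cmd.exe"]
A_STRING = b'"do not show this message again"'
B_BYTES = bytes.fromhex("706c65617365207469636b207468652022446f206e6f74")
R_PATTERN = re.compile(rb"if exist.*template\.pdf")


def rule_would_fire(data):
    """Emulate: $a or $b or $r or 2 of ($pt*)."""
    lowered = data.lower()  # approximates nocase for ASCII text
    a = A_STRING in lowered
    b = B_BYTES in data                      # exact byte sequence, case-sensitive
    r = R_PATTERN.search(data) is not None   # case-sensitive regex
    pt_hits = sum(1 for s in PT_STRINGS if s in lowered)
    return a or b or r or pt_hits >= 2
```

For example, rule_would_fire(b"run cmd.exe") is False because only one of the $pt strings is present, while rule_would_fire(b"START notepad & CD temp") is True because two of them are.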
Also demonstrated in class were the C and Python APIs for libyara. In both cases, I generated a rule file looking for the words “no” and “yes” in the user input, and each time a user enters a line of text, the text is scanned with the yara rule. Program flow is controlled based upon which rules fired on the input text.
yara_chat.c - the libyara C API implementation
A Python implementation of yara_chat.c was also provided.
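The idea of scanning each input line and branching on which rules fired can be sketched as follows. This is not the course's yara_chat code: the rule names, the prompt, and the actions are made up for illustration, and the sketch assumes the yara-python package is installed (it uses yara.compile(source=...) and rules.match(data=...)).

```python
RULES_SOURCE = r'''
rule said_yes {
    strings:
        $y = "yes" nocase
    condition:
        $y
}
rule said_no {
    strings:
        $n = "no" nocase
    condition:
        $n
}
'''


def next_action(fired):
    """Pick a (hypothetical) action from the set of rule names that fired."""
    if "said_yes" in fired and "said_no" in fired:
        return "confused"
    if "said_yes" in fired:
        return "continue"
    if "said_no" in fired:
        return "quit"
    return "ask_again"


def chat():
    """Read lines from the user and scan each one with the rules above."""
    import yara  # yara-python; imported here so next_action works without it

    rules = yara.compile(source=RULES_SOURCE)
    while True:
        try:
            line = input("Keep going? ")
        except EOFError:
            break
        # Each match object carries the name of the rule that fired.
        fired = {match.rule for match in rules.match(data=line.encode())}
        action = next_action(fired)
        print("action:", action)
        if action == "quit":
            break
```

Calling chat() starts the loop. Keeping the decision in next_action, separate from the scanning, makes the control flow easy to check without libyara installed.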
To build the C program, first make sure you have libyara and its headers installed. Then:
bash$ gcc -o yara_chat yara_chat.c -lyara
And then: