CS6038/CS5138 Malware Analysis, UC

Course content for UC Malware Analysis

10 March 2020

Continued Malware Identification with Yara

by Coleman Kane

This lecture quickly reviews some of the foundations that were discussed in the 2020-03-07 lecture, and goes on to apply and expand on these for malware identification. Often, you will employ yara when a teammate discovers a suspicious file and would like to know whether it matches anything malicious that has been seen before. To get a quick answer, you may use a curated library of malware signatures to scan the suspect file: if you see hits, then the conclusion is likely (but not always) that it is malware. If you get no hits, then it could either be a work-related tool that simply has suspicious characteristics, or possibly a new type of malware that you have not seen before. To determine which of these it is (and maybe write a new yara signature), you’ll employ the characteristic discovery methods we’ve used in class and some more that we have yet to cover.

The prior lecture discussed a number of simplified discovery mechanisms using the text of a popular novel. Although our malware analysis so far has mostly explored the binary data within the samples, most of the concepts are the same. Additionally, most malware is intended to be controlled by a human operator (very rare is anything truly autonomous). Because of this, it is very common for malware to contain a large number of text-based, uniquely identifying characteristics. In these cases, you can often use exactly the same strategies discussed earlier to build yara signatures that identify malware. This content also goes over some examples that can be used to produce yara signatures to match binary content.

Using multiple rules within a signature

In the video, I continue with the example by adding the content below to the rule1.yar signature, which already contains the rule named rule1. Yara will interpret these as independent rules and treat them separately, even though they’re in the same file:

rule hatter {
 meta:
  note = "Separate signature to discover Hatter"
 strings:
  $hatter = "Hatter" nocase

 condition:
  $hatter
}

If I wanted to, I could even set this up such that hatter would only match in the event that rule1 (looking for alice, queen, and rabbit) matched, simply by referencing the rule name in the condition:

rule hatter {
 meta:
  note = "Separate signature to discover Hatter"
 strings:
  $hatter = "Hatter" nocase

 condition:
  $hatter and rule1
}

Both rules still match, and the match is reported on the output, but this time the hatter rule will not match any file that doesn’t already match the combination of alice, queen, and rabbit defined in rule1.

Yara Rules for Malware Identification

There is a helpful public repository of yara rules:

In particular, the repository has a number of signatures related to the “APT1” campaign, which utilized a creative approach to backdooring systems, where the malware would make web requests against a third-party website (think of a small business, a church, or some other entity), and the compromised website would handle channeling the traffic back and forth between attacker and victim.

The project has compiled all of those rules into one signature, available here:

String Signature

The following is the extract of the BANGAT_APT1 rule, which attempts to detect a malware family called BANGAT:

rule BANGAT_APT1
{

    meta:
        author = "AlienVault Labs"
        info = "CommentCrew-threat-apt1"
        
    strings:
        $s1 = "superhard corp." wide ascii
        $s2 = "microsoft corp." wide ascii
        $s3 = "[Insert]" wide ascii
        $s4 = "[Delete]" wide ascii
        $s5 = "[End]" wide ascii
        $s6 = "!(*@)(!@KEY" wide ascii
        $s7 = "!(*@)(!@SID=" wide ascii
        $s8 = "end      binary output" wide ascii
        $s9 = "XriteProcessMemory" wide ascii
        $s10 = "IE:Password-Protected sites" wide ascii
        $s11 = "pstorec.dll" wide ascii

    condition:
        all of them
}

The above signature identifies this family of malware, all without having to resort to binary signatures. As you can see, it treats the malware in much the same way that we treated the text of Alice’s Adventures in Wonderland. As discussed, the ascii modifier tells yara to look for the string as plain single-byte ASCII text, while the wide modifier tells it to look for the two-byte-per-character encoding (each character followed by a zero byte, which for these strings is the same as UTF-16LE). Thus, each defined string really matches two possible byte strings of content.
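To make the two byte strings concrete, here is a minimal Python sketch of what a `wide ascii` string expands to. The helper name `yara_variants` is my own for illustration and is not part of yara; it assumes the string is plain ASCII, as all of the strings in the rule above are.

```python
def yara_variants(text):
    """Return the two byte sequences that `wide ascii` searches for.

    Assumes `text` is plain ASCII (hypothetical helper, not a yara API).
    """
    ascii_bytes = text.encode("ascii")
    # wide: every character is followed by a zero byte
    wide_bytes = b"".join(bytes([b, 0]) for b in ascii_bytes)
    return {"ascii": ascii_bytes, "wide": wide_bytes}


variants = yara_variants("pstorec.dll")
print(variants["ascii"].hex(" "))  # one byte per character
print(variants["wide"].hex(" "))   # two bytes per character
```

For ASCII input, the wide variant is byte-for-byte what Python produces with `"pstorec.dll".encode("utf-16-le")`, which is why Windows programs that use wide-character APIs tend to contain strings in this form.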

Often, it is beneficial to determine the purpose of each of the strings. This may not be a perfect science, but using some knowledge of the underlying systems, the functionality of the malware, and a bit of intuition, I can conclude the following likely purposes for many of them:

The following are likely used for reporting non-printable keypresses while running a keylogger on the user’s system, perhaps to steal passwords, account numbers, and other personal information:

$s3 = "[Insert]" wide ascii
$s4 = "[Delete]" wide ascii
$s5 = "[End]" wide ascii

The following is likely a slightly obfuscated form of the string WriteProcessMemory. It is likely that, at run-time, the malware changes the X to a W before calling the function. Obfuscating this could help evade anti-virus tools that would consider any program calling that function to be suspect. However, this is a double-edged sword: the presence of the obfuscated version of this function name is almost guaranteed to be bad, while the function itself may have various legitimate uses.

$s9 = "XriteProcessMemory" wide ascii
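As a rough illustration of the idea (not the sample's actual code, which is not shown here), the following Python sketch models a program that stores the name with its first letter swapped and patches it just before use. The function name `restore_api_name` and the blob contents are hypothetical.

```python
def restore_api_name(obfuscated, first_char=b"W"):
    """Swap the first byte back, as the malware is presumed to do at run-time."""
    return first_char + obfuscated[1:]


# Hypothetical fragment of a file that embeds the obfuscated name.
blob = b"\x00\x00XriteProcessMemory\x00pstorec.dll\x00"

# A scanner looking for the real API name finds nothing...
print(b"WriteProcessMemory" in blob)
# ...but the obfuscated form is itself a distinctive string to match on.
print(b"XriteProcessMemory" in blob)

name = restore_api_name(b"XriteProcessMemory")
print(name)
```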

The following is a file name that will likely be opened by the program at some point during execution. It is rare to have a file name show up in a program without it being a file that is opened. So, in addition to gaining an identifying string from this malware, you may also have learned one more way to detect it on the system:

$s11 = "pstorec.dll" wide ascii

Hexadecimal Strings

Hexadecimal strings, and string patterns, are useful for matching binary content, when the unique characteristics you want to match cannot be cleanly expressed using human-readable strings.

Yara’s documentation on it is here:

In the video I work off of the example they show:

rule WildcardExample
{
    strings:
       $hex_string = { E2 34 ?? C8 A? FB }

    condition:
       $hex_string
}

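The above will match a variety of 6-byte strings, as long as they have the following characteristics: the first two bytes are E2 34, the third byte can have any value (??), the fourth byte is C8, the fifth byte has A as its high nibble and any low nibble (A?, so A0 through AF), and the sixth byte is FB.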

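I demonstrated converting this into a regular expression that looks like the one below. Strictly speaking, the . in a yara regular expression does not match a newline byte unless the s modifier is appended after the closing slash, so the ?? in the hex string is slightly broader than the . shown here: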

/\xE2\x34.\xC8[\xA0-\xAF]\xFB/

The following would also work:

/\xE24.\xC8[\xA0-\xAF]\xFB/

As you can see, for mostly binary data the hexadecimal pattern syntax is more readable, often more concise, and also a lot easier to author and tune. Additionally, it is often more efficient to execute, as the pattern language is much simpler.
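The correspondence between the two notations can be sketched in Python. This is my own simplified converter for illustration only: it handles plain bytes, `??`, and single-nibble wildcards, and it does not handle yara's jumps or alternations. The function name `hex_pattern_to_regex` is an assumption, not something from yara.

```python
import re


def hex_pattern_to_regex(pattern):
    """Translate a simple yara-style hex pattern into a compiled bytes regex."""
    parts = []
    for token in pattern.split():
        hi, lo = token[0], token[1]
        if token == "??":
            parts.append(b".")  # any byte (DOTALL is set below)
        elif lo == "?":
            base = int(hi, 16) << 4  # e.g. A? -> range A0..AF
            parts.append(b"[\\x%02x-\\x%02x]" % (base, base | 0x0F))
        elif hi == "?":
            low = int(lo, 16)  # e.g. ?B -> 0B, 1B, ..., FB
            members = b"".join(b"\\x%02x" % ((h << 4) | low) for h in range(16))
            parts.append(b"[" + members + b"]")
        else:
            parts.append(b"\\x%02x" % int(token, 16))
    # DOTALL so that "." also matches the newline byte 0x0A
    return re.compile(b"".join(parts), re.DOTALL)


rx = hex_pattern_to_regex("E2 34 ?? C8 A? FB")
print(rx.pattern)
print(bool(rx.search(bytes.fromhex("E2 34 0A C8 A7 FB"))))
print(bool(rx.search(bytes.fromhex("E2 34 00 C8 B0 FB"))))
```

Note how even this short pattern becomes noticeably harder to read once it is expanded into regular expression syntax.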

Building a Hexadecimal String

We can use this knowledge to build a signature for matching one of the encryption functions from the mid-term project’s malware.exe artifact. Typically, when we are reverse-engineering the malware, we are identifying compiled code that was added by the author of the malware, for some purposes that are unique to the malware.

For those following along at home, here is a direct link to download the sample:

Extract Reference Bytes from Ghidra

Here’s the excerpt from Ghidra. As we discussed in class, most functions can be divided into three sections: a prologue, a body, and an epilogue. Since this one only has a single return line, there is only a single copy of the function’s epilogue.

                     **************************************************************
                     *                          FUNCTION                          *
                     **************************************************************
                     undefined __cdecl FUN_00401070(byte * param_1, byte * pa
     undefined         AL:1           <RETURN>
     byte *            Stack[0x4]:4   param_1            XREF[4]:     0040108d(R), 
                                                                      0040109f(R), 
                                                                      004010a4(RW), 
                                                                      004010ae(R)  
     byte *            Stack[0x8]:4   param_2            XREF[5]:     00401073(R), 
                                                                      0040107d(R), 
                                                                      00401087(R), 
                                                                      00401094(R), 
                                                                      004010a8(RW)  
                     FUN_00401070                        XREF[3]:     entry:0040164b(c), 
                                                                      entry:00403e01(c), 
                                                                      entry:00404281(c)  
00401070 55              PUSH       EBP
00401071 89 e5           MOV        EBP,ESP
                     LAB_00401073                        XREF[1]:     004010ac(j)  
00401073 8b 45 0c        MOV        EAX,dword ptr [EBP + param_2]
00401076 0f b6 00        MOVZX      EAX,byte ptr [EAX]
00401079 84 c0           TEST       AL,AL
0040107b 74 31           JZ         LAB_004010ae
0040107d 8b 45 0c        MOV        EAX,dword ptr [EBP + param_2]
00401080 0f b6 00        MOVZX      EAX,byte ptr [EAX]
00401083 3c 55           CMP        AL,0x55
00401085 75 0d           JNZ        LAB_00401094
00401087 8b 45 0c        MOV        EAX,dword ptr [EBP + param_2]
0040108a 0f b6 10        MOVZX      EDX,byte ptr [EAX]
0040108d 8b 45 08        MOV        EAX,dword ptr [EBP + param_1]
00401090 88 10           MOV        byte ptr [EAX],DL
00401092 eb 10           JMP        LAB_004010a4
                     LAB_00401094                        XREF[1]:     00401085(j)
00401094 8b 45 0c        MOV        EAX,dword ptr [EBP + param_2]
00401097 0f b6 00        MOVZX      EAX,byte ptr [EAX]
0040109a 83 f0 55        XOR        EAX,0x55
0040109d 89 c2           MOV        EDX,EAX
0040109f 8b 45 08        MOV        EAX,dword ptr [EBP + param_1]
004010a2 88 10           MOV        byte ptr [EAX],DL
                     LAB_004010a4                        XREF[1]:     00401092(j)  
004010a4 83 45 08 01     ADD        dword ptr [EBP + param_1],0x1
004010a8 83 45 0c 01     ADD        dword ptr [EBP + param_2],0x1
004010ac eb c5           JMP        LAB_00401073
                     LAB_004010ae                        XREF[1]:     0040107b(j)  
004010ae 8b 45 08        MOV        EAX,dword ptr [EBP + param_1]
004010b1 c6 00 00        MOV        byte ptr [EAX],0x0
004010b4 90              NOP
004010b5 5d              POP        EBP
004010b6 c3              RET

Furthermore, here is the decompiled C code. As you can see, it is much more readable, and though it cannot be matched directly, you can use it to identify which components make up the function in Ghidra.

void __cdecl FUN_00401070(byte *param_1,byte *param_2)

{
  while (*param_2 != 0) {
    if (*param_2 == 0x55) {
      *param_1 = *param_2;
    }
    else {
      *param_1 = *param_2 ^ 0x55;
    }
    param_1 = param_1 + 1;
    param_2 = param_2 + 1;
  }
  *param_1 = 0;
  return;
}
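To check my reading of the decompiled code, here is a small Python model of the same loop. It is a sketch rather than a byte-exact port: it returns the output as a bytes object instead of writing through a pointer, and it omits the trailing NUL that the C version stores. The `key` parameter is my addition for illustration.

```python
def xor_string(src, key=0x55):
    """Python model of FUN_00401070: XOR each byte with key, stop at NUL."""
    out = bytearray()
    for b in src:
        if b == 0:  # NUL terminator ends the loop
            break
        # A byte equal to the key is copied unchanged, so the output
        # never contains a NUL byte that would cut the string short.
        out.append(b if b == key else b ^ key)
    return bytes(out)


encoded = xor_string(b"cmd.exe")
print(encoded)
print(xor_string(encoded))  # applying it twice restores the input
```

Because XOR is its own inverse and the special case maps the key byte to itself, the same routine serves as both the encoder and the decoder.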

It is also helpful to use Ghidra’s function graph viewer. Click on the button that looks like the one highlighted below, from the tool bar above the assembly/listing view:

Ghidra Function Graph Button

You should be able to view a graph of the function, as in the example displayed below. Hover over and select each block to see the respective code highlighted in the listing view, as well as a pop-out view of the in-context listing of the disassembly.

Example view of the Function Graph

Copy the Reference Bytes into Yara

A good place to start, and one that demonstrates the wisdom of the syntax used for the hexadecimal string types in yara, is to copy the bytes from Ghidra into a new hex string. In yara the hex strings are bounded by the { and } characters. The yara syntax allows you to include comments within these, and also allows a single hex string to span multiple lines, if desired. Thus, it is very straightforward to copy from a tool like Ghidra, or ndisasm and objdump for that matter, and adapt the output into a hex string.

Here’s an example where I have used the “body” of the function above, and have removed the addresses from the left-most column, but have left all the other columns in-place. The byte values are interpreted as the hexadecimal string, while I put the remainder of the line into comments, for documentation:

rule xor_string_function {
 meta:
  author  = "Coleman Kane"
  sample1 = "676c34bfd1bc41c256d5cbf0f5272010"
  sample1_name = "lab8_malware.exe"
  rev = 1

 strings:
  $xor_string_body = {
                      // LAB_00401073                        XREF[1]:     004010ac(j)  
                      8b 45 0c    //  MOV        EAX,dword ptr [EBP + param_2]
                      0f b6 00    //  MOVZX      EAX,byte ptr [EAX]
                      84 c0       //  TEST       AL,AL
                      74 31       //  JZ         LAB_004010ae
                      8b 45 0c    //  MOV        EAX,dword ptr [EBP + param_2]
                      0f b6 00    //  MOVZX      EAX,byte ptr [EAX]
                      3c 55       //  CMP        AL,0x55
                      75 0d       //  JNZ        LAB_00401094
                      8b 45 0c    //  MOV        EAX,dword ptr [EBP + param_2]
                      0f b6 10    //  MOVZX      EDX,byte ptr [EAX]
                      8b 45 08    //  MOV        EAX,dword ptr [EBP + param_1]
                      88 10       //  MOV        byte ptr [EAX],DL
                      eb 10       //  JMP        LAB_004010a4
                      // LAB_00401094                        XREF[1]:     00401085(j)
                      8b 45 0c    //  MOV        EAX,dword ptr [EBP + param_2]
                      0f b6 00    //  MOVZX      EAX,byte ptr [EAX]
                      83 f0 55    //  XOR        EAX,0x55
                      89 c2       //  MOV        EDX,EAX
                      8b 45 08    //  MOV        EAX,dword ptr [EBP + param_1]
                      88 10       //  MOV        byte ptr [EAX],DL
                      // LAB_004010a4                        XREF[1]:     00401092(j)  
                      83 45 08 01 //  ADD        dword ptr [EBP + param_1],0x1
                      83 45 0c 01 //  ADD        dword ptr [EBP + param_2],0x1
                      eb c5       //  JMP        LAB_00401073
                      // LAB_004010ae                        XREF[1]:     0040107b(j)  
                      8b 45 08    //  MOV        EAX,dword ptr [EBP + param_1]
                      c6 00 00    //  MOV        byte ptr [EAX],0x0
                     }
  condition:
   any of them
}

Tweak the Rule to Adapt for Changes the Author Might Make

So this might be a pretty good yara signature for detecting this function, but we have also hardcoded in the use of 0x55 as the encryption key. If the author wanted to change the encryption key to some other value, they could and it would render our signature useless. The following lines from the C code represent this:

    ...
    if (*param_2 == 0x55) {
      *param_1 = *param_2;
    }
    else {
      *param_1 = *param_2 ^ 0x55;
    ...

The following lines from the disassembly-annotated yara string are where the value is hard-coded:

3c 55       //  CMP        AL,0x55
...
83 f0 55    //  XOR        EAX,0x55

So, a great use of the wildcards in yara would be to replace these two with the following:

3c ??       //  CMP        AL,0x??
...
83 f0 ??    //  XOR        EAX,0x??

Now, the signature will match this construct, regardless of whether the author decides to get clever and change the encryption key byte value in the future. This yields the following, which we will update as rev = 2:

rule xor_string_function {
 meta:
  author  = "Coleman Kane"
  sample1 = "676c34bfd1bc41c256d5cbf0f5272010"
  sample1_name = "lab8_malware.exe"
  rev = 2

 strings:
  $xor_string_body = {
                      // LAB_00401073                        XREF[1]:     004010ac(j)  
                      8b 45 0c    //  MOV        EAX,dword ptr [EBP + param_2]
                      0f b6 00    //  MOVZX      EAX,byte ptr [EAX]
                      84 c0       //  TEST       AL,AL
                      74 31       //  JZ         LAB_004010ae
                      8b 45 0c    //  MOV        EAX,dword ptr [EBP + param_2]
                      0f b6 00    //  MOVZX      EAX,byte ptr [EAX]
                      3c ??       //  CMP        AL,0x??
                      75 0d       //  JNZ        LAB_00401094
                      8b 45 0c    //  MOV        EAX,dword ptr [EBP + param_2]
                      0f b6 10    //  MOVZX      EDX,byte ptr [EAX]
                      8b 45 08    //  MOV        EAX,dword ptr [EBP + param_1]
                      88 10       //  MOV        byte ptr [EAX],DL
                      eb 10       //  JMP        LAB_004010a4
                      // LAB_00401094                        XREF[1]:     00401085(j)
                      8b 45 0c    //  MOV        EAX,dword ptr [EBP + param_2]
                      0f b6 00    //  MOVZX      EAX,byte ptr [EAX]
                      83 f0 ??    //  XOR        EAX,0x??
                      89 c2       //  MOV        EDX,EAX
                      8b 45 08    //  MOV        EAX,dword ptr [EBP + param_1]
                      88 10       //  MOV        byte ptr [EAX],DL
                      // LAB_004010a4                        XREF[1]:     00401092(j)  
                      83 45 08 01 //  ADD        dword ptr [EBP + param_1],0x1
                      83 45 0c 01 //  ADD        dword ptr [EBP + param_2],0x1
                      eb c5       //  JMP        LAB_00401073
                      // LAB_004010ae                        XREF[1]:     0040107b(j)  
                      8b 45 08    //  MOV        EAX,dword ptr [EBP + param_1]
                      c6 00 00    //  MOV        byte ptr [EAX],0x0
                     }
  condition:
   any of them
}
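A side benefit of knowing where the wildcarded bytes sit is that the key can be read back out of any sample that matches. The following Python sketch does this with the standard `re` module against the function bytes copied from the Ghidra listing. The names `KEY_PATTERN` and `recover_key`, and the 32-byte gap limit, are my own assumptions for illustration.

```python
import re

# Bytes of FUN_00401070, copied from the Ghidra listing earlier in these notes.
FUNCTION_BYTES = bytes.fromhex(
    "55 89 e5 "
    "8b 45 0c 0f b6 00 84 c0 74 31 "
    "8b 45 0c 0f b6 00 3c 55 75 0d "
    "8b 45 0c 0f b6 10 8b 45 08 88 10 eb 10 "
    "8b 45 0c 0f b6 00 83 f0 55 89 c2 8b 45 08 88 10 "
    "83 45 08 01 83 45 0c 01 eb c5 "
    "8b 45 08 c6 00 00 90 5d c3"
)

KEY_PATTERN = re.compile(
    rb"\x0f\xb6\x00\x3c(.)\x75."    # MOVZX EAX,[EAX]; CMP AL,key; JNZ rel8
    rb".{0,32}?"                    # short gap (assumed limit of 32 bytes)
    rb"\x0f\xb6\x00\x83\xf0(.)",    # MOVZX EAX,[EAX]; XOR EAX,key
    re.DOTALL,
)


def recover_key(data):
    """Return the XOR key byte if both wildcard positions agree, else None."""
    m = KEY_PATTERN.search(data)
    if m is None or m.group(1) != m.group(2):
        return None
    return m.group(1)[0]


print(hex(recover_key(FUNCTION_BYTES)))
```

Requiring the CMP and XOR operands to agree is a design choice on my part: it reduces the chance of reporting a key from unrelated code that happens to contain a similar byte sequence.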

It May Be Valuable to Divide the Rule into Strings for Each of the Blocks

The above rule will now match that long sequence of bytes, but another possibility is that the author makes changes at the assembly language level to try to move the blocks of the function around. We’ve played a bit with editing code to add NOP instructions, and the author could easily insert a few of those if they desired, between each of the branches.

We might want to break the code into strings comprised of the following byte groups:

                      8b 45 0c    //  MOV        EAX,dword ptr [EBP + param_2]
                      0f b6 00    //  MOVZX      EAX,byte ptr [EAX]
                      84 c0       //  TEST       AL,AL
                      74 31       //  JZ         LAB_004010ae
                      8b 45 0c    //  MOV        EAX,dword ptr [EBP + param_2]
                      0f b6 00    //  MOVZX      EAX,byte ptr [EAX]
                      3c ??       //  CMP        AL,0x??
                      75 0d       //  JNZ        LAB_00401094
                      8b 45 0c    //  MOV        EAX,dword ptr [EBP + param_2]
                      0f b6 10    //  MOVZX      EDX,byte ptr [EAX]
                      8b 45 08    //  MOV        EAX,dword ptr [EBP + param_1]
                      88 10       //  MOV        byte ptr [EAX],DL
                      eb 10       //  JMP        LAB_004010a4
                      // LAB_00401094                        XREF[1]:     00401085(j)
                      8b 45 0c    //  MOV        EAX,dword ptr [EBP + param_2]
                      0f b6 00    //  MOVZX      EAX,byte ptr [EAX]
                      83 f0 ??    //  XOR        EAX,0x??
                      89 c2       //  MOV        EDX,EAX
                      8b 45 08    //  MOV        EAX,dword ptr [EBP + param_1]
                      88 10       //  MOV        byte ptr [EAX],DL
                      // LAB_004010a4                        XREF[1]:     00401092(j)  
                      83 45 08 01 //  ADD        dword ptr [EBP + param_1],0x1
                      83 45 0c 01 //  ADD        dword ptr [EBP + param_2],0x1
                      eb c5       //  JMP        LAB_00401073
                      // LAB_004010ae                        XREF[1]:     0040107b(j)  
                      8b 45 08    //  MOV        EAX,dword ptr [EBP + param_1]
                      c6 00 00    //  MOV        byte ptr [EAX],0x0

Furthermore, the various JMP and J?? operations are really where some hard-coded address offsets exist, so it might make sense for us to wildcard those out as well:

                      8b 45 0c    //  MOV        EAX,dword ptr [EBP + param_2]
                      0f b6 00    //  MOVZX      EAX,byte ptr [EAX]
                      84 c0       //  TEST       AL,AL
                      74 ??       //  JZ         LAB_004010ae
                      8b 45 0c    //  MOV        EAX,dword ptr [EBP + param_2]
                      0f b6 00    //  MOVZX      EAX,byte ptr [EAX]
                      3c ??       //  CMP        AL,0x??
                      75 ??       //  JNZ        LAB_00401094
                      8b 45 0c    //  MOV        EAX,dword ptr [EBP + param_2]
                      0f b6 10    //  MOVZX      EDX,byte ptr [EAX]
                      8b 45 08    //  MOV        EAX,dword ptr [EBP + param_1]
                      88 10       //  MOV        byte ptr [EAX],DL
                      eb ??       //  JMP        LAB_004010a4
                      // LAB_00401094                        XREF[1]:     00401085(j)
                      8b 45 0c    //  MOV        EAX,dword ptr [EBP + param_2]
                      0f b6 00    //  MOVZX      EAX,byte ptr [EAX]
                      83 f0 ??    //  XOR        EAX,0x??
                      89 c2       //  MOV        EDX,EAX
                      8b 45 08    //  MOV        EAX,dword ptr [EBP + param_1]
                      88 ??       //  MOV        byte ptr [EAX],DL
                      // LAB_004010a4                        XREF[1]:     00401092(j)  
                      83 45 08 01 //  ADD        dword ptr [EBP + param_1],0x1
                      83 45 0c 01 //  ADD        dword ptr [EBP + param_2],0x1
                      eb ??       //  JMP        LAB_00401073
                      // LAB_004010ae                        XREF[1]:     0040107b(j)  
                      8b 45 08    //  MOV        EAX,dword ptr [EBP + param_1]
                      c6 00 00    //  MOV        byte ptr [EAX],0x0
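The reason these operand bytes are fragile is that a short jump encodes a signed 8-bit displacement relative to the address of the next instruction, not an absolute address. The Python sketch below recomputes the targets shown in the Ghidra listing from the raw bytes; the helper names `rel8_target` and `rel8_displacement` are mine.

```python
def rel8_target(address, encoding):
    """Target of a 2-byte short jump: opcode plus signed 8-bit displacement."""
    displacement = int.from_bytes(encoding[1:2], "little", signed=True)
    return address + len(encoding) + displacement


def rel8_displacement(address, target):
    """Displacement byte needed for a 2-byte short jump at address to reach target."""
    return (target - (address + 2)) & 0xFF


# JZ at 0040107b (74 31) and JMP at 004010ac (eb c5), from the listing.
print(hex(rel8_target(0x0040107B, bytes.fromhex("74 31"))))
print(hex(rel8_target(0x004010AC, bytes.fromhex("eb c5"))))

# If a single NOP were inserted somewhere between the JZ and its target,
# the target would move one byte further away and the operand would change.
print(hex(rel8_displacement(0x0040107B, 0x004010AE + 1)))
```

So an edit as small as one inserted NOP changes the displacement byte of every short jump that crosses it, which is why wildcarding those bytes makes the signature more resilient.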

Breaking these up, we might build a yara rule that looks like this, with each of them as an individual string:

rule xor_string_function {
 meta:
  author  = "Coleman Kane"
  sample1 = "676c34bfd1bc41c256d5cbf0f5272010"
  sample1_name = "lab8_malware.exe"
  rev = 3

 strings:
  $xor_string_blk1 = {
                      8b 45 0c    //  MOV        EAX,dword ptr [EBP + param_2]
                      0f b6 00    //  MOVZX      EAX,byte ptr [EAX]
                      84 c0       //  TEST       AL,AL
                      74 ??       //  JZ         LAB_004010ae
  }
  $xor_string_blk2 = {
                      8b 45 0c    //  MOV        EAX,dword ptr [EBP + param_2]
                      0f b6 00    //  MOVZX      EAX,byte ptr [EAX]
                      3c ??       //  CMP        AL,0x??
                      75 ??       //  JNZ        LAB_00401094
  }
  $xor_string_blk3 = {
                      8b 45 0c    //  MOV        EAX,dword ptr [EBP + param_2]
                      0f b6 10    //  MOVZX      EDX,byte ptr [EAX]
                      8b 45 08    //  MOV        EAX,dword ptr [EBP + param_1]
                      88 10       //  MOV        byte ptr [EAX],DL
                      eb ??       //  JMP        LAB_004010a4
  }
  $xor_string_blk4 = {
                      // LAB_00401094                        XREF[1]:     00401085(j)
                      8b 45 0c    //  MOV        EAX,dword ptr [EBP + param_2]
                      0f b6 00    //  MOVZX      EAX,byte ptr [EAX]
                      83 f0 ??    //  XOR        EAX,0x??
                      89 c2       //  MOV        EDX,EAX
                      8b 45 08    //  MOV        EAX,dword ptr [EBP + param_1]
                      88 ??       //  MOV        byte ptr [EAX],DL
  }
  $xor_string_blk5 = {
                      // LAB_004010a4                        XREF[1]:     00401092(j)  
                      83 45 08 01 //  ADD        dword ptr [EBP + param_1],0x1
                      83 45 0c 01 //  ADD        dword ptr [EBP + param_2],0x1
                      eb ??       //  JMP        LAB_00401073
  }
  $xor_string_blk6 = {
                      // LAB_004010ae                        XREF[1]:     0040107b(j)  
                      8b 45 08    //  MOV        EAX,dword ptr [EBP + param_1]
                      c6 00 00    //  MOV        byte ptr [EAX],0x0
  }
 condition:
  all of them
}

Running the above rule with yara against lab8_malware.exe produces the results shown below:

yara -s xor_string_rule.yar lab8_malware.exe
xor_string_function lab8_malware.exe
0x473:$xor_string_blk1: 8B 45 0C 0F B6 00 84 C0 74 31
0x538:$xor_string_blk1: 8B 45 0C 0F B6 00 84 C0 74 63
0x5bb:$xor_string_blk1: 8B 45 0C 0F B6 00 84 C0 74 5C
0x636:$xor_string_blk1: 8B 45 0C 0F B6 00 84 C0 74 6C
0x47d:$xor_string_blk2: 8B 45 0C 0F B6 00 3C 55 75 0D
0x487:$xor_string_blk3: 8B 45 0C 0F B6 10 8B 45 08 88 10 EB 10
0x494:$xor_string_blk4: 8B 45 0C 0F B6 00 83 F0 55 89 C2 8B 45 08 88 10
0x4a4:$xor_string_blk5: 83 45 08 01 83 45 0C 01 EB C5
0x59b:$xor_string_blk5: 83 45 08 01 83 45 0C 01 EB 93
0x617:$xor_string_blk5: 83 45 08 01 83 45 0C 01 EB 9A
0x6a2:$xor_string_blk5: 83 45 08 01 83 45 0C 01 EB 8A
0x4ae:$xor_string_blk6: 8B 45 08 C6 00 00
0x5a5:$xor_string_blk6: 8B 45 08 C6 00 00
0x621:$xor_string_blk6: 8B 45 08 C6 00 00
0x6ac:$xor_string_blk6: 8B 45 08 C6 00 00

In the above, you’ll notice that some of the block strings (1, 5, and 6) hit in multiple locations within the file, while the others (2, 3, and 4) are more unique and only identify the occurrences within the target routine. This is a great demonstration of how the compiler may re-use code generation recipes in many places.
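For larger outputs, counting hits by eye gets tedious. Here is a small Python sketch that tallies hits per string from the `yara -s` output shown above. The parsing is based only on the line format visible in that output, and the names `HIT` and `count_hits` are my own.

```python
import re
from collections import Counter

# The `yara -s` output from above, pasted verbatim.
SAMPLE_OUTPUT = """\
xor_string_function lab8_malware.exe
0x473:$xor_string_blk1: 8B 45 0C 0F B6 00 84 C0 74 31
0x538:$xor_string_blk1: 8B 45 0C 0F B6 00 84 C0 74 63
0x5bb:$xor_string_blk1: 8B 45 0C 0F B6 00 84 C0 74 5C
0x636:$xor_string_blk1: 8B 45 0C 0F B6 00 84 C0 74 6C
0x47d:$xor_string_blk2: 8B 45 0C 0F B6 00 3C 55 75 0D
0x487:$xor_string_blk3: 8B 45 0C 0F B6 10 8B 45 08 88 10 EB 10
0x494:$xor_string_blk4: 8B 45 0C 0F B6 00 83 F0 55 89 C2 8B 45 08 88 10
0x4a4:$xor_string_blk5: 83 45 08 01 83 45 0C 01 EB C5
0x59b:$xor_string_blk5: 83 45 08 01 83 45 0C 01 EB 93
0x617:$xor_string_blk5: 83 45 08 01 83 45 0C 01 EB 9A
0x6a2:$xor_string_blk5: 83 45 08 01 83 45 0C 01 EB 8A
0x4ae:$xor_string_blk6: 8B 45 08 C6 00 00
0x5a5:$xor_string_blk6: 8B 45 08 C6 00 00
0x621:$xor_string_blk6: 8B 45 08 C6 00 00
0x6ac:$xor_string_blk6: 8B 45 08 C6 00 00
"""

# offset:$identifier: at the start of a line
HIT = re.compile(r"^0x[0-9a-fA-F]+:(\$\w+):", re.MULTILINE)


def count_hits(output):
    """Count how many times each string identifier was reported."""
    return Counter(HIT.findall(output))


counts = count_hits(SAMPLE_OUTPUT)
unique = sorted(name for name, n in counts.items() if n == 1)
for name in sorted(counts):
    print(name, counts[name])
print("single-hit strings:", unique)
```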

The code represented by block #6, for instance, is the following very common construct, in C:

*param_1 = 0;

Whereas, the code represented by block #2 is the following, which may be much less commonly used:

if (*param_2 == 0x55) {

Testing and Tuning Signatures

Using a technique like this, we could even collect all of the EXE and DLL files from Windows into a single folder, and then use yara to test our signature against that folder to determine how unique our yara rule really is:

yara -s xor_string_rule.yar -r windows_exes_and_dlls/

This practice can be useful for fidelity testing: it lets you experiment against a large data set of legitimate software, so you can confirm that your signature does not match any legitimate piece of software on the systems you’re responsible for securing.

Finally, as you have learned that three of the block strings (2, 3, and 4) are more significant (unique) than the other three, you might modify the condition to broaden it even further, allowing for the possibility that the author slightly edits the content of one or more of those blocks in the future:

...
 condition:
  all of ($xor_string_blk1,$xor_string_blk5,$xor_string_blk6) and
    any of ($xor_string_blk2,$xor_string_blk3,$xor_string_blk4)
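To see what this condition accepts and rejects, it can be modeled with Python sets. This is only a model of the boolean logic, not of how yara evaluates rules, and the names `COMMON`, `DISTINCTIVE`, and `condition` are mine.

```python
COMMON = {"$xor_string_blk1", "$xor_string_blk5", "$xor_string_blk6"}
DISTINCTIVE = {"$xor_string_blk2", "$xor_string_blk3", "$xor_string_blk4"}


def condition(matched):
    """all of (blk1, blk5, blk6) and any of (blk2, blk3, blk4)"""
    return COMMON <= matched and bool(DISTINCTIVE & matched)


# The original sample: every block matches.
print(condition(COMMON | DISTINCTIVE))
# A variant where the author altered blocks 2 and 4 but left block 3 intact.
print(condition(COMMON | {"$xor_string_blk3"}))
# Legitimate software that only contains the common compiler idioms.
print(condition(COMMON))
```

The last case is the important one for fidelity: a file that only contains the commonly generated blocks does not satisfy the condition.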


tags: malware - yara - lecture