CS6038/CS5138 Malware Analysis, UC

Course content for UC Malware Analysis

11 February 2020

Analysis of Assignment 4, advanced parts

by Coleman Kane

In class today, we went back over the sample from last week’s Thursday lab exercise. In particular, there were a number of questions about what the source code was and how to analyze the two functions that I added to the Revolution Shell.

The text from the assignment is as follows:

7) A function was added by the attacker that moves a bunch of data around in a buffer/array. You’ve been informed that this function is called right before a while loop that is responsible for establishing & maintaining the TCP connection. At what memory address is this called from the main function, and where in memory is the start address of this function? What argument(s) does it take?

8) Another function was added by the attacker which we are told calls “MessageBoxA” to display a message when the malware disconnects. What is the address of this function? At what address is it called by the malware? The message gives a “title” and “message” to the MessageBoxA call, what is the text contained in these two arguments? Hint: use the “Imports” branch in the “Symbol Tree” view to help you find out which function does this work.

In class, most of the questions were related to 8, as 7 was more of an exhaustive search and discovery effort, so I’ll focus on 8 for these notes.

What we are given

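It is important to note the facts I provided in the question itself: the added function calls MessageBoxA, MessageBoxA is a documented Windows API that you can look up on MSDN, and the “Imports” branch of the “Symbol Tree” is the suggested starting point. These are intended to be the first in a trail of breadcrumbs that you’ll follow to get the answers. This is a common process in malware analysis, so it is important to get acquainted with it.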

Solving Question #8

After loading revbw.exe into Ghidra and waiting for auto-analysis to complete (don’t skip this step; until it finishes, not all of the data will be prepared for you), I navigate to the Symbol Tree window. During class I used the filter feature to find out which library MessageBoxA is imported from. Clicking the MessageBoxA entry in the filtered tree makes the listing/assembly view jump to the spot in the file where the pointer to that function is stored (not where it is called, but where the calling code gets the address).

[Figures: Symbol Tree; Symbol Tree filtered to MessageBoxA]

Below you will see the function pointer highlighted within the disassembly listing. As I explain in the lecture, external DLL-imported functions in Windows EXEs will have at least two references show up in Ghidra: the function pointer in the import table (what the Imports section of the Symbol Tree is depicting) and the actual location(s) in the compiled code that call the function. This is a picture of the data representing the function pointer:

[Figure: MessageBoxA function pointer]

To the right of the PTR_MessageBoxA_013582b4 is the Ghidra cross-reference to the location(s) in the code where the function is called. In this case, there’s only one location that calls it. This markup is exactly the same as the references we used to identify the command-name strings in the earlier parts of the assignment and in Week 4’s lecture. You can use this reference to navigate to the code that calls MessageBoxA. However, it is important to note that this reference provides two of the answers for question 8:

Double-clicking the XREF takes the listing view to the location near the CALL instruction for MessageBoxA, and selecting that instruction also moves the decompiler view to the function call. Once there, you’ll notice that the two string arguments to the function are represented by local pointer variables:

[Figure: Viewing the MessageBoxA call]

As discussed in the video, and mentioned earlier in the What We Are Given section, I can look MessageBoxA up on MSDN to determine which argument is the message box title (Caption) and which is the content (Text).
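For reference, the documented parameter order of MessageBoxA is (hWnd, lpText, lpCaption, uType), so the body text comes before the caption. The sketch below only illustrates that order: `message_box_stub` and `struct box_args` are hypothetical stand-ins that record their arguments, since the real MessageBoxA needs Windows and windows.h.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of what a MessageBoxA-style call received. */
struct box_args {
    const char *text;     /* lpText: the message body */
    const char *caption;  /* lpCaption: the window title */
    unsigned type;        /* uType: button/icon flags */
};

/* Hypothetical stand-in that mirrors MessageBoxA's parameter order
 * (hWnd, lpText, lpCaption, uType). It displays nothing; it only
 * returns what it was given so the order can be inspected. */
static struct box_args message_box_stub(void *hwnd, const char *text,
                                        const char *caption, unsigned type)
{
    struct box_args seen;
    (void)hwnd;  /* owner window; NULL in the sample */
    assert(text != NULL && caption != NULL);
    seen.text = text;
    seen.caption = caption;
    seen.type = type;
    return seen;
}
```

Reading the decompiled call against this order, the second argument is the body and the third is the title.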

Identifying String Parameters

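As mentioned earlier, there are two string parameters (LPCSTR) provided to MessageBoxA: local_f2, passed as the second argument (lpText, the message body), and &local_107, passed as the third argument (lpCaption, the title).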

Note that the address operator is used to get a pointer to local_107, indicating that this variable itself stores the first 4 bytes of the string’s content. The local_f2 parameter is passed by name because Ghidra typed it as an array, which C already treats as a pointer to its first element.

These represent local data allocated in the function’s stack frame (not on the heap). During the in-class video I went through this quickly, so here’s a more detailed walk-through of what is happening with both of them. There are two important notes to keep in mind for these variables. First, the f2 and 107 numbers are the hexadecimal offsets of the variables within that stack frame. Second, these pointers must trace back to the data to be displayed by MessageBoxA somewhere earlier within the function.
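A quick arithmetic check of the first note, as a sketch: because Ghidra’s default local_ names carry the variable’s hexadecimal stack offset, subtracting two names gives the number of bytes between the variables. The helper name `bytes_between` and the enum names are hypothetical; the offsets come from local_107, local_f3 and local_f2 in the decompiled function.

```c
#include <assert.h>
#include <stddef.h>

/* Offsets read from Ghidra's default names. */
enum {
    OFF_LOCAL_107 = 0x107,  /* first byte of the caption */
    OFF_LOCAL_F3  = 0xf3,   /* single byte set to 0 after the caption */
    OFF_LOCAL_F2  = 0xf2    /* first byte of the body buffer */
};

/* Hypothetical helper: distance in bytes between two stack offsets. */
static size_t bytes_between(unsigned farther, unsigned nearer)
{
    assert(farther >= nearer);
    return (size_t)(farther - nearer);
}
```

`bytes_between(OFF_LOCAL_107, OFF_LOCAL_F3)` is 20, which matches five 4-byte stores, and `bytes_between(OFF_LOCAL_107, OFF_LOCAL_F2)` is 21, which leaves exactly one byte for the terminating zero before local_f2 begins.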

Below is the decompiled representation of this function, from Ghidra:

void FUN_013389cf(void)

{
  int iVar1;
  uint uVar2;
  undefined4 *puVar3;
  undefined4 *puVar4;
  undefined4 local_107;
  undefined4 local_103;
  undefined4 local_ff;
  undefined4 local_fb;
  undefined4 local_f7;
  undefined local_f3;
  undefined local_f2 [210];
  undefined4 uStack32;
  undefined auStack28 [12];
  
  local_f2._0_4_ = 0x73696854;
  uStack32 = 0x203d55;
  iVar1 = -(int)(local_f2 + 2);
  uVar2 = (uint)(auStack28 + iVar1) >> 2;
  puVar3 = (undefined4 *)
           (
           "This is a really long message VGhpcyBpcyBhIHJlYWxseSBsb25nIG1lc3NhZ2U= This is a really long message VGhpcyBpcyBhIHJlYWxseSBsb25nIG1lc3NhZ2U= This is a really long message VGhpcyBpcyBhIHJlYWxseSBsb25nIG1lc3NhZ2U= "
           + -(int)(local_f2 + iVar1));
  puVar4 = (undefined4 *)(local_f2 + 2);
  while (uVar2 != 0) {
    uVar2 = uVar2 - 1;
    *puVar4 = *puVar3;
    puVar3 = puVar3 + 1;
    puVar4 = puVar4 + 1;
  }
  local_107 = 0x73564753;
  local_103 = 0x67384762;
  local_ff = 0x70355756;
  local_fb = 0x79566d64;
  local_f7 = 0x3d553263;
  local_f3 = 0;
  MessageBoxA((HWND)0x0,local_f2,(LPCSTR)&local_107,1);
  return;
}

Caption

The most straightforward string is the one that local_107 represents. Just above the call to MessageBoxA we see a sequence of assignment operations, and in the disassembly listing we can use Ghidra’s Convert option to display these constants as readable text.

Doing this changes the numeric operands of the MOV instructions into human-readable string segments, which shows that the code builds the string in segments of 4 bytes. Because x86 is little-endian, the lowest byte of each constant is stored first, so 0x73564753 lands in memory as 53 47 56 73, or “SGVs”. Since this is a 32-bit program, moving 4 bytes at a time instead of one byte at a time is a performance improvement that the compiler knows to apply.

01338a11 c7 85 fd fe       MOV          dword ptr [EBP + local_107],"SGVs"
         ff ff 53 47 
         56 73
01338a1b c7 85 01 ff       MOV          dword ptr [EBP + local_103],"bG8g"
         ff ff 62 47 
         38 67
01338a25 c7 85 05 ff       MOV          dword ptr [EBP + local_ff],"VW5p"
         ff ff 56 57 
         35 70
01338a2f c7 85 09 ff       MOV          dword ptr [EBP + local_fb],"dmVy"
         ff ff 64 6d 
         56 79
01338a39 c7 85 0d ff       MOV          dword ptr [EBP + local_f7],"c2U="
         ff ff 63 32 
         55 3d

Then we can string these together into a single string:

SGVsbG8gVW5pdmVyc2U=
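The reassembly above can be reproduced in a few lines of C. This is a sketch: the helper names `dwords_to_string` and `rebuild_caption` are hypothetical, and the five constants are copied from the decompiled output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write each 32-bit constant as 4 bytes, lowest byte first, which is
 * the order a little-endian x86 MOV stores them in memory.
 * out must have room for count * 4 + 1 bytes. */
static void dwords_to_string(const uint32_t *dwords, size_t count, char *out)
{
    size_t i;
    int b;
    assert(dwords != NULL && out != NULL);
    for (i = 0; i < count; i++) {
        for (b = 0; b < 4; b++) {
            out[i * 4 + (size_t)b] = (char)((dwords[i] >> (8 * b)) & 0xFFu);
        }
    }
    out[count * 4] = '\0';
}

/* Rebuild the caption from the constants assigned to
 * local_107, local_103, local_ff, local_fb and local_f7. */
static const char *rebuild_caption(void)
{
    static const uint32_t dwords[5] = {
        0x73564753u, 0x67384762u, 0x70355756u, 0x79566d64u, 0x3d553263u
    };
    static char text[5 * 4 + 1];
    dwords_to_string(dwords, 5, text);
    assert(strlen(text) == 20);
    return text;
}
```

Printing the result of `rebuild_caption()` gives the same 20-character string shown above.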

Some astute students pointed out that this is the Base64 encoding for the following announcement, but decoding this wasn’t required to complete the exercise:

Hello Universe
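For anyone who wants to check the decoding without an external tool, here is a minimal Base64 decoder sketch. The function names `b64_value` and `b64_decode` are hypothetical, and it only handles the standard alphabet with “=” padding.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Map one Base64 character to its 6-bit value, or -1 if invalid. */
static int b64_value(char c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/* Decode padded Base64 text into out and NUL-terminate it.
 * Returns the number of decoded bytes, or -1 on malformed input.
 * out must have room for (strlen(in) / 4) * 3 + 1 bytes. */
static int b64_decode(const char *in, char *out)
{
    size_t len = strlen(in);
    size_t n = 0;
    size_t i;
    if (len % 4 != 0) return -1;
    for (i = 0; i < len; i += 4) {
        int v[4];
        int pad = 0;
        int j;
        unsigned long triple;
        for (j = 0; j < 4; j++) {
            char c = in[i + (size_t)j];
            if (c == '=') {
                /* Padding is only legal at the end of the last group. */
                if (i + 4 != len || j < 2) return -1;
                v[j] = 0;
                pad++;
            } else {
                if (pad > 0) return -1;  /* data after padding */
                v[j] = b64_value(c);
                if (v[j] < 0) return -1;
            }
        }
        /* Four 6-bit values make one 24-bit group of three bytes. */
        triple = ((unsigned long)v[0] << 18) | ((unsigned long)v[1] << 12) |
                 ((unsigned long)v[2] << 6) | (unsigned long)v[3];
        out[n++] = (char)((triple >> 16) & 0xFFu);
        if (pad < 2) out[n++] = (char)((triple >> 8) & 0xFFu);
        if (pad < 1) out[n++] = (char)(triple & 0xFFu);
    }
    out[n] = '\0';
    return (int)n;
}
```

Passing the caption string to `b64_decode` yields the 14 bytes of “Hello Universe”. The Base64 segment embedded in the message body decodes the same way, to “This is a really long message”.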

Message Body Text

The less intuitive code is responsible for the other parameter, the message body text. Here it is important to recognize that compilers typically de-duplicate string constants and place them in a read-only data section. When code uses such a constant, it refers to that shared location by address. In our example, a local read/write array was also created on the stack to hold a copy of the text, merely because I used char [] as the data type instead of const char * inside my function.

The net result is the code below. Although the entire string is visible in it, the function copies the text one 4-byte chunk at a time from the address stored in puVar3 into the array pointed at by puVar4, which is set to point into local_f2 right before the while loop does the copying. The additional complication is that the compiler broke this work up into three stages: it stores the first four bytes (0x73696854, “This”) directly, stores the last four bytes (0x203d55, “U= ” plus the terminating zero) directly, and copies everything in between with the loop:

  local_f2._0_4_ = 0x73696854;
  uStack32 = 0x203d55;
  iVar1 = -(int)(local_f2 + 2);
  uVar2 = (uint)(auStack28 + iVar1) >> 2;
  puVar3 = (undefined4 *)
           (
           "This is a really long message VGhpcyBpcyBhIHJlYWxseSBsb25nIG1lc3NhZ2U= This is a really long message VGhpcyBpcyBhIHJlYWxseSBsb25nIG1lc3NhZ2U= This is a really long message VGhpcyBpcyBhIHJlYWxseSBsb25nIG1lc3NhZ2U= "
           + -(int)(local_f2 + iVar1));
  puVar4 = (undefined4 *)(local_f2 + 2);
  while (uVar2 != 0) {
    uVar2 = uVar2 - 1;
    *puVar4 = *puVar3;
    puVar3 = puVar3 + 1;
    puVar4 = puVar4 + 1;
  }
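One way to convince yourself that these stages add up to a plain string copy is to mimic them in portable C. The sketch below is a reading of the decompiler output, not the compiler’s actual algorithm: `staged_copy`, `CHUNK` and `body_text` are hypothetical names, and the 2-byte starting offset is taken from the local_f2 + 2 expression (presumably chosen for alignment).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The message body is the same 71-character chunk repeated three times. */
#define CHUNK "This is a really long message VGhpcyBpcyBhIHJlYWxseSBsb25nIG1lc3NhZ2U= "
static const char body_text[] = CHUNK CHUNK CHUNK;  /* 213 chars + NUL */

/* Copy body_text into dst (at least sizeof body_text bytes) using the
 * same three stages seen in the decompiled function. */
static void staged_copy(char *dst)
{
    const size_t total = sizeof body_text;  /* 214 bytes */
    size_t words = (total - 2) / 4;         /* 53 dwords, as in the loop */
    const char *src = body_text + 2;
    char *out = dst + 2;

    /* Stage 1: first dword, "This" (local_f2._0_4_ = 0x73696854). */
    memcpy(dst, body_text, 4);
    /* Stage 2: last dword, "U= " plus NUL (uStack32 = 0x203d55). */
    memcpy(dst + total - 4, body_text + total - 4, 4);
    /* Stage 3: copy 4 bytes at a time, starting 2 bytes in. */
    while (words != 0) {
        words--;
        memcpy(out, src, 4);
        src += 4;
        out += 4;
    }
}
```

After `staged_copy(buf)` on a 214-byte buffer, `buf` holds the complete string, which shows that the overlapping stores still produce an ordinary copy.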

If the above is still confusing, I strongly recommend experimenting with similar C code: compile a few different samples and view their behavior in Ghidra.

The original source code I wrote is below:

int output_message(void) {
    char buffer[] = "This is a really long message VGhpcyBpcyBhIHJlYWxseSBsb25nIG1lc3NhZ2U= This is a really long message VGhpcyBpcyBhIHJlYWxseSBsb25nIG1lc3NhZ2U= This is a really long message VGhpcyBpcyBhIHJlYWxseSBsb25nIG1lc3NhZ2U= ";
    char title[] = "SGVsbG8gVW5pdmVyc2U=";
    return MessageBoxA(NULL, buffer, title, MB_OKCANCEL);
}
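As one starting point for that kind of experiment, the sketch below contrasts the two declarations. The function names `array_variant` and `pointer_variant` are hypothetical, and the MessageBoxA call is left out so that it compiles anywhere.

```c
#include <assert.h>
#include <stddef.h>

#define CAPTION "SGVsbG8gVW5pdmVyc2U="

/* char []: the compiler must build a private, writable copy of the
 * text in the stack frame on every call. Returns the array's size. */
static size_t array_variant(char *first_char_out)
{
    char title[] = CAPTION;  /* 20 characters + NUL = 21 bytes */
    assert(first_char_out != NULL);
    title[0] = 'X';          /* legal: the local copy is writable */
    *first_char_out = title[0];
    return sizeof title;
}

/* const char *: only a pointer lives in the stack frame; it refers to
 * the string constant itself. Returns the size of that pointer. */
static size_t pointer_variant(const char **literal_out)
{
    const char *title = CAPTION;
    assert(literal_out != NULL);
    *literal_out = title;
    return sizeof title;
}
```

Compiling each variant with a real MessageBoxA call and loading the results into Ghidra should show the difference: the first needs code to fill the local array, while the second can pass the address of the constant directly.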


tags: malware ghidra disassembly lecture