Analysis of Assignment 4, advanced parts
by Coleman Kane
In class today, we went back over the sample from last week’s Thursday lab exercise. In particular, there were a number of questions about what the source code was and how to analyze the two functions that I added to the Revolution Shell.
The text from the assignment is as follows:
7) A function was added by the attacker that moves a bunch of data around in a buffer/array. You’ve been informed that this function is called right before a while loop that is responsible for establishing & maintaining the TCP connection. At what memory address is this called from the main function, and where in memory is the start address of this function? What argument(s) does it take?
8) Another function was added by the attacker which we are told calls “MessageBoxA” to display a message when the malware disconnects. What is the address of this function? At what address is it called by the malware? The message gives a “title” and “message” to the MessageBoxA call, what is the text contained in these two arguments? Hint: use the “Imports” branch in the “Symbol Tree” view to help you find out which function does this work.
In class, most of the questions were related to 8, as 7 was more of an exhaustive search and discovery effort, so I’ll focus on 8 for these notes.
What we are given
It was important to note that I provided the following facts, which are intended to be the first in a trail of breadcrumbs that you’ll follow to get the answers. This is a common process in malware analysis, so it is important to get acquainted with it.
- MessageBoxA is a function called by the modifications I made to the backdoor
- A hint for starting the work is to use the Imports in the Symbol Tree
- MessageBoxA is going to be a symbol referenced within the code
- Intuitively recognizing the format of a WinAPI function name (ends with “W”, “Ex”, or “A”), I recognize I can search for MessageBoxA on MSDN
- If I don’t intuitively know that, then at least I know I can search Google or MSDN’s own library for any function, and learn that it is a WinAPI function this way, over time learning how to recognize these more readily
Solving Question #8
After loading revbw.exe into Ghidra and waiting for auto-analysis to complete (don’t forget this step; if you don’t wait, not all of the data will be prepared for you), I navigate to the Symbol Tree window. During class I used the filter feature to find out which library MessageBoxA was imported from. This also let me click on the MessageBoxA entry, which tells the listing/assembly view to jump immediately to the spot in the file where the pointer to that function is stored (not where it is called, but where the calling code gets the address).
Below you will see the function pointer highlighted within the disassembly listing. As I explain in the lecture, external DLL-imported functions in Windows EXEs will have at least 2 references show up in Ghidra: the function pointer in the import table (what the Imports section of the Symbol Tree is depicting) and the actual location(s) in the compiled code that call the function. This is a picture of the data representing the function pointer:
To the right of the PTR_MessageBoxA_013582b4 is the Ghidra cross-reference to the location(s) in the code where the function is called. In this case, there’s only one location that calls it. This markup is exactly the same as the references we used to identify the command-name strings in the earlier parts of the assignment and in Week 4’s lecture. You can use this reference to navigate to the code that calls MessageBoxA.
However, it is important to note that this reference provides two of the answers for question 8:
- Address of the function that calls MessageBoxA: 0x013389cf
- Address of the instruction calling MessageBoxA: 0x01338a6d
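The two kinds of reference described above (a stored pointer, and a call site that reads it) can be mimicked in plain C. This is only an illustrative sketch: fake_message_box, import_slot, and show_disconnect_notice are hypothetical names, and a real import table is filled in by the Windows loader rather than by a C initializer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical signature, loosely modeled on a message-box style API. */
typedef int (*message_fn)(const char *text, const char *caption);

/* Stand-in for the function that would really live in a DLL. */
static int fake_message_box(const char *text, const char *caption)
{
    return (int)(strlen(text) + strlen(caption));
}

/* Stand-in for the import-table slot: a piece of DATA holding the
 * function's address. This plays the role of PTR_MessageBoxA_... */
static message_fn import_slot = fake_message_box;

/* Stand-in for the call site: CODE that reads the address out of the
 * slot and calls through it. This is what the XREF points at. */
int show_disconnect_notice(void)
{
    return import_slot("body", "title");
}
```

Searching for uses of import_slot in this sketch leads to show_disconnect_notice, in the same way that following the XREF from the import pointer leads to the calling function in Ghidra.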
Double-clicking the XREF takes the listing view to the location near the CALL instruction for MessageBoxA, and selecting it will also move the decompiler view to the function call. Once there, you’ll notice that the two string arguments to the function are represented by the local pointer variables:
local_f2
local_107
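As discussed in the video, and mentioned earlier in the What We Are Given section, I can look MessageBoxA up on MSDN to determine which argument is the message box title (Caption) and which is the content (Text). The documented parameter order is hWnd, lpText, lpCaption, uType, so the second argument (local_f2) is the message text and the third (&local_107) is the caption.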
Identifying String Parameters
As mentioned earlier, there are two string parameters (LPCSTR) provided to MessageBoxA:
local_f2
&local_107
Note that the address-of operator (&) is used to pass a pointer to local_107, indicating that Ghidra typed this variable as a plain 4-byte value that holds the first 4 bytes of the string's content. The local_f2 parameter, by contrast, is passed as-is: Ghidra typed it as an array, and in C an array name already evaluates to a pointer to its first element.
These represent local data allocated in the function's stack frame. During the in-class video I went through this quickly, so here’s a more detailed walk through what is happening with both of them. There are two important notes to keep in mind for these variables. First, the f2 and 107 numbers are hexadecimal offsets that give each variable's position within the stack frame. Second, these pointers must trace back to the data to be displayed by MessageBoxA somewhere earlier within the function.
Below is the decompiled representation of this function, from Ghidra:
void FUN_013389cf(void)
{
  int iVar1;
  uint uVar2;
  undefined4 *puVar3;
  undefined4 *puVar4;
  undefined4 local_107;
  undefined4 local_103;
  undefined4 local_ff;
  undefined4 local_fb;
  undefined4 local_f7;
  undefined local_f3;
  undefined local_f2 [210];
  undefined4 uStack32;
  undefined auStack28 [12];

  local_f2._0_4_ = 0x73696854;
  uStack32 = 0x203d55;
  iVar1 = -(int)(local_f2 + 2);
  uVar2 = (uint)(auStack28 + iVar1) >> 2;
  puVar3 = (undefined4 *)
           (
           "This is a really long message VGhpcyBpcyBhIHJlYWxseSBsb25nIG1lc3NhZ2U= This is a really long message VGhpcyBpcyBhIHJlYWxseSBsb25nIG1lc3NhZ2U= This is a really long message VGhpcyBpcyBhIHJlYWxseSBsb25nIG1lc3NhZ2U= "
           + -(int)(local_f2 + iVar1));
  puVar4 = (undefined4 *)(local_f2 + 2);
  while (uVar2 != 0) {
    uVar2 = uVar2 - 1;
    *puVar4 = *puVar3;
    puVar3 = puVar3 + 1;
    puVar4 = puVar4 + 1;
  }
  local_107 = 0x73564753;
  local_103 = 0x67384762;
  local_ff = 0x70355756;
  local_fb = 0x79566d64;
  local_f7 = 0x3d553263;
  local_f3 = 0;
  MessageBoxA((HWND)0x0,local_f2,(LPCSTR)&local_107,1);
  return;
}
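The variable names in the listing above give a quick sanity check on how the two buffers are laid out. The sketch below assumes that the hex suffix in a local_XX name is a stack offset, and that uStack32 and auStack28 carry decimal offsets (which is consistent with the sizes here). The helper names bytes_between, title_span, and body_span are made up for this illustration.

```c
#include <stddef.h>

/* The two string literals from the sample. */
#define TITLE_TEXT "SGVsbG8gVW5pdmVyc2U="
#define BODY_TEXT \
    "This is a really long message VGhpcyBpcyBhIHJlYWxseSBsb25nIG1lc3NhZ2U= " \
    "This is a really long message VGhpcyBpcyBhIHJlYWxseSBsb25nIG1lc3NhZ2U= " \
    "This is a really long message VGhpcyBpcyBhIHJlYWxseSBsb25nIG1lc3NhZ2U= "

/* Number of bytes between two stack offsets. A larger offset means a
 * lower address, so the starting offset is the larger of the two. */
size_t bytes_between(size_t start_offset, size_t end_offset)
{
    return start_offset - end_offset;
}

/* Title: starts at local_107; local_f2 begins right after it. */
size_t title_span(void)
{
    return bytes_between(0x107, 0xf2);
}

/* Body: starts at local_f2 and runs up to auStack28 (decimal 28). */
size_t body_span(void)
{
    return bytes_between(0xf2, 28);
}
```

Under those assumptions, title_span() is 21 and body_span() is 214, which match sizeof for the two literals (string length plus the terminating NUL). That is a good sign that local_107 through local_f3 are really one 21-byte title buffer, and that local_f2 plus uStack32 are really one 214-byte message buffer.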
Caption
The most straightforward string is the one that local_107 represents. In this case, just above the call to MessageBoxA, we see a sequence of assignment operations, and in the disassembly listing we can use the Data Convert option to display these values as readable text.
Doing this changes the numeric operands in each of the MOV instructions into human-readable string segments. This effectively builds the string in segments of 4 bytes. Since this is a 32-bit program, moving 4 bytes at a time, instead of one byte at a time, is a significant performance improvement that the compiler knows to make.
01338a11 c7 85 fd fe MOV dword ptr [EBP + local_107],"SGVs"
ff ff 53 47
56 73
01338a1b c7 85 01 ff MOV dword ptr [EBP + local_103],"bG8g"
ff ff 62 47
38 67
01338a25 c7 85 05 ff MOV dword ptr [EBP + local_ff],"VW5p"
ff ff 56 57
35 70
01338a2f c7 85 09 ff MOV dword ptr [EBP + local_fb],"dmVy"
ff ff 64 6d
56 79
01338a39 c7 85 0d ff MOV dword ptr [EBP + local_f7],"c2U="
ff ff 63 32
55 3d
Then we can string these together into a single string:
SGVsbG8gVW5pdmVyc2U=
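The same conversion can be done by hand outside of Ghidra. The sketch below takes the five 32-bit immediates from the decompiled assignments and writes out their bytes least-significant first, which is the order x86 stores them in memory. The names dwords_to_text and title_words are invented for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Write the bytes of each 32-bit value to out, least significant byte
 * first (little-endian order), then NUL-terminate the result.
 * out must have room for 4 * count + 1 bytes. */
void dwords_to_text(const uint32_t *words, size_t count, char *out)
{
    size_t i, b;

    for (i = 0; i < count; i++) {
        for (b = 0; b < 4; b++) {
            out[4 * i + b] = (char)((words[i] >> (8 * b)) & 0xFF);
        }
    }
    out[4 * count] = '\0';
}

/* The five immediates assigned to local_107 through local_f7. */
const uint32_t title_words[5] = {
    0x73564753, 0x67384762, 0x70355756, 0x79566d64, 0x3d553263
};
```

Calling dwords_to_text(title_words, 5, out) with a 21-byte out buffer reproduces the string shown above. Note how 0x73564753 reads "sVGS" if taken left to right as hex, but lands in memory as "SGVs".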
Some astute students pointed out that this is the Base64 encoding for the following announcement, but decoding this wasn’t required to complete the exercise:
Hello Universe
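For anyone curious how that decoding works, here is a minimal Base64 decoder sketch in C. It is not part of the sample, the function names are my own, and it handles only the standard alphabet with optional trailing "=" padding (no whitespace or URL-safe variants).

```c
#include <stddef.h>

/* Map one Base64 character to its 6-bit value, or -1 if it is not in
 * the standard alphabet. */
static int base64_value(char c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/* Decode NUL-terminated Base64 text into out and NUL-terminate it.
 * out must have room for 3 * strlen(in) / 4 + 1 bytes.
 * Returns the number of decoded bytes, or -1 on an invalid character.
 * Decoding stops at the first '=' padding character. */
int base64_decode(const char *in, char *out)
{
    unsigned int acc = 0; /* bit accumulator; only the low bits matter */
    int bits = 0;         /* number of not-yet-emitted bits in acc */
    int n = 0;            /* bytes written so far */

    for (; *in != '\0' && *in != '='; in++) {
        int v = base64_value(*in);
        if (v < 0) {
            return -1;
        }
        acc = (acc << 6) | (unsigned int)v;
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            out[n++] = (char)((acc >> bits) & 0xFF);
        }
    }
    out[n] = '\0';
    return n;
}
```

Passing the caption string to base64_decode yields the 14-byte text "Hello Universe".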
Message Body Text
The less intuitive code is responsible for the other parameter, the message body text. In this case, it is important to recognize that all string constants are de-duplicated and typically placed into a read-only data memory space. Then, when they are used in code, this globally-accessible location is referenced by address as needed, in context. In our example, a local read/write array was also created to store a local copy of the text, merely because I used char [] as the data type instead of const char * inside of my function.
The net result is the code below. Although the entire string is visible in it, the function copies the string one 4-byte chunk at a time from the address stored in puVar3 into the array pointed at by puVar4. puVar4 is set to point into local_f2 right before the while loop does the copying. The additional complication here is that the compiler broke all of this work up into about 3 different stages:
- Assign the first 4 bytes of local_f2 to “This”
- Assign the last 4 bytes of the buffer (which Ghidra shows as the separate variable uStack32) to “U= ”, including the trailing space, followed by NULL
- Copy the middle bytes, 4 at a time, from puVar3 into local_f2 (using puVar4 as a temporary cursor variable)
  local_f2._0_4_ = 0x73696854;
  uStack32 = 0x203d55;
  iVar1 = -(int)(local_f2 + 2);
  uVar2 = (uint)(auStack28 + iVar1) >> 2;
  puVar3 = (undefined4 *)
           (
           "This is a really long message VGhpcyBpcyBhIHJlYWxseSBsb25nIG1lc3NhZ2U= This is a really long message VGhpcyBpcyBhIHJlYWxseSBsb25nIG1lc3NhZ2U= This is a really long message VGhpcyBpcyBhIHJlYWxseSBsb25nIG1lc3NhZ2U= "
           + -(int)(local_f2 + iVar1));
  puVar4 = (undefined4 *)(local_f2 + 2);
  while (uVar2 != 0) {
    uVar2 = uVar2 - 1;
    *puVar4 = *puVar3;
    puVar3 = puVar3 + 1;
    puVar4 = puVar4 + 1;
  }
If the above continues to be confusing to you, I strongly recommend experimenting with similar C code, and even compiling different samples and viewing their behavior in Ghidra.
The original source code I wrote is below:
int output_message(void) {
  char buffer[] = "This is a really long message VGhpcyBpcyBhIHJlYWxseSBsb25nIG1lc3NhZ2U= This is a really long message VGhpcyBpcyBhIHJlYWxseSBsb25nIG1lc3NhZ2U= This is a really long message VGhpcyBpcyBhIHJlYWxseSBsb25nIG1lc3NhZ2U= ";
  char title[] = "SGVsbG8gVW5pdmVyc2U=";
  return MessageBoxA(NULL, buffer, title, MB_OKCANCEL);
}
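As a starting point for that kind of experiment, here is a small portable sketch of the char [] versus const char * distinction mentioned earlier. It does not call any Windows API, and SAMPLE_TEXT, use_local_array, and use_pointer are hypothetical names chosen for this example. What a given compiler emits for each function will vary with the compiler and optimization level, so treat it as something to compile and inspect rather than a prediction.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical test string, much shorter than the one in the sample. */
#define SAMPLE_TEXT "This is a test message"

/* char []: each call gets its own writable copy of the literal in the
 * stack frame, so the compiler has to emit code that fills it in. */
size_t use_local_array(void)
{
    char buffer[] = SAMPLE_TEXT;

    buffer[0] = 't';      /* allowed: the array is a private copy */
    return sizeof buffer; /* whole array, including the NUL */
}

/* const char *: only the address of the literal is stored; no copy of
 * the text is made. */
size_t use_pointer(void)
{
    const char *text = SAMPLE_TEXT;

    return strlen(text);  /* length of the text, excluding the NUL */
}
```

Loading a build of this into Ghidra and comparing the two functions should show copy or assignment code only in the array version, which is the same effect seen with buffer and title in output_message.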