More Ghidra Code Analysis
by Coleman Kane
This is a continuation from the prior material on Ghidra. If you’re reading through this, it is recommended to start with that module first.
Analyzing the main Function
Returning to the main function that we labeled, we will focus our effort on the decompiled source code.
For starters, the decompilation gives us a readable view of the C++ streams cin and cout being used, as well as
the strings embedded within the source code that are used with them.
It is worth noting that C++ treats the << operator as a function (in fact, most operators can be overloaded as functions in
C++) that takes 2 arguments (the values on the left and right of the operator: left << right). When the source code is compiled,
these are resolved by the compiler into function calls like operator<<(left, right), with left being our cout and right
being the string to print to the screen. The return value of this function call is a reference to the left argument, which
allows C++ to utilize the pretty “chaining” syntax you see:
cout << first << second << third;
The above resolves into the following function call set, more or less.
operator<<(operator<<(operator<<(cout, "first"), "second"), "third");
In the decompilation, the variable that Ghidra names this ends up holding the reference to cout that each call returns, and
the calls are executed from inner to outer, ensuring the strings are displayed
in the order you want. Ghidra actually breaks this apart for your benefit, however:
this = operator<<<std--char_traits<char>>((basic_ostream *)cout,"Your string \"");
this = operator<<<char,std--char_traits<char>,std--allocator<char>>
                 (this,(basic_string *)local_38);
this = operator<<<std--char_traits<char>>(this,"\" was NOT found!");
operator<<((basic_ostream<char,std--char_traits<char>> *)this,endl<char,std--char_traits<char>>);
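The relationship between the chained syntax and the separate calls that Ghidra shows can be reproduced in a small sketch. It writes to a std::ostringstream instead of cout so the result can be inspected, and the function names chained and explicit_calls are illustrative only. Note that the string overloads are non-member functions, while the overload that accepts endl is a member function of the stream, which is why the decompiler lists the stream as an explicit first argument in that last call.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Chained form, as it would normally be written in source code.
std::string chained() {
    std::ostringstream out;
    out << "first" << "second" << "third" << std::endl;
    return out.str();
}

// The same output using one explicit operator call per statement,
// roughly mirroring how the decompiler splits the chain apart.
std::string explicit_calls() {
    std::ostringstream out;
    // Non-member overloads taking (stream, const char *); each returns the stream.
    std::ostream &s1 = std::operator<<(out, "first");
    std::ostream &s2 = std::operator<<(s1, "second");
    std::ostream &s3 = std::operator<<(s2, "third");
    // Member overload taking a manipulator function such as endl.
    s3.operator<<(std::endl<char, std::char_traits<char>>);
    return out.str();
}
```

Both functions should produce the same text, since each explicit call simply passes the returned stream reference along to the next one.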
Scan the Code Looking for Key System Function Calls
Using the above knowledge, you can observe that the main function calls getline as well as operator<<, indicating that
the function provides some sort of interactive command-line interface to the user. Abstracted out, you’ll find this event-driven
design common in a lot of malicious tools as well. It’s not always getline and cout: sometimes it is gets and printf, and
in many cases it is send/recv or read/write.
Divide and Conquer
A great strategy here, though, is to break the code up into sections, using the getline prompts as a barrier. This gives us
the opportunity to analyze smaller chunks individually, to decide what their function/purpose is within the larger program.
For example:
local_c = 0;
local_18 = param_2;
local_10 = param_1;
this = operator<<<std--char_traits<char>>
                 ((basic_ostream *)cout,
                  "Enter strings one per line for list. Empty line to terminate:");
operator<<((basic_ostream<char,std--char_traits<char>> *)this,endl<char,std--char_traits<char>>);
basic_string();
do {
                    /* try { // try from 00401351 to 004014da has its CatchHandler @ 00401467 */
  getline<char,std--char_traits<char>,std--allocator<char>>
            ((basic_istream *)cin,(basic_string *)local_38);
  FUN_00401590(&DAT_00404348,local_38);
  lVar2 = size();
} while (lVar2 != 0);
This fragment of code includes a cout write that prompts the user to provide strings, one per line. Following that is a do-while loop
within which is a call to getline that fills variable local_38 and then passes it to FUN_00401590. The size() function
is then called and its result is saved in lVar2. Although the decompiler shows size() with no arguments, this is most likely
the basic_string size() member called on local_38, with the implicit object argument not recovered by Ghidra, so it reports the
length of the line that was just read. To detect the “empty line”, this length is tested against zero to determine when to break out
of the loop. This is basically a recipe for taking streaming chunked input (the lines of text) and then running a processor,
FUN_00401590, on each chunk.
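As a rough guide to what the original source for this chunk might have looked like, here is a minimal sketch that mirrors the decompiled control flow as-is. It rests on several assumptions: process_line is a hypothetical stand-in for FUN_00401590 (assumed here to append the line to a container, based on the prompt text), a std::vector plays the role of DAT_00404348, and input comes from any std::istream rather than cin.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for FUN_00401590: assumed to append the line to a list.
void process_line(std::vector<std::string> &list, const std::string &line) {
    list.push_back(line);
}

// Mirrors the decompiled loop: read a line, process it, then test its length.
std::vector<std::string> read_list(std::istream &in) {
    std::vector<std::string> list;  // plays the role of DAT_00404348
    std::string line;               // plays the role of local_38
    do {
        std::getline(in, line);
        process_line(list, line);
    } while (line.size() != 0);
    return list;
}
```

Feeding this function a std::istringstream with a few lines followed by a blank line is a convenient way to experiment with the loop's behavior outside of the original binary.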
Aside: see if you can identify where I screwed up the iteration algorithm above, think of how the original algo1.cpp code should
have been written for this part to execute as intended, and keep that in mind.
If you right-click the FUN_00401590(...) call in the decompiler source code window, it will bring up a context menu that gives you
some comment options. Ghidra supports marking up the disassembly listing with four types of comments:
- Pre-Comment: A comment on a single line, displayed above the marked instruction. These are also displayed in the Decompiler view.
- Post-Comment: A comment on a single line, displayed below the marked instruction.
- Plate Comment: A comment displayed inside a box, above the Pre-Comment and the instruction.
- EOL Comment: A comment displayed on the same line as the instruction. By default it is all the way to the right.
The Set… option brings up a dialog that allows you to edit any of the comments, while the quick choices only exist for Pre and Plate comments.
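Below is an example with all four populated:
[Screenshot: a listing instruction with all four comment types populated]
Moving on, the next chunk of main, after the input loop, is: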
this = (basic_ostream *)
operator<<((basic_ostream<char,std--char_traits<char>> *)cout,
endl<char,std--char_traits<char>>);
this = operator<<<std--char_traits<char>>(this,"Now enter a string to find it in the list:");
operator<<((basic_ostream<char,std--char_traits<char>> *)this,endl<char,std--char_traits<char>>);
getline<char,std--char_traits<char>,std--allocator<char>>
((basic_istream *)cin,(basic_string *)local_38);
bVar1 = FUN_00401600(&DAT_00404348,local_38);
The above code merely prompts the user to enter a string value. The input is stored in local_38, which is then
passed to FUN_00401600. In this case, the prompt message suggests that the program will try to find the entered string in
the list, so this is a good hint as to what FUN_00401600 does. The result of this call is saved in bVar1, which is used below
to decide which of two code paths to take.
if ((bVar1 & 1) == 0) {
  this = operator<<<std--char_traits<char>>((basic_ostream *)cout,"Your string \"");
  this = operator<<<char,std--char_traits<char>,std--allocator<char>>
                   (this,(basic_string *)local_38);
  this = operator<<<std--char_traits<char>>(this,"\" was NOT found!");
  operator<<((basic_ostream<char,std--char_traits<char>> *)this,endl<char,std--char_traits<char>>);
}
else {
  this = operator<<<std--char_traits<char>>((basic_ostream *)cout,"Your string \"");
  this = operator<<<char,std--char_traits<char>,std--allocator<char>>
                   (this,(basic_string *)local_38);
  this = operator<<<std--char_traits<char>>(this,"\" was found!");
  operator<<((basic_ostream<char,std--char_traits<char>> *)this,endl<char,std--char_traits<char>>);
}
Inspecting both of the blocks above, it is clear that they form an if/else statement, which takes the first path if the value of
bVar1 & 1 is 0, and the second path for any other value. Comparing the lines closely, you will see that the only
difference between them is the message passed in the third operator<< call. In the first path (where bVar1 & 1 is 0), the message
reports a failure to find the provided string, while in the second path, the message reports successfully finding the string.
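A source-level sketch of what these two paths amount to might look like the following. The names contains and report are hypothetical, contains stands in for FUN_00401600, and the linear search with std::find is an assumption, since the decompiled fragment does not show how the search is actually performed. The message is returned as a string rather than printed so that it is easy to check.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for FUN_00401600: true when needle is present in list.
// A linear search is assumed; the real function may work differently.
bool contains(const std::vector<std::string> &list, const std::string &needle) {
    return std::find(list.begin(), list.end(), needle) != list.end();
}

// Builds the message that the two decompiled branches print.
std::string report(const std::vector<std::string> &list, const std::string &needle) {
    std::string msg = "Your string \"" + needle;
    if (!contains(list, needle)) {  // corresponds to (bVar1 & 1) == 0
        msg += "\" was NOT found!";
    } else {
        msg += "\" was found!";
    }
    return msg;
}
```

Writing the branch this way makes it easier to see that both paths share everything except the final message.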
The bVar1 & 1 comparison tests the lowest bit of the variable. This is a typical compiled form of the C++ bool type, which
carries only a single bit of information, and you will often observe this approach being used to implement boolean true/false
logic in C and C++.
Side note: bool fields can sometimes be packed together, so the 1 could be any whole power of 2 that fits in the
machine word, in order to implement tests against bit field structures like this:
struct flags {
  bool enabled:1;
  bool active:1;
  bool occupied:1;
  ...
};
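In decompiled output, tests against packed flags like these usually show up as a bitwise AND with a power-of-2 mask. The sketch below illustrates that pattern with explicit masks. The bit positions are an assumption made for illustration (the actual layout of bit fields is chosen by the compiler), and the constant and function names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical layout: one bit per flag, lowest bit first.
constexpr std::uint8_t kEnabled  = 1u << 0;  // 0x01
constexpr std::uint8_t kActive   = 1u << 1;  // 0x02
constexpr std::uint8_t kOccupied = 1u << 2;  // 0x04

// Each test isolates a single bit with its mask, as in (bVar1 & 1).
bool is_enabled(std::uint8_t flags)  { return (flags & kEnabled)  != 0; }
bool is_active(std::uint8_t flags)   { return (flags & kActive)   != 0; }
bool is_occupied(std::uint8_t flags) { return (flags & kOccupied) != 0; }
```

When a decompiled comparison uses a mask such as 2 or 4 instead of 1, it is worth checking whether neighboring code tests other bits of the same byte, which would suggest a packed flags structure.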