Java Malware and Obfuscation
by Coleman Kane
This lecture will discuss some more advanced topics in Java program analysis. One big one is obfuscation. This technique attempts to introduce confusion and misdirection to the analyst by altering the program flow, naming conventions, and other properties of a program in randomly-generated ways that would have been difficult for a programmer to write in the first place.
Beginning Obfuscation
There are a large number of tools out there for obfuscation. One tool that I picked out is named ProGuard. This tool is designed to complicate reverse engineering of a program - often a feature desired by programmers who are concerned with the privacy of the source code that they wrote. Unlike many machine languages for traditional CPUs, you’ve seen that Java code, when decompiled, can ideally reproduce the program written by the original author. Survivability of malware in the wild is often a function of how quickly an analyst is able to reverse engineer a sample. Therefore, these tools are often even more desirable to malware authors, who seldom need field-serviceability for the code they deploy in the wild.
Below, I take the obfuscator for a spin on the ex3.jar that was created in the previous walkthrough.
When run through the obfuscator with some default settings, the original JAR structure of Example3 is changed to contain the following:
2020-04-13 00:16:28 ..... 76 75 META-INF/MANIFEST.MF
2020-04-13 00:16:28 ..... 620 403 Example3.class
2020-04-13 00:16:28 ..... 118 105 a/a.class
Decompiled, a/a.class contains the following (basically an empty class now, even though this used to be the SquareClass):
package a;
public final class a
{
public a()
{
}
}
And the Example3 class now contains the squaring logic embedded within it:
import a.a;
import java.io.PrintStream;
import java.util.Scanner;
public class Example3
{
public Example3()
{
}
public static void main(String args[])
{
args = new Scanner(System.in);
new a();
System.out.println("Hello World!");
System.out.print("Provide an int to square: ");
args = args = (args = args = args.nextInt()) * args;
System.out.print("Squared result: ");
System.out.println(args);
}
}
In the above code, you can see that these lines appear to suggest use of the a package, but then it is never actually used later on:
import a.a;
new a();
This is an example of a simple misdirection. The code suggests that you may need to look into another source file, but it turns out that that file is a dead end. Additionally, though this new class a is now empty, it was created initially as a rename for the old SquareClass. Obfuscators like this have the ability to navigate Java’s rather rudimentary symbol structure to substitute descriptive and self-documenting names in the code with completely random and arbitrary ones, robbing you of any hints as to what the code may do.
Additionally, the following line of code indicates that a bunch of self-assignments were performed, as well as a multiplication by a value that is the product of a few assignment operations within grouping parentheses. In this case, this is a simplified example of an arithmetic obfuscation that is intended to make it significantly more difficult to figure out what is going on here, without first unwinding some of these steps:
args = args = (args = args = args.nextInt()) * args;
These are some simplified examples of obfuscation, with hopefully enough of the original code remaining that you can understand a handful of the tricks that are used.
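To unwind the chained assignments, it can help to re-create the expression with properly typed locals. The following is a minimal sketch, not the original source: the class name UnwindSquare and the locals n and result are made up for illustration, and the decompiler's reuse of args for every value is replaced by separate variables. Java evaluates operands left to right, so the parenthesized assignment runs first and the trailing multiplication simply reads the value that was just stored.

```java
public class UnwindSquare {
    // Typed re-creation of the decompiled expression. The redundant
    // self-assignments are kept on purpose to mirror the obfuscated form.
    public static int obfuscated(int input) {
        int n;
        int result;
        // Left operand "(n = n = input)" is evaluated first and yields input;
        // the right operand then reads n, which now also holds input.
        result = result = (n = n = input) * n;
        return result;
    }

    // The same computation once the redundant assignments are removed.
    public static int plain(int input) {
        return input * input;
    }

    public static void main(String[] args) {
        for (int i = -3; i <= 3; i++) {
            System.out.println(i + ": " + obfuscated(i) + " vs " + plain(i));
        }
    }
}
```

Both methods return the same value for every input, which is the point: the extra assignments add noise but no behavior.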
Obfuscated Malware Example: jRAT
Java is becoming increasingly popular among malware authors, as it allows an author to distribute a single program that can execute on, and interact with, multiple operating systems. Such is the case with Adwind as well as jRAT/JACKSBOT.
In this session I will perform some analysis of a malware sample that can be seen online at this Hybrid-Analysis Link. It appears to have been uploaded under the name jrat.jar, which could be a helpful hint about what malware sample it is, or (as it is always important to be conscious of mistakes) this could merely be a mislabeling by someone else.
Analysis of the JAR(ZIP) directory
As before, I can use 7z l jrat.jar to list the contents of the JAR file. In this case, there are over 100 files contained within it:
2017-09-12 17:10:04 ..... 233 186 META-INF/MANIFEST.MF
2017-09-12 17:10:04 ..... 1471 748 UrulosuvoKupi/IvolisuvAkipu/UhapakaVakipI.class
2017-09-12 17:10:04 ..... 2054 1021 UrulosuvoKupi/IvolisuvAkipu/EpipiKevekupA.class
2017-09-12 17:10:04 ..... 2114 1043 UrulosuvoKupi/IvolisuvAkipu/OzekakuVukipI.class
2017-09-12 17:10:04 ..... 295 229 UrulosuvoKupi/IvolisuvAkipu/OweriSuvokiPi.class
2017-09-12 17:10:04 ..... 1994 997 UrulosuvoKupi/IvolisuvAkipu/UpugusOvakePe.class
2017-09-12 17:10:04 ..... 1720 866 UrulosuvoKupi/IvolisuvAkipu/OtiyiSivekEpa.class
2017-09-12 17:10:04 ..... 2683 1297 UrulosuvoKupi/IvolisuvAkipu/OgazuSovokoPi.class
2017-09-12 17:10:04 ..... 295 232 UrulosuvoKupi/IvolisuvAkipu/AkerusOvikipA.class
2017-09-12 17:10:04 ..... 3330 1580 UrulosuvoKupi/IvolisuvAkipu/AfabaKovikUpe.class
...
Some of these are classes, but also many of these are files without extensions, such as:
UrulosuvoKupi/IvolisuvAkipu/AcihesuvukupA/aue9vpuh36bgmm4hfdivare2er2e9m3d7lfsun6s79al1e/8
UrulosuvoKupi/IvolisuvAkipu/AcihesuvukupA/uh7be4gs65e888ao1lur6sd2f1vhopsgub2ma3a719m1518teef27t7l44vjbk7
...
Counting the different file types, I can see that there are:
- 103 *.class files
- 46 files without extensions
- 1 META-INF/MANIFEST.MF, which is to be expected for a properly-formed JAR
As was done before, I can use 7z e -so jrat.jar META-INF/MANIFEST.MF to display the contents of this file to stdout. This allows me to figure out which of the 100+ classes is the entry point for execution:
Manifest-Version: 1.0
Ant-Version: Apache Ant 1.8.0
X-COMMENT: Main-Class will be added automatically by build
Class-Path:
Created-By: 1.8.0_25-b18 (Oracle Corporation)
Main-Class: UrulosuvoKupi.IvolisuvAkipu.InihuSevokEpo
So, now I have learned that the class UrulosuvoKupi.IvolisuvAkipu.InihuSevokEpo is where Java will begin executing upon loading the program.
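The same lookup can be scripted with the standard java.util.jar classes, which parse the manifest without loading or running any class from the JAR. This is a minimal sketch; the class name ManifestMainClass and the helper mainClassOf are made up for illustration, and the sample text is a shortened copy of the manifest shown above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

public class ManifestMainClass {
    // Parse manifest text and return its Main-Class value, or null if absent.
    public static String mainClassOf(String manifestText) {
        try {
            Manifest manifest = new Manifest(
                new ByteArrayInputStream(manifestText.getBytes(StandardCharsets.UTF_8)));
            return manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            // Read the manifest straight out of a JAR given on the command line.
            try (JarFile jar = new JarFile(args[0])) {
                Manifest manifest = jar.getManifest(); // null if the JAR has no manifest
                System.out.println(manifest == null ? null
                    : manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS));
            }
        } else {
            // Shortened copy of the manifest shown in the text; each line,
            // including the last one, must end with a line terminator.
            String sample = "Manifest-Version: 1.0\n"
                + "Main-Class: UrulosuvoKupi.IvolisuvAkipu.InihuSevokEpo\n";
            System.out.println(mainClassOf(sample));
        }
    }
}
```

Run with the path to jrat.jar as the only argument to read the real manifest, or with no arguments to parse the embedded sample.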
Quick Look at the *.class File Sizes
One approach that can help identify where significant code lies is to look at the file sizes. I used the following command line to display the contents of the JAR, filter down to the *.class files, and then sort those (on the 4th column) by size in ascending order, so the largest classes are at the end of the output:
7z l jrat.jar | grep -- \\.class | sort -n -k 4
This produced an output that ended with the following lines:
...
2017-09-12 17:10:04 ..... 3070 1463 UrulosuvoKupi/IvolisuvAkipu/OjaposivAkope.class
2017-09-12 17:10:04 ..... 3116 1482 UrulosuvoKupi/IvolisuvAkipu/OximusAvukipi.class
2017-09-12 17:10:04 ..... 3318 1607 UrulosuvoKupi/IvolisuvAkipu/UjaneSavukIpu.class
2017-09-12 17:10:04 ..... 3330 1580 UrulosuvoKupi/IvolisuvAkipu/AfabaKovikUpe.class
2017-09-12 17:10:04 ..... 3463 1696 UrulosuvoKupi/IvolisuvAkipu/UgicosovEkipu.class
2017-09-12 17:10:04 ..... 4341 2109 UrulosuvoKupi/IvolisuvAkipu/IwitiSavekaPa.class
2017-09-12 17:10:04 ..... 4727 2285 UrulosuvoKupi/IvolisuvAkipu/UfexiKevukEpo.class
2017-09-12 17:10:04 ..... 4778 2315 UrulosuvoKupi/IvolisuvAkipu/IgiviKuvekePe.class
2017-09-12 17:10:04 ..... 6332 2994 UrulosuvoKupi/IvolisuvAkipu/UsobeKevukUpi.class
Again, this isn’t a guarantee that UsobeKevukUpi.class will have all of the significant code in it, but it is an opportunity to record more differentiating data about the classes within the JAR that may prove helpful in guiding us in a productive analysis direction.
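If the listing needs to feed a larger script, the same filter-and-sort step can be reproduced in a few lines of Java. This is only a sketch under the assumption that each entry line has the six-column layout shown above (date, time, attributes, size, compressed size, name); the class name ListingSort and its helpers are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ListingSort {
    // The size is the 4th whitespace-separated column of an entry line,
    // which is the same column that "sort -n -k 4" uses.
    public static long sizeOf(String listingLine) {
        return Long.parseLong(listingLine.trim().split("\\s+")[3]);
    }

    // Keep only *.class entries and order them by size, smallest first.
    public static List<String> classesBySize(List<String> listingLines) {
        List<String> classes = new ArrayList<>();
        for (String line : listingLines) {
            if (line.endsWith(".class")) {
                classes.add(line);
            }
        }
        classes.sort(Comparator.comparingLong(ListingSort::sizeOf));
        return classes;
    }

    public static void main(String[] args) {
        // A few entry lines copied from the listing in the text.
        List<String> lines = Arrays.asList(
            "2017-09-12 17:10:04 ..... 6332 2994 UrulosuvoKupi/IvolisuvAkipu/UsobeKevukUpi.class",
            "2017-09-12 17:10:04 ..... 233 186 META-INF/MANIFEST.MF",
            "2017-09-12 17:10:04 ..... 295 229 UrulosuvoKupi/IvolisuvAkipu/OweriSuvokiPi.class",
            "2017-09-12 17:10:04 ..... 3330 1580 UrulosuvoKupi/IvolisuvAkipu/AfabaKovikUpe.class");
        for (String line : classesBySize(lines)) {
            System.out.println(line);
        }
    }
}
```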
Doing the same thing, but on all files that are not *.class files, could look like this:
7z l jrat.jar | grep -v -- \\.class | grep ^2017 | sort -n -k 4
This again yields sorted output, but the sizes no longer increase gradually, with the following three significantly larger files present at the end of the list:
2017-09-12 17:10:04 ..... 5656 5661 UrulosuvoKupi/IvolisuvAkipu/AcihesuvukupA/aue9vpuh36bgmm4hfdivare2er2e9m3d7lfsun6s79al1e/8
2017-09-12 17:10:04 ..... 224147 224217 UrulosuvoKupi/IvolisuvAkipu/AcihesuvukupA/43biqnrdkkukmocf311qd7bluh8l1iki
2017-09-12 17:10:04 ..... 236058 236133 UrulosuvoKupi/IvolisuvAkipu/AcihesuvukupA/8vlglhskhjv5gkrifhplabs9rvfgn56fnopap7hjt4eq1fvqkl95simfoc7
Following this, you can load the entire JAR into Ghidra using the import method and following the prompts on the Batch Import dialog that was covered in the last walkthrough.
Peek at InihuSevokEpo in Ghidra
A quick look at the main function of the class that was defined as the Main-Class in MANIFEST.MF reveals it to contain 4 bytes of JVM code, which consists of 2 instructions:
void __stdcall main_java.lang.String[]_void(void)
assume alignmentPad = 0x3
void <VOID> <RETURN>
main_java.lang.String[]_void XREF[1]: ram:e0000004(*)
ram:00010008 b8 00 02 invokestatic offset CPOOL[2] = null
ram:0001000b b1 return
This evaluates to the following source code, per Ghidra’s decompiler. Recognize that the class and method names are not embedded in the instructions themselves; they are stored in separate data tables (the class file’s constant pool) and referenced by index, as shown by the CPOOL[2] operand in the listing above.
/* Flags:
ACC_PUBLIC
ACC_STATIC
public static void main(java.lang.String[]) throws java.lang.Throwable */
void main_java.lang.String[]_void(void)
{
IgabukIvikepE.AmoboKevukEpu();
return;
}
The call to IgabukIvikepE.AmoboKevukEpu() tells us there’s another function inside another class where the next step in the program lives.
Digging into IgabukIvikepE
The second class that we are directed to, IgabukIvikepE, also has just one function, which is what was called from the main, above. Rather than simply passing control to a new function, this one has a bit more complexity to it, and calls 12 other functions in other classes, and performs some arithmetic. I will skip displaying the JVM bytecode disassembly, and just focus on the decompiled source code:
/* Flags:
ACC_PUBLIC
ACC_STATIC
public static void AmoboKevukEpu() throws java.lang.Throwable */
void AmoboKevukEpu_void(void)
{
int iVar1;
int iVar2;
int iVar3;
dword pdVar4;
dword pdVar5;
pdVar5 = ApolisovaKupu.OjalaSivikuPu;
iVar1 = EferuSevekiPe.ApurisEvikopa(0x73);
iVar2 = UdihesAvikepE.EyohosoVokepA(-0x28);
iVar1 = iVar1 - iVar2;
pdVar4 = EvohusaVakipU.EqohesuVokopE;
iVar2 = AyerosEvekipe.IxuveSovekUpe(0x50);
iVar3 = IzorasiVakipI.AgaraSovikEpi(-0x50);
pdVar5[iVar1] = pdVar4[iVar2 - iVar3];
pdVar5 = InolosovuKupi.OfelosAvakepo;
iVar1 = AtahuSevekApe.IzuhasUvikuPe(0x1f);
iVar2 = OlovesAvekopO.EhuveSivukUpo(-0x1c);
iVar1 = iVar1 - iVar2;
pdVar4 = OdulisOvekuPi.UyaluSavakIpi;
iVar2 = AboheSuvekIpu.EwohiSevakIpe(0x18b);
iVar3 = AsaviSavakOpi.IkoveSavikUpe(-0x1b);
pdVar5[iVar1] = pdVar4[iVar2 - iVar3];
AfabaKovikUpe.EpobiKovekOpe();
pdVar5 = InolosovuKupi.OfelosAvakepo;
iVar1 = AyerosEvekipe.IxuveSovekUpe(-0x1a6);
iVar2 = EverisEvekuPo.UqorisAvikaPo(-0x47);
iVar1 = iVar1 + iVar2;
pdVar4 = AbolisUvikePo.IweliSuvakapE;
iVar2 = UxaroSivikIpa.AsiresUvekaPu(-0x261);
iVar3 = OciraSevukupA.AnereSivakapU(-0xb1);
pdVar5[iVar1] = pdVar4[iVar2 + iVar3];
return;
}
This appears to be another obfuscation technique, where 4-line sequences of operations seem to be constructed that match the following recipe:
x = class1.data;
y = class2.func1(i);
z = class3.func2(j);
k = y + z; // or some variation of this
We won’t walk through all of these, but we will look at the first two function calls and open their classes in Ghidra for analysis:
iVar1 = EferuSevekiPe.ApurisEvikopa(0x73);
iVar2 = UdihesAvikepE.EyohosoVokepA(-0x28);
This gives us the two class names we want to analyze next: EferuSevekiPe and UdihesAvikepE.
Analysis of EferuSevekiPe and UdihesAvikepE
Opening these in Ghidra gives us two classes, each with one simple function.
EferuSevekiPe
int ApurisEvikopa_int_int(int param1)
{
return param1 + 0xb3;
}
UdihesAvikepE
int EyohosoVokepA_int_int(int param1)
{
return param1 + 0x11f;
}
As can be seen above, both functions perform a simple addition and then return the answer to the caller. What’s going on here (and in most of the function calls of the original class) is that the obfuscation inserted a whole bunch of simple mathematical operations whose results are discarded or merely combined into values, such as array indices, that could have been written as plain constants.
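Evaluating the first pair of calls by hand shows what one of these sequences boils down to. The sketch below copies the two function bodies from the decompilation above into a single made-up class, ConstantFold, and repeats the subtraction that the caller performs on their results; firstIndex is a hypothetical helper name.

```java
public class ConstantFold {
    // Body copied from the decompiled EferuSevekiPe.ApurisEvikopa.
    public static int ApurisEvikopa(int param1) {
        return param1 + 0xb3;
    }

    // Body copied from the decompiled UdihesAvikepE.EyohosoVokepA.
    public static int EyohosoVokepA(int param1) {
        return param1 + 0x11f;
    }

    // Mirrors the caller: iVar1 = ApurisEvikopa(0x73) - EyohosoVokepA(-0x28)
    public static int firstIndex() {
        int iVar1 = ApurisEvikopa(0x73);   // 115 + 179 = 294
        int iVar2 = EyohosoVokepA(-0x28);  // -40 + 287 = 247
        return iVar1 - iVar2;              // 294 - 247 = 47
    }

    public static void main(String[] args) {
        System.out.println("first index = " + firstIndex());
    }
}
```

Two method calls and a subtraction collapse into the single constant 47, which is then used as an array index in the caller.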
Ghidra gives us a great UI to visualize these when working on a case-by-case basis, but it can quickly become tedious to unwind all of this using Ghidra alone. This is where tools such as yara, jad, and the other command-line utilities come in handy, as it is easy and flexible to script them into bulk operations.
Using jad to extract the contents
I’d recommend creating a new folder named jrat_output to unzip the contents into:
mkdir -p jrat_output
cd jrat_output
7z x ../jrat.jar
Then, since all of the *.class files live within the sub-folder UrulosuvoKupi/IvolisuvAkipu, the following command can be run to decompile all of the Java into new *.jad files that are in the same folder as the corresponding *.class file:
jad -r UrulosuvoKupi/IvolisuvAkipu/*.class
Navigating by import
Many of you who are familiar with Java already know that if you need to perform a number of common operations to provide interaction with the system (such as network communication) and user input, you can’t do this solely using the core Java operations. You need to import external libraries. For this reason, searching the *.jad files for any import declarations might be insightful:
grep ^import *.jad
Yields the following output:
AkoxoKevekuPe.jad:import java.io.ByteArrayOutputStream;
AnarukOvukuPa.jad:import java.math.BigInteger;
EkogasavOkupe.jad:import java.lang.reflect.Method;
EqenuKovakoPo.jad:import java.util.Random;
EroyosiVekepE.jad:import java.util.Random;
EsuciKevukipa.jad:import java.util.Random;
IcafisavUkapo.jad:import java.lang.reflect.Method;
IduxikeVikapo.jad:import java.io.ByteArrayOutputStream;
IrozisoVikopu.jad:import java.lang.reflect.Method;
IwitiSavekaPa.jad:import java.lang.reflect.Method;
IxerikoVukupe.jad:import java.math.BigInteger;
OqusuKivokUpo.jad:import java.io.InputStream;
OzekakuVukipI.jad:import java.net.URLConnection;
UdureKavukapE.jad:import java.math.BigInteger;
UfexiKevukEpo.jad:import java.lang.reflect.Method;
UgicosovEkipu.jad:import java.lang.reflect.Method;
UjaneSavukIpu.jad:import java.lang.reflect.Method;
UkolokaVakipE.jad:import java.net.URL;
UqubuSavukApu.jad:import java.lang.reflect.Method;
UvekoKuvikEpo.jad:import java.io.ByteArrayOutputStream;
This is only about 20 classes, rather than the 103 classes that we have source code for. Additionally, just looking at these class imports and navigating the Java API Documentation, I can already draw a number of conclusions from the output:
- The code makes a connection, likely via TCP (implied by most URL types)
- The code also creates (or parses) URL objects
- The code utilizes the Reflection API to evaluate some methods at run-time
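When there are many decompiled files, it can be more convenient to group the results by imported class, so that each library class lists the files that use it. The following is a minimal sketch of that idea; the class name ImportScan and its helpers are made up for illustration, and it only recognizes single-line import declarations such as the ones shown above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ImportScan {
    // Return the names imported by one decompiled source file.
    public static List<String> importsIn(String source) {
        List<String> imports = new ArrayList<>();
        for (String line : source.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.startsWith("import ") && trimmed.endsWith(";")) {
                imports.add(trimmed.substring("import ".length(), trimmed.length() - 1).trim());
            }
        }
        return imports;
    }

    // Invert file -> source text into imported class -> files that import it.
    public static Map<String, List<String>> filesByImport(Map<String, String> sourcesByFile) {
        Map<String, List<String>> result = new TreeMap<>();
        for (Map.Entry<String, String> entry : sourcesByFile.entrySet()) {
            for (String imported : importsIn(entry.getValue())) {
                result.computeIfAbsent(imported, k -> new ArrayList<>()).add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Directory holding the *.jad files; defaults to the current directory.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        Map<String, String> sources = new TreeMap<>();
        try (Stream<Path> listing = Files.list(dir)) {
            List<Path> jadFiles = listing
                .filter(p -> p.toString().endsWith(".jad"))
                .collect(Collectors.toList());
            for (Path p : jadFiles) {
                // ISO-8859-1 maps every byte to a char, so odd bytes cannot abort the read.
                sources.put(p.getFileName().toString(),
                    new String(Files.readAllBytes(p), StandardCharsets.ISO_8859_1));
            }
        }
        filesByImport(sources).forEach(
            (imported, users) -> System.out.println(imported + " <- " + users));
    }
}
```

Grouping this way makes it easy to see, for example, every file that pulls in java.lang.reflect.Method at a glance.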
We can use this technique to help us narrow down the artifacts we need to hand-analyze to get a better understanding of the core components of the malware. We can look at the code in OzekakuVukipI for some understanding of how it is using the imported URLConnection.
OzekakuVukipI analysis
The lone function defined within this class is below:
void EgikokovukEpa_void(void)
{
URLConnection objectRef;
int iVar1;
int iVar2;
int iVar3;
InputStream pIVar4;
dword pdVar5;
dword pdVar6;
pdVar6 = OgeloSavukApe.OmoluSavikapi;
iVar1 = AravesuvoKepa.UveveSevokoPi(0x129);
iVar2 = IgehesuVekepU.EmoheSuvikUpu(-0x23);
iVar1 = iVar1 - iVar2;
pdVar5 = AxuhaSovakopE.OsuhasAvukopE;
iVar2 = AkerusOvikipA.OliraSuvekuPe(-0x18a);
iVar3 = AboheSuvekIpu.EwohiSevakIpe(-0x44);
pdVar6[iVar1] = pdVar5[iVar2 - iVar3];
pdVar6 = EkahasEvikoPe.AlohusUvakipE;
iVar1 = UxaroSivikIpa.AsiresUvekaPu(-0x15e);
iVar2 = OweriSuvokiPi.IdirasivEkupe(-0x8b);
iVar1 = iVar1 - iVar2;
pdVar5 = AhuhaSovokIpu.OrohoSivikOpo;
iVar2 = AkerusOvikipA.OliraSuvekuPe(-0xd8);
iVar3 = OlovesAvekopO.EhuveSivukUpo(-0xb7);
objectRef = pdVar5[iVar2 ^ iVar3].checkcast(URLConnection);
pIVar4 = objectRef.getInputStream();
pdVar6[iVar1] = pIVar4;
pdVar6 = ApolisovaKupu.OjalaSivikuPu;
iVar1 = EferuSevekiPe.ApurisEvikopa(-0x253);
iVar2 = UxaroSivikIpa.AsiresUvekaPu(-0x2f);
iVar1 = iVar1 + iVar2;
pdVar5 = InolosovuKupi.OfelosAvakepo;
iVar2 = IpohasUvekopU.AjehiseVakipi(-0x188);
iVar3 = OlovesAvekopO.EhuveSivukUpo(-0xbf);
pdVar6[iVar1] = pdVar5[iVar2 + iVar3];
pdVar6 = OdulisOvekuPi.UyaluSavakIpi;
iVar1 = OciraSevukupA.AnereSivakapU(0x44);
iVar2 = IzorasiVakipI.AgaraSovikEpi(-0x52);
iVar1 = iVar1 - iVar2;
pdVar5 = OquleSuvokEpi.OcalasEvokePi;
iVar2 = UdihesAvikepE.EyohosoVokepA(-0x188);
iVar3 = EverisEvekuPo.UqorisAvikaPo(-0x55);
pdVar6[iVar1] = pdVar5[iVar2 + iVar3];
return;
}
The following block of code depicts where the URLConnection object is fetched from the pdVar5 array and cast into the objectRef variable, and where its input stream is then stored as an element of the array pointed at by pdVar6:
...
pdVar6[iVar1] = pdVar5[iVar2 - iVar3];
pdVar6 = EkahasEvikoPe.AlohusUvakipE;
iVar1 = UxaroSivikIpa.AsiresUvekaPu(-0x15e);
iVar2 = OweriSuvokiPi.IdirasivEkupe(-0x8b);
iVar1 = iVar1 - iVar2;
pdVar5 = AhuhaSovokIpu.OrohoSivikOpo;
iVar2 = AkerusOvikipA.OliraSuvekuPe(-0xd8);
iVar3 = OlovesAvekopO.EhuveSivukUpo(-0xb7);
objectRef = pdVar5[iVar2 ^ iVar3].checkcast(URLConnection);
pIVar4 = objectRef.getInputStream();
pdVar6[iVar1] = pIVar4;
pdVar6 = ApolisovaKupu.OjalaSivikuPu;
I have broken up the code to demonstrate where pdVar6 is first overwritten with the static data that is defined within EkahasEvikoPe.AlohusUvakipE, showing how the index is stored in iVar1 and then later used to store the new URLConnection input stream in pdVar6[iVar1]. You can see this by looking at the various assignment operations going on.
When analyzing obfuscated code, approaching the problem by identifying the lifespan for certain key data, and the dependencies between variables is often key. In this case, if we wanted to, we could modify the code above such that successive assignments that overwrite existing data can instead be represented as assignments to a new “version” of the local variable.
For example, the following lines store the result of a single function call in iVar1, and then use that, together with iVar2, to overwrite iVar1 with the result of an arithmetic operation:
iVar1 = UxaroSivikIpa.AsiresUvekaPu(-0x15e);
iVar2 = OweriSuvokiPi.IdirasivEkupe(-0x8b);
iVar1 = iVar1 - iVar2;
We could use this “variable renaming” technique to modify the code in this fashion, adding v1, v2, … vN for each newly-assigned variable version.
iVar1v1 = UxaroSivikIpa.AsiresUvekaPu(-0x15e);
iVar2v1 = OweriSuvokiPi.IdirasivEkupe(-0x8b);
iVar1v2 = iVar1v1 - iVar2v1;
The benefit of doing this is it gives us the ability to define variable dependency relationships that can help us map out our understanding of the code better. The goal being that, if we can alter the code such that each variable is written to exactly once, then we can associate variables with the values (the data) that they manage within the program.
iVar1v2 ← (iVar1v1, iVar2v1)
For the larger block of code, the rewrite may look more like this:
pdVar6v1 = EkahasEvikoPe.AlohusUvakipE;
iVar1v1 = UxaroSivikIpa.AsiresUvekaPu(-0x15e);
iVar2v1 = OweriSuvokiPi.IdirasivEkupe(-0x8b);
iVar1v2 = iVar1v1 - iVar2v1;
pdVar5v1 = AhuhaSovokIpu.OrohoSivikOpo;
iVar2v2 = AkerusOvikipA.OliraSuvekuPe(-0xd8);
iVar3v1 = OlovesAvekopO.EhuveSivukUpo(-0xb7);
objectRefv1 = pdVar5v1[iVar2v2 ^ iVar3v1].checkcast(URLConnection);
pIVar4v1 = objectRefv1.getInputStream();
pdVar6v1[iVar1v2] = pIVar4v1;
pdVar6v2 = ApolisovaKupu.OjalaSivikuPu;
With the following dependency relationships:
- iVar1v2 ← (iVar1v1, iVar2v1)
- objectRefv1 ← (iVar2v2, iVar3v1, pdVar5v1)
- pIVar4v1 ← objectRefv1
- pdVar6v1[iVar1v2] ← (iVar1v2, pIVar4v1)
Such an exercise can help diagram the relationships in such a way as to help isolate the input and output variables of a section from the intermediate variables.
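The renaming itself is mechanical enough to script for straight-line code. This idea of giving every assignment its own variable version is commonly known as static single assignment (SSA) form. The following is a minimal sketch under strong assumptions: one "lhs = rhs;" statement per line, no branches or loops, and a caller-supplied list of local variable names. The class VariableVersions and its methods are made up for illustration, and the simple word-boundary match would also rename a class member that happens to share a local's name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VariableVersions {
    // Rewrite statements so each assignment to a tracked local creates a new
    // version (v1, v2, ...) and each use refers to the latest version.
    public static List<String> rename(List<String> statements, List<String> locals) {
        Pattern localUse = Pattern.compile("\\b(" + String.join("|", locals) + ")\\b");
        Map<String, Integer> version = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (String stmt : statements) {
            int eq = stmt.indexOf(" = ");
            if (eq < 0) { // not an assignment: only rename the uses
                out.add(renameUses(stmt, localUse, version));
                continue;
            }
            String lhs = stmt.substring(0, eq).trim();
            // Rename the right-hand side first so it sees the old versions.
            String rhs = renameUses(stmt.substring(eq + 3), localUse, version);
            if (locals.contains(lhs)) {
                lhs = lhs + "v" + version.merge(lhs, 1, Integer::sum);
            } else {
                // e.g. an array store such as pdVar6[iVar1]: uses only, no new version
                lhs = renameUses(lhs, localUse, version);
            }
            out.add(lhs + " = " + rhs);
        }
        return out;
    }

    // Replace each tracked local with its current version; names that have
    // not been assigned yet are left unchanged.
    static String renameUses(String text, Pattern localUse, Map<String, Integer> version) {
        Matcher m = localUse.matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            Integer v = version.get(m.group(1));
            String replacement = (v == null) ? m.group(1) : m.group(1) + "v" + v;
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> code = Arrays.asList(
            "iVar1 = UxaroSivikIpa.AsiresUvekaPu(-0x15e);",
            "iVar2 = OweriSuvokiPi.IdirasivEkupe(-0x8b);",
            "iVar1 = iVar1 - iVar2;");
        rename(code, Arrays.asList("iVar1", "iVar2")).forEach(System.out::println);
    }
}
```

Once every variable is written exactly once, the dependency of each name on the names in its right-hand side can be read directly off the rewritten statements.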
Recall that objectRefv1 is where the URLConnection object was stored, and what the code used to call .getInputStream(). Using this dependency graph, I can limit my exploration of the code to the inputs to the 3 local variables iVar2v2, iVar3v1, and pdVar5v1 if I want to learn more about how this code gets the data that is loaded in as a URLConnection instance.
Getting Help from Optimization Tools
I mentioned earlier that there’s a utility called ProGuard that can be used to try to obfuscate your code. The tool contains a large number of features, one of which is a code optimizer that can do a number of things, including collapsing disparate code into single classes. If we use the tool, disable its obfuscation features, and enable optimization features, we can have a utility like this do a lot of heavy lifting for us. It will take the jar as input, and you can have it produce another jar as output. One caveat is that you will need to have the openjdk-8-jdk package installed in the VM, as it needs to use some of the library classes from that Java version.
apt install -y openjdk-8-jdk
I downloaded v 6.3.0 beta 1 from the following URL:
Within the bundle is a bin/proguardgui.sh which, when executed, brings up the GUI. The GUI has a “Wizard”-like interface, with tabs that are selections for the various features on the left, and options and buttons that allow you to tweak these on the right.
First I use the “Add input…” picker to choose jrat.jar and then use the “Add output…” picker to set what my new output JAR is going to be (jrat_out.jar). Additionally, I had to Remove the existing reference to rt.jar in the lower pane, and then “Add…” a reference to the OpenJDK 8 one at /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/rt.jar in order for the tool to find all the class libraries.
I skipped over the Shrinking pane, content with using the defaults there.
In the Obfuscation pane, I uncheck the top Obfuscate check box, deactivating the entire module. I want to simplify the application, and obfuscating it further would only make things worse.
Under the Optimization pane, I left the defaults selected, but additionally checked the following boxes:
- Allow access modification
- Merge interfaces aggressively
Once done, go to the “Process” pane and click the Process! button, and after a bit of waiting on the work to complete, it will report that it has written the new JAR.
Using 7z to list the contents of the jrat_out.jar, we can see that there are only 13 classes defined now within it:
2020-04-14 16:18:14 ..... 233 186 META-INF/MANIFEST.MF
2020-04-14 16:18:14 ..... 280 222 UrulosuvoKupi/IvolisuvAkipu/OdulisOvekuPi.class
2020-04-14 16:18:14 ..... 280 220 UrulosuvoKupi/IvolisuvAkipu/AxuhaSovakopE.class
2020-04-14 16:18:14 ..... 280 219 UrulosuvoKupi/IvolisuvAkipu/InolosovuKupi.class
2020-04-14 16:18:14 ..... 280 219 UrulosuvoKupi/IvolisuvAkipu/ApolisovaKupu.class
2020-04-14 16:18:14 ..... 280 222 UrulosuvoKupi/IvolisuvAkipu/AbolisUvikePo.class
2020-04-14 16:18:14 ..... 280 219 UrulosuvoKupi/IvolisuvAkipu/OquleSuvokEpi.class
2020-04-14 16:18:14 ..... 280 220 UrulosuvoKupi/IvolisuvAkipu/AhuhaSovokIpu.class
2020-04-14 16:18:14 ..... 9259 4916 UrulosuvoKupi/IvolisuvAkipu/InihuSevokEpo.class
2020-04-14 16:18:14 ..... 27973 14664 UrulosuvoKupi/IvolisuvAkipu/AboheSuvekIpu.class
2020-04-14 16:18:14 ..... 280 222 UrulosuvoKupi/IvolisuvAkipu/EvohusaVakipU.class
2020-04-14 16:18:14 ..... 280 221 UrulosuvoKupi/IvolisuvAkipu/EkahasEvikoPe.class
2020-04-14 16:18:14 ..... 280 222 UrulosuvoKupi/IvolisuvAkipu/ItoleSevekEpa.class
2020-04-14 16:18:14 ..... 280 219 UrulosuvoKupi/IvolisuvAkipu/OgeloSavukApe.class
...
Looking at the file sizes, it is also apparent that the majority of the code has been consolidated into 2 classes now:
2020-04-14 16:18:14 ..... 9259 4916 UrulosuvoKupi/IvolisuvAkipu/InihuSevokEpo.class
2020-04-14 16:18:14 ..... 27973 14664 UrulosuvoKupi/IvolisuvAkipu/AboheSuvekIpu.class
Looking at each one of the classes that are size 280 after decompiling with jad reveals they all match some variation on the following structure:
public final class AbolisUvikePo
{
static Object IweliSuvakapE[] = new Object[46];
}
So, rather than having a bunch of small classes implementing arbitrary arithmetic that may or may not be discarded, we now have just 11 classes that are instantiating Object arrays in memory, for use by the rest of the code.
If you next look at the InihuSevokEpo.jad you will see the main function is still there, but a lot of the arithmetic is brought into the function, and the code is now a long sequence of assignment statements from one array to another, with obfuscated indices. Additionally, a good number of these are being assigned from newly-initialized arrays, so are now pretty easy to spot, since the number of classes that really just contain global array variables has been significantly reduced.
As well, you’ll see some lines that look like this:
AhuhaSovokIpu.OrohoSivikOpo[AboheSuvekIpu.UqorisAvikaPo(-87) ^ AboheSuvekIpu.ApurisEvikopa(-115)] = new String(new char[] {
... ...
(char)(AboheSuvekIpu.IkoveSavikUpe(-176) ^ AboheSuvekIpu.IdirasivEkupe(-125))
});
This is constructing String objects at run-time that will be used later on by the program.
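The pattern is easy to reproduce in isolation. In the sketch below, helperA and helperB are hypothetical stand-ins for the sample's helper functions: their bodies and the constants passed to them were chosen for this demo so that it spells a short string, and they are not values recovered from the malware. Each character is the XOR of two helper results, exactly as in the decompiled line above.

```java
public class XorStringDemo {
    // Hypothetical stand-ins for the obfuscated helpers; the offsets are
    // invented for this demo and do not come from the sample.
    public static int helperA(int x) {
        return x + 0x100;
    }

    public static int helperB(int x) {
        return x + 0x80;
    }

    // Build a String one character at a time from XOR-ed helper results.
    public static String build() {
        return new String(new char[] {
            (char) (helperA(-145) ^ helperB(-121)), // 111 ^ 7 = 104 = 'h'
            (char) (helperA(-145) ^ helperB(-122))  // 111 ^ 6 = 105 = 'i'
        });
    }

    public static void main(String[] args) {
        System.out.println(build());
    }
}
```

Because every input is a constant, strings built this way can be recovered by evaluating the expressions, without ever running the rest of the program.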
Additionally, the AboheSuvekIpu.jad and AboheSuvekIpu.class files now contain the source code and the binary code, respectively, for simplified versions of many of the critical functions that were spread around multiple classes. This can enable you to navigate them within a single Ghidra window now, which wasn’t possible before.
Conclusion
This is intended to cover some obfuscation approaches that are common in the wild, and give you a good understanding of how obfuscation can be scaled to a whole program by repeating a handful of simple patterns. We didn’t completely deconstruct the malware sample, which itself could be a dedicated weeks-long exercise, much of it necessarily tedious, to get the answers you need.
At this point, however, it might also be helpful to load the Java code into a VirtualBox VM so that you could step through these lines with the help of a Java Debugger.
tags: malware java ghidra lecture