CS6038/CS5138 Malware Analysis, UC

Course content for UC Malware Analysis

13 April 2020

Java Malware and Obfuscation

by Coleman Kane

This lecture discusses some more advanced topics in Java program analysis, the biggest of which is obfuscation. Obfuscation attempts to confuse and misdirect the analyst by altering the program flow, naming conventions, and other properties of a program in randomly generated ways that would have been difficult for a programmer to write by hand.

Beginning Obfuscation

There are a large number of tools out there for obfuscation. One tool that I picked out is named ProGuard. This tool is designed to complicate reverse engineering of a program, often a feature desired by programmers who are concerned with the privacy of the source code that they wrote. Unlike many machine languages for traditional CPUs, you’ve seen that Java code, when decompiled, can ideally reproduce the program written by the original author. Survivability of malware in the wild is often a function of how quickly an analyst is able to reverse engineer a sample. Therefore, these tools are often even more desirable to malware authors, who seldom need field-serviceability for the code they deploy in the wild.

Below, I take the obfuscator for a spin on the ex3.jar that was created in the previous walkthrough.

When run through the obfuscator with some default settings, the original JAR structure of Example3 is changed to contain the following:

2020-04-13 00:16:28 .....           76           75  META-INF/MANIFEST.MF
2020-04-13 00:16:28 .....          620          403  Example3.class
2020-04-13 00:16:28 .....          118          105  a/a.class

Decompiled, a/a.class contains the following (basically an empty class now, even though this used to be the SquareClass):

package a;


public final class a
{

    public a()
    {
    }
}

And the Example3 class now contains the squaring logic embedded within it:

import a.a;
import java.io.PrintStream;
import java.util.Scanner;

public class Example3
{

    public Example3()
    {
    }

    public static void main(String args[])
    {
        args = new Scanner(System.in);
        new a();
        System.out.println("Hello World!");
        System.out.print("Provide an int to square: ");
        args = args = (args = args = args.nextInt()) * args;
        System.out.print("Squared result: ");
        System.out.println(args);
    }
}

In the above code, you can see that these lines appear to suggest use of the a package, but then it is never actually used later on:

import a.a;
new a();

This is an example of a simple misdirection. The code suggests that you may need to look into another source file, but that file turns out to be a dead end. Additionally, though this new class a is now empty, it was initially created as a rename of the old SquareClass. Obfuscators like this can navigate Java’s rather rudimentary symbol structure to replace descriptive, self-documenting names in the code with completely random and arbitrary ones, robbing you of any hints as to what the code may do.

Additionally, the following line of code contains a chain of self-assignments, as well as a multiplication by a value that is itself the result of a few assignment operations within grouping parentheses. This is a simplified example of an arithmetic obfuscation that is intended to make it significantly more difficult to figure out what is going on without first unwinding some of these steps:

args = args = (args = args = args.nextInt()) * args;
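Unwound by hand, the statement appears to reduce to reading one int and multiplying it by itself. The following is a minimal sketch of that readable equivalent, not the original source: the class name Example3Unwound and the names scanner, value, and squared are hypothetical, and it assumes the decompiler collapsed several differently typed locals into the single args variable.

```java
import java.util.Scanner;

// Hypothetical, hand-unwound equivalent of the obfuscated statement.
final class Example3Unwound {

    // Reads one int from the scanner and returns its square.
    static int readAndSquare(Scanner scanner) {
        int value = scanner.nextInt();  // innermost assignment: args.nextInt()
        int squared = value * value;    // (value) * value
        return squared;                 // the outer self-assignments add nothing
    }

    public static void main(String[] args) {
        Scanner scanner = new Scanner(System.in);
        System.out.println("Hello World!");
        System.out.print("Provide an int to square: ");
        int squared = readAndSquare(scanner);
        System.out.print("Squared result: ");
        System.out.println(squared);
    }
}
```

Giving each value its own typed local is usually the first step in undoing this kind of decompiler output.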

These are some simplified examples of obfuscation, with hopefully enough of the original code remaining that you can understand a handful of the tricks that are used.

Obfuscated Malware Example: jRAT

Java is becoming increasingly popular with malware authors, as it allows an author to distribute a single program that can execute on, and interact with, multiple operating systems. Such is the case with Adwind as well as jRAT/JACKSBOT.

In this session I will perform some analysis of a malware sample that can be seen online at this Hybrid-Analysis Link. It appears to have been uploaded under the name jrat.jar which could be a helpful hint about what malware sample it is, or (as it is always important to be conscious of mistakes) this could merely be a mislabeling by someone else.

Analysis of the JAR(ZIP) directory

As before, I can use 7z l jrat.jar to list the contents of the JAR file. In this case, there are over 100 files contained within it:

2017-09-12 17:10:04 .....          233          186  META-INF/MANIFEST.MF
2017-09-12 17:10:04 .....         1471          748  UrulosuvoKupi/IvolisuvAkipu/UhapakaVakipI.class
2017-09-12 17:10:04 .....         2054         1021  UrulosuvoKupi/IvolisuvAkipu/EpipiKevekupA.class
2017-09-12 17:10:04 .....         2114         1043  UrulosuvoKupi/IvolisuvAkipu/OzekakuVukipI.class
2017-09-12 17:10:04 .....          295          229  UrulosuvoKupi/IvolisuvAkipu/OweriSuvokiPi.class
2017-09-12 17:10:04 .....         1994          997  UrulosuvoKupi/IvolisuvAkipu/UpugusOvakePe.class
2017-09-12 17:10:04 .....         1720          866  UrulosuvoKupi/IvolisuvAkipu/OtiyiSivekEpa.class
2017-09-12 17:10:04 .....         2683         1297  UrulosuvoKupi/IvolisuvAkipu/OgazuSovokoPi.class
2017-09-12 17:10:04 .....          295          232  UrulosuvoKupi/IvolisuvAkipu/AkerusOvikipA.class
2017-09-12 17:10:04 .....         3330         1580  UrulosuvoKupi/IvolisuvAkipu/AfabaKovikUpe.class
...

Some of these are classes, but also many of these are files without extensions, such as:

UrulosuvoKupi/IvolisuvAkipu/AcihesuvukupA/aue9vpuh36bgmm4hfdivare2er2e9m3d7lfsun6s79al1e/8
UrulosuvoKupi/IvolisuvAkipu/AcihesuvukupA/uh7be4gs65e888ao1lur6sd2f1vhopsgub2ma3a719m1518teef27t7l44vjbk7
...

Counting the different file types gives a quick sense of how the archive is split between *.class files and the extension-less files.
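One way such a tally could be produced programmatically is sketched below. This is an illustration rather than the method used in the lecture: JarEntryCounter, extensionOf, and countByExtension are hypothetical names, and the sketch relies only on java.util.zip from the standard library. It treats the JAR as the ZIP archive that it is and never loads or executes any of the contained classes.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper: tallies the entries of a JAR by file extension.
final class JarEntryCounter {

    // Returns the text after the last '.' in the final path component,
    // or "(none)" when that component has no '.' past its first character.
    static String extensionOf(String entryName) {
        String base = entryName.substring(entryName.lastIndexOf('/') + 1);
        int dot = base.lastIndexOf('.');
        return (dot <= 0) ? "(none)" : base.substring(dot + 1);
    }

    // Counts entries per extension; the result is sorted by extension name.
    static Map<String, Integer> countByExtension(List<String> entryNames) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String name : entryNames) {
            if (name.endsWith("/")) {
                continue; // skip directory entries
            }
            counts.merge(extensionOf(name), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile jar = new ZipFile(args[0])) { // e.g. jrat.jar
            jar.stream().map(ZipEntry::getName).forEach(names::add);
        }
        countByExtension(names)
                .forEach((ext, n) -> System.out.println(ext + "\t" + n));
    }
}
```

Run against the sample, the extension-less files would all be grouped under the "(none)" key.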

As was done before, I can use 7z e -so jrat.jar META-INF/MANIFEST.MF to write the contents of the manifest to stdout. This allows me to figure out which of the 100+ classes is the entry point for execution:

Manifest-Version: 1.0
Ant-Version: Apache Ant 1.8.0
X-COMMENT: Main-Class will be added automatically by build
Class-Path:
Created-By: 1.8.0_25-b18 (Oracle Corporation)
Main-Class: UrulosuvoKupi.IvolisuvAkipu.InihuSevokEpo

So, now I have learned that the class UrulosuvoKupi.IvolisuvAkipu.InihuSevokEpo is where Java will begin executing upon loading the program.
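The same lookup can be done from Java without unpacking anything. The sketch below is one possible approach using the standard java.util.jar classes; EntryPointFinder, mainClassOf, and parse are hypothetical names introduced for illustration. Reading the manifest this way only parses metadata and does not run any code from the JAR.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Hypothetical helper: reports the Main-Class declared in a JAR manifest.
final class EntryPointFinder {

    // Returns the Main-Class attribute of a manifest, or null if absent.
    static String mainClassOf(Manifest manifest) {
        return manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
    }

    // Parses manifest text, e.g. text already extracted with 7z.
    // Each line of the text must be terminated by a newline.
    static Manifest parse(String manifestText) {
        try {
            return new Manifest(new ByteArrayInputStream(
                    manifestText.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        try (JarFile jar = new JarFile(args[0])) { // e.g. jrat.jar
            Manifest manifest = jar.getManifest();  // null if the JAR has none
            System.out.println(manifest == null
                    ? "(no manifest)"
                    : mainClassOf(manifest));
        }
    }
}
```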

Quick Look at the *.class File Sizes

One approach that is often helpful in identifying where the significant code lies is to look at the file sizes. I used the following command line to list the contents of the JAR, filter down to the *.class files, and then sort those numerically by size (the 4th column) in ascending order, so the largest classes appear at the end of the output:

7z l jrat.jar | grep -- \\.class | sort -n -k 4

This produced an output that ended with the following lines:

...
2017-09-12 17:10:04 .....         3070         1463  UrulosuvoKupi/IvolisuvAkipu/OjaposivAkope.class
2017-09-12 17:10:04 .....         3116         1482  UrulosuvoKupi/IvolisuvAkipu/OximusAvukipi.class
2017-09-12 17:10:04 .....         3318         1607  UrulosuvoKupi/IvolisuvAkipu/UjaneSavukIpu.class
2017-09-12 17:10:04 .....         3330         1580  UrulosuvoKupi/IvolisuvAkipu/AfabaKovikUpe.class
2017-09-12 17:10:04 .....         3463         1696  UrulosuvoKupi/IvolisuvAkipu/UgicosovEkipu.class
2017-09-12 17:10:04 .....         4341         2109  UrulosuvoKupi/IvolisuvAkipu/IwitiSavekaPa.class
2017-09-12 17:10:04 .....         4727         2285  UrulosuvoKupi/IvolisuvAkipu/UfexiKevukEpo.class
2017-09-12 17:10:04 .....         4778         2315  UrulosuvoKupi/IvolisuvAkipu/IgiviKuvekePe.class
2017-09-12 17:10:04 .....         6332         2994  UrulosuvoKupi/IvolisuvAkipu/UsobeKevukUpi.class

Again, this isn’t a guarantee that UsobeKevukUpi.class will have all of the significant code in it, but it is an opportunity to record more differentiating data about the classes within the JAR that may prove helpful in guiding us in a productive analysis direction.

Doing the same thing for all of the files that are not *.class files could look like this:

7z l jrat.jar | grep -v -- \\.class | grep ^2017 | sort -n -k 4

This again yields sorted output, but the sizes climb less gradually, with the following three files, significantly larger than the rest, at the end of the list:

2017-09-12 17:10:04 .....         5656         5661  UrulosuvoKupi/IvolisuvAkipu/AcihesuvukupA/aue9vpuh36bgmm4hfdivare2er2e9m3d7lfsun6s79al1e/8
2017-09-12 17:10:04 .....       224147       224217  UrulosuvoKupi/IvolisuvAkipu/AcihesuvukupA/43biqnrdkkukmocf311qd7bluh8l1iki
2017-09-12 17:10:04 .....       236058       236133  UrulosuvoKupi/IvolisuvAkipu/AcihesuvukupA/8vlglhskhjv5gkrifhplabs9rvfgn56fnopap7hjt4eq1fvqkl95simfoc7

Following this, you can load the entire JAR into Ghidra using the import method and following the prompts on the Batch Import dialog that was covered in the last walkthrough.

Peek at InihuSevokEpo in Ghidra

A quick look at the function that was defined as the Main-Class in MANIFEST.MF reveals it to contain 4 bytes of JVM code, which consists of 2 instructions:

                          void __stdcall main_java.lang.String[]_void(void)
                              assume alignmentPad = 0x3
        void                 <VOID>           <RETURN>
                          main_java.lang.String[]_void       XREF[1]:       ram:e0000004(*)  
ram:00010008 b8 00 02          invokestatic      offset CPOOL[2]   = null
ram:0001000b b1                return

This evaluates to the following source code, per Ghidra’s decompiler. Recognize that the class and method names are stored in a separate data table within the class file (the constant pool) and are referenced by index, as with the CPOOL[2] operand of invokestatic in the listing above.

/* Flags:
     ACC_PUBLIC
     ACC_STATIC
   
   public static void main(java.lang.String[]) throws java.lang.Throwable  */

void main_java.lang.String[]_void(void)

{
  IgabukIvikepE.AmoboKevukEpu();
  return;
}

The call to IgabukIvikepE.AmoboKevukEpu() tells us there’s another function inside another class where the next step in the program lives.

Digging into IgabukIvikepE

The second class that we are directed to, IgabukIvikepE, also has just one function, which is what was called from the main, above. Rather than simply passing control to a new function, this one has a bit more complexity to it, and calls 12 other functions in other classes, and performs some arithmetic. I will skip displaying the JVM bytecode disassembly, and just focus on the decompiled source code:

/* Flags:
     ACC_PUBLIC
     ACC_STATIC
   
   public static void AmoboKevukEpu() throws java.lang.Throwable  */

void AmoboKevukEpu_void(void)

{
  int iVar1;
  int iVar2;
  int iVar3;
  dword pdVar4;
  dword pdVar5;
  
  pdVar5 = ApolisovaKupu.OjalaSivikuPu;
  iVar1 = EferuSevekiPe.ApurisEvikopa(0x73);
  iVar2 = UdihesAvikepE.EyohosoVokepA(-0x28);
  iVar1 = iVar1 - iVar2;
  pdVar4 = EvohusaVakipU.EqohesuVokopE;
  iVar2 = AyerosEvekipe.IxuveSovekUpe(0x50);
  iVar3 = IzorasiVakipI.AgaraSovikEpi(-0x50);
  pdVar5[iVar1] = pdVar4[iVar2 - iVar3];
  pdVar5 = InolosovuKupi.OfelosAvakepo;
  iVar1 = AtahuSevekApe.IzuhasUvikuPe(0x1f);
  iVar2 = OlovesAvekopO.EhuveSivukUpo(-0x1c);
  iVar1 = iVar1 - iVar2;
  pdVar4 = OdulisOvekuPi.UyaluSavakIpi;
  iVar2 = AboheSuvekIpu.EwohiSevakIpe(0x18b);
  iVar3 = AsaviSavakOpi.IkoveSavikUpe(-0x1b);
  pdVar5[iVar1] = pdVar4[iVar2 - iVar3];
  AfabaKovikUpe.EpobiKovekOpe();
  pdVar5 = InolosovuKupi.OfelosAvakepo;
  iVar1 = AyerosEvekipe.IxuveSovekUpe(-0x1a6);
  iVar2 = EverisEvekuPo.UqorisAvikaPo(-0x47);
  iVar1 = iVar1 + iVar2;
  pdVar4 = AbolisUvikePo.IweliSuvakapE;
  iVar2 = UxaroSivikIpa.AsiresUvekaPu(-0x261);
  iVar3 = OciraSevukupA.AnereSivakapU(-0xb1);
  pdVar5[iVar1] = pdVar4[iVar2 + iVar3];
  return;
}

This appears to be another obfuscation technique, where 4-line sequences of operations seem to be constructed that match the following recipe:

x = class1.data;
y = class2.func1(i);
z = class3.func2(j);
k = y + z; // or some variation of this

We won’t walk through all of these, but we will look at the first two function calls and open their classes in Ghidra for analysis:

iVar1 = EferuSevekiPe.ApurisEvikopa(0x73);
iVar2 = UdihesAvikepE.EyohosoVokepA(-0x28);

This gives us the two classnames we want to analyze next: EferuSevekiPe and UdihesAvikepE.

Analysis of EferuSevekiPe and UdihesAvikepE

Opening these in Ghidra gives us two classes, each with one simple function.

EferuSevekiPe

int ApurisEvikopa_int_int(int param1)

{
  return param1 + 0xb3;
}

UdihesAvikepE

int EyohosoVokepA_int_int(int param1)

{
  return param1 + 0x11f;
}

As can be seen above, both functions perform a simple addition and then return the answer to the caller. What’s going on here (and in most of the function calls of the original class) is that the obfuscator has inserted a whole bunch of trivial arithmetic helper functions whose only job is to hide simple constants, such as the array indices that the caller computes from their results.
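To see what one of these sequences boils down to, the two helpers can be copied into a scratch class and evaluated directly. In the sketch below the arithmetic is taken from the decompiler output above, while the class name IndexUnwind and the method firstIndex are hypothetical. In the caller, the result is what ends up as the index in pdVar5[iVar1].

```java
// Scratch re-implementation of the two helpers shown in Ghidra's output.
final class IndexUnwind {

    // From class EferuSevekiPe.
    static int ApurisEvikopa(int param1) {
        return param1 + 0xb3;
    }

    // From class UdihesAvikepE.
    static int EyohosoVokepA(int param1) {
        return param1 + 0x11f;
    }

    // Mirrors the first three arithmetic lines of AmoboKevukEpu:
    //   iVar1 = EferuSevekiPe.ApurisEvikopa(0x73);
    //   iVar2 = UdihesAvikepE.EyohosoVokepA(-0x28);
    //   iVar1 = iVar1 - iVar2;
    static int firstIndex() {
        int iVar1 = ApurisEvikopa(0x73);   // 115 + 179 = 294
        int iVar2 = EyohosoVokepA(-0x28);  // -40 + 287 = 247
        return iVar1 - iVar2;              // 294 - 247 = 47
    }

    public static void main(String[] args) {
        System.out.println(firstIndex()); // prints 47
    }
}
```

Three lines and two extra classes are therefore a roundabout way of writing the constant 47.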

Ghidra gives us a great UI for visualizing these on a case-by-case basis, but unwinding all of this using Ghidra alone quickly becomes tedious. This is where tools such as yara, jad, and the other command-line utilities come in handy, as it is easy to script them into flexible bulk operations.

Using jad to extract the contents

I’d recommend creating a new folder named jrat_output to unzip the contents into:

mkdir -p jrat_output
cd jrat_output
7z x ../jrat.jar

Then, since all of the *.class files live within the sub-folder UrulosuvoKupi/IvolisuvAkipu, the following command can be run to decompile all of the Java into new *.jad files that are in the same folder as the corresponding *.class file:

jad -r UrulosuvoKupi/IvolisuvAkipu/*.class

Many of you who are familiar with Java already know that common operations that interact with the system (such as network communication) or handle user input cannot be performed using the core language alone. You need to import library classes. For this reason, searching the *.jad files for any import declarations might be insightful:

grep ^import *.jad

Yields the following output:

AkoxoKevekuPe.jad:import java.io.ByteArrayOutputStream;
AnarukOvukuPa.jad:import java.math.BigInteger;
EkogasavOkupe.jad:import java.lang.reflect.Method;
EqenuKovakoPo.jad:import java.util.Random;
EroyosiVekepE.jad:import java.util.Random;
EsuciKevukipa.jad:import java.util.Random;
IcafisavUkapo.jad:import java.lang.reflect.Method;
IduxikeVikapo.jad:import java.io.ByteArrayOutputStream;
IrozisoVikopu.jad:import java.lang.reflect.Method;
IwitiSavekaPa.jad:import java.lang.reflect.Method;
IxerikoVukupe.jad:import java.math.BigInteger;
OqusuKivokUpo.jad:import java.io.InputStream;
OzekakuVukipI.jad:import java.net.URLConnection;
UdureKavukapE.jad:import java.math.BigInteger;
UfexiKevukEpo.jad:import java.lang.reflect.Method;
UgicosovEkipu.jad:import java.lang.reflect.Method;
UjaneSavukIpu.jad:import java.lang.reflect.Method;
UkolokaVakipE.jad:import java.net.URL;
UqubuSavukApu.jad:import java.lang.reflect.Method;
UvekoKuvikEpo.jad:import java.io.ByteArrayOutputStream;

This is only about 20 classes, rather than the 103 classes that we have source code for. Additionally, just looking at these class imports and navigating the Java API Documentation, I can already draw a number of conclusions from the output:

  1. The code makes a connection, likely via TCP (implied by most URL types)
  2. The code also creates (or parses) URL objects
  3. The code utilizes the Reflection API to evaluate some methods at run-time
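The grep output lists one line per file. For triage it can be handier to invert it, so that each imported class maps to the files that use it. The sketch below is one way this might be scripted; ImportIndex, importsOf, and index are hypothetical names, and it assumes the decompiler emits one import declaration per line, as in the output above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: groups decompiled source files by what they import.
final class ImportIndex {

    // Extracts the imported names from one decompiled source text.
    static List<String> importsOf(String source) {
        List<String> imports = new ArrayList<>();
        for (String line : source.split("\\R")) {
            String t = line.trim();
            if (t.startsWith("import ") && t.endsWith(";")) {
                imports.add(t.substring("import ".length(), t.length() - 1).trim());
            }
        }
        return imports;
    }

    // Maps each imported name to the files that import it (both sorted).
    static Map<String, List<String>> index(Map<String, String> sourcesByFile) {
        Map<String, String> sorted = new TreeMap<>(sourcesByFile);
        Map<String, List<String>> byImport = new TreeMap<>();
        for (Map.Entry<String, String> e : sorted.entrySet()) {
            for (String imp : importsOf(e.getValue())) {
                byImport.computeIfAbsent(imp, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        return byImport;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> sources = new TreeMap<>();
        // args[0] is the folder that holds the *.jad files.
        try (DirectoryStream<Path> dir =
                     Files.newDirectoryStream(Paths.get(args[0]), "*.jad")) {
            for (Path p : dir) {
                sources.put(p.getFileName().toString(),
                        new String(Files.readAllBytes(p), StandardCharsets.UTF_8));
            }
        }
        index(sources).forEach((imp, files) -> System.out.println(imp + " <- " + files));
    }
}
```

Grouped this way, the nine files that import java.lang.reflect.Method would appear on a single line, which makes the heavy use of reflection stand out.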

We can use this technique to help us narrow down the artifacts we need to hand-analyze to get a better understanding of the core components of the malware. We can look at the code in OzekakuVukipI for some understanding of how it is using the imported URLConnection.

OzekakuVukipI analysis

The lone function defined within this class is below:

void EgikokovukEpa_void(void)

{
  URLConnection objectRef;
  int iVar1;
  int iVar2;
  int iVar3;
  InputStream pIVar4;
  dword pdVar5;
  dword pdVar6;
  
  pdVar6 = OgeloSavukApe.OmoluSavikapi;
  iVar1 = AravesuvoKepa.UveveSevokoPi(0x129);
  iVar2 = IgehesuVekepU.EmoheSuvikUpu(-0x23);
  iVar1 = iVar1 - iVar2;
  pdVar5 = AxuhaSovakopE.OsuhasAvukopE;
  iVar2 = AkerusOvikipA.OliraSuvekuPe(-0x18a);
  iVar3 = AboheSuvekIpu.EwohiSevakIpe(-0x44);
  pdVar6[iVar1] = pdVar5[iVar2 - iVar3];
  pdVar6 = EkahasEvikoPe.AlohusUvakipE;
  iVar1 = UxaroSivikIpa.AsiresUvekaPu(-0x15e);
  iVar2 = OweriSuvokiPi.IdirasivEkupe(-0x8b);
  iVar1 = iVar1 - iVar2;
  pdVar5 = AhuhaSovokIpu.OrohoSivikOpo;
  iVar2 = AkerusOvikipA.OliraSuvekuPe(-0xd8);
  iVar3 = OlovesAvekopO.EhuveSivukUpo(-0xb7);
  objectRef = pdVar5[iVar2 ^ iVar3].checkcast(URLConnection);
  pIVar4 = objectRef.getInputStream();
  pdVar6[iVar1] = pIVar4;
  pdVar6 = ApolisovaKupu.OjalaSivikuPu;
  iVar1 = EferuSevekiPe.ApurisEvikopa(-0x253);
  iVar2 = UxaroSivikIpa.AsiresUvekaPu(-0x2f);
  iVar1 = iVar1 + iVar2;
  pdVar5 = InolosovuKupi.OfelosAvakepo;
  iVar2 = IpohasUvekopU.AjehiseVakipi(-0x188);
  iVar3 = OlovesAvekopO.EhuveSivukUpo(-0xbf);
  pdVar6[iVar1] = pdVar5[iVar2 + iVar3];
  pdVar6 = OdulisOvekuPi.UyaluSavakIpi;
  iVar1 = OciraSevukupA.AnereSivakapU(0x44);
  iVar2 = IzorasiVakipI.AgaraSovikEpi(-0x52);
  iVar1 = iVar1 - iVar2;
  pdVar5 = OquleSuvokEpi.OcalasEvokePi;
  iVar2 = UdihesAvikepE.EyohosoVokepA(-0x188);
  iVar3 = EverisEvekuPo.UqorisAvikaPo(-0x55);
  pdVar6[iVar1] = pdVar5[iVar2 + iVar3];
  return;
}

The following block of code depicts where the URLConnection object is retrieved into the objectRef variable, and where its input stream is then stored as an element in the array referenced by pdVar6:

...
pdVar6[iVar1] = pdVar5[iVar2 - iVar3];

pdVar6 = EkahasEvikoPe.AlohusUvakipE;
iVar1 = UxaroSivikIpa.AsiresUvekaPu(-0x15e);
iVar2 = OweriSuvokiPi.IdirasivEkupe(-0x8b);
iVar1 = iVar1 - iVar2;
pdVar5 = AhuhaSovokIpu.OrohoSivikOpo;
iVar2 = AkerusOvikipA.OliraSuvekuPe(-0xd8);
iVar3 = OlovesAvekopO.EhuveSivukUpo(-0xb7);
objectRef = pdVar5[iVar2 ^ iVar3].checkcast(URLConnection);
pIVar4 = objectRef.getInputStream();
pdVar6[iVar1] = pIVar4;

pdVar6 = ApolisovaKupu.OjalaSivikuPu;

I have broken up the code to demonstrate where pdVar6 is first overwritten with the static data that is defined within EkahasEvikoPe.AlohusUvakipE, showing how the index is stored in iVar1 and then later used to store the new URLConnection input stream in pdVar6[iVar1]. You can see this by looking at the various assignment operations going on.

When analyzing obfuscated code, it often pays to identify the lifespan of certain key data and the dependencies between variables. In this case, if we wanted to, we could modify the code above such that successive assignments that overwrite existing data are instead represented as assignments to a new “version” of the local variable.

For example, the following lines store the result of a single function call in iVar1, and then use that value, together with iVar2, to overwrite iVar1 with the result of an arithmetic operation:

iVar1 = UxaroSivikIpa.AsiresUvekaPu(-0x15e);
iVar2 = OweriSuvokiPi.IdirasivEkupe(-0x8b);
iVar1 = iVar1 - iVar2;

We could use this “variable renaming” technique to modify the code in this fashion, appending v1, v2, ..., vN for each newly assigned version of a variable.

iVar1v1 = UxaroSivikIpa.AsiresUvekaPu(-0x15e);
iVar2v1 = OweriSuvokiPi.IdirasivEkupe(-0x8b);
iVar1v2 = iVar1v1 - iVar2v1;

The benefit of doing this is that it lets us define variable dependency relationships that help us map out our understanding of the code. The goal is that, if we can alter the code such that each variable is written to exactly once (compiler writers call this static single assignment form), then we can associate each variable with the value (the data) that it manages within the program.
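The renaming is mechanical enough to script. The sketch below is a deliberately small illustration under stated assumptions: it handles only straight-line code (no branches or loops), it expects one statement per line in the style of the decompiler output, and VersionRenamer and its methods are hypothetical names rather than part of any tool used in this lecture.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: gives every assignment to a local a new version suffix.
final class VersionRenamer {

    // An identifier not preceded by '.' or a word character, so member names
    // such as getInputStream and hex literals such as 0x15e are skipped.
    private static final Pattern IDENT =
            Pattern.compile("(?<![.\\w])[A-Za-z_]\\w*");

    // "name = expression;" with a plain local variable on the left.
    private static final Pattern ASSIGN =
            Pattern.compile("^\\s*([A-Za-z_]\\w*)\\s*=\\s*(.*);\\s*$");

    // Rewrites reads of already-assigned locals to their current version.
    private static String renameUses(String text, Map<String, Integer> versions) {
        Matcher m = IDENT.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            Integer v = versions.get(m.group());
            String replacement = (v == null) ? m.group() : m.group() + "v" + v;
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Renames a straight-line block so each assignment creates a new version.
    static List<String> rename(List<String> statements) {
        Map<String, Integer> versions = new HashMap<>();
        List<String> result = new ArrayList<>();
        for (String stmt : statements) {
            Matcher a = ASSIGN.matcher(stmt);
            if (a.matches()) {
                String rhs = renameUses(a.group(2), versions);          // reads see old versions
                int next = versions.merge(a.group(1), 1, Integer::sum); // the write bumps the version
                result.add(a.group(1) + "v" + next + " = " + rhs + ";");
            } else {
                result.add(renameUses(stmt.trim(), versions)); // e.g. array stores: reads only
            }
        }
        return result;
    }

    public static void main(String[] args) {
        rename(Arrays.asList(
                "iVar1 = UxaroSivikIpa.AsiresUvekaPu(-0x15e);",
                "iVar2 = OweriSuvokiPi.IdirasivEkupe(-0x8b);",
                "iVar1 = iVar1 - iVar2;")).forEach(System.out::println);
    }
}
```

Code with branches or loops would need additional handling where different versions of a variable merge, which this sketch does not attempt.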

For the larger block of code, the rewrite may look more like this:

pdVar6v1 = EkahasEvikoPe.AlohusUvakipE;
iVar1v1 = UxaroSivikIpa.AsiresUvekaPu(-0x15e);
iVar2v1 = OweriSuvokiPi.IdirasivEkupe(-0x8b);
iVar1v2 = iVar1v1 - iVar2v1;
pdVar5v1 = AhuhaSovokIpu.OrohoSivikOpo;
iVar2v2 = AkerusOvikipA.OliraSuvekuPe(-0xd8);
iVar3v1 = OlovesAvekopO.EhuveSivukUpo(-0xb7);
objectRefv1 = pdVar5v1[iVar2v2 ^ iVar3v1].checkcast(URLConnection);
pIVar4v1 = objectRefv1.getInputStream();
pdVar6v1[iVar1v2] = pIVar4v1;

pdVar6v2 = ApolisovaKupu.OjalaSivikuPu;

With the following dependency relationships:

Such an exercise can help diagram the relationships in such a way as to help isolate the input and output variables of a section from the intermediate variables.

[Figure: Variable Dependency Graph]

Recall that objectRefv1 is where the URLConnection object was stored, and is what the code used to call .getInputStream(). Using this dependency graph, I can limit my exploration of the code to the inputs of the 3 local variables iVar2v2, iVar3v1, and pdVar5v1 if I want to learn more about how this code obtains the object that is cast to a URLConnection instance.
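Once every variable is assigned exactly once, the dependency relationships can be derived automatically. The sketch below is a minimal illustration, assuming straight-line input already in versioned form with one plain assignment per line; DependencyGraph, directDependencies, and inputsOf are hypothetical names, and array stores are simply ignored.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: derives variable dependencies from versioned statements.
final class DependencyGraph {

    // An identifier not preceded by '.' or a word character.
    private static final Pattern IDENT =
            Pattern.compile("(?<![.\\w])[A-Za-z_]\\w*");

    // "name = expression;" with a plain local variable on the left.
    private static final Pattern ASSIGN =
            Pattern.compile("^\\s*([A-Za-z_]\\w*)\\s*=\\s*(.*);\\s*$");

    // Maps each assigned variable to the earlier-assigned variables it reads.
    static Map<String, Set<String>> directDependencies(List<String> statements) {
        Map<String, Set<String>> deps = new LinkedHashMap<>();
        for (String stmt : statements) {
            Matcher a = ASSIGN.matcher(stmt);
            if (!a.matches()) {
                continue; // array stores and other forms are ignored here
            }
            Set<String> reads = new TreeSet<>();
            Matcher m = IDENT.matcher(a.group(2));
            while (m.find()) {
                if (deps.containsKey(m.group())) {
                    reads.add(m.group()); // only names assigned earlier count
                }
            }
            deps.put(a.group(1), reads);
        }
        return deps;
    }

    // All variables the target depends on, directly or indirectly.
    static Set<String> inputsOf(String target, Map<String, Set<String>> deps) {
        Set<String> seen = new TreeSet<>();
        Deque<String> work = new ArrayDeque<>(
                deps.getOrDefault(target, Collections.<String>emptySet()));
        while (!work.isEmpty()) {
            String v = work.pop();
            if (seen.add(v)) {
                work.addAll(deps.getOrDefault(v, Collections.<String>emptySet()));
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        List<String> block = Arrays.asList(
                "pdVar5v1 = AhuhaSovokIpu.OrohoSivikOpo;",
                "iVar2v2 = AkerusOvikipA.OliraSuvekuPe(-0xd8);",
                "iVar3v1 = OlovesAvekopO.EhuveSivukUpo(-0xb7);",
                "objectRefv1 = pdVar5v1[iVar2v2 ^ iVar3v1].checkcast(URLConnection);",
                "pIVar4v1 = objectRefv1.getInputStream();");
        // prints [iVar2v2, iVar3v1, pdVar5v1]
        System.out.println(inputsOf("objectRefv1", directDependencies(block)));
    }
}
```

Asking for the inputs of pIVar4v1 instead would add objectRefv1 to that set, which reflects the chain from the array lookup to the input stream.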

Getting Help from Optimization Tools

I mentioned earlier that there’s a utility called ProGuard that can be used to try to obfuscate your code. The tool contains a large number of features, one of which is a code optimizer that can do a number of things, including collapsing disparate code into single classes. If we use the tool, disable its obfuscation features, and enable optimization features, we can enable a utility like this to do a lot of heavy lifting for us. It will take the jar as input, and you can have it produce another jar as output. One caveat is that you will need to have the openjdk-8-jdk package installed in the VM, as it needs to use some of the library classes from that Java version.

apt install -y openjdk-8-jdk

I downloaded v 6.3.0 beta 1 from the following URL:

Within the bundle is a bin/proguardgui.sh which, when executed, brings up the GUI. The GUI has a “Wizard”-like interface, with tabs that are selections for the various features on the left, and options and buttons that allow you to tweak these on the right.

First I use the “Add input…” picker to choose jrat.jar and then use the “Add output…” picker to set what my new output JAR is going to be. Additionally, I had to Remove the existing reference to rt.jar in the lower pane, and then “Add…” a reference to the OpenJDK 8 one at /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/rt.jar in order for the tool to find all the class libraries.

[Figure: ProGuard Input/Output]

I skipped over the Shrinking pane, content with using the defaults there.

In the Obfuscation pane, I uncheck the top Obfuscate check box, deactivating the entire module. I want to simplify the application, and obfuscating it further would only make things worse.

Under the Optimization pane, I left the defaults selected, but additionally checked the following boxes:

[Figure: ProGuard Optimization]

Once done, go to the “Process” pane and click the Process! button, and after a bit of waiting on the work to complete, it will report that it has written the new JAR.

Using 7z to list the contents of the jrat_out.jar, we can see that there are only 13 classes defined now within it:

2020-04-14 16:18:14 .....          233          186  META-INF/MANIFEST.MF
2020-04-14 16:18:14 .....          280          222  UrulosuvoKupi/IvolisuvAkipu/OdulisOvekuPi.class
2020-04-14 16:18:14 .....          280          220  UrulosuvoKupi/IvolisuvAkipu/AxuhaSovakopE.class
2020-04-14 16:18:14 .....          280          219  UrulosuvoKupi/IvolisuvAkipu/InolosovuKupi.class
2020-04-14 16:18:14 .....          280          219  UrulosuvoKupi/IvolisuvAkipu/ApolisovaKupu.class
2020-04-14 16:18:14 .....          280          222  UrulosuvoKupi/IvolisuvAkipu/AbolisUvikePo.class
2020-04-14 16:18:14 .....          280          219  UrulosuvoKupi/IvolisuvAkipu/OquleSuvokEpi.class
2020-04-14 16:18:14 .....          280          220  UrulosuvoKupi/IvolisuvAkipu/AhuhaSovokIpu.class
2020-04-14 16:18:14 .....         9259         4916  UrulosuvoKupi/IvolisuvAkipu/InihuSevokEpo.class
2020-04-14 16:18:14 .....        27973        14664  UrulosuvoKupi/IvolisuvAkipu/AboheSuvekIpu.class
2020-04-14 16:18:14 .....          280          222  UrulosuvoKupi/IvolisuvAkipu/EvohusaVakipU.class
2020-04-14 16:18:14 .....          280          221  UrulosuvoKupi/IvolisuvAkipu/EkahasEvikoPe.class
2020-04-14 16:18:14 .....          280          222  UrulosuvoKupi/IvolisuvAkipu/ItoleSevekEpa.class
2020-04-14 16:18:14 .....          280          219  UrulosuvoKupi/IvolisuvAkipu/OgeloSavukApe.class
...

Looking at the file sizes, it is also apparent that the majority of the code has been consolidated into 2 classes now:

2020-04-14 16:18:14 .....         9259         4916  UrulosuvoKupi/IvolisuvAkipu/InihuSevokEpo.class
2020-04-14 16:18:14 .....        27973        14664  UrulosuvoKupi/IvolisuvAkipu/AboheSuvekIpu.class

Decompiling each of the classes that are 280 bytes in size with jad reveals that they all match some variation of the following structure:

public final class AbolisUvikePo
{
    static Object IweliSuvakapE[] = new Object[46];
}

So, rather than having a bunch of small classes implementing arbitrary arithmetic that may or may not be discarded, we now have just 11 classes that are instantiating Object arrays in memory, for use by the rest of the code.

If you next look at InihuSevokEpo.jad, you will see that the main function is still there, but a lot of the arithmetic has been brought into the function, and the code is now a long sequence of assignment statements from one array to another, with obfuscated indices. Additionally, a good number of these values are being assigned from newly initialized arrays, which are now pretty easy to spot, since the number of classes that really just contain global array variables has been significantly reduced.

As well, you’ll see some lines that look like this:

AhuhaSovokIpu.OrohoSivikOpo[AboheSuvekIpu.UqorisAvikaPo(-87) ^ AboheSuvekIpu.ApurisEvikopa(-115)] = new String(new char[] {
   ... ...
            (char)(AboheSuvekIpu.IkoveSavikUpe(-176) ^ AboheSuvekIpu.IdirasivEkupe(-125))
        });

This is constructing String objects at run-time that will be used later on by the program.
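Because the helper calls take only constants, such strings can be recovered by evaluating the expressions outside of the malware. The sketch below shows the pattern with two stand-in helpers. The names StringUnwind, helperA, and helperB, the offsets 200 and 237, and the argument values are all hypothetical and chosen purely for illustration; the real helper bodies and arguments would have to be copied from the decompiled AboheSuvekIpu and InihuSevokEpo sources.

```java
// Illustration of the run-time string construction pattern.
final class StringUnwind {

    // Hypothetical stand-ins for two consolidated helper functions.
    // The real offsets must be read from the decompiled code.
    static int helperA(int p) {
        return p + 200;
    }

    static int helperB(int p) {
        return p + 237;
    }

    // Each character is the XOR of two helper results, mirroring
    // (char)(AboheSuvekIpu.X(c1) ^ AboheSuvekIpu.Y(c2)) in the sample.
    static String decode() {
        return new String(new char[] {
            (char) (helperA(-176) ^ helperB(-125)), // 24 ^ 112 = 104, 'h'
            (char) (helperA(-171) ^ helperB(-121)), // 29 ^ 116 = 105, 'i'
        });
    }

    public static void main(String[] args) {
        System.out.println(decode()); // prints hi
    }
}
```

Evaluating the expressions in a scratch class like this recovers the strings without ever running the sample itself.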

Additionally, the AboheSuvekIpu.jad and AboheSuvekIpu.class files now contain the source code and the binary code, respectively, for simplified versions of many of the critical functions that were spread around multiple classes. This can enable you to navigate them within a single Ghidra window now, which wasn’t possible before.

Conclusion

This is intended to cover some obfuscation approaches that are common in the wild, and to give you a good understanding of how obfuscation can be scaled to a whole program by repeating a handful of simple patterns. We didn’t completely deconstruct the malware sample; doing so could itself be a dedicated, weeks-long exercise, much of it necessarily tedious, to get the answers you need.

At this point, however, it might also be helpful to load the Java code into a VirtualBox VM so that you could step through these lines with the help of a Java Debugger.


tags: malware java ghidra lecture