I wanted to improve my reverse engineering skills, and I decided that doing some malware analysis was the best way to do this. Malware can be an excellent target for reverse engineering, not only because there are far fewer ethical considerations than with a legitimate application, but also because most malware employs obfuscation and anti-debugger protections that most legitimate applications don't bother with. These force you to develop a much more advanced understanding of reverse engineering.
This article shows how I approached the reverse engineering of a piece of malware in the hope that it can be useful to others who are also relatively new to reverse engineering.
I searched for a simple target on KernelMode.info (as I'm a beginner to malware analysis) and came across Win32 VertexNet. Apparently it was written in C++ and has very little obfuscation, so it should be a great target for me to start with.
Before I go on, I did all of this analysis in a Windows XP virtual machine; you should never try to reverse engineer any malware on a computer that you care about!
Loading the executable into IDA Pro gives a small section of code that does some kind of loop. IDA couldn't find any functions, and the code looked like it had been written directly in x86 assembly. It certainly wasn't the payload!
I could have debugged the executable to unpack its payload manually, but I discovered two sections called UPX0 and UPX1 which, after a quick search, led me to the application it had been packed with: UPX.
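The section names can also be checked without IDA. The following is only a minimal sketch of how that check could look, assuming the file has already been read into memory; `count_upx_sections`, `rd16` and `rd32` are hypothetical names of my own, and the offsets come from the standard PE/COFF header layout rather than from anything specific to this sample.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read little-endian integers without assuming alignment. */
static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: returns the number of sections whose name starts
   with "UPX", or -1 if the buffer does not look like a PE image. */
int count_upx_sections(const uint8_t *img, size_t len)
{
    if (len < 0x40 || img[0] != 'M' || img[1] != 'Z')
        return -1;

    uint32_t pe = rd32(img + 0x3C);              /* e_lfanew */
    if (pe > len || len - pe < 24 || memcmp(img + pe, "PE\0\0", 4) != 0)
        return -1;

    uint16_t nsections = rd16(img + pe + 6);     /* NumberOfSections */
    uint16_t optsize   = rd16(img + pe + 20);    /* SizeOfOptionalHeader */
    size_t table = (size_t)pe + 24 + optsize;    /* first section header */

    int hits = 0;
    for (uint16_t i = 0; i < nsections; i++) {
        size_t off = table + (size_t)i * 40;     /* each header is 40 bytes */
        if (off > len || len - off < 40)
            break;                               /* truncated section table */
        if (memcmp(img + off, "UPX", 3) == 0)    /* Name is the first 8 bytes */
            hits++;
    }
    return hits;
}
```

A non-zero result is only a hint, not proof: section names are trivial to rename, so a packed file may not show them at all.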
After downloading UPX, I was able to unpack the executable with the following command line:
upx VNet.exe -d -o unpacked.exe
Now that we have the unpacked executable, IDA finds 144 functions, all of which can be successfully decompiled!
Before attaching a debugger, I wanted to browse through a few functions to get a vague understanding of the program flow.
Let's start by analysing WinMain; there is no obfuscation on this function, and you can generate the following pseudocode by pressing F5:
int __stdcall WinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance, LPSTR lpCmdLine, int nShowCmd)
{
    struct tagMSG Msg; // [sp+4h] [bp-1Ch]@1

    sub_406980();
    while ( GetMessageA(&Msg, NULL, 0, 0) )
    {
        TranslateMessage(&Msg);
        DispatchMessageA(&Msg);
    }
    return 0;
}
What a lovely introduction to malware RE: we can clearly tell that sub_406980() is the initialisation function of the malware, so select it and hit N to give it a suitable name (init).
Double click on the init function to view its pseudocode. You can press Esc to go back to the last function you viewed.
From here we can see some more function calls which we will come back to later, but the most apparent are the two calls to CreateThread:
CreateThread(NULL, 0, (LPTHREAD_START_ROUTINE)sub_404400, NULL, 0, &ThreadId);
return CreateThread(NULL, 0, (LPTHREAD_START_ROUTINE)sub_406300, NULL, 0, &ThreadId);
I named the two threads' start routines thread1 and thread2.
I saw that sub_4022B0 was one of the most frequently called functions, so I took a closer look at what it does.
Initially, this function looked quite daunting:
int __usercall sub_4022B0<eax>(unsigned int a1<eax>, int a2<edx>, int a3<ecx>)
{
    unsigned int v3; // esi@1
    signed int v4; // eax@5

    v3 = a1;
    if ( a1 < 4 )
    {
LABEL_4:
        if ( !v3 )
            return 0;
    }
    else
    {
        while ( *(_DWORD *)a2 == *(_DWORD *)a3 )
        {
            v3 -= 4;
            a3 += 4;
            a2 += 4;
            if ( v3 < 4 )
                goto LABEL_4;
        }
    }
    v4 = *(_BYTE *)a2 - *(_BYTE *)a3;
    if ( *(_BYTE *)a2 != *(_BYTE *)a3 )
        return (v4 >> 31) | 1;
    if ( v3 <= 1 )
        return 0;
    v4 = *(_BYTE *)(a2 + 1) - *(_BYTE *)(a3 + 1);
    if ( *(_BYTE *)(a2 + 1) != *(_BYTE *)(a3 + 1) )
        return (v4 >> 31) | 1;
    if ( v3 <= 2 )
        return 0;
    v4 = *(_BYTE *)(a2 + 2) - *(_BYTE *)(a3 + 2);
    if ( *(_BYTE *)(a2 + 2) != *(_BYTE *)(a3 + 2) )
        return (v4 >> 31) | 1;
    if ( v3 > 3 )
    {
        v4 = *(_BYTE *)(a2 + 3) - *(_BYTE *)(a3 + 3);
        return (v4 >> 31) | 1;
    }
    return 0;
}
It appears to use a2 and a3 as pointers to integers; however, it later goes on to use them as pointers to characters.
One of the best ways I've found to help understand what a function does is by adding and removing whitespace to make it match your own style. Whilst doing this it became clear what the function does: it performs a string comparison. The reason that the pointers are dereferenced to integers instead of characters at the start is just an optimisation; it is faster to compare blocks of 4 bytes at a time than to compare all bytes individually.
int compareStrings(unsigned int length, char *string1, char *string2) {
    unsigned int i = length;

    if(length < 4) {
LABEL_4:
        if(i == 0) return 0;
    }
    else {
        while(*(int *)string1 == *(int *)string2) {
            i -= 4;
            string2 += 4;
            string1 += 4;
            if(i < 4) goto LABEL_4;
        }
    }

    signed int difference;

    difference = (unsigned __int8)string1[0] - (unsigned __int8)string2[0];
    if(string1[0] != string2[0]) return (difference >> 31) | 1;
    if(i <= 1) return 0;

    difference = (unsigned __int8)string1[1] - (unsigned __int8)string2[1];
    if(string1[1] != string2[1]) return (difference >> 31) | 1;
    if(i <= 2) return 0;

    difference = (unsigned __int8)string1[2] - (unsigned __int8)string2[2];
    if(string1[2] != string2[2]) return (difference >> 31) | 1;

    if(i > 3) {
        difference = (unsigned __int8)string1[3] - (unsigned __int8)string2[3];
        return (difference >> 31) | 1;
    }
    return 0;
}
I'm not entirely sure of this function's purpose, considering it does much the same job as strncmp (with a slightly different parameter order). Strictly speaking, it never checks for a NUL terminator, so its behaviour is closer to memcmp. A futile attempt at obfuscation, or an incompetent programmer? You decide!
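To convince myself that the block-wise comparison really is just an optimisation, it helps to write the same idea in portable C. This is only a sketch of the technique, not the malware's code: `compare_bytes` is a hypothetical name, and it uses memcpy to load each 4-byte block so that it does not depend on alignment. The result is -1, 0 or 1, which is what the `(difference >> 31) | 1` expression produces in the decompiled version.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: compare `length` bytes, four at a time where possible.
   Returns -1, 0 or 1, like the decompiled routine. */
int compare_bytes(size_t length, const void *p1, const void *p2)
{
    const unsigned char *a = p1;
    const unsigned char *b = p2;

    /* Fast path: skip over 4-byte blocks that are identical. */
    while (length >= 4) {
        uint32_t wa, wb;
        memcpy(&wa, a, 4);   /* memcpy avoids unaligned or aliasing reads */
        memcpy(&wb, b, 4);
        if (wa != wb)
            break;           /* the mismatch is somewhere in this block */
        a += 4;
        b += 4;
        length -= 4;
    }

    /* Slow path: find the first differing byte in what is left. */
    for (size_t i = 0; i < length; i++) {
        if (a[i] != b[i])
            return a[i] < b[i] ? -1 : 1;  /* same sign as (diff >> 31) | 1 */
    }
    return 0;
}
```

For any two buffers of at least `length` bytes, the sign of the result should agree with `memcmp(p1, p2, length)`, including when the data contains embedded zero bytes.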
Regardless, now that we know the correct function declaration, we can identify the types of all variables which are passed to it, and get a better understanding of any code that uses it in general.
Now that we've got a reasonable understanding of the program's flow, let's try using the strings view (Shift + F12) to identify some more functions of interest.
There are definitely some interesting strings, including PHP URLs, HTTP headers, and even some debug text. You can get cross-references ("xrefs") to exactly where a string is referenced in the code; this can be used to identify a few more functions.
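If you don't have IDA to hand, the basic idea behind a strings view is easy to reproduce. The sketch below is a simplified stand-in, not how IDA works internally: it assumes the file is already in memory, only looks for printable ASCII (IDA can also find other string types), and `dump_ascii_strings` is a hypothetical name.

```c
#include <stddef.h>
#include <stdio.h>

/* Print every run of at least `min_len` printable ASCII characters found in
   `data`, one per line, prefixed with its offset. Returns the number of runs. */
size_t dump_ascii_strings(const unsigned char *data, size_t len,
                          size_t min_len, FILE *out)
{
    size_t count = 0;
    size_t start = 0;   /* offset where the current run began */
    size_t run = 0;     /* length of the current run */

    /* Iterate one step past the end so a run touching the end is flushed. */
    for (size_t i = 0; i <= len; i++) {
        int printable = i < len && data[i] >= 0x20 && data[i] <= 0x7E;
        if (printable) {
            if (run == 0)
                start = i;
            run++;
            continue;
        }
        if (run > 0 && run >= min_len) {
            fprintf(out, "%08zx  %.*s\n", start, (int)run,
                    (const char *)data + start);
            count++;
        }
        run = 0;
    }
    return count;
}
```

A minimum length of 4 or 5 is a common choice; anything shorter tends to drown the useful strings in random byte sequences that happen to be printable.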
However, this is definitely not all of the data used by the malware; some things are missing.
Later on in the code I found several calls to sub_409360. It is easy to identify this function's purpose as getResource, since it contains some very standard Windows resource-loading code (FindResource, LoadResource, etc.). It also contains useful debug strings, including "[ERROR] while finding resource" and "[ERROR] while loading ressource", which confirm this.
After a quick search I found an application which can extract resources from a Windows executable, ResourceHacker. These are the resources that the malware contains:
All resources are very small and could easily have been placed in the .text section along with the rest of the code, so why are they stored as resources? Another futile attempt at obfuscation?
After taking a reasonable look at a few other functions, I attached a debugger and started to run through the code. By this point, the gist of what it did was fairly obvious.
Upon execution, VertexNet copies itself to C:\dropped.exe.
It then creates a registry entry called vnet with the value C:\dropped.exe at the following path:
HKEY_CURRENT_USER\SOFTWARE\Microsoft\Windows\CurrentVersion\Run\
This causes the malware to automatically execute at startup.
Two main threads are then created: thread1, which sends a request containing information about your system to adduser.php, and thread2, which fetches commands remotely from tasks.php.
The malware can be removed manually by following these steps:
1. Delete the original executable, and vnlogs.log in the same directory if there is one.
2. Delete C:\dropped.exe, and vnlogs.log in the same directory if there is one.
3. Delete the registry entry vnet at HKEY_CURRENT_USER\SOFTWARE\Microsoft\Windows\CurrentVersion\Run\
But I would recommend performing a full system scan instead of doing it manually. The malware could have been used to remotely install additional malware on your system (it is uncommon to be infected with just a single piece of malware). The malware could also be stored in a system backup if you made one whilst infected, which an anti-virus program can remove.
In thread2, commands are fetched remotely from tasks.php and are then handled by sub_404D50. It is blindingly obvious what all of these commands do, since they are sent as strings. After further analysis, I was able to verify that they do what their names suggest (they have not been mislabelled as an attempt at obfuscation); so here is a list of all handled commands:
When the setkeylogger:: command is received, a new thread is created (running sub_404300, which I renamed keyloggerThread):
CreateThread(NULL, 0, (LPTHREAD_START_ROUTINE)keyloggerThread, NULL, 0, &ThreadId);
Keylogs are stored in the file vnlogs.log, which can be read through the getklogs command.
The VertexNet malware employs some very basic protections such as packing itself with UPX, implementing its own version of standard C library functions, and storing sensitive information as resources; however, these are all extremely limited approaches which can be bypassed using freely available tools.
The author of the malware left debug strings in the final executable which can be used to easily identify functions via xrefs.
There are no real obfuscations or anti-debugger mechanisms. The only thing that makes debugging slightly more difficult is the usage of multithreading.
All network traffic is sent in plaintext, without any encryption; commands are sent as strings rather than as enumerated values (plain numeric codes), which would have been at least a bit less obvious.
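Because the commands are plain strings, pulling them apart when reading captured traffic takes very little code. The sketch below is written from the analyst's side and rests on an assumption: that a task looks like `name::argument`, which I am inferring from the trailing `::` on setkeylogger::. `split_command` is a hypothetical helper and the argument value in the usage note is made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper for reading captured traffic: split a task string of
   the assumed form "name::argument" at the first "::".
   Copies the command name into `name` and returns a pointer to the argument
   (an empty string if there is none), or NULL if `name` is too small. */
const char *split_command(const char *task, char *name, size_t name_size)
{
    const char *sep = strstr(task, "::");
    size_t n = sep ? (size_t)(sep - task) : strlen(task);

    if (n >= name_size)
        return NULL;          /* no room for the name plus terminator */
    memcpy(name, task, n);
    name[n] = '\0';
    return sep ? sep + 2 : task + n;
}
```

Calling `split_command("setkeylogger::1", name, sizeof name)` would leave "setkeylogger" in `name` and return a pointer to "1". Had the author used numeric codes instead, an analyst would first have to map each value back to its handler.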