
Introduction

SmokeLoader is a popular bot that has existed since 2011. It is mostly used to deliver other malware. It keeps evolving, with new features being introduced all the time.

I wrote this article while learning about a few popular malware techniques and how SmokeLoader leverages them to avoid AV detection and make reversing the binary harder.

The file we’ll be looking at is shellcode written in assembly and compiled with FASM, which makes analysis more challenging. Malware authors use this approach to keep the file down to a few KB, with no imports, and to evade AV detection.

image image image image

A few abnormalities that stand out at first glance are high entropy and rwx section permissions.

Sandbox Analysis:

Running the file in ANY.RUN shows that it injects code into explorer.exe, adds persistence, and then connects to its C2 server. The sandbox run does not tell us what it does next, because no further processes or network activity appear.

Process tree

image

Persistence: copies itself to AppData

image

Persistence: T1547.001 (Registry Run Keys / Startup Folder)

image

C2 connection

image

Code Reversing

Part 1: Anti-debug check

Now we will reverse the binary and learn more about its code structure. The initial phase of SmokeLoader contains a large number of junk jumps that do nothing to advance the code and only slow down static analysis of the sample.

image

After following the jumps, we reach a few anti-debug checks. To determine whether the application is running under a debugger, the code references the PEB and reads the BeingDebugged and NtGlobalFlag fields. We can bypass both checks by patching the value each one returns to 0. One thing to note here: the PEB pointer is also copied into EBX for use in later stages of the code, while the anti-debug checks themselves read the PEB through EAX.

image
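The two checks can be modelled offline on a snapshot of the PEB. The sketch below is only an illustration, not the sample's code: it assumes the 32-bit PEB layout (BeingDebugged at offset 0x02, NtGlobalFlag at offset 0x68) and assumes the NtGlobalFlag check looks for the usual debug-heap bits (0x70). `make_peb` is a hypothetical helper that builds a fake snapshot for the demo.

```python
import struct

# Offsets in the 32-bit PEB (the sample is 32-bit code).
PEB_BEING_DEBUGGED = 0x02   # BYTE
PEB_NT_GLOBAL_FLAG = 0x68   # DWORD

# Heap-checking flags Windows sets in NtGlobalFlag when a process is
# created under a debugger: tail check | free check | validate parameters.
DEBUG_HEAP_FLAGS = 0x10 | 0x20 | 0x40


def looks_debugged(peb: bytes) -> bool:
    """Return True if either PEB-based check would fire on this snapshot."""
    being_debugged = peb[PEB_BEING_DEBUGGED]
    (nt_global_flag,) = struct.unpack_from("<I", peb, PEB_NT_GLOBAL_FLAG)
    return being_debugged != 0 or (nt_global_flag & DEBUG_HEAP_FLAGS) != 0


def make_peb(being_debugged: int, nt_global_flag: int) -> bytes:
    """Build a zero-filled fake PEB snapshot (hypothetical demo helper)."""
    peb = bytearray(0x100)
    peb[PEB_BEING_DEBUGGED] = being_debugged
    struct.pack_into("<I", peb, PEB_NT_GLOBAL_FLAG, nt_global_flag)
    return bytes(peb)


if __name__ == "__main__":
    print(looks_debugged(make_peb(1, 0x70)))  # debugger attached
    print(looks_debugged(make_peb(0, 0)))     # both fields patched to 0
```

Patching both fields to 0, as described above, is what makes `looks_debugged` return False in this model.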

Part 2: PEB traversal

The next part of the code locates loaded DLLs through the PEB so that it can resolve APIs dynamically at run time. The code below sets EDX to the image base address of ntdll.dll by traversing the PEB data structures.

image

Code                                 Comment
mov esi,dword ptr ds:[ebx+C]         ESI = PEB->Ldr (PEB_LDR_DATA)
mov esi,dword ptr ds:[esi+1C]        ESI = PEB->Ldr.InInitializationOrderModuleList.Flink
mov edx,dword ptr ds:[esi+8]         EDX = image base of ntdll (LDR_MODULE’s BaseAddress)

In a little more detail:

  1. The code reaches PEB_LDR_DATA through the pointer at offset 0xC of the PEB.
  2. At offset 0x1C of PEB_LDR_DATA, it reads the Flink pointer of InInitializationOrderModuleList.
  3. Since ntdll.dll is the first module loaded, it’s the first LDR_MODULE entry in InInitializationOrderModuleList.

So with ESI pointing to PEB->Ldr.InInitializationOrderModuleList.Flink,

  • [ESI+0] is the list entry’s Flink,
  • [ESI+4] is the list entry’s Blink, and
  • [ESI+8] is the BaseAddress value of the first LDR_MODULE entry (ntdll.dll’s LDR_MODULE).

The below diagram shows the overall data structure.

image
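The three-instruction walk above can be replayed on a flat memory snapshot. This is a minimal sketch under stated assumptions: memory is modelled as a `bytes` buffer in which an address is simply an offset, and `build_demo_memory`, the addresses 0x100/0x200/0x300 and the base value 0x77000000 are all made up for the demo.

```python
import struct


def read_dword(mem: bytes, addr: int) -> int:
    """Read a little-endian 32-bit value, like `mov reg, dword ptr [addr]`."""
    return struct.unpack_from("<I", mem, addr)[0]


def ntdll_base_from_peb(mem: bytes, peb: int) -> int:
    """Follow PEB -> Ldr -> InInitializationOrderModuleList.Flink -> base."""
    ldr = read_dword(mem, peb + 0x0C)      # mov esi,[ebx+C]
    entry = read_dword(mem, ldr + 0x1C)    # mov esi,[esi+1C]
    return read_dword(mem, entry + 0x08)   # mov edx,[esi+8]


def build_demo_memory():
    """Fake memory with just the three pointers the walk needs (hypothetical)."""
    mem = bytearray(0x400)
    peb, ldr, links = 0x100, 0x200, 0x300
    struct.pack_into("<I", mem, peb + 0x0C, ldr)           # PEB.Ldr
    struct.pack_into("<I", mem, ldr + 0x1C, links)         # first list entry
    struct.pack_into("<I", mem, links + 0x08, 0x77000000)  # BaseAddress
    return bytes(mem), peb


if __name__ == "__main__":
    mem, peb = build_demo_memory()
    print(hex(ntdll_base_from_peb(mem, peb)))
```

The offsets only hold for 32-bit processes; a 64-bit PEB uses different ones.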

The next instructions, starting with MOV EDI, ESI, set up a pointer to a specific address in the Smoke payload. LODSD then loads the DWORD stored there into EAX, so EAX now holds the API hash 0976055C.

image

Part 3: Hashing and Resolving API

Next comes a call to the API resolution function, which is used to resolve each API the loader needs. The code can be divided into three segments:

  • The first segment - locates the export table of a DLL.

  • The middle segment - hashes each export name and compares it with the wanted hash.

  • The final segment - gets the absolute address of the resolved API and moves it to a register for later use.

image

With the base of ntdll.dll retrieved, the next step is to parse the DLL image and find its export table. Inside this function EBX holds the image base of the DLL. All three code blocks are explained below.

Code                                 Comment
mov edi,dword ptr ds:[ebx+3C]        EDI = DOS header e_lfanew (offset of the PE header)
mov edi,dword ptr ds:[edi+ebx+78]    EDI = RVA of export table [edi+ebx = PE header]
add edi,ebx                          EDI = VA of export table
push edi                             Save the export table VA (restored later by pop edi)
mov ecx,dword ptr ds:[edi+18]        ECX = NumberOfNames
mov edx,dword ptr ds:[edi+20]        EDX = RVA of export names table
add edx,ebx                          EDX = VA of export names table

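smoke.40139B

dec ecx                 // Name counter: walk the names table from the last entry
push ecx                // Save the counter, because ECX is reused for the hash
mov esi,[edx+ecx*4]     // Names table entries are 4 bytes long. ESI = RVA of the (n)th name
add esi,ebx             // ESI = VA of the (n)th name
mov eax,esi             // EAX = pointer to the current byte of the name
xor ecx,ecx             // Clear the hash accumulator once per name

smoke.4013A6

// Hashing algorithm
xor ch,[eax]
rol ecx,8
xor cl,ch
inc eax                 // Advance to the next byte of the name
cmp byte ptr [eax],0    // Check for the terminating null byte of the name
jne smoke.4013A6        // Hash the next byte if [eax] is not 0
cmp ecx,ebp             // Compare with the wanted hash `0976055C`
pop ecx                 // Restore the name counter (POP does not change the flags)
jne smoke.40139B        // If the hash didn't match, loop back and try the next name
pop edi                 // Restore EDI = VA of the export table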
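To check the reading of the hashing loop, here is a small re-implementation of those three instructions. It is a sketch, not code taken from the sample, and `smoke_hash` is a name chosen here. One small difference: the assembly hashes the first byte before testing for the terminator, which only matters for an empty name.

```python
def smoke_hash(name: bytes) -> int:
    """Hash an export name the way the loop at smoke.4013A6 does."""
    ecx = 0                                            # xor ecx,ecx
    for byte in name:
        ecx ^= byte << 8                               # xor ch,[eax]
        ecx = ((ecx << 8) | (ecx >> 24)) & 0xFFFFFFFF  # rol ecx,8
        ecx ^= (ecx >> 8) & 0xFF                       # xor cl,ch
    return ecx


if __name__ == "__main__":
    print(f"{smoke_hash(b'ZwAllocateVirtualMemory'):08X}")
```

Feeding it `b"ZwAllocateVirtualMemory"` gives 0976055C, the same value the loader compares against, so hashes seen in the sample can be mapped back to API names by hashing every export name of a DLL offline.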
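Code                                 Comment
mov eax,[edi+24]                     EAX = RVA of the function ordinal table
add eax,ebx                          EAX = VA of the function ordinal table
movzx ecx,[eax+ecx*2]                ECX = ordinal of the resolved API (2-byte entries)
mov eax,[edi+1C]                     EAX = RVA of AddressOfFunctions (the export address table)
add eax,ebx                          EAX = VA of the export address table
mov eax,[eax+ecx*4]                  EAX = RVA of the resolved API
add eax,ebx                          EAX = VA of the resolved API
mov [esp+1C],eax                     Overwrite the saved EAX slot on the stack so that popad loads the API address into EAX
popad                                Restore all registers (EAX now holds the API address)
ret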
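Putting the three segments together, the whole lookup can be sketched against an image buffer. This is an illustration under assumptions, not the sample's code: the buffer is laid out as it would be mapped in memory (so an RVA is an offset into it), and `resolve_by_hash`, `build_demo_image`, the fake base 0x77000000 and the fake function RVAs are all hypothetical.

```python
import struct


def u16(img: bytes, off: int) -> int:
    return struct.unpack_from("<H", img, off)[0]


def u32(img: bytes, off: int) -> int:
    return struct.unpack_from("<I", img, off)[0]


def smoke_hash(name: bytes) -> int:
    ecx = 0
    for byte in name:
        ecx ^= byte << 8                               # xor ch,[eax]
        ecx = ((ecx << 8) | (ecx >> 24)) & 0xFFFFFFFF  # rol ecx,8
        ecx ^= (ecx >> 8) & 0xFF                       # xor cl,ch
    return ecx


def resolve_by_hash(img: bytes, base: int, wanted: int):
    """Return the VA of the export whose name hashes to `wanted`, or None."""
    pe = u32(img, 0x3C)                    # e_lfanew
    exp = u32(img, pe + 0x78)              # RVA of the export table
    n_names = u32(img, exp + 0x18)         # NumberOfNames
    funcs = u32(img, exp + 0x1C)           # AddressOfFunctions
    names = u32(img, exp + 0x20)           # AddressOfNames
    ordinals = u32(img, exp + 0x24)        # AddressOfNameOrdinals
    for i in range(n_names - 1, -1, -1):   # dec ecx: start from the last name
        name_rva = u32(img, names + 4 * i)
        end = img.index(b"\0", name_rva)
        if smoke_hash(img[name_rva:end]) == wanted:
            ordinal = u16(img, ordinals + 2 * i)
            return base + u32(img, funcs + 4 * ordinal)
    return None


def build_demo_image() -> bytes:
    """Tiny fake image with two exports (hypothetical, for the demo only)."""
    img = bytearray(0x400)
    struct.pack_into("<I", img, 0x3C, 0x80)            # e_lfanew
    struct.pack_into("<I", img, 0x80 + 0x78, 0x100)    # export table RVA
    struct.pack_into("<I", img, 0x100 + 0x18, 2)       # NumberOfNames
    struct.pack_into("<I", img, 0x100 + 0x1C, 0x140)   # AddressOfFunctions
    struct.pack_into("<I", img, 0x100 + 0x20, 0x150)   # AddressOfNames
    struct.pack_into("<I", img, 0x100 + 0x24, 0x160)   # AddressOfNameOrdinals
    struct.pack_into("<II", img, 0x140, 0x1000, 0x2000)  # function RVAs
    struct.pack_into("<II", img, 0x150, 0x180, 0x1A0)    # name RVAs
    struct.pack_into("<HH", img, 0x160, 1, 0)            # name -> function index
    for off, name in ((0x180, b"NtClose"), (0x1A0, b"ZwAllocateVirtualMemory")):
        img[off:off + len(name)] = name
    return bytes(img)


if __name__ == "__main__":
    addr = resolve_by_hash(build_demo_image(), 0x77000000, 0x0976055C)
    print(hex(addr))
```

The same function can be pointed at a memory dump of a real DLL to check which export a given hash resolves to.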

Now that we have worked out statically how the code operates, we know that the resolved API address ends up in EAX. Setting a breakpoint on the function's ret therefore reveals which API was resolved while debugging dynamically.

image

First, ZwAllocateVirtualMemory is resolved from ntdll.dll. This API is used multiple times later in the code to allocate memory into which data is written.

The next call performs a couple of memory allocations and copies some encrypted data from the payload into them.

image image

Part 4: Decrypting C2

Later, the following call takes us to a piece of shellcode that was already present in the raw file. The goal of this shellcode is to decrypt the encrypted C2 domain, which is highlighted in red in the image below. The decryption procedure is straightforward: each byte is XORed with the key FF.

image image
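The decryption step is easy to reproduce outside the debugger. The sketch below assumes a plain byte-by-byte XOR with the key 0xFF, as described above. The real encrypted bytes are not reproduced here, so the demo round-trips a placeholder domain (`c2.example.invalid`) instead.

```python
def xor_decrypt(data: bytes, key: int = 0xFF) -> bytes:
    """XOR every byte with a single-byte key. The same call also encrypts."""
    return bytes(b ^ key for b in data)


if __name__ == "__main__":
    # Placeholder value standing in for the bytes dumped from the payload.
    encrypted = xor_decrypt(b"c2.example.invalid")
    print(encrypted.hex())
    print(xor_decrypt(encrypted).decode("ascii"))
```

With the key 0xFF, the XOR is the same as a bitwise NOT of each byte, which is why the encrypted domain shows up as a run of high bytes in the hex view.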

It’s pretty interesting that VirusTotal has only one hit for the C2 address, and the web page is now unavailable.

image

End of Part 1

This is not the end of the sample. We have only seen half of its abilities, and the next part covers more hashing, code injection, anti-VM and anti-analysis checks, bot ID creation, and the HTTP POST to the C2.

Junk code and API hashing, as well as walking the LDR structures to locate loaded modules, are tactics I notice more and more frequently in my research.

This research has taught us how to identify API hashes and how to analyse shellcode efficiently. The overall flow is summarized in this flow graph: Draw.io graph


References

  1. PEB structure (winternl.h): the PEB structure from the official MSDN documentation. A few fields are shown as reserved; use the undocumented ntinternals site for those.

  2. PEB_LDR_DATA structure (winternl.h): the PEB_LDR_DATA structure from the official MSDN documentation.

  3. Undocumented information for the PEB, PEB_LDR_DATA and LDR_MODULE: https://undocumented.ntinternals.net/

  4. Another nice article on the PEB, PEB_LDR_DATA and LDR_MODULE: https://www.geoffchappell.com/studies/windows/km/ntoskrnl/inc/api/pebteb/peb/index.htm

