Wednesday, December 14, 2011

Malware Analysis Tutorial 7: Exploring Kernel Data Structure

Learning Goals:

  1. Explore kernel data structures effectively, e.g., using WinDbg.
  2. Understand the important kernel structures that Windows uses to maintain live information about processes and threads.
  3. Know the difference between hardware and software breakpoints and use them effectively during debugging.
  4. Practice code reverse engineering to understand assembly code.
Applicable to:
  1. Computer Architecture
  2. Operating Systems Security
  3. Assembly Language
  4. Operating Systems
1. Introduction

     This tutorial shows you how to explore Windows kernel data structures using WinDbg. This is very helpful for understanding the infection techniques employed by Max++. We will look at some interesting data structures such as the TIB (Thread Information Block), the PEB (Process Environment Block), and the list of loaded modules/DLLs of a process. We will examine what Max++ did to some important kernel DLL files.

1.1 Lab Setup
If you have not installed WinDbg on your host machine (note: not the VM instance), please follow Tutorial 1 first to install the VirtualBox platform (a small LAN consisting of one Linux gateway and one Windows instance infected with Max++). Then follow Tutorial 4 to install WinDbg on the host machine and configure the piped COM port for the VM instance to be debugged. The following are the steps for launching the VM instance and WinDbg:

  1. Launch the Windows guest OS in VirtualBox first. Boot it in the "Debugged" mode. (Follow Tutorial 4 for how to include the "Debugged" boot option).
  2. On your Host machine, start a command window and change directory to "c:\Program Files\Debugging Tools for Windows(x86)" and type the following.
    windbg -b -k com:pipe,port=\\.\pipe\com_11
  3. You should see the following in the WinDbg window: "Breakpoint on INT 3". It means that execution is currently stopped at a software breakpoint (INT 3). Type "g" (meaning "go") to let it continue. If necessary, type "g" a second time.
  4. Occasionally you might find that your Windows guest OS is frozen. In that case, simply type "g" in the WinDbg window (on the host).
  5. Now start the Immunity Debugger in the windows guest OS, and load the Max++ (see Tutorial 1 for where to get Max++ binary).
  6. In the Code Pane of IMM, right click to go to "0x401018" and then set a HARDWARE BREAKPOINT at it (right click and select "Breakpoint->Hardware, on Execution"). This is where we stopped in Tutorial 6. As you can see, right now the instruction at 0x401018 is "DEC DWORD [EAX+20]". Later, when we stop at this address, the instruction will have been overwritten due to the self-extracting feature of Max++ (see details in Tutorial 6). Then press F9 (continue) to run to 0x00401018.

 1.1.1 Why Hardware Breakpoint?
  Notice that you have to use a hardware breakpoint in Step 6. Why not a software breakpoint? Think about how a software breakpoint is implemented. When you set a software breakpoint in a debugger, the debugger actually overwrites the first byte of the instruction at that location with the one-byte "INT 3" opcode (0xCC). When execution reaches the "INT 3", the Windows kernel calls the debugger to handle the interrupt. The debugger then stops and highlights the location in its window, and when you resume execution or cancel the breakpoint, it writes the original byte back.

 Recall that the malware performs self-extraction (see Tutorial 6). It overwrites the "INT 3", so you would never stop at the desired location 0x401018! That is the reason we use a hardware breakpoint. When a hardware breakpoint is set, the address is recorded in one of the four hardware breakpoint address registers (DR0 to DR3) provided by an Intel CPU. The CPU compares the address of every instruction it executes against these registers and stops on a match, no matter what bytes are stored there. The only drawback is that you can have at most 4 hardware breakpoints set at any time.
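The difference can be sketched with a toy model in Python. This is only an illustration, not how a real debugger is written: the function names, the filler NOP bytes, and the list standing in for DR0 to DR3 are all assumptions made for the sketch. The two instruction encodings are the ones discussed in Step 6 and in Section 2.

```python
# Toy model (not a real debugger): a byte-patch breakpoint is lost when the
# code is rewritten, while an address held in a debug register is not.
INT3 = 0xCC        # one-byte opcode of INT 3
BASE = 0x401018    # address of the first byte of the toy code region

# Before self-extraction: FF 48 20 encodes DEC DWORD PTR [EAX+20].
# The trailing NOPs (90) are filler invented for this sketch.
code = bytearray.fromhex("FF4820909090")
# After self-extraction: 64 A1 18 00 00 00 encodes MOV EAX, FS:[18].
unpacked = bytes.fromhex("64A118000000")

debug_registers = []  # stands in for the four address slots DR0..DR3


def set_software_breakpoint(memory, address):
    """Patch INT 3 over the first byte and return the saved original byte."""
    index = address - BASE
    original = memory[index]
    memory[index] = INT3
    return original


def set_hardware_breakpoint(address):
    """Record the address in a free debug register slot (at most four)."""
    if len(debug_registers) >= 4:
        raise RuntimeError("all four hardware breakpoint slots are in use")
    debug_registers.append(address)


saved = set_software_breakpoint(code, BASE)
set_hardware_breakpoint(BASE)
patched_byte = code[0]

# The self-extraction step overwrites the code, including the patched byte.
code[:] = unpacked

software_bp_survives = code[0] == INT3
hardware_bp_survives = BASE in debug_registers
print("software breakpoint still in place:", software_bp_survives)
print("hardware breakpoint still in place:", hardware_bp_survives)
```

The software breakpoint lives inside the bytes that get rewritten, so it disappears with them; the hardware breakpoint lives outside the code and is unaffected.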

1.2 Analysis Objective
We will analyze around 20 instructions, from 0x00401018 to 0x0040105B. The assembly code is shown in Figure 1.

Figure 1. Code Segment to Analyze (0x401018 to 0x40105B)

2. FS Register, TIB, and PEB

As shown in Figure 1, the instruction at 0x00401018 (MOV EAX, DWORD FS:[18]) does an important trick. It reads the memory word located at FS:[18] into EAX. Here FS, like SS and DS, is one of the segment registers in the Intel x86 register file. FS:[18] is an address specified using the displacement addressing mode: the address is calculated as [base address of the segment selected by FS] + 0x18.

Whenever you see some code accessing the FS register, you should pay special attention! In a user-mode process, FS points to the most important Windows data structure related to the current thread. Check out reference [1] for details and you will see that FS:[18] stores the entry address of the TIB (Thread Information Block), also called the TEB (Thread Environment Block).

Then the instruction at 0x40101E (MOV EAX, [EAX+30]) takes the word located at EAX+0x30. What does this mean? Since EAX now has the entry address of the TIB, the instruction is reading a data field that is 0x30 bytes away from the beginning of the TIB record.

We need to figure out the internal data structure of the TIB. There are two ways: (1) the MSDN documentation, and (2) the WinDbg kernel debugger. For the most well-known data structures like the TIB, people have already done the address calculation for you. For example, by reading [1], you would know that offset 0x30 stores the entry address of the PEB (Process Environment Block). But in most cases, for a kernel data structure, you will have to calculate the offset manually (i.e., figure out the sizes of all the preceding attributes in the structure and sum them up).
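The manual calculation can be sketched in Python with ctypes, which sums the field sizes for us. This is a minimal sketch under some assumptions: the class names are made up for the illustration, only the leading fields of the 32-bit TEB are modeled, and every 32-bit pointer is represented as a plain 4-byte integer so the result does not depend on the machine running the script.

```python
import ctypes

# A 32-bit pointer modeled as a 4-byte unsigned integer (assumption of this
# sketch, so the offsets are the same on 32-bit and 64-bit hosts).
PTR32 = ctypes.c_uint32


class NT_TIB32(ctypes.Structure):
    """Leading part of the thread block (32-bit layout)."""
    _fields_ = [
        ("ExceptionList", PTR32),
        ("StackBase", PTR32),
        ("StackLimit", PTR32),
        ("SubSystemTib", PTR32),
        ("FiberData", PTR32),
        ("ArbitraryUserPointer", PTR32),
        ("Self", PTR32),            # linear address of the TIB itself
    ]


class CLIENT_ID32(ctypes.Structure):
    _fields_ = [("UniqueProcess", PTR32), ("UniqueThread", PTR32)]


class TEB32_HEAD(ctypes.Structure):
    """Only the first fields of the TEB, up to the PEB pointer."""
    _fields_ = [
        ("NtTib", NT_TIB32),
        ("EnvironmentPointer", PTR32),
        ("ClientId", CLIENT_ID32),
        ("ActiveRpcHandle", PTR32),
        ("ThreadLocalStoragePointer", PTR32),
        ("ProcessEnvironmentBlock", PTR32),
    ]


# Offset of a field = sum of the sizes of all fields that precede it.
self_offset = TEB32_HEAD.NtTib.offset + NT_TIB32.Self.offset
peb_offset = TEB32_HEAD.ProcessEnvironmentBlock.offset
print(hex(self_offset), hex(peb_offset))
```

The two offsets computed this way correspond to the two instructions above: FS:[18] reads the Self field, and [EAX+30] reads ProcessEnvironmentBlock.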

The most convenient way is to use WinDbg. Go back to the WinDbg window on the host machine and press Ctrl+Break. This interrupts the running guest Windows and returns control to WinDbg. Then type the following:

dt nt!_TEB

This says: display the data type "_TEB" defined in the nt module. If you need information about the "nt" module, you can type

lm

This displays the loaded modules, and you can see that "nt" is the module name for the Windows kernel image (ntoskrnl.exe).

WinDbg is actually very powerful: by appending "-r n" to the dt command, you can display data types recursively, i.e., when a data field is itself a complex data type, its contents are displayed too. For example, dt nt!_TEB -r 2 displays the contents recursively with a recursion depth of 2.

From the WinDbg dt dump, you can immediately infer that offset 0x30 of the TEB is the entry address of the PEB.

3. Loaded Module List
We now proceed to the next few instructions. Using the technique introduced in Section 2, we can infer that the instruction at 0x401021 (MOV ECX, [EAX+C]) loads into ECX the pointer to LDR (the loader data that holds the loaded module lists). Information on the PEB structure can be found on MSDN [2]; however, you will find that WinDbg actually provides more detailed information, including many undocumented attributes.

Now we need to look at the structure of LDR (_PEB_LDR_DATA) and of the _LIST_ENTRY structures it contains. Executing dt _PEB_LDR_DATA and dt _LIST_ENTRY in WinDbg, we have the following dump:

kd> dt _PEB_LDR_DATA
   +0x000 Length           : Uint4B
   +0x004 Initialized      : UChar
   +0x008 SsHandle         : Ptr32 Void
   +0x00c InLoadOrderModuleList : _LIST_ENTRY
   +0x014 InMemoryOrderModuleList : _LIST_ENTRY
   +0x01c InInitializationOrderModuleList : _LIST_ENTRY
   +0x024 EntryInProgress  : Ptr32 Void
kd> dt _LIST_ENTRY
   +0x000 Flink            : Ptr32 _LIST_ENTRY
   +0x004 Blink            : Ptr32 _LIST_ENTRY

 Notice that ECX does not stay at the start of the _PEB_LDR_DATA: the instruction at 0x00401026 (ADD ECX, 1C) advances it, so ECX now contains the address of offset 0x1C of the _PEB_LDR_DATA, i.e., the InInitializationOrderModuleList field. Starting at this address is a _LIST_ENTRY structure which contains two computer words (each word is 4 bytes long). The first four bytes are the Flink, which points to the next _LIST_ENTRY, and the next four bytes are the Blink, which points to the previous _LIST_ENTRY. So this is exactly a doubly linked list structure! More details of the PEB_LDR_DATA structure can be found in the MSDN document [3]. However, again, notice that the documentation in [3] is not complete and is NOT accurate! The most authoritative information comes from WinDbg.
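To make the Flink/Blink layout concrete, here is a small Python sketch that builds a circular doubly linked list in a byte buffer and walks it in both directions. The buffer, the three addresses, and the helper names are hypothetical; only the 8-byte layout (Flink at +0x0, Blink at +0x4, both little-endian 32-bit values) is taken from the dump above.

```python
import struct

# Hypothetical addresses inside a small simulated memory: a list head plus
# two list entries, A and B.
HEAD, A, B = 0x10, 0x40, 0x80
mem = bytearray(0x100)


def write_list_entry(memory, addr, flink, blink):
    """Store a _LIST_ENTRY: Flink at +0x0 and Blink at +0x4."""
    struct.pack_into("<II", memory, addr, flink, blink)


def read_u32(memory, addr):
    """Read one little-endian 32-bit word."""
    return struct.unpack_from("<I", memory, addr)[0]


def walk(memory, head, link_field_offset):
    """Follow Flink (+0x0) or Blink (+0x4) until we are back at the head."""
    visited = []
    current = read_u32(memory, head + link_field_offset)
    while current != head:
        visited.append(current)
        current = read_u32(memory, current + link_field_offset)
    return visited


# Circular list: HEAD <-> A <-> B <-> HEAD
write_list_entry(mem, HEAD, flink=A, blink=B)
write_list_entry(mem, A, flink=B, blink=HEAD)
write_list_entry(mem, B, flink=HEAD, blink=A)

forward = walk(mem, HEAD, 0x0)   # follows Flink
backward = walk(mem, HEAD, 0x4)  # follows Blink
print([hex(a) for a in forward], [hex(a) for a in backward])
```

The walk stops when a link leads back to the list head, which is how a circular list with a separate head signals that all entries have been visited.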

Now let us proceed to instruction 0x00401029 (MOV EAX, DWORD [ECX]). This moves the contents of the Flink into EAX. According to [3], EAX would now hold the entry address of the _LDR_DATA_TABLE_ENTRY for the next module. However, this is WRONG! Flink points to the _LIST_ENTRY embedded in the next entry, not to the start of that entry. Because the list being walked is the InInitializationOrderModuleList (ECX was advanced by 0x1C at 0x00401026), EAX now contains the address of offset 0x10 of the _LDR_DATA_TABLE_ENTRY (i.e., the address of the data field "InInitializationOrderLinks").

Now comes the interesting part. Look at instruction 0x0040102D (MOV EDX, DWORD [EAX+20]), what does this mean? Let's examine the data structure LDR_DATA_TABLE_ENTRY first.

   +0x000 InLoadOrderLinks : _LIST_ENTRY
      +0x000 Flink            : Ptr32 _LIST_ENTRY
         +0x000 Flink            : Ptr32 _LIST_ENTRY
         +0x004 Blink            : Ptr32 _LIST_ENTRY
      +0x004 Blink            : Ptr32 _LIST_ENTRY
         +0x000 Flink            : Ptr32 _LIST_ENTRY
         +0x004 Blink            : Ptr32 _LIST_ENTRY
   +0x008 InMemoryOrderLinks : _LIST_ENTRY
      +0x000 Flink            : Ptr32 _LIST_ENTRY
         +0x000 Flink            : Ptr32 _LIST_ENTRY
         +0x004 Blink            : Ptr32 _LIST_ENTRY
      +0x004 Blink            : Ptr32 _LIST_ENTRY
         +0x000 Flink            : Ptr32 _LIST_ENTRY
         +0x004 Blink            : Ptr32 _LIST_ENTRY
   +0x010 InInitializationOrderLinks : _LIST_ENTRY
      +0x000 Flink            : Ptr32 _LIST_ENTRY
         +0x000 Flink            : Ptr32 _LIST_ENTRY
         +0x004 Blink            : Ptr32 _LIST_ENTRY
      +0x004 Blink            : Ptr32 _LIST_ENTRY
         +0x000 Flink            : Ptr32 _LIST_ENTRY
         +0x004 Blink            : Ptr32 _LIST_ENTRY
   +0x018 DllBase          : Ptr32 Void
   +0x01c EntryPoint       : Ptr32 Void
   +0x020 SizeOfImage      : Uint4B
   +0x024 FullDllName      : _UNICODE_STRING
      +0x000 Length           : Uint2B
      +0x002 MaximumLength    : Uint2B
      +0x004 Buffer           : Ptr32 Uint2B
   +0x02c BaseDllName      : _UNICODE_STRING
      +0x000 Length           : Uint2B
      +0x002 MaximumLength    : Uint2B
      +0x004 Buffer           : Ptr32 Uint2B
   +0x034 Flags            : Uint4B
   +0x038 LoadCount        : Uint2B
   +0x03a TlsIndex         : Uint2B
   +0x03c HashLinks        : _LIST_ENTRY
      +0x000 Flink            : Ptr32 _LIST_ENTRY
         +0x000 Flink            : Ptr32 _LIST_ENTRY
         +0x004 Blink            : Ptr32 _LIST_ENTRY
      +0x004 Blink            : Ptr32 _LIST_ENTRY
         +0x000 Flink            : Ptr32 _LIST_ENTRY
         +0x004 Blink            : Ptr32 _LIST_ENTRY
   +0x03c SectionPointer   : Ptr32 Void
   +0x040 CheckSum         : Uint4B
   +0x044 TimeDateStamp    : Uint4B
   +0x044 LoadedImports    : Ptr32 Void
   +0x048 EntryPointActivationContext : Ptr32 Void
   +0x04c PatchInformation : Ptr32 Void

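We know that the instruction MOV EDX, DWORD [EAX+20] loads the contents of the word located at EAX+0x20. But where is EAX pointing? Since it was read from the InInitializationOrderModuleList (reached by the ADD ECX, 1C at 0x00401026), it is pointing at offset 0x10 of the _LDR_DATA_TABLE_ENTRY, the InInitializationOrderLinks field. Thus EAX+0x20 is pointing at offset 0x30 (0x10 + 0x20). In the dump above, BaseDllName starts at 0x02c and its "Buffer" field sits 4 bytes further in, so offset 0x30 is the "Buffer" field of the BaseDllName.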
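The offset arithmetic is easy to get wrong, because the answer depends on which of the three lists the pointer came from. The following Python sketch tabulates where "link pointer + 0x20" lands for each list. The offsets are copied from the dt dump above; the entry base address and the helper name are hypothetical.

```python
# Offsets of the three link fields inside _LDR_DATA_TABLE_ENTRY (from the dump).
LINK_OFFSETS = {
    "InLoadOrderLinks": 0x00,
    "InMemoryOrderLinks": 0x08,
    "InInitializationOrderLinks": 0x10,
}

# Fields found at the offsets this sketch can reach (from the dump):
# FullDllName starts at 0x24 and BaseDllName at 0x2c, each with Buffer at +0x4.
FIELD_AT = {
    0x20: "SizeOfImage",
    0x28: "FullDllName.Buffer",
    0x30: "BaseDllName.Buffer",
}

ENTRY_BASE = 0x00251F28  # hypothetical address of one table entry


def containing_record(link_address, link_offset):
    """Recover the start of the entry from a pointer to one of its link fields."""
    return link_address - link_offset


results = {}
for list_name, link_offset in LINK_OFFSETS.items():
    flink = ENTRY_BASE + link_offset            # what a Flink of that list holds
    entry = containing_record(flink, link_offset)
    landed_at = (flink + 0x20) - entry          # offset reached by [EAX+20]
    results[list_name] = FIELD_AT[landed_at]
    print(f"{list_name}: entry+{landed_at:#x} -> {FIELD_AT[landed_at]}")
```

Subtracting the link field's offset from a Flink value is the same idea as the CONTAINING_RECORD macro used in Windows driver code: the list pointer is turned back into a pointer to the start of the enclosing record.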

In Windows, _UNICODE_STRING is Microsoft's way of coping with the multi-cultural/language needs of localizing Windows for different parts of the world. As the dump shows, it consists of three fields: (1) Length, the length of the string in bytes, (2) MaximumLength, the size of the allocated buffer in bytes, and (3) Buffer, a pointer to the raw 16-bit characters of the string. So the "Buffer" field points to the DLL name encoded in Unicode!
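As a sketch of how such a string is read, the following Python code lays out a 32-bit _UNICODE_STRING header and its character buffer in a simulated memory and decodes it again. The memory buffer, the two addresses, and the helper names are hypothetical; the field layout (two 16-bit lengths followed by a 32-bit pointer) is the one shown in the dump.

```python
import struct

mem = bytearray(0x100)   # simulated memory
HEADER_ADDR = 0x10       # hypothetical address of the _UNICODE_STRING header
BUFFER_ADDR = 0x40       # hypothetical address of the character data


def write_unicode_string(memory, header_addr, buffer_addr, text):
    """Store UTF-16LE characters and a header: Length, MaximumLength, Buffer."""
    raw = text.encode("utf-16-le")
    # Character data followed by a 2-byte terminator.
    memory[buffer_addr:buffer_addr + len(raw) + 2] = raw + b"\x00\x00"
    # Both lengths are counted in bytes, not in characters.
    struct.pack_into("<HHI", memory, header_addr,
                     len(raw), len(raw) + 2, buffer_addr)


def read_unicode_string(memory, header_addr):
    """Follow the Buffer pointer and decode exactly Length bytes."""
    length, _max_length, buffer_addr = struct.unpack_from("<HHI", memory, header_addr)
    return bytes(memory[buffer_addr:buffer_addr + length]).decode("utf-16-le")


write_unicode_string(mem, HEADER_ADDR, BUFFER_ADDR, "ntdll.dll")
length_in_bytes = struct.unpack_from("<H", mem, HEADER_ADDR)[0]
name = read_unicode_string(mem, HEADER_ADDR)
print(length_in_bytes, name)
```

Note that Length counts bytes: a nine-character name occupies eighteen bytes, which is why code that scans such a buffer typically advances two bytes per character.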

What this essentially means is that the code at 0x0040102D is starting to process/read the DLL name! To verify our conjecture, look at the register EDX in the Immunity Debugger (Figure 3). You can see that the first module name we are looking at is "ntdll.dll".

Figure 3: EDX points to DLL Name

4. Challenges of the Day
Now let us try to get the whole picture of the code from 0x00401018 to 0x00401054. You might notice that we actually have a two-level nested loop here.

The outer loop is from 0x40102E to 0x401054; this is essentially a do-while loop. The inner loop is from 0x401036 to 0x401046. Our challenges today are:
(1) What does the inner loop from 0x401036 to 0x401046 do?
(2) What does the outer loop do?

A hint here: the code we discussed today searches for a module and does some bad things to that module (these malicious operations start at 0x40105C). Use Immunity Debugger to find out which one. We will show you these malicious operations in the next tutorial.

References
1. Wiki, "Windows Thread Information Block", Available at
2. Microsoft, "PEB Structure", Available at
3. Microsoft, "PEB_LDR_DATA structure", Available at


  1. Hello, thanks for the nice tutorial. Still one thing I don't get:
    If after execution of
    401021 mov ecx, [eax+c]
    It contains the address of PebLoaderData, how can it later (see that there is add ecx, 1c) be pointing at InLoadOrder list? Shouldn't it be InInitializationOrder list?

  2. Check the "kd> dt _PEB_LDR_DATA" 2 paragraphs below - the dump generated by WinDbg, you'll see why.

  3. Hi Dr. Fu. Thank you for this great tutorial !
    On figure 1, line 00401026, comment should say : "now ECX has InInitializationOrderModuleList" and NOT InLoadOrderModuleList

  4. I can't understand where _LDR_DATA_TABLE_ENTRY is coming from. I can see it using WinDbg, but I can't get to it logically from the structures via _LIST_ENTRY -> Flink.

  5. _LDR_DATA_TABLE_ENTRY is from internal MS documentation. Search it online and you can find its data structure definition.

  6. For kernel structures good to use this tool

  7. PEB (process information block) = pEb = Process Environment Block.

  8. What you actually got was the BaseDllName since you got the address of the InInitializationOrderModuleList you should be at InInitializationOrderLinks [+0x10] then the basename should be in [+0x10 + 0x20]