Sunday, February 19, 2012

Malware Analysis Tutorial 17: Infecting Driver Files (Part I: Randomly Select a System Module)

Learning Goals:
  1. Understand the frequently used tricks by malware to infect driver files
  2. Understand the role of registry in Windows operating systems
  3. Practice analyzing function cals
  4. Practice reverse engineering sophisticated data structures in memory
Applicable to:
  1. Operating Systems
  2. Assembly Language
  3. Operating System Security
1. Introduction
This tutorial shows some frequently used tricks by malware to infect system drivers. We show how Max++ examines the list of active system modules (drivers) in the system and pick up randomly the candidates for infection. In this tutorial, we will practice the use of WinDbg for examining sophisticated system data structures and learn some important system calls such zwQuerySystemInformation.

We will analyze the code from 0x3C2408  in this tutorial. This is to continue the analysis in Tutorial 16, where we showed that Max++ injects a thread into another running process and the thread will remove the binary executable Max++Loader2010.exe from the disk once the Max++ process terminates.

2. Lab Configuration
(0) Start WinXP image in DEBUGGED mode. Now in your host system, start a windows command window and CD to "c:\Program Files\Debugging Tools for Windows (x86)" (where WinDBG is installed). Type "windbg -b -k com:pipe,port=\\.\pipe\com_12" (check the com port number in your VBox instance set up). When WinDbg initiates, types "g" (go) twice to let it continue.

(1) Now launch IMM in the WinXP instance, clear all breakpoints and hardware breakpoints in IMM (see View->Breakpoints and View->Hardware Breakpoints).

(2) Go to 0x4012DC and set a hardware breakpoint there. (why not software bp? Because that region will be self-extracted and overwritten and the software BP will be lost). Pay special attention that once you go to 0x4012DC, directly right click on the line to set hardware BP (currently it's gibberish code).

(3) PressF9 several times run to 0x4012DC. You will encounter several breakpoints before 0x4012DC. If you pay attention, they are actually caused by the int 2d tricks (explained in Tutorial 3 and 4, and 5). Simply ignore them and continue (using F9) until you hit 0x4012DC.

Figure 1 shows the code that you should be able to see. As you can see, this is right before the call of RtlAddVectoredException, where hardware BP is set to break the LdrLoadDll call (see Tutorial 11 for details).
Figure 1: code at 0x4012DC
(4) Now scroll down about 2 pages and set a SOFTWARE BREAKPOINT at 0x401417. This is right after the call of LdrLoadDll("lz32.dll"), where Max++ finishes the loading of lz32.dll. Then hit SHIFT+F9 several times until you reach 0x401417 (you will hit 0x7C90D500 twice, this is somwhere inside ntdll.zwMapViewSection which is being called by LdrLoadDll).

Figure 2: code at 0x401407

(6) Now we will set a breakpoint at 0x3C2408  .  Goto 0x3C2408 and set a SOFTWARE BREAKPOINT. SHIFT+F9 there. Press SHIFT+F9 to run to 0x3C2408 . (You may see a warning that this is out range of the code segment, simply ignore the warning).

(Figure 3 shows the code that you should be able to see at 0x3C2408 . The first instruction should be TEST EAX, EAX.  The value of the EAX register at this moment should be 0x1, and the control flow will continue to 0x3C2410 (PUSH 10000). We will start our analysis from here.
Figure 3: Code Starting at 0x3C2408

Section 3. Check Infection Status
The first action Max++ takes is to examine if the system has already been affected. It checks the existence of a virtual volume named ""\??\C2CAD972#4079#4fd3#A68D#AD34CC121074". Notice that this is a very interesting name, if you look at the Mcirosoft UNC file name specification [1], it's actually not quite a legal name. Usually UNC names with extended length should start with "\\?\" instead of two question marks inside it [as far as I know]. If the system is never infected before, the call zwOpenFile at 0x3C242D should fail and return an error code. Only when Max++ successfully creates a virtual drive and overwrites the disk operation driver (so that the driver can handle \??\ prefix), the call will report success.

It is interesting to delve more into the call zwOpenFile and look at its parameters. In the past, we have used IMM debugger to directly watch the parameters. When the data type of a parameter is complex, it's more convenient to use WinDbg.  We will in the following show a sample use of WinDbg.

The formal declaration of zwOpenFile can be easily found out by Google. The first three parameters of zwOpenFile are interesting to us: (1) OUT PHANDLE FileHandle, (2) ACCESS_MASK DesiredAccess, and (3) POBJECT_ATTRIBUTES ObjectAttributes, according to [2].

Figure 4. Stack Contents of zwOpenFile Call

Now if we look at the contents of the stack when the call is made (as shown in Figure 4), we might notice that the value for the third parameter (ObjectAttributes) is 0x3D3150. Notice that its data type is POBJECT_ATTRIBUTES where "P" stands for the pointer. Clearly, it means that starting at 0x3D3150, there is a data structure named OBJECT_ATTRIBUTES. It is possible to get the formal declaration of OBJECT_ATTRIBUTES structure and interpret the bits and bytes by yourself. But a simpler way is to use WinDbg to help with it.

For this purpose, let's start the WinDbg in the VBox Windows image. File-> Attack Process -> Max++ (note: not the external one in the host) and then click and select to run Noninvasively. (as shown in Figure 6).

Figure 6. Run WinDebug in Noninvasive Mode
Simply type "dt  _OBJECT_ATTRIBUTES 0x3D3150 -r2" in the WinDbg command window, we have the dump of the contents as shown below. Here "-r2" means to display 2 layers of details level (recursively). Shown in Figure 7 is the dump of the OBJECT_ATTRIBUTES. You can see that the file it tries to access is "\??\C2CAD972#4079#4fd3#A68D#AD34CC121074". The call will return an error code in EAX, which drives the execution flow to 0x3C243B (see Figure 3) and then to 0x3C2461.

Section 4. Search for Modules to Infect (Function 0x3C1C2C)
We now examine the logic of Function 0x3C1C2C (as shown in Figure 7). The first part of the function is to call 0x3D0BC0. This is a function frequently called by the Max++ code. We leave the analysis details to you.

Challenge 1. Analyze the functionality of function 0x003D0BC0 (what are its input? and what are its output? Hint: it adjusts the position of ESP register according to some input and then puts the address of the next immediate instruction into EAX).

Figure 7. First part of Function 0x3C1C2C

The major bulk of the function is the a loop from 0x003C1C5B to 0x003C1D04 (shown in Figure 8). During the first iteration, the code calls zwQueryInformation to get the list of system modules (all running driver processes in the system) and find out the system module whose name is "ndis.sys". Note that NDIS stands for Network Driver Interface Specification, the ndis.sys is a driver file that controls network adapters. You can roughly infer what the loop is doing, but it's beneficial to repeat the analysis process on your own so that you can sharpen your reverse engineering skills. We now expose some of the technical details here:

(1) zwQuerySystemInformation. This is a very important system call provided by the OS. The function prototype is shown in the following, from MSDN.
NTSTATUS WINAPI ZwQuerySystemInformation(
  __in       SYSTEM_INFORMATION_CLASS SystemInformationClass,
  __inout    PVOID SystemInformation,
  __in       ULONG SystemInformationLength,
  __out_opt  PULONG ReturnLength
The first parameter is an integer that represents the type of information to query. There are many types available, e.g., system performance information, time of the day etc. In our case, the system information class is SystemModuleInformation, which provides the information of running system modules. Notice that the size of the SystemInformation (second parameter) can vary, you have to pass the BUFFER LENGTH of your preallocated buffer to zwQuerySystemInformation. If zwQuerySystemInformation needs more space, it will inform you that the space is not enough (using the ReturnLength, the 4th parameter). In that case, the Max++ code will go back to reallocate space.[see the Call of 0x3D0BC0 at 0x3C1C61, see Challenge 1]

Figure 8. Read System Module Information

(2) System Module Information. Let's now delve into the data returned by zwQueryInformation. Using a simple google search, we can find the definition of _SYSTEM_MODULE_INFORMATION and _SYSTEM_MODULE. As shown below [the code is from] .

ULONG               ModulesCount;
SYSTEM_MODULE       Modules[0]; /* FIXME: should be Modules[0] */

typedef struct _SYSTEM_MODULE
PVOID               Reserved1;                      /* 00/00 */
PVOID               Reserved2;                      /* 04/08 */
PVOID               ImageBaseAddress;               /* 08/10 */
ULONG               ImageSize;                      /* 0c/18 */
ULONG               Flags;                          /* 10/1c */
WORD                Id;                             /* 14/20 */
WORD                Rank;                           /* 16/22 */
WORD                Unknown;                        /* 18/24 */
WORD                NameOffset;                     /* 1a/26 */
BYTE                Name[MAXIMUM_FILENAME_LENGTH];  /* 1c/28 */

Note that the second parameter  Modules[0] of _SYSTEM_MODULE_INFORMATION is a real array (i.e., not a simple pointer to the entry address of the array). In another word, the size of _SYSTEM_MODULE_INFORMATION can vary, depending on the value of ModulesCount. Similarly, the last attribute NAME[MAXIMUM-FILENAME_LENGTH] of _SYSTEM_MODULE is a real character array (not a pointer)

Figure 9 shows the first 0x90 bytes of the _SYSTEM_MODULE_INFORMATION structure.

Challenge 2. Given Figure 9, can you infer how many modules are located in the _SYSTEM_MODULE_INFORMATION? (i.e., how many modules are loaded in the system right now?)

Challenge 3. Given Figure 9, can you infer the name of the first module? What is the value of NameOffset in the first _SYSTEM_MODULE?

Figure 9. System Module Information

To help you further understand the logic in Figure 8, we list more challenges below:
Challenge 4. Given Figure 8, observe the instructions from 0x3C1C73 and 0x3C1C78, they push the parameters for zwQueryInformation.  which of the registers (EAX, ESI etc.) contains the buffer to hold system information?

Challenge 5. Given Figure 8, observe the instructions at 0x3C1CDF. The instruction is LEA ESI, DS:[EAX+EDI+1C]. Explain the meaning and purpose of this instruction? Specifically, explain what is contained in EAX, EDI and what's the meaning of offset 0x1c?

Challenge 7. Find a way to get out of the loop and what is the ImageBaseAddress of "ndis.nys"? (you have to first properly set up a conditional breakpoint).

Challenge 8. Explain the  logic between 0x3C1CF3 (NEG EAX) to 0x3C1CFE (JE 0x3C1E1E), and explain why would the JE instruction not jump when the module name is "NDIS.sys".

Figure 10. After NDIS.sys is found

Section 5. Test Registry (Function 0x003C18D4)
When the module of NDIS is found, the control flow continues to 0x003C1D04 (shown in Figure 10). The code first did a name comparison of "ndis.sys" and "win32k.sys", and if not equal, it calls function 0x003C18D4 at 0x003C1D27. We'll provide the analysis of function 0x003C18D4 below.

Fogure 11. First Part of Function 0x3C18D4
Figure 11 shows the first part of Function 0x3C18D4. Most of the functions are very easy to analyze. It first calculates the string length of the ndis.sys file and checks if the file name is not too long. Then it checks if the first letter of the file name is ascii code 2E "." (if not, it continues). Then it searches for the file suffix and verifies if it ends with ".sys". These are fairly routine checks to make sure that the file (ndis.sys) is a normal system driver file. Lastly, it calls lz32.003C250C at 0x003c193e. We now try to analyze function 0x003c250c.

5.1 Function 0x003C250C (Building Object_Attributes)

In general, to analyze a function, we need to figure out three things: (1) what are the input parameters? (2) what are the output parameters? (3) what does the function do?

Analysis of malware functions can be a challenging job, as malware authors will not necessarily have to follow the typical C language calling conventions (i.e., pushing parameters to stack and use standard registers for return).It creates trouble for us to understand the functionality of a function.

Figure 12 shows the body of function 0x003C250C.By studying the instructions we know that the input parameter of this function is the ESI register. The function body calls RtlInitUnicodeString and then is saving some values to the RAM (at ESI+4, ESI+8 etc.). It seems that the function is building some data structure however we have no way to know what exactly the type is. In this case, we have to trace how the return data is used (note that the return of this function is saved in EAX, look at the instruction at 0x3C2535).

Figure 12. Function Body 0x3C250C

Figure 13 shows the caller of 0x3C250C. Notice that at 0x003C1943 (right after the call of function 0x3C250C), the code pushes EAX (the return of function 0x3C250C) into stack, and then it pushed another two words into stack and calls zwOpenKey. Clearly, the return value of 0x003C250C is used as the 3rd parameter of ZwOpenKey, which immediately leads us to the conclusion: function 0x3C250C is building an object of _OBJECT_ATTRIBUTES. Figure 14 shows the details of the object constructed. You can see that the main component is the ObjectName, which contains value "\registry\...\NDIS".

Figure 13. How the return data of Function  0x3C250C is Used

Figure 14. Memory Contents of Object Attributes

5.2 Rest of Function 0x003C18D4 (check existence of registry file)

We now continue to analyze the rest of Function 0x003C18D4 (shown in Figure 15). Interestingly, after calling the ZwOpenKey, the code immediately call ZwClose to close the registry handle immediately. What's its purpose then? If you check the control flow, you would notice that depending on the return value of ZwOpenKey, the function returns 1 when the registry key is successfully opened, and returns 0 when the ZwOpenKey fails.
Figure 15. Rest of Function 0x3C18D4

Figure 16. Reads NDIS.sys

Section 6. Scanning Proper Driver Files (Starting from 0x3C12DC).
As shown in Figure 16, at 0x3C1D27 the code returns from 0x3C18D4 (check the existence of registry for NDIS). Then it performs zwReadFile twice, first to read 0x40 bytes and then to read 0xF8 bytes. The second read loads the entire PE header (0xF8 bytes) into 0x12D388.

Figure 17. Check Proper Size

Now comes the interesting part, see Figure 17. At 0x3C1DDF, the code checks the EXPORT TABLE SIZE of the PE header and see if it is NOT zero. If it's zero, it will continue the search and check another system module. NDIS.sys fails the check (its export table size is 0).

Then at 0x003C1DF3, it calls zwQueryInformationFile and reads the FILE_STANDARD_INFORMATION of the module (this time, the mup.sys). It then reads the file size of the module and compare it with value 0x4C10. If the file size is below 0x4C10, it continues the search.

If everything is fine so far, at 0x003C1E0E, it resets the ID of the _SYSTEM_MODULE to 1 and it writes the actual file size of the system module file back into the _SYSTEM_MODULE (i.e., reset the ImageSize attribute).

For each system module found (with EXPORT TABLE SIZE>0 and file size>0x4c10), the code increments counter at 0x12D5D4 by 1.

Challenge 9. Explain the logic of the code of Figure 17 in details. For example, for instruction at 0x3C1DDF, how would you know that it is checking EXPORT TABLE SIZE?

Section 7.  Random Pick of Drivers(Starting from 0x3C1E30).
We now examine the next section of the malicious logic (shown in Figure 18). When the code executes to 0x3C1E30 (out of the big loop), at EBP-30 (0x12D5D4), it stores the number of system modules that satisfy the criteria: (1) size greater than 0x4C10; and (2) EXPORT TABLE SIZE > 0. On our system, there are 0x19 modules that satisfy the condition.

The next section (from 0x003C1E30 to 0x3C1E6D) randomly picks up a module to infect. First, it calls GetTickCount to get the current time, and then it calls RtlRandom using the current time as seed. Then it does a DIV operation (at 0x3C1E51) on 0x19 (the number of modules), after which, the remainder is stored in EDX.

Figure 18. Randomly Pick a Driver File

The loop from 0x3C1E59 to 0x3C1E6D is used to get the _SYSTEM_MODULE. Given that the random ID stored in EDX, without the loss of generality, let's assume that it's 5. It tries to get the 5th satisfactory module by visiting every module in the list of _SYSTEM_MODULES retrieved by earlier call of zwQuerySystemInformation. Notice that at 0x003C1E59, it compares [ESI+14] with BX (value 1). Here ESI points to the _SYSTEM_MODULE, and the offset 0x14 stores ID. Recall that earlier the code marks each module satisfying the criteria with ID 1, this immediately explains the purpose of the comparison code. After the code jumps out of the loop, ESI points to the selected _SYSTEM_MODULE. Up to now, the random selection of a system module is completed. We will explain the infection process in the next tutorial.


  1. hello ..

    juz want to ask questions

    how can i get polymorphic malware collection ?

    tq very much Dr Fu