- Use Immunity Debugger to Analyze and Annotate Binary Code.
- Understand the Techniques for Self-Extraction in Code Segment.
- Computer Architecture
- Operating Systems Security
In this tutorial, we discuss several interesting techniques to analyze decoding/self-extraction functions, which are frequently used by malware to avoid static analysis. The basic approach we use here is to execute the malware step by step, and annotating the code.
We will examine the following functions in Max++ (simply set a breakpoint at each of the following addresses):
1.2 General Techniques
We recommend that you try your best to analyze the aforementioned functions first, before proceeding to section 2. In the following please find several useful IMM tricks:
- Annotating code: this is the most frequently used approach during a reverse engineering effort. Simply right click in the IMM code pane and select "Edit Comment", or press the ";" key.
- Labeling code: you could set a label at an address (applicable to both code and data segments). When this address is used in JUMP and memory loading instructions, its label will show up in the disassembly. You can use this to assign mnemonics to functions and variables. To label an address, right click in IMM code pane and select "Label".
- Breakpoints: to set up software breakpoints press F2. To set up hardware breakpoints, right click in code pane, and select Breakpoints->Hardware Breakpoint on Execution. At this moment, set soft breakpoints only.
- Jump in Code Pane: you can easily to any address in the code segment by right clicking in code pane and enter the destination address.
2. Analysis of Code Beginning at 0x00413BC2
As shown in Figure 1, there are four related instructions, POP ESI (located at 0x00413BC1), SUB ESI, 9 (located at 0x00413BC2), and the POP ESP and RETN instructions.
|Figure 1. Code Starting at 0x00413BC2|
As discussed in Tutorial 5, the RETN instruction (at 0x00413BC0) is skipped by the system when returning from INT 2D (at 0x00413BBE). Although it looks like the POP ESI (at 0x413BC1) is skipped, it is actually executed by the system. This results in that ESI now contains value 0x00413BB9 (which is pushed by the instruction CALL 0x00413BB9 at 0x00413BB4). Then the SUB ESI, 9 instruction at 0x00413BC2 updates the value of ESI to 0x00413BB0. Then the next LODS instruction load the memory word located at 0x00413BB0 into EAX (you can verify that the value of EAX is now 0). Then it pops the top element in the stack into EBP, and returns. The purpose of the POP is to simply enforce the execution to return (2 layers) back to 0x413BDD.
Note that if the INT 2D has not caused any byte scission, i.e., the RETN instruction at 0x00413BD7 will lead the execution to 0x413A40 (the IRETD instruction). IRETD is the interrupt return instruction and cannot be run in ring3 mode (thus causing trouble in user level debuggers such as IMM). From this you can see the purpose of the POP EBP instruction at 0x413BC6.
Conclusion: the 4 instructions at 0x00413BC2 is responsible for directing the execution back to 0x00413BDD. This completes the int 2d anti-debugging trick.
3. Analysis of Function 0x00413BDD
|Figure 2: Function 0x00413BDD|
As shown in Figure 2, this function clears registers and calls three functions: 0x413A2B (decoding function), 0x00401000 (another INT 2D trick), and call EBP (where EBP is set up by the function 0x00401000 properly). We will go through the analysis of these functions one by one.
4. Analysis of Function 0x00413A2B.
|Figure 3: Function 0x00413A2B|
Function 0x00413A2B has six instructions and the first five forms a loop (from 0x00413A2B to 0x00413A33), as shown in Figure 3. Consult the Intel instruction manual first, and read about the LODS and STORS instruction before proceeding to the analysis in the following.
Essentially the LODS instruction at 0x00413A2B loads a double word (4 bytes) from the memory word pointed by ESI to EAX, and STOS does the inverse. When the string copy finishes, the LODS (STOS) instruction advances the ESI (EDI) instruction by 4. The next two instructions following the LODS instruction perform a very simple decoding operation, it uses EDX as the decoding key and applies XOR and SUB operations to decode the data.
The loop ends when the EDI register is equal to the value of EBP. If you observe the values of EBP and EDI registers in the register pane, you will find that this decoding function is essentially decoding the region from 0x00413A40 to 0x00413BAC.
Set a breakpoint at 0x00413A35 (or F4 to it), you can complete and step out of the loop. To view the effects of this decoding function, compare Figure 4 and Figure 5. You can see that before decoding, the instruction at 0x00413A40 is an IRET (interrupt return) instruction and after the decoding, it becomes the INT 2D instruction!
|Figure 4: Region 0x00413A40 to 0x00413BAC (before decoding)|
|Figure 5: Region 0x00413A40 to 0x00413BAC (after decoding)|
Now let's right click on 0x00413A2B and select "Label" and we can mark the function as "basicEncoding". (This is essentially to declare 0x00413A2B as the entry address of function "BasicEncoding"). Later, whenever this address shows in the code pane, we will see this mnemonic for this address. This will facilitate our analysis work greatly.
5. Analysis of Code Beginning at 0x00410000
Function 0x00410000 first clears the ESI/EDI growing direction and immediately calls function 0x00413A18. At 0x00413A18, it again plays the trick of INT 2D. If the malware analyzer or binary debugger does not handle by the byte scission properly, the stack contents will not be right and the control flow will not be right (see Tutorials 3,4,5 for more details of INT 2D).
In summary, when the function returns to 0x00413BED, the EBP should have been set up property. Its value should be 0x00413A40.
6. Analysis of Code Beginning at 0x00413A40
We now delve into the instruction CALL EBP (0x00413A40). Figure 6 shows the function body of 0x00413A40. It begins with an INT 2D instruction (which is continued with a RET instruction). Clearly, in regular/non-debugged setting, when EAX=1 (see Tutorial 4), the byte instruction RET should be skipped and the execution should continue.
|Figure 6: Another Decoding Function|
Challenge of the Day
The major bulk of the function is a multiple level nested loop which decodes and overwrites (a part) of the code segment. Now here comes our challenge of the day.
(1) How do you get out of the loop? [hint: the IMM debugger has generously plotted the loop structure (each loop is denoted using the solid lines on the left). Place a breakpoint at the first instruction out of the loop - look at 0x00413B1C]
(2) Which part of the code/stack has been modified? What are the starting and ending addresses? [Hint: look at the instructions that modify RAM, e.g., the instruction at 0x00413A6F, 0x00413A8D, 0x00413B0E.