Wednesday, January 4, 2012

Malware Analysis Tutorial 10: Tricks for Confusing Static Analysis Tools

Learning Goals:
  1. Explore Use of Stack for Supporting Function Calls
  2. Practice Reverse Engineering
Applicable to:
  1. Operating Systems.
  2. Computer Security.
  3. Programming Language Principles.

1. Introduction
This tutorial explores several tricks employed by Max++ to confuse static analysis tools. These tricks effectively defeat static program analysis tools that plot the call graph of the malware and extract its system call invocation information. By "static" we mean that the tool does not actually execute the malware. Most "smart" virus scanners are static analysis tools. Many of them employ heuristics to decide whether a binary executable is malicious by examining the collection of system function calls in that binary. For example, if a binary invokes too many registry-related operations, an alert is raised. If such analysis can be blocked, the malware can significantly improve its survival rate. Note, however, that such tricks cannot block "dynamic" tools, which actually run the malware (typical examples include CWSandbox and Anubis).
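To make the heuristic idea concrete, here is a minimal sketch of a registry-call counter. The API set, the threshold, and the two sample lists are assumptions chosen for illustration; real scanners are far more elaborate.

```python
# Toy static heuristic: flag a binary whose visible calls include too many
# registry operations. The API set and threshold are illustrative assumptions.
REGISTRY_APIS = {"RegOpenKeyExA", "RegCreateKeyExA", "RegSetValueExA", "RegDeleteKeyA"}

def registry_call_count(function_names):
    """Count how many of the statically visible calls touch the registry."""
    return sum(1 for name in function_names if name in REGISTRY_APIS)

def is_flagged(function_names, threshold=3):
    """Raise an alert when the registry call count reaches the threshold."""
    return registry_call_count(function_names) >= threshold

# Hypothetical call lists as a static tool might extract them.
plain_binary = ["RegOpenKeyExA", "RegSetValueExA", "RegSetValueExA",
                "RegDeleteKeyA", "CreateFileA"]
obfuscated_binary = ["CreateFileA"]  # other calls are resolved only at run time

print(is_flagged(plain_binary))       # True
print(is_flagged(obfuscated_binary))  # False
```

The second list shows why hiding call targets matters: if the tool cannot see the calls, the count stays below the threshold.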

2. Lab Configuration
You can either continue from Tutorial 9, or follow the instructions below to set up the lab. Refer to Tutorial 1 and Tutorial 4 for setting up VBOX instances and WinDbg. We will analyze the function starting at 0x4014F9. 

Figure 1. Function 0x4014F9 to Analyze


(1) In the code pane, right click and go to expression "0x40105c".
(2) Right click and then select "Breakpoints -> Hardware, on execution".
(3) Press F9 to run to 0x40105c.
(4) If you see a lot of DB instructions, select them and right click -> "During next analysis treat them as Command".
(5) Exit IMM, restart it, and run to 0x40105c again. Select the instructions (about one screen) below 0x40105c, then right click -> Analysis -> Analyze Code. You should now be able to see all loops identified by IMM.
(6) Now go to 0x401147; you will notice that it is "CALL 0x4014F9". Press F4 to run to that point and then press F7 to step into function 0x4014F9.



3. Two-Layer Function Return
We now analyze the first trick, a two-layer function return, which disrupts call graph generation. In the following we analyze function 0x00401838. Observe the instructions from 0x401502 to 0x401505 (in Figure 2). Our first impression would be that function 0x00401838 takes three parameters: a pointer to the string "ntdll.dll", 0x7C903400, and 0x7C905D40. However, you will find later that this is not the case: function 0x00401838 is simply used to confuse static analysis tools.

Figure 2. Two Layer Function Call at 0x401505
Figure 3 displays the function body of 0x401838. Near its entry is a call to 0x00413650, followed by a number of other instructions (later, you will notice that these instructions are never executed).

Figure 3. Function body of 0x401838
Notice that at 0x0040183B, it calls function 0x00413650, whose function body is displayed in Figure 4. There are only two instructions: POP EAX, and RETN.
Figure 4. Function Body of 0x413650
Now we have the interesting part. Look at the stack contents in Figure 5. First of all, starting from the third word, we have the three words pushed by the code earlier (a pointer to "ntdll.dll", 0x7C903400, and 0x7C905D40). Then you might notice that the top two words are the return addresses pushed by the two CALL instructions.

Each CALL instruction consists of essentially two steps: push the address of the next instruction onto the stack (so that execution can return there when the called function completes) and then jump to the entry address of the function. Thus, it is not difficult to infer that 0x00401840 is pushed by the CALL 0x00413650 at 0x0040183B (see Figure 3), and 0x0040150A is pushed by the CALL 0x00401838 at 0x00401505 (see Figure 2). So POP EAX pops 0x00401840 off the stack and saves it in EAX. When the RETN instruction is then executed, the word on top of the stack is 0x0040150A, so it returns directly to 0x0040150A (i.e., it jumps two layers back)! Clearly, the instructions starting from 0x00401840 are never executed, and the two-layer jump can confuse quite a number of static analysis tools when they try to plot call graphs.
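The sequence can be replayed on a toy stack. This is only a sketch: the Python list stands in for the real stack, the helper names call and retn are made up for illustration, and the order of the three pushed words is assumed from the description of Figure 5. The addresses are the ones given in the text.

```python
# Toy model of the stack for the CALL / CALL / POP EAX / RETN sequence.
stack = []  # the top of the stack is the end of the list

def call(return_address, target):
    """CALL: push the address of the next instruction, then jump to target."""
    stack.append(return_address)
    return target

def retn():
    """RETN: pop the top word and jump to it."""
    return stack.pop()

# The three words pushed before the CALL at 0x00401505 (order assumed).
stack.extend([0x7C905D40, 0x7C903400, "pointer to 'ntdll.dll'"])

eip = call(0x0040150A, 0x00401838)  # CALL 0x00401838 at 0x00401505
eip = call(0x00401840, 0x00413650)  # CALL 0x00413650 at 0x0040183B
eax = stack.pop()                   # POP EAX throws away the inner return address
eip = retn()                        # RETN consumes the outer return address

print(hex(eax))  # 0x401840
print(hex(eip))  # 0x40150a
```

Execution resumes at 0x0040150A with the three pushed words still on the stack, and nothing ever returns to 0x00401840.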

Figure 5. Stack Contents at 0x00413650


4. Invoking NTDLL System Calls using Encoded Table
Next we show an interesting technique for invoking ntdll.dll functions without using the export table. We will analyze the instruction at 0x401557 (shown in Figure 6), which calls function 0x4136BF. Later, you will find that function 0x4136BF invokes zwAllocateVirtualMemory without explicitly exposing its entry address and without using the export table.

We leave the analysis of the logic between 0x40150A and 0x401557 (shown in Figure 6) to readers. Basically, the code establishes an encoded translation table on the stack from 0x0012D538 to 0x0012D638.

Figure 6. Code Between 0x40150A and 0x401557

Now observe the function body of 0x004136BF in Figure 7. The first instruction is CALL 0x004136DC. You might notice that between 0x004136BF and 0x004136DC there is some gibberish code. If you read it more carefully, you will find that these bytes are actually the contents of the string "zwAllocateVirtualMemory", where the byte at 0x004136C4 is the character "z".

Think about the CALL 0x004136DC at 0x004136BF again. It is essentially two steps:
    PUSH 0x004136C4   # the return address, which is also the beginning of "zwAllocateVirtualMemory" (see Figure 8)
    JMP 0x004136DC
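These addresses can be checked with a short calculation. The sketch below assumes the CALL is the usual 5-byte near call (one opcode byte plus a 32-bit displacement) and that the embedded name is NUL-terminated. Neither detail is stated in the text, but both are consistent with the addresses shown.

```python
# Check that the string sits exactly between the CALL and its target.
CALL_ADDR = 0x004136BF   # address of the CALL instruction (from the text)
CALL_SIZE = 5            # assumption: E8 opcode + 32-bit relative displacement
name = b"zwAllocateVirtualMemory\x00"  # assumption: NUL-terminated string

pushed_return_address = CALL_ADDR + CALL_SIZE       # where the string begins
jump_target = pushed_return_address + len(name)     # first byte after the string

print(hex(pushed_return_address))  # 0x4136c4
print(hex(jump_target))            # 0x4136dc
```

So the "return address" pushed by the CALL is really a pointer to the function name, which is why a disassembler that decodes these bytes as instructions produces gibberish.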

Figure 7. Function Body of 0x4136BF.

Now you can pretty much guess the point of the code: it is trying to invoke zwAllocateVirtualMemory! But how is this accomplished? Let's delve deeper. At 0x004136DC it calls function 0x00401172, and when that returns, at 0x004136E1, it executes JMP EAX. Can you guess the functionality of 0x00401172?


Figure 8. Stack Contents at 0x4136DC before the Call of 0x00401172
We list some hints below (see Figure 9):
(1) The loop between 0x0040118E and 0x004011A0 computes the checksum of the function name.
(2) The loop between 0x004011A2 and 0x004011BA is a binary search. The search is performed on the encoded export table (as discussed in Tutorial 9). Each entry has two elements: (a) the checksum of the function name, and (b) the entry address of the function.

You can easily infer that function 0x00401172, given the name of a function, returns its entry address. When it returns, the return value is in EAX. Then JMP EAX at 0x004136E1 invokes the function.
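The overall shape of such a lookup can be sketched as follows. Everything specific here is an assumption made for illustration: toy_checksum is a made-up rotate-and-XOR hash (not the checksum Max++ computes, which is left as a challenge below), and the names and addresses in the table are placeholders.

```python
from bisect import bisect_left

def toy_checksum(name):
    """Hypothetical 32-bit checksum; NOT the algorithm used by Max++."""
    value = 0
    for byte in name.encode("ascii"):
        rotated = ((value << 5) | (value >> 27)) & 0xFFFFFFFF  # rotate left by 5
        value = rotated ^ byte
    return value

def build_table(exports):
    """Build (checksum, entry address) pairs sorted by checksum."""
    return sorted((toy_checksum(name), address) for name, address in exports.items())

def resolve(table, name):
    """Binary-search the table for the checksum of name; return its address."""
    target = toy_checksum(name)
    checksums = [checksum for checksum, _ in table]
    index = bisect_left(checksums, target)
    if index < len(table) and checksums[index] == target:
        return table[index][1]
    return None

# Placeholder names and addresses, for illustration only.
table = build_table({
    "zwAllocateVirtualMemory": 0x1000,
    "zwOpenFile": 0x2000,
    "zwClose": 0x3000,
})

entry = resolve(table, "zwAllocateVirtualMemory")
print(hex(entry))
```

Because the table holds only checksums, no function name or import entry appears in it for a static tool to match against.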


5. Challenge of the Day
(1) What is the checksum of zwAllocateVirtualMemory?
(2) What are the parameters of zwAllocateVirtualMemory? Why does Max++ call this function?
(3) Look at 0x40117B. What is stored in the thread local storage (EAX+2C)? How did you reach your conclusion?

20 comments:

  1. This page has a lot of typos. All though this is good tutorial.

  2. I am by the names of Joseph Lwomwa Student of Computer Science specialized in Computer security at Makerere University in a country called Uganda found in Africa whose capital city is Kampala. I am currently writing my research and my research topic is "A HYBRID ALGORITHM TO DETECT MALWARE AND ELIMINATE ZERO DAY ATTACKS" which hybrid algorithm is composed of static Heuristic algorithm and static signature based algorithms.

    The reason i am writing to you is that i realized that probably you could be in the best position to help me out on several things such as;
    i) whether this topic of mine is genuine and relevant at this current point in time

    ii)The structure of the heuristic algorithms that have been used in malware detection and in the various anti-malware detection tools

    iii) The structure of the signature detection algorithms that have been used in malware detection and in the various anti-malware detection tools

    iv) plus relevant material that could be a great resource towards coming up with this algorithm, and the reason am asking for your assistance is because i strongly believe you could be best suited in helping me.

    I am further writing to you because i have searched several papers but with nothing close to the structures of these algorithms since several of these researchers have connections with the malware companies that give or expose little about the algorithms that have been developed and used in the research.

    I will be very glad and grateful for your positive assistance and help. My email is lwomwajoseph@gmail.com
