Research performed by Takahiro Haruyama, VMware Threat Analysis Unit (TAU)

The Carbon Black Threat Analysis Unit (TAU) recently analyzed a series of malware samples that utilized compiler-level obfuscations. For example, opaque predicates were applied to Turla mosquito and APT10 ANEL. Another obfuscation, control flow flattening, was applied to APT10 ANEL and Dharma ransomware packer.

ANEL (also referred to as UpperCut) is a RAT program used by APT10 and observed in Japan uniquely. According to SecureWorks, all ANEL samples whose version is 5.3.0 or later are obfuscated with opaque predicates and control flow flattening.

Opaque predicate is a programming term that refers to decision making where there is actually only one path. For example, this can be seen as calculating a value that will always return True. Control flow flattening is an obfuscation method where programs do not cleanly flow from beginning to end. Instead, a switch statement is called in an infinite loop having multiple code blocks each performing operations, as detailed later in this paper in Figure 10.

The obfuscations looked similar to the ones explained in Hex-Rays blog, but the introduced IDA Pro plugin HexRaysDeob didn’t work for one of the obfuscated ANEL samples because the tool was made for another variant of the obfuscation.

TAU investigated the ANEL obfuscation algorithms then modified the HexRaysDeob code to defeat the obfuscations. After the modification, TAU was able to recover the original code.

The below image depicts, an example of an obfuscated function:

Figure 1: obfuscated function example (all codes cannot be displayed in a screen)

The image below shows the same function once it has been deobfuscated:

Figure 2: de-obfuscated result of the same function

Details

HexRaysDeob is an IDA Pro plugin written by Rolf Rolles to address obfuscation seen in binaries. In order to perform the deobfuscation, the plugin manipulates the IDA intermediate language called microcode. If you aren’t familiar with those structures (e.g, microcode data structures, maturity level, Microcode Explorer and so on), you should read his blog post. Rolles also provides an overview of each obfuscation technique in the same post.

HexRaysDeob installs two callbacks when loading:

optinsn_t for defeating opaque predicates (defined as ObfCompilerOptimizer)
optblock_t for defeating control flow flattening (defined as CFUnflattener)

Opaque Predicates

Before continuing, it is important to understand Hex-Rays maturity levels. When a binary is loaded into IDA Pro, the application will perform distinct layers of code analysis and optimization, referred to as maturity levels. One layer will detect shellcode, another optimizes it into blocks, another determines global variables, and so forth. The optinsn_t::func callback function is called in maturity levels from MMAT_ZERO (microcode does not exist) to MMAT_GLBOPT2 (most global optimizations completed). During the callback, opaque predicates pattern matching functions are called. If the code pattern is matched with the definitions, it is replaced with another expression for the deobfuscation. This is important to perform in each maturity level as the obfuscated code could be modified or removed as the code becomes more optimized. We defined two patterns for analysis of the ANEL sample.

Figure 3: opaque predicates pattern matching functions switch

**Pattern 1: ~(x * (x – 1)) | -2**

Below is an example of one of the opaque predicates patterns used by ANEL:

Figure 4: opaque predicates pattern 1

The global variable value dword_745BB58C is either even or odd, so dword_745BB58C * (dword_745BB58C – 1) is always even. This results in the lowest bit of the negated value becoming 1. Thus, OR by -2 (0xFFFFFFFE) will always produce the value -1.

In this case, the pattern matching function replaces dword_745BB58C * (dword_745BB58C – 1) with 2.

Pattern 2: read-only global variable >= 10 or < 10

Another pattern is the following:

Figure 5: opaque predicates pattern 2

The global variable value dword_72DBB588 is always 0 because the value is not initialized (we can check it by is_loaded API) and has only read accesses. So the pattern matching function replaces the global variable with 0.

There are some variants with this pattern (e.g., the variable – 10 < 0), where the immediate constant can be different, like 9.

Data-flow tracking for the patterns

We also observed a pattern that was also using an 8-bit portion of the register. In the following example, the variable v5 in pseudocode is a register operand (cl) in microcode. We need to check if the value comes from the result of x * (x – 1).

Figure 6: register operand (pseudocode) in pattern 1

Figure 7: register operand (microcode) in pattern 1

In another example, the variable v2 in pseudocode is a register operand (ecx) in microcode. We have to validate if a global variable with above-mentioned conditions is assigned to the register.

Figure 8: register operand (pseudocode) in pattern 2

Figure 9: register operand (microcode) in pattern 2

Data-flow tracking code was added to detect these use-cases. The added code requires that the mblock_t pointer information is passed from the argument of optinsn_t::func to trace back previous instructions using the mblock_t linked list. However, the callback returns NULL from the mblock_t pointer if the instruction is not a top-level one. For example, Figure 9 shows jnz (m_jnz) is a top-level instruction and setl (m_setl) is a sub-instruction. If the setl is always sub-instruction during the optimization, we never get the pointer. To handle this type of scenario, the code was modified to catch and pass the mblock_t of the jnz instruction to the sub-instruction.

Control Flow Flattening

The original implementation calls the optblock_t::func callback function in MMAT_LOCOPT (local optimization and graphing are complete) maturity level. Rolles previously explained the unflattening algorithm in a Hex-Rays blog. For brevity I will quickly cover some key points to understand the algorithm at a high level.

Normally the call flow graph (CFG) of a function obfuscated with control flow flattening has a loop structure starting with yellow-colored “control flow dispatcher” like this, shown after the First Block:

Figure 10: function obfuscated with control flow flattening

The original code is separated into the orange-colored “first block” and green-colored flattened blocks. The analyst is then required to resolve the correct next block and modify the destination accordingly.

The next portion of first block and each flattened block is decided by a “block comparison variable” with an immediate value. The value of the variable is assigned to a specific register in each block then compared in a control flow dispatcher and other condition blocks.

Figure 11: block comparison variable example (blue-highlighted eax in this case)

If the variable registers for the comparison and assignment are different, the assignment variable is called “block update variable” (which is further explained later).

The algorithm looks straightforward however some portions of the code had to be modified in order to correctly deobfuscate the code. This is further detailed below.

Unflattening in multiple maturity levels

As previously detailed, the original implementation of the code only works in MMAT_LOCOPT maturity level. Rolles said this was to handle another obfuscation called “Odd Stack Manipulations”, referred in his blog). However the unflattening of ANEL code had to be performed in the later maturity level since the assignment of block comparison variable heavily depends on opaque predicates.

As an example in the following obfuscated function, the v3 and v7 variables are assigned to the block comparison variable (b_cmp). However the values are dependent on opaque predicates results.

Figure 12: simple obfuscated function example

Once the opaque predicates are broken, the loop code becomes simpler:

Figure 13: simple obfuscated function example (opaque predicates deleted)

Unflattening the code in later maturity levels like MMAT_GLBOPT1 and MMAT_GLBOPT2 (first and second pass of global optimization) caused additional problems. The unflattening algorithm requires mapping information between block comparison variable and the actual block number (mblock_t::serial) used in the microcode. In later maturity levels, some blocks are deleted by the optimization after defeating opaque predicates, which removes the mapping information.

In the example below, the blue-highlighted immediate value 0x4624F47C is assigned to block comparison variable in the first block. The mapping can be created by checking the conditional jump instruction (jnz) in MMAT_LOCOPT.

Figure 14: mapping between block comparison variable 0x4624F47C and block number 9

Additionally here is no mapping information in MMAT_GLBOPT2 because the condition block that contains the variable has been deleted. So the next block of the first one in the level can not be determined.

Figure 15: mapping failure

To resolve that issue, the code was written to link the block comparison variable and block address in MMAT_LOCOPT, as the block number is changed in each maturity level. If the code can’t determine the mapping in later maturity levels, it attempts to guess the next block number based on the address, considering each block and instruction addresses. The guessing is not 100% accurate however it works for the majority of obfuscated functions tested.

Figure 16: the output log showing ea and block number translation

Figure 17: the result of deobfuscation in this case

Control flow handling with multiple dispatchers

Though the original implementation assumes an obfuscated function has only one control flow dispatcher, some functions in the ANEL sample have multiple control dispatchers. Originally the code called the optblock_t::func callback in MMAT_GLBOPT1 and MMAT_GLBOPT2, as the result was not correct in MMAT_CALLS (detecting call arguments). However, this did not work for functions with three or more dispatchers. Additionally, Hex-Rays kernel doesn’t optimize some functions in MMAT_GLBOPT2 if it judges the optimization within the level is not required. In this case, the callback is executed just once in the implementation.

To handle multiple control flow dispatchers, a callback for decompiler events was implemented. The code catches the “hxe_prealloc” event (according to Hex-Rays, this is the final event for optimizations) then calls optblock_t::func callback. Typically this event occurs a few times to several times, so the callback can deobfuscate multiple control flow flattenings. Other additional modifications were made to the code (e.g., writing a new algorithm for finding control flow dispatcher and first block, validating a block comparison variable, and so on).

After the modification, for example, the following functions with multiple control flow dispatchers can be unflattened.

In the case of two control flow dispatchers:

Figure 18: example #1 with two control flow dispatchers (graph)

Figure 19: example #1 with two control flow dispatchers (before)

Figure 20: example #1 with two control flow dispatchers (after)

In the case of three control flow dispatchers:

Figure 21: example #2 with three control flow dispatchers (graph)

Figure 22: example #2 with three control flow dispatchers (before)

Figure 23: example #2 with three control flow dispatchers (after)

Implementation for various conditional/unconditional jump cases

As shown in Figure 24, the original implementation supports the following two cases of flattened blocks to find a block comparison variable for the next block (the cases are then simplified). In the second case, block comparison variable is searched in each block of endsWithJcc and nonJcc. If the next block is resolved, the CFG (specifically mblock_t::predset and mblock_t::succset) and the destination of goto jump instruction are updated.

Figure 24: originally-supported two cases of blocks

I found and implemented three more cases in the ANEL sample. Of these, two cases are shown below:

Figure 25: newly-supported two cases of blocks

The code tracks the block comparison variable in each predecessor and more (if any conditional blocks before the predecessor) to identify each next block for unflattening.

And, in the third case that was implemented, the block comparison variables are not assigned in the flattened blocks but rather the first blocks according to a condition. For example, the following microcode graph shows edi is assigned to esi (the block comparison variable in this case) in block number 7 but the edi value is assigned in block number 1 and 2.

Figure 26: newly-supported case (assigned in first blocks)

If the immediate value for block comparison variable is not found in the flattened blocks, the new code tries to trace the first blocks to obtain the value and reconnects block number 1 and 2 as successors of block number 7, in addition to normal operations mentioned in the original cases.

Another example function did the same processing twice:

Figure 27: newly-supported case (assigned in first blocks twice #1)

Figure 28: newly-supported case (assigned in first blocks twice #2)

In this case, the code parses the structure in first blocks then reconnects each conditional blocks under the flattened blocks (#1 and #2 as successors of #13, #3 and #4 as successors of #11).

Last, but not least, in all cases explained here, the tail instruction of the dispatcher predecessor can be a conditional jump like jnz, not just goto. The modified code checks the tail instruction and if the true case destination is a control flow dispatcher, it updates the CFG and the destination of the instruction.

Other minor changes

The following changes are minor compared with above referenced ones.

Additional jump instructions are supported when collecting block comparison variable candidates and mapping between the variable and ea or block number (jnz/jle in JZCollector, jnz in JZMapper)
An entropy threshold adjustment due to check in high maturity level
Multiple block tracking for getting block comparison variable

And the last change that was introduced in regards to the block update variable referred in the overview. Some functions in the ANEL sample utilize this, however the assignment is a little bit tricky:

Figure 29: block update variable usage with and instruction

By using the and instruction, the immediate values used in comparison look different from assigned ones. The modified code will consider this.

Evaluation

The modified tool was tested with an ANEL 5.4.1 payload dropped from a malicious document with the following hash (previously reported by FireEye): 3d2b3c9f50ed36bef90139e6dd250f140c373664984b97a97a5a70333387d18d

The code is able to deobfuscate 34 of 38 functions (89%). It should be noted every function is not always obfuscated. The failure examples are:

Not yet implemented cases (e.g., a conditional jump of the dispatcher predecessor’s tail instruction in goto N predecessors case, consecutive if-statement flattened blocks)
An incorrect choice of control flow dispatcher and first block (algorithm error)

These fixes will be prioritized for future releases.

Additionally there is a known issue with the result (e.g., the remaining loop or paradoxical decompiled code), using the following IDAPython command in Output window:

idc.load_and_run_plugin(“HexRaysDeob”, 0xdead)

The command will instruct the code to execute only opaque predicates deobfuscation in the current selected function. This allows an analyst to quickly check if there are any lost blocks by control flow unflattening. For instance, in one of the failure cases, the pseudocode changes like this:

Figure 30: one failure case pseudocode (before)

Figure 31: one failure case pseudocode (after)

After the check, the original result can be restored by using the following command.

idc.load_and_run_plugin(“HexRaysDeob”, 0xf001)

Conclusion

The compiler-level obfuscations like opaque predicates and control flow flattening are started to be observed in the wild by analyst and researchers. Currently malware with the obfuscations is limited, however TAU expects not only APT10 but also other threat actors will start to use them. Unfortunately, in order to break the techniques we have to understand both of the obfuscation mechanisms and disassembler tool internals before we can automate the process.

TAU modified the original HexRaysDeob to make it work for APT10 ANEL obfuscations. The modified code is available publically here. The summary of the modifications is:

New patterns and data-flow tracking for opaque predicates
Analysis in multiple maturity levels, considering multiple control flow dispatchers and various jump cases for control flow flattening

The tool can work for almost all obfuscated functions in the tested sample. This implementation will deobfuscate approximately 89% of encountered functions. This provides researchers and analyst broad tool to attack this type of obfuscation, and if it adopted in other families. In should be noted that the tool may not work for the updated versions of ANEL if they are compiled with different options of the obfuscating compiler. Testing in multiple versions is important, so TAU is looking for newer versions ANEL samples. Please reach out to our unit if you have relevant samples or need assistance in deobfuscating the codes.

It’s difficult to create a generic tool that can defeat every compiler-level obfuscated binary but experience and knowledge about IDA microcode can be useful for additional new tools.

Why CISOs Should Invest More Inside Their Infrastructure

Serpent - The Backdoor that Hides in Plain Sight

How Not to Build a SOC

Podcast: Discussing the latest security threats and threat actors - Tom Kellermann (Virtually Speaking)

Defeating Compiler-Level Obfuscations Used in APT10 Malware

Details