This article was authored by Oleg Boyarchuk and Stefano Ortolani.
It is no mystery that Emotet’s development has recently picked up. After its resurrection (with some researchers pointing to TrickBot as the threat actor responsible), it bootstrapped two new botnets (Epoch 4 and Epoch 5), and it has recently looked into replacing its own modules with native 64-bit implementations. Tracking its network infrastructure, however, depends on being able to decrypt and extract its configuration file. Besides the encryption keys, which can in turn be used to identify the botnet, this file contains a list of compromised hosts that the payload connects to upon execution. Unfortunately, whether deliberately or as a side effect of other changes (or, more precisely, some obfuscation improvements), since mid-May Emotet samples have started to transition to a new method of storing the configuration data within the binary: no longer as a single blob of data, but as a collection of fragments, each obfuscated individually.
In this blogpost we briefly detail this change and show how to keep extracting the list of network indicators despite this new obfuscation technique.
To provide actionable threat intelligence, many vendors of dynamic analysis systems add routines that automatically extract the C2 configuration data of known malware families, including Emotet. For a very long time, Emotet kept its encrypted C2 configuration at the beginning of the .data section of the PE module. As we noted in our previous blog post, this did not change when Emotet migrated to pure 64-bit binaries: every sample had a function responsible for decrypting the configuration (see Figure 1) into an array of IP:port pairs (see Figure 2).
Figure 1: Encrypted C2 config passed as a parameter to the decryption routine (c6fe1cf52c7f3299f07a1e1c05e19e2013330e4c).
Figure 2: Decrypted C2 config is an array of IP:port pairs (c6fe1cf52c7f3299f07a1e1c05e19e2013330e4c).
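As a rough illustration of the final step of the old approach, the Python sketch below turns an already decrypted configuration blob into IP:port strings. The entry layout (an 8-byte record holding a little-endian 32-bit address, a 16-bit port, and 2 bytes of padding) and the all-zero terminator are assumptions made for this example, not a documented Emotet format; the decryption step itself is omitted.

```python
import ipaddress
import struct

# ASSUMED entry layout (hypothetical, for illustration only):
#   uint32 ip   little-endian integer whose most significant byte is the first octet
#   uint16 port little-endian
#   uint16 pad  ignored
ENTRY = struct.Struct("<IHH")


def parse_c2_blob(blob: bytes) -> list[str]:
    """Turn a decrypted C2 blob into a list of "ip:port" strings."""
    usable = len(blob) - len(blob) % ENTRY.size  # drop a trailing partial record
    indicators = []
    for ip_int, port, _pad in ENTRY.iter_unpack(blob[:usable]):
        if ip_int == 0:  # treat an all-zero address as the end of the list (assumption)
            break
        indicators.append(f"{ipaddress.IPv4Address(ip_int)}:{port}")
    return indicators
```

For instance, a synthetic blob built with struct.pack("<IHH", 0xBC088334, 8080, 0) yields ["188.8.131.52:8080"]. With a real sample, the layout would have to be confirmed against the decrypted data first.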
Unfortunately, that is no longer the case: in a recent wave of Emotet samples, the configuration data is not embedded in the binary as a single blob. Instead, each new sample features an accumulator function (see Figure 3) that returns a pointer to an array of function pointers, each of which returns a single C2 IP address and port (see Figure 4).
Figure 3: C2 accumulator function from the new wave (b409ca9851fecca61e6cb0aaaa56fdaafc7242f5).
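To make the structural change concrete, here is a toy Python model of the layout just described. The getter functions and their return values are hypothetical stand-ins for the native C2 functions, not code taken from the samples; the second address comes from the RFC 5737 documentation range.

```python
from typing import Callable, List, Tuple

# Hypothetical signature: each getter takes no arguments and returns (ip, port).
C2Getter = Callable[[], Tuple[str, int]]


def accumulator() -> List[C2Getter]:
    """Toy accumulator: returns a table of getter functions, not the data itself."""
    # Stand-ins for the native C2 functions (illustrative values only).
    return [
        lambda: ("188.8.131.52", 8080),
        lambda: ("192.0.2.10", 443),
    ]


def collect_c2(getters: List[C2Getter]) -> List[str]:
    """Invoke every getter in table order and format each result as ip:port."""
    indicators = []
    for getter in getters:
        ip, port = getter()
        indicators.append(f"{ip}:{port}")
    return indicators
```

Nothing in this model is a static list of addresses: the indicators only materialize once every getter in the table has been called, which is why a blob-oriented extractor has nothing to decrypt.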
As a result, it is no longer possible to statically retrieve a blob of data from a sample, decrypt it, and extract a list of IP addresses, because the C2 configuration is now spread across the code. A further challenge is that the IP address and port cannot be read from a static view of the disassembled code because of newly added obfuscation, as shown in Figure 4: the code performs a series of arithmetic operations on a set of hardcoded values before arriving at the actual IP address and port.
Figure 4: Obfuscated C2 function from the new wave which returns 188.8.131.52:8080 (b409ca9851fecca61e6cb0aaaa56fdaafc7242f5).
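The sketch below mimics, in Python, the kind of getter shown in Figure 4. The constants are invented so that the computation produces the 188.8.131.52:8080 indicator from the figure; they are not the sample's actual values, and the real function operates on registers rather than Python integers.

```python
import ipaddress
from typing import Tuple

MASK32 = 0xFFFFFFFF
MASK16 = 0xFFFF


def obfuscated_c2_getter() -> Tuple[str, int]:
    """Hypothetical Figure 4-style getter: IP and port are computed, never stored."""
    # Invented constants: none of them equals the final IP address or port.
    ip = ((0x12345678 ^ 0xB8C3245B) + 0x11111111) & MASK32
    port = (0x2A5B ^ 0x35CB) & MASK16
    # IPv4Address(int) treats the integer's most significant byte as the first octet.
    return str(ipaddress.IPv4Address(ip)), port
```

A tool that only scans immediates would see 0x12345678, 0xB8C3245B, and so on, none of which decodes to the indicator; only evaluating the expression, as a decompiler or an emulator effectively does, recovers ("188.8.131.52", 8080).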
How to Defeat Obfuscation
A straightforward way to deal with this kind of obfuscation is to use a decompiler (for example, Hex-Rays). An often-underestimated advantage of decompilers is that they also reduce code complexity as a by-product of lifting binary code to a higher-level language; Figure 5 shows an example of a decompiled and deobfuscated code fragment. While ideal for manual analysis, decompilers are expensive and are not guaranteed to work in the general case (deobfuscation tends to be hit-or-miss).
Figure 5: C2 function from the new wave deobfuscated by Hex-Rays (b409ca9851fecca61e6cb0aaaa56fdaafc7242f5).
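One of the simplifications that makes decompiler output readable is constant folding: sub-expressions whose operands are all known are replaced by their value. The toy sketch below applies that idea to a Python expression using the standard ast module. It only illustrates the concept and says nothing about how Hex-Rays works internally; the input expressions are hypothetical.

```python
import ast
import operator

# Binary operators we are willing to evaluate ahead of time.
_FOLDABLE = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.BitXor: operator.xor,
    ast.BitAnd: operator.and_,
    ast.BitOr: operator.or_,
    ast.LShift: operator.lshift,
    ast.RShift: operator.rshift,
}


class ConstantFolder(ast.NodeTransformer):
    """Replace BinOp nodes whose operands are both constants with their value."""

    def visit_BinOp(self, node: ast.BinOp) -> ast.AST:
        self.generic_visit(node)  # fold the operands first (bottom-up)
        op = _FOLDABLE.get(type(node.op))
        if op is not None and isinstance(node.left, ast.Constant) and isinstance(node.right, ast.Constant):
            folded = ast.Constant(op(node.left.value, node.right.value))
            return ast.copy_location(folded, node)
        return node


def fold(expr: str) -> str:
    """Parse a Python expression, fold its constant sub-expressions, and print it back."""
    tree = ast.parse(expr, mode="eval")
    tree = ast.fix_missing_locations(ConstantFolder().visit(tree))
    return ast.unparse(tree.body)
```

For example, fold("((0x12345678 ^ 0xB8C3245B) + 0x11111111) & 0xFFFFFFFF") collapses to the single constant 3154674484 (0xBC088334, i.e. 188.8.131.52), while fold("x + 2 * 3") returns "x + 6" because x is unknown. Like real decompilers, this only works when every input is a compile-time constant.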
Running the code in an emulator such as QEMU or Qiling (both free for commercial use) is often a more reliable way to extract the required data, because these tools can emulate both the CPU and the underlying OS environment. In this scenario, the starting point needs to be the inner DLL, extracted as shown in one of our earlier blog posts. Once that is done, static analysis can be used to identify the accumulator function and thereby obtain the full list of functions used to decode the C2 data. The last step is to compute each function's physical (file) offset within the module and feed it to the emulator as the starting instruction. Figure 6 contains a quick implementation that decodes the C2 data from the function shown in Figure 5 (i.e., sub_7FFA1B6AEAA4); in this case the physical offset was 0x1DEA4.
Figure 6: Small program to decode a single network indicator given a physical offset.
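Converting a function's virtual address into the physical (file) offset used above is a standard PE computation: subtract the image base to get a relative virtual address (RVA), find the section that contains it, and rebase onto that section's raw data. The Python sketch below does this with hand-entered section values. The image base 0x7FFA1B690000 and the .text section geometry are assumptions, picked only because they are consistent with sub_7FFA1B6AEAA4 mapping to 0x1DEA4; in practice they are read from the PE headers (the pefile library, for example, provides PE.get_offset_from_rva).

```python
from typing import NamedTuple, Sequence


class Section(NamedTuple):
    """Subset of IMAGE_SECTION_HEADER fields needed for the translation."""
    name: str
    virtual_address: int      # RVA at which the section is mapped
    virtual_size: int         # size of the section once mapped
    pointer_to_raw_data: int  # file offset of the section's data
    size_of_raw_data: int     # number of bytes of that data present in the file


def va_to_file_offset(va: int, image_base: int, sections: Sequence[Section]) -> int:
    """Translate a virtual address into a file ("physical") offset."""
    rva = va - image_base
    for s in sections:
        if s.virtual_address <= rva < s.virtual_address + max(s.virtual_size, s.size_of_raw_data):
            delta = rva - s.virtual_address
            if delta >= s.size_of_raw_data:
                raise ValueError(f"RVA {rva:#x} has no backing bytes in {s.name}")
            return s.pointer_to_raw_data + delta
    raise ValueError(f"RVA {rva:#x} is not inside any section")


# Assumed values, chosen to be consistent with sub_7FFA1B6AEAA4 -> 0x1DEA4.
IMAGE_BASE = 0x7FFA1B690000
TEXT = Section(".text", 0x1000, 0x30000, 0x400, 0x30000)

print(hex(va_to_file_offset(0x7FFA1B6AEAA4, IMAGE_BASE, [TEXT])))  # 0x1dea4
```

The resulting offset is what an emulator-based extractor would use to locate the function's bytes in the file; running one such computation per entry in the accumulator's table yields the full list of starting points.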
While a reader might start to conclude (as we predicted) that Emotet has finally resumed playing cat-and-mouse games with security researchers, we still believe these improvements are part of a more comprehensive refactoring, and that payloads will likely keep evolving according to internal roadmaps that have yet to become clear to the public. Unfortunately, these changes are the only signals that security defenders can use to improve detectors and keep users safe from this ever-evolving threat.