Evolution of Excel 4.0 Macro Weaponization

The evolution of Excel 4.0 (XL4) macro malware proceeds apace, with new variations and techniques introduced regularly. To understand the threat landscape, the VMware NSBU Threat Analysis Unit extended its previous research on XL4 macro malware (see the previous blog) to analyze new trends and techniques.

The new samples add novel techniques for evading analysis engines, and they carry out their attacks more reliably. These variants were observed in June and July. Figure 1 depicts the Excel 4.0 macro malware wave.

Excel 4.0 Macro Malware Wave — Figure 1: Malicious XL4 submission: May-Aug 2020

Broadly, the samples can be categorized into three clusters. Based on the variation of the samples in these three clusters, the weaponized documents can be grouped into multiple variants.

Cluster 1: Relative Reference

The samples in this cluster appeared in June. They use FORMULA.FILL for obfuscation and to move the payload around the sheet. The formula uses relative references to access values stored in the sheet. This category contains several variants, which introduce new techniques to enhance evasion and avoid detection.

FORMULA.FILL is used to populate specific cells of the sheet with formulas. The formulas generate ASCII characters, and these characters are then concatenated by an additional formula to generate the final payload. More precisely, FORMULA.FILL("=CHAR(R[29040]C[-203])", R20829C252:R20909C252) populates the cells from R20829C252 to R20909C252 with the formula =CHAR(R[29040]C[-203]). The formula uses relative addressing (R[29040]C[-203]) to access the values stored in the sheet from R49869C49 (that is, row 20829 + 29040 and column 252 - 203) to R49949C49. The CHAR formula converts these numeric values to the corresponding ASCII characters (see Figure 2).
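The address arithmetic above can be sketched in a few lines of Python. This is only an illustration of how a relative R[n]C[m] reference resolves and how CHAR turns the stored numbers into text; the helper names and the sample cell values are made up for the example and are not taken from a real sample.

```python
# Sketch: emulate FORMULA.FILL("=CHAR(R[29040]C[-203])", <range in column 252>).
# Helper names and sheet contents are hypothetical, chosen only for illustration.

def resolve_relative(row, col, row_offset, col_offset):
    """Resolve an R[row_offset]C[col_offset] reference from the cell at (row, col)."""
    return row + row_offset, col + col_offset

def fill_char_formula(sheet, first_row, last_row, col, row_offset, col_offset):
    """Return {(row, col): character} for every cell in the filled range."""
    filled = {}
    for row in range(first_row, last_row + 1):
        source = resolve_relative(row, col, row_offset, col_offset)
        filled[(row, col)] = chr(sheet[source])  # CHAR(<numeric value>)
    return filled

# Made-up numeric values stored in column 49, starting at row 49869.
encoded = [ord(c) for c in "=HALT()"]
sheet = {(49869 + i, 49): value for i, value in enumerate(encoded)}

cells = fill_char_formula(sheet, 20829, 20829 + len(encoded) - 1, 252, 29040, -203)
decoded = "".join(cells[key] for key in sorted(cells))  # concatenate in row order

print(resolve_relative(20829, 252, 29040, -203))  # (49869, 49)
print(decoded)  # =HALT()
```

The first printed tuple matches the R49869C49 address computed in the text; the second line shows the concatenation step on the toy data.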

CHAR Formula Converts Numeric Values — Figure 2: Generated ASCII characters using relative addressing (e.g., cell 20829 will contain the ‘&’ character)

The obfuscated payload (Figure 3) refers to the cells from R20829C252 to R20909C252, where each cell represents one character. These characters are then concatenated to generate the final payload. Each payload represents a line in the macro. For example, cell 57827 will contain the string ‘=IF(GET.WINDOW(23)<3,GOTO(R31930C230))’.

De-obfuscated Macro Lines — Figure 3: De-obfuscated macro lines

The de-obfuscated payload is copied into another cell of the sheet using the =FORMULA function, leveraging relative (offset-based) addressing.

Eventually, an ON.TIME(NOW(), {payload_address}) statement is used to pass the control flow of the macro to the first line of the de-obfuscated payload, causing the execution to proceed, line by line, through the de-obfuscated macro.

The decoded payload is grouped into the following three code blocks.

An execution environment check (Figure 4) that detects sandboxing environments and single-stepping and, based on the machine type (32-bit or 64-bit architecture), passes control to one of the following two code blocks

A second-stage payload downloader for a 32-bit machine

A second-stage payload downloader for a 64-bit machine

Apart from the techniques described in the previous blog, the following new techniques have been added to the execution environment check:

A check for a non-default value of the VBAWarnings registry key is used to detect sandbox environments. To implement this check, the payload does not use reg.exe to read the policy settings from the registry. Instead, the macro creates a script file that reads the registry key, which stores the policy setting that controls whether users are warned when Visual Basic for Applications (VBA) macros are present in an Excel document, and then invokes the script using explorer.exe. This evades behavioral detection engines, since launching reg.exe from a document is considered more suspicious than invoking explorer.exe.
The condition ISNUMBER(SEARCH("32",GET.WORKSPACE(1))), wrapped in an IF formula, is used to check the architecture of the execution environment and, based on the result, to determine which code will download the second-stage payload. Each of the two downloaders is tied to one architecture (32-bit or 64-bit), which makes the attack more reliable, as earlier samples failed to execute when the environment did not match.
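The architecture check in the second bullet can be sketched as follows, assuming the formula is Excel's usual ISNUMBER(SEARCH(...)) idiom. The function names are hypothetical, and the workspace strings are assumed examples of what GET.WORKSPACE(1) might return; SEARCH wildcards are not modeled.

```python
# Sketch of IF(ISNUMBER(SEARCH("32", GET.WORKSPACE(1))), <32-bit path>, <64-bit path>).
# Function names and sample workspace strings are assumptions for illustration.

def is_number_search(needle, haystack):
    """Mimic ISNUMBER(SEARCH(needle, haystack)).

    SEARCH is case-insensitive and yields an error (so ISNUMBER is FALSE)
    when the needle does not occur in the haystack.
    """
    return needle.lower() in haystack.lower()

def pick_downloader(workspace_name):
    """Choose the code block that the macro would jump to."""
    if is_number_search("32", workspace_name):
        return "32-bit downloader"
    return "64-bit downloader"

# Assumed example values for the environment name and version string.
print(pick_downloader("Windows (32-bit) NT 10.00"))  # 32-bit downloader
print(pick_downloader("Windows (64-bit) NT 10.00"))  # 64-bit downloader
```

An emulator that hard-codes a single workspace string will only ever follow one of the two branches, so analysis tooling may want to evaluate both.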

Execution Environment Checks — Figure 4: The de-obfuscated payload: execution environment checks

The two architecture-specific downloaders use different techniques to download and execute their second-stage payloads.

On a 32-bit machine, the sample downloads and executes the second-stage payload using the technique described in the previous blog, i.e., the sample uses URLDownloadToFileA from Urlmon to download a DLL, and then executes the downloaded DLL using rundll32.exe.

On a 64-bit machine, rather than directly invoking URLDownloadToFileA from Urlmon, the payload uses a stealthier technique to download and load the second-stage payload (see Figure 5). The macro creates two VBS script files; the first script downloads the DLL, and the second script executes the downloaded DLL.

More precisely, the download script performs the following actions:

Creates a ServerXMLHTTP object and issues a GET request to download the second-stage payload from one of the download URLs stored in the payload (the script tries these URLs in succession)

Uses “Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)” as the user agent string, to make the request stealthier

In case of success (response code 200), the script creates an ADODB stream object to dump the response message body into a file
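From a defender's point of view, the fixed user agent string listed above is a simple network indicator. The following is a minimal sketch of matching it in proxy log records; the record layout, field names, and URLs are hypothetical, and an exact string match is only one possible heuristic.

```python
# Sketch: flag GET requests that carry the user agent used by the download script.
# The log record structure and the example URLs are made up for illustration.

SUSPICIOUS_USER_AGENT = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"

def flag_requests(records):
    """Return the records whose method is GET and whose user agent matches exactly."""
    return [
        record
        for record in records
        if record["method"] == "GET"
        and record["user_agent"] == SUSPICIOUS_USER_AGENT
    ]

records = [
    {"method": "GET", "url": "http://example.test/a.dll",
     "user_agent": "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"},
    {"method": "GET", "url": "http://example.test/index.html",
     "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"},
]

for hit in flag_requests(records):
    print(hit["url"])  # http://example.test/a.dll
```

Because legacy software may still send the same string, a match is better treated as a lead to correlate with other signals than as a verdict.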

De-obfuscated Payload — Figure 5: The de-obfuscated payload: second-stage downloader for 64-bit machine

Then, the execution script uses rundll32.exe to execute the downloaded DLL. Note that the VBS script is executed using explorer.exe; as a result, rundll32.exe appears to be invoked by explorer.exe, which makes the attack less suspicious.

Variation

Instead of applying an IF condition to the return value of a macro function (which describes a characteristic of the execution environment) to determine the execution flow, these samples use the return values themselves to build the payload. The return values are used to generate ASCII characters, which are then concatenated to generate the payload (see Figure 6). Finally, control is transferred to the payload for execution.

Checks to Detect Sandbox and Debugging — Figure 6: Environment checks to detect sandbox and debugging

As the sandboxing environment will have different characteristics, the macro function will return a different value. This will result in an invalid payload, and, eventually, the execution terminates when control is transferred to the invalid payload.

For example, in Figure 6 the cell R63986C175, which has the formula =LEN(GET.WORKSPACE(31))+731, will have the value 735 or 736 depending on the execution mode (i.e., whether single-stepping is enabled or not). This value is used to generate a character that becomes part of the payload. Therefore, if single-stepping mode is enabled, the resulting character will be wrong and so will the payload. This dependency on the execution environment makes the code hard for automatic static de-obfuscation tools to recover: static de-obfuscators are not aware of the correct execution environment, which is needed to generate the correct code. For example, tools such as XLMMacroDeobfuscator, which uses an internal XLM emulator to interpret the macros without fully executing the code, will not be able to extract or decode the obfuscated script.
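The arithmetic behind the 735/736 example can be checked with a short sketch. It assumes that GET.WORKSPACE(31) renders as the text TRUE while single-stepping and FALSE otherwise, so LEN yields 4 or 5. The final step that turns the cell value into a character is not spelled out in the text, so the constant 675 below is purely hypothetical.

```python
# Sketch: =LEN(GET.WORKSPACE(31))+731 under both execution modes.
# HYPOTHETICAL_OFFSET is an assumption; the real samples' mapping is not given.

HYPOTHETICAL_OFFSET = 675

def workspace_31(single_step):
    """Text form of GET.WORKSPACE(31): TRUE while single-stepping, else FALSE."""
    return "TRUE" if single_step else "FALSE"

def cell_value(single_step):
    """Evaluate LEN(GET.WORKSPACE(31)) + 731."""
    return len(workspace_31(single_step)) + 731

def derived_char(single_step):
    """Hypothetical follow-up step: turn the cell value into one payload character."""
    return chr(cell_value(single_step) - HYPOTHETICAL_OFFSET)

candidates = {mode: cell_value(mode) for mode in (False, True)}
print(candidates)           # {False: 736, True: 735}
print(derived_char(False))  # =
print(derived_char(True))   # <
```

With this made-up offset, a normal run yields "=" (a valid start of a formula) while a single-stepped run yields "<", which is how one wrong environment value can invalidate a whole payload line. A static tool could, in principle, enumerate both candidates instead of assuming one.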

Cluster 2: Custom Function Name

The REGISTER function provides an option to register a Windows API function with a custom name. Later, the registered custom name can be used directly to call the function. The samples in this cluster use this technique to add another layer of indirection. As the original function names are not used directly to invoke the functions, the static analysis engine cannot directly extract the strings that represent the function names. In addition, the strings that represent function names, DLL names, and URLs are obfuscated.

In order to execute the payload, the sample first de-obfuscates the strings representing function names, DLL names, and URLs, and stores them in cells. Then, the payload refers to these strings through cell addresses in the REGISTER macro function to register the Windows API function and assign to it a custom name. Finally, the custom name is used to actually call the function (this process is shown in Figure 7).

Payload Using Register to call Windows Functions — Figure 7: Payload using REGISTER to call Windows functions

More precisely, the initial function $AI$22412() constructs one de-obfuscated string by concatenating various ASCII characters that are sprinkled around the sheet, and then stores it in cell $BB$54. The function then invokes an additional function (namely, $BU$59903), which carries out a similar task. This results in a chain of function invocations that fills the spreadsheet with de-obfuscated strings.

Fills Spreadsheet With De-obfuscated Strings — Figure 8: String de-obfuscator

Once all strings have been created, control is transferred back to the starting subroutine, which uses the REGISTER macro function to register a Windows API function with a custom name. For example, as shown in Figure 9, the Windows API function URLDownloadToFileA is registered with the custom name “LJITkWaB”, and in the next line the function is invoked using the custom label.
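One way an analysis tool could undo this indirection is to record each REGISTER call and map custom names back to the real API. The sketch below models only the first four REGISTER arguments (module, procedure, type text, custom name); the cell addresses, the type text "JJCCJJ", and the helper names are assumptions for illustration, while the custom name comes from the example above.

```python
# Sketch: track REGISTER calls so that custom names resolve to module!procedure.
# Cell addresses, the type text, and helper names are hypothetical.

def register(aliases, module, procedure, type_text, custom_name):
    """Record what a REGISTER(module, procedure, type_text, custom_name) call binds."""
    aliases[custom_name] = {
        "module": module,
        "procedure": procedure,
        "type_text": type_text,
    }

def resolve(aliases, called_name):
    """Map a called name back to 'module!procedure', or return it unchanged."""
    target = aliases.get(called_name)
    if target is None:
        return called_name
    return f"{target['module']}!{target['procedure']}"

# Hypothetical cells holding the strings after de-obfuscation.
cells = {"A1": "urlmon", "A2": "URLDownloadToFileA", "A3": "JJCCJJ", "A4": "LJITkWaB"}

aliases = {}
register(aliases, cells["A1"], cells["A2"], cells["A3"], cells["A4"])

print(resolve(aliases, "LJITkWaB"))  # urlmon!URLDownloadToFileA
print(resolve(aliases, "SUM"))       # SUM
```

The key point is that the alias table can only be built after the string de-obfuscation has run, which is why signature matching on the raw sheet misses the API names.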

Function Call Using Custom Label Name — Figure 9: Custom function name

Variation

One variant in this cluster uses the same REGISTER technique but adopts a more complicated string obfuscation approach to make static de-obfuscation more difficult. Rather than using character concatenation to construct and de-obfuscate strings, this variant uses more complicated logic, as shown in Figure 10.

Complicated Logic Construct and De-Obfuscate Strings — Figure 10: Variant of string de-obfuscator

In order to create the plain text strings, the de-obfuscation sub-routine accesses various values sprinkled around the sheet. These values are sprinkled in sequence so that the values can be accessed just by incrementing the cell pointer. For example, one of these sequences starts from P28572, as shown in Figure 10. There is a loop in the de-obfuscation routine to generate an ASCII character using each value stored from P28572 to P28597. Each ASCII character is then concatenated to generate the string. P28598 stores “qltxYMK”, marking the end of the sequence, which is used in the IF condition to terminate the loop.
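The sentinel-terminated loop can be sketched as below. The per-value transformation used by the real variant is only described as more complicated logic, so a plain chr() stands in for it here; the stored values and the decoded string are made up, and only the start cell, the end cell, and the sentinel value follow the example in the text.

```python
# Sketch: walk down column P from row 28572 until the sentinel is reached.
# The stored values and the chr() transform are placeholders, not sample data.

SENTINEL = "qltxYMK"

def decode_sequence(column, start_row, transform=chr):
    """Concatenate transform(value) for consecutive rows until the sentinel."""
    chars = []
    row = start_row
    while column[row] != SENTINEL:
        chars.append(transform(column[row]))
        row += 1  # incrementing the cell pointer reaches the next value
    return "".join(chars)

# Made-up plain text of 26 characters, so the values occupy P28572..P28597.
plain = "https://example.test/a.dll"
column_p = {28572 + i: ord(c) for i, c in enumerate(plain)}
column_p[28572 + len(plain)] = SENTINEL  # P28598 marks the end of the sequence

print(len(plain))                         # 26
print(decode_sequence(column_p, 28572))   # https://example.test/a.dll
```

Passing a different `transform` callable is where an emulator would plug in the actual per-character logic once it has been recovered from the sheet.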

There is one de-obfuscation subroutine for each string that uses similar logic but works on a different set of values sprinkled around the sheet. Once all strings are created, control is transferred back to the starting subroutine.

Cluster 3: PowerShell

The samples in this cluster use powershell.exe from the XL4 macro to download and execute PowerShell scripts as a second-stage payload. These samples appeared in June and July. They are not weaponized with sophisticated evasion or obfuscation, but they use the “very hidden” option to hide the macro sheet. The payload can be extracted with XLMMacroDeobfuscator, a tool for decoding XLM macros. Its output is shown in Figure 11; the macro executes powershell.exe to download and execute the PowerShell script.
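For workbooks stored in the Office Open XML format, a “very hidden” sheet is recorded as state="veryHidden" in xl/workbook.xml, which makes it easy to spot. The sketch below parses such a fragment with the Python standard library; the sample XML and sheet names are made up, and samples stored in the legacy binary .xls format would need a different parser, so this covers only one assumed case.

```python
# Sketch: list sheets marked veryHidden in an OOXML workbook.xml fragment.
# SAMPLE_WORKBOOK_XML is a hand-written example, not extracted from a sample.
import xml.etree.ElementTree as ET

NS = {"main": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}

def very_hidden_sheets(workbook_xml):
    """Return the names of all sheets whose state attribute is veryHidden."""
    root = ET.fromstring(workbook_xml)
    return [
        sheet.get("name")
        for sheet in root.findall("main:sheets/main:sheet", NS)
        if sheet.get("state") == "veryHidden"
    ]

SAMPLE_WORKBOOK_XML = (
    '<workbook xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">'
    "<sheets>"
    '<sheet name="Sheet1" sheetId="1"/>'
    '<sheet name="Macro1" sheetId="2" state="veryHidden"/>'
    "</sheets>"
    "</workbook>"
)

print(very_hidden_sheets(SAMPLE_WORKBOOK_XML))  # ['Macro1']
```

A very hidden sheet cannot be unhidden from the normal Excel sheet menu, so listing such sheets is a cheap triage signal before running a de-obfuscator.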

Macro Executes PowerShell Script — Figure 11: Payload using PowerShell

Conclusions

Excel 4.0 macros continue to be valuable to attackers, as they provide a reliable way to get malicious code to run on a target. In many environments, Excel worksheets with macros are used extensively for legitimate purposes, so they cannot easily be disabled without affecting essential business processes. As a result, analysts and security vendors will need to keep updating their tooling and signatures as attacks continue to evolve.

Appendix

IOCs

Cluster 1

0e8c52a99cd633978f9a7e1e22b0d19d1bbe69bf

3f57a6638749e81b658a5993874c58afcff40608

5ca9a9c70f62f998ed13f3410c09ce25c3474d98

5b2fcb6d4816d9e11d7c51cb0b7ba67a608abc14

5a2e6257bd3e343d91ee440123f53c4eb2f252bf

Cluster 2

7c7f242e98a8b2251f95ac6fa5963988e3ddb6fb

c8300ca749eaae3d879ef4a83c54780344383560

caed135aee542d905e39eaa551c6ea6ebb2b040b

ef511fa9e4f2f3227cae307b7c3b99c2be09e06d

Cluster 3

f3905a1ff485c94eca9df2fc5c6dc16921591315

a3816c37d0fbe26a87d1cc7beff91ce5816039e7

79e753a3c5e9c35241e8a06ffa56fff6189a29cf

579853532fadf08ef8ed7369d6d596af619bdf5a
