The Problem with Malware Analysis
Threat researchers receive thousands of malware samples every day and, as every researcher knows, it is very difficult to analyze them thoroughly enough to make an intelligent decision about whether a sample’s reputation is good or bad. There are already some quick ways to analyze malware, such as static analysis, but they limit the amount of data that can be extracted.
And that’s just not good enough for us here in the Threat Analysis Unit (TAU) at Carbon Black.
Malware analysis has always been this cat-and-mouse game where malware authors use stealth obfuscation techniques, packers, and encoders to stop threat researchers from fully analyzing their samples. And these attackers are very good at this—so much so that it’s becoming increasingly difficult for us researchers to capture any kind of salient features from malware samples.
The Limitations of Today’s Tools
Static and dynamic analysis have been the go-to tools for malware analysis, but they have inherent drawbacks. For instance, static analysis can extract individual features very quickly and cheaply. But beyond that, it’s almost impossible to get any other data because of the obfuscation techniques used by malware authors.
We can use dynamic analysis – but that’s a slower, more expensive option that only works on a very limited portion of malware samples. Malware authors have been able to reduce the effectiveness of this approach, too, blocking us with the same tactics we come across during static analysis – obfuscation, anti-analysis checks, and the not-all-malware-is-created-equal concept.
The bottom line is a trade-off. Static analysis is cheap, offers higher coverage, and delivers results immediately, but it yields relatively few features from a limited set of information. Dynamic analysis yields richer data, but it costs more, covers fewer samples, and has a much slower discovery period.
Finding a Route Around the Roadblocks
Where did all of this leave us? Well, we knew that each sample contained a total set of features and information – and that we were only getting a subset of that now. This meant our only choice was to find a way to extract data at a faster rate than we were able to at the time—at scale.
We started with a few questions: How could we get more information from binaries? How could we come up with an alternative approach that would reduce the cost of features typically extracted via dynamic analysis while increasing the number of features extracted from static analysis? How could we get past the obfuscation techniques and other roadblocks planted by malware authors? And how could we do all this at a lower cost and at a scale not realized before?
Emulation as an Answer
We decided to start exploring how to get information from binaries faster than any existing tool or process could today. We essentially set out to emulate Windows processes without Windows.
Why emulation? It’s inexpensive, it can run inside a container, and it gives us full introspection into and control of the analysis. Our goal was to fake out the malware and its authors. We wanted to emulate only what was necessary to get a minimum viable solution that mocked functions, system calls, and OS subsystems.
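To make the idea of mocking functions a little more concrete, here is a minimal Python sketch of an API-hook layer. It is only an illustration under our own assumptions, not Binee's actual design or code: the `MockEmulator` class, the handler names, and the fixed return values are all hypothetical. The only things borrowed from the real world are the names of a few well-known Windows API functions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class MockEmulator:
    """Toy stand-in for an emulator's API-hook layer (hypothetical)."""

    # Maps an API name to the Python function that fakes it.
    hooks: Dict[str, Callable[..., int]] = field(default_factory=dict)
    # Records every call the "sample" makes: (name, args, return value).
    trace: List[Tuple[str, tuple, int]] = field(default_factory=list)

    def register(self, name: str, handler: Callable[..., int]) -> None:
        self.hooks[name] = handler

    def call(self, name: str, *args) -> int:
        # Unmocked APIs fall back to a stub that returns 0, so the
        # sample keeps running and the call still lands in the trace.
        handler = self.hooks.get(name, lambda *a: 0)
        result = handler(*args)
        self.trace.append((name, args, result))
        return result


def fake_is_debugger_present() -> int:
    # Assumption: always report "no debugger" to the sample.
    return 0


def fake_get_tick_count() -> int:
    # Assumption: an arbitrary fixed uptime in milliseconds.
    return 123456


emu = MockEmulator()
emu.register("IsDebuggerPresent", fake_is_debugger_present)
emu.register("GetTickCount", fake_get_tick_count)

# Simulate the calls a sample might make while it runs.
emu.call("IsDebuggerPresent")
emu.call("GetTickCount")
emu.call("CreateFileA", "C:\\temp\\payload.bin")  # not mocked, uses the stub

for name, args, result in emu.trace:
    print(f"{name}{args} -> {result}")
```

Every call ends up in `emu.trace`, mocked or not, which is the kind of introspection that makes emulation attractive: the analyst sees what the sample asked for without a real operating system ever answering.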
This effort resulted in Binee, which is short for binary emulation environment. We created a tool that malware researchers can now use as part of their reverse engineering processes.
We’ve open-sourced it as a next-generation software process emulator that runs on Windows, OS X, and Linux. We are going to introduce Binee at DEFCON on Saturday, August 10 and make it available to the threat community on GitHub on August 12 (we’ll post the links when available).
We’re proud to say that, as far as we know, no one has done Windows emulation to this degree before. While we’re not done perfecting it yet, it’s a tool we are already using today in Carbon Black’s threat analysis engine.
The path to developing and launching Binee was by no means an easy one. Stay tuned and, in our next blog, we’ll share with you some of the most challenging—and sometimes unexpectedly frustrating—issues that we came up against along the way. And, of course, we’ll share how we were able to move past each one to create something truly awesome.
Meanwhile, if you are a threat researcher and want to join the team at Carbon Black, take a look at our openings. Or check out Binee on GitHub on August 12 and join us in the effort to make this emulator even better.