Introduction

Cobalt Strike [1] is a tool to support red teams in attack simulation exercises. To this end, Cobalt Strike provides several techniques that allow a red team to execute targeted attacks to compromise a target network, establish a bridgehead on a host, and then move laterally to gain additional access to computers, accounts, and, eventually, data.

While the goal of Raphael Mudge, the author of Cobalt Strike, was to provide a framework to test network defenses to support the development of effective detection mechanisms and incident response procedures, the power provided by the tools was not lost on malicious actors (see, for example, [2]).

Soon, Cobalt Strike was copied, modified, and included in the toolset used in attacks against targets of all kinds. For example, Cobalt Strike was recently used as part of both the SolarWinds supply-chain attack [3] and the ransomware attack against Colonial Pipeline [4]. The tool is so popular that there are Telegram channels and GitHub repositories dedicated to obtaining or producing modified, pirated copies of the Cobalt Strike software [5].

Given its “dual nature” and wide adoption by both sides of the security battlefield, it is not surprising that security teams struggle to develop detection approaches to identify instances of Cobalt-Strike-related traffic, and, in particular, traffic associated with the command-and-control channel to compromised hosts [6].

One of the challenges associated with the detection of Cobalt Strike command-and-control traffic is the lack of large-scale datasets that can be leveraged for machine learning or statistical analysis. To this end, we developed a grammar-based approach to the generation of Cobalt Strike configurations and an infrastructure for the automated collection of traffic samples.

We believe that this approach could be useful to other practitioners.

A Brief Cobalt Strike Tutorial

Cobalt Strike has a client-server architecture, in which several users (e.g., the members of the red team performing the attack) connect to a Team Server using the Aggressor client application.

The Team Server is the host that directly attacks the target network, and acts as a command-and-control component and team collaboration tool. Multiple Team Servers might be leveraged during an attack so that various activities (from phishing to initial compromise, to lateral movement) can be associated with different pieces of the infrastructure.

One of the most important components of the Cobalt Strike framework is the Beacon component. This component is installed on a host as part of the initial breach process to maintain control of the compromised host. In practice, this often means that an initial exploit (e.g., a remote heap overflow against a service running on the target host) causes the execution of a stager, which is a small piece of code whose task is to download a bigger component and execute it (by saving it to a file and running it, or by injecting this second component in the memory of an existing process). This bigger component, in many cases, is Beacon.

Beacon’s task is to call back to a specified Team Server and ask for commands to be executed. If a command is received, the command is carried out and the results of the command are sent back to the Team Server.

One of the important features of the Cobalt Strike framework is that it allows for the creation of Beacon components that use a variety of different techniques for “calling back home”, or “beacon out”. For example, a Beacon component can be configured to use HTTP or DNS to reach its C2 host, and to use a low-frequency beacon to effectively hide among benign network traffic (this is often referred to as a “low-and-slow” connection style). Of course, a Beacon component also allows interactive access to the compromised host, if necessary.

In addition to client-to-server C2 communication, Cobalt Strike provides a form of “peer-to-peer” beaconing, in which a compromised host infected with a Beacon component can use another compromised host (also infected with a Beacon component) to eventually reach the external C2 server. This Beacon-to-Beacon communication happens using SMB named pipes and can be structured hierarchically.

In addition to different styles and frequencies for beaconing, Cobalt Strike implements the concept of “Malleable C2”. The framework defines a domain-specific language so that one can customize the information exchanged as part of the beacons. For example, one might specify that the information needs to be Base64 encoded and prepended with a specific string. The capability of modifying the C2 communication almost arbitrarily has two main advantages: On the one hand, one can make the communication “blend in” with benign traffic to avoid detection; on the other hand, one can mimic known malware or adware beacons so as to deceive security tools into classifying the traffic as a known (and possibly low-risk) threat.
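The Base64-and-prefix example above can be made concrete with a few lines of Python. This is only a sketch of the transformation itself, not of Cobalt Strike's implementation or of the Malleable C2 language; the function names and the `session=` prefix are hypothetical and chosen for illustration.

```python
import base64


def encode_beacon_data(payload: bytes, prefix: str) -> str:
    """Base64-encode the payload and prepend a fixed string."""
    return prefix + base64.b64encode(payload).decode("ascii")


def decode_beacon_data(message: str, prefix: str) -> bytes:
    """Reverse the transformation: strip the prefix, then Base64-decode."""
    if not message.startswith(prefix):
        raise ValueError("message does not start with the expected prefix")
    return base64.b64decode(message[len(prefix):])


# Hypothetical values, used only to show the round trip.
wire_format = encode_beacon_data(b"host metadata", prefix="session=")
recovered = decode_beacon_data(wire_format, prefix="session=")
```

Because both ends apply the same profile, the receiving side can undo the transformation, while an observer on the network sees only the disguised form.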

Malleable C2 Configs

To automatically generate configurations for the communication between a Beacon component and the Team Server, we modeled the domain-specific language used to specify the C2 interaction as a grammar. To this end, we used the gramfuzz tool [7], which allows for the generation of instances of a grammar in a randomized fashion.

The gramfuzz tool is a Python module that supports the specification of a document in terms of its components. For example, a configuration file can be expressed as the composition of a global options section, followed by a section defining the behavior of the HTTP server, followed by a section that focuses on the characteristics of the HTTP GET request, and finally a section about the HTTP POST request configuration, as in the following Python snippet:

    Def("cs-C2Profile",
        CSRef("cs-global-options"),
        CSRef("cs-http-config"),
        CSRef("cs-http-get"),
        CSRef("cs-http-post"),
        cat=TOP_CAT,
        sep="\n"
    )

The defined sections are then specified further. For example, the global options are a list of settings:

    CSDef("cs-global-options",
        CSRef("cs-sample-name"),
        CSRef("cs-sleeptime"),
        CSRef("cs-tcp-port"),
        CSRef("cs-host-stage"),
        CSRef("cs-useragent"),
    )

Finally, the value of specific fields can be described in terms of ranges of values. For example, one can specify that the sleeptime field in the global options section can take a value between 1,000 and 30,000:

    CSDef("cs-sleeptime", CS_Set("sleeptime", Q(Int(min=1000, max=30000))), NEWLINE)
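To make the idea of nested rules concrete, the following sketch expands a tiny rule set into text using only the Python standard library. It is a simplified stand-in for grammar-based generation, not gramfuzz's implementation: the `RULES` table and the `generate` helper are hypothetical, only the rule names are taken from the snippets above, and the output assumes the `set <option> "<value>";` form used by Malleable C2 profiles.

```python
import random

# Each rule maps a name to a sequence of parts. A part is a literal string,
# a ("ref", name) tuple pointing at another rule, or a callable that draws
# a random value. The rule bodies are simplified assumptions.
RULES = {
    "cs-C2Profile": [("ref", "cs-global-options")],
    "cs-global-options": [("ref", "cs-sleeptime")],
    "cs-sleeptime": [
        'set sleeptime "',
        lambda rng: str(rng.randint(1000, 30000)),
        '";\n',
    ],
}


def generate(rule_name: str, rng: random.Random) -> str:
    """Expand a rule recursively into a concrete piece of text."""
    out = []
    for part in RULES[rule_name]:
        if isinstance(part, tuple):      # reference to another rule
            out.append(generate(part[1], rng))
        elif callable(part):             # randomized leaf value
            out.append(part(rng))
        else:                            # literal text
            out.append(part)
    return "".join(out)


# A seeded generator keeps the output reproducible between runs.
profile = generate("cs-C2Profile", random.Random(0))
print(profile, end="")
```

Each call with a different seed yields a different, but structurally valid, instance of the same grammar, which is the property the approach relies on.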

Once the grammar for the C2 configuration is specified, one can simply use gramfuzz to create a grammar generator and request the generation of randomized instances of the grammar, possibly specifying the level of recursion for the generation process:

    import gramfuzz

    fuzzer = gramfuzz.GramFuzzer()
    fuzzer.load_grammar(GRAMMAR_PATH)
    fuzzer.gen(cat_group=cat_group, num=num, max_recursion=recursion)

The result is a Malleable C2 configuration file, which is then passed to c2lint, a tool distributed as part of the Cobalt Strike framework that checks whether a profile has a valid format.

Once the configuration file passes the c2lint tests, it is ready for deployment.
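The validation gate can be sketched as a small wrapper around an external linter. This is an assumption-laden illustration rather than the pipeline's actual code: the `profile_passes_lint` helper is hypothetical, and both the location of c2lint and the convention that exit status 0 means "valid" are assumptions that should be checked against the installed Cobalt Strike version.

```python
import subprocess
import sys
from typing import Sequence


def profile_passes_lint(lint_cmd: Sequence[str], profile_path: str,
                        timeout_s: float = 120.0) -> bool:
    """Run a linter on a profile; exit status 0 is treated as a pass (assumption)."""
    try:
        result = subprocess.run(
            [*lint_cmd, profile_path],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except (OSError, subprocess.TimeoutExpired):
        # A missing binary or a hung linter counts as a failed check.
        return False
    return result.returncode == 0


# A real call might look like profile_passes_lint(["./c2lint"], "generated.profile").
# Here a stand-in command that always succeeds is used so the sketch runs anywhere.
stand_in_linter = [sys.executable, "-c", "import sys; sys.exit(0)"]
accepted = profile_passes_lint(stand_in_linter, "generated.profile")
```

Profiles that fail the check can simply be discarded, since generating a replacement is cheap.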

C2 Server and Victim Deployment

To implement a fully automated system, we had to design an infrastructure capable of deploying both a Team Server and a victim host with Beacon installed, recording their interaction, and clearing the state of the system for the next pair to be deployed. To meet these requirements, we designed an orchestrator using a worker queue pattern backed by Docker and RabbitMQ.

Conceptually, the system can be divided into four types of components: the C2 queue, the workers, the servers, and the victims.

The C2 queue is a RabbitMQ queue holding the C2 profiles, generated in the previous stage of the pipeline, that need to be deployed and executed. A worker is a Docker container responsible for picking the profile at the head of the queue and spawning two additional containers: one for the Team Server and one for the victim host.
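The worker queue pattern described above can be illustrated with the standard library alone. In this sketch `queue.Queue` stands in for RabbitMQ and a callback stands in for spawning Docker containers; the `run_worker` function, the role names, and the profile names are hypothetical.

```python
import queue
from typing import Callable, List


def run_worker(profiles: "queue.Queue[str]",
               spawn: Callable[[str, str], None]) -> List[str]:
    """Drain the queue, spawning a Team Server and a victim for each profile."""
    processed = []
    while True:
        try:
            profile = profiles.get_nowait()
        except queue.Empty:
            break                        # nothing left to deploy
        try:
            spawn("team-server", profile)
            spawn("victim", profile)
            processed.append(profile)
        finally:
            profiles.task_done()         # mark the item as handled either way
    return processed


# In-memory stand-ins for the RabbitMQ queue and the container runtime.
c2_queue: "queue.Queue[str]" = queue.Queue()
for name in ("profile-001", "profile-002"):
    c2_queue.put(name)

launched = []
done = run_worker(c2_queue, lambda role, profile: launched.append((role, profile)))
```

Because each worker only takes the next available item, adding more workers scales the collection without any coordination between them.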

A reader familiar with Cobalt Strike might have noticed an inconsistency at this point: the victim is deployed inside a Linux container, but Cobalt Strike payloads are only available for Windows. To solve this problem, we decided to execute the payload using Wine [8], a popular compatibility layer capable of running Windows executables on POSIX-compliant systems. To rule out side effects introduced by differences between the Wine network stack and the native Windows one, we compared network traces for the same type of interaction generated using both systems. After carefully comparing several of these traces, we did not detect any notable differences.

Another problem we faced was implementing a system that would make the Team Server and the victim host interact automatically. For this purpose, we implemented a set of scripts using the Aggressor scripting language directly provided by Cobalt Strike. This language allows the user to extend Cobalt Strike functionality and interact with the framework in a programmatic manner. For example, it is possible to listen to specific Beacon events and react accordingly by sending commands back to the victim or generating new artifacts, such as Beacons.

Specifically, we use these capabilities to coordinate the communication exchange between the victim and the Team Server by using the following events:

  • ready: fired when a client is connected to the Team Server and it’s ready to act. We react to this event by generating a new Beacon executable and by creating a listener for it.
  • beacon_initial: fired when the Beacon contacts the server for the first time. We react to this event by sending the first command to the client (directory listing in our experiment).
  • beacon_output_ls: fired when the Beacon output of the ls command is received by the server. We react to this event by sending an echo command that tells the victim to end the experiment.
  • heartbeat_5m: ping event automatically fired every five minutes. We react to this event by sending an echo command that tells the system to end the experiment. This acts as a safeguard in case the event beacon_output_ls never reaches the server due to some error.
  • beacon_output: fired when the Beacon output is received by the server. We react to this event by sending an exit signal to the victim. This will close the communication and the infrastructure will start the cleanup process.
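The event list above amounts to a small dispatch table from events to reactions. The following Python sketch models that table; it is not Aggressor Script and does not talk to a Team Server. The event names come from the list, while the action strings and the `react` and `replay` helpers are hypothetical.

```python
from typing import Dict, Iterable, List

END_TOKEN = "END_EXPERIMENT"

# Event name -> actions taken in response. Action names are placeholders.
REACTIONS: Dict[str, List[str]] = {
    "ready": ["generate_beacon", "create_listener"],
    "beacon_initial": ["send:ls"],
    "beacon_output_ls": [f"send:echo {END_TOKEN}"],
    "heartbeat_5m": [f"send:echo {END_TOKEN}"],   # safeguard path
    "beacon_output": ["send:exit"],
}


def react(event: str) -> List[str]:
    """Return the actions triggered by an event (none for unknown events)."""
    return list(REACTIONS.get(event, []))


def replay(events: Iterable[str]) -> List[str]:
    """Feed a sequence of events through the table and collect the actions."""
    actions: List[str] = []
    for event in events:
        actions.extend(react(event))
    return actions


# The expected, error-free sequence of events for one experiment.
happy_path = replay(["ready", "beacon_initial", "beacon_output_ls", "beacon_output"])
```

Modeling the handlers as data makes the safeguard easy to see: heartbeat_5m triggers the same action as beacon_output_ls, so either event is enough to bring the experiment to an end.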

Note that we focused only on HTTP-based C2 communication (ignoring DNS-based and SMB-based configurations), as this is the most-used form of communication observed in the wild.

We capture the traffic using an instance of tcpdump deployed on each Team Server container. To make the process scalable and repeatable, we had to embed the startup and teardown of the tcpdump service directly into the lifecycle of the victim and the server.

A simple interaction can be summarized as follows:

  1. The victim host waits for the Team Server to be ready;
  2. The victim host sends a command to generate the malicious payload to the Team Server;
  3. The victim host downloads the malicious payload (Beacon) from the Team Server;
  4. The victim host sends a command to start the tcpdump service to the Team Server;
  5. The victim host infects itself with the payload;
  6. The interaction between Team Server and victim host is carried out;
  7. When the experiment is finished, the malicious payload stops, and the victim host sends a command to tear down the tcpdump service to the Team Server;
  8. The pcap file containing the collected traffic is moved to a shared storage volume.
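Tying the capture to the experiment lifecycle, as in steps 4 through 7 above, can be sketched with a context manager that starts a capture process before the interaction and stops it afterwards. This is a simplification under stated assumptions: the capture runs as a local child process rather than being started remotely on the Team Server, and the `capture` helper is hypothetical.

```python
import contextlib
import subprocess
import sys
from typing import Iterator, Sequence


@contextlib.contextmanager
def capture(cmd: Sequence[str],
            stop_timeout_s: float = 10.0) -> Iterator[subprocess.Popen]:
    """Start a capture process, yield while the experiment runs, then stop it."""
    proc = subprocess.Popen(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    try:
        yield proc
    finally:
        proc.terminate()                 # ask the capture process to stop
        try:
            proc.wait(timeout=stop_timeout_s)
        except subprocess.TimeoutExpired:
            proc.kill()                  # force it if it does not exit in time
            proc.wait()


# A real capture might use ["tcpdump", "-i", "any", "-w", "run.pcap"];
# a sleeping Python process stands in for it so the sketch runs anywhere.
stand_in = [sys.executable, "-c", "import time; time.sleep(60)"]
with capture(stand_in) as proc:
    running_during_experiment = proc.poll() is None
stopped_after_experiment = proc.poll() is not None
```

Using a context manager guarantees that the capture is torn down even if the interaction raises an error, which keeps repeated runs from leaking processes.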

An experiment can terminate for two reasons. The first is that the Team Server receives the special custom token “END_EXPERIMENT” as Beacon output. The second is that the execution times out after ten minutes, a measure that enforces the cleanup of stuck containers.
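The two termination conditions can be expressed as a single check. In this minimal sketch the token and the ten-minute limit come from the text, while the `stop_reason` function and its return values are hypothetical.

```python
from typing import Optional

END_TOKEN = "END_EXPERIMENT"
TIMEOUT_S = 10 * 60  # ten minutes, expressed in seconds


def stop_reason(beacon_output: str, elapsed_s: float) -> Optional[str]:
    """Return why the experiment should stop, or None if it should continue."""
    if END_TOKEN in beacon_output:
        return "token"                   # normal end of the experiment
    if elapsed_s >= TIMEOUT_S:
        return "timeout"                 # stuck container, force cleanup
    return None


still_running = stop_reason("directory listing output", elapsed_s=42.0)
finished = stop_reason("END_EXPERIMENT", elapsed_s=95.0)
timed_out = stop_reason("", elapsed_s=600.0)
```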

Using this procedure, we were able to collect a dataset containing several hundred thousand traffic samples.

Conclusions

Automatically generating samples of Cobalt Strike traffic is important for approaches that require large datasets, such as those using machine learning techniques.

While some approaches for the automated generation of C2 configurations exist (see, for example, C2concealer [9]), we propose a systematic approach based on a grammar-based fuzzer, which can generate very complex C2 configurations, combined with a scalable experimental harness that allows for the automated generation of network traces.

We are making the produced dataset available to other researchers on demand.

Please contact threat-intelligence-team@groups.vmware.com for further information.

Bibliography

[1] “Cobalt Strike,” [Online]. Available: https://www.cobaltstrike.com/.
[2] J. Platt, “Inside a TrickBot Cobalt Strike Attack Server,” 2020. [Online]. Available: https://labs.sentinelone.com/inside-a-trickbot-cobaltstrike-attack-server/.
[3] DHS, “Emergency Directive 21-01: Mitigate SolarWinds Orion Code Compromise,” 13 December 2020. [Online]. Available: https://cyber.dhs.gov/ed/21-01/.
[4] FireEye, “Shining a Light on DARKSIDE Ransomware Operations,” 11 May 2021. [Online]. Available: https://www.fireeye.com/blog/threat-research/2021/05/shining-a-light-on-darkside-ransomware-operations.html.
[5] R. Gold, “Threat Actors Use Of Cobalt Strike: Why Defense Is Offense’s Child,” [Online]. Available: https://www.digitalshadows.com/blog-and-research/threat-actors-use-of-cobalt-strike-why-defense-is-offenses-child/.
[6] Fox-IT, “Identifying Cobalt Strike team servers in the wild,” 26 February 2019. [Online]. Available: https://blog.fox-it.com/2019/02/26/identifying-cobalt-strike-team-servers-in-the-wild/.
[7] d0c-s4vage, “gramfuzz,” [Online]. Available: https://github.com/d0c-s4vage/gramfuzz. [Accessed 21 June 2021].
[8] A. Julliard, “Wine,” 21 June 2021. [Online]. Available: https://www.winehq.org/.
[9] J. Leon, “Introducing C2concealer: a C2 Malleable Profile Generator for Cobalt Strike,” 23 March 2020. [Online]. Available: https://fortynorthsecurity.com/blog/introducing-c2concealer/.
[10] FortyNorthSecurity, “C2concealer,” 23 March 2020. [Online]. Available: https://github.com/FortyNorthSecurity/C2concealer.
[11] Pallets, “Jinja,” 15 May 2021. [Online]. Available: https://jinja.palletsprojects.com/en/3.0.x/.
[12] d0c-s4vage, “gramfuzz,” 12 December 2016. [Online]. Available: https://github.com/d0c-s4vage/gramfuzz.
[13] MTA, “Malware Traffic Analysis,” 15 May 2021. [Online]. Available: https://www.malware-traffic-analysis.net/.