Command-and-control (C2) frameworks serve as a means to remotely manage and access compromised devices. They allow for the creation of various payload types, called implants, that are dropped on victim machines by attackers, enabling them to retain access and control over the infected victim.
While legitimate penetration testing utilizes C2 frameworks to evaluate system security and identify potential attacks, cyber-criminals have also taken advantage of these tools for malicious purposes. The likes of Cobalt Strike, Metasploit, and Brute Ratel have become increasingly popular in breaching enterprise networks.
For a better understanding of how C2 frameworks operate, refer to Figure 1, which presents a simplified scenario of a compromised machine.
Figure 1: How C2 frameworks operate.
Appropriate firewall configurations and robust endpoint protection systems can aid in preventing scenarios like this, but attackers can customize implants to increase the likelihood of flying under the radar. For instance, attackers can shape the implant's traffic to the C2 server so that it resembles legitimate communication, or modify the binary footprint of the implant through a parametric generation process. However, the ability to create implants from many configuration options can also be leveraged by security researchers to build a vast dataset of samples and study their invariants.
This blog post builds on that idea: we explore how the polymorphic capabilities of these implants can be exploited to create a large dataset of samples. This dataset will then be analyzed using a machine learning pipeline to identify invariants that can be used to improve our defenses against these attacks.
To achieve this goal, we developed a framework, called C2F2, that abstracts the underlying C2 framework and automatically generates C2 implants with randomized options by iteratively changing the settings used at configuration time. This process is repeated for each C2 framework, leading up to a diversified yet representative dataset of malicious implants. By analyzing this large collection of samples, security professionals can also gain insights into the tactics, techniques, and procedures (TTPs) used by threat actors to develop and deploy these implants.
C2F2: A C2 Framework Framework
The C2 frameworks targeted by C2F2 are the following:
Before diving into the implant generation process, we had to take several preparatory steps to ensure that the process can run smoothly and effectively. Firstly, we had to understand the set of possible options and respective values for each C2 framework. Secondly, it was necessary to understand how to instrument and interact with each C2 framework to generate the implant.
Interacting with various C2 frameworks can be a challenging task due to their differing interfaces. While some frameworks, like Sliver, offer user-friendly command-line interfaces with multiple options, others can only be queried via bespoke mechanisms; for example, Cobalt Strike can only be interacted with using the Aggressor Script language. Additionally, Brute Ratel and Covenant proved to be the most challenging. The former required us to fully reverse-engineer the communication protocol used by the C2 server and the implant, while the latter, despite having various functionality exposed via a RESTful API, required us to create the missing implant generation functionality ourselves.
Specifically, Brute Ratel’s protocol requires the client to perform authorization twice: first by sending the login and the password via an HTTP POST request, and then by sending the token received from the first authorization step over a newly established WebSocket channel. After successful authorization, the WebSocket connection is used for client-server communication: the client sends commands (e.g., “create a badger profile with the following parameters”) and the server replies with status codes and any additional data (e.g., a Base64-encoded payload) that the client requested. Both channels (the initial POST request and the WebSocket connection) are JSON-based and use HTTPS as a transport layer.
Covenant, on the other hand, has a documented set of APIs that include functions to create implants. The APIs are based on JSON and use HTTPS as a transport layer. While most of the functions do return the generated implant, some of them (e.g., the function to generate a .NET executable implant) return no payload with the standard reply. The payload is generated but never returned because that functionality is not implemented. To overcome this limitation, we amended the code of the framework to store the generated payload on disk.
Once the options for each implant type had been identified, we used a domain-specific language (DSL) to express them in a format that can be easily consumed by the implant-generation process. This required careful consideration and planning to ensure that the process is efficient and scalable. Finally, we implemented an algorithm that, given the grammar of an implant configuration expressed in our DSL, generates random configurations that are consistent with it. These configurations are then used to generate the implants. This approach can produce a large number of implant variations, each tailored to a specific target environment, which can in turn be used to test the effectiveness of various detection and defense strategies and to generate signatures or detection procedures.
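To make the idea of grammar-driven generation more concrete, here is a minimal sketch in Python. It is not the C2F2 algorithm or its DSL: the `GRAMMAR` dictionary, its rule format, and every option name in it are hypothetical and chosen only for illustration. The sketch shows how a random configuration can be drawn from a description of the allowed values, and how the same description can be used to check consistency.

```python
import random

# Hypothetical grammar: each option maps to a rule describing its allowed values.
# Option names and rule format are illustrative, not taken from any real C2 framework.
GRAMMAR = {
    "format": {"type": "enum", "values": ["exe", "dll", "shellcode"]},
    "arch": {"type": "enum", "values": ["x86", "x64"]},
    "port": {"type": "int", "min": 1, "max": 65535},
    "sleep_seconds": {"type": "int", "min": 1, "max": 300},
    "use_https": {"type": "bool"},
}


def random_value(rule, rng):
    """Draw one random value that satisfies a single grammar rule."""
    kind = rule["type"]
    if kind == "enum":
        return rng.choice(rule["values"])
    if kind == "int":
        return rng.randint(rule["min"], rule["max"])  # both bounds inclusive
    if kind == "bool":
        return rng.choice([True, False])
    raise ValueError(f"unsupported rule type: {kind}")


def random_config(grammar, rng=None):
    """Build a configuration by sampling every option independently."""
    if rng is None:
        rng = random.Random()
    return {name: random_value(rule, rng) for name, rule in grammar.items()}


def is_consistent(config, grammar):
    """Return True if the configuration uses exactly the options and values the grammar allows."""
    if set(config) != set(grammar):
        return False
    for name, rule in grammar.items():
        value = config[name]
        if rule["type"] == "enum" and value not in rule["values"]:
            return False
        if rule["type"] == "int":
            if isinstance(value, bool) or not isinstance(value, int):
                return False
            if not rule["min"] <= value <= rule["max"]:
                return False
        if rule["type"] == "bool" and not isinstance(value, bool):
            return False
    return True


if __name__ == "__main__":
    rng = random.Random(7)  # fixed seed so a run can be reproduced
    for _ in range(3):
        print(random_config(GRAMMAR, rng))
```

Sampling each option independently is the simplest possible strategy; a real grammar would likely also need to express dependencies between options, which this sketch does not attempt.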
We designed our framework around two key properties of the problem we needed to solve. Firstly, generating a single implant can take several minutes because of the number of steps involved; in the case of Cobalt Strike, for example, we need to create a Malleable C2 profile, start a Cobalt Strike server, use Aggressor Scripts to generate the implant binary, and wait for the result to be produced. Secondly, the generation of one implant is independent of the others, so multiple implants can be generated in parallel, which significantly increases throughput. To address these two factors, we ensured that our framework can handle asynchronous, long-running jobs and designed it to be easily parallelizable.
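The two properties above (slow jobs that are independent of each other) map naturally onto a pool of workers. The following is a minimal sketch using Python's standard `concurrent.futures` module; it is an illustration under our own assumptions, not the C2F2 implementation. The function `generate_implant` is a hypothetical stand-in that only sleeps briefly, and the configuration format is invented.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def generate_implant(config):
    """Hypothetical stand-in for a slow, framework-specific build step."""
    time.sleep(0.05)  # a real build may take minutes; this only simulates the wait
    return {"config_id": config["id"], "status": "done"}


def generate_all(configs, max_workers=4):
    """Run independent generation jobs in parallel and collect results by config id."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the configuration it belongs to.
        futures = {pool.submit(generate_implant, c): c["id"] for c in configs}
        for future in as_completed(futures):
            config_id = futures[future]
            try:
                results[config_id] = future.result()
            except Exception as exc:  # one failed job must not stop the others
                results[config_id] = {
                    "config_id": config_id,
                    "status": "failed",
                    "error": str(exc),
                }
    return results


if __name__ == "__main__":
    jobs = [{"id": i} for i in range(8)]
    print(generate_all(jobs))
```

Threads are used here because the simulated work is mostly waiting; a deployment that spreads jobs across machines would more plausibly rely on a message queue than on an in-process pool.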
Our infrastructure consists of four key components that work together to facilitate the process of generating implants:
- Generator: This component retrieves the correct grammar based on the specified C2 framework type and generates a random implant configuration. The configuration is then stored in the designated storage backend for future use.
- Submitter: Once an implant configuration is available, the submitter creates a job and sends it to the appropriate queue based on the configuration type.
- Receiver: This component pulls jobs from the queue and sets up the worker to generate the corresponding implant. Once the job is completed, the receiver collects the result.
- Worker: Each worker is specialized in generating implants for a specific C2 framework. Given an implant configuration, the worker generates the corresponding implant.
By dividing the workload across multiple workers, our infrastructure can generate multiple implants in parallel, significantly reducing the time required to produce large numbers of implants. Together, these components automate the implant generation process.
Figure 2: C2F2 architecture for the generation of implants at scale.
As shown in Figure 2, the workflow for generating implants using the C2F2 system can be broken down into the following steps:
1. The user chooses the C2 framework type and the number of implants to be generated and initiates the process.
2. The generator component retrieves the appropriate grammar and generates the given number of random implant configurations, which are then stored in the selected storage backend.
3. The submitter creates a job for each generated configuration and sends it to the appropriate queue based on its type.
4. The receiver pulls jobs from the queue and sets up the appropriate specialized worker.
5. The worker reads the configuration file and, using the appropriate C2 framework, generates the implant.
6. Once the worker has generated the implant, the receiver collects the result and stores it in the designated storage backend.
7. Steps 4, 5, and 6 are repeated until the queue is empty.
Alternatively, the user can provide a custom implant configuration, in which case the submitter sends the job directly to the appropriate queue and the process continues as usual.
Overall, this process ensures that implant generation is standardized and customizable, making it more efficient and effective.
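The workflow can be sketched end to end with Python's standard `queue` and `threading` modules. This is a toy model under stated assumptions rather than the actual C2F2 code: the storage backend is an in-memory dictionary, the framework names `framework_a` and `framework_b` are placeholders, the configuration contents are invented, and `worker` returns fake bytes instead of invoking a real C2 framework.

```python
import queue
import random
import threading

# In-memory stand-ins (assumptions): a storage backend and one queue per framework type.
STORAGE = {"configs": {}, "implants": {}}
QUEUES = {"framework_a": queue.Queue(), "framework_b": queue.Queue()}


def generator(framework, count, rng):
    """Create `count` random configurations and store them; return their ids."""
    ids = []
    for _ in range(count):
        config_id = f"{framework}-{len(STORAGE['configs'])}"
        STORAGE["configs"][config_id] = {
            "framework": framework,
            "sleep_seconds": rng.randint(1, 60),  # invented option
        }
        ids.append(config_id)
    return ids


def submitter(config_ids):
    """Create one job per configuration and route it to the queue for its framework."""
    for config_id in config_ids:
        framework = STORAGE["configs"][config_id]["framework"]
        QUEUES[framework].put(config_id)


def worker(config):
    """Placeholder for the framework-specific build; returns fake bytes."""
    return f"implant built from {sorted(config.items())}".encode()


def receiver(framework):
    """Pull jobs until the queue is empty, run the worker, and store each result."""
    jobs = QUEUES[framework]
    while True:
        try:
            config_id = jobs.get_nowait()
        except queue.Empty:
            return
        STORAGE["implants"][config_id] = worker(STORAGE["configs"][config_id])
        jobs.task_done()


def run(framework, count, receivers=2, seed=0):
    """Drive the whole pipeline for one framework type."""
    submitter(generator(framework, count, random.Random(seed)))
    threads = [
        threading.Thread(target=receiver, args=(framework,)) for _ in range(receivers)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return STORAGE["implants"]


if __name__ == "__main__":
    print(sorted(run("framework_a", 5)))
```

Keeping one queue per framework type means a receiver never has to inspect a job to decide whether it can handle it, which mirrors the routing role of the submitter described above.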
Generating a vast number of valid C2 implant configurations was a top priority when designing the C2F2 implant generation system. To achieve this goal, we aimed to minimize the need for manual intervention. The only manual step took place in our exploratory phase, where researchers from our team read each C2 framework specification and encoded the set of all possible configuration types in our domain-specific language. Because the models are expressed in a DSL, the configuration files we generate are valid by construction, adhering to the grammar encoded in the model. Furthermore, the model (showcased in Figure 3) also allows us to verify any external configuration, guaranteeing that only valid configurations enter our system.
Figure 3: Generation and validation of models in C2F2.
To encode our configuration models, we decided to utilize Pydantic, a Python library for data validation that uses the Python type system. Our decision to use Pydantic was based on multiple factors. Firstly, it allowed us to encode the models directly using Python’s syntax. This made it easier for our team to work with the tool and allowed us to quickly build understanding and expertise. Additionally, Pydantic provides a high degree of flexibility, making it perfect for our needs.
To give an example of how Pydantic can be used, we can examine one of our implant models created for the Shad0w C2 framework. The model is shown in Figure 4.
Figure 4: Model generated for Shad0w.
Pydantic offers more than primitive types when defining model fields. For example, it allows us to constrain the value range of a given field, and it supports more complex types such as enumerations and custom types. After defining our models, Pydantic allows us to obtain the schema definition as a Python dictionary, which is then parsed and interpreted as the model grammar by our generator. Pydantic also provides a method to validate arbitrary JSON against the model schema, ensuring that the generated configs adhere to the specifications.
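As a rough illustration of these Pydantic features, consider the sketch below. It is not the Shad0w model from the figure: the class `ExampleImplantConfig` and all of its fields are hypothetical. It shows a range-constrained field, an enumeration, schema export as a dictionary, and validation of an input dictionary. Because the schema method is named differently in Pydantic v1 (`schema`) and v2 (`model_json_schema`), the helper tries both.

```python
from enum import Enum

from pydantic import BaseModel, Field, ValidationError


class Arch(str, Enum):
    """Enumeration field: only these values are accepted."""
    x86 = "x86"
    x64 = "x64"


class ExampleImplantConfig(BaseModel):
    """Hypothetical model; field names are illustrative, not real Shad0w options."""
    address: str
    port: int = Field(..., ge=1, le=65535)   # constrained value range
    arch: Arch
    jitter: int = Field(..., ge=0, le=100)


def get_schema(model):
    """Return the model schema as a plain dictionary (works on Pydantic v1 and v2)."""
    schema_fn = getattr(model, "model_json_schema", None) or model.schema
    return schema_fn()


def is_valid(data):
    """Validate a dictionary (e.g. parsed from JSON) against the model."""
    try:
        ExampleImplantConfig(**data)
        return True
    except ValidationError:
        return False


if __name__ == "__main__":
    print(sorted(get_schema(ExampleImplantConfig)["properties"]))
    print(is_valid({"address": "10.0.0.1", "port": 443, "arch": "x64", "jitter": 20}))
    print(is_valid({"address": "10.0.0.1", "port": 70000, "arch": "x64", "jitter": 20}))
```

A generator in the spirit of the one described above could walk the `properties` entry of the exported schema to learn each field's type and bounds, then pass every generated configuration back through `is_valid` as a safety check.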
A concrete example of a generated valid configuration file for Shad0w can be observed in Figure 5. The generated configuration validates against the Shad0w model schema and is interpretable by our system.
Figure 5: Valid configuration generated for Shad0w.
In this blog post, we presented C2F2, a framework designed to instrument existing C2 frameworks. We showed how C2F2 can generate a large dataset of implants by exploiting the configuration options provided by the selected C2 frameworks. Generating a large number of implants is the first building block of any pipeline designed to analyze and behaviorally detect backdoors at scale. While we are working on releasing the framework to the public by mid-2023, we hope that providing an early preview can foster further discussions and feedback on the topic from the community.