Threat intelligence largely revolves around the craft of handling multiple threat intelligence feeds. Whether they simply contain a stream of malicious indicators from previous attacks or, for example, a painstakingly assembled list of Cobalt Strike team servers found in the wild, steady consumption of threat intelligence feeds (and disposal of degraded indicators) is the cornerstone of a sound security posture for thousands of companies.
As is usually the case, as soon as a need to consume (and produce) data is established, the first challenge is to agree on the best practices to encode the information and how to exchange it. Challenges at this stage are so common that there are now timeless jokes describing this exact process (see XKCD), which often makes defining a new standard a discouraging endeavor, especially when trying to achieve global adoption (or at least consensus).
Best practices surrounding threat intelligence, however, have had the time to grow and mature over the course of the past 10 years, and even though no international body stepped up to the task of regulating the production and ingestion of threat intelligence feeds, a few fitting standards, like MISP or STIX/TAXII, have emerged to supersede CSV or, even worse, Excel files.
In this blog post, we focus on threat feeds implemented in MISP format and show how to process them in an easy and lightweight manner. We also introduce a library developed by VMware TAU aimed at easing feed consumption and generation.
MISP Feeds
A MISP threat intelligence feed is nothing more than a collection of MISP events, objects, and attributes, as defined in the MISP standard. While feeds distributed as CSV files are straightforward to understand and process (they are often just a list of time-indexed indicators of some sort), encoding a stream of indicators in MISP requires becoming comfortable with some levels of indirection.
A MISP feed is a collection of MISP events, each event containing a collection of indicators of compromise. Each time the feed is updated, a new MISP event (containing new indicators of compromise) is added to the MISP feed. Indicators are not added to the MISP event as simple strings (as is the case in a CSV feed); instead, they are encoded as MISP attributes or, for more complex data types such as files, as MISP objects. This means that updating a threat feed with a new batch of indicators requires (1) creating a new MISP event, and (2) adding to the newly created event a MISP attribute or object for each indicator.
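To make the two steps concrete, the following is a minimal sketch that uses only the Python standard library and plain dictionaries shaped like MISP JSON. The helper names (make_attribute, make_event) and the sample indicators are assumptions made for illustration; a real generator would more likely rely on “pymisp” and would populate more fields than shown here.

```python
import json
import time
import uuid
from datetime import date


def make_attribute(attr_type, value, category="Network activity"):
    """Return a dictionary shaped like a MISP attribute (hypothetical helper)."""
    return {
        "uuid": str(uuid.uuid4()),
        "type": attr_type,
        "category": category,
        "value": value,
        "to_ids": True,
        "timestamp": str(int(time.time())),
    }


def make_event(info, indicators):
    """Step (1): create a new event; step (2): add one attribute per indicator."""
    return {
        "Event": {
            "uuid": str(uuid.uuid4()),
            "info": info,
            "date": date.today().isoformat(),
            "timestamp": str(int(time.time())),
            "Attribute": [make_attribute(t, v) for t, v in indicators],
        }
    }


# Sample batch of (attribute type, value) pairs; the values are placeholders.
batch = [("ip-dst", "198.51.100.7"), ("domain", "bad.example.com")]
event = make_event("Daily indicator batch", batch)
print(json.dumps(event, indent=2))
```

Each feed update would produce one such event, serialized to its own JSON file.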
From a file system perspective, a MISP feed is made of at least three different components:
- A manifest file, named “manifest.json”.
- A CSV file, named “hashes.csv”.
- Additional event files (in JSON format), one for each event.
The manifest file “manifest.json” acts as a header of the whole feed, providing a time-indexed list of MISP events (and, therefore, of JSON files) belonging to the feed; this is basically an index to be used at consumption time to understand which files need to be downloaded. In Figure 1 one can see the manifest in the top left corner, and the referenced event on the right-hand side.
Figure 1: A basic example of a MISP feed.
The JSON file encoding the event itself has in turn all the information related to the published indicators; this means that retrieving all updates added to the feed since an arbitrary point in time requires parsing the manifest and downloading only the JSON files related to events that are tagged in the manifest with a date and timestamp matching the original query. Figure 1 shows where the date string is encoded (the “timestamp” field is omitted for simplicity), and how the UUID of the MISP event is used as a file name for the JSON file containing the event details.
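The selection logic described above can be sketched as follows. This is an illustration under stated assumptions: the manifest is taken to be a JSON object keyed by event UUID whose entries carry a “date” string in ISO format, and the function names and sample UUIDs are made up for the example.

```python
from datetime import date


def select_events(manifest, since):
    """Return the UUIDs of events dated on or after `since`, oldest first."""
    selected = [
        (entry["date"], event_uuid)
        for event_uuid, entry in manifest.items()
        if date.fromisoformat(entry["date"]) >= since
    ]
    # ISO date strings sort chronologically, so a plain sort is enough.
    return [event_uuid for _, event_uuid in sorted(selected)]


def event_file_name(event_uuid):
    """The event UUID doubles as the name of the JSON file to download."""
    return f"{event_uuid}.json"


# Hypothetical manifest content; real entries carry more fields.
manifest = {
    "11111111-1111-4111-8111-111111111111": {"info": "Batch A", "date": "2022-01-03"},
    "22222222-2222-4222-8222-222222222222": {"info": "Batch B", "date": "2022-01-10"},
    "33333333-3333-4333-8333-333333333333": {"info": "Batch C", "date": "2022-01-17"},
}

for event_uuid in select_events(manifest, date(2022, 1, 10)):
    print(event_file_name(event_uuid))
```

Only the files printed by the loop would need to be fetched; passing date.today() minus a timedelta of seven days as the second argument would restrict the download to the last week of updates.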
The last component of a well-formed threat feed is the “hashes.csv” file; its purpose is to act as a direct mapping between indicators (in their hashed form) and the JSON file to which they belong. The benefit is that the operation of verifying whether an indicator is part of a feed is really fast, and it further allows retrieving the whole event (likely to contain more metadata) if required.
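A lookup against this file could look like the sketch below. It assumes that each row holds two columns, the MD5 digest of an indicator value followed by the UUID of the event containing it; that layout, like the function names and sample values, should be checked against the feed actually being consumed.

```python
import csv
import hashlib
import io


def md5_hex(value):
    """Hash an indicator value the way the index is assumed to store it."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


def load_hash_index(csv_text):
    """Map each digest to the set of event UUIDs that contain it."""
    index = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != 2:
            continue  # skip blank or malformed rows
        digest, event_uuid = row
        index.setdefault(digest, set()).add(event_uuid)
    return index


def lookup(index, indicator):
    """Return the UUIDs of the events containing `indicator` (possibly empty)."""
    return index.get(md5_hex(indicator), set())


# Build a tiny sample file in memory instead of hard-coding digests.
sample_csv = "\n".join([
    f"{md5_hex('bad.example.com')},11111111-1111-4111-8111-111111111111",
    f"{md5_hex('198.51.100.7')},22222222-2222-4222-8222-222222222222",
])
index = load_hash_index(sample_csv)
print(lookup(index, "bad.example.com"))
print(lookup(index, "benign.example.org"))
```

A hit returns the event UUIDs, which is enough to locate and download the corresponding JSON files when the extra metadata is needed.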
A basic implementation of the logic required to do a one-off conversion of all MISP events from a running instance into a threat intelligence feed is available here.
Word of advice: the underlying logic does not handle incremental updates or MISP objects, but it nevertheless provides a starting point to understand the format.
Limitations
While the standard is quite mature, and the number of different attributes and objects that MISP supports is ever-growing if not already exhaustive, there is no pre-determined way to implement the logic required to produce or consume a periodically updated threat feed (unlike with STIX/TAXII, where TAXII specifies how updates are exchanged).
While this might be considered a mere (!) implementation detail, if the source of a MISP feed is not a MISP instance where the threat intelligence data is already modeled in a compatible fashion, there are also additional challenges when converting arbitrary feeds into a MISP-compliant threat feed.
For example, MISP objects (often used to represent complex data types such as files) do not support tags, which in turn are often used to encode labels. At the same time, if MISP attributes are chosen to represent multiple hashes of the same file, there is no way to link all those attributes together without resorting to MISP objects.
To overcome this and other limitations, VMware TAU decided to release “feed-manager-for-misp,” a set of utilities and classes written in Python to ease the handling of feeds. More precisely the provided code supports the following tasks:
- Generation and consumption of threat feeds in MISP format.
- Handling of incrementally updated threat feeds.
- Storage and retrieval of threat feeds from S3 buckets.
- Encoding of complex data types (file objects) into MISP entities.
- Managing tags when processing MISP objects.
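To illustrate the last two points, here is one possible way to keep several hashes of the same file together while still carrying labels. The sketch groups the hashes into a single dictionary shaped like a MISP “file” object and attaches the tags to the attributes inside it, since the object itself cannot hold them. This is an assumption made for illustration and not necessarily how “feed-manager-for-misp” handles tags; the helper names and tag values are hypothetical.

```python
import hashlib
import uuid


def make_file_object(hashes, tags=()):
    """Group the hashes of one file into a single 'file' object.

    Tags are copied onto every attribute because the object has no tag field.
    """
    attributes = [
        {
            "uuid": str(uuid.uuid4()),
            "object_relation": relation,
            "type": relation,
            "category": "Payload delivery",
            "value": value,
            "Tag": [{"name": name} for name in tags],
        }
        for relation, value in hashes.items()
    ]
    return {
        "uuid": str(uuid.uuid4()),
        "name": "file",
        "meta-category": "file",
        "Attribute": attributes,
    }


def object_tags(obj):
    """Recover the distinct tag names carried by the attributes of an object."""
    return sorted(
        {tag["name"] for attr in obj["Attribute"] for tag in attr.get("Tag", [])}
    )


# Placeholder file content; the digests are computed rather than hard-coded.
content = b"placeholder sample content"
hashes = {
    "md5": hashlib.md5(content).hexdigest(),
    "sha1": hashlib.sha1(content).hexdigest(),
    "sha256": hashlib.sha256(content).hexdigest(),
}
file_object = make_file_object(hashes, tags=("tlp:white", "family:example"))
print(object_tags(file_object))
```

The three hashes stay linked through the enclosing object, and a consumer can read the labels back from any of its attributes.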
Feed Manager for MISP
While the source code is available on GitHub, developers can also install the package using pip. We decided to keep the number of dependencies to a minimum, making “pymisp” an optional dependency, required only when generating a threat feed. The underlying rationale is to keep consuming a feed as simple and as lightweight as possible.
The code base is organized into three logically separated modules:
- “consumer.py” contains all classes to consume a threat feed; consumption can be done from three different sources: a local directory, a remote HTTP server, or an S3 bucket.
- “generator.py” is the twin module and is used to implement the logic to incrementally update a threat feed. As was the case with the previous module, the storage layer is abstracted so that it is possible to write directly to local directories as well as S3 buckets, if required.
- “translator.py” is the last module; its two classes, ‘TagUtils’ and ‘IndicatorTranslator’, contain a selection of utility methods to handle and process MISP entities.
The project also includes two different examples showcasing how to consume and generate a threat feed (executables “consume_feed.py” and “generate_feed.py”). Figure 2 and Figure 3 show the pseudo-code lying at the core of both examples.
Figure 2: Python pseudo-code to generate a threat feed.
In both cases, access to the underlying storage is abstracted by providing the required storage class (the module “storage.py” contains classes to read from and write to S3 buckets as well as the local file system).
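The idea of a pluggable storage layer can be sketched as follows. The class and function names below (FeedStorage, LocalDirectoryStorage, InMemoryStorage, publish_manifest, load_manifest) are hypothetical and are not the classes shipped in “storage.py”; the sketch only shows why feed logic written against a small read/write interface does not need to know where the files live.

```python
import json
import tempfile
from pathlib import Path
from typing import Protocol


class FeedStorage(Protocol):
    """Minimal interface the feed logic depends on (hypothetical)."""

    def read(self, name: str) -> bytes: ...

    def write(self, name: str, data: bytes) -> None: ...


class LocalDirectoryStorage:
    """Stores feed files in a directory on the local file system."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def read(self, name):
        return (self.root / name).read_bytes()

    def write(self, name, data):
        (self.root / name).write_bytes(data)


class InMemoryStorage:
    """Stand-in for a remote backend such as an object store."""

    def __init__(self):
        self.files = {}

    def read(self, name):
        return self.files[name]

    def write(self, name, data):
        self.files[name] = data


def publish_manifest(storage: FeedStorage, manifest: dict) -> None:
    """Write the manifest through whichever storage class was provided."""
    storage.write("manifest.json", json.dumps(manifest).encode("utf-8"))


def load_manifest(storage: FeedStorage) -> dict:
    """Read the manifest back through the same interface."""
    return json.loads(storage.read("manifest.json"))


manifest = {"11111111-1111-4111-8111-111111111111": {"date": "2022-01-03"}}
with tempfile.TemporaryDirectory() as tmp:
    for storage in (LocalDirectoryStorage(tmp), InMemoryStorage()):
        publish_manifest(storage, manifest)
        print(type(storage).__name__, load_manifest(storage) == manifest)
```

Swapping the backend only changes the object passed in; the generation and consumption code stays the same.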
Figure 3: Python pseudo-code to consume a threat feed.
The pseudo-code displayed in Figure 3 showcases how retrieving just the last 7 days of data is straightforward, minimizing the amount of data that needs to be transferred each time.
Conclusions
Threat intelligence feeds are the cornerstone of the security posture of an organization. Whether used for detection or research, in production or in a SOC, they are the basic building blocks of how threat intelligence is shared and made actionable. In this blog post, we explained how MISP feeds are implemented and what the most common limitations and challenges are. We also published a new library to ease generating and consuming threat intelligence feeds in MISP format.