Supply-chain attacks can be so destructive that they are often considered black-swan events. Often, the most upsetting aspect of the attack is that it manages to compromise what is normally deemed to be safe by definition — whether that’s a software component or an MSP (managed service provider). The result is that our understanding of perimeters, security boundaries, and/or best practices is often flipped upside down.
Consider, for example, the SolarWinds attack back in December 2020: disguised as a normal software update, attackers managed to implant a pre-crafted backdoor on thousands of customers, which led many frantic security teams to discover that their network perimeter had already been breached several months before. Another (and even more destructive) attack took place in July 2021: by exploiting a vulnerability in Kaseya VSA servers, attackers managed to infect hundreds of MSPs, which in turn deployed the REvil ransomware to thousands of customers, breaking the assumption of a safe boundary between different IT infrastructures.
Fast forward to October 2021. An innocent bug report alerted the entire NPM developer community that a core open-source library had been hacked. Fortunately, the community quickly handled and fixed the issue. Still, the damage the incident could have caused had it gone undetected left security professionals aghast at how easily modern CI/CD pipelines can become targets. In this blog post, we analyze the incident further and show that, regardless of adopted best practices, sometimes the devil really is in the details: when open-source libraries are compromised, it’s extremely difficult to limit and control the damage.
The Attack
The compromised software library was ‘ua-parser-js’ (https://www.npmjs.com/package/ua-parser-js). It’s a JS (JavaScript) library used to decode the user agent employed by web browsers; but it can also detect operating systems, or even devices, making it possible for an application to shape the user experience. At the time of this writing, there were over one thousand applications or libraries (1238 to be exact) dependent on this JS library — from CyberChef, a web app that allows testing of cryptographic primitives within a browser, to FBJS, a set of common JS utilities used by Facebook.
Table 1: The JS library version history.
Version | Timestamp | Status |
0.7.28 | 2021-04-10 14:42:47 | Clean version |
0.7.29 | 2021-10-22 12:15:21 | Malicious version |
0.8.0 | 2021-10-22 12:16:06 | Malicious version |
1.0.0 | 2021-10-22 12:16:19 | Malicious version |
0.7.30 | 2021-10-22 16:16:08 | Clean version |
0.8.1 | 2021-10-22 16:23:53 | Clean version |
1.0.1 | 2021-10-22 16:26:19 | Clean version |
The attacker managed to compromise the credentials of the lead developer and pushed, unbeknownst to everybody, a changeset updating all major versions (see Table 1). It took the NPM community four hours to realize that the attack had happened, help the lead developer secure his account, and push an update neutering the attack.
The changeset consisted of two additional preinstall scripts: a batch script for Windows and a Bash script for Linux and macOS. Depending on the platform, the scripts would download either “jsextension” (https://www.virustotal.com/gui/file/ea131cc5ccf6aa6544d6cb29cdb78130feed061d2097c6903215be1499464c2e/detection) or “jsextension.exe” (https://www.virustotal.com/gui/file/7f986cd3c946f274cdec73f80b84855a77bc2a3c765d68897fbc42835629a5d5) from 185.173.36[.]219, a host based in Russia, and execute it. The Linux executable is just a variant of XMRig, a well-known cryptominer, while the Windows binary also exhibits some info-stealer capabilities to harvest passwords and credentials. Both executables have already been analyzed at length by many other security researchers (see Figure 1 and Figure 2 for the analysis overview provided by NSX Defender). (A detailed technical analysis is available at https://blog.sonatype.com/npm-project-used-by-millions-hijacked-in-supply-chain-attack.)
While the attacker could have chosen to modify the JS library itself and thereby infect the end-user, the cybercriminal instead opted to add a script that executed at installation time. This was a significant choice, as it confined the execution of the malicious payload to only those endpoints where the command “npm install” normally executes: (1) the NPM developers’ workstations, and (2) CI/CD server(s) in data centers. This partly explains the choice of such malware: info-stealers (and cryptominers) for the workstations, naturally full of private and confidential data, and cryptominers for data centers, where CPU resources can be easily abused; only ransomware would have been more destructive.
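Because the payload ran from an install-time hook, one way to gauge exposure is to list every installed package that declares such a hook. The following is a minimal sketch rather than a vetted scanner: it assumes a conventional “node_modules” directory layout, and the helper name find_install_scripts is hypothetical. The hook names it looks for (preinstall, install, postinstall) are the standard npm lifecycle scripts that run during “npm install”.

```python
import json
from pathlib import Path

# npm lifecycle hooks that run automatically during "npm install".
INSTALL_HOOKS = ("preinstall", "install", "postinstall")


def find_install_scripts(node_modules: Path) -> dict[str, dict[str, str]]:
    """Map "name@version" to the install-time hooks that package declares.

    Hypothetical helper: reads every package.json below node_modules
    (nested copies included) and skips manifests that cannot be parsed.
    """
    findings: dict[str, dict[str, str]] = {}
    for manifest in node_modules.rglob("package.json"):
        try:
            data = json.loads(manifest.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # unreadable or malformed manifest
        if not isinstance(data, dict):
            continue
        scripts = data.get("scripts") or {}
        if not isinstance(scripts, dict):
            continue
        # Keep only the hooks that npm runs at installation time.
        hooks = {hook: scripts[hook] for hook in INSTALL_HOOKS if hook in scripts}
        if hooks:
            name = data.get("name", str(manifest.parent))
            findings[f"{name}@{data.get('version', '?')}"] = hooks
    return findings
```

Calling find_install_scripts(Path("node_modules")) from a project root returns the packages worth reviewing by hand. Many legitimate packages use these hooks (for example to compile native code), so a hit is a prompt for review, not proof of compromise. npm also offers the “--ignore-scripts” option to skip these hooks entirely, at the cost of breaking packages that rely on them.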
In this specific incident, aside from a handful of reportedly infected workstations, the attacker did not manage to leverage this library to establish a foothold in a data center. The malicious versions were not immediately pulled; instead, new clean versions were published. This simply worked, and it allowed all dependent projects to quickly work around the issue, with the notable exception of applications pinning a malicious version of the library. Even so, it is hardly an optimal solution: the method NPM employs to resolve dependencies, while technically correct, could have made the problem much trickier to solve.
It’s a Dependency Problem
When developing a piece of software, it’s rare to find an instance in which no additional libraries are required. On the contrary, developers want to develop modular components so that they can be reused across different projects. The job of maintaining the list of dependencies — and the required versions of those dependencies — is orthogonal to the programming language and is often handled by a system component called a package manager.
The most commonly used package manager for JavaScript software is NPM. To specify dependencies, a developer needs to write a file named “package.json” and define project metadata along with the required libraries. When choosing which versions to depend on, there are two main strategies:
- Pin the dependency by specifying the version number.
- Accept updates to the library by using range operators such as “^” (compatible updates, typically minor and patch releases) and “>=” (any later release, including major updates).
While the first option may be required in some settings (it’s useful for compliance as it allows reproducible builds), pinning a specific version discards any updates, even non-API-breaking security ones. For this reason, it is often advisable to at least allow for minor updates, in order to quickly catch up when upstream critical bugs are fixed.
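To make the trade-off concrete, the sketch below checks which of the versions from Table 1 each kind of specifier would accept. It is a simplified model, not NPM's actual resolver: it handles only plain MAJOR.MINOR.PATCH versions (no pre-release tags), and the helper names are hypothetical. The caret rule follows the documented semver range behavior, under which “^” never changes the leftmost non-zero component; for 0.x versions this means only patch updates are accepted.

```python
def parse(version: str) -> tuple[int, int, int]:
    """Split a plain MAJOR.MINOR.PATCH string into integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def caret_upper_bound(base: tuple[int, int, int]) -> tuple[int, int, int]:
    """Exclusive upper bound of a caret range: bump the leftmost non-zero part."""
    major, minor, patch = base
    if major > 0:
        return (major + 1, 0, 0)
    if minor > 0:
        return (0, minor + 1, 0)
    return (0, 0, patch + 1)


def satisfies(version: str, spec: str) -> bool:
    """Simplified range check supporting "^x.y.z", ">=x.y.z", and exact pins."""
    candidate = parse(version)
    if spec.startswith("^"):
        base = parse(spec[1:])
        return base <= candidate < caret_upper_bound(base)
    if spec.startswith(">="):
        return candidate >= parse(spec[2:])
    return candidate == parse(spec)  # bare version: exact pin


# Versions listed in Table 1, in publication order.
PUBLISHED = ["0.7.28", "0.7.29", "0.8.0", "1.0.0", "0.7.30", "0.8.1", "1.0.1"]


def matching(spec: str) -> list[str]:
    """Return the Table 1 versions accepted by the given specifier."""
    return [version for version in PUBLISHED if satisfies(version, spec)]


print(matching("0.7.28"))    # ['0.7.28']
print(matching("^0.7.28"))   # ['0.7.28', '0.7.29', '0.7.30']
print(matching(">=0.7.28"))  # all seven versions
```

Under this model, a project pinned to 0.7.28 never sees the malicious 0.7.29, but it also never picks up the clean 0.7.30 on its own, whereas a “^0.7.28” range accepts both.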
Balancing these two options depends on many factors, but it is safe to assume that more sensitive environments favor the first alternative, and they can then opt to regularly review available updates (a bit like Linux distributions do). Unfortunately, the issue is that, regardless of which policy is chosen, it is not possible to enforce such a policy for libraries that themselves have multiple dependent libraries. In other words, once we specify that a piece of software depends on a specific version of a library “A”, if there is also another library “B” that depends on this very same library “A”, there is no explicit way for a developer to enforce that library “B” will use the same version they decided to depend on for their software project.
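The situation described above can be observed in a lockfile. The sketch below is illustrative only: it assumes the “package-lock.json” layout used by lockfile versions 2 and 3, where a top-level “packages” map is keyed by install path, and the project name “my-app”, the library name “library-b”, and the helper installed_copies are hypothetical. In the example, the project pins a clean version while a dependency pulls in its own nested copy.

```python
# Versions marked as malicious in Table 1.
MALICIOUS = {"0.7.29", "0.8.0", "1.0.0"}


def installed_copies(lockfile: dict, package: str) -> dict[str, str]:
    """Return {install path: version} for every copy of `package` in the lockfile."""
    suffix = "node_modules/" + package
    copies: dict[str, str] = {}
    for path, entry in lockfile.get("packages", {}).items():
        # Matches the top-level copy and any copy nested under another package.
        if path == suffix or path.endswith("/" + suffix):
            copies[path] = entry.get("version", "")
    return copies


# Hypothetical lockfile: the project pins 0.7.28, but "library-b" asks for
# "^0.8.0" and therefore gets a separate, nested copy.
EXAMPLE_LOCK = {
    "lockfileVersion": 2,
    "packages": {
        "": {
            "name": "my-app",
            "dependencies": {"ua-parser-js": "0.7.28", "library-b": "^1.0.0"},
        },
        "node_modules/ua-parser-js": {"version": "0.7.28"},
        "node_modules/library-b": {
            "version": "1.0.0",
            "dependencies": {"ua-parser-js": "^0.8.0"},
        },
        "node_modules/library-b/node_modules/ua-parser-js": {"version": "0.8.0"},
    },
}

copies = installed_copies(EXAMPLE_LOCK, "ua-parser-js")
flagged = {path: version for path, version in copies.items() if version in MALICIOUS}
print(copies)
print(flagged)
```

Here the project's own pin is honored, yet the nested copy under “library-b” is one of the malicious versions, which is exactly the case a top-level pin cannot prevent.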
At the same time, a library developer might find this behavior perfectly sound: after all, correctness and functionality of a library can only be guaranteed if the set of dependencies used at development/testing time is the same as that used at deployment time (a fact that system administrators understand all too well). When there is a security breach, however, there is a clear scenario in which this type of enforcement may actually be desirable.
Consider the aftermath of the “ua-parser-js” incident: when developers started updating their own dependencies, they immediately searched for other libraries that were also dependent on the compromised version of “ua-parser-js” (https://github.com/faisalman/ua-parser-js/issues/536#issuecomment-949826851). This problem is known as “enforcement of transitive dependencies”, and while there are utilities designed to handle this specific scenario (https://github.com/rogeriochaves/npm-force-resolutions), NPM does not offer by default a way to enforce library versions across dependencies (unlike Yarn, another JavaScript package manager, which provides a functionality known as “selective dependency resolutions”: https://classic.yarnpkg.com/lang/en/docs/selective-version-resolutions/). Fortunately, one of the advantages of an open-source community is that it is relatively easy to build enough momentum when changes are required, and some developers promptly created an RFC (https://github.com/npm/statusboard/issues/343) to implement this specific functionality. One can only hope approval will be swift as well!
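As an illustration of what such enforcement looks like in practice, the sketch below adds a “resolutions” entry to a package manifest, which is the field Yarn's selective resolutions feature reads from “package.json”. It is a minimal sketch under stated assumptions: the helper force_resolution and the manifest contents are hypothetical, and stock NPM as discussed in this post does not act on this field by itself.

```python
import json


def force_resolution(manifest: dict, package: str, version: str) -> dict:
    """Return a copy of the manifest with `package` forced to `version`.

    Hypothetical helper: the input manifest is left unmodified.
    """
    updated = dict(manifest)
    resolutions = dict(updated.get("resolutions", {}))
    resolutions[package] = version
    updated["resolutions"] = resolutions
    return updated


# Hypothetical project manifest with one dependency that pulls in ua-parser-js.
manifest = {
    "name": "my-app",
    "version": "1.0.0",
    "dependencies": {"library-b": "^1.0.0"},
}

patched = force_resolution(manifest, "ua-parser-js", "0.7.30")
print(json.dumps(patched, indent=2))
```

With Yarn, a bare package name in “resolutions” applies to every nested occurrence of that package, so the clean 0.7.30 release from Table 1 would replace any copy requested by transitive dependencies. Forcing a version this way can break a library that genuinely needs a different release, so the override should be removed once upstream packages have caught up.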
Conclusions
Supply-chain attacks are most feared because they subvert our assumptions: perimeters, security boundaries, and best practices may not be enough to prevent an attack from unfolding. When a supply-chain attack affects the software development life cycle (SDLC), the complexity introduced by modern open-source frameworks can further exacerbate the challenges facing defenders when securing data centers. Even if in this instance a rogue update of a JavaScript library led to limited damage that could easily be remediated, it’s troubling to see how easily CI/CD servers can be turned into a new attack vector and become potential entry points for sophisticated attacker groups.