Have You Heard About Adversarial Machine Learning Yet?

Having worked on the software supply chain security project The Update Framework for nearly two years, I naturally wonder about vulnerabilities. When I began browsing the machine learning projects in the open source community, my first reaction was, “What about security?” Yes, machine learning (ML) is already used to build new tools that strengthen software security, but what about securing the machine learning systems themselves?

It turns out that the problem is well known in the scientific community, with research dating back to 2004 and a growing list of published adversarial papers. Real-world attacks are already possible because machine learning systems are no longer confined to research labs but are broadly deployed in all types of software. So what are we, the practitioners, waiting for?

New Attack Vectors

ML systems are not only susceptible to known software threats; they also introduce a whole new set of attack vectors. So many new attacks are being discovered that The MITRE Corporation, the author of the MITRE ATT&CK® framework, developed an additional knowledge base called MITRE ATLAS™ (Adversarial Threat Landscape for Artificial-Intelligence Systems), which “enables researchers to navigate the landscape of threats to machine learning systems.”

The term ‘adversarial machine learning’ was coined to describe malicious efforts to mislead an ML system. The attacks are classified into four categories: evasion, poisoning, model stealing (extraction), and data inference.


Evasion

In an evasion attack, the ML model is tricked into misclassifying data. A particularly stealthy illustration of an evasion attack is the so-called “adversarial example,” typical for computer vision applications. These are input images that an attacker crafts by adding distortions large enough to force the ML model to misclassify the image but nearly imperceptible to a human. Figure 1 shows an image from the ImageNet dataset that is correctly classified as “koala” and then misclassified as “weasel” after an adversarial attack.

Figure 1. Adversarial example, HopSkipJumpAttack

Evasion attacks happen after the model is deployed (at inference time), and the attacker does not need access to the training data. Depending on the attacker’s access to the model, the attack can be white-box or black-box. In a white-box attack, the adversary has full access to the model’s internal structure and parameters; in a black-box attack, the adversary can observe only the model’s inputs and outputs.
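To make the white-box case concrete, here is a minimal sketch of an evasion attack on a toy linear classifier. The weights, input, and step size `eps` are made-up values for illustration, and the perturbation rule is a simplified step in the style of the fast gradient sign method, not the HopSkipJumpAttack used for Figure 1. Because the attacker can read the weights, they know exactly which direction pushes the score across the decision boundary.

```python
def predict(weights, bias, x):
    """Toy linear classifier: returns 1 if the score is positive, else 0."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score > 0 else 0


def sign(value):
    """Return -1, 0, or 1 depending on the sign of value."""
    return (value > 0) - (value < 0)


def perturb(weights, x, label, eps):
    """Shift every feature by at most eps in the direction that lowers
    the score for the current label (white-box: the weights are known)."""
    direction = -1 if label == 1 else 1
    return [xi + direction * eps * sign(w) for w, xi in zip(weights, x)]


# Hypothetical model parameters and input, chosen only for illustration.
weights = [0.5, -0.25, 0.75]
bias = -0.1
x = [0.3, 0.2, 0.1]
eps = 0.1

original_label = predict(weights, bias, x)
x_adv = perturb(weights, x, original_label, eps)
adversarial_label = predict(weights, bias, x_adv)

print("original:", original_label, "adversarial:", adversarial_label)
```

No feature moves by more than `eps`, yet the predicted label changes. A black-box attacker would have to find a similar perturbation by querying the model repeatedly instead of reading its weights.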


Poisoning

A poisoning attack targets the ML model’s training data. The adversary modifies, removes, or adds training data, which leads to incorrect predictions on real-world data once the model is deployed.
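As a rough illustration of poisoning by adding data, the sketch below trains a toy nearest-centroid classifier twice: once on clean data and once after an attacker has injected a few mislabeled points. The data set, the injected points, and the classifier itself are all hypothetical and chosen for brevity; real poisoning attacks are usually far subtler.

```python
from collections import defaultdict


def train_centroids(samples):
    """Compute the mean value per label from (value, label) pairs."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, label in samples:
        sums[label] += value
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(centroids, value):
    """Return the label whose centroid is closest to value."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))


# Clean training data: class 0 clusters near 1, class 1 clusters near 9.
clean = [(0.0, 0), (1.0, 0), (2.0, 0), (8.0, 1), (9.0, 1), (10.0, 1)]

# The attacker injects points that look like class 0 but carry label 1,
# which drags the class 1 centroid toward class 0 territory.
poison = [(0.0, 1)] * 4

clean_model = train_centroids(clean)
poisoned_model = train_centroids(clean + poison)

probe = 4.0
clean_prediction = predict(clean_model, probe)
poisoned_prediction = predict(poisoned_model, probe)

print("clean:", clean_prediction, "poisoned:", poisoned_prediction)
```

The probe value is never touched by the attacker; only the training set changes, and the deployed model’s answer changes with it.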

Model Extraction

As the name suggests, in a model extraction (or model stealing) attack, the attacker’s goal is to learn the structure and parameters of a confidential model. A typical example is an attempt to copy a proprietary stock trading model.
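The simplest possible case shows why query access alone can be enough. In the sketch below, a hypothetical “secret” linear model exposes only a `query` function that returns its raw score. That is an assumption made for illustration; many deployed models return only a label or a rounded confidence, which makes extraction harder. With one query at the origin and one per feature, the attacker recovers every parameter.

```python
def make_secret_model():
    """Return a query function; the parameters stay hidden in the closure."""
    weights = [2.0, -1.0, 0.5]  # hypothetical confidential parameters
    bias = 0.25

    def query(x):
        return sum(w * xi for w, xi in zip(weights, x)) + bias

    return query


def extract_linear_model(query, n_features):
    """Recover the weights and bias of a linear model using n_features + 1 queries."""
    # Querying the origin reveals the bias.
    bias = query([0.0] * n_features)
    weights = []
    for i in range(n_features):
        # Querying the i-th unit vector reveals weight i plus the bias.
        unit = [0.0] * n_features
        unit[i] = 1.0
        weights.append(query(unit) - bias)
    return weights, bias


query = make_secret_model()
stolen_weights, stolen_bias = extract_linear_model(query, n_features=3)

print("stolen weights:", stolen_weights, "stolen bias:", stolen_bias)
```

Nonlinear models need many more queries and usually yield only an approximate copy, but the principle is the same: the attacker trains a substitute on the target’s own answers.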


Data Inference

In a data inference attack, the adversary does not have access to the training data but probes the model to infer sensitive data on which it was trained. Such an attack has serious privacy ramifications for the people or companies whose data is exposed.
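One well-known form of data inference is membership inference, where the attacker tries to decide whether a specific record was part of the training set. The sketch below assumes a deliberately overfit toy model that is most confident on points it has memorized, and an attacker who simply thresholds that confidence. The model, the training values, and the threshold of 0.9 are all hypothetical.

```python
def make_overfit_model(training_points):
    """Toy model that memorizes its training data and reports a confidence
    that peaks at 1.0 exactly on a training point."""

    def confidence(x):
        nearest = min(abs(x - p) for p in training_points)
        return 1.0 / (1.0 + nearest)

    return confidence


def infer_membership(confidence, candidate, threshold=0.9):
    """Guess that candidate was in the training set if the model's
    confidence on it is at or above the threshold."""
    return confidence(candidate) >= threshold


# Hypothetical sensitive training values the attacker never sees directly.
model = make_overfit_model([52.3, 61.0, 47.8])

member_guess = infer_membership(model, 61.0)
non_member_guess = infer_membership(model, 70.0)

print("61.0 in training set?", member_guess)
print("70.0 in training set?", non_member_guess)
```

Real models leak membership far less cleanly than this, but the gap between how a model behaves on seen and unseen data is the signal such attacks exploit.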

Introducing Adversarial Robustness Toolbox (ART)

In the meantime, ML researchers are continuously developing defenses against these attacks, as well as tools for applying both the attacks and the defenses when developing and deploying ML models.

One such tool is the Adversarial Robustness Toolbox (ART), a Python library for Machine Learning Security.

“ART provides tools that enable developers and researchers to evaluate, defend, certify, and verify Machine Learning models and applications against the adversarial threats of Evasion, Poisoning, Extraction, and Inference. ART supports all popular machine learning frameworks (TensorFlow, Keras, PyTorch, MXNet, scikit-learn, XGBoost, LightGBM, CatBoost, GPy, etc.), all data types (images, tables, audio, video, etc.) and machine learning tasks (classification, object detection, generation, certification, etc.).” 

– GitHub

With the toolbox, an ML engineer can test the robustness of their model against all four types of attacks listed above and try out known defenses to strengthen their systems.

Another cool thing about ART is that it’s open source and hosted by the Linux Foundation’s LF AI & Data, so everyone can use it, improve it, and share their progress with the community.

Go Ahead and Kick Its Tires

In a future blog post, we will show you how to use ART to test your ML model with some of the popular attacks and defenses. Until then, check out the example notebooks and give it a try yourself!

Additional reference:

Vulnerability Disclosure and Management for AI/ML Systems: A Working Paper with Policy Recommendations

Stay tuned to the Open Source Blog and follow us on Twitter for more deep dives into the world of open source contributing.

