How to Build Trustworthy AI with Open Source

Artificial intelligence (AI) is a set of technologies that enables machines to behave intelligently, in some tasks even better than humans. AI has left the research laboratories and is already transforming every aspect of our lives: the way we communicate, entertain ourselves, work, do business and even how we live and think. Automation via AI systems can deliver great economic and social benefits and even promises to help solve global challenges. And it’s just getting started. But with the growing adoption of AI comes the risk of being wrong.

With great power comes great responsibility

As we apply AI at a massive scale, we need to ensure we promote reliable, robust and trustworthy solutions. Since this has always been our primary goal in building software systems, what is different now? AI differs from traditional software in that it doesn’t receive clear instructions on what to do; instead, through cleverly derived algorithms, it discovers statistically significant patterns from past data. In other words, data determines how AI systems behave.
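To make the point that data determines behavior concrete, here is a minimal sketch. The function name `learn_threshold` and both datasets are invented for illustration and stand in for a real training procedure. The same code is run on two slightly different histories and ends up with two different decision rules.

```python
def learn_threshold(samples):
    """Pick the cut-off on a single feature that best separates the labels.

    `samples` is a list of (value, label) pairs with labels 0 or 1.
    A value at or above the returned threshold is predicted as 1.
    """
    candidates = sorted(value for value, _ in samples)

    def errors(threshold):
        # Count samples whose prediction disagrees with the recorded label.
        return sum((value >= threshold) != bool(label) for value, label in samples)

    return min(candidates, key=errors)


# Two hypothetical histories for the same task, invented for illustration.
history_a = [(1, 0), (2, 0), (3, 1), (4, 1)]
history_b = [(1, 0), (2, 0), (3, 0), (4, 1)]

rule_a = learn_threshold(history_a)
rule_b = learn_threshold(history_b)
print("rule learned from history_a: value >=", rule_a)
print("rule learned from history_b: value >=", rule_b)
```

Nobody wrote the rule by hand in either case. Changing a single label in the past data was enough to change what the system does.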

This predictive approach enables us to create even more powerful applications, and we are now asking AI systems new kinds of questions. But our questions are sometimes subjective, controversial or don’t have a single right answer. The answers are probabilistic. When the stakes are low, as in “You will likely enjoy these movies,” it’s not catastrophic if they miss the mark. Conversely, when the outcomes involve our health, justice or autonomous driving, errors can have serious consequences. Technologists should apply AI with caution. We cannot directly transfer our human and moral dilemmas to the machines without supervision. How secure and trustworthy are AI systems and the data they rely on?

Ethical problems

As AI systems depend heavily on data, we must be careful to consistently feed them high-quality, bias-free data. Here are some real-world examples of what can go wrong:

  • Hiring processes often include AI systems that are meant to provide a simplified, bias-free way to filter credentials and narrow the field of candidates. Amazon had been using such a system, which was later found to be biased against women: it had taught itself that male candidates were preferable for technical jobs.
  • Inappropriate or vague data labeling, or unbalanced data, can cause harmful algorithmic outcomes. Raw data is sifted and labeled to provide meaningful context for training machine learning models. ImageNet is an image database with millions of images that are publicly available for research and educational use. The people responsible for labeling its photos, whether of a bird, a car or a flower, were found to have introduced unnecessary bias. For example, a picture of a young man drinking beer was categorized as an “alcoholic.” In addition, ImageNet users found that the database was unbalanced in gender and skin color.
  • In the U.S. healthcare industry, an algorithm helps hospitals and insurance companies identify which patients may benefit from additional “high-risk care management” programs. The algorithm was found to be unfair toward Black patients. The main reason was that it relied on patients’ past healthcare spending as a measure of health needs. Past costs later turned out to be a poor proxy for those needs: because less money had historically been spent on Black patients with the same level of need, the algorithm underestimated how sick they were.

These examples show that ethical problems can arise from a variety of sources. As AI is applied at a massive scale to solve critical problems, we need to be careful not to amplify biases. The issues in these examples were unintentional and have already been addressed, but not before they had skewed the outcomes of what were otherwise carefully designed algorithms.
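One simple way to spot this kind of skew is to compare how often a system selects people from different groups. The sketch below is only an illustration: the group names and outcomes are invented, the helper functions are hypothetical, and the 0.8 cut-off is the common "four-fifths" rule of thumb rather than a universal standard.

```python
from collections import defaultdict


def selection_rates(records):
    """Return the share of positive outcomes per group.

    `records` is an iterable of (group, selected) pairs, where `selected`
    is True when the system recommended the candidate.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, selected in records:
        totals[group] += 1
        if selected:
            positives[group] += 1
    return {group: positives[group] / totals[group] for group in totals}


def disparate_impact_ratio(rates):
    """Ratio of the lowest selection rate to the highest (1.0 means parity)."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # nobody was selected, so no group is favored
    return min(rates.values()) / highest


# Hypothetical screening outcomes, invented for illustration only.
records = (
    [("group_a", True)] * 6 + [("group_a", False)] * 4
    + [("group_b", True)] * 3 + [("group_b", False)] * 7
)

rates = selection_rates(records)
ratio = disparate_impact_ratio(rates)
print(rates)
print(f"disparate impact ratio: {ratio:.2f}")
if ratio < 0.8:  # four-fifths rule of thumb; the threshold is an assumption
    print("warning: selection rates differ substantially between groups")
```

A check like this does not explain why the rates differ. It only flags that the outcomes deserve a closer look before the system is trusted.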

Explainable AI

Explainable AI is a research field devoted to machine learning (ML) interpretability techniques. It aims to understand ML model predictions and explain them in human-understandable terms to build trust with stakeholders. Explainable AI is a key part of broader, human-centric responsible AI practices. Interpretable explanations give regulators auditable metadata, so that unexpected predictions can be traced back to their inputs to inform corrective actions.

How can we be sure that our ML systems are making the right decisions? Often we see ML systems as a black box. As the software becomes more powerful and complex, it also becomes less transparent.
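One way to peek inside a black box is permutation importance: shuffle one input feature at a time and measure how much the model's accuracy drops. The following is a minimal sketch in plain Python. The `toy_model` function and the random data are invented stand-ins for a real trained model and dataset.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows for which the model's prediction matches the label."""
    correct = sum(1 for row, label in zip(rows, labels) if model(row) == label)
    return correct / len(rows)


def permutation_importance(model, rows, labels, n_features, seed=0):
    """Accuracy drop observed when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        rng.shuffle(column)
        # Rebuild every row with feature j replaced by a shuffled value.
        shuffled_rows = [
            row[:j] + (value,) + row[j + 1:] for row, value in zip(rows, column)
        ]
        importances.append(baseline - accuracy(model, shuffled_rows, labels))
    return importances


# Stand-in "black box": it only looks at feature 0 and ignores feature 1.
def toy_model(row):
    return 1 if row[0] > 0.5 else 0


# Hypothetical data, generated at random for illustration only.
data_rng = random.Random(42)
rows = [(data_rng.random(), data_rng.random()) for _ in range(200)]
labels = [toy_model(row) for row in rows]

importances = permutation_importance(toy_model, rows, labels, n_features=2)
for j, score in enumerate(importances):
    print(f"feature {j}: accuracy drop {score:.2f}")
```

Shuffling the feature the model ignores leaves its accuracy unchanged, while shuffling the feature it depends on hurts it noticeably. That difference is the explanation: it tells a stakeholder which inputs actually drive the predictions.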

Security and privacy

ML systems are not only susceptible to known software threats; they also introduce a whole new set of attack vectors. The term “adversarial machine learning” was coined to describe malicious efforts to mislead an ML system. So-called adversarial examples can mislead AI systems and cause dangerous situations. For instance, an attacker could cause a classifier to interpret a slightly modified physical stop sign as a “speed limit 45” sign. The perturbation could be a set of black and white stickers that an adversary attaches to a physical road sign, or markings that mimic graffiti.

Stop signs
Source: Cornell University.

In another example, eyeglass frames were used to impersonate a celebrity, demonstrating that physically realizable attacks can be used to impersonate an identity or evade a face recognition system. Social platforms use AI to block uploads of videos or images that contain violence. With adversarial attacks, users can circumvent these restrictions.

Glasses Picture
Source: ACM Digital Library.
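The idea common to these attacks is that a small, deliberately chosen change to the input can flip a model's decision. The sketch below uses the fast gradient sign method on a tiny logistic-regression classifier. This is a simpler technique than the physical attacks described above, chosen only because it fits in a few lines. The weights, input and `epsilon` value are invented for illustration.

```python
import math


def predict_proba(weights, bias, x):
    """Probability of class 1 under a logistic-regression model."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def sign(value):
    """Return -1, 0 or 1 depending on the sign of `value`."""
    return (value > 0) - (value < 0)


def fgsm_perturb(weights, bias, x, label, epsilon):
    """Shift each feature by +/- epsilon in the direction that increases the loss.

    For logistic regression with cross-entropy loss, the gradient of the
    loss with respect to the input is (p - label) * weights.
    """
    p = predict_proba(weights, bias, x)
    gradient = [(p - label) * w for w in weights]
    return [xi + epsilon * sign(g) for xi, g in zip(x, gradient)]


# Hypothetical model parameters and input, invented for illustration only.
weights = [2.0, -3.0, 1.0]
bias = 0.0
x = [0.6, 0.2, 0.3]
label = 1  # the correct class for x

x_adv = fgsm_perturb(weights, bias, x, label, epsilon=0.2)
p_original = predict_proba(weights, bias, x)
p_adversarial = predict_proba(weights, bias, x_adv)
print(f"original:  p(class 1) = {p_original:.2f}")
print(f"perturbed: p(class 1) = {p_adversarial:.2f}")
```

Each feature moves by at most `epsilon`, yet the predicted class changes. Note that this sketch assumes the attacker knows the model's weights, which is not always the case in practice.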

Open source software: Community, collaboration, common knowledge base

Faced with these new challenges, it is time to design solutions. One option is for every company that incorporates AI in its product line or internal tools to work on the problems independently. This has the benefit of creating custom solutions for proprietary markets and works well on a smaller scale. But achieving trustworthy AI is an overwhelming task, and working within a community greatly increases the chances of success. The open source software community enables collaboration by building and sharing a common knowledge base. The most widely used machine learning frameworks are, in fact, open source projects, and the needs of the community drive their development. TensorFlow, PyTorch and Kubeflow are a few of the active projects.

A map to explore open source projects

How do we find our way among the many existing projects and ecosystems? Like any other software, open source software needs governance. The Linux Foundation AI & Data (LF AI & Data) is one organization that hosts and promotes the collaborative development of open source projects related to AI. Besides tracking the hosted projects, LF AI & Data also maintains an interactive landscape of noteworthy AI projects grouped into several big categories. Anyone can request that a project be added to the interactive map if it meets basic criteria.

One of the smaller categories in the landscape is trusted and responsible AI. It gives an overview of the most popular open source projects in three subcategories: explainability, bias and fairness, and adversarial. These correspond to the main challenges discussed above. Depending on which principle you are working toward, you can compare projects and select the most appropriate one for your needs. Besides a project’s technical capabilities, you can evaluate its health by relying on metrics such as the number of contributors, recent commits and releases, and the project’s license.
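The health metrics mentioned above can be turned into a quick checklist. The sketch below is one possible way to do that, not a method the landscape itself prescribes. The project names, numbers, thresholds and license list are all invented for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ProjectHealth:
    name: str
    contributors: int
    last_commit: date
    last_release: date
    license: str


def health_flags(project, today, allowed_licenses,
                 min_contributors=5, max_idle_days=180):
    """Return a list of concerns; an empty list means none were found.

    The default thresholds are arbitrary assumptions. Releases are allowed
    to be twice as old as commits before they are flagged.
    """
    flags = []
    if project.contributors < min_contributors:
        flags.append("few contributors")
    if (today - project.last_commit).days > max_idle_days:
        flags.append("no recent commits")
    if (today - project.last_release).days > 2 * max_idle_days:
        flags.append("no recent release")
    if project.license not in allowed_licenses:
        flags.append("license not on the allowed list")
    return flags


# Hypothetical projects and policy, invented for illustration only.
today = date(2022, 10, 1)
allowed = {"Apache-2.0", "MIT"}
projects = [
    ProjectHealth("project-a", 120, date(2022, 9, 20), date(2022, 8, 1), "Apache-2.0"),
    ProjectHealth("project-b", 3, date(2021, 1, 15), date(2020, 6, 1), "GPL-3.0"),
]

for project in projects:
    flags = health_flags(project, today, allowed)
    print(project.name, "->", flags or "no concerns found")
```

Which licenses are acceptable and how much inactivity is tolerable depend on your organization, so treat the defaults as placeholders to replace with your own policy.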

Stepping into the future

AI systems pose a myriad of new challenges. Problems related to building, deploying and maintaining machine learning models have given rise to a new discipline called MLOps (ML operations). AI systems have created novel varieties of security vulnerabilities and posed new questions on ethics, trust and responsibility. As ML systems become more powerful and complex but less transparent, the need for explainability increases. But this also means there are new opportunities for collaboration, research and exploration, driving solutions to complex problems. We look forward to stepping into this new era of AI-driven technologies.

View our talk on trustworthy AI at the Open Source Summit EU 2022.

Stay tuned to the Open Source Blog and follow us on Twitter for more deep dives into the world of open source contributing.

