This summer, python-tuf co-maintainer Lukas Pühringer and I delivered a talk on implementing PEP 458 to secure PyPI downloads at EuroPython 2022 in Dublin, the most prominent Python community event in Europe.
PEP 458 focuses on securing the communication between PyPI and its users. It does so by implementing repository signing with The Update Framework (TUF), a flexible framework and specification that helps protect the PyPI repository against attackers.
“TUF was designed under the assumption that integrating projects would already have a software update system that its community was looking to supplement with better security features – as in the case of PyPI,” co-maintainer Joshua Lock notes in his blog post “Enhancing Software Update Security with TUF.”
Warehouse and PyPI
PyPI is the Python Package Index, the principal community repository for Python software. Developers around the world use Python, and many of the libraries and applications they write can be reused by others. PyPI is the repository where that software is published, which has made it vital to the ecosystem. Warehouse is the underlying technology that powers PyPI, providing navigation and organization of the assets within it. With more than 380,000 projects and 600,000 users, it is a large piece of infrastructure that serves more than two billion requests and 900 terabytes of data daily.
Securing Package Distributions
With so many projects and users, PyPI is a very attractive target for attackers. Attackers use repository and account compromises to upload malicious packages that often perform harmful activities on the systems that install them, leading to security issues that can spread to downstream projects and software installations. When packages may contain unknown, hidden, and sometimes nefarious components, project outcomes cannot be relied on without solid security.
This project’s aim is to add more protection for developers and users by implementing PEP 458 (the minimal TUF design for PyPI), which also acts as a building block for maximum security. This framework provides an easy onramp for future enhancements, such as the already proposed PEP 480, which adds optional developer signing of packages.
Python runs everywhere: servers, containers, PCs, IoT, etc. There is also a distribution platform and a client available for developers and users alike. Warehouse is the platform, and pip is the client.
The design goal is for the intended package to be delivered every time pip install <package> is executed.
The Integration Journey
Following the recent release of python-tuf 1.0.0, I opened a pull request (PR) in Warehouse that implements the initial TUF setup in the Warehouse development environment. The new python-tuf release added an API for managing metadata, which makes PEP 458 and other complex integrations feasible where they otherwise would not have been. A practical example is loading only the metadata roles required for a single role operation instead of the entire repository metadata; in the Warehouse scenario, loading the entire repository metadata would mean loading and parsing over 16,000 JSON files.
During my journey, I first learned about TUF, gained a fundamental understanding of PEP 458, and dug deeper into the Warehouse architecture and how it works. Starting from an old PR that attempted to add TUF to Warehouse, I began refactoring and building a new design, which includes significant changes in python-tuf. In the past, the library imposed an opinionated repository design. Now the API provides the flexibility to handle the metadata in a distributed way, which gives developers who want to customize their applications far more versatility in their software flows. The new pull request was a big step, but it was very large and still did not cover the entire implementation. Long story short, it was not easy for the Warehouse maintainers to review.
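To make the "load only what you need" idea concrete, here is a minimal sketch using only the Python standard library. It is not the python-tuf API or the Warehouse code: the LazyRoleStore class, the role file names, and the placeholder JSON content are all hypothetical and exist only to show the pattern of parsing a role file the first time it is requested.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List


class LazyRoleStore:
    """Hypothetical helper: parse a role's JSON file only when it is requested."""

    def __init__(self, metadata_dir: Path) -> None:
        self.metadata_dir = metadata_dir
        self._cache: Dict[str, dict] = {}  # role name -> parsed metadata

    def get(self, role_name: str) -> dict:
        # Read and parse the file on first access, then reuse the cached result.
        if role_name not in self._cache:
            path = self.metadata_dir / f"{role_name}.json"
            self._cache[role_name] = json.loads(path.read_text())
        return self._cache[role_name]

    @property
    def loaded_roles(self) -> List[str]:
        # Names of the roles that have actually been parsed so far.
        return sorted(self._cache)


def demo() -> List[str]:
    with tempfile.TemporaryDirectory() as tmp:
        metadata_dir = Path(tmp)
        # Placeholder role files; real TUF metadata is far richer than this.
        for name in ("root", "targets", "snapshot", "timestamp", "bins-0a"):
            content = json.dumps({"signed": {"_type": name}})
            (metadata_dir / f"{name}.json").write_text(content)

        store = LazyRoleStore(metadata_dir)
        store.get("bins-0a")  # a single role operation touches a single file
        return store.loaded_roles


if __name__ == "__main__":
    print(demo())
```

With thousands of delegated role files on disk, the same access pattern parses one file per operation rather than all of them.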
New Developments
Recently I updated the PR to use python-tuf 2.0.0. The main benefit of this release is a feature from TAP 15 called succinct hash bin delegations. It let us remove many lines of code in Warehouse and, more importantly, reduce the size of the metadata that clients (pip users, for example) download repeatedly. Some extra contributions from the TUF team also helped simplify the repository implementation in the PR.
Lukas Pühringer is working on a design document to complement PEP 458 and help reviewers. After this PR is merged, we aim to follow up with small pull requests to help both sides – the implementation team and the maintainers.
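As a rough illustration of why succinct hash bin delegations shrink the metadata, the sketch below computes the name of the bin role responsible for a target from the target path alone, so the delegating role does not need to list every bin explicitly. It reflects my reading of TAP 15 (the leading bits of the SHA-256 hash of the target path select the bin) and is written with the standard library only; it is not the python-tuf implementation. The function name, the "bins" prefix, the example path, and the bit length of 14 (which gives 16,384 bins) are assumptions chosen for the example.

```python
import hashlib


def succinct_bin_name(target_path: str, name_prefix: str, bit_length: int) -> str:
    """Sketch: map a target path to a bin role name, following my reading of TAP 15."""
    if not 1 <= bit_length <= 32:
        raise ValueError("bit_length must be between 1 and 32")

    digest = hashlib.sha256(target_path.encode("utf-8")).digest()
    # Take the first 32 bits of the hash, then keep only the leading bit_length bits.
    leading_bits = int.from_bytes(digest[:4], byteorder="big")
    bin_number = leading_bits >> (32 - bit_length)

    # Zero-pad the hex suffix to the width of the highest possible bin number.
    suffix_len = len(f"{2**bit_length - 1:x}")
    return f"{name_prefix}-{bin_number:0{suffix_len}x}"


if __name__ == "__main__":
    # Hypothetical target path and prefix, for illustration only.
    print(succinct_bin_name("simple/requests/index.html", "bins", 14))
```

Because every client can derive the bin name this way, the delegation only has to carry the prefix, the bit length, and the keys shared by the bins.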
Stay tuned!
Stay tuned to the Open Source Blog and follow us on Twitter for more deep dives into the world of open source contributing.