Python, like JavaScript, is one of the most popular programming languages on the planet. And just as JavaScript is used by 98% of all websites worldwide, Python is at the heart of today’s AI and machine learning explosion, driven by ChatGPT, Stable Diffusion and others. It’s no wonder, then, that Python has become a prime target for bad actors looking to exploit weak links in vendors’ supply chains in order to compromise their software.
[Figure: Malicious Packages in PyPI]

For example:

  • August 2022 – the PyPI package “secretslib” was found to covertly hijack Linux machines in order to run cryptominers.
  • October 2022 – dozens of Python packages were found to steal data using the W4SP malware; importing any one of them into a development environment was enough to trigger the theft.
  • December 2022 – PyTorch nightly builds fell victim to a dependency confusion attack when they pulled in a compromised version of the torchtriton dependency from PyPI rather than the intended version from PyTorch’s own package index.
  • March 2023 – dozens of Python packages were found whose setup.py, when run during installation, connected to an external URL to download malicious code.

Because supply chain attacks can target any stage of the software development lifecycle, finding (and fixing) weak links can seem like a never-ending task. This blog post will show you how to address three of the most commonly exploited supply chain attack vectors:

  • On Import – malicious code hidden in the Python packages an organization imports
  • On Install – setup.py can be abused to run or download malicious code during installation
  • On Build – remote resources pulled in during a build can compromise locally built Python artifacts

Importing Python Securely

Python code is generally imported into an organization in one of two ways:

  • As source code
  • As a prebuilt package

For security, the latter method should be avoided, especially when the prebuilt package contains compiled binary code, which cannot simply be read to verify that it’s clean.
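Because a wheel is just a zip archive, you can at least check whether a prebuilt package ships compiled binaries before deciding how to treat it. The sketch below is a minimal illustration, assuming a hand-picked list of binary file suffixes (`BINARY_SUFFIXES` and both function names are hypothetical). It only flags binaries; it cannot tell you whether they are safe.

```python
import zipfile
from pathlib import PurePosixPath

# Illustrative (not exhaustive) suffixes that usually mark compiled code.
BINARY_SUFFIXES = {".so", ".pyd", ".dll", ".dylib", ".exe"}


def binary_members(names):
    """Return the archive member names that look like compiled binaries."""
    flagged = []
    for member in names:
        filename = PurePosixPath(member).name.lower()
        # Match plain suffixes (.so, .pyd, ...) and versioned objects (libfoo.so.1).
        if PurePosixPath(filename).suffix in BINARY_SUFFIXES or ".so." in filename:
            flagged.append(member)
    return sorted(flagged)


def find_binary_members(wheel_path):
    """List binary members of a wheel without installing or executing it."""
    with zipfile.ZipFile(wheel_path) as wheel:
        return binary_members(wheel.namelist())
```

An empty result suggests a pure-Python wheel, but the source inside still needs the same review as any other imported code.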

As a best practice, stage source code in a repository of some kind and scan it to identify malicious code. As an additional precaution, consider quarantining any newly released code that generates a warning. The Python Package Index (PyPI) is extremely efficient at removing compromised packages in a timely manner, so by holding new releases in quarantine for a while, you may be able to let PyPI do much of the work for you.
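A quarantine rule can be as simple as refusing any release younger than a fixed number of days. This is a minimal sketch, assuming PyPI’s JSON API at `https://pypi.org/pypi/<project>/<version>/json` and its per-file `upload_time_iso_8601` field; the 14-day window and the function names are hypothetical policy choices, not a recommendation from PyPI.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone

# Hypothetical policy: how long a new release must age before it is used.
QUARANTINE_PERIOD = timedelta(days=14)


def release_upload_time(project, version):
    """Return the earliest upload time of a release, via PyPI's JSON API."""
    url = f"https://pypi.org/pypi/{project}/{version}/json"
    with urllib.request.urlopen(url, timeout=10) as response:
        release = json.load(response)
    times = [
        # Older Pythons' fromisoformat() rejects a trailing "Z", so normalize it.
        datetime.fromisoformat(f["upload_time_iso_8601"].replace("Z", "+00:00"))
        for f in release["urls"]
    ]
    if not times:
        raise ValueError(f"{project} {version} has no uploaded files")
    return min(times)


def is_out_of_quarantine(uploaded_at, now=None, period=QUARANTINE_PERIOD):
    """True once a release has been public for at least the quarantine period."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - uploaded_at >= period
```

With network access, `is_out_of_quarantine(release_upload_time("requests", "2.31.0"))` would check a real release; the time comparison itself is kept separate so it can be tested offline.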

[Figure: Import Workflow]

If you lack a staging repository, you could do worse than a GitHub repository and its automated scanning capabilities. Alternatively, the ActiveState Platform incorporates an ingestion pipeline that vets all Python source code for security and integrity before it is admitted to our catalog or placed in our quarantine zone for follow-up.

Installing Python Securely

As noted above, setup.py is a common attack vector for Python packages because it runs at installation time, where it can execute malicious code directly or fetch malicious remote resources. In other words, simply running pip install or pip download can be enough to compromise your system.
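Since the danger lies in *running* setup.py, one cheap first check is to inspect it without running it. The sketch below parses the file with Python’s standard `ast` module and flags a few risky imports and calls. The watch lists and the `scan_setup_py` name are hypothetical, and simple obfuscation will defeat this kind of check, so treat it as a triage aid rather than a replacement for a dedicated malware scanner.

```python
import ast

# Illustrative watch lists (hypothetical, far from complete).
SUSPICIOUS_MODULES = {"subprocess", "socket", "urllib", "requests", "base64", "ctypes"}
SUSPICIOUS_CALLS = {"exec", "eval", "compile", "__import__", "system", "popen"}


def scan_setup_py(source):
    """Return sorted (line, finding) pairs for risky constructs in setup.py.

    The source is only parsed into an AST, never executed.
    """
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in SUSPICIOUS_MODULES:
                    findings.append((node.lineno, f"imports {alias.name}"))
        elif isinstance(node, ast.ImportFrom) and node.module:
            if node.module.split(".")[0] in SUSPICIOUS_MODULES:
                findings.append((node.lineno, f"imports from {node.module}"))
        elif isinstance(node, ast.Call):
            func = node.func
            # Plain calls like exec(...) and attribute calls like os.system(...)
            name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
            if name in SUSPICIOUS_CALLS:
                findings.append((node.lineno, f"calls {name}()"))
    return sorted(findings)
```

Any finding is a reason for a human to read the file, not proof of malice: plenty of legitimate setup.py files call subprocess to compile extensions.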

Luckily, pip prefers wheels, which are installed without running setup.py. The threat arises when no wheel is available for the target platform (operating system, CPU architecture and Python version), in which case pip falls back to building the source distribution. Of course, prebuilt wheels pose their own problems, as discussed in the previous section.
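Before installing, you can get a rough idea of whether a release would force pip onto its sdist (and therefore run setup.py) by reading the platform tag at the end of each wheel filename. This is an approximation under stated assumptions: it ignores the Python and ABI tags, and the function names and prefix matching are hypothetical. For exact compatibility, the `packaging` library’s `packaging.tags` module is the better tool, and pip’s `--only-binary :all:` option makes pip refuse source distributions outright.

```python
def wheel_platform_tags(filename):
    """Platform tags from a wheel filename: name-ver(-build)-py-abi-platform.whl."""
    stem = filename[: -len(".whl")]
    # The platform tag is the last dash-separated field and may hold several
    # dot-separated tags, e.g. "manylinux_2_17_x86_64.manylinux2014_x86_64".
    return set(stem.split("-")[-1].split("."))


def needs_source_build(filenames, platform_prefixes):
    """Rough guess: would pip have to build an sdist (running setup.py) here?

    filenames: the files published for one release.
    platform_prefixes: platform tag prefixes you accept, e.g. ["manylinux"].
    Python/ABI tags are ignored, so this is only an approximation.
    """
    for name in filenames:
        if name.endswith(".whl"):
            tags = wheel_platform_tags(name)
            if "any" in tags or any(
                tag.startswith(prefix) for tag in tags for prefix in platform_prefixes
            ):
                return False  # a usable wheel exists; no build step needed
    # No usable wheel: pip would fall back to a source distribution, if any.
    return any(name.endswith((".tar.gz", ".zip")) for name in filenames)
```

For example, a release that publishes only a `win_amd64` wheel plus an sdist would trigger a source build on a Linux machine.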

To ensure Python is installed securely, you’ll need to build all of your dependencies from source code in an environment that has no connection to a public network. But this kind of dependency vendoring can be complex and expensive in terms of time and resources, especially for smaller organizations. 
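One way to approximate this with plain pip is a two-step, index-free workflow: build wheels only from a local directory of already-vetted source archives, then install only those wheels. The sketch assumes two hypothetical directories (`vetted-sdists`, which also holds any build tools such as setuptools that the packages need, and `wheelhouse`) and uses pip’s real `--no-index`, `--find-links` and `--wheel-dir` options. Note that `--no-index` only stops pip from contacting an index; a malicious setup.py can still open its own connections, so the commands should run somewhere with networking disabled (for example a container started with `docker run --network none`).

```python
import subprocess
import sys

# Hypothetical directory names; adjust to your own layout.
VETTED_SDISTS = "vetted-sdists"  # reviewed source archives (plus build tools they need)
WHEELHOUSE = "wheelhouse"        # wheels built locally from those archives


def build_command(requirements):
    """pip command that builds wheels only from the local, vetted archives."""
    return [
        sys.executable, "-m", "pip", "wheel",
        "--no-index",                   # never consult PyPI or any other index
        "--find-links", VETTED_SDISTS,  # look only in the vetted local directory
        "--wheel-dir", WHEELHOUSE,
        "-r", requirements,
    ]


def install_command(requirements):
    """pip command that installs only the wheels built above."""
    return [
        sys.executable, "-m", "pip", "install",
        "--no-index",
        "--find-links", WHEELHOUSE,
        "-r", requirements,
    ]


def build_and_install(requirements="requirements.txt"):
    # --no-index stops pip itself from reaching an index, but it does NOT stop
    # a malicious setup.py from opening its own connections, so run this inside
    # a container or VM whose networking is disabled.
    for command in (build_command(requirements), install_command(requirements)):
        subprocess.run(command, check=True)
```

Keeping the commands as plain lists makes them easy to review and log before anything is executed.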

Alternatively, you can use the ActiveState Platform to automatically build your Python dependencies from source code. During the build process, setup.py is run in a hermetically sealed container that has no access to an external network. Installation on your local machine is done with our CLI tool, the State Tool, which does not run setup.py. In this way, you can avoid setup.py threat vectors entirely.

Building Python Securely

Building Python securely means ensuring (at least) two key things:

  • Vetted Source Code – builds can only be as secure as the code that goes into them.
  • Reproducibility – if the same input bits don’t always produce the same output bits, there’s no guarantee that the artifacts you’re working with haven’t changed from build to build.
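A common reason builds are not bit-for-bit reproducible is that archives embed timestamps, file ordering and host details. The sketch below illustrates the idea on a zip archive (wheels are zip files) by pinning entry order, timestamp, permissions and the “created on” OS field; the function names and the fixed date are illustrative choices. It assumes the same zlib version on every build machine, since compressed output can differ between zlib versions.

```python
import hashlib
import io
import zipfile

# Arbitrary fixed timestamp (the earliest date the zip format can store).
FIXED_DATE_TIME = (1980, 1, 1, 0, 0, 0)


def build_archive(files, date_time=FIXED_DATE_TIME):
    """Pack a {path: bytes} mapping into zip bytes, deterministically.

    Entry order, timestamps, permissions and the "created on" OS field are
    pinned, so the same input produces byte-identical output.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for path in sorted(files):  # fixed order, independent of dict order
            info = zipfile.ZipInfo(path, date_time=date_time)
            info.compress_type = zipfile.ZIP_DEFLATED
            info.create_system = 3              # always record "Unix"
            info.external_attr = 0o644 << 16    # fixed rw-r--r-- permissions
            archive.writestr(info, files[path])
    return buffer.getvalue()


def sha256_hex(data):
    """Hex SHA-256 digest used to compare build outputs."""
    return hashlib.sha256(data).hexdigest()
```

Two builds from the same inputs hash identically, while changing only the embedded timestamp changes the hash, which is exactly the kind of drift that breaks reproducibility.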

Unfortunately, reproducibility is rarely implemented because deterministic builds are complex to create. For example, ActiveState’s State of Supply Chain Security survey of more than 1,500 organizations of all sizes around the globe found that only ~22% of respondents could claim build reproducibility.

[Figure: Reproducibility by Org]

To create reproducible builds you’ll need to:

  • Ensure all code required for a build has been vetted and is present locally to avoid the threat of dependency confusion.
  • Ensure all build environments are ephemeral, isolated and hermetically sealed to avoid inclusion of potentially malicious remote resources. 
  • Discard the build (and all interim artifacts) if the hash of any artifact does not match the expected result.
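The last step can be automated by comparing every file in the build directory against a manifest of expected SHA-256 digests and deleting the whole directory on any discrepancy. This is a minimal sketch, assuming a hypothetical manifest (a dict of relative path to hex digest) recorded from a previously trusted build; the function names are also hypothetical. Unexpected extra files count as a mismatch, which is deliberately strict.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_tree(build_dir):
    """Map every file under build_dir (as a POSIX relative path) to its digest."""
    root = Path(build_dir)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in root.rglob("*")
        if p.is_file()
    }


def find_mismatches(actual, expected):
    """Paths that are missing, altered, or not listed in the expected manifest."""
    return sorted(
        path for path in actual.keys() | expected.keys()
        if actual.get(path) != expected.get(path)
    )


def verify_or_discard(build_dir, expected):
    """Keep the build only if it matches the manifest exactly; otherwise delete it."""
    problems = find_mismatches(hash_tree(build_dir), expected)
    if problems:
        # Fail safe: discard the whole build directory, interim artifacts included.
        shutil.rmtree(build_dir)
    return problems
```

An empty return value means the build was kept; anything else lists the offending paths for investigation after the build has been discarded.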

Alternatively, the ActiveState Platform always creates reproducible builds, or else fails safe if it discovers that build integrity has been compromised. 

Conclusions – Mitigating Python Supply Chain Vulnerabilities

Securing the Python software supply chain is more achievable than ever, since new tools (such as SBOMs and software attestations) and services like the ActiveState Platform can help organizations ensure the integrity and security of the Python code they import, build and install.

Yet most organizations remain focused on Python vulnerabilities, arguably the strongest link in the Python supply chain: according to Snyk, 87% of Python vulnerabilities have a known fix.

Additionally, according to the latest Veracode State of Open Source Software Report:

  • 25% of Python vulnerabilities are fixed in less than five hours.
  • 50% of flaws are addressed in the same hour they are reported.

To truly secure the Python supply chain, organizations need to expand their focus to include the import, build and installation processes as well.

Next steps:

Watch our webinar with Techstrong, in which we explain the “3 Steps to Software Supply Chain Security Success” in more detail.
