Dependency confusion (also known as dependency repository hijacking, a substitution attack, or “repo jacking” for short) is a software supply chain attack in which malicious third-party code is substituted for a legitimate internal software dependency. There are several ways to create this kind of attack vector, including:

  • Namespacing – an attacker uploads a malicious software library to a public registry (such as the Python Package Index [PyPI] or JavaScript’s npm registry) under a name that is the same as, or similar to, a trusted, internally used library. Systems that omit a namespace/URL check may then mistakenly pull in the malicious code.
  • DNS Spoofing – by using a technique like DNS spoofing, an attacker can direct systems to pull dependencies from malicious repositories while displaying what look like legitimate, internal URLs/paths.
  • Scripting – by modifying build/install scripts or CI/CD pipeline configurations, an attacker can trick systems into downloading software dependencies from a malicious source rather than a local repository.
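
As a rough illustration of how the scripting vector might be caught, the sketch below scans a pip command line (as it might appear in a build script or CI/CD configuration) for index URLs that are not on an allow-list. The host name pypi.internal.example.com and the function name find_untrusted_indexes are assumptions made for this example; only the pip flags -i, --index-url and --extra-index-url are real.

```python
import shlex
from urllib.parse import urlparse

# Hypothetical allow-list: hosts your organization trusts as package indexes.
ALLOWED_INDEX_HOSTS = {"pypi.internal.example.com"}

# pip flags that point an install at a package index.
INDEX_FLAGS = ("-i", "--index-url", "--extra-index-url")


def find_untrusted_indexes(command, allowed_hosts=ALLOWED_INDEX_HOSTS):
    """Return index URLs in a pip command line whose host is not allow-listed."""
    tokens = shlex.split(command)
    urls = []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token in INDEX_FLAGS:
            # Form: --index-url URL (the value is the next token).
            if i + 1 < len(tokens):
                urls.append(tokens[i + 1])
                i += 1
        elif token.startswith(("--index-url=", "--extra-index-url=")):
            # Form: --index-url=URL
            urls.append(token.split("=", 1)[1])
        i += 1
    return [url for url in urls if urlparse(url).hostname not in allowed_hosts]


if __name__ == "__main__":
    line = "pip install --extra-index-url https://evil.example.net/simple acme-billing"
    print(find_untrusted_indexes(line))
```

A check like this only covers indexes named on the command line; indexes set through pip configuration files or environment variables would need to be audited separately.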

What is a Dependency Confusion attack?

As originally described by security researcher Alex Birsan, who was awarded a number of bug bounties for his work, a dependency confusion attack attempts to fool a developer or system into downloading a compromised software dependency from a source external to the organization during an install or build process. One way to accomplish this kind of attack is as follows:

  1. A hacker researches the name of a private package used internally by an organization to develop their software apps.
  2. The hacker creates a similar package, embeds malware, gives it the same name as the internal package and sets the version number higher than the one discovered through research.
  3. The hacker uploads the malicious package to a public repository.
  4. The next time the package manager requests the private package, it may pull the compromised public package from the open source ecosystem rather than the local repository (for example, when pip is configured with an additional index via --extra-index-url, it considers every configured index and by default installs the highest matching version it finds).
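
The version-based selection in step 4 can be sketched with a toy resolver. This is a simplified model, not pip’s actual resolution code: the package name acme-billing, the index names and the helper functions are all hypothetical, and versions are assumed to be plain dotted integers.

```python
def parse_version(text):
    """Turn a dotted version such as '1.3.1' into a comparable tuple (1, 3, 1)."""
    return tuple(int(part) for part in text.split("."))


def pick_candidate(name, indexes):
    """Pick the highest version of `name` across ALL indexes.

    `indexes` maps an index name to {package name: [version strings]}.
    The source of a candidate plays no role in the choice, which is
    exactly what a dependency confusion attack exploits.
    """
    candidates = [
        (parse_version(version), version, index_name)
        for index_name, packages in indexes.items()
        for version in packages.get(name, [])
    ]
    if not candidates:
        return None
    _, version, index_name = max(candidates)
    return index_name, version


if __name__ == "__main__":
    indexes = {
        "private": {"acme-billing": ["1.2.0", "1.3.1"]},
        # Attacker-published package with an inflated version number.
        "public": {"acme-billing": ["99.0.0"]},
    }
    print(pick_candidate("acme-billing", indexes))  # ('public', '99.0.0')
```

With only the private index configured, the same call returns ('private', '1.3.1'), which is why restricting the package manager to a single trusted source removes this particular ambiguity.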

The compromised dependency is typically a clone of the original (to fulfill all functional requirements for use in an application), along with malicious code designed to exfiltrate data, implant a backdoor in the execution environment, or otherwise implement a security threat.

How can I protect against Dependency Confusion?

Preventing dependency confusion exploits is key to improving cybersecurity in general, and software supply chain security in particular. Unfortunately, no single solution can mitigate all potential substitution threats, since the problem affects package managers across many language ecosystems, including JavaScript’s npm, Python’s pip, Ruby’s RubyGems, and Java’s Maven and Gradle. Instead, a number of best practices can be followed to help manage the risks, including:

  • Utilize Scopes/Namespaces – some package managers allow for namespaces, IDs or other prefixes, which can be used to ensure that internal dependencies are pulled only from private repositories (e.g., GitHub) defined with the appropriate prefix/scope.
  • Secure the Build Environment – create a dedicated, locked-down build environment that has strict permissions and is monitored for vulnerabilities. This will help mitigate the risk that attackers insert malicious dependency paths in build scripts and CI/CD configurations, or that remote transitive dependencies are pulled in during a build step.
  • Validate Hashes/Checksums – wherever possible, validate that a dependency’s checksums match those documented on official package sources. This can be difficult to automate with changing dependencies/versions, but once a definitive set of dependencies is created, you can take advantage of your package manager’s support for lock files and automated hash checking.
  • Vendor Dependencies – rather than pulling dependencies from private registries and public repositories on demand every time an environment is built, reduce the risk of dependency confusion by embedding the source code for all dependencies (internal and external) in your code repository. Package managers can then be configured (and verified) to utilize only a single source for all dependencies. While dependency vendoring is an effective approach, be warned that it adds maintenance overhead: vendored copies must be updated by hand and they increase the size of the repository.
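
As one illustration of the hash/checksum practice above, the following minimal sketch compares a downloaded artifact’s SHA-256 digest with a pinned value before it is used. The function names and the idea of a separately stored “pinned” digest are assumptions for this example; in practice the pinned value would come from a lock file or the official package source.

```python
import hashlib
import hmac
import os
import tempfile


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path, expected_sha256):
    """Return True only if the file's digest matches the pinned digest."""
    return hmac.compare_digest(sha256_of_file(path), expected_sha256.lower())


if __name__ == "__main__":
    # Stand-in for a downloaded package archive.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"example package contents")
    pinned = hashlib.sha256(b"example package contents").hexdigest()
    print(verify_artifact(tmp.name, pinned))    # True
    print(verify_artifact(tmp.name, "0" * 64))  # False
    os.remove(tmp.name)
```

Package managers can often perform this check for you. With pip, for example, installing with --require-hashes makes the install fail unless every requirement carries a matching hash.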
