Binary scanners inspect and analyze binary code to identify open source components, security vulnerabilities and other sensitive information. But why would you use a binary scanner instead of just scanning the source code?
The question of whether binary scanners work just as well as source code scanners has been a long-running debate. Of course, in some cases you may not have access to the source code in the first place, leaving you with no choice but to use a binary scanner. In other cases, source code scanning may be the better option. Whichever approach you take, the goal is always to reduce security risk.
Some of the pros of binary scanners vs. source code scanners include:
- The ability to evaluate and analyze the production version of your app, as well as third-party applications.
- The ability to discover exploitable pathways through the code, such as a database query that originates as an HTTP request and reaches the database unsanitized or unvalidated (see the sketch after this list).
- The ability to potentially identify vulnerabilities introduced by the compiler itself.
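To make the “exploitable pathway” example concrete, here is a minimal sketch (using Flask and sqlite3 purely for illustration; the route, table and parameter names are hypothetical) of the kind of taint flow such an analysis is designed to surface: untrusted HTTP input reaching a SQL query without sanitization.

```python
# A minimal sketch of an exploitable data flow (Flask + sqlite3 chosen purely
# for illustration; the route, table and parameter names are hypothetical).
import sqlite3
from flask import Flask, request

app = Flask(__name__)

@app.route("/users")
def get_user():
    username = request.args.get("name", "")  # untrusted HTTP input
    conn = sqlite3.connect("app.db")
    # VULNERABLE: the untrusted value is concatenated straight into the SQL,
    # so a crafted "name" parameter can rewrite the query (SQL injection).
    rows = conn.execute(
        "SELECT id, email FROM users WHERE name = '" + username + "'"
    ).fetchall()
    # SAFE alternative: a parameterized query keeps data separate from SQL.
    # rows = conn.execute(
    #     "SELECT id, email FROM users WHERE name = ?", (username,)
    # ).fetchall()
    conn.close()
    return {"users": rows}
```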
Some of the cons of binary scanners vs. source code scanners include:
- Depending on the size and complexity of the application, binary scans can be time-consuming, slowing down CI/CD cycles.
- The need to disassemble, de-obfuscate, decompile and otherwise reverse engineer compiled code, which can only ever produce an approximation of the original source code.
- Binary analysis relies on modeled datatypes, data flows and control paths that only approximate the application’s actual behavior.
In other words, binary scanners make assumptions about how the application works. For example, without access to source code, they tend to generate false positives prolifically, triggered by simple things like remote code calls. As a result, new or frequently changing applications will require extensive resources for manual inspection of the scanner’s findings. While some vendors have trained their scanners (increasingly with AI) on voluminous data to reduce their false positive rates, the question remains: why do we continue to rely on binary scans, especially when it comes to open source packages?
The answer seems to be that the vast majority of organizations import prebuilt open source packages rather than source code. To determine whether those packages are fit for use (i.e., whether they comply with security, licensing and code quality requirements), organizations rely on binary scanning. But the source code for open source packages is almost always available (there are some exceptions: I’m looking at you, older versions of Java!), meaning organizations don’t necessarily need to rely on binary scanners.
However, importing source code instead of prebuilt binary packages would require a seismic shift in thinking, software development processes and DevOps effort. Is it worth it? Let’s find out in this blog, which focuses on whether binary code scanning is the best approach when it comes to identifying threats in open source software.
How Binary Scanners Work
Most binary scanners provide both basic and advanced capabilities, starting with a simple scan to establish the software’s contents and progressing to more advanced analyses that model the flow of data through the application. These processes typically don’t require reverse engineering the code.
But binary scanners can also look deeper, decompiling the code to identify components and performing pattern matching to detect security flaws, then generating a security report that may even include advice on how to remediate any issues found.
Unfortunately, pattern matching means binary scanners rely on previously identified malware signatures or known malicious behaviors, since they cannot fully parse the code or understand the application’s file structure. This dependency means most binary scanners (and most source code scanners, for that matter) are ineffective against new malware or sophisticated variants embedded in open source packages.
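To illustrate why that matters, here is a hypothetical sketch of signature-based matching (the signature names and byte patterns below are invented for illustration, not drawn from any real scanner’s database): the scanner can only report what it already has a pattern for.

```python
# A hypothetical sketch of signature-based pattern matching. The names and byte
# patterns below are invented for illustration, not taken from a real database.
KNOWN_SIGNATURES = {
    "dropper-variant-a": bytes.fromhex("e8000000005b81c3"),  # invented byte pattern
    "cryptominer-stub": b"stratum+tcp://",                   # invented indicator string
}

def scan_binary(path: str) -> list[str]:
    """Return the names of any known signatures found in the file."""
    with open(path, "rb") as f:
        data = f.read()
    return [name for name, pattern in KNOWN_SIGNATURES.items() if pattern in data]

# A newly written or repackaged payload whose bytes match no known pattern
# comes back as an empty list -- a "clean" result, not because the file is safe,
# but because the scanner has nothing to match it against.
```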
Further, the larger and more complex the software package, the more complicated analysis becomes. For example, exercising all the data flows in a complex package can be extremely time-consuming, but the scanner can’t run indefinitely, meaning some flows are left unexplored. This can give a false sense of security, since some scans will be more complete and accurate than others.
More importantly, binary scanners treat remote resources such as property files, environment variables, operating system arguments, etc., as suspect, which inevitably generates an abundance of false positives, all of which need to be investigated either by your development team (slowing code delivery) or by a dedicated security team, adding to their alert fatigue.
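As an example of the kind of ordinary, legitimate code that triggers these alerts, here is a minimal sketch (the environment variable name and URL are hypothetical): the destination of the request is resolved from configuration at runtime, so a binary scanner that cannot determine the value statically flags the outbound call as suspect.

```python
# A minimal sketch of ordinary, legitimate code that a binary scanner typically
# flags: the request destination comes from runtime configuration it cannot
# resolve statically. The variable name and URL are hypothetical.
import os
import urllib.request

def fetch_report() -> bytes:
    # Resolved from deployment config at runtime -- opaque to a static binary scan,
    # so the outbound call gets reported as a suspect remote resource.
    base_url = os.environ.get("REPORTING_SERVICE_URL", "https://reports.internal.example")
    with urllib.request.urlopen(f"{base_url}/daily") as resp:
        return resp.read()
```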
Keep in mind that binary scanners that claim to support a wide range of languages typically provide excellent results only for a subset. If your organization’s tech stack contains more than one or two languages, you may need to deploy multiple binary scanners to ensure wider coverage. Of course, this will result in even more alerts being generated, all of which need to be investigated.
Binary Scanner Alternatives
ActiveState takes a novel approach to ensuring the open source code in your application is safe to use: we don’t just help you secure your software supply chain; we are your software supply chain.
ActiveState’s enterprise offering works only with the source code of open source packages, importing and scanning that source code rather than using prebuilt/precompiled binaries, in a process I explained in a previous blog, GitHub’s Malicious Repo Explosion & How to Avoid It.
In a nutshell, we scan and vet the imported source code, and then build it securely using a SLSA Build Level 3 Continuous Integration (CI) service that not only generates reproducible builds, but also avoids threats that might be introduced during the build process.
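As a rough illustration of what reproducibility buys you (the file paths below are hypothetical): two independent builds of the same source should produce byte-identical artifacts, so any divergence, including tampering during the build, shows up as a digest mismatch.

```python
# A rough sketch of checking build reproducibility (file paths are hypothetical):
# two independent builds of the same source should produce byte-identical
# artifacts, so any divergence -- including tampering during the build -- shows
# up as a digest mismatch.
import hashlib

def sha256_of(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

digest_a = sha256_of("build-a/package-1.0.0.tar.gz")  # artifact from the CI build
digest_b = sha256_of("build-b/package-1.0.0.tar.gz")  # artifact from an independent rebuild
assert digest_a == digest_b, "builds are not reproducible -- investigate the build environment"
```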
“Outsourcing” your open source supply chain in this way means you can recover the time and resources wasted on binary scans because:
- ActiveState imports the source code for your project according to your requirements, eliminating the need to source and vet open source packages yourself.
- ActiveState scans the source code and quarantines any problematic packages, eliminating the time and resources you spend investigating alerts.
- ActiveState securely builds your open source components and packages them as a runtime for Windows, Mac and Linux, reducing the threat of supply chain attacks.
ActiveState is DevSecOps done right, giving you back time and resources so your development and security teams can focus their efforts on your code – not open source code.
Conclusions: The Role of Binary Scanners
Binary scanners still have a role to play for those organizations that want to verify the security of their built software applications prior to delivery to their customers. This is not something ActiveState can currently help with, but we can take a lot of the heavy lifting off your plate when it comes to securing your open source from supply chain attacks:
- Reduce alert fatigue, which is a key contributor to security risk when issues go uninvestigated.
- Reduce the risk of malware by securely building your open source packages from vetted source code.
- Reduce the risk of undiscovered threats during your CI/CD cycle by eliminating the need to perform time-consuming binary scans on anything but the final product.
Some vendors have begun training their binary scanners using AI in order to reduce the number of false positives they generate. While drawbacks remain around speed, malware identification and data flow analysis, this does help address alert fatigue, which, as 3CX found to their detriment, is a key contributor to cybersecurity risk.
Next Steps:
Watch our webinar Achieving the Impossible: 3 Steps to Minimize Risk & Reap the Benefits of Secured Open Source