What is Cloudera and Cloudera Machine Learning?
Cloudera specializes in data management by offering a comprehensive platform for big data and analytics that includes various software and services to help organizations store, process, analyze, and manage their data more effectively.
Cloudera Machine Learning (CML) is a platform offered by Cloudera that focuses on enabling organizations to develop, train, and deploy machine learning models at scale. It is designed to simplify the end-to-end machine learning lifecycle and make it more accessible for data scientists, engineers, and other stakeholders. CML provides the convenience of conducting machine learning development within a cloud-based environment. However, it does not by itself implement all the security measures needed to protect your projects, most notably a secure software supply chain.
Secure Python for ML Development
Open source software design is often hailed for its inherent safety, thanks to the fact that the source code is accessible to anyone in its human-readable form. This transparency allows the global community to scrutinize the code for vulnerabilities to maintain its integrity. However, challenges arise when developers compile this source code into machine language, creating “binaries”. These binary files are extremely challenging to dissect once assembled, making them a fertile ground for concealing malicious software. Failure to secure your supply chain can lead to the introduction of insecure products into the market.
Within the Python ecosystem, pre-compiled binaries are often combined with human-readable Python code to create “wheels.” These wheels are typically assembled by members of the Python community and distributed through public repositories. Publicly available, open source wheels have increasingly become a vehicle for distributing malware.
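To make the point about wheels concrete, here is a minimal sketch showing that a wheel is a ZIP archive in which compiled files sit alongside readable Python source. The package name `demo_pkg` and the helper names are hypothetical, and the suffix check is only a heuristic (it would miss versioned shared libraries, for example). It lists which files are compiled; it does not tell you whether they are safe.

```python
import io
import zipfile

# File suffixes that typically indicate compiled (non-human-readable) code.
# Assumption: this short list is illustrative, not exhaustive.
BINARY_SUFFIXES = (".so", ".pyd", ".dll", ".dylib")


def list_binary_members(wheel_file):
    """Return the names of archive members that look like compiled binaries."""
    with zipfile.ZipFile(wheel_file) as archive:
        return sorted(
            name
            for name in archive.namelist()
            if name.lower().endswith(BINARY_SUFFIXES)
        )


def build_demo_wheel():
    """Create a tiny in-memory archive that mimics a wheel's layout.

    The package and file names here are made up for the example.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("demo_pkg/__init__.py", "print('hello')\n")
        archive.writestr(
            "demo_pkg/_speedups.cpython-311-x86_64-linux-gnu.so", b"\x7fELF"
        )
        archive.writestr("demo_pkg-1.0.dist-info/METADATA", "Name: demo-pkg\n")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for member in list_binary_members(build_demo_wheel()):
        print(member)
```

In practice you would pass the path of a downloaded `.whl` file to `list_binary_members` instead of the in-memory demo archive.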
Many organizations that develop their own software rely heavily on open source components, such as pre-compiled binaries and wheels, to build their machine learning applications. This makes it hard to verify that the components they download are authentic and actually correspond to the source code they expect.
How ActiveState can help
To tackle these security challenges, the ActiveState Platform steps in as a secure factory, facilitating the creation of Cloudera ML Runtimes. The Platform automates the process of building Python from thoroughly vetted source code. In doing so, it adheres to the highest standards of Supply-chain Levels for Software Artifacts (SLSA) while providing essential tools for monitoring, maintaining, and verifying the integrity of open source components within your software stack. Your ActiveState runtime provides attestations, SBOMs, and secure software artifacts, all wrapped in a Docker image you can import into CML.
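As a generic illustration of what verifying the integrity of an artifact can mean at its simplest, the sketch below compares an artifact's SHA-256 digest against a digest published by a trusted source. This is not the ActiveState Platform's API or its attestation format; the function names and the sample bytes are assumptions made for the example.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def matches_expected_digest(data: bytes, expected_hex: str) -> bool:
    """Check the artifact bytes against a digest obtained from a trusted source."""
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(sha256_hex(data), expected_hex.lower())


if __name__ == "__main__":
    # Hypothetical artifact contents; a real check would read the file from disk.
    artifact = b"example artifact contents"
    published_digest = sha256_hex(artifact)  # stands in for a trusted, published value

    print(matches_expected_digest(artifact, published_digest))
    print(matches_expected_digest(artifact + b"tampered", published_digest))
```

A digest check only confirms that the bytes match what was published. Attestations and SBOMs go further by describing how and from what the artifact was built.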
Cloudera and ActiveState strongly believe that open source security and innovation can coexist. Unlike other ML platforms that rely solely on insecure public sources for extensibility, Cloudera customers can now enjoy supply chain security across the entire open source Python ecosystem by using ActiveState’s existing Platform service.
Securing your Cloudera runtime with ActiveState
With the ActiveState Platform, you can easily generate ML Runtimes to securely extend your Cloudera Machine Learning (CML) environment with the latest Data Science and Machine Learning tools and frameworks.
With the introduction of Cloudera’s Powered by Jupyter (PBJ) ML Runtimes, integrating Runtimes built on the ActiveState Platform into Cloudera Machine Learning (CML) is an easy process. You can use the ActiveState Platform to build a customized ML runtime that can then be integrated into CML. This integration paves the way for more streamlined management, enhanced observability, and a robust and secure software supply chain, ensuring the continued reliability of your ML applications.
For a more detailed explanation of how to use Cloudera with ActiveState, visit our documentation.
By incorporating the ActiveState Platform into their process, CML customers can be confident that their AI projects are secure from concept to deployment.