Before we start: This Python tutorial is a part of our series of Python Package tutorials.
Scikit-learn is an open source machine learning library for Python. You have a number of options when it comes to installing scikit-learn, depending on your needs:
- If you don’t have Python installed, you can install scikit-learn as part of a Python distribution, such as ActivePython.
- If you already have Python and prefer to install pre-built binaries, you can install scikit-learn by simply running the following command:
pip install scikit-learn
- If you are concerned that pre-built binaries may contain malicious code (for example, if you mistakenly install a typo-squatted package), consider building Python libraries from source code instead. The simplest way to build scikit-learn from source is to use the ActiveState Platform to automatically build and package it for Windows, Mac or Linux.
Scikit-Learn Step by Step Installation
For most users, the best approach is to install the binary version of scikit-learn using an official release from pypi.org, the Python Package Index. You can do so with the following steps:
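1. At the time of writing, scikit-learn requires Python 3.6+. Newer releases have raised this minimum, so check the official installation guide for the current requirement. To check which version of Python you have installed, run the following command: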
python3 --version
The output should be similar to:
Python 3.8.2
2. If your Python version meets the requirement, run the following command to download and install a pre-built binary of scikit-learn:
pip install scikit-learn
The following dependencies will be automatically installed along with scikit-learn:
- NumPy 1.13.3+
- SciPy 0.19.1+
- Joblib 0.11+
- threadpoolctl 2.0.0+
Alternatively, if scikit-learn is already installed, you can upgrade it to the latest release by running the following command. By default, pip upgrades the dependencies only when the installed versions no longer satisfy scikit-learn's requirements:
pip install -U scikit-learn
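As a rough cross-check of the requirements above, the following sketch reports the running Python version and the installed versions of scikit-learn and its dependencies. It is not part of scikit-learn: `report_environment` is a hypothetical helper name, and the sketch assumes Python 3.8+ so that `importlib.metadata` is available in the standard library. It only reports versions; it does not compare them against the minimums listed above.

```python
import sys
from importlib import metadata  # standard library in Python 3.8+

# Distribution names as published on PyPI (taken from the list above).
PACKAGES = ["scikit-learn", "numpy", "scipy", "joblib", "threadpoolctl"]


def report_environment(packages=PACKAGES):
    """Return a dict mapping each package to its installed version, or None."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


if __name__ == "__main__":
    print("Python", ".".join(str(part) for part in sys.version_info[:3]))
    for name, version in report_environment().items():
        print(f"{name}: {version or 'not installed'}")
```

A package reported as "not installed" after running pip usually means the script is running under a different Python interpreter than the one pip installed into.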
You can verify your Scikit-learn installation with the following command:
python -m pip show scikit-learn
The output lists the package name, the installed version, the install location, and the packages that scikit-learn requires.
If you want to create plots and charts based on the data you use in scikit-learn, you may also want to consider installing matplotlib. For information about matplotlib and how to install it, refer to ‘What is Matplotlib in Python?’
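To confirm from within Python that scikit-learn (and, optionally, matplotlib) can actually be imported, a minimal sketch along these lines may help. `try_import` is a hypothetical helper name, not a scikit-learn or matplotlib function, and the fallback to "unknown" is an assumption for modules that do not expose `__version__`.

```python
import importlib


def try_import(module_name):
    """Import a module by name; return its version string, or None if missing."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    # Most scientific packages expose __version__; fall back to "unknown".
    return getattr(module, "__version__", "unknown")


# scikit-learn is imported under the name "sklearn"; matplotlib is optional.
for name in ("sklearn", "matplotlib"):
    version = try_import(name)
    print(f"{name}: {version if version else 'not installed'}")
```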
How to Import Scikit-Learn in Python
Once scikit-learn is installed, you can start working with it. Note that the package is installed as scikit-learn but imported under the name sklearn. A scikit-learn script begins by importing the library:
import sklearn
It’s not necessary to import the entire scikit-learn library. Instead, import just the modules or classes you need for your project. For example, to import the module that contains the linear regression model, enter:
from sklearn import linear_model
Or import the LinearRegression class directly:
from sklearn.linear_model import LinearRegression
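Both import styles give access to the same class. As a minimal sketch of what typically follows the import, the example below fits `LinearRegression` to a few made-up points lying on the line y = 2x + 1. The data is purely illustrative and does not come from this tutorial, and NumPy is assumed to be available because it is installed as a scikit-learn dependency.

```python
import numpy as np
from sklearn import linear_model
from sklearn.linear_model import LinearRegression

# Both import styles refer to the same class object.
assert linear_model.LinearRegression is LinearRegression

# Illustrative data on the line y = 2x + 1 (four samples, one feature).
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

# Fit an ordinary least squares model to the data.
model = LinearRegression()
model.fit(X, y)

print("slope:", model.coef_[0])
print("intercept:", model.intercept_)
print("prediction for x=4:", model.predict(np.array([[4.0]]))[0])
```

Because the points lie exactly on a line, the fitted slope and intercept should come out very close to 2 and 1, up to floating-point rounding.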
Get a version of Python, pre-compiled with Scikit-Learn and other popular ML Packages
ActiveState Python is the trusted Python distribution for Windows, Linux and Mac, pre-bundled with top Python packages for machine learning – free for development use.
Some Popular ML Packages You Get Pre-compiled – With ActiveState Python
Machine Learning:
- TensorFlow (deep learning with neural networks)*
- scikit-learn (machine learning algorithms)
- keras (high-level neural networks API)
Data Science:
- pandas (data analysis)
- NumPy (multidimensional arrays)
- SciPy (algorithms to use with numpy)
- HDF5 (store & manipulate data)
- matplotlib (data visualization)
Get ActiveState Python for Machine Learning for Windows, macOS or Linux here.
Why use ActiveState Python instead of open source Python?
While the open source distribution of Python may be satisfactory for an individual, it doesn’t always meet the support, security, or platform requirements of large organizations.
This is why organizations choose ActiveState Python for their data science, big data processing and statistical analysis needs.
Pre-bundled with the most important packages data scientists need, ActiveState Python is pre-compiled so you and your team don’t have to waste time configuring the open source distribution. You can focus on what’s important: spending more time building algorithms and predictive models against your big data sources, and less time on system configuration.
ActiveState Python is 100% compatible with the open source Python distribution, and provides the security and commercial support that your organization requires.
With ActiveState Python you can explore and manipulate data, run statistical analysis, and deliver visualizations to share insights with your business users and executives sooner, no matter where your data lives.
Download ActiveState Python to get started or contact us to learn more about using ActiveState Python in your organization.