Capturing and digitizing images was one of the first tasks tackled by computer vision researchers, with the first scanner being created ~1959. But CV requires a huge amount of data. To understand why, just imagine each image as a large matrix of dots, each of which has its own set of attributes, including color, size, and position in relation to the surrounding dots.
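To make the "matrix of dots" idea concrete, here is a minimal sketch in plain Python. It assumes 8-bit RGB pixels, and the 1920x1080 frame size is chosen purely for illustration. In practice the packages in this post store images as NumPy arrays or tensors rather than nested lists.

```python
# A tiny 2x3 RGB image: rows of pixels, each pixel an (R, G, B) tuple of 0-255 values.
tiny_image = [
    [(255, 0, 0), (0, 255, 0), (0, 0, 255)],
    [(0, 0, 0), (128, 128, 128), (255, 255, 255)],
]


def raw_size_bytes(width, height, channels=3, bytes_per_channel=1):
    """Uncompressed size of an image stored as a dense matrix."""
    return width * height * channels * bytes_per_channel


height = len(tiny_image)
width = len(tiny_image[0])
print(f"{width}x{height} pixels, {raw_size_bytes(width, height)} bytes")

# Illustrative assumption: one Full HD frame (1920x1080, 8-bit RGB).
full_hd = raw_size_bytes(1920, 1080)
print(f"Full HD frame: {full_hd:,} bytes")
```

A single uncompressed frame of that size already holds more than six million values, which is why datasets with thousands of images grow so quickly.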
Analyzing the contents of an image (i.e., all of those dots) is another intensive task. Models can be designed to recognize distinct components of an image, but they require an extensive library of pre-labeled examples. Building that library is usually called data labeling, which some of the packages in this post can help with. However, a trained model is only useful if it can also evaluate unlabeled images. That requires a separate effort to distribute and execute applications based on inferences drawn from the model.
Once you’ve digitized the image and recognized the contents, you can then apply image processing techniques to improve the quality, such as:
- Transforming – can include cropping, colorizing, converting, filtering, etc.
- Resizing – generally used to make an image larger (with or without adding information) or smaller.
- Projecting – the process of mapping a 2D (flat) image onto a 3D (curved) surface.
- Technical Enhancements – such as the process of applying a red-eye reduction filter to older photographs.
Several of the packages listed below include multiple algorithms for modifying captured images, as well as processing them as numerical matrices.
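Because an image is just a matrix, several of the operations above reduce to indexing. The following is a minimal sketch using plain nested lists, not the API of any package in this post. The helper names (crop, flip_horizontal, resize_nearest) are hypothetical, and real libraries offer faster and more sophisticated versions, such as interpolated resizing.

```python
def crop(image, top, left, height, width):
    """Return the height x width region whose top-left corner is (top, left)."""
    return [row[left:left + width] for row in image[top:top + height]]


def flip_horizontal(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def resize_nearest(image, new_height, new_width):
    """Resize by copying the nearest source pixel (no new information is added)."""
    old_height, old_width = len(image), len(image[0])
    return [
        [image[r * old_height // new_height][c * old_width // new_width]
         for c in range(new_width)]
        for r in range(new_height)
    ]


# A 4x4 grayscale image whose pixel values encode their position.
image = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]

cropped = crop(image, top=1, left=1, height=2, width=2)
mirrored = flip_horizontal(image)
smaller = resize_nearest(image, 2, 2)
larger = resize_nearest(cropped, 4, 4)

print(cropped)
print(smaller)
print(larger)
```

Note that enlarging with nearest-neighbor copying only repeats existing pixels. This is the "without adding information" case, whereas interpolation or learned upscaling estimates new pixel values.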
Now, with a little Python, almost all of those titanic tasks can be accomplished with little effort. In addition, the resulting models can run on commodity hardware. This article introduces frameworks that simplify building CV applications and executing CV models on different types of devices.
Getting Started with Python Computer Vision
Before you begin, make sure that you’ve installed the Computer Vision Python runtime environment, which contains a version of Python 3.10 and most of the packages in the post installed into a virtual environment, ready to run.
In order to download and install this ready-to-use Python project, you will need to create a free ActiveState Platform account. Just use your GitHub credentials or your email address to register. Signing up is easy and it unlocks the ActiveState Platform’s many other dependency management benefits.
Alternatively, you can use our State Tool CLI to install the Computer Vision Python runtime environment:
- For Windows users, run the following at a CMD prompt to automatically download and install the Computer Vision Python runtime and project code into a virtual environment:
powershell -Command "& $([scriptblock]::Create((New-Object Net.WebClient).DownloadString('https://platform.www.activestate.com/dl/cli/911674306.1670279101_pdli01/install.ps1')))" -c'state activate --default Pizza-Team/Computer-Vision'
- For Linux users, run the following to automatically download and install the Computer Vision Python runtime and project code into a virtual environment:
sh <(curl -q https://platform.www.activestate.com/dl/cli/911674306.1670279101_pdli01/install.sh) -c'state activate --default Pizza-Team/Computer-Vision'
Ready? Let’s go.
Python’s Top Computer Vision Packages
While Python is not the only programming language that supports CV, it is the dominant one. Because image processing is extremely compute intensive, many of the Python packages wrap libraries written in C/C++.
OpenCV
OpenCV, currently one of the most popular CV libraries available, is a C++, Python, and Java library that provides a huge number of utilities for processing images, videos, objects, backgrounds, neural networks, and, of course, matrix operations. It is compatible with Linux, Android, macOS, and Windows.
Most suitable for:
- Real-Time image processing
- Face recognition
Advantages:
- Open source
- Large community
- Several image processing, object detection, video processing, and tracking utilities
Limitations:
- Documentation can be sparse
SimpleCV
You don’t need to learn all the formalities or concepts related to computer vision to develop a professional application. SimpleCV abstracts many of these complicated (but fascinating) ideas to provide a computer vision framework that is easy to learn. SimpleCV is compatible with a wide range of input sources, including the often-undervalued Microsoft Kinect.
Most suitable for:
- Application development
Advantages:
- Simplifies image acquisition and processing tasks
- Easy to learn
- Compatible with Kinect
- Simple documentation
Limitations:
- Smaller community than OpenCV
Scikit-Image
The scikit-image library is a scientific approach to computer vision that provides an interesting set of utilities for working with images, transforming them geometrically, and adjusting their contents. This library is a great place to start for people who want to learn about the possibilities of simple algorithms. Its API is consistent with that of its well-known counterpart, scikit-learn.
Most suitable for:
- Learning and experimenting with computer vision concepts/algorithms
Advantages:
- Familiar scikit-learn API definition
- Compatible with OpenCV images
Limitations:
- No object detection utilities out of the box
- No video processing (it’s recommended that you convert video to sequences of images)
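As a taste of the kind of simple algorithm that is worth experimenting with, here is a minimal sketch of a textbook Sobel edge detector in plain Python. It is not scikit-image's implementation (the library ships its own filters, which handle borders and normalization differently). The function name sobel_magnitude and the test image are assumptions made for illustration.

```python
import math

# Standard 3x3 Sobel kernels for horizontal and vertical gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(image):
    """Gradient magnitude for interior pixels; border pixels are left at 0."""
    height, width = len(image), len(image[0])
    result = [[0.0] * width for _ in range(height)]
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            gx = gy = 0
            for dr in range(3):
                for dc in range(3):
                    pixel = image[r + dr - 1][c + dc - 1]
                    gx += SOBEL_X[dr][dc] * pixel
                    gy += SOBEL_Y[dr][dc] * pixel
            result[r][c] = math.hypot(gx, gy)
    return result


# Dark left half, bright right half: a vertical edge between columns 2 and 3.
image = [[0, 0, 0, 255, 255, 255] for _ in range(4)]
edges = sobel_magnitude(image)

for row in edges:
    print(row)
```

The response is zero in the flat regions and large in the two columns that straddle the brightness change, which is exactly what an edge detector should report.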
TensorFlow
TensorFlow, one of the most flexible machine learning frameworks on the market, has been around since 2015. It provides modeling capabilities that can run on CPUs, GPUs, and TPUs, as well as specializations that can be deployed in browsers (TensorFlow.js) or on mobile devices (TensorFlow Lite). TensorFlow Hub contains reusable public models that cover many use cases (not just computer vision).
Most suitable for:
- Deploying models on heterogeneous devices
Advantages:
- Support for several image processing algorithms
- Support for video processing
- Comes close to the “model once, deploy everywhere” ideal
- Awesome documentation
- Large community
Limitations:
- The two programming models (Graph and Eager) can be confusing to non-experts
- Some API duplications
PyTorch
Another very popular option is PyTorch, which implements several object detection, image estimation, image segmentation, and image classification algorithms. Its dynamic computation model makes it flexible. Because it is built on C++ and CUDA libraries, it is also fast and supports CPU/GPU hardware acceleration out of the box.
Most suitable for:
- Deep learning models
Advantages:
- Flexible computation model
- Large number of image processing utilities
- Native GPU acceleration
- Large community
Limitations:
- Steep learning curve
- Limited model execution portability
DeepFace
DeepFace is a niche library with a specific scope, namely face recognition and attribute analysis. It is capable of processing streaming data sources, and can be used as a library or an API. Its utilities can also be complemented with other packages to create a complete suite. DeepFace wraps face detectors from OpenCV, SSD, Dlib, MTCNN, RetinaFace, and MediaPipe.
Most suitable for:
- Face recognition and analysis
Advantages:
- Several state-of-the-art face recognition models
- Strong facial attribute analysis
- Real-Time video analysis
- HTTP API
Limitations:
- No GPU acceleration options
- Small community
- Limited scope
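Face recognition systems generally reduce each face to a numeric embedding and then decide whether two embeddings are close enough to belong to the same person. The sketch below illustrates that comparison step with cosine distance in plain Python. It does not call DeepFace itself: the four-dimensional embeddings are made-up values, and the 0.4 threshold and the same_person helper are hypothetical. Real models produce vectors with hundreds of dimensions and use thresholds tuned per model.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; 0 means the vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def same_person(embedding_a, embedding_b, threshold=0.4):
    """Hypothetical decision rule: small distance means a match."""
    return cosine_distance(embedding_a, embedding_b) <= threshold


# Made-up embeddings for illustration only.
alice_photo_1 = [0.9, 0.1, 0.3, 0.2]
alice_photo_2 = [0.8, 0.2, 0.35, 0.15]
bob_photo = [0.1, 0.9, 0.2, 0.7]

print(same_person(alice_photo_1, alice_photo_2))
print(same_person(alice_photo_1, bob_photo))
```

The design choice worth noting is that the model and the decision rule are separate: swapping in a different recognition model changes the embeddings, but the comparison logic stays the same apart from its threshold.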
YOLO
You Only Look Once (YOLO) is a specialized object detection system, image segmentation library, and Command Line Interface (CLI) utility. It provides five sizes of pre-trained models (nano, small, medium, large, and extra large), with the larger models trading speed for higher accuracy. It’s also able to process video in real time.
Most suitable for:
- Object detection
Advantages:
- Range of model sizes to match speed and accuracy needs
- State-of-the-art object detection models
- Easy to use
- Real-Time support for video
Limitations:
- Limited scope
- Small development community
- Scarce documentation
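Object detectors usually emit several overlapping candidate boxes for the same object, so a common post-processing step is to score overlap with intersection over union (IoU) and keep only the best box in each cluster (non-maximum suppression). The sketch below shows that idea in plain Python. It is not YOLO's own code, and the boxes, scores, and 0.5 threshold are made-up values for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    intersection = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union else 0.0


def non_max_suppression(detections, iou_threshold=0.5):
    """Keep the highest-scoring box in each cluster of overlapping boxes."""
    kept = []
    # Visit boxes from most to least confident.
    for box, score in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(iou(box, kept_box) < iou_threshold for kept_box, _ in kept):
            kept.append((box, score))
    return kept


# Made-up detections: (box, confidence score).
detections = [
    ((10, 10, 50, 50), 0.90),
    ((12, 12, 52, 52), 0.75),      # near-duplicate of the first box
    ((100, 100, 140, 140), 0.80),  # a separate object
]

kept = non_max_suppression(detections)
for box, score in kept:
    print(box, score)
```

Here the near-duplicate box is discarded because it overlaps heavily with a more confident detection, while the distant box survives.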
Detectron2
The Facebook AI Research (FAIR) group created Detectron2. It’s based on PyTorch and aims to provide simplified object detection utilities. It competes with YOLO (kind of), and is being used in several research projects.
Most suitable for:
- Pose prediction
Advantages:
- Specialized models for object detection
- Models can be exported to TorchScript
- Data augmentation capabilities
Limitations:
- Scarce documentation
- Small community
OpenVINO
The Open Visual Inference and Neural Network Optimization (OpenVINO) project is an optimization and deployment framework that wraps external models from other frameworks. It provides object detection, face recognition, colorization, and movement recognition utilities.
Most suitable for:
- Emulating human vision
Advantages:
- Compatible with TensorFlow, PyTorch, OpenCV, and other major machine learning frameworks
- Model security schema
- Large pre-trained model zoo from Intel
Limitations:
- Scarce documentation
- Small community
Albumentations
The most difficult task in machine learning is obtaining good data. It’s common to enrich and augment existing datasets for tasks such as classification, semantic segmentation, instance segmentation, object detection, and pose estimation. Albumentations is a library that specializes in this kind of image augmentation. It also integrates seamlessly with PyTorch and Keras.
Most suitable for:
- Image augmentation
Advantages:
- Supports keypoints augmentation
- Supports the augmentation of multiple targets
- Integrates with PyTorch and Keras
Limitations:
- Scarce documentation
- Small community
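The reason multi-target support matters is that an augmentation must transform the image and its annotations together, or the labels stop matching the pixels. The sketch below shows this for a horizontal flip in plain Python. It is not the Albumentations API: the horizontal_flip helper is hypothetical, and boxes are assumed to be (x_min, y_min, x_max, y_max) in pixel coordinates with exclusive maximum edges.

```python
def horizontal_flip(image, boxes):
    """Mirror an image and its (x_min, y_min, x_max, y_max) pixel boxes together."""
    width = len(image[0])
    flipped_image = [row[::-1] for row in image]
    # Mirroring swaps the left and right edges of every box.
    flipped_boxes = [
        (width - x_max, y_min, width - x_min, y_max)
        for x_min, y_min, x_max, y_max in boxes
    ]
    return flipped_image, flipped_boxes


# A 2x6 image with an "object" (value 1) in the two leftmost columns.
image = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
]
boxes = [(0, 0, 2, 2)]

flipped_image, flipped_boxes = horizontal_flip(image, boxes)
print(flipped_image)
print(flipped_boxes)
```

After the flip, the box still encloses the object, which has moved to the two rightmost columns. Keypoints and segmentation masks need the same kind of synchronized treatment.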
Conclusions – Computer Vision for Python
Computer vision models can be trained to perform a large number of tasks with the support of the open source libraries and frameworks discussed in this article. Many of them are suitable for deploying to commodity hardware and have capabilities that were unimaginable just a few years ago. Computer vision became accessible almost overnight, and the applications are almost endless.
Next steps:
Download the Computer Vision Python environment and try out the packages in this post for yourself.