What is YOLO?
YOLO (You Only Look Once) is a well-known family of object detection models that has significantly influenced real-time object detection and classification in computer vision. As one-stage detectors, YOLO models process the entire image in a single forward pass of a convolutional neural network (CNN), predicting bounding boxes and class probabilities together instead of first proposing regions and then classifying them.
What sets YOLO apart is this single-pass design, which targets both real-time speed and high accuracy. In this article, we will use the YOLO V5 model to detect objects in a given image. It is one of the most widely used computer vision models for object detection.
Running Object Detection Models: GPU vs CPU
Running object detection models involves heavy computation to correctly identify and locate objects in images or videos. The choice between a GPU (Graphics Processing Unit) and a CPU (Central Processing Unit) significantly affects their speed and efficiency. Each processor has its own benefits and limitations, so the right choice depends on the needs of the task.
Advantages of Using GPU for Running Object Detection Models:
- Parallel Processing Power: GPUs are designed to run thousands of threads simultaneously. This lets them perform the same computation on different parts of the image concurrently, which significantly reduces inference time.
- High Memory Bandwidth: GPUs have high memory bandwidth, so they can move data quickly between memory and processing units. This matters for object detection workloads that must read and process large amounts of data quickly, and it leads to lower latency and better real-time performance.
- Complex Model Execution: Object detection models can be computationally demanding because of their deep architectures with many layers and parameters. GPUs handle these models and their heavy computations efficiently, delivering fast results.
Drawbacks of Using GPU for Running Object Detection Models:
- Cost: GPUs tend to be more expensive than CPUs, both in terms of hardware acquisition and power consumption. The initial investment might be a significant factor for individuals or organizations with budget constraints.
- Energy Usage: Although GPUs deliver outstanding performance, they typically consume more energy than CPUs. This higher energy demand can raise operating costs, particularly when running resource-heavy tasks over long periods.
- Compatibility and Portability: GPUs require compatible hardware and software frameworks to function optimally. This might limit the portability of GPU-accelerated solutions, making them less suitable for environments with diverse hardware configurations.
- Restricted Access: In some environments, such as remote machines or certain cloud-based setups, GPUs may be limited or unavailable, so relying solely on GPU acceleration may not be practical.
CPUs are more cost-effective and versatile, but they may struggle to match GPU performance on resource-intensive tasks like object detection. This is the problem this article addresses: we will run object detection on a CPU runtime in Google Colab.
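Since the goal is fast inference on a CPU, it helps to measure latency directly rather than rely on impressions. Below is a minimal timing sketch using only the standard library; `measure_latency` is a hypothetical helper, and the warm-up and run counts are arbitrary defaults.

```python
import statistics
import time
from typing import Callable


def measure_latency(fn: Callable[[], object], warmup: int = 3, runs: int = 10) -> dict:
    """Time repeated calls to fn and summarise the latency in milliseconds.

    Warm-up calls are excluded so that one-off costs (lazy initialisation,
    caches) do not skew the numbers.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    for _ in range(warmup):
        fn()
    timings_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return {
        "median_ms": statistics.median(timings_ms),
        "min_ms": min(timings_ms),
        "max_ms": max(timings_ms),
    }


if __name__ == "__main__":
    # Stand-in workload; swap in the real inference call once the pipeline exists.
    print(measure_latency(lambda: sum(range(100_000))))
```

Once the pipeline is created later in this article, something like `measure_latency(lambda: yolo_pipeline(images=images_path))` would report its latency on the Colab CPU. The median is reported because it is less sensitive to occasional slow runs than the mean.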
What is DeepSparse?
DeepSparse is a CPU inference runtime from Neural Magic, usable as a Python library, that delivers high performance on CPUs. It takes advantage of model sparsity by skipping zeroed-out parameters, reducing the computation required during each forward pass. It aims to combine GPU-class speed with the simplicity of a software-only deployment.
DeepSparse scales vertically to hundreds of CPU cores, can be deployed with Kubernetes or serverless options, and offers straightforward APIs for model integration and production monitoring.
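To make the idea of skipping zeroed parameters concrete, here is a toy illustration in plain Python: a dot product that stores only the non-zero weights and counts the multiply-accumulate operations it performs. This is a conceptual sketch of why sparsity saves compute, not how DeepSparse implements its kernels.

```python
from typing import Dict, List, Sequence, Tuple


def to_sparse(weights: Sequence[float]) -> Dict[int, float]:
    """Keep only the non-zero weights, keyed by their position."""
    return {i: w for i, w in enumerate(weights) if w != 0.0}


def dense_dot(weights: Sequence[float], x: Sequence[float]) -> Tuple[float, int]:
    """Dot product over every weight; returns (result, multiply-accumulates)."""
    total = 0.0
    for w, xi in zip(weights, x):
        total += w * xi
    return total, len(weights)


def sparse_dot(sparse_weights: Dict[int, float], x: Sequence[float]) -> Tuple[float, int]:
    """Dot product that visits only the non-zero weights."""
    total = 0.0
    for i, w in sparse_weights.items():
        total += w * x[i]
    return total, len(sparse_weights)


if __name__ == "__main__":
    # 18 of these 20 weights are zero, similar in spirit to a heavily pruned layer.
    weights: List[float] = [0.0] * 18 + [0.5, -2.0]
    x = [1.0] * 20
    print(dense_dot(weights, x))              # (-1.5, 20)
    print(sparse_dot(to_sparse(weights), x))  # (-1.5, 2)
```

Both versions produce the same result, but the sparse one does a tenth of the work. Real sparse runtimes face extra challenges (memory layout, vectorisation) that this toy ignores.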
Implementing Object Detection in Google Colab
First, open a Colab notebook and select a CPU runtime (Runtime > Change runtime type).
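Before installing anything, you can check what the runtime provides. The number of logical CPUs matters because DeepSparse scales with the cores available. This is a small standard-library sketch; treating a missing `nvidia-smi` binary as "no NVIDIA GPU" is a heuristic assumption, not a guarantee.

```python
import os
import platform
import shutil


def describe_runtime() -> dict:
    """Summarise the CPU resources visible to this Python process."""
    return {
        # May be None if the count cannot be determined.
        "logical_cpus": os.cpu_count(),
        "processor": platform.processor() or platform.machine(),
        # Heuristic: NVIDIA GPU runtimes normally ship the nvidia-smi tool.
        "nvidia_smi_found": shutil.which("nvidia-smi") is not None,
    }


if __name__ == "__main__":
    print(describe_runtime())
```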
Install DeepSparse Library
!pip install "deepsparse[server,yolo,onnxruntime]"
Next, let’s download an image to use as input for the YOLO V5 model. You can also upload your own images.
!wget -O basilica.jpg https://raw.githubusercontent.com/neuralmagic/deepsparse/main/src/deepsparse/yolo/sample_images/basilica.jpg
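If you upload several images instead, you can gather their paths into a list to pass to the pipeline. This is a minimal sketch; `collect_image_paths`, the folder, and the set of accepted extensions are illustrative assumptions.

```python
from pathlib import Path
from typing import Iterable, List

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}


def collect_image_paths(folder: str, extensions: Iterable[str] = IMAGE_EXTENSIONS) -> List[str]:
    """Return sorted paths of image files directly inside `folder` (no recursion)."""
    allowed = {ext.lower() for ext in extensions}
    return sorted(
        str(p) for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in allowed
    )


if __name__ == "__main__":
    # "." is assumed to be the notebook's working directory, where wget saved the image.
    print(collect_image_paths("."))
```

The result can be used directly as `images_path`. Matching suffixes case-insensitively also picks up files named like `IMG_001.JPG`.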
Create DeepSparse Pipeline
from deepsparse import Pipeline
from PIL import Image, ImageDraw

images_path = ["basilica.jpg"]

# specify model stub for YOLO V5
model_stub = "zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned_quant-aggressive_96"

# create Pipeline
yolo_pipeline = Pipeline.create(
    task="yolo",
    model_path=model_stub,
)
You can find model stubs for YOLO V5 and other models in the SparseZoo at https://sparsezoo.neuralmagic.com/.
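The stub used above packs the model's provenance into slash-separated fields. The small parser below splits it into labelled parts. The field names are my own descriptive labels, not an official API, and the sketch assumes the seven-field layout of this particular stub; other stubs may follow a different format, so check the SparseZoo site for the authoritative scheme.

```python
from typing import Dict

# Descriptive names (illustrative labels) for the slash-separated stub fields.
STUB_FIELDS = ("domain", "sub_domain", "architecture", "framework", "repo", "dataset", "sparse_tag")


def parse_model_stub(stub: str) -> Dict[str, str]:
    """Split a 'zoo:' model stub with seven path components into a dict."""
    prefix = "zoo:"
    if not stub.startswith(prefix):
        raise ValueError(f"expected a stub starting with {prefix!r}: {stub!r}")
    parts = stub[len(prefix):].split("/")
    if len(parts) != len(STUB_FIELDS):
        raise ValueError(f"expected {len(STUB_FIELDS)} fields, got {len(parts)}")
    return dict(zip(STUB_FIELDS, parts))


if __name__ == "__main__":
    stub = "zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned_quant-aggressive_96"
    for field, value in parse_model_stub(stub).items():
        print(f"{field:>12}: {value}")
```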
Inference: Object Detection
In this tutorial, we use a single image as input, but the same code works for multiple images: just add more file paths to the images_path list.
# run inference on input image, receive bounding boxes + classes
pipeline_outputs = yolo_pipeline(images=images_path, iou_thres=0.6, conf_thres=0.001)
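The two thresholds control post-processing. Conventionally in YOLO-style detectors, `conf_thres` discards candidate boxes whose confidence falls below it, and `iou_thres` is the overlap (intersection over union, IoU) above which non-maximum suppression (NMS) removes a lower-scoring box in favour of a higher-scoring one. A `conf_thres` as low as 0.001 keeps almost every candidate, which suits evaluation; for visualisation you will usually want to filter by score afterwards. The sketch below is a simplified, class-agnostic version of that logic for intuition only; it is not DeepSparse's post-processing code, which may differ (for example, by applying NMS per class).

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner-format boxes."""
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_and_nms(boxes: Sequence[Box], scores: Sequence[float],
                   conf_thres: float, iou_thres: float) -> List[int]:
    """Return indices of kept boxes, highest score first (greedy, class-agnostic NMS)."""
    # 1. Confidence filter: drop low-scoring candidates.
    candidates = [i for i, s in enumerate(scores) if s >= conf_thres]
    # 2. Visit the remaining boxes from highest to lowest score.
    candidates.sort(key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in candidates:
        # Keep a box only if it does not overlap too much with any box already kept.
        if all(iou(boxes[i], boxes[k]) <= iou_thres for k in kept):
            kept.append(i)
    return kept


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30), (0, 0, 5, 5)]
    scores = [0.9, 0.8, 0.7, 0.0005]
    print(filter_and_nms(boxes, scores, conf_thres=0.001, iou_thres=0.6))  # [0, 2]
```

In the example, box 1 overlaps box 0 with an IoU of 0.81, so it is suppressed; box 3 never reaches NMS because its score is below `conf_thres`.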
Create Bounding Boxes for Detected Objects
Let’s define a Python function that draws bounding boxes on an image from their coordinates. We can retrieve the coordinates of all the objects detected in the input image from the ‘pipeline_outputs’ object.
def draw_bounding_box(img, coords):
    # Create a drawing object
    draw = ImageDraw.Draw(img)
    # Draw the bounding boxes
    for box in coords:
        x_min, y_min, x_max, y_max = box
        draw.rectangle([x_min, y_min, x_max, y_max], outline="red", width=2)
    return img
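`draw_bounding_box` expects corner coordinates `(x_min, y_min, x_max, y_max)`. YOLO models internally predict boxes as a centre point plus width and height, and some tools return boxes in that form, so a small converter can be handy. This is a generic sketch with hypothetical helper names; check which format your outputs actually use before applying it. Clipping keeps boxes that extend past the image edge inside the frame.

```python
from typing import List, Sequence


def xywh_to_xyxy(box: Sequence[float]) -> List[float]:
    """Convert (center_x, center_y, width, height) to (x_min, y_min, x_max, y_max)."""
    cx, cy, w, h = box
    return [cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2]


def clip_box(box: Sequence[float], img_width: int, img_height: int) -> List[float]:
    """Clamp a corner-format box to the image bounds."""
    x_min, y_min, x_max, y_max = box
    return [
        min(max(x_min, 0), img_width),
        min(max(y_min, 0), img_height),
        min(max(x_max, 0), img_width),
        min(max(y_max, 0), img_height),
    ]


if __name__ == "__main__":
    print(xywh_to_xyxy([50, 40, 20, 10]))                           # [40.0, 35.0, 60.0, 45.0]
    print(clip_box([-5, 10, 120, 90], img_width=100, img_height=80))  # [0, 10, 100, 80]
```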
Now we will load the same image using the PIL library and draw bounding boxes for the first five objects detected by the YOLO V5 model.
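# Load the image
input_image = Image.open('basilica.jpg')

# 'boxes' holds one list of boxes per input image; [0] selects the first (and only) image
# plot image with bounding boxes for the first five detections
draw_bounding_box(input_image, pipeline_outputs.dict()['boxes'][0][:5])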
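Slicing the first five boxes takes whichever detections happen to come first in the output. A more explicit option is to sort by score and keep only confident detections. The sketch below works on plain per-image lists; it assumes the pipeline output exposes parallel `boxes`, `scores`, and `labels` lists for each image, and `top_detections` and the 0.25 cut-off are illustrative choices, not part of the DeepSparse API.

```python
from typing import List, Sequence, Tuple

Detection = Tuple[List[float], float, str]  # (box, score, label)


def top_detections(boxes: Sequence[Sequence[float]], scores: Sequence[float],
                   labels: Sequence[str], k: int = 5,
                   min_score: float = 0.25) -> List[Detection]:
    """Return up to k detections with score >= min_score, highest score first."""
    if not (len(boxes) == len(scores) == len(labels)):
        raise ValueError("boxes, scores and labels must have the same length")
    ranked = sorted(zip(boxes, scores, labels), key=lambda d: d[1], reverse=True)
    return [(list(box), score, lab) for box, score, lab in ranked if score >= min_score][:k]


if __name__ == "__main__":
    # Assumed notebook usage (one inner list per input image):
    # out = pipeline_outputs.dict()
    # dets = top_detections(out["boxes"][0], out["scores"][0], out["labels"][0])
    # draw_bounding_box(Image.open("basilica.jpg"), [box for box, _, _ in dets])
    demo = top_detections(
        boxes=[[0, 0, 1, 1], [2, 2, 3, 3], [4, 4, 5, 5]],
        scores=[0.3, 0.9, 0.1],
        labels=["0.0", "2.0", "0.0"],
        k=2,
    )
    print(demo)  # [([2, 2, 3, 3], 0.9, '2.0'), ([0, 0, 1, 1], 0.3, '0.0')]
```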
You can print the labels of the detected objects with the code below:
# print list of labels (one inner list per input image)
print(pipeline_outputs.dict()['labels'])

# get label of the 5th object detected by YOLO V5 in the first image
print(pipeline_outputs.dict()['labels'][0][4])
The labels are COCO class indices. The label-to-class mapping is given below:
0: person
1: bicycle
2: car
3: motorcycle
4: airplane
5: bus
6: train
7: truck
8: boat
9: traffic light
10: fire hydrant
11: stop sign
12: parking meter
13: bench
14: bird
15: cat
16: dog
17: horse
18: sheep
19: cow
20: elephant
21: bear
22: zebra
23: giraffe
24: backpack
25: umbrella
26: handbag
27: tie
28: suitcase
29: frisbee
30: skis
31: snowboard
32: sports ball
33: kite
34: baseball bat
35: baseball glove
36: skateboard
37: surfboard
38: tennis racket
39: bottle
40: wine glass
41: cup
42: fork
43: knife
44: spoon
45: bowl
46: banana
47: apple
48: sandwich
49: orange
50: broccoli
51: carrot
52: hot dog
53: pizza
54: donut
55: cake
56: chair
57: couch
58: potted plant
59: bed
60: dining table
61: toilet
62: tv
63: laptop
64: mouse
65: remote
66: keyboard
67: cell phone
68: microwave
69: oven
70: toaster
71: sink
72: refrigerator
73: book
74: clock
75: vase
76: scissors
77: teddy bear
78: hair drier
79: toothbrush
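Rather than looking classes up by hand, you can encode the mapping above as a list and convert labels in code. This sketch assumes each label is a class index that may come back as a number or as a numeric string (for example `"2"` or `"2.0"`); `coco_name` is a hypothetical helper name.

```python
from typing import Union

# COCO class names in index order (0-79), matching the mapping above.
COCO_CLASSES = [
    "person", "bicycle", "car", "motorcycle", "airplane", "bus", "train", "truck",
    "boat", "traffic light", "fire hydrant", "stop sign", "parking meter", "bench",
    "bird", "cat", "dog", "horse", "sheep", "cow", "elephant", "bear", "zebra",
    "giraffe", "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee",
    "skis", "snowboard", "sports ball", "kite", "baseball bat", "baseball glove",
    "skateboard", "surfboard", "tennis racket", "bottle", "wine glass", "cup",
    "fork", "knife", "spoon", "bowl", "banana", "apple", "sandwich", "orange",
    "broccoli", "carrot", "hot dog", "pizza", "donut", "cake", "chair", "couch",
    "potted plant", "bed", "dining table", "toilet", "tv", "laptop", "mouse",
    "remote", "keyboard", "cell phone", "microwave", "oven", "toaster", "sink",
    "refrigerator", "book", "clock", "vase", "scissors", "teddy bear",
    "hair drier", "toothbrush",
]


def coco_name(label: Union[int, float, str]) -> str:
    """Map a class index such as 2, 2.0, "2" or "2.0" to its COCO class name."""
    index = int(float(label))
    if not 0 <= index < len(COCO_CLASSES):
        raise ValueError(f"label {label!r} is outside the range 0-{len(COCO_CLASSES) - 1}")
    return COCO_CLASSES[index]


if __name__ == "__main__":
    print([coco_name(lbl) for lbl in ["0.0", "2", 16.0]])  # ['person', 'car', 'dog']
```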
The same technique also works on video: split the video into frames, then run YOLO V5 on each frame to detect the objects of interest.
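Here is a minimal sketch of that frame-by-frame loop. It assumes OpenCV (`opencv-python`) is installed for decoding and that the pipeline accepts NumPy image arrays in `images=`; `iter_frames`, `detect_on_frames`, the `every_n` sampling option, and the file name `video.mp4` are illustrative assumptions.

```python
from typing import Any, Callable, Iterable, Iterator, List


def iter_frames(video_path: str, every_n: int = 1) -> Iterator[Any]:
    """Yield every n-th frame of a video as a NumPy array (BGR order, as decoded by OpenCV)."""
    import cv2  # imported here so the rest of the module works without OpenCV

    if every_n < 1:
        raise ValueError("every_n must be at least 1")
    capture = cv2.VideoCapture(video_path)
    if not capture.isOpened():
        raise IOError(f"could not open video: {video_path}")
    index = 0
    try:
        while True:
            ok, frame = capture.read()
            if not ok:  # end of stream or read error
                break
            if index % every_n == 0:
                yield frame
            index += 1
    finally:
        capture.release()


def detect_on_frames(pipeline: Callable[..., Any], frames: Iterable[Any], **kwargs: Any) -> List[Any]:
    """Run the detection pipeline on each frame separately and collect the outputs.

    Extra keyword arguments (e.g. iou_thres, conf_thres) are forwarded to the pipeline.
    """
    return [pipeline(images=[frame], **kwargs) for frame in frames]
```

In the notebook, something like `outputs = detect_on_frames(yolo_pipeline, iter_frames("video.mp4", every_n=5), iou_thres=0.6, conf_thres=0.25)` would process every fifth frame. OpenCV decodes frames in BGR order, so if your pipeline expects RGB arrays, convert each frame with `cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)` first.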