What is YOLO?
YOLO (You Only Look Once) is a well-known family of object detection models that has strongly influenced real-time object detection and classification in computer vision. YOLO models are one-stage detectors: they process the entire image in a single forward pass of a convolutional neural network (CNN).
This single-pass design is what sets YOLO apart, making it suitable for real-time detection while retaining high precision. In this article, we will use the YOLO V5 model to detect objects in a given image. It is one of the most widely used computer vision models for object detection.
Running Object Detection Models: GPU vs CPU
Running object detection models involves heavy computation to correctly identify and locate objects in images or videos. The choice between a GPU (Graphics Processing Unit) and a CPU (Central Processing Unit) significantly affects speed and efficiency. Each processor has its own benefits and limitations, so the right choice depends on the needs of the task.
Advantages of Using GPU for Running Object Detection Models:
- Parallel Processing Power: GPUs are designed to handle thousands of threads simultaneously, enabling parallel processing. This helps in performing similar computations on different parts of the image concurrently. This parallelism significantly accelerates the model’s inference time, leading to faster results.
- Significant Memory Bandwidth: GPUs have extensive memory bandwidth, providing them with the ability to transfer data quickly between memory and processing units. This attribute is essential for object detection activities that demand fast data collection and processing, leading to lowered latency and enhanced real-time performance.
- Complex Model Execution: Object detection models can be computationally demanding due to their intricate architectures, involving multiple layers and parameters. GPUs are exceptional in managing these intricate models and their intense computations, providing efficient and prompt outcomes.
Drawbacks of Using GPU for Running Object Detection Models:
- Cost: GPUs tend to be more expensive than CPUs, both in terms of hardware acquisition and power consumption. The initial investment might be a significant factor for individuals or organizations with budget constraints.
- Energy Usage: Despite GPUs delivering outstanding performance, they use more energy than CPUs. This heightened energy demand can lead to a rise in operational expenses, particularly when executing resource-heavy tasks over long durations.
- Compatibility and Portability: GPUs require compatible hardware and software frameworks to function optimally. This might limit the portability of GPU-accelerated solutions, making them less suitable for environments with diverse hardware configurations.
- Restricted Access: GPUs may be limited or unavailable in some environments, for instance in certain remote or cloud-based configurations, so depending solely on GPU acceleration may not be practical.
CPUs are more cost-effective and versatile but might struggle to provide the same level of performance for resource-intensive tasks like object detection. This is the problem we are trying to address in this article. We will perform object detection on a Google Colab CPU.
What is DeepSparse?
DeepSparse is a Python library by Neural Magic that delivers strong inference performance on CPUs. It takes advantage of model sparsity, skipping zeroed parameters to reduce the compute needed during forward passes, and aims to combine GPU-like speed with simple software deployment.
DeepSparse supports vertical scaling up to hundreds of cores, can be deployed with Kubernetes or serverless options, and offers straightforward APIs for model integration and production monitoring.
https://github.com/neuralmagic/deepsparse
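To make the sparsity idea described above more concrete, here is a toy sketch in plain Python. It is not how DeepSparse is implemented; the function names `dense_dot` and `sparse_dot` are made up for illustration. It only shows why skipping zeroed weights reduces the number of multiply-accumulate operations while giving the same result.

```python
def dense_dot(weights, inputs):
    """Multiply-accumulate over every weight, zero or not."""
    total = 0.0
    ops = 0
    for w, x in zip(weights, inputs):
        total += w * x
        ops += 1
    return total, ops


def sparse_dot(weights, inputs):
    """Same result, but only the nonzero weights are visited."""
    # Record (index, value) pairs for nonzero weights; a pruned model
    # could prepare this once ahead of time instead of on every call.
    nonzero = [(i, w) for i, w in enumerate(weights) if w != 0.0]
    total = 0.0
    ops = 0
    for i, w in nonzero:
        total += w * inputs[i]
        ops += 1
    return total, ops


# Hypothetical pruned weight vector: 5 of the 8 weights are zero.
weights = [0.0, 0.5, 0.0, 0.0, -1.0, 0.0, 0.0, 2.0]
inputs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

dense_result, dense_ops = dense_dot(weights, inputs)
sparse_result, sparse_ops = sparse_dot(weights, inputs)
print(dense_result, dense_ops)    # 12.0 8
print(sparse_result, sparse_ops)  # 12.0 3
```

Both functions return the same value, but the sparse version performs 3 multiplications instead of 8. A real sparse inference engine applies this kind of saving across whole layers with far more sophisticated scheduling.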
Implementing Object Detection in Google Colab
First, open a Colab notebook and start a CPU runtime.
Install DeepSparse Library
!pip install "deepsparse[server,yolo,onnxruntime]"
Import Image
Next, let’s download an image to use as input for the YOLO V5 model. You can also upload your own images.
!wget -O basilica.jpg https://raw.githubusercontent.com/neuralmagic/deepsparse/main/src/deepsparse/yolo/sample_images/basilica.jpg
Create DeepSparse Pipeline
from deepsparse import Pipeline
from PIL import Image, ImageDraw

images_path = ["basilica.jpg"]

# specify the model stub for YOLO V5
model_stub = "zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned_quant-aggressive_96"

# create the Pipeline
yolo_pipeline = Pipeline.create(
    task="yolo",
    model_path=model_stub,
)
You can find model stubs for YOLO V5 in the SparseZoo at https://sparsezoo.neuralmagic.com/.
Inference: Object Detection
In this tutorial, we are using a single image as input, but you can easily use the same code for multiple images as well.
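For the multi-image case, one way to build the list of paths is sketched below, assuming the images sit together in a single folder. The helper name `collect_image_paths` and the extension list are illustrative choices, not part of DeepSparse.

```python
from pathlib import Path


def collect_image_paths(folder, extensions=(".jpg", ".jpeg", ".png")):
    """Return a sorted list of image file paths (as strings) found in folder."""
    return sorted(
        str(path)
        for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() in extensions
    )


# List the images in the current working directory.
paths = collect_image_paths(".")
print(paths)
```

The resulting list can be used in place of the single-element images_path list defined earlier.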
# run inference on input image, receive bounding boxes + classes
pipeline_outputs = yolo_pipeline(images=images_path, iou_thres=0.6, conf_thres=0.001)
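In YOLO-style pipelines, conf_thres conventionally sets the minimum confidence a detection needs to be kept, and iou_thres sets how much two boxes may overlap before the lower-scoring one is discarded by non-maximum suppression (NMS). The sketch below illustrates both ideas in plain Python. It is a simplified, class-agnostic version written for this article, not the code DeepSparse runs internally, and the sample boxes and scores are invented.

```python
def iou(box_a, box_b):
    """Intersection over union of two [x_min, y_min, x_max, y_max] boxes."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_thres=0.6, conf_thres=0.001):
    """Return indices kept after confidence filtering and greedy NMS."""
    # Drop low-confidence detections, then visit the rest best-first.
    candidates = [i for i, s in enumerate(scores) if s >= conf_thres]
    candidates.sort(key=lambda i: scores[i], reverse=True)
    kept = []
    for i in candidates:
        # Keep a box only if it does not overlap a kept box too much.
        if all(iou(boxes[i], boxes[k]) <= iou_thres for k in kept):
            kept.append(i)
    return kept


# Invented example: boxes 0 and 1 overlap heavily, box 3 has a tiny score.
boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30], [0, 0, 10, 10]]
scores = [0.9, 0.8, 0.7, 0.0005]
print(nms(boxes, scores))  # [0, 2]
```

Box 1 is suppressed because its IoU with box 0 is about 0.68, above the 0.6 threshold, and box 3 is dropped because its score is below conf_thres. Raising iou_thres keeps more overlapping boxes; raising conf_thres keeps fewer low-confidence ones.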
Create Bounding Boxes for Detected Objects
Let’s define a Python function that draws a bounding box around each detected object using its coordinates. The coordinates of all the objects detected in the input image can be retrieved from the ‘pipeline_outputs’ object.
def draw_bounding_box(img, coords):
    # Create a drawing object
    draw = ImageDraw.Draw(img)
    # Draw the bounding boxes
    for box in coords:
        x_min, y_min, x_max, y_max = box
        draw.rectangle([x_min, y_min, x_max, y_max], outline="red", width=2)
    return img
Now we will load the same image using the PIL library and draw bounding boxes for the first five objects detected by the YOLO V5 model.
# Load the image
input_image = Image.open('basilica.jpg')
# plot image with bounding boxes
draw_bounding_box(input_image, pipeline_outputs.dict()['boxes'][0][:5])
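Because the pipeline was called with a very low conf_thres, the output may include many low-confidence detections. As an alternative to taking the first five boxes, the sketch below keeps only detections above a chosen score. It assumes the output dictionary also exposes a 'scores' entry alongside 'boxes' and 'labels'; the helper name `filter_detections`, the 0.5 cutoff, and the sample values are illustrative.

```python
def filter_detections(boxes, scores, labels, min_score=0.5):
    """Keep only detections whose confidence score is at least min_score."""
    kept = [
        (box, score, label)
        for box, score, label in zip(boxes, scores, labels)
        if score >= min_score
    ]
    if not kept:
        return [], [], []
    kept_boxes, kept_scores, kept_labels = zip(*kept)
    return list(kept_boxes), list(kept_scores), list(kept_labels)


# Stand-in values shaped like one image's entries in the pipeline output.
# With real output (assuming a 'scores' key exists), you would pass
# result['boxes'][0], result['scores'][0], and result['labels'][0],
# where result = pipeline_outputs.dict().
boxes = [
    [10.0, 20.0, 110.0, 220.0],
    [5.0, 5.0, 50.0, 50.0],
    [300.0, 40.0, 380.0, 200.0],
]
scores = [0.91, 0.12, 0.64]
labels = ["0.0", "3.0", "0.0"]

good_boxes, good_scores, good_labels = filter_detections(boxes, scores, labels)
print(good_scores)  # [0.91, 0.64]
```

The filtered boxes can then be passed to draw_bounding_box in place of the first-five slice.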
You can find the labels of the detected objects using the code below:
# print list of labels
print(pipeline_outputs.dict()['labels'][0])
# get label of the 5th object detected by YOLO V5
print(pipeline_outputs.dict()['labels'][0][4])
The label-to-class mapping is given below:
0: person
1: bicycle
2: car
3: motorcycle
4: airplane
5: bus
6: train
7: truck
8: boat
9: traffic light
10: fire hydrant
11: stop sign
12: parking meter
13: bench
14: bird
15: cat
16: dog
17: horse
18: sheep
19: cow
20: elephant
21: bear
22: zebra
23: giraffe
24: backpack
25: umbrella
26: handbag
27: tie
28: suitcase
29: frisbee
30: skis
31: snowboard
32: sports ball
33: kite
34: baseball bat
35: baseball glove
36: skateboard
37: surfboard
38: tennis racket
39: bottle
40: wine glass
41: cup
42: fork
43: knife
44: spoon
45: bowl
46: banana
47: apple
48: sandwich
49: orange
50: broccoli
51: carrot
52: hot dog
53: pizza
54: donut
55: cake
56: chair
57: couch
58: potted plant
59: bed
60: dining table
61: toilet
62: tv
63: laptop
64: mouse
65: remote
66: keyboard
67: cell phone
68: microwave
69: oven
70: toaster
71: sink
72: refrigerator
73: book
74: clock
75: vase
76: scissors
77: teddy bear
78: hair drier
79: toothbrush
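The mapping above can be turned into a small lookup helper, sketched below. Since the exact type of the labels returned by the pipeline may vary (for example integers or strings such as "0.0"), the hypothetical `label_to_name` function accepts any of these forms; this is an assumption made for convenience, not documented DeepSparse behavior.

```python
# Class names in index order, copied from the mapping above.
COCO_CLASSES = [
    "person", "bicycle", "car", "motorcycle", "airplane", "bus", "train", "truck",
    "boat", "traffic light", "fire hydrant", "stop sign", "parking meter", "bench", "bird", "cat",
    "dog", "horse", "sheep", "cow", "elephant", "bear", "zebra", "giraffe",
    "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee", "skis", "snowboard",
    "sports ball", "kite", "baseball bat", "baseball glove", "skateboard", "surfboard", "tennis racket", "bottle",
    "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana", "apple",
    "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake",
    "chair", "couch", "potted plant", "bed", "dining table", "toilet", "tv", "laptop",
    "mouse", "remote", "keyboard", "cell phone", "microwave", "oven", "toaster", "sink",
    "refrigerator", "book", "clock", "vase", "scissors", "teddy bear", "hair drier", "toothbrush",
]


def label_to_name(label):
    """Convert a label such as 0, "0", or "0.0" to its class name."""
    index = int(float(label))
    return COCO_CLASSES[index]


print(label_to_name("0.0"))                              # person
print([label_to_name(l) for l in ["2.0", 7, "16"]])      # ['car', 'truck', 'dog']
```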
We can apply the same technique to a video as well: first split the video into frames, then run YOLO V5 on each frame to detect the objects of interest.
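The per-frame loop could look roughly like the sketch below. It deliberately leaves out how the frames are read (that depends on the video library you choose) and takes the detector as a plain function, so the names `detect_on_frames`, `detect_fn`, and `every_n` are illustrative. The stand-in frames and detector exist only so the sketch runs on its own.

```python
def detect_on_frames(frames, detect_fn, every_n=1):
    """Run detect_fn on every n-th frame; return (frame_index, result) pairs."""
    if every_n < 1:
        raise ValueError("every_n must be at least 1")
    results = []
    for index, frame in enumerate(frames):
        # Skipping frames trades temporal resolution for speed.
        if index % every_n == 0:
            results.append((index, detect_fn(frame)))
    return results


def fake_detector(frame):
    """Stand-in for a real detector call on a single frame."""
    return {"source": frame, "boxes": []}


# Stand-in frames; in practice these would be images decoded from the video.
fake_frames = ["frame-%d" % i for i in range(7)]

results = detect_on_frames(fake_frames, fake_detector, every_n=3)
print([index for index, _ in results])  # [0, 3, 6]
```

In a real setup, detect_fn would wrap the pipeline call for one frame, and every_n can be raised when processing every frame is too slow on the available CPU.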