Fast Object Detection using YOLO V5 on CPU

What is YOLO?

YOLO, short for You Only Look Once, is a well-known family of object detection models that has strongly influenced real-time object detection and classification in computer vision. YOLO models are one-stage detectors: they process the entire image in a single forward pass of a convolutional neural network (CNN), predicting bounding boxes and class labels together.

What sets YOLO apart is this single-pass design, which targets both real-time speed and high accuracy. In this article, we will use the YOLO V5 model to detect objects in a given image. It is one of the most widely used computer vision models for object detection.

Running Object Detection Models: GPU vs CPU

Running an object detection model involves a large number of computations to identify and localize objects in images or videos. The choice between a GPU (Graphics Processing Unit) and a CPU (Central Processing Unit) significantly affects the model's speed and efficiency. Each processor has its own benefits and limitations, so the right choice depends on the needs of the task.

Advantages of Using GPU for Running Object Detection Models:

  1. Parallel Processing Power: GPUs are designed to handle thousands of threads simultaneously, enabling parallel processing. This helps in performing similar computations on different parts of the image concurrently. This parallelism significantly accelerates the model’s inference time, leading to faster results.
  2. High Memory Bandwidth: GPUs have high memory bandwidth, which lets them move data quickly between memory and processing units. This matters for object detection workloads that need to load and process data quickly, and it leads to lower latency and better real-time performance.
  3. Complex Model Execution: Object detection models can be computationally demanding because of their deep architectures with many layers and parameters. GPUs handle these large models and their heavy computations efficiently.

Drawbacks of Using GPU for Running Object Detection Models:

  1. Cost: GPUs tend to be more expensive than CPUs, both in terms of hardware acquisition and power consumption. The initial investment might be a significant factor for individuals or organizations with budget constraints.
  2. Energy Usage: Despite GPUs delivering outstanding performance, they use more energy than CPUs. This heightened energy demand can lead to a rise in operational expenses, particularly when executing resource-heavy tasks over long durations.
  3. Compatibility and Portability: GPUs require compatible hardware and software frameworks to function optimally. This might limit the portability of GPU-accelerated solutions, making them less suitable for environments with diverse hardware configurations.
  4. Restricted Access: GPUs may be limited or unavailable in some environments, for instance in certain remote or cloud-based setups, so depending solely on GPU acceleration may not be practical.

CPUs are more cost-effective and versatile but might struggle to provide the same level of performance for resource-intensive tasks like object detection. This is the problem we are trying to address in this article. We will perform object detection on a Google Colab CPU.

What is DeepSparse?

DeepSparse is a Python library by Neural Magic that delivers strong inference performance on CPUs. It takes advantage of model sparsity by skipping zeroed parameters, which reduces the compute needed during forward passes. It aims to combine GPU-like speed with the simplicity of a software-only deployment.
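
To make the idea of skipping zeroed parameters concrete, here is a minimal pure-Python sketch of a sparse dot product. It only illustrates the principle and is not how DeepSparse is implemented internally; the helper names `to_sparse` and `sparse_dot` and the example values are made up for this illustration.

```python
def to_sparse(weights):
    """Keep only the non-zero weights as (index, value) pairs."""
    return [(i, w) for i, w in enumerate(weights) if w != 0.0]


def sparse_dot(sparse_weights, inputs):
    """Dot product that touches only the non-zero weights."""
    return sum(w * inputs[i] for i, w in sparse_weights)


# A pruned weight vector: most entries were set to zero (illustrative values).
weights = [0.0, 0.5, 0.0, 0.0, -2.0, 0.0, 0.0, 1.0]
inputs = [3.0, 4.0, 5.0, 6.0, 1.0, 8.0, 9.0, 2.0]

sparse_weights = to_sparse(weights)
dense_result = sum(w * x for w, x in zip(weights, inputs))
sparse_result = sparse_dot(sparse_weights, inputs)

print(len(weights), "multiplications in the dense version")
print(len(sparse_weights), "multiplications in the sparse version")
print("same result:", dense_result == sparse_result)
```

The sparse version produces the same value while performing fewer multiplications, which is the kind of saving a sparsity-aware runtime can exploit at a much larger scale.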

DeepSparse scales vertically to hundreds of cores, can be deployed with Kubernetes or serverless options, and offers straightforward APIs for model integration and production monitoring.

https://github.com/neuralmagic/deepsparse

Implementing Object Detection in Google Colab

First, open a Colab notebook and select a CPU runtime.

Install DeepSparse Library

!pip install "deepsparse[server,yolo,onnxruntime]"

Import Image

Next, let’s download an image to use as input for the YOLO V5 model. You can also upload your own images.

!wget -O basilica.jpg https://raw.githubusercontent.com/neuralmagic/deepsparse/main/src/deepsparse/yolo/sample_images/basilica.jpg

Create DeepSparse Pipeline

from deepsparse import Pipeline
from PIL import Image, ImageDraw

images_path = ["basilica.jpg"]

# specify model stub for YOLO V5
model_stub = "zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned_quant-aggressive_96"

# create Pipeline
yolo_pipeline = Pipeline.create(
    task="yolo",
    model_path=model_stub,
)

You can find the model stubs for YOLO V5 in the SparseZoo at https://sparsezoo.neuralmagic.com/.

Inference: Object Detection

In this tutorial, we are using a single image as input, but you can easily use the same code for multiple images as well.

# run inference on input image, receive bounding boxes + classes
pipeline_outputs = yolo_pipeline(images=images_path, iou_thres=0.6, conf_thres=0.001)
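
The two thresholds are easier to reason about with a small example. In YOLO-style post-processing, `conf_thres` typically drops detections whose confidence is below the threshold, and `iou_thres` is used during non-maximum suppression (NMS) to decide when two overlapping boxes count as the same object. The sketch below is a simplified, class-agnostic greedy NMS written for illustration only; it is not DeepSparse's own post-processing code, and the helper names `iou` and `simple_nms` and the example boxes are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two [x_min, y_min, x_max, y_max] boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def simple_nms(boxes, scores, iou_thres=0.6, conf_thres=0.001):
    """Return indices of the boxes kept by a greedy, class-agnostic NMS."""
    # 1. Discard low-confidence boxes, then visit the rest from best to worst.
    order = sorted(
        (i for i, s in enumerate(scores) if s >= conf_thres),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep = []
    for i in order:
        # 2. Keep a box only if it does not overlap a kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_thres for j in keep):
            keep.append(i)
    return keep


# Made-up boxes: the first two overlap heavily, the third is far away.
boxes = [[0, 0, 10, 10], [1, 0, 11, 10], [20, 20, 30, 30]]
scores = [0.9, 0.8, 0.7]

print(round(iou(boxes[0], boxes[1]), 3))
print(simple_nms(boxes, scores))
```

A lower `iou_thres` merges overlapping boxes more aggressively, while a higher `conf_thres` removes weak detections before NMS runs. A value as low as 0.001 keeps almost everything, so the output may contain many low-confidence boxes.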

Create Bounding Boxes for Detected Objects

Let’s define a Python function that draws a bounding box for each detected object from its coordinates. The coordinates of all the objects detected in the input image are available in the ‘pipeline_outputs’ object.

def draw_bounding_box(img, coords):
  # Create a drawing object
  draw = ImageDraw.Draw(img)

  # Draw the bounding boxes
  for box in coords:
    x_min, y_min, x_max, y_max = box
    draw.rectangle([x_min, y_min, x_max, y_max], outline="red", width=2)

  return img

Now we will load the same image using the PIL library and draw bounding boxes for the first five objects detected by the YOLO V5 model.

# Load the image
input_image = Image.open('basilica.jpg')

# plot image with bounding boxes
draw_bounding_box(input_image, pipeline_outputs.dict()['boxes'][0][:5])

[Image: YOLO V5 on CPU COLAB]

You can find the labels of the detected objects using the code below:

# print list of labels
print(pipeline_outputs.dict()['labels'][0])

# get label of the 5th object detected by YOLO V5
print(pipeline_outputs.dict()['labels'][0][4])
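
Because the confidence threshold used at inference time is very low, it can help to keep only the strongest detections before drawing or printing them. The sketch below assumes the pipeline output also exposes a `scores` list parallel to `boxes` and `labels`; the helper name `top_detections` is hypothetical, and the values are made-up stand-ins shaped like one image's worth of output.

```python
def top_detections(boxes, scores, labels, min_score=0.5, k=5):
    """Return up to k (box, score, label) tuples with score >= min_score, best first."""
    detections = [d for d in zip(boxes, scores, labels) if d[1] >= min_score]
    detections.sort(key=lambda d: d[1], reverse=True)
    return detections[:k]


# Stand-in values for illustration only (not real model output).
boxes = [[10.0, 20.0, 110.0, 220.0], [5.0, 5.0, 50.0, 60.0], [200.0, 100.0, 260.0, 180.0]]
scores = [0.91, 0.03, 0.64]
labels = ["0.0", "24.0", "2.0"]

for box, score, label in top_detections(boxes, scores, labels):
    print(label, round(score, 2), box)
```

With real results, the same helper could be called on `pipeline_outputs.dict()['boxes'][0]`, `pipeline_outputs.dict()['scores'][0]`, and `pipeline_outputs.dict()['labels'][0]`, provided the `scores` field is present in your DeepSparse version.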

The label-to-class mapping for the COCO dataset, which this model was trained on, is given below:

  0: person
  1: bicycle
  2: car
  3: motorcycle
  4: airplane
  5: bus
  6: train
  7: truck
  8: boat
  9: traffic light
  10: fire hydrant
  11: stop sign
  12: parking meter
  13: bench
  14: bird
  15: cat
  16: dog
  17: horse
  18: sheep
  19: cow
  20: elephant
  21: bear
  22: zebra
  23: giraffe
  24: backpack
  25: umbrella
  26: handbag
  27: tie
  28: suitcase
  29: frisbee
  30: skis
  31: snowboard
  32: sports ball
  33: kite
  34: baseball bat
  35: baseball glove
  36: skateboard
  37: surfboard
  38: tennis racket
  39: bottle
  40: wine glass
  41: cup
  42: fork
  43: knife
  44: spoon
  45: bowl
  46: banana
  47: apple
  48: sandwich
  49: orange
  50: broccoli
  51: carrot
  52: hot dog
  53: pizza
  54: donut
  55: cake
  56: chair
  57: couch
  58: potted plant
  59: bed
  60: dining table
  61: toilet
  62: tv
  63: laptop
  64: mouse
  65: remote
  66: keyboard
  67: cell phone
  68: microwave
  69: oven
  70: toaster
  71: sink
  72: refrigerator
  73: book
  74: clock
  75: vase
  76: scissors
  77: teddy bear
  78: hair drier
  79: toothbrush
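
A small lookup helper makes this mapping usable in code. The sketch below assumes labels may arrive as integers, floats, or strings such as "2" or "2.0" (the exact type can depend on the DeepSparse version), so it normalizes them before indexing. `COCO_CLASSES` and `label_to_name` are hypothetical names introduced for this example.

```python
# The 80 COCO class names, in the same order as the mapping above.
COCO_CLASSES = [
    "person", "bicycle", "car", "motorcycle", "airplane", "bus", "train", "truck",
    "boat", "traffic light", "fire hydrant", "stop sign", "parking meter", "bench",
    "bird", "cat", "dog", "horse", "sheep", "cow", "elephant", "bear", "zebra",
    "giraffe", "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee",
    "skis", "snowboard", "sports ball", "kite", "baseball bat", "baseball glove",
    "skateboard", "surfboard", "tennis racket", "bottle", "wine glass", "cup",
    "fork", "knife", "spoon", "bowl", "banana", "apple", "sandwich", "orange",
    "broccoli", "carrot", "hot dog", "pizza", "donut", "cake", "chair", "couch",
    "potted plant", "bed", "dining table", "toilet", "tv", "laptop", "mouse",
    "remote", "keyboard", "cell phone", "microwave", "oven", "toaster", "sink",
    "refrigerator", "book", "clock", "vase", "scissors", "teddy bear",
    "hair drier", "toothbrush",
]


def label_to_name(label):
    """Convert a label such as 2, 2.0, "2" or "2.0" to its COCO class name."""
    index = int(float(label))
    return COCO_CLASSES[index]


print(label_to_name("2.0"))
print(label_to_name(0))
```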

We can apply the same technique to a video as well: first split the video into frames, then apply YOLO V5 to each frame to detect the objects of interest.
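
The frame-by-frame idea can be sketched as follows, assuming OpenCV (`cv2`) is installed, as it usually is on Colab. The function names `sampling_step` and `detect_on_video` and the `detect` callback are hypothetical; `detect` stands for whatever you use to run the model on a single frame. To keep CPU cost down, the sketch processes only one frame per `every_seconds` seconds rather than every frame.

```python
def sampling_step(fps, every_seconds=1.0):
    """Number of frames to advance between two processed frames (at least 1)."""
    return max(1, int(round(fps * every_seconds)))


def detect_on_video(video_path, detect, every_seconds=1.0):
    """Run detect(frame) on sampled frames and return {frame_index: result}."""
    import cv2  # imported here so sampling_step works without OpenCV installed

    capture = cv2.VideoCapture(video_path)
    fps = capture.get(cv2.CAP_PROP_FPS) or 30.0  # fall back if FPS is unknown
    step = sampling_step(fps, every_seconds)

    results = {}
    index = 0
    while True:
        ok, frame = capture.read()  # frame is a BGR NumPy array
        if not ok:
            break  # end of video or read error
        if index % step == 0:
            results[index] = detect(frame)
        index += 1

    capture.release()
    return results


print(sampling_step(fps=30.0, every_seconds=1.0))
```

With DeepSparse, `detect` could be a small wrapper that passes the frame to `yolo_pipeline`, if the pipeline accepts NumPy arrays in your version. Otherwise, each sampled frame can be written to disk with `cv2.imwrite` and its file path passed in, as was done for the single image in this article.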
