Vehicle Detection using YOLO v4 in Python


For complex Machine Learning applications, Neural Networks are the favourite go-to algorithm, as they often show higher accuracy than traditional ML algorithms. The term is often used interchangeably with Deep Learning, because these networks are huge, with many layers. Applications like object recognition, language translation and sound recognition use Neural Networks.
I wanted a hands-on trial with some of these Deep Learning algorithms that use Neural Networks as their base. For this I chose the YOLO v4 algorithm, which is used for object detection in images. YOLO stands for You Only Look Once. I have used already trained model weights for this post, as I just wanted to get kick-started on Deep Learning 😅. So without wasting another moment, let's get started!

Required Files and Packages

For this exercise, I chose Google Colab. Object detection in videos is something I wanted to try, and I had doubts whether my laptop could handle it 🤣. Hence Google Colab. Some short videos featuring vehicles are also needed, which I downloaded from this site:
https://www.pexels.com
The next thing needed is the already trained weights of YOLOv4. I downloaded them from the GitHub page below:
https://github.com/kiyoshiiriemon/yolov4_darknet
If you scroll to the bottom you can see two files, yolov4.cfg and yolov4.weights; both need to be downloaded. The screenshot below shows the ones I downloaded.

Another file to download is the COCO class names text file. The above YOLO implementation has been trained on the classes in the COCO dataset. This text file is required in the code to look up the class to which a detected object belongs. The algorithm has been trained on 80 different classes. I downloaded the file from the GitHub link below:
https://github.com/taipingeric/yolo-v4-tf.keras/tree/master/class_names
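As a minimal sketch of how such a class file is used: the network reports a class index for each detection, and the text file maps that index back to a name. The helper name load_labels and the three-line sample file are illustrative assumptions, not part of the original code; the real file has 80 lines.

```python
import os
import tempfile


def load_labels(path):
    """Read one class name per line; the line index is the class ID."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


# Tiny stand-in for the real 80-line file (hypothetical sample content).
sample = "person\nbicycle\ncar\n"
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)

labels = load_labels(tmp.name)
os.remove(tmp.name)

class_id = 2  # index the network would report for a detection
print(labels[class_id])
```

With the real file, the same lookup is what turns a numeric class ID into the text drawn next to each box.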
After downloading the above resources, I uploaded them to my Drive so that I could access them in Colab. After firing up Colab, first execute the code below to install a specific version of OpenCV:
!pip install opencv-contrib-python==4.5.4.60 --force-reinstall
The reason for this is that one of the OpenCV functions used, cv2.dnn.readNetFromDarknet, had issues with the default OpenCV version installed in Colab. Installing the above version solved the issue for me.
Next up is importing the required packages in Colab. The list is quite small: numpy, imutils and opencv.
Vehicle Detection Algorithm:
The basic flow of the algorithm goes this way:
1. Feed a video as input.
2. Decompose the video into individual frames.
3. Detect vehicles in each frame and highlight them.
4. Convert the images back into video.
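The four steps above can be sketched as a tiny skeleton. This is only an outline under stated assumptions: process_video and the toy stand-ins are hypothetical names, and a "frame" here is just a string so the sketch runs without OpenCV. In the real code the frames come from cv2.VideoCapture and are written out with cv2.VideoWriter.

```python
def process_video(frames, detect, annotate):
    """Run detect + annotate on every frame and collect the results.

    frames   -- any iterable of frames (steps 1 and 2)
    detect   -- callable returning a list of detections for a frame (step 3)
    annotate -- callable drawing those detections onto the frame (step 3)
    """
    output = []
    for frame in frames:
        detections = detect(frame)
        output.append(annotate(frame, detections))
    return output  # step 4: these would be written back out as a video


# Toy stand-ins for illustration only.
def fake_detect(frame):
    return ["car"] if frame == "frame1" else []


def fake_annotate(frame, detections):
    return frame + ":" + ",".join(detections)


result = process_video(["frame0", "frame1"], fake_detect, fake_annotate)
print(result)
```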
The post below on the PyImageSearch site beautifully explains how YOLO works, along with the code. I have taken the code for object detection in video and modified it to work with YOLO v4. You can find the link below:
https://pyimagesearch.com/2018/11/12/yolo-object-detection-with-opencv/

For easy reference, I have pasted below the code which I modified to run YOLO v4:

!pip install opencv-contrib-python==4.5.4.60 --force-reinstall

import numpy as np
import imutils
import cv2

inputVideoPath = 'your_input_video_path'
outputVideoPath = 'your_output_video_path.avi'
yoloWeightsPath = 'your_weights_path.weights'
yoloConfigPath = 'your_configuration_path.cfg'
detectionProbabilityThresh = 0.5
nonMaximaSuppression = 0.3

# load the COCO class labels the YOLO model was trained on
labelsPath = 'your_file_path/coco_classes.txt'
LABELS = open(labelsPath).read().strip().split("\n")

# initialize a list of colors to represent each possible class label
np.random.seed(42)
COLORS = np.random.randint(0, 255, size=(len(LABELS), 3), dtype="uint8")

# load the YOLO network and determine its output layer names
net = cv2.dnn.readNetFromDarknet(yoloConfigPath, yoloWeightsPath)
ln = net.getLayerNames()
ln = [ln[i - 1] for i in net.getUnconnectedOutLayers()]

# open the input video; the writer is created once the first frame is read
vs = cv2.VideoCapture(inputVideoPath)
writer = None
(W, H) = (None, None)

# try to determine the total number of frames in the video
try:
    total = int(vs.get(cv2.CAP_PROP_FRAME_COUNT))
except Exception:
    total = -1
    print('Frames could not be determined')

# loop over the frames of the video
while True:
    (grabbed, frame) = vs.read()
    # if no frame was grabbed, we have reached the end of the video
    if not grabbed:
        break
    # grab the frame dimensions once
    if W is None and H is None:
        (H, W) = frame.shape[:2]
    # build a blob from the frame and run a forward pass through YOLO
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    layerOutputs = net.forward(ln)
    # initialize our lists of detected bounding boxes, confidences,
    # and class IDs, respectively
    boxes = []
    confidences = []
    classIDs = []
    # loop over each of the layer outputs
    for output in layerOutputs:
        # loop over each of the detections
        for detection in output:
            # extract the class ID and confidence (i.e., probability)
            # of the current object detection
            scores = detection[5:]
            classID = np.argmax(scores)
            confidence = scores[classID]
            # filter out weak predictions by ensuring the detected
            # probability is greater than the minimum probability
            if confidence > detectionProbabilityThresh:
                # scale the bounding box coordinates back relative to
                # the size of the image, keeping in mind that YOLO
                # actually returns the center (x, y)-coordinates of
                # the bounding box followed by the boxes' width and
                # height
                box = detection[0:4] * np.array([W, H, W, H])
                (centerX, centerY, width, height) = box.astype("int")
                # use the center (x, y)-coordinates to derive the top
                # and left corner of the bounding box
                x = int(centerX - (width / 2))
                y = int(centerY - (height / 2))
                # update our list of bounding box coordinates,
                # confidences, and class IDs
                boxes.append([x, y, int(width), int(height)])
                confidences.append(float(confidence))
                classIDs.append(classID)
    # apply non-maxima suppression to suppress weak, overlapping
    # bounding boxes
    idxs = cv2.dnn.NMSBoxes(boxes, confidences, detectionProbabilityThresh, nonMaximaSuppression)
    # ensure at least one detection exists
    if len(idxs) > 0:
        # loop over the indexes we are keeping
        for i in idxs.flatten():
            # extract the bounding box coordinates
            (x, y) = (boxes[i][0], boxes[i][1])
            (w, h) = (boxes[i][2], boxes[i][3])
            # draw a bounding box rectangle and label on the frame
            color = [int(c) for c in COLORS[classIDs[i]]]
            cv2.rectangle(frame, (x, y), (x + w, y + h), color, 2)
            text = "{}: {:.4f}".format(LABELS[classIDs[i]], confidences[i])
            cv2.putText(frame, text, (x, y - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)
    # check if the video writer is None
    if writer is None:
        # initialize our video writer
        fourcc = cv2.VideoWriter_fourcc(*"MJPG")
        writer = cv2.VideoWriter(outputVideoPath, fourcc, 30, (frame.shape[1], frame.shape[0]), True)
    # write the output frame to disk
    writer.write(frame)

# release the file pointers
print("[INFO] cleaning up...")
# the writer only exists if at least one frame was read
if writer is not None:
    writer.release()
vs.release()
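To make the two less obvious steps in the loop above concrete, here is a small pure-Python sketch of (a) turning one YOLO output row into a pixel box and (b) greedy non-maxima suppression. The function names and sample numbers are made up for illustration, and greedy_nms is a simplified stand-in for cv2.dnn.NMSBoxes, not its actual implementation.

```python
def decode_box(detection, frame_w, frame_h):
    """Convert normalised (centerX, centerY, width, height) to pixel [x, y, w, h]."""
    cx = int(detection[0] * frame_w)
    cy = int(detection[1] * frame_h)
    w = int(detection[2] * frame_w)
    h = int(detection[3] * frame_h)
    # YOLO gives the box centre, so shift by half the size to get the top-left corner
    return [int(cx - w / 2), int(cy - h / 2), w, h]


def iou(a, b):
    """Intersection over union of two [x, y, w, h] boxes."""
    inter_w = max(0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0


def greedy_nms(boxes, scores, iou_thresh):
    """Keep the highest-scoring boxes, dropping any that overlap a kept box too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep


# Hypothetical output row: [cx, cy, w, h, objectness, class scores...]
row = [0.5, 0.5, 0.25, 0.5, 0.9, 0.1, 0.8]
box = decode_box(row, frame_w=640, frame_h=480)

# Two heavily overlapping boxes plus one far away
boxes = [box, [250, 125, 160, 240], [10, 10, 50, 50]]
scores = [0.9, 0.6, 0.8]
kept = greedy_nms(boxes, scores, iou_thresh=0.3)
print(box, kept)
```

In this toy example the second box overlaps the first by far more than the 0.3 threshold, so only the first and third boxes survive.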

The code took around 35 minutes to process each video and create the annotated video. This was because Colab was running on CPU rather than GPU. A sample output from one of the videos is shown below:

It's not perfect, but it seems no less than magic! The cars and their drivers are also identified in some instances. There seems to be confusion when the car/jeep is farther away in the clip, so most of the time it is identified as a truck. But once the cars are closer in the frame or clearer, they are identified correctly. The sign post seems to be identified as a truck! The GIF might not be that detailed, but you can see the markings perfectly in the resulting output video.
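Since the loop draws every COCO class it finds (which is why the drivers show up too), one possible tweak is to keep only vehicle classes before drawing. This is a sketch and not something from the run above: keep_vehicles is a hypothetical helper, and the set of names is an assumption that must match the spelling in your class file (some files say "motorbike", others "motorcycle").

```python
# Assumed vehicle class names; adjust to the spelling used in your class file.
VEHICLE_CLASSES = {"bicycle", "car", "motorbike", "motorcycle", "bus", "truck"}


def keep_vehicles(class_ids, labels, allowed=VEHICLE_CLASSES):
    """Return the positions of detections whose class name is a vehicle."""
    return [i for i, cid in enumerate(class_ids) if labels[cid] in allowed]


# Hypothetical mini label list and detected class IDs for illustration.
labels = ["person", "bicycle", "car", "stop sign", "truck"]
class_ids = [0, 2, 4, 3, 2]

vehicle_idx = keep_vehicles(class_ids, labels)
print(vehicle_idx)
```

In the detection loop, the same check could be applied to classIDs[i] before calling cv2.rectangle, so that only vehicle boxes are drawn.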
I downloaded these result videos and converted them into GIFs using the code below. I think this could be quite helpful!

from moviepy.editor import VideoFileClip

inputVideoPathOne = 'input_videos/video_1.mp4'
inputVideoPathTwo = 'output_videos/video_1.avi'
outputVideoPathOne = 'output_videos/vid_1_input.gif'
outputVideoPathTwo = 'output_videos/vid_1_output.gif'

# take the first 10 seconds of each video and shrink it to 20% of its size
clipOne = (VideoFileClip(inputVideoPathOne).subclip((0.0),(10.0)).resize(0.2))
clipOne.write_gif(outputVideoPathOne,fps=15)

clipTwo = (VideoFileClip(inputVideoPathTwo).subclip((0.0),(10.0)).resize(0.2))
clipTwo.write_gif(outputVideoPathTwo,fps=15)

Well, with this we come to the end of the blog. This was just a light dive into the world of Deep Learning magic. In upcoming blogs you can expect deeper dives into the algorithms and more such Deep/Machine Learning mini projects!

References:
- A lot of googling, amongst which the major sources were:
https://pyimagesearch.com/2018/11/12/yolo-object-detection-with-opencv/
https://neptune.ai/blog/object-detection-with-yolo-hands-on-tutorial
Stack Overflow

Originally published at http://evrythngunder3d.wordpress.com on May 11, 2022.

Written by Shrinand Kadekodi