ScaledYOLOv4 [3] is among the best-performing object detection models available at the time of writing. But how does it perform on out-of-dataset images? In this blog, we will test how well the model generalizes by running it on the CityPersons [1, 2] dataset using pre-trained model weights. For this demonstration, we will use Google Colab, as it provides free GPU access to notebooks. Let’s get started.
Installing Scaled YOLOv4 Dependencies
Once inside the notebook, change the runtime to GPU.
As Colab provides many preinstalled packages like PyTorch, we just need to install a few additional dependencies along with the main model code.
First, clone the Scaled-YOLOv4 repo and switch over to the yolov4-large branch.
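As a rough sketch, the clone step can be scripted from a notebook cell as shown below. It assumes the official repository at github.com/WongKinYiu/ScaledYOLOv4; the helper name `clone_command` is ours and purely illustrative. Passing `--branch` to `git clone` checks out `yolov4-large` directly, so no separate checkout is needed.

```python
import subprocess
from typing import List

# Assumption: the official repository URL; swap in a fork if you use one.
REPO_URL = "https://github.com/WongKinYiu/ScaledYOLOv4.git"
BRANCH = "yolov4-large"


def clone_command(repo_url: str = REPO_URL, branch: str = BRANCH) -> List[str]:
    """Build a git command that clones the repo with `branch` checked out."""
    return ["git", "clone", "--branch", branch, repo_url]


def clone_repo() -> None:
    """Run the clone; needs git and network access (both available on Colab)."""
    subprocess.run(clone_command(), check=True)


# In the notebook, call clone_repo() once to fetch the code.
print(" ".join(clone_command()))
```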
Let’s check if the GPU is running properly.
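One way to run this check from Python is to call `nvidia-smi`, the NVIDIA driver utility that is present on Colab GPU runtimes. The snippet below is a minimal sketch; the function name `gpu_summary` is illustrative, and the messages it returns are our own wording.

```python
import shutil
import subprocess


def gpu_summary() -> str:
    """Return the output of `nvidia-smi`, or a short message if no GPU is visible."""
    if shutil.which("nvidia-smi") is None:
        return "nvidia-smi not found: the runtime is probably not set to GPU."
    result = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
    if result.returncode != 0:
        return "nvidia-smi failed: " + result.stderr.strip()
    return result.stdout


print(gpu_summary())
```

If the output lists a GPU along with its memory usage, the runtime is set up correctly.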
Next, we'll install mish-cuda in order to run the Mish activation functions quickly on our notebook's GPU. After that, we need to update to the latest PyYAML version, as Colab offers an older one by default.
Looks good! We are done with the dependencies. Next, we will download the model weights.
Downloading Model Weights
We will use pre-trained model weights for our testing. We will fetch them with the ‘gdown’ command, passing it the Google Drive file ID of the pre-trained model weights.
First, navigate to the ScaledYOLOv4 directory and create a new weights folder for storing the pre-trained model weights.
Now, let’s download the weights.
We now have six sets of model weights saved at different checkpoints, with yolov4-p7.pt being the best among them. Now, let’s move to the next step.
Preparing the Custom Dataset
To test the object detection model, we will use the CityPersons dataset, a subset of the Cityscapes dataset [1, 2], a widely used dataset of street scenes recorded in different cities, mostly in Germany. For our use case, we will use a small subset of the original dataset consisting of three cities: Frankfurt, Lindau, and Munster. We have already prepared the images and labels in YOLO format, so you don’t have to!
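For reference, a YOLO-format label file holds one line per object: the class index followed by the box center and size, all normalized by the image dimensions. The sketch below shows the conversion for a single box. The function name and the example box are made up for illustration; the 2048x1024 frame size is that of Cityscapes images.

```python
def to_yolo_line(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space box (corner coordinates) to one YOLO label line."""
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# Hypothetical pedestrian box on a 2048x1024 frame; class 0 is 'person'.
print(to_yolo_line(0, 992, 400, 1056, 528, 2048, 1024))
# prints: 0 0.500000 0.453125 0.031250 0.125000
```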
To download the dataset from Kaggle, we will use the package ‘opendatasets’. The following line of code shows how this can be done.
Note: To download the dataset from Kaggle, you will need an API token, which you can create from your Kaggle account. Click on the profile icon at the top right of the window → ‘Account’ → ‘Create new API Token’. When you execute the above lines of code, enter your username and this API key when prompted.
Creating a Custom YAML File
YOLOv4 uses a YAML file to store the dataset location, the number of classes, and the names of the classes for training and testing. Since we are using a custom dataset, we need to create a new YAML file for testing. To do this, navigate to the data folder in the ScaledYOLOv4 directory and create a ‘cityperson.yaml’ file.
And enter the following lines.
The first line specifies the validation image location. The second line indicates the number of classes in the labels file and the third line indicates the names of classes in the labels file.
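As a sketch of what such a file can look like, the snippet below writes the three lines from Python. The keys `val`, `nc`, and `names` follow the layout of the dataset files shipped with the repository; the image path is an assumption about where the dataset was extracted, so adjust it to your own folder structure. The helper name `write_dataset_yaml` is illustrative.

```python
from pathlib import Path

# Assumption: where the validation images were extracted; adjust to your layout.
VAL_IMAGES = "../cityperson/images/val"

YAML_TEXT = f"""val: {VAL_IMAGES}
nc: 1
names: ['person']
"""


def write_dataset_yaml(path: str = "data/cityperson.yaml") -> Path:
    """Write the three-line dataset description and return its path."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(YAML_TEXT)
    return target
```

Calling `write_dataset_yaml()` from inside the ScaledYOLOv4 directory creates `data/cityperson.yaml` with a single class named ‘person’.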
That’s it. We are ready to test the model on the CityPersons dataset.
Testing the Model
Navigate to the ScaledYOLOv4 directory. You will notice a test.py file. This file can be used to run validation and test processes. Run the following line of code to kick-start the validation.
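A sketch of how the validation command can be assembled is shown below. The flag names and the values for image size, batch size, and confidence threshold follow the example in the repository README as we understand it; treat them as assumptions and confirm with `python test.py --help`. The weights and YAML paths assume the folders created in the earlier steps, and `build_test_command` is an illustrative helper name.

```python
from typing import List


def build_test_command(
    weights: str = "weights/yolov4-p7.pt",
    data: str = "data/cityperson.yaml",
    img_size: int = 1536,
    batch_size: int = 8,
    conf_thres: float = 0.001,
    device: str = "0",
) -> List[str]:
    """Assemble the test.py invocation as an argument list."""
    return [
        "python", "test.py",
        "--img", str(img_size),
        "--conf", str(conf_thres),
        "--batch", str(batch_size),
        "--device", device,
        "--data", data,
        "--weights", weights,
    ]


# Print the command; run it from the ScaledYOLOv4 directory,
# for example with subprocess.run(build_test_command(), check=True).
print(" ".join(build_test_command()))
```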
This might take a few seconds to a few minutes depending on your GPU. Once run, it should produce an output similar to the following.
If you run the above code, you will find a test_batch0 image and a test_batch0_pred image in the ScaledYOLOv4 directory. Let’s look into these.
The above is our labeled image. As you can see, there is only one label - ‘person’. Now, let’s look into the predicted image.
We can notice a few things from these images:
- The object detection itself is pretty good. It was able to draw bounding boxes around 2 out of 3 people and around both cars.
- It misclassified one car as a person.
Quantitatively, precision and recall are 0.489 and 0.448, respectively. To put this into perspective, in our experiments on the COCO 2017 validation set, ScaledYOLOv4 scored 0.47 precision and 0.85 recall for the “person” class, and 0.397 precision and 0.75 recall averaged over all 80 classes. So while recall drops sharply, precision actually improves slightly. We chose the CityPersons dataset specifically because its perspective differs from that of the images usually found in the COCO dataset, on which the model was trained. Considering this, these results are actually not that bad.
But are there ways to improve them? There certainly are. The most straightforward is to train the model on your custom data. If you want to keep using pre-trained weights, another option is to freeze all but the final few layers, reduce the classification to “person” and “not_person”, and train this custom setup on your data. Finally, you can improve the results further by improving the quality of your training and validation labels.
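To make these two metrics concrete, here is a small sketch that computes them from detection counts. The counts are illustrative and taken from the sample image discussed earlier (2 of 3 people detected, 1 car labeled as a person); they are not the counts behind the dataset-wide numbers. The second part simply subtracts the “person” scores quoted in this post.

```python
def precision_recall(tp: int, fp: int, fn: int):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Sample image: 2 people found (TP), 1 car called a person (FP), 1 person missed (FN).
p, r = precision_recall(tp=2, fp=1, fn=1)
print(f"precision={p:.3f} recall={r:.3f}")  # precision=0.667 recall=0.667

# Change in the 'person' scores when moving from COCO to CityPersons.
coco_person = {"precision": 0.47, "recall": 0.85}
citypersons = {"precision": 0.489, "recall": 0.448}
for metric in ("precision", "recall"):
    print(f"{metric}: {citypersons[metric] - coco_person[metric]:+.3f}")
```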
We at Mindkosh are always looking for ways to improve the quality of annotated data, and to do all of it at blazing fast speeds. Keep an eye out for our blogs to get a sneak peek into how we work our magic!