Guide to Object Detection with YOLO-NAS & FiftyOne Using E2E Cloud GPU Server

By Akshayraj

Object detection is a crucial aspect of computer vision, enabling machines to identify and locate multiple objects within an image or a video. It’s the foundational technology behind various applications like autonomous vehicles, surveillance systems, augmented reality, and even assistive technologies for the visually impaired. Among the various object detection algorithms, YOLO (You Only Look Once) stands out for its speed and accuracy.

YOLO-NAS Overview

YOLO is a single-stage detector. Instead of sliding windows or region-proposal approaches, it divides the image into a grid and predicts bounding boxes and class probabilities directly for each grid cell in a single forward pass. This design lets YOLO run in real time, making it well suited to applications that need rapid object detection. For an in-depth treatment, refer to the original paper: https://arxiv.org/pdf/1506.02640.pdf.
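The grid idea can be illustrated with a minimal sketch. It assumes the settings of the original YOLO paper (a 7x7 grid, 2 boxes per cell, 20 classes) and is not how YOLO-NAS itself is implemented; the function names ‘responsible_cell’ and ‘output_shape’ are hypothetical helpers introduced only for this illustration.

```python
def responsible_cell(cx, cy, img_w, img_h, grid_size=7):
    """Return (row, col) of the grid cell that contains the box center (cx, cy), given in pixels."""
    # Scale the center into grid units, then clamp so a center on the
    # right or bottom edge still falls into the last cell.
    col = min(int(cx / img_w * grid_size), grid_size - 1)
    row = min(int(cy / img_h * grid_size), grid_size - 1)
    return row, col


def output_shape(grid_size=7, boxes_per_cell=2, num_classes=20):
    """Shape of the prediction tensor in the original YOLO formulation."""
    # Each predicted box carries x, y, w, h and one confidence score.
    return (grid_size, grid_size, boxes_per_cell * 5 + num_classes)


print(responsible_cell(320, 240, 640, 480))  # object centered in a 640x480 image
print(output_shape())
```

In the original formulation, the cell that contains an object's center is the one responsible for predicting that object, which is why a single forward pass over the grid is enough.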

Since its inception, the original YOLO paper has sparked a series of architectural advancements, each iteration refining the model’s capabilities. One of the latest entries in this lineage is YOLO-NAS, developed by Deci.ai using neural architecture search (NAS).

Deci reports that YOLO-NAS achieves higher mean average precision (mAP) than its predecessors at comparable or lower latency, giving it a better accuracy-speed trade-off than YOLOv5, YOLOv6, YOLOv7, and YOLOv8.

In numbers, YOLO-NAS is reported to be roughly 0.5 mAP points more accurate and 10–20% faster than equivalent variants of YOLOv8 and YOLOv7. The official comparison is shown in the figure below.

This evolution within the YOLO family reflects the continued push for both precision and efficiency in object detection. Better accuracy without a speed penalty is useful for applications across many industries.

FiftyOne: Overview

FiftyOne is an open-source tool from Voxel51 for curating computer vision datasets and for visualizing, analyzing, and evaluating model predictions on them. It is a common choice for practitioners working on object detection, image segmentation, and related tasks. Its interactive app makes it easy to inspect predictions sample by sample, assess model performance, and iterate on improvements. Learn more about the tool at the following link: https://docs.voxel51.com/index.html.

E2E GPU Cloud

The best way to learn YOLO-NAS and FiftyOne is hands-on, and the environment you practice in matters. Among the many GPU cloud providers available, choosing the right one can improve both cost efficiency and productivity. After some research, I settled on E2E Cloud, which in my experience balances cost and accessibility and provides ready-made environments that save setup time. For this hands-on session, I used the TIR-AI Platform on E2E Cloud. To get started, follow this guide: https://www.e2enetworks.com/blog/how-to-use-jupyter-notebooks-on-e2e-networks.

Let’s Play

To use YOLO-NAS (via the super-gradients package) and FiftyOne, install both with pip, the Python package installer. In a Jupyter notebook, prefix the command with ‘!’ to run it in the shell, as shown below:

!pip install fiftyone
!pip install super-gradients

Alternatively, when working in the E2E Cloud terminal:

pip install fiftyone
pip install super-gradients

Next, import the necessary packages by executing:

import os
import json
import numpy as np
import urllib.request


import fiftyone as fo
import fiftyone.utils.annotations as foua
import super_gradients

For inference, let’s download a few images from the internet and store them in a dedicated folder. If you already have a folder of images, you can skip this step, upload the folder to E2E Cloud, and set the ‘base_dir’ variable to its path.

# Alternatively if a folder full of images has to be predicted use os.listdir("folder_path")


test_images_urls = [
    "http://farm4.staticflickr.com/3454/3208929391_b9fa771095_z.jpg",
    "http://farm9.staticflickr.com/8457/7981004366_626686aa75_z.jpg",
    "https://farm3.staticflickr.com/2183/2435864370_901e470541_z.jpg",
    "https://farm2.staticflickr.com/1091/527450857_dfdcbcb3e7_z.jpg",
    "https://farm9.staticflickr.com/8507/8476712489_a24567bf64_z.jpg",
]

# Download the images into base_dir (created if it does not exist yet)
base_dir = "test_imgs"
os.makedirs(base_dir, exist_ok=True)

for index, img_url in enumerate(test_images_urls):
    urllib.request.urlretrieve(img_url, f"{base_dir}/{index}.jpg")
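If you are starting from your own folder instead, the ‘os.listdir’ route mentioned in the comment above could look like the sketch below. The helper name ‘list_image_paths’ and the set of accepted extensions are assumptions for illustration, not part of the original code.

```python
import os

# Assumed set of image extensions; adjust to match your data.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def list_image_paths(folder_path):
    """Return the sorted full paths of image files directly inside folder_path."""
    return sorted(
        os.path.join(folder_path, name)
        for name in os.listdir(folder_path)
        if name.lower().endswith(IMAGE_EXTENSIONS)
    )
```

Calling ‘list_image_paths(base_dir)’ gives a list of paths that can be passed to the model directly. Non-image files in the folder are skipped.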

Visualizing the images using the FiftyOne session allows us to confirm if the data has been loaded correctly. Don’t hesitate to explore the tool to gain a deeper understanding!

# View the downloaded dataset
test_dataset = fo.Dataset.from_images_dir(base_dir)
test_session = fo.launch_app(test_dataset)

Next, we’ll initialize the YOLO-NAS model for predictions. This snippet uses the YOLO-NAS Large model; if you prefer the medium or small variant, change the ‘model_size’ variable accordingly. The model is initialized with weights pre-trained on the COCO dataset, and a sample image is used to check that everything works.

# Load the model; for the medium or small variant use 'yolo_nas_m' / 'yolo_nas_s'
model_size = "yolo_nas_l"

# If you don't have a GPU instance, remove .cuda()
yolo_nas = super_gradients.training.models.get(model_size, pretrained_weights="coco").cuda()

img_url = "https://deci-pretrained-models.s3.amazonaws.com/sample_images/beatles-abbeyroad.jpg"
# Test and view the prediction on a single image
# (show() only displays the result, so there is nothing useful to assign)
yolo_nas.predict(img_url).show()

Now let’s use YOLO-NAS to make predictions on the images loaded in FiftyOne. The ‘conf’ argument is the confidence threshold: detections scoring below it are discarded and get no bounding box. For this experiment, we set ‘conf=0.6’.

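# Only the file paths are needed here; image sizes are read from the predictions later
file_paths = test_dataset.values("filepath")
# _images_prediction_lst is a private attribute; list() lets us index it even if it is a lazy iterable
preds = list(yolo_nas.predict(file_paths, conf=0.6)._images_prediction_lst)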

The ‘_images_prediction_lst’ attribute holds one element per image, each containing all predictions for that image. Because it is a private attribute, its name and type may differ between super-gradients versions. To see what it contains, let’s take a peek at the predictions for the first image.

# Peeking into predictions
print(preds[0])
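To make the role of ‘conf’ concrete, here is a minimal sketch of confidence thresholding on made-up values. It is not super-gradients’ internal implementation; the function name ‘filter_by_confidence’ and the sample boxes, scores, and labels are assumptions for illustration only.

```python
import numpy as np


def filter_by_confidence(bboxes, confidences, labels, conf=0.6):
    """Keep only the detections whose confidence is at least conf."""
    bboxes = np.asarray(bboxes, dtype=float)
    confidences = np.asarray(confidences, dtype=float)
    labels = np.asarray(labels)
    keep = confidences >= conf  # boolean mask, one entry per detection
    return bboxes[keep], confidences[keep], labels[keep]


# Hypothetical raw detections: three boxes in xyxy pixel coordinates
boxes = [[10, 20, 110, 220], [50, 60, 90, 120], [200, 40, 320, 300]]
scores = [0.91, 0.42, 0.72]
classes = [0, 2, 0]

kept_boxes, kept_scores, kept_classes = filter_by_confidence(boxes, scores, classes)
print(kept_scores, kept_classes)
```

With the threshold at 0.6, the second detection (score 0.42) is dropped. Lowering ‘conf’ surfaces more boxes at the cost of more false positives.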

Now it’s time to attach the predictions to the images and plot the bounding boxes. This involves three steps:

  1. Building a label dictionary that maps each class index to its class name.

  2. Converting YOLO-NAS’s ‘xyxy’ pixel boxes to the relative [top-left-x, top-left-y, width, height] format that FiftyOne expects (COCO-style ordering, but normalized to the range [0, 1]).

  3. Transforming the YOLO-NAS predictions into FiftyOne Detection objects.

The following code blocks cover each step in turn.

1.

# Create a mapping from class index to class name
label_dict = dict(enumerate(preds[0].class_names))
print(label_dict)

2.

def convert_bboxes(bboxes, w, h):
  """
  Input:
  bboxes: array of shape (N, 4) with boxes as [x1, y1, x2, y2] in pixels
  w : width of image in pixels
  h : height of image in pixels

  Output:
  array of shape (N, 4) with boxes as [x, y, width, height], where (x, y)
  is the top-left corner and all values are relative to the image size.
  The input array is not modified.
  """
  bboxes = np.asarray(bboxes, dtype=float)
  x1, y1, x2, y2 = bboxes[:, 0], bboxes[:, 1], bboxes[:, 2], bboxes[:, 3]
  return np.stack([x1 / w, y1 / h, (x2 - x1) / w, (y2 - y1) / h], axis=1)
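A worked example helps to sanity-check the conversion. The sketch below handles a single box in plain Python and also converts it back to pixels; the helper names ‘to_relative_xywh’ and ‘to_pixel_xyxy’ and the sample box are assumptions made for this illustration.

```python
def to_relative_xywh(box_xyxy, img_w, img_h):
    """Convert one [x1, y1, x2, y2] pixel box to relative [x, y, width, height]."""
    x1, y1, x2, y2 = box_xyxy
    return [x1 / img_w, y1 / img_h, (x2 - x1) / img_w, (y2 - y1) / img_h]


def to_pixel_xyxy(box_rel, img_w, img_h):
    """Inverse conversion: relative [x, y, width, height] back to pixel [x1, y1, x2, y2]."""
    x, y, w, h = box_rel
    return [x * img_w, y * img_h, (x + w) * img_w, (y + h) * img_h]


# A 200x200 pixel box with its top-left corner at (100, 50) in a 400x500 image
rel = to_relative_xywh([100, 50, 300, 250], img_w=400, img_h=500)
print(rel)
print(to_pixel_xyxy(rel, img_w=400, img_h=500))
```

By hand: 100/400 = 0.25, 50/500 = 0.1, (300 - 100)/400 = 0.5 and (250 - 50)/500 = 0.4, so the relative box should be approximately [0.25, 0.1, 0.5, 0.4], and the round trip should recover the original pixel coordinates.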

3.

# Extract bboxes, confidences and labels from the predictions
# and create a FiftyOne Detections object for each image
all_detections = []
for image_pred in preds:
  height, width = image_pred.image.shape[:2]
  pred = image_pred.prediction

  bboxes = convert_bboxes(np.array(pred.bboxes_xyxy), width, height)
  labels = [label_dict[i] for i in pred.labels.astype(int)]

  detections = []
  for label, prob, bbox in zip(labels, pred.confidence, bboxes):
    # Cast NumPy values to plain Python types before storing them in FiftyOne
    detections.append(fo.Detection(label=label, confidence=float(prob), bounding_box=[float(v) for v in bbox]))

  all_detections.append(fo.Detections(detections=detections))

print(all_detections[0]) # boxes are now relative [x, y, width, height]

Everything’s set! Now, let’s bring on the visualizations using FiftyOne, where the real fun begins. Time to explore and enjoy the predictions in action!

# Visualize the predictions with FiftyOne


dataset = fo.Dataset() # empty dataset
samples = []
for fpath, pred in zip(file_paths, all_detections): # pair each image with its detections
  samples.append(fo.Sample(filepath=fpath, pred_objects=pred))

dataset.add_samples(samples)


session = fo.launch_app(dataset)
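Before exploring the app, a quick per-class count is a useful sanity check on the predictions. The sketch below uses plain Python on hypothetical label lists so that it runs on its own; the function name ‘count_labels’ and the sample labels are assumptions for illustration.

```python
from collections import Counter


def count_labels(labels_per_image):
    """Count how often each class label appears across all images."""
    return Counter(label for labels in labels_per_image for label in labels)


# Hypothetical predicted labels for three images (the last one has no detections)
example = [["person", "person", "car"], ["dog"], []]
counts = count_labels(example)
print(counts.most_common())
```

In this tutorial, the input could be built from the predictions with ‘[[d.label for d in dets.detections] for dets in all_detections]’. A class that dominates unexpectedly, or is missing entirely, is often the first hint that the confidence threshold needs adjusting.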

Conclusion

In conclusion, diving into YOLO-NAS and FiftyOne has been an exciting journey. We’ve explored powerful tools and techniques, unlocking the potential of computer vision in a hands-on, dynamic way. And guess what? It’s all been made smoother and more exhilarating with the support of E2E Cloud! Cheers to the thrill of innovation and learning in this vibrant tech landscape!

The code used can be found at GitHub: https://github.com/Lord-Axy/Article-YoloNAS

Let’s stay connected:

LinkedIn: https://www.linkedin.com/in/akshayraj-axy-210733132/

References

YOLO Original Paper: https://arxiv.org/pdf/1506.02640.pdf

Deci.ai: https://deci.ai/

YOLO-NAS: https://deci.ai/blog/yolo-nas-object-detection-foundation-model/

FiftyOne: https://docs.voxel51.com/index.html

E2E Networks home: https://www.e2enetworks.com/

E2E environment setup: https://www.e2enetworks.com/blog/how-to-use-jupyter-notebooks-on-e2e-networks

COCO Prediction tutorial: https://voxel51.com/blog/state-of-the-art-object-detection-with-yolo-nas-fiftyone/