Object detection is a computer vision task that has recently been reshaped by advances in machine learning.
Which algorithm do you employ for object detection tasks? R-CNN has provided a useful structure for the problem, and it is the family we will focus on here.
First, let's clarify what object detection is.
Object detection is the process of finding real-world instances of objects, such as cars, bicycles, televisions, flowers, and people, in still images or videos. It allows us to recognize, localize, and detect multiple objects within an image, giving us a much better understanding of the image as a whole. It is commonly used in applications such as image retrieval, security, surveillance, and advanced driver assistance systems (ADAS).
Object detection can be done in several ways. Each object detection algorithm works differently, but they all operate on the same underlying principle.
In general, the object detection task is performed in three steps:

1. Generate regions of interest (candidate bounding boxes) in the image.
2. Extract visual features for each region and classify the object it contains.
3. In a post-processing step, merge overlapping candidate boxes for the same object into a single detection (non-maximum suppression).
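As an illustration of the third step, here is a minimal sketch using TensorFlow's built-in non-max suppression; the boxes and scores are made-up values:

```
import tensorflow as tf

# Two of these three candidate boxes overlap heavily and describe the same object.
boxes = tf.constant([[0.0, 0.0, 1.0, 1.0],
                     [0.05, 0.05, 1.0, 1.0],   # near-duplicate of the first box
                     [0.5, 0.5, 0.9, 0.9]])
scores = tf.constant([0.9, 0.8, 0.7])

# Non-max suppression keeps the highest-scoring box and drops overlapping duplicates.
keep = tf.image.non_max_suppression(boxes, scores, max_output_size=3, iou_threshold=0.5)
print(keep.numpy())  # [0 2]: the near-duplicate second box is suppressed
```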
Now that you have understood the basic object detection workflow, let's move forward in this object detection tutorial and understand what TensorFlow is.
TensorFlow is Google's open-source machine learning framework for dataflow programming across a range of tasks. Computations are expressed as graphs: the nodes represent mathematical operations, while the edges represent the multidimensional data arrays (tensors) that flow between them.
Tensors are just multidimensional arrays: extensions of two-dimensional arrays to higher-dimensional data. TensorFlow has various features that make it well suited to deep learning. So, without wasting time, let's see how we can implement object detection using TensorFlow.
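For instance, here is a quick sketch of tensors of different ranks:

```
import tensorflow as tf

scalar = tf.constant(3.0)                 # rank 0: a single number
vector = tf.constant([1.0, 2.0, 3.0])     # rank 1: a 1-D array
matrix = tf.constant([[1, 2], [3, 4]])    # rank 2: a 2-D array
cube   = tf.zeros([2, 3, 4])              # rank 3: higher-dimensional data

print(matrix.shape)  # (2, 2)
print(cube.shape)    # (2, 3, 4)
```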
Let's quickly review the three R-CNN family algorithms (R-CNN, Fast R-CNN, and Faster R-CNN) that we looked at in the first post. This will make it straightforward to implement them when we predict bounding boxes on previously unseen images (new data).
R-CNN
R-CNN uses selective search to extract several regions from a given image and then determines whether those regions contain objects. After the regions are extracted, a CNN is applied to each one to extract its features, and object detection is finally performed using these features. Unfortunately, because every one of the (roughly 2,000) proposed regions must be passed through the CNN separately, R-CNN is comparatively slow.
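To make the flow concrete, here is a schematic sketch of the R-CNN pipeline just described; the three callables are hypothetical stand-ins for selective search, the CNN feature extractor, and the per-class classifiers, not part of any real library:

```
def rcnn_detect(image, propose_regions, cnn_features, classify):
    detections = []
    # Step 1: selective search proposes ~2,000 candidate boxes (y0, x0, y1, x1).
    for (y0, x0, y1, x1) in propose_regions(image):
        # Step 2: each region is cropped and passed through the CNN separately;
        # this per-region forward pass is what makes R-CNN slow.
        features = cnn_features(image[y0:y1, x0:x1])
        # Step 3: a classifier scores the extracted features for each class.
        label, score = classify(features)
        detections.append(((y0, x0, y1, x1), label, score))
    return detections
```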
Fast R-CNN
Fast R-CNN, on the other hand, feeds the entire image to a ConvNet once, producing a shared feature map from which the regions of interest are taken (instead of cropping regions from the image itself). Additionally, rather than the three independent models we saw with R-CNN, it employs a single model that extracts features from the regions, classifies them into the various classes, and returns the bounding boxes.
All these steps are performed simultaneously, making it faster than R-CNN. However, Fast R-CNN is still not fast enough on large datasets, because it also relies on selective search to extract regions.
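For contrast, here is the same kind of schematic sketch for Fast R-CNN; again, the callables are hypothetical stand-ins rather than real library functions:

```
def fast_rcnn_detect(image, convnet, propose_regions, roi_pool, detector_head):
    # The convolutional feature map is computed ONCE for the whole image
    # and shared by every region -- the key speed-up over R-CNN.
    feature_map = convnet(image)
    detections = []
    # Region proposals still come from selective search, the remaining bottleneck.
    for region in propose_regions(image):
        # RoI pooling extracts a fixed-size feature vector from the shared map.
        roi_features = roi_pool(feature_map, region)
        # A single head classifies the region and regresses its bounding box.
        label, score, box = detector_head(roi_features)
        detections.append((box, label, score))
    return detections
```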
Faster R-CNN
Faster R-CNN is an extension of Fast R-CNN. As the name suggests, Faster R-CNN is faster than Fast R-CNN, thanks to the region proposal network (RPN).
The main contributions of this work are:
The Region Proposal Network (RPN) is a fully convolutional network that generates proposals at different scales and aspect ratios. The RPN applies the neural-network idea of attention, telling the object detector (Fast R-CNN) where to look.
Rather than using image pyramids (i.e., multiple copies of an image at different scales) or filter pyramids (i.e., multiple filters of different sizes), the paper introduced the concept of anchor boxes. An anchor box is a reference box with a specific scale and aspect ratio. With several reference boxes, a single position is covered at various scales and aspect ratios; one can think of this as a pyramid of reference anchor boxes. Each region is then mapped to a reference anchor box, enabling the detection of objects at different scales and aspect ratios (see the sketch after this list).
The RPN and Fast R-CNN share their convolutional computations, which shortens the overall computation time.
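Here is a minimal NumPy sketch of how such reference anchors can be generated at one location; the three scales and three aspect ratios match the defaults used in the Faster R-CNN paper, giving nine anchors per position:

```
import numpy as np

def make_anchors(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    # One anchor per (scale, ratio) combination, centered at (cx, cy).
    anchors = []
    for scale in scales:
        for ratio in ratios:
            # Keep the area fixed at scale**2 while varying the width/height ratio.
            w = scale * np.sqrt(ratio)
            h = scale / np.sqrt(ratio)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return np.array(anchors)

print(make_anchors(0, 0).shape)  # (9, 4): 3 scales x 3 aspect ratios
```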
The following figure depicts the Faster R-CNN architecture. It consists of two modules: the RPN, which generates region proposals, and the Fast R-CNN detector. By applying the attention principle from neural networks, the RPN module directs the Fast R-CNN detection module where to look for objects in the image.
The next step is to navigate to the object_detection directory inside the research subfolder, create a new Python file, and paste the following code.
```
import numpy as np
import os
import six.moves.urllib as urllib
import sys
import tarfile
import tensorflow as tf
import zipfile
import pathlib

from collections import defaultdict
from io import StringIO
from matplotlib import pyplot as plt
from PIL import Image
from IPython.display import display

from object_detection.utils import ops as utils_ops
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as vis_util

# If the script is started from inside the models directory, move up to its parent.
while "models" in pathlib.Path.cwd().parts:
    os.chdir('..')

def load_model(model_name):
    # Download the pre-trained model from the TensorFlow model zoo
    # and load it as a SavedModel.
    base_url = 'http://download.tensorflow.org/models/object_detection/'
    model_file = model_name + '.tar.gz'
    model_dir = tf.keras.utils.get_file(
        fname=model_name,
        origin=base_url + model_file,
        untar=True)
    model_dir = pathlib.Path(model_dir) / "saved_model"
    model = tf.saved_model.load(str(model_dir))
    return model

# Label map that assigns the correct COCO class name to each detection.
PATH_TO_LABELS = 'models/research/object_detection/data/mscoco_label_map.pbtxt'
category_index = label_map_util.create_category_index_from_labelmap(
    PATH_TO_LABELS, use_display_name=True)

model_name = 'ssd_inception_v2_coco_2017_11_17'
detection_model = load_model(model_name)
```
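As an optional sanity check once the model has downloaded, you can inspect its serving signature to see which input it expects and which outputs it returns:

```
infer = detection_model.signatures['serving_default']
print(infer.inputs)              # the expected input tensor(s)
print(infer.structured_outputs)  # detection_boxes, detection_classes, detection_scores, ...
```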
```
def run_inference_for_single_image(model, image):
    image = np.asarray(image)
    # The input needs to be a tensor; convert it using 'tf.convert_to_tensor'.
    input_tensor = tf.convert_to_tensor(image)
    # The model expects a batch of images, so add an axis with 'tf.newaxis'.
    input_tensor = input_tensor[tf.newaxis, ...]

    # Run inference.
    model_fn = model.signatures['serving_default']
    output_dict = model_fn(input_tensor)

    # All outputs are batch tensors.
    # Convert to numpy arrays, and take index [0] to remove the batch dimension.
    # We're only interested in the first num_detections.
    num_detections = int(output_dict.pop('num_detections'))
    output_dict = {key: value[0, :num_detections].numpy()
                   for key, value in output_dict.items()}
    output_dict['num_detections'] = num_detections

    # detection_classes should be ints.
    output_dict['detection_classes'] = output_dict['detection_classes'].astype(np.int64)

    # Handle models with masks:
    if 'detection_masks' in output_dict:
        # Reframe the bounding-box masks to fit the image.
        detection_masks_reframed = utils_ops.reframe_box_masks_to_image_masks(
            output_dict['detection_masks'], output_dict['detection_boxes'],
            image.shape[0], image.shape[1])
        detection_masks_reframed = tf.cast(detection_masks_reframed > 0.5,
                                           tf.uint8)
        output_dict['detection_masks_reframed'] = detection_masks_reframed.numpy()

    return output_dict

def show_inference(model, image_path):
    # The array-based representation of the image will be used later to
    # generate the output image with boxes and labels on it.
    image_np = np.array(Image.open(image_path))
    # Actual detection.
    output_dict = run_inference_for_single_image(model, image_np)
    # Visualization of the results of a detection.
    vis_util.visualize_boxes_and_labels_on_image_array(
        image_np,
        output_dict['detection_boxes'],
        output_dict['detection_classes'],
        output_dict['detection_scores'],
        category_index,
        instance_masks=output_dict.get('detection_masks_reframed', None),
        use_normalized_coordinates=True,
        line_thickness=8)
    display(Image.fromarray(image_np))
```
There is a test_images folder inside the object_detection directory. It already contains two photos that we will use to test the model. Run the cells below to obtain the results, adding any images of your own in which you wish to locate objects.
```
PATH_TO_TEST_IMAGES_DIR = pathlib.Path('models/research/object_detection/test_images')
TEST_IMAGE_PATHS = sorted(list(PATH_TO_TEST_IMAGES_DIR.glob("*.jpg")))
```
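If you want to test a photo of your own, copy it into the same folder, or append its path to the list (the file name below is just a placeholder):

```
TEST_IMAGE_PATHS.append(PATH_TO_TEST_IMAGES_DIR / 'my_photo.jpg')
```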
```
for image_path in TEST_IMAGE_PATHS:
    print(image_path)
    show_inference(detection_model, image_path)
```
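The same model can also be pointed at a live webcam feed. Here is a minimal sketch, assuming OpenCV (cv2) is installed; it reuses the run_inference_for_single_image helper defined above:

```
import cv2

cap = cv2.VideoCapture(0)  # open the default webcam
while True:
    ret, frame = cap.read()
    if not ret:
        break
    # OpenCV captures frames as BGR; the model expects RGB.
    image_np = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    output_dict = run_inference_for_single_image(detection_model, image_np)
    vis_util.visualize_boxes_and_labels_on_image_array(
        image_np,
        output_dict['detection_boxes'],
        output_dict['detection_classes'],
        output_dict['detection_scores'],
        category_index,
        use_normalized_coordinates=True,
        line_thickness=8)
    cv2.imshow('Object Detection', cv2.cvtColor(image_np, cv2.COLOR_RGB2BGR))
    if cv2.waitKey(1) & 0xFF == ord('q'):  # press 'q' to quit
        break

cap.release()
cv2.destroyAllWindows()
```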
We have now reached the end of this blog, in which we learned how to use the TensorFlow Object Detection API to detect objects in both photos and webcam feeds.