Object Detection using Tensorflow: bee and butterfly Part V


Object Detection using Tensorflow: bee and butterflies

  1. Part 1: set up tensorflow in a virtual environment
  2. adhoc functions
  3. Part 2: preparing annotation in PASCAL VOC format
  4. Part 3: preparing tfrecord files
  5. more scripts
  6. Part 4: start training our machine learning algorithm!
  7. COCO API for Windows
  8. Part 5: perform object detection

Tips: instead of reading this post, read Object Detection using Tensorflow: bee and butterfly Part V, faster, where the object detection process is arranged much more efficiently. The code below re-runs the tensorflow session at each iteration over the images we want to perform object detection on, and this costs a large time overhead. In the new code, the session is run only once: the first image will take some time, but the subsequent images will be processed very quickly.
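To make the point concrete, here is a minimal sketch (not the full script) of the faster arrangement: the session is created once, and every image is processed inside it. It assumes the detection_graph and helper names from the full prediction.py below, with the tensor_dict / image_tensor lookups hoisted out of the per-image function.

# Sketch only: reuse one tf.Session for all images instead of
# creating a new session for every image.
with detection_graph.as_default():
  with tf.Session() as sess:
    for image_path in TEST_IMAGE_PATHS:
      image_np = load_image_into_numpy_array(Image.open(image_path))
      # the first run is slow (graph setup); subsequent runs are fast
      output_dict = sess.run(tensor_dict,
                             feed_dict={image_tensor: np.expand_dims(image_np, 0)})
      # ... visualize and save, as in the full script below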

In part IV, we ended by completing the training of our faster R-CNN model. Since we ran 2000 training steps, the last model checkpoint produced is model.ckpt-2000. We need to make a frozen graph out of it before we can use it for prediction.

Freezing the graph

Let’s go into the command line, cmd.exe. Remember to activate the virtual environment if you started with one, as we instructed.
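If myproject is the virtual environment from part 1, activation would look roughly like this (adjust to your own setup):

C:\Users\acer\Desktop\adhoc\myproject\Scripts\activate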

cd C:\Users\acer\Desktop\adhoc\myproject\Lib\site-packages\tensorflow\models\research
SET INPUT_TYPE=image_tensor
SET TRAINED_CKPT_PREFIX="C:\Users\acer\Desktop\adhoc\myproject\models\model\model.ckpt-2000"
SET PIPELINE_CONFIG_PATH="C:\Users\acer\Desktop\adhoc\myproject\models\model\faster_rcnn_resnet101_coco.config"
SET EXPORT_DIR="C:\Users\acer\Desktop\adhoc\myproject\models\export"
python object_detection/export_inference_graph.py --input_type=%INPUT_TYPE% --pipeline_config_path=%PIPELINE_CONFIG_PATH% --trained_checkpoint_prefix=%TRAINED_CKPT_PREFIX% --output_directory=%EXPORT_DIR%

Upon successful completion, the following will be produced in the export directory (models\export, i.e. %EXPORT_DIR% above).

+ saved_model
  + variables
  - saved_model.pb
- checkpoint
- frozen_inference_graph.pb
- pipeline.config
- model.ckpt.data-00000-of-00001
- model.ckpt.index
- model.ckpt.meta

Notice that three ckpt files are created. We can use these for further training by replacing the three ckpt files from part 4, as sketched below.
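For example, resuming training could look roughly like this in cmd.exe (a sketch; the paths assume the layout from part 4, and you may also need to edit the checkpoint file in models\model so it points at the copied files):

copy C:\Users\acer\Desktop\adhoc\myproject\models\export\model.ckpt.* C:\Users\acer\Desktop\adhoc\myproject\models\model\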

frozen_inference_graph.pb is the file we will use for prediction. We just need to run the following python file with a suitable configuration. Create the directory shown below and put all the images containing butterflies or bees that you want the algorithm to detect into the folder img_predict. I will put in 4 images: 2 from images/test and 2 completely new pictures that appear in neither images/test nor images/train (but which are also from https://www.pexels.com/). As seen in blue, these images are named predict1.png, predict2.png, predict3.png and predict4.png.

+ ...
+ img_predict
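If the folder does not exist yet, you can create it from cmd.exe:

cd C:\Users\acer\Desktop\adhoc\myproject
mkdir img_predict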

Finally, to perform prediction, just run the following in cmd.exe after moving into the adhoc/myproject folder, where we placed our prediction.py (see the script below).

python prediction.py

and predict1_MARKED.png, for example, will be produced in img_predict, with boxes showing the detected object, either butterfly or bee.

See the blue highlight below; most of the configuration that needs to be done is in blue. The variable TEST_IMAGE_NAMES contains the names of the files we are going to predict on. You can rename the images or just change the variable. Note that in this code the variable filetype stores the file type of the images we are predicting, so each run can only handle images of a single file type. Of course we can do better; one possible modification is sketched below.
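For instance, a sketch of that modification (assuming the PATH_TO_TEST_IMAGES_DIR defined in the script): collect the images with glob, so that mixed file types are handled in a single run, and recover each file's own extension when saving the marked copy.

import glob, os

# Hypothetical replacement for the TEST_IMAGE_NAMES / filetype pair:
# pick up every .png and .jpg in img_predict, skipping earlier outputs.
TEST_IMAGE_PATHS = [p for pattern in ('*.png', '*.jpg')
                    for p in glob.glob(os.path.join(PATH_TO_TEST_IMAGES_DIR, pattern))
                    if '_MARKED' not in p]
for image_path in TEST_IMAGE_PATHS:
  name, ext = os.path.splitext(os.path.basename(image_path))  # e.g. ('predict1', '.png')
  marked_path = os.path.join(PATH_TO_TEST_IMAGES_DIR, name + '_MARKED' + ext)
  # ... run inference and save to marked_path, as in the script

With that aside, here is the full prediction.py: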


# from distutils.version import StrictVersion
import os, sys,tarfile, zipfile
import numpy as np
import tensorflow as tf
import six.moves.urllib as urllib
from PIL import Image
from io import StringIO
from matplotlib import pyplot as plt
from collections import defaultdict
# Paths settings
THE_PATH = "C:/Users/acer/Desktop/adhoc/myproject/Lib/site-packages/tensorflow/models/research"
sys.path.append(THE_PATH)  # make the object_detection package importable from anywhere

from object_detection.utils import ops as utils_ops

PATH_TO_FROZEN_GRAPH = "C:/Users/acer/Desktop/adhoc/myproject/models/export/frozen_inference_graph.pb"
PATH_TO_LABELS = "C:/Users/acer/Desktop/adhoc/myproject/data/butterfly_bee_label_map.pbtxt"
PATH_TO_TEST_IMAGES_DIR = 'C:/Users/acer/Desktop/adhoc/myproject/img_predict'
NUM_CLASSES = 2  # number of classes in the label map: butterfly and bee

TEST_IMAGE_NAMES = ["predict1","predict2","predict3","predict4"]
filetype = '.png'
TEST_IMAGE_PATHS = [''.join((PATH_TO_TEST_IMAGES_DIR, '\\', x, filetype)) for x in TEST_IMAGE_NAMES]
# print("test image path = ", TEST_IMAGE_PATHS)
IMAGE_SIZE = (12, 8) # Size, in inches, of the output images.

from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as vis_util
# MODEL_NAME = 'faster_rcnn_resnet101_pets'

detection_graph = tf.Graph()
with detection_graph.as_default():
  od_graph_def = tf.GraphDef()
  with tf.gfile.GFile(PATH_TO_FROZEN_GRAPH, 'rb') as fid:
    serialized_graph = fid.read()
    od_graph_def.ParseFromString(serialized_graph)  # parse the serialized frozen graph
    tf.import_graph_def(od_graph_def, name='')

label_map  = label_map_util.load_labelmap(PATH_TO_LABELS)
categories = label_map_util.convert_label_map_to_categories(label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
category_index = label_map_util.create_category_index(categories)

def load_image_into_numpy_array(image):
  (im_width, im_height) = image.size
  return np.array(image.getdata()).reshape(
      (im_height, im_width, 3)).astype(np.uint8)

def run_inference_for_single_image(image, graph):
  with graph.as_default():
    with tf.Session() as sess:
      # Get handles to input and output tensors
      ops = tf.get_default_graph().get_operations()
      all_tensor_names = {output.name for op in ops for output in op.outputs}
      tensor_dict = {}
      for key in [
          'num_detections', 'detection_boxes', 'detection_scores',
          'detection_classes', 'detection_masks'
      ]:
        tensor_name = key + ':0'
        if tensor_name in all_tensor_names:
          tensor_dict[key] = tf.get_default_graph().get_tensor_by_name(
              tensor_name)
      if 'detection_masks' in tensor_dict:
        # The following processing is only for single image
        detection_boxes = tf.squeeze(tensor_dict['detection_boxes'], [0])
        detection_masks = tf.squeeze(tensor_dict['detection_masks'], [0])
        # Reframe is required to translate mask from box coordinates to image coordinates and fit the image size.
        real_num_detection = tf.cast(tensor_dict['num_detections'][0], tf.int32)
        detection_boxes = tf.slice(detection_boxes, [0, 0], [real_num_detection, -1])
        detection_masks = tf.slice(detection_masks, [0, 0, 0], [real_num_detection, -1, -1])
        detection_masks_reframed = utils_ops.reframe_box_masks_to_image_masks(
            detection_masks, detection_boxes, image.shape[0], image.shape[1])
        detection_masks_reframed = tf.cast(
            tf.greater(detection_masks_reframed, 0.5), tf.uint8)
        # Follow the convention by adding back the batch dimension
        tensor_dict['detection_masks'] = tf.expand_dims(
            detection_masks_reframed, 0)
      image_tensor = tf.get_default_graph().get_tensor_by_name('image_tensor:0')

      # Run inference
      output_dict = sess.run(tensor_dict,
                             feed_dict={image_tensor: np.expand_dims(image, 0)})

      # all outputs are float32 numpy arrays, so convert types as appropriate
      output_dict['num_detections'] = int(output_dict['num_detections'][0])
      output_dict['detection_classes'] = output_dict[
          'detection_classes'][0].astype(np.uint8)
      output_dict['detection_boxes'] = output_dict['detection_boxes'][0]
      output_dict['detection_scores'] = output_dict['detection_scores'][0]
      if 'detection_masks' in output_dict:
        output_dict['detection_masks'] = output_dict['detection_masks'][0]
  return output_dict

for image_path, image_name in zip(TEST_IMAGE_PATHS, TEST_IMAGE_NAMES):
  image = Image.open(image_path)
  # the array based representation of the image will be used later in order to prepare the
  # result image with boxes and labels on it.
  image_np = load_image_into_numpy_array(image)
  # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
  image_np_expanded = np.expand_dims(image_np, axis=0)
  # Actual detection.
  output_dict = run_inference_for_single_image(image_np, detection_graph)
  # Visualization of the results of a detection.
  # Each element in detection_boxes is [ymin, xmin, ymax, xmax] in normalized
  # coordinates; to convert to pixels:
  #   im_width, im_height = image.size
  #   (left, right, top, bottom) = (xmin * im_width, xmax * im_width,
  #                                 ymin * im_height, ymax * im_height)
  vis_util.visualize_boxes_and_labels_on_image_array(
      image_np,
      output_dict['detection_boxes'],
      output_dict['detection_classes'],
      output_dict['detection_scores'],
      category_index,
      instance_masks=output_dict.get('detection_masks'),
      use_normalized_coordinates=True,
      line_thickness=8,
      min_score_thresh=0.4)
  # print("detection_boxes:")
  # print(output_dict['detection_boxes'])
  # print(type(output_dict['detection_boxes']),len(output_dict['detection_boxes']))
  # print('detection_classes')
  # print(output_dict['detection_classes'])
  # print(type(output_dict['detection_classes']),len(output_dict['detection_classes']))
  # print('detection_scores')
  # print(output_dict['detection_scores'], len(output_dict['detection_scores']))
  print('\n**************** detection_scores\n')
  # plt.imshow(image_np)
  plt.imsave(''.join((PATH_TO_TEST_IMAGES_DIR, '\\',image_name,"_MARKED", filetype)), image_np)

Here are the outputs; pretty good, I would say.


Note: the commented code is there to assist you with the output of the prediction. For example, if you would like to extract the coordinates of the rectangles that show the positions of the butterflies or the bees, you can obtain them from output_dict['detection_boxes']. Other information is stored in the dictionary output_dict as well.
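For instance, a minimal sketch of converting the normalized boxes into pixel coordinates, using the image, output_dict and the 0.4 threshold from the script above:

# Each box is [ymin, xmin, ymax, xmax] in normalized [0, 1] coordinates.
im_width, im_height = image.size
for box, score in zip(output_dict['detection_boxes'],
                      output_dict['detection_scores']):
  if score < 0.4:  # same cutoff as min_score_thresh in the script
    continue
  ymin, xmin, ymax, xmax = box
  (left, right, top, bottom) = (xmin * im_width, xmax * im_width,
                                ymin * im_height, ymax * im_height)
  print('box in pixels:', left, top, right, bottom, 'score:', score)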

You can play around with different models. But that’s it for now, cheers!