Object Detection using Tensorflow: coco API for python in Windows

Object Detection using Tensorflow: bee and butterflies

  1. Part 1: set up tensorflow in a virtual environment
  2. adhoc functions
  3. Part 2: preparing annotation in PASCAL VOC format
  4. Part 3: preparing tfrecord files
  5. more scripts
  6. Part 4: start training our machine learning algorithm!
  7. COCO API for Windows
  8. Part 5: perform object detection

We continue our tutorial from Part IV. The official instructions for setting up the COCO API for Python (linked here) are written for Linux, so we need a way to do it in Windows. First download the COCO API and extract it; you will see the folder cocoapi-master. The following instructions mainly follow the guide linked here.

We need to complete the following steps beforehand:

  1. Install Microsoft Visual C++ 14.0. ***
  2. Visual C++ might raise an rc.exe error. Fix it by adding C:\Program Files (x86)\Windows Kits\8.1\bin\x64 to the PATH variable.
  3. Inside cocoapi-master/PythonAPI, edit setup.py (see below).

Install MinGW and open msys.exe. Move into the PythonAPI folder of the COCO API just downloaded.

cd "C:\Users\acer\Downloads\cocoapi-master\PythonAPI"

If successful, the file _mask.cp36-win_amd64 will be generated in /PythonAPI/pycocotools. Move the folder pycocotools so that we have

+ pycocotools
 - ...
 - _mask.cp36-win_amd64
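
A quick way to confirm that the build produced the compiled extension is to look for it on disk. The following is a minimal sketch, assuming the compiled file name starts with `_mask` and ends in `.pyd` (Windows) or `.so` (Linux); the helper name `find_compiled_mask` is ours and is not part of the COCO API.

```python
import glob
import os

def find_compiled_mask(pycocotools_dir):
    """Return compiled _mask extension files found in pycocotools_dir."""
    found = []
    # Assumption: the extension is named _mask*.pyd on Windows, _mask*.so on Linux.
    for pattern in ("_mask*.pyd", "_mask*.so"):
        found.extend(glob.glob(os.path.join(pycocotools_dir, pattern)))
    return sorted(found)

if __name__ == "__main__":
    # Adjust to wherever cocoapi-master was extracted on your machine.
    folder = r"C:\Users\acer\Downloads\cocoapi-master\PythonAPI\pycocotools"
    matches = find_compiled_mask(folder)
    print(matches if matches else "No compiled _mask extension found.")
```

If the list comes back empty, the build step most likely failed and the compiler output is the place to look.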

The modified setup.py is shown here.


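import sys
from distutils.core import setup
from Cython.Build import cythonize
from distutils.extension import Extension
import numpy as np

# The GCC-style flags are not understood by Visual C++, so drop them on Windows.
extra_compile_args = ['-Wno-cpp', '-Wno-unused-function', '-std=c99']\
  if sys.platform != 'win32' else []

ext_modules = [
    Extension(
        'pycocotools._mask',
        sources=['../common/maskApi.c', 'pycocotools/_mask.pyx'],
        include_dirs = [np.get_include(), '../common'],
        extra_compile_args=extra_compile_args,
    )
]

setup(name='pycocotools',
      packages=['pycocotools'],
      package_dir = {'pycocotools': 'pycocotools'},
      version='2.0',
      ext_modules=cythonize(ext_modules))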

*** We attempted to install VC++ 14 via the Visual Studio Community 2017 installer on Windows 10 and ran into trouble. In particular, msys.exe raised the error “cannot find vcvarsall.bat”. Indeed, when we looked into “C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC”, vcvarsall.bat was not there.

Instead, we downloaded Visual Studio Community 2015 here and installed the Programming Languages/Visual C++ package as shown here. A web installer is available, but the download did not go smoothly. When the downloader says “A Setup Package is Missing or Damaged”, just keep clicking “Retry” until it works.

Object Detection using Tensorflow: bee and butterfly Part IV

We have prepared tfrecord files, which are basically just the images and annotations bundled into a format that we can feed into our tensorflow algorithm. Now we start the training.

Before proceeding, we need to use coco API for python. It is given here, though the instruction given is to set up for Linux. See here for the instruction to set it up in Windows. Once done, copy the entire folder pycocotools from inside PythonAPI into the following folder.


Some recap: in part I, we have set the label map but have not configured the model we will use.


item {
  id: 1
  name: 'butterfly'
}

item {
  id: 2
  name: 'bee'
}
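
If more insect classes are added later, the label map can be generated rather than typed by hand. The following is a minimal sketch; the function name `make_label_map` is ours, and ids start at 1 because the object detection API reserves 0 for the background class.

```python
def make_label_map(names):
    """Build label map text for the given class names, with ids starting at 1."""
    items = []
    for idx, name in enumerate(names, start=1):
        items.append("item {\n  id: %d\n  name: '%s'\n}\n" % (idx, name))
    return "\n".join(items)

# Same two classes as in this tutorial.
print(make_label_map(["butterfly", "bee"]))
```

The printed text can be saved as butterfly_bee_label_map.pbtxt.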

Download the pre-trained model* by going to this link and clicking “Downloading a COCO-pretrained Model for Transfer Learning”, or download it directly here. Unzip the .tar file (install WinRAR if your system cannot unzip it). Among the unzipped items there will be 3 checkpoint files (.ckpt). Move them into our model directory so that we have:

+ faster_rcnn_resnet101_coco.config
+ model.ckpt.data-00000-of-00001
+ model.ckpt.index
+ model.ckpt.meta

We will now configure each PATH_TO_BE_CONFIGURED in faster_rcnn_resnet101_coco.config inside the folder adhoc/myproject/models/model/. As the name suggests, we are using Faster R-CNN, which builds on R-CNN (regions with convolutional neural network features) by Ross Girshick et al. There are 5 PATH_TO_BE_CONFIGURED entries, each of which must point to the corresponding file.

train_config: {
  ...
  # fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"
  fine_tune_checkpoint: "C:/Users/acer/Desktop/adhoc/myproject/models/model/model.ckpt"
  ...
}

train_input_reader: {
  tf_record_input_reader {
    # input_path: "PATH_TO_BE_CONFIGURED/mscoco_train.record-?????-of-00100"
    input_path: "C:/Users/acer/Desktop/adhoc/myproject/data/train.record"
  }
  # label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt"
  label_map_path: "C:/Users/acer/Desktop/adhoc/myproject/data/butterfly_bee_label_map.pbtxt"
}

eval_input_reader: {
  tf_record_input_reader {
    # input_path: "PATH_TO_BE_CONFIGURED/mscoco_val.record-?????-of-00010"
    input_path: "C:/Users/acer/Desktop/adhoc/myproject/data/test.record"
  }
  # label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt"
  label_map_path: "C:/Users/acer/Desktop/adhoc/myproject/data/butterfly_bee_label_map.pbtxt"
  shuffle: false
  num_readers: 1
}
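
Before training, it can help to confirm that no active line still contains the placeholder. The following is a minimal sketch; the helper name `find_unconfigured` is ours. It skips lines that begin with `#`, since those are the commented-out originals.

```python
def find_unconfigured(config_text, placeholder="PATH_TO_BE_CONFIGURED"):
    """Return (line_number, line) pairs for active lines still holding the placeholder."""
    hits = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # commented-out original line, safe to ignore
        if placeholder in stripped:
            hits.append((number, stripped))
    return hits

sample = '''train_config: {
  # fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"
  fine_tune_checkpoint: "C:/Users/acer/Desktop/adhoc/myproject/models/model/model.ckpt"
}
eval_input_reader: {
  label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt"
}'''
# Only the label_map_path line on line 6 is reported.
print(find_unconfigured(sample))
```

To check the real file, read faster_rcnn_resnet101_coco.config into a string and pass it to the function; an empty list means all five paths have been set.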

Let us start training! Here we use a small number of training and evaluation steps so that training finishes quickly, at the expense of accuracy. The official tutorial recommends NUM_TRAIN_STEPS=50000 and NUM_EVAL_STEPS=2000. The command below reads NUM_TRAIN_STEPS and NUM_EVAL_STEPS from the environment, so SET both in the same session first (we specified 2000 training steps). Go to the command line, cmd.exe.

cd "C:\Users\acer\Desktop\adhoc\myproject\Lib\site-packages\tensorflow\models\research"

SET PIPELINE_CONFIG_PATH="C:\Users\acer\Desktop\adhoc\myproject\models\model\faster_rcnn_resnet101_coco.config"
SET MODEL_DIR="C:\Users\acer\Desktop\adhoc\myproject\models\model"
echo %MODEL_DIR%
python object_detection/model_main.py --pipeline_config_path=%PIPELINE_CONFIG_PATH% --model_dir=%MODEL_DIR% --num_train_steps=%NUM_TRAIN_STEPS% --num_eval_steps=%NUM_EVAL_STEPS% --alsologtostderr

Some possible errors are listed at the end of this post***.

Many warnings may pop up, but it will be fairly obvious whether the training is ongoing (and not terminated by some error). If you train your model on a laptop like mine, with only a single NVIDIA GeForce GTX 1050, you might run out of memory as well. In Task Manager, my utilization profile shows the GPU being used in consistent large spikes (more than 25% each), with CPU resources heavily consumed.

See the next part for how to use the trained model to perform object detection once the training is completed.

Note: I noticed some strange behaviour. Sometimes the training stalls (Task Manager shows no resource consumption) and only proceeds when I press Enter in the command line. To make sure this does not prevent the training from completing, press Enter several times in advance at the start of training.

Monitoring progress

We can monitor progress using tensorboard. Open another command line cmd.exe, enter the following

SET MODEL_DIR="C:\Users\acer\Desktop\adhoc\myproject\models\model"
tensorboard --logdir=%MODEL_DIR%

Then open your web browser and go to the URL formed by your computer name and port 6006. For mine, it is http://NITROGIGA:6006. About 15 minutes into the training, tensorboard shows the following:


Okay, it seems like the algorithm is detecting some yellow little parts of the flowers as a bee as well…

The following is an indication that the training is in progress.

creating index...
index created!
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=0.12s).
Accumulating evaluation results...
DONE (t=0.04s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.015
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.056
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.003
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.052
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.040
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.087
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.093
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000

and in the directory adhoc/myproject/models/model, more checkpoint files will be created in batches of 3: data, index and meta. For example, at the 58th (out of the specified 2000) training step, these are created.


Update: the training lasted about 4 hours.

*** Possible errors

If you see errors like

(unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape

check the .config file. When replacing PATH_TO_BE_CONFIGURED, use either a double backslash \\ or a single forward slash /. A single backslash \ causes this error because it starts an escape sequence inside the string (here \U from C:\Users).
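
One way to avoid the escaping problem altogether is to convert a copied Windows path into its forward-slash form before pasting it into the .config file. The following is a minimal sketch using the standard library; the helper name `to_config_path` is ours.

```python
from pathlib import PureWindowsPath

def to_config_path(windows_path):
    """Convert a Windows path to the forward-slash form that is safe in .config strings."""
    return PureWindowsPath(windows_path).as_posix()

# The raw string (r"...") keeps Python itself from reading \U as an escape.
print(to_config_path(r"C:\Users\acer\Desktop\adhoc\myproject\data\train.record"))
```

This prints C:/Users/acer/Desktop/adhoc/myproject/data/train.record, which can be used directly as an input_path value.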

*** Another possible error.

We successfully used tensorflow-gpu version 1.10.0. However, when trying it with version 1.11.0, our machine does not recognize the gpu.

Object Detection using Tensorflow: More Scripts

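The following two scripts, xml_to_csv.py and generate_tf_records.py, are used in Part 3 to prepare the tfrecord files.


Note that the hard-coded project path in main() of each script needs to be configured. Besides, the input images are read from the folder /images and the output of xml_to_csv.py is saved to the folder /data.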

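import os
import glob
import pandas as pd
import xml.etree.ElementTree as ET

def xml_to_csv(path):
    # Collect one row per <object> in every PASCAL VOC xml file under path.
    xml_list = []
    for xml_file in glob.glob(path + '/*.xml'):
        tree = ET.parse(xml_file)
        root = tree.getroot()
        size = root.find('size')
        for member in root.findall('object'):
            box = member.find('bndbox')
            value = (root.find('filename').text,
                     int(size.find('width').text),
                     int(size.find('height').text),
                     member.find('name').text,
                     int(box.find('xmin').text),
                     int(box.find('ymin').text),
                     int(box.find('xmax').text),
                     int(box.find('ymax').text))
            xml_list.append(value)
    column_name = ['filename', 'width', 'height', 'class', 'xmin', 'ymin', 'xmax', 'ymax']
    xml_df = pd.DataFrame(xml_list, columns=column_name)
    return xml_df

def main():
    for directory in ['train','test']:
        # Configure this path to point at your own project folder.
        image_path = os.path.join("C:\\Users\\acer\\Desktop\\adhoc\\myproject\\", 'images/{}'.format(directory))
        xml_df = xml_to_csv(image_path)
        xml_df.to_csv('data/{}_labels.csv'.format(directory), index=None)
        print('Successfully converted xml to csv.')

if __name__ == '__main__':
    main()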



"""
Usage:
  # From tensorflow/models/
  # Create train data:
  python generate_tf_records.py --csv_input=data/train_labels.csv  --output_path=train.record
  # Create test data:
  python generate_tf_records.py --csv_input=data/test_labels.csv  --output_path=test.record
"""
from __future__ import division
from __future__ import print_function
from __future__ import absolute_import

import os
import io
import pandas as pd
import tensorflow as tf

from PIL import Image
from object_detection.utils import dataset_util
from collections import namedtuple, OrderedDict

flags = tf.app.flags
flags.DEFINE_string('csv_input', '', 'Path to the CSV input')
flags.DEFINE_string('output_path', '', 'Path to output TFRecord')
FLAGS = flags.FLAGS

# TO-DO replace this with label map
def class_text_to_int(row_label):
    if row_label == 'butterfly':
        return 1
    elif row_label == 'bee':
        return 2

def split(df, group):
    data = namedtuple('data', ['filename', 'object'])
    gb = df.groupby(group)
    return [data(filename, gb.get_group(x)) for filename, x in zip(gb.groups.keys(), gb.groups)]

def create_tf_example(group, path):
    with tf.gfile.GFile(os.path.join(path, '{}'.format(group.filename)), 'rb') as fid:
        encoded_jpg = fid.read()
    encoded_jpg_io = io.BytesIO(encoded_jpg)
    image = Image.open(encoded_jpg_io)
    width, height = image.size

    filename = group.filename.encode('utf8')
    image_format = b'jpg'
    xmins = []
    xmaxs = []
    ymins = []
    ymaxs = []
    classes_text = []
    classes = []

    for index, row in group.object.iterrows():
        # Box coordinates are stored as fractions of the image size.
        xmins.append(row['xmin'] / width)
        xmaxs.append(row['xmax'] / width)
        ymins.append(row['ymin'] / height)
        ymaxs.append(row['ymax'] / height)
        classes_text.append(row['class'].encode('utf8'))
        classes.append(class_text_to_int(row['class']))

    tf_example = tf.train.Example(features=tf.train.Features(feature={
        'image/height': dataset_util.int64_feature(height),
        'image/width': dataset_util.int64_feature(width),
        'image/filename': dataset_util.bytes_feature(filename),
        'image/source_id': dataset_util.bytes_feature(filename),
        'image/encoded': dataset_util.bytes_feature(encoded_jpg),
        'image/format': dataset_util.bytes_feature(image_format),
        'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
        'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
        'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
        'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
        'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
        'image/object/class/label': dataset_util.int64_list_feature(classes),
    }))
    return tf_example

def main(_):
    writer = tf.python_io.TFRecordWriter(FLAGS.output_path)
    path = os.path.join("C:\\Users\\acer\\Desktop\\adhoc\\myproject\\", 'images')
    examples = pd.read_csv(FLAGS.csv_input)
    grouped = split(examples, 'filename')
    for group in grouped:
        tf_example = create_tf_example(group, path)
        # Serialize each example and append it to the record file.
        writer.write(tf_example.SerializeToString())

    writer.close()
    output_path = "C:\\Users\\acer\\Desktop\\adhoc\\myproject\\"
    print('Successfully created the TFRecords: {}'.format(output_path))

if __name__ == '__main__':
    tf.app.run()

Object Detection using Tensorflow: bee and butterfly Part II

Tips: remember to activate the virtual environment if you have deactivated it. A virtual environment helps ensure that the packages we install do not interfere with the system or other projects, especially when we need an older version of some package.

We continue from Part I. Let us prepare the data to feed into the algorithm for training. We will not feed the images into the algorithm directly, but will convert them into tfrecord files. Create the following directory for the preparation. I named it keropb, you can name it anything.

+ butterflies_and_bees
  + Butterflies
    - butterflyimage1.png 
    - ...
  + Butterflies_canvas
    - butterflyimage1.png
  + Bees
    - beeimage1.png
    - ...
  + Bees_canvas
    - beeimage1.png
    - ...
+ do_clone_to_annotate.py
+ do_convert_to_PASCALVOC.py
+ do_move_a_fraction.py
+ adhoc_functions.py

Note: the image folders and the corresponding canvas folders can be downloaded here. Also, do not worry, the last 4 python files will be provided along the way.

We store all our butterfly images in the folder Butterflies and all bee images in the folder Bees. The _canvas folders start as exact replicas of the corresponding folders: copy-paste both the Butterflies and Bees folders and rename the copies. In the canvas folders, however, we mark out the butterflies and the bees. In a sense, we are teaching the algorithm which objects in the pictures are butterflies and which are bees. To mark out a butterfly, block it out with white, RGB (255,255,255). This is easy to do: open the image in the good ol’ Paint program and paint over the butterfly in white, or use the eraser. See the example below. Note that each canvas image has exactly the same name as its original.

Tips: if the image contains white patches, they might be wrongly detected as a butterfly too. In that case, paint these irrelevant white patches with another obvious color, such as black.


Install the package kero and its dependencies.

pip install kero
pip install opencv-python
pip install pandas

Tips. Consider using clone_to_annotate_faster(). It is A LOT faster with a little trade off on the accuracy of bounding boxes on rotated images. The step by step instruction to do this can be found in Object Detection using Tensorflow: bee and butterfly Part II, faster. If you do, we can skip the following steps and skip the front part of Part III. Follow the instruction there.

Create and run the following script do_clone_to_annotate.py from adhoc/keropb i.e. in cmd.exe, cd into keropb and run the command

python do_clone_to_annotate.py

Tips: We have set check_missing_mode=False. It is good to set it to True first. This checks whether each image in Butterflies has a corresponding image in Butterflies_canvas, so that we can identify and fix missing images before processing. If everything is fine, “ALL GREEN. No missing files.” will be printed. Then set it to False and run the script again.
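
The same kind of check can be done independently of kero. The following is a minimal sketch using only the standard library; it is not kero's implementation, and the helper name `find_missing_canvas` is ours. It relies on the convention above that a canvas image has exactly the same file name as its original.

```python
import os

def find_missing_canvas(image_folder, canvas_folder):
    """Return image file names that have no same-named file in canvas_folder."""
    images = set(os.listdir(image_folder))
    canvases = set(os.listdir(canvas_folder))
    return sorted(images - canvases)

if __name__ == "__main__":
    # Adjust the folders accordingly.
    base = r"C:\Users\acer\Desktop\adhoc\keropb\butterflies_and_bees"
    src = os.path.join(base, "Butterflies")
    canvas = os.path.join(base, "Butterflies_canvas")
    if os.path.isdir(src) and os.path.isdir(canvas):
        missing = find_missing_canvas(src, canvas)
        print(missing if missing else "No missing files.")
```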


import kero.ImageProcessing.photoBox as kip

# adjust the folders accordingly
this_folder = "C:\\Users\\acer\\Desktop\\adhoc\\keropb\\butterflies_and_bees\\Butterflies"
tag_folder = "C:\\Users\\acer\\Desktop\\adhoc\\keropb\\butterflies_and_bees\\Butterflies_canvas"

rotate_angle_set = [0,30,60,90,120,150,180] # None
annotation_name = "butterfly"
gsw.clone_to_annotate(this_folder, tag_folder,1,annotation_name,

Note: set order_name and tag_name to be the same so that adhoc_functions.py need not be adjusted later. See that Bees_LOG.txt and Butterflies_LOG.txt are created also, listing how the image files are renamed.

Tips: Read ahead. We will be doing the same thing for the Bees folder, so go ahead and create a copy of the script, name it do_clone_to_annotate2.py, and run it in a new cmd.exe window so that both processes run in parallel and save time.

Tips: If annotation fails for one reason or another after ground truth image generation is complete, then make sure to set skip_ground_truth=True before rerunning the script, so that we do not waste time re-spawning the ground truth images.

This will create Butterflies_CLONE, Butterflies_GT and Butterflies_ANNOT folders.

  1. The CLONE folder contains the images from Butterflies folder, but rotated to different angles as specified by the variable rotate_angle_set. This is to create more training images, so that the algorithm will learn to recognise the object even if it is tilted.
  2. The GT folder contains the ground truth images, set to black and white. White patch will be (desirably) the object we point to. Note that this may not be perfect and more settings will be available as we develop the package to optimize this.
  3. The ANNOT folder contains annotations, which are boxes to show where the object, butterfly or bee, is. This information is stored in txt file which contains the information in the format:
    label height width xmin ymin xmax ymax

    where label is either bee or butterfly; height and width are the height and width of the entire image. The image will be saved together with the annotation box as shown below.
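
The annotation format above is simple enough to read back with a few lines of Python. The following is a minimal sketch, assuming the fields are tab-separated (as the comment in adhoc_functions.py states); the names `Annotation` and `parse_annotation_line` are ours, and the sample values are illustrative.

```python
from collections import namedtuple

Annotation = namedtuple("Annotation", "label height width xmin ymin xmax ymax")

def parse_annotation_line(line):
    """Parse 'label<TAB>height<TAB>width<TAB>xmin<TAB>ymin<TAB>xmax<TAB>ymax'."""
    fields = line.strip().split("\t")
    if len(fields) != 7:
        raise ValueError("expected 7 tab-separated fields, got %d" % len(fields))
    label = fields[0]
    # int(float(...)) tolerates values written as "151" or "151.0".
    numbers = [int(float(value)) for value in fields[1:]]
    return Annotation(label, *numbers)

box = parse_annotation_line("butterfly\t350\t524\t151\t9\t424\t224")
print(box.label, box.width, box.height, box.xmax - box.xmin)
```

Note the field order: height comes before width in the txt file, which is the reverse of the order used later in the csv files.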


Notice that we did this for the Butterflies folder. Do it for the Bees folder as well. Also, I am using only about 30 images for each category, bee and butterfly (you should use more). Using the above code, we perform 6x rotations on each image, by the angles specified in the variable rotate_angle_set, so that the algorithm can recognise the same object even if it appears in a different orientation. Note that at the time of writing, research on DNNs is still ongoing, and more robust image classification that can handle transformations such as rotation might become available in the future. In total, then, we have about 180 images each.

To make tfrecord files that we will feed into the algorithm, we will need to convert this information further into PASCAL VOC format. Run the following script do_convert_to_PASCALVOC.py from adhoc/keropb. (See adhoc_functions.py here)

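import adhoc_functions as af

# Butterflies
annot_foldername = "C:\\Users\\acer\\Desktop\\adhoc\\keropb\\butterflies_and_bees\\Butterflies_ANNOT"
annot_filetype = ".txt"
img_foldername = "C:\\Users\\acer\\Desktop\\adhoc\\keropb\\butterflies_and_bees\\Butterflies_CLONE"
img_filetype = ".png"
# argument order assumed to match mass_convert_to_PASCAL_VOC_xml() in adhoc_functions.py
af.mass_convert_to_PASCAL_VOC_xml(annot_foldername, annot_filetype, img_foldername, img_filetype)

# Bees
annot_foldername = "C:\\Users\\acer\\Desktop\\adhoc\\keropb\\butterflies_and_bees\\Bees_ANNOT"
annot_filetype = ".txt"
img_foldername = "C:\\Users\\acer\\Desktop\\adhoc\\keropb\\butterflies_and_bees\\Bees_CLONE"
img_filetype = ".png"
af.mass_convert_to_PASCAL_VOC_xml(annot_foldername, annot_filetype, img_foldername, img_filetype)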

A bunch of xml files, each corresponding to a butterfly or bee image, will be created in the _ANNOT folder. Each file follows the PASCAL VOC layout: an annotation root holding the file name, the image size, and one object entry per bounding box.
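
To make the layout concrete, the following is a minimal sketch that builds a reduced PASCAL VOC annotation with the standard library. It is an illustration, not the code in adhoc_functions.py: the function name `build_voc_xml` is ours, only the filename, size and object/bndbox fields are written, and the sample values are illustrative.

```python
import xml.etree.ElementTree as ET

def build_voc_xml(filename, width, height, objects):
    """Build annotation xml; objects is a list of (label, xmin, ymin, xmax, ymax)."""
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    ET.SubElement(size, "depth").text = "3"  # assumes 3-channel colour images
    for label, xmin, ymin, xmax, ymax in objects:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = label
        box = ET.SubElement(obj, "bndbox")
        for tag, value in zip(("xmin", "ymin", "xmax", "ymax"), (xmin, ymin, xmax, ymax)):
            ET.SubElement(box, tag).text = str(value)
    return ET.tostring(root, encoding="unicode")

print(build_voc_xml("imgBEE_107.png", 524, 350, [("butterfly", 151, 9, 424, 224)]))
```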

Good! We are ready to create tfrecords files in Part III.

Object Detection using Tensorflow: bee and butterfly Part III

In part II we have created a directory storing butterflies and bees images, together with all the annotations showing where in each image a butterfly or a bee is. Now we convert them into tfrecord files, i.e. convert them into the format that the tensorflow algorithm we use can read.

You are encouraged to create an adhoc script to automate this whole part as well. Our demonstrations will be semi-manual. This part follows the steps recommended here.

Train and test split

Create the following empty folders:

+ Butterflies_train
+ Butterflies_test
+ Bees_train
+ Bees_test
+ ...

From Butterflies_CLONE, copy all images to Butterflies_train. From Butterflies_ANNOT, copy all xml files to the same Butterflies_train folder. Do the corresponding steps to the Bees. Now run the script do_move_a_fraction.py in the folder adhoc/keropb (We again make use of adhoc_functions.py from here).

import adhoc_functions as af



What we are doing above is moving 10% of the images and annotations from the train folders to the corresponding test folders. We use this roughly 10% to test whether the model trained on the remaining 90% performs well. As of now, I have 291 images of bees and butterflies for training and 31 for testing (yes, by right we should have more).
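
For readers who prefer a reproducible split, the following is a minimal sketch of the same idea using only the standard library. It is not the function in adhoc_functions.py: the name `move_fraction` and its parameters are ours, and it moves a fixed fraction chosen with a seeded random generator, so the same files are selected on every run.

```python
import os
import random
import shutil

def move_fraction(src, tgt, fraction=0.1, seed=0, image_exts=(".png", ".jpg")):
    """Move about `fraction` of the images in src (plus same-named .xml files) to tgt."""
    images = sorted(f for f in os.listdir(src) if f.lower().endswith(image_exts))
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    n_move = int(round(len(images) * fraction))
    chosen = rng.sample(images, n_move)
    for name in chosen:
        shutil.move(os.path.join(src, name), os.path.join(tgt, name))
        xml_name = os.path.splitext(name)[0] + ".xml"
        if os.path.exists(os.path.join(src, xml_name)):
            # keep each annotation together with its image
            shutil.move(os.path.join(src, xml_name), os.path.join(tgt, xml_name))
    return chosen

# Example call; adjust the two folder paths to your own layout:
# move_fraction("Butterflies_train", "Butterflies_test", fraction=0.1)
```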

Now create the following directory in adhoc/myproject.

+ images
  + train
  + test

Put all files from Butterflies_train and Bees_train into images/train and all files from Butterflies_test and Bees_test into images/test.


Conversion to tfrecords

The following step will be quite memory inefficient. Copy all files from images/train and images/test into the images folder. We will need it.

Add the following file to the directory

+ ...
+ xml_to_csv.py

Note that this file, as shown here, needs to be configured. The variable image_path has to point to the train and test folders in images folder in adhoc/myproject. See the instruction in the link. Now go to the command line cmd.exe.

cd "C:\Users\acer\Desktop\adhoc\myproject"
python xml_to_csv.py

Both test_labels.csv and train_labels.csv will be produced in adhoc/myproject/data if the process is successful. Check that the csv files contain something like this:

filename, width, height, class, xmin, ymin, xmax, ymax
imgBEE_107.png, 524, 350, butterfly, 151, 9, 424, 224
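
Beyond eyeballing the file, a short script can flag rows whose bounding box is empty or falls outside the image. The following is a minimal sketch using the standard csv module; the helper name `find_bad_boxes` is ours, and the column names are the ones shown above.

```python
import csv
import io

def find_bad_boxes(csv_text):
    """Return file names whose boxes are empty or fall outside the image."""
    bad = []
    # skipinitialspace tolerates a space after each comma, as in the sample above.
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    for row in reader:
        width, height = int(row["width"]), int(row["height"])
        xmin, ymin = int(row["xmin"]), int(row["ymin"])
        xmax, ymax = int(row["xmax"]), int(row["ymax"])
        inside = 0 <= xmin < xmax <= width and 0 <= ymin < ymax <= height
        if not inside:
            bad.append(row["filename"])
    return bad

sample = """filename, width, height, class, xmin, ymin, xmax, ymax
imgBEE_107.png, 524, 350, butterfly, 151, 9, 424, 224
"""
print(find_bad_boxes(sample))  # an empty list means every box is valid
```

For the real files, read data/train_labels.csv or data/test_labels.csv into a string and pass it in.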

Also, add the following file to the directory

+ ...
+ generate_tf_records.py

This file is also shown here, and needs to be configured similarly. In main() of the script, adjust the variable path and output_path (see green highlight in the link) to the following.

path = os.path.join("C:\\Users\\acer\\Desktop\\adhoc\\myproject\\", 'images')
output_path = "C:\\Users\\acer\\Desktop\\adhoc\\myproject\\"

Also, edit the following function to correspond to the label map in the case you want to add more types of insects (see the orange highlight in the link).

def class_text_to_int(row_label):
    if row_label == 'butterfly':
        return 1
    elif row_label == 'bee':
        return 2
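
As the number of classes grows, the if/elif chain can be replaced by a lookup built from the label map itself, so the two never drift apart. The following is a minimal sketch, assuming each item lists id before name as in our label map; the helper name `label_map_to_dict` is ours.

```python
import re

def label_map_to_dict(label_map_text):
    """Parse 'item { id: N name: "x" }' blocks into a {name: id} dictionary."""
    # Assumption: within each item, id comes before name.
    pattern = re.compile(r"""item\s*\{\s*id:\s*(\d+)\s*name:\s*['"]([^'"]+)['"]\s*\}""")
    return {name: int(idx) for idx, name in pattern.findall(label_map_text)}

LABEL_MAP = label_map_to_dict("""
item {
  id: 1
  name: 'butterfly'
}
item {
  id: 2
  name: 'bee'
}
""")

def class_text_to_int(row_label):
    # Fail loudly on an unknown class instead of silently returning None.
    if row_label not in LABEL_MAP:
        raise ValueError("unknown class label: %r" % row_label)
    return LABEL_MAP[row_label]

print(class_text_to_int("butterfly"), class_text_to_int("bee"))
```

In practice the label map text would be read from butterfly_bee_label_map.pbtxt rather than written inline.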

Now go to the command line cmd.exe, move into directory tensorflow\models\research\object_detection using

cd "C:\Users\acer\Desktop\adhoc\myproject\Lib\site-packages\tensorflow\models\research\object_detection"

Create the tfrecord files using the following. Of course the paths arguments output_path and csv_input must be changed accordingly.

python generate_tf_records.py --csv_input="C:/Users/acer/Desktop/adhoc/myproject/data/train_labels.csv" --output_path="C:/Users/acer/Desktop/adhoc/myproject/data/train.record"
python generate_tf_records.py --csv_input="C:/Users/acer/Desktop/adhoc/myproject/data/test_labels.csv" --output_path="C:/Users/acer/Desktop/adhoc/myproject/data/test.record"

Both test.record and train.record will be produced in adhoc/myproject/data.

See the next part, part IV, for training and prediction.


Object Detection using Tensorflow: adhoc functions

The following script contains adjustable ad hoc functions, written for the convenience of this object detection tutorial. Of course, the same steps can be done manually or differently.


import cv2, os, sys

def move_ALL_to_train(folder_name_CLONE, folder_name_ANNOT,rotate_angle_set, folder_target):
	img_in_CLONE =  [file for file in os.listdir(folder_name_CLONE)]
	xml_in_ANNOT =  [file for file in os.listdir(folder_name_ANNOT) if file.endswith("xml")]
	for x in img_in_CLONE:
		# print(x)
		copyfile(folder_name_CLONE+"\\"+x, folder_target+"\\"+x)
	for x in img_in_CLONE:
		num_label = 0 
		for angle_in_deg in rotate_angle_set:
			rotated_name = "R" + str(num_label) +"_"+ x
			# print(rotated_name)
			img_to_rotate = cv2.imread(''.join((folder_name_CLONE,"\\",x)))
			rows,cols,_ = img_to_rotate.shape
			M = cv2.getRotationMatrix2D((cols/2,rows/2),angle_in_deg,1)
			dst = cv2.warpAffine(img_to_rotate,M,(cols,rows))
			cv2.imwrite(folder_target+"\\"+rotated_name, dst)
			num_label = num_label + 1
	for x in xml_in_ANNOT:
		# copy the annotation xml files alongside the images
		copyfile(folder_name_ANNOT+"\\"+x, folder_target+"\\"+x)

def mass_convert_to_PASCAL_VOC_xml(annot_foldername,annot_filetype ,
	img_foldername, img_filetype):
	txt_files = [file for file in os.listdir(annot_foldername) if (os.path.isfile(os.path.join(annot_foldername,file)) and file.endswith(annot_filetype))]
	img_files = [file for file in os.listdir(img_foldername) if (os.path.isfile(os.path.join(img_foldername,file)) and file.endswith(img_filetype))]
	count = 0
	for annot_filename_filetype, img_filename_filetype in zip(txt_files, img_files):
		# strip the file extensions so that the base names can be compared
		annot_filename = annot_filename_filetype[0:len(annot_filename_filetype)-len(annot_filetype)]
		img_filename = img_filename_filetype[0:len(img_filename_filetype)-len(img_filetype)]

		# ******************************************* #
		# The following Boolean formula is the formula for corresponding matching strings
		formula = ( annot_filename == img_filename)
		# ******************************************* #
		if formula:
			convert_to_PASCAL_VOC_xml(annot_foldername,annot_filename,annot_filetype,
				img_foldername,img_filename,img_filetype)
			count = count+1
		else:
			print("mass_convert_to_PASCAL_VOC_xml(). Files not matching: ", annot_filename, img_filename)
			print(" + terminating...")
			break
	print("mass_convert_to_PASCAL_VOC_xml(). Number of converted files = ", count)

def convert_to_PASCAL_VOC_xml(annot_foldername,annot_filename,annot_filetype,
	img_foldername,img_filename,img_filetype):
	# assume each line in annotation file txt is
	# label\theight\twidth\txmin\tymin\txmax\tymax\n
	with open(''.join((annot_foldername,"\\",annot_filename,annot_filetype)),'r') as annot_txt:
		line = annot_txt.read().split("\n")
	object_list = []
	for x in line:
		if x != '':
			object_list.append(x.split("\t"))

	xml = open(''.join((annot_foldername,"\\",annot_filename,".xml")),'w')
	xml.write("<annotation>\n")
	xml.write(''.join(("\t","<folder>", img_foldername,"</folder>\n")))
	xml.write(''.join(("\t","<filename>", img_filename, img_filetype , "</filename>\n")))
	xml.write(''.join(("\t", "<path>", ''.join((img_foldername,"/", img_filename, img_filetype)), "</path>\n")))

	# an image without any label gets no size or object entries
	if len(object_list)>0:
		img_height = object_list[0][1]
		img_width = object_list[0][2]
		xml.write("\t<size>\n")
		xml.write(''.join(("\t\t<width>", str(img_width), "</width>\n")))
		xml.write(''.join(("\t\t<height>", str(img_height), "</height>\n")))
		xml.write(''.join(("\t\t<depth>", "3", "</depth>\n"))) # assumes 3-channel images
		xml.write("\t</size>\n")
		xml.write(''.join(("\t<segmented>", str(0), "</segmented>\n")))

		for one_annot_line in object_list:
			label = one_annot_line[0]
			xmin = one_annot_line[3]
			ymin = one_annot_line[4]
			xmax = one_annot_line[5]
			ymax = one_annot_line[6]
			xml.write("\t<object>\n")
			xml.write(''.join(("\t\t<name>", label, "</name>\n")))
			xml.write(''.join(("\t\t<pose>", "Unspecified", "</pose>\n")))
			xml.write(''.join(("\t\t<truncated>", "0", "</truncated>\n")))
			xml.write(''.join(("\t\t<difficult>", "0", "</difficult>\n")))
			xml.write("\t\t<bndbox>\n")
			xml.write(''.join(("\t\t\t\t<xmin>",str(xmin) , "</xmin>\n")))
			xml.write(''.join(("\t\t\t\t<ymin>",str(ymin) , "</ymin>\n")))
			xml.write(''.join(("\t\t\t\t<xmax>",str(xmax) , "</xmax>\n")))
			xml.write(''.join(("\t\t\t\t<ymax>",str(ymax) , "</ymax>\n")))
			xml.write("\t\t</bndbox>\n")
			xml.write("\t</object>\n")
	xml.write("</annotation>\n")
	xml.close()

from os import listdir
import random
from shutil import copyfile

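def move_some_percent(src,tgt):
	#! change the format here accordingly!!!!

	# src="C:\\Users\\ericotjoa\\Desktop\\I2R\\keropb\\IN PROGRESS\\gg_train"
	# tgt="C:\\Users\\ericotjoa\\Desktop\\I2R\\keropb\\IN PROGRESS\\gg_test"
	all_files = [f for f in listdir(src) if (f.endswith(".png") or f.endswith(".jpg"))]
	count = 1
	total = len(all_files)
	# print(all_files)
	for i in range(len(all_files)):
		# assumed selection rule: each file is picked with probability 1/10 (about 10% overall)
		coin = random.randint(1,10)
		if coin == 1:
			correspxml = ''.join((all_files[i][0:len(all_files[i])-4],".xml"))
			# move the image and its xml annotation from src to tgt
			os.rename(os.path.join(src,all_files[i]), os.path.join(tgt,all_files[i]))
			os.rename(os.path.join(src,correspxml), os.path.join(tgt,correspxml))
			count = count+1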

Update Log:

20181009: convert_to_PASCAL_VOC_xml() to be able to handle images without label

Object Detection using Tensorflow: bee and butterfly Part I

First preparation

Our objective here is to try using tensorflow object detection API on Windows machine. We will train our model to recognise butterflies and bees. See the following detection on some images that we obtain at the end of a 4-hour training.


We assume no prerequisite knowledge and will go through the steps one by one as much as possible. We use python 3.6. Find a compatible version on the official site here, then follow the instructions and download accordingly.

We will use the command line cmd.exe for no special reason; any other command line is okay. Try typing python in the command line and see if we get into python mode. If the command is not recognized, set the environment variables properly. In Windows, add a variable named PYTHONPATH whose value is the location where python 3.6 is installed. Typically this is just C:\Python36. Then, to the variable Path, add both %PYTHONPATH% and %PYTHONPATH%\Scripts.

Let us do this in a virtual environment. We will create a folder on the Desktop that holds the virtual environment. Create a folder adhoc on the Desktop. Copy the directory path to adhoc using copy path (in Windows 10 this can be found at the top-left of Windows Explorer). In cmd.exe, move into this directory by typing

cd "C:\Users\acer\Desktop\adhoc"

where the above is my directory path to the folder adhoc. Next, run the following. The first line installs a Python package for virtual environments. The second line creates a virtual environment and the third line activates it. The fourth line just moves us into myproject.

pip install virtualenv
virtualenv myproject
myproject\Scripts\activate
cd myproject

Install the following dependencies in order to use tensorflow. We use the tensorflow build that makes use of the GPU. If we just want to test with the CPU, or if the machine does not have a GPU, replace the last line with pip install tensorflow.

pip install Cython
pip install contextlib2
pip install jupyter
pip install matplotlib
pip install pillow
pip install lxml
pip install pandas
pip install tensorflow-gpu==1.10.0

Remark: You can type deactivate to exit the virtual environment.

Another remark: We successfully used tensorflow-gpu version 1.10.0 for this series of tutorials. However, when we tried version 1.11.0 (the latest version at the moment of writing), our machine did not recognize the GPU. Test if the GPU is used using this code.
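
One way to run such a check, as a minimal sketch: ask tensorflow which devices it can see and look for one of type GPU. The helper name list_tf_devices is made up for this example, and returning None when tensorflow cannot be imported (for instance because the virtual environment is not active) is our own choice.

```python
def list_tf_devices():
    """Return (name, device_type) pairs seen by tensorflow, or None if the
    tensorflow package cannot be imported in the current environment."""
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        return None
    return [(d.name, d.device_type) for d in device_lib.list_local_devices()]


devices = list_tf_devices()
if devices is None:
    print("tensorflow is not importable in this environment")
else:
    for name, device_type in devices:
        print(name, device_type)
    # True only if at least one device of type GPU is listed.
    print("GPU visible:", any(t == "GPU" for _, t in devices))
```

If only CPU devices are listed although tensorflow-gpu is installed, the CUDA and cuDNN installation described next is the first thing we would check.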

Install CUDA, which tensorflow requires as well. We used CUDA 9.0 from here. We also need cuDNN. Download it from here. Unzip cuDNN and copy everything into where CUDA is installed (or follow the instructions on the website). In my case, it is in “C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0”.

When installed, go into python and do

import tensorflow as tf

If there is no problem, skip to the next section “More preparations”. We tested pip install tensorflow on Windows 10 in VirtualBox. When we imported tensorflow, we got an error asking us to install the Microsoft Visual Studio 2015 redistributable. We downloaded it from the given link, but the error still occurred. We then tried installing Microsoft Visual Studio Community. Using the Microsoft VS Installer, we installed VS Community 2017 with the following packages under the Windows category: 1. .NET desktop development and 2. Desktop development with C++. Once this completed, tensorflow worked.

More preparations

Create the following directories.

+ ...
+ data
  - butterfly_bee_label_map.pbtxt
+ models
  - model
    - faster_rcnn_resnet101_coco.config # See next section
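
The folders can also be created from Python. The following is a minimal sketch; the function name create_project_dirs and the base_dir argument are hypothetical, and base_dir should be whichever folder you want to hold data and models.

```python
import os


def create_project_dirs(base_dir):
    """Create base_dir/data and base_dir/models/model if they do not exist.

    Returns the two directory paths.
    """
    data_dir = os.path.join(base_dir, "data")
    model_dir = os.path.join(base_dir, "models", "model")
    for directory in (data_dir, model_dir):
        os.makedirs(directory, exist_ok=True)
    return data_dir, model_dir


# Example call (adjust the path to your own project folder):
# create_project_dirs(r"C:\path\to\your\project")
```

The label map and the config file are then placed into data and models\model respectively.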

The following is the label map.


item {
id: 1
name: 'butterfly'
}

item {
id: 2
name: 'bee'
}
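
For more than a handful of classes it can be convenient to generate the label map instead of typing it. This is a minimal sketch under our own naming (label_map_text is a made-up helper); it numbers the classes from 1 in the order given, as in the label map above.

```python
def label_map_text(names):
    """Build label map text with one item block per class name, ids from 1."""
    items = []
    for class_id, name in enumerate(names, start=1):
        items.append("item {\n  id: %d\n  name: '%s'\n}\n" % (class_id, name))
    return "\n".join(items)


text = label_map_text(["butterfly", "bee"])
print(text)

# To save it next to the other data files (adjust the path as needed):
# with open("data/butterfly_bee_label_map.pbtxt", "w") as f:
#     f.write(text)
```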

Getting tensorflow object detection API

Download or clone the folder models from the tensorflow object detection API here. Copy and paste it into the tensorflow folder, so that, in my case, we will have the directory

+ official
+ research
+ ...

To prevent a possible error during training later, we might need the following step (see the reference here): go to ~/tensorflow/models/research/object_detection and find model_lib.py. At around line 390, change category_index.values() to list(category_index.values()).
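
Why can this small edit matter? In Python 3, dict.values() returns a view object rather than a list, and a view can neither be indexed nor deep-copied. The sketch below shows the difference on a toy category_index; that this is the exact failure inside model_lib.py is our assumption and is not stated in the reference.

```python
import copy

# Toy stand-in for the category index built from the label map.
category_index = {
    1: {"id": 1, "name": "butterfly"},
    2: {"id": 2, "name": "bee"},
}

values_view = category_index.values()
print(type(values_view).__name__)  # dict_values in Python 3

# A dict view cannot be deep-copied: this raises TypeError.
try:
    copy.deepcopy(values_view)
    deepcopy_ok = True
except TypeError:
    deepcopy_ok = False
print("deepcopy of the view works:", deepcopy_ok)

# Wrapping the view in list() gives an ordinary list that can be copied.
categories = list(category_index.values())
categories_copy = copy.deepcopy(categories)
print(categories_copy[0]["name"])
```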

The file faster_rcnn_resnet101_coco.config is obtained from inside the models directory


We need to configure all PATH_TO_BE_CONFIGURED in the config file; we will do this later. See part 4.

Also, we need to add the following to our environment variables. If you have the PYTHONPATH variable and have set Path to include PYTHONPATH as instructed earlier, then add to PYTHONPATH the paths to research and research\slim. In my computer, they are


Now we want to convert the .proto files into Python files in the directory tensorflow/models/research/object_detection/protos.

Download the protocol buffer compiler (protoc) from here. The page may have been updated, so click Next on the given page until you find the older release named protoc-3.4.0-win32. Try here. Note that we are downloading version 3.4, an older release, since newer versions do give problems. Move protoc.exe into some directory you like. In this tutorial I move it to


Go into the command line cmd.exe, then move into the tensorflow/models/research folder using (do adjust the paths accordingly)

cd C:\Users\acer\Desktop\adhoc\myproject\Lib\site-packages\tensorflow\models\research

and then run

"C:\protoc-3.4\bin\protoc.exe" object_detection/protos/*.proto --python_out=.

Notice that now, in tensorflow/models/research/object_detection/protos, Python files (.py) have been generated from the .proto files.
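
If the wildcard *.proto is the part that fails with another protoc build (an assumption on our side; cmd.exe does not expand wildcards itself, so it is up to protoc to do so), the files can be compiled one at a time from Python. This is a minimal sketch: protoc_commands and compile_protos are made-up helper names, and the paths in the example call are the ones used in this tutorial.

```python
import glob
import os
import subprocess


def protoc_commands(protoc_exe, research_dir):
    """Return one protoc command (as an argument list) per .proto file."""
    pattern = os.path.join(research_dir, "object_detection", "protos", "*.proto")
    commands = []
    for path in sorted(glob.glob(pattern)):
        # protoc is run from research_dir, so pass a path relative to it.
        rel = os.path.relpath(path, research_dir).replace(os.sep, "/")
        commands.append([protoc_exe, rel, "--python_out=."])
    return commands


def compile_protos(protoc_exe, research_dir):
    """Run protoc once per .proto file, from inside research_dir."""
    for cmd in protoc_commands(protoc_exe, research_dir):
        subprocess.check_call(cmd, cwd=research_dir)


# Example call (adjust both paths accordingly):
# compile_protos(r"C:\protoc-3.4\bin\protoc.exe",
#                r"C:\Users\acer\Desktop\adhoc\myproject\Lib\site-packages\tensorflow\models\research")
```

Each command is the same as the single command above, with the wildcard replaced by one file name.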

Still in the \tensorflow\models\research directory, to check that things are working, do

python object_detection/builders/model_builder_test.py

You will see OK if everything is fine. We will proceed to process our butterflies and bees data in the next part, part II.
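
If the test fails with an import error instead, one thing worth checking is whether research and research\slim really are on PYTHONPATH. The following minimal sketch lists the required folders that are missing; missing_from_pythonpath is a made-up helper, and the folder paths you pass in are your own.

```python
import os


def missing_from_pythonpath(required_dirs, pythonpath=None):
    """Return the entries of required_dirs that are not listed in PYTHONPATH.

    If pythonpath is None, the PYTHONPATH environment variable is used.
    """
    if pythonpath is None:
        pythonpath = os.environ.get("PYTHONPATH", "")

    def norm(path):
        # Ignore trailing separators and, on Windows, letter case.
        return os.path.normcase(os.path.normpath(path))

    present = {norm(p) for p in pythonpath.split(os.pathsep) if p}
    return [d for d in required_dirs if norm(d) not in present]


# Example call (adjust the paths to your own research folder):
# print(missing_from_pythonpath([r"C:\...\models\research",
#                                r"C:\...\models\research\slim"]))
```

An empty list means both folders are listed. Remember that cmd.exe must be reopened after the environment variables are changed.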

Tips: If you encounter an error such as No module named ‘absl’, you have probably exited the virtual environment, i.e. the machine no longer sees the Python installed in the virtual environment. Make sure to go back into the virtual environment every time you work on this project, so that all the packages are well managed. From cmd.exe you can cd to myproject and run Scripts\activate. To deactivate, just type deactivate and press Enter.
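
To check from inside Python whether the virtual environment is active, a minimal sketch is shown below. The helper name in_virtualenv is made up; it relies on the fact that older virtualenv releases set sys.real_prefix, while newer ones make sys.base_prefix differ from sys.prefix.

```python
import sys


def in_virtualenv():
    """Return True if this Python interpreter runs inside a virtual environment."""
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or base_prefix != sys.prefix


print("interpreter:", sys.executable)
print("inside a virtual environment:", in_virtualenv())
```

If this prints False, or the interpreter path does not point into myproject, activate the environment again before continuing.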