Convolutional Neural Network with keras: MNIST

home > Machine Learning

In this post we use a Convolutional Neural Network with a VGG-like convnet structure for the MNIST problem: i.e. we train the model to recognize hand-written digits. We mainly follow the official keras guide, in this link.

Download the MNIST data that has been converted into CSV form; I got it from this link.

The Jupyter notebook detailing the attempt in this post is found here under the name keras_test2.ipynb.

Starting with the conclusion: it works pretty well. With a very quick training, the model can recognize hand-written digits with 98% accuracy.

As shown below, our input is 28 by 28 with 1 channel (1 color), since each hand-written digit is stored as a 28 by 28-pixel greyscale image. The layers used are:

  1. Two 2D convolutional layers, each with 32 3-by-3 filters, followed by max pooling over each 2-by-2 block of pixels. A dropout layer is then applied to prevent over-fitting.
  2. Two 2D convolutional layers, each with 64 3-by-3 filters, followed by max pooling over each 2-by-2 block of pixels, and again a dropout layer.
  3. A Flatten layer simply reshapes the 2D image-like output from the previous layer into a 1D list of values. The first dense layer has 256 neurons, followed by a dropout layer and finally a dense layer of 10 neurons corresponding to the 10 classes, i.e. the 10 different digits in MNIST. All activation functions are ReLU except the last one, softmax, as usual.

See the link here for how the data is prepared for training (i.e. the missing code shown as "... partial code ..." below).
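The notebook linked above contains the actual preparation code; purely as a reference, below is a minimal sketch of one way such CSV data could be loaded and reshaped. The file names and the row layout (label followed by 784 pixel values) are assumptions, not the notebook's code.

import numpy as np
from keras.utils import to_categorical

def load_mnist_csv(path):
    # assumed row layout: label, pixel_0, ..., pixel_783
    data = np.loadtxt(path, delimiter=',')
    y = to_categorical(data[:, 0], num_classes=10)                    # one-hot labels
    x = data[:, 1:].reshape(-1, 28, 28, 1).astype('float32') / 255.0  # (n, 28, 28, 1) scaled to [0, 1]
    return x, y

x_train, y_train = load_mnist_csv('mnist_train.csv')  # hypothetical file name
x_test, y_test = load_mnist_csv('mnist_test.csv')     # hypothetical file name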

# ... partial code ...

model = Sequential()
# input: 28x28 images with 1 channel -> (28, 28, 1) tensors.
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(28,28,1)))
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))

sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])

model.fit(x_train, y_train, batch_size=16, epochs=10)
model.evaluate(x_test, y_test, batch_size=32)

For such a quick training, this model obtains a very high accuracy of 0.98, as shown below.

Epoch 1/10
6400/6400 [==============================] - 6s 904us/step - loss: 0.8208 - acc: 0.7206
Epoch 2/10
6400/6400 [==============================] - 2s 379us/step - loss: 0.2427 - acc: 0.9266
Epoch 3/10
6400/6400 [==============================] - 2s 379us/step - loss: 0.1702 - acc: 0.9483
Epoch 4/10
6400/6400 [==============================] - 2s 380us/step - loss: 0.1353 - acc: 0.9589
Epoch 5/10
6400/6400 [==============================] - 2s 373us/step - loss: 0.1117 - acc: 0.9650
Epoch 6/10
6400/6400 [==============================] - 2s 379us/step - loss: 0.1080 - acc: 0.9697
Epoch 7/10
6400/6400 [==============================] - 2s 374us/step - loss: 0.0881 - acc: 0.9734
Epoch 8/10
6400/6400 [==============================] - 2s 375us/step - loss: 0.0880 - acc: 0.9736
Epoch 9/10
6400/6400 [==============================] - 2s 377us/step - loss: 0.0690 - acc: 0.9766
Epoch 10/10
6400/6400 [==============================] - 2s 373us/step - loss: 0.0686 - acc: 0.9800
100/100 [==============================] - 0s 940us/step

 

Convolutional Neural Network with VGG Architecture

home > ML Concepts

This post seeks to illustrate the difference between a Convolutional Neural Network (CNN) and a deep neural network (DNN), and hopes to add a little more clarity to the CNN process.

CNN vs DNN

Figure 1. Comparison between a regular deep neural network and a convolutional neural network.

A CNN is a variant of a DNN with the constraint that the input is an image, or image-like. More technically,

  1. In a DNN, every layer is fully connected to the layer before. This means every neuron in layer n+1 is affected by every neuron in layer n via linear combination (with bias, and then activation).
  2. In a CNN, such as the VGG architecture, only a few layers near the end are fully connected. Through its receptive field, a single neuron in layer n+1 is connected to only a small number of neurons in the previous layer, corresponding to a small region in the visual space.

The above is illustrated in figure 1. In a DNN, each neuron is fully connected to the image. With a large number of neurons, there will be a large number of weights to compute, not to mention the weights between neurons from one layer to the next. In a CNN, on the other hand, each neuron is only "in charge of" a small region of the image, which keeps the number of weights down, as the sketch below illustrates.
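To make the weight-count argument concrete, here is a minimal keras sketch (not from the original post; the layer sizes are chosen purely for illustration) comparing the number of trainable weights of a fully connected layer and a convolutional layer on a 28x28x1 input:

from keras.models import Sequential
from keras.layers import Dense, Flatten, Conv2D

# DNN: every one of the 784 input pixels connects to every one of 128 neurons.
dnn = Sequential([Flatten(input_shape=(28, 28, 1)), Dense(128, activation='relu')])
dnn.summary()  # 784*128 + 128 = 100,480 parameters

# CNN: 32 filters of size 3x3 are shared across the whole image.
cnn = Sequential([Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1))])
cnn.summary()  # 3*3*1*32 + 32 = 320 parameters

The convolutional layer achieves this saving because the same 3-by-3 filters are reused at every position of the image.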

You might have seen illustrations of the VGG architecture like figure 2 (I took the images from here; do visit the original sources of the images). VGG is an implementation of a CNN by the Visual Geometry Group, Oxford (official link here). Figure 3 illustrates the process of convolution in the first layer, while figure 4 illustrates the process through the fully connected layers. In essence, both perform linear sums: weighted by the filters in figure 3, and by the usual weights (as in a DNN) in figure 4.

VGG1 VGG2

Figure 2. VGG architecture.

conv layer

Figure 3. Illustration for the convolutional layer.

fc layer

Figure 4. Illustration for fully connected layers.
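To make the "linear sums" point in figures 3 and 4 concrete, here is a small numpy sketch (mine, not from the original post) computing the value of one output neuron of a convolutional layer as a weighted sum over its receptive field:

import numpy as np

image = np.random.rand(5, 5)   # a tiny single-channel "image"
kernel = np.random.rand(3, 3)  # one 3x3 filter
bias = 0.1

# receptive field of the top-left output neuron: the top-left 3x3 patch
patch = image[0:3, 0:3]
out_top_left = np.sum(patch * kernel) + bias  # linear sum, before the activation
print(out_top_left)

A fully connected neuron (figure 4) performs exactly the same kind of weighted sum, except that its "patch" is the entire input.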

Object Detection using Tensorflow: bee and butterfly Part V, faster

home>ML>Image Processing

This post is a faster alternative to the following post: Object Detection using Tensorflow: bee and butterfly Part V.

In part IV, we finished training our Faster R-CNN model. Since we ran 2000 training steps, the last checkpoint produced will be model.ckpt-2000. We need to make a frozen graph out of it to be able to use it for prediction (see the bolded part of the code).

Freezing the graph

Let's go into the command line, cmd.exe. Remember to activate the virtual environment if you started with one, as instructed earlier.

cd C:\Users\acer\Desktop\adhoc\myproject\Lib\site-packages\tensorflow\models\research
SET INPUT_TYPE=image_tensor
SET TRAINED_CKPT_PREFIX="C:\Users\acer\Desktop\adhoc\myproject\models\model\model.ckpt-2000"
SET PIPELINE_CONFIG_PATH="C:\Users\acer\Desktop\adhoc\myproject\models\model\faster_rcnn_resnet101_coco.config"
SET EXPORT_DIR="C:\Users\acer\Desktop\adhoc\myproject\models\export"
python object_detection/export_inference_graph.py --input_type=%INPUT_TYPE% --pipeline_config_path=%PIPELINE_CONFIG_PATH% --trained_checkpoint_prefix=%TRAINED_CKPT_PREFIX% --output_directory=%EXPORT_DIR%

Upon successful completion, the following will be produced in the export directory.

adhoc/myproject/models/export
+ saved_model
  + variables
  - saved_model.pb
+ checkpoint
+ frozen_inference_graph.pb
+ pipeline.config
+ model.ckpt.data-00000-of-00001
+ model.ckpt.index
+ model.ckpt.meta

Notice that three ckpt files are created. We can use these for further training by replacing the three ckpt files from part IV.

frozen_inference_graph.pb is the file we will be using for prediction. We just need to run the following python file with a suitable configuration. Create the following directory and put all the images containing butterflies or bees which you want the algorithm to detect into the folder for_predict. In this example, we use 6 images, namely "1.jpeg", "2.jpeg", …, "6.jpeg".

adhoc/myproject/
+ ...
+ for_predict

Finally, to perform prediction, just run the following using cmd.exe after moving into the adhoc/myproject folder where we placed our prediction2.py (see the script below).

python prediction2.py

and 1_MARKED.jpeg, for example, will be produced in for_predict, with boxes showing the detected objects, either butterflies or bees.

See the highlighted parts below; most configurations that need to be done are there. The variable TEST_IMAGE_NAMES contains the names of the files we are going to predict. You can rename the images or just change the variable. Note that in this code, the variable filetype stores the file type of the images we are predicting; thus, in a single run, we can only perform prediction on images of the same file type. Of course we can do better; modify the script accordingly.

prediction2.py

# from distutils.version import StrictVersion
import os, sys,tarfile, zipfile
import numpy as np
import tensorflow as tf
import six.moves.urllib as urllib
from PIL import Image
from io import StringIO
from matplotlib import pyplot as plt
from collections import defaultdict
from object_detection.utils import ops as utils_ops

import time
start_all = time.time()
# Paths settings
THE_PATH = "C:/Users/acer/Desktop/adhoc/myproject/Lib\\site-packages/tensorflow/models/research"
sys.path.append(THE_PATH)
sys.path.append(THE_PATH+"/object_detection")
PATH_TO_FROZEN_GRAPH = "C:/Users/acer/Desktop/adhoc/myproject/models/export/frozen_inference_graph.pb"
filetype = '.jpeg'
PATH_TO_LABELS = "C:/Users/acer/Desktop/adhoc/myproject/data/butterfly_bee_label_map.pbtxt"
PATH_TO_TEST_IMAGES_DIR = 'C:/Users/acer/Desktop/adhoc/myproject/for_predict'
TEST_IMAGE_NAMES = [str(i) for i in range(1,7)]
TEST_IMAGE_PATHS = [''.join((PATH_TO_TEST_IMAGES_DIR, '\\', x, filetype)) for x in TEST_IMAGE_NAMES]
# print("test image path = ", TEST_IMAGE_PATHS)
IMAGE_SIZE = (12, 8) # Size, in inches, of the output images.
NUM_CLASSES = 90

from utils import label_map_util
from utils import visualization_utils as vis_util
sys.path.append("..")
# MODEL_NAME = 'faster_rcnn_resnet101_pets'

start = time.time()
detection_graph = tf.Graph()
with detection_graph.as_default():
  od_graph_def = tf.GraphDef()
  with tf.gfile.GFile(PATH_TO_FROZEN_GRAPH, 'rb') as fid:
    serialized_graph = fid.read()
    od_graph_def.ParseFromString(serialized_graph)
    tf.import_graph_def(od_graph_def, name='')

label_map  = label_map_util.load_labelmap(PATH_TO_LABELS)
categories = label_map_util.convert_label_map_to_categories(label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
category_index = label_map_util.create_category_index(categories)
print(category_index)
end=time.time()

def load_image_into_numpy_array(image):
  # Convert a PIL image into an (im_height, im_width, 3) uint8 numpy array.
  (im_width, im_height) = image.size
  return np.array(image.getdata()).reshape(
      (im_height, im_width, 3)).astype(np.uint8)

timesetX=[]
def run_inference_for_single_image(image, graph):
  with graph.as_default():  
    # Get handles to input and output tensors
    ops = tf.get_default_graph().get_operations()
    all_tensor_names = {output.name for op in ops for output in op.outputs}
    tensor_dict = {}
    for key in [
        'num_detections', 'detection_boxes', 'detection_scores',
        'detection_classes', 'detection_masks'
    ]:
      tensor_name = key + ':0'
      if tensor_name in all_tensor_names:
        tensor_dict[key] = tf.get_default_graph().get_tensor_by_name(
            tensor_name)
    if 'detection_masks' in tensor_dict:
      # The following processing is only for single image
      detection_boxes = tf.squeeze(tensor_dict['detection_boxes'], [0])
      detection_masks = tf.squeeze(tensor_dict['detection_masks'], [0])
      # Reframe is required to translate mask from box coordinates to image coordinates and fit the image size.
      real_num_detection = tf.cast(tensor_dict['num_detections'][0], tf.int32)
      detection_boxes = tf.slice(detection_boxes, [0, 0], [real_num_detection, -1])
      detection_masks = tf.slice(detection_masks, [0, 0, 0], [real_num_detection, -1, -1])
      detection_masks_reframed = utils_ops.reframe_box_masks_to_image_masks(
          detection_masks, detection_boxes, image.shape[0], image.shape[1])
      detection_masks_reframed = tf.cast(
          tf.greater(detection_masks_reframed, 0.5), tf.uint8)
      # Follow the convention by adding back the batch dimension
      tensor_dict['detection_masks'] = tf.expand_dims(
          detection_masks_reframed, 0)
    image_tensor = tf.get_default_graph().get_tensor_by_name('image_tensor:0')

    # Run inference
    start0X = time.time()
    output_dict = sess.run(tensor_dict,
                           feed_dict={image_tensor: np.expand_dims(image, 0)})
    end0X=time.time()
    timesetX.append(end0X-start0X)
    # all outputs are float32 numpy arrays, so convert types as appropriate
    output_dict['num_detections'] = int(output_dict['num_detections'][0])
    output_dict['detection_classes'] = output_dict[
        'detection_classes'][0].astype(np.uint8)
    output_dict['detection_boxes'] = output_dict['detection_boxes'][0]
    output_dict['detection_scores'] = output_dict['detection_scores'][0]
    if 'detection_masks' in output_dict:
      output_dict['detection_masks'] = output_dict['detection_masks'][0]
  return output_dict

timeset=[]
timeset2=[]
config = tf.ConfigProto()
config.intra_op_parallelism_threads = 44
config.inter_op_parallelism_threads = 44
with tf.Session(config=config,graph=detection_graph) as sess:
  for image_path, image_name in zip(TEST_IMAGE_PATHS, TEST_IMAGE_NAMES):
    image = Image.open(image_path).convert('RGB') # !!
    # the array based representation of the image will be used later in order to prepare the
    # result image with boxes and labels on it.
    image_np = load_image_into_numpy_array(image)
    # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
    image_np_expanded = np.expand_dims(image_np, axis=0)
    # Actual detection.

    start0 = time.time() # bottleneck in the main detection, 22s per img
    output_dict = run_inference_for_single_image(image_np, detection_graph)
    end0=time.time()
    timeset.append(end0-start0)

    start1 = time.time()
    # Visualization of the results of a detection.
    vis_util.visualize_boxes_and_labels_on_image_array(
        image_np,
        output_dict['detection_boxes'],
        # each element in detection_boxes is [ymin, xmin, ymax, xmax] in
        # normalized coordinates; with use_normalized_coordinates=True the
        # visualization utility scales them by the image width and height:
        #   (left, right, top, bottom) = (xmin * im_width, xmax * im_width,
        #                                 ymin * im_height, ymax * im_height)
        output_dict['detection_classes'],
        output_dict['detection_scores'],
        category_index,
        instance_masks=output_dict.get('detection_masks'),
        use_normalized_coordinates=True,
        line_thickness=2,
        min_score_thresh = 0.05)
    # print("detection_boxes:")
    # print(output_dict['detection_boxes'])
    # print(type(output_dict['detection_boxes']),len(output_dict['detection_boxes']))
    # print('detection_classes')
    # print(output_dict['detection_classes'])
    # print(type(output_dict['detection_classes']),len(output_dict['detection_classes']))
    # print('detection_scores')
    # print(output_dict['detection_scores'], len(output_dict['detection_scores']))
    print('\n**************** detection_scores\n')
    print(output_dict['detection_scores'][1:10])
    plt.figure(figsize=IMAGE_SIZE)
    # plt.imshow(image_np)
    plt.imsave(''.join((PATH_TO_TEST_IMAGES_DIR, '\\',image_name,"_MARKED", filetype)), image_np)
    end1=time.time()
    timeset2.append(end1-start1)


print("time 1 = ", end-start)
print("time each:")
for i in range(len(timeset)):
  print(" + ",timeset[i])
  # print(" + ",timeset[i],":",timesetX[i], " : ",timeset2[i])
end_all=time.time()
print("time all= ", end_all-start_all)

# plt.show()

The results should be similar to the ones in Object Detection using Tensorflow: bee and butterfly Part V. The only difference is the processing speed. I used an NVIDIA GeForce GTX 1050, and the performance is as follows.

time 1 = 2.0489230155944824
time each:
+ 18.73304057121277
+ 1.6632516384124756
+ 1.7054014205932617
+ 1.5573828220367432
+ 1.6851420402526855
+ 0.5358219146728516
time all= 34.96004343032837

Using the previous code, the speed is ~18 seconds per image. A better GPU can yield even faster performance. On another project, using a GTX 1080 on images of 1920×1080 pixels, the time can be as fast as 0.2 s per image from the second image onwards. Using CPU only, one example I tried yielded ~4.5 s per image from the second image onwards.

Object Detection using Tensorflow: bee and butterfly Part II, faster

home>ML>Image Processing

Summary so far: given images of butterflies and bees, mark the butterflies and the bees with white color RGB(255,255,255), create rotated copies of the images, annotate the positions of these butterflies and bees with respect to the images in PASCAL VOC format, and finally put them into train and test folders.

Using the python package kero (version 0.4.4 and above), we can increase the speed of image pre-processing by using clone_to_annotate_faster() to do the processes described in the summary above. We will use this as an alternative to the tutorial from Part II of bees and butterflies object detection, and the front section of Part III.

Tips: do remember to activate the virtual environment if you have deactivated it. A virtual environment helps ensure that the packages we download do not interfere with the system or other projects, especially when we need older versions of some packages.

Create the following directory. I named it keropb; you can name it anything.

adhoc/myproject/images
+ train 
+ test
adhoc/keropb
+ butterflies_and_bees
  + Butterflies
    - butterflyimage1.png 
    - ...
  + Butterflies_canvas
    - butterflyimage1.png
  + Bees
    - beeimage1.png
    - ...
  + Bees_canvas
    - beeimage1.png
    - ...
+ do_clone_to_annotate.py
+ do_convert_to_PASCALVOC.py
+ do_move_ALL_to_train.py
+ do_move_a_fraction.py
+ adhoc_functions.py

Note: the image folders and the corresponding canvas folders can be downloaded here. Also, do not worry, the last 5 python files will be provided along the way.

We store all our butterfly images in the folder Butterflies and bee images in the folder Bees. The _canvas folders are exact replicas of the corresponding folders: you can copy-paste both the Butterflies and Bees folders and rename them. In the canvas folders, however, we will mark out the butterflies and the bees. In a sense, we are teaching the algorithm which objects in the pictures are butterflies, and which are bees. To mark out a butterfly, block it out with white, RGB (255,255,255). This is easy to do: just use the good ol' Paint program and paint over the butterfly with white, or use the eraser. See the example below. Note that corresponding images have exactly the same names.

Tips: if the image contains white patches, they might be wrongly detected as a butterfly too. This is bad. In that case, paint these irrelevant white patches with another obvious color, such as black.

butterfly.jpg
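As a quick sanity check before running the annotation step, a short Pillow/numpy sketch like the one below (my own helper, not part of kero; the file paths are just examples) can count pure-white pixels in a canvas image and in the untouched original:

import numpy as np
from PIL import Image

def count_white(path):
    # count pixels that are exactly RGB (255,255,255)
    arr = np.array(Image.open(path).convert('RGB'))
    return int(np.sum(np.all(arr == 255, axis=-1)))

original = "butterflies_and_bees/Butterflies/butterflyimage1.png"       # example path
canvas = "butterflies_and_bees/Butterflies_canvas/butterflyimage1.png"  # example path

print("white pixels in canvas:  ", count_white(canvas))    # should be large (the marked butterfly)
print("white pixels in original:", count_white(original))  # ideally close to zero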

Make sure kero is installed or upgrade it with

pip install --upgrade kero

Create and run the following script, do_clone_to_annotate.py, from adhoc/keropb, i.e. in cmd.exe, cd into keropb and run the command

python do_clone_to_annotate.py

Tips: We have set check_missing_mode=False. It is good to set it to True first: this helps us check whether each image in Butterflies has a corresponding image in Butterflies_canvas. Before processing, we want to identify missing images so that we can fix them before proceeding. If everything is fine, "ALL GREEN. No missing files." will be printed. Then set it back to False and run it again.

do_clone_to_annotate.py

import kero.ImageProcessing.photoBox as kip

this_folder = "butterflies_and_bees\\Butterflies" # "butterflies_and_bees\\Bees"
tag_folder =  "butterflies_and_bees\\Butterflies_canvas" # "butterflies_and_bees\\Bees_canvas"

gsw=kip.GreyScaleWorkShop()
rotate_angle_set = [30,60,90,120,150,180] # None
annotation_name = "butterfly" # "bee"
gsw.clone_to_annotate_faster(this_folder, tag_folder,1,annotation_name, 
	order_name="imgBUT", tag_name="imgBUT",
	check_missing_mode=False,
	skip_ground_truth=False,
	significant_fraction=0.01,
	rotate_angle_set=rotate_angle_set,
	thresh=250)

Redo the above for the Bees folder as well, changing the highlighted values correspondingly. Note that Bees_LOG.txt and Butterflies_LOG.txt are also created, listing how the image files were renamed.
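If you prefer to do the check_missing_mode check yourself outside of kero, a hand-rolled equivalent (a sketch of mine, reusing the folder layout above) is straightforward:

import os

src = "butterflies_and_bees/Butterflies"
canvas = "butterflies_and_bees/Butterflies_canvas"

# images present in the source folder but missing from the canvas folder
missing = sorted(set(os.listdir(src)) - set(os.listdir(canvas)))
if missing:
    print("Missing canvas images:", missing)
else:
    print("All images have a corresponding canvas image.")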

Create PASCAL VOC annotation format in xml

Now, still in the keropb folder, create adhoc_functions.py as shown in Object Detection using Tensorflow: adhoc functions, adjust the appropriate paths in do_convert_to_PASCALVOC.py and then run the following

python do_convert_to_PASCALVOC.py

do_convert_to_PASCALVOC.py

import adhoc_functions as af

for x in ["Butterflies","Bees"]:
	annot_foldername = "C:\\Users\\acer\\Desktop\\adhoc\\keropb\\butterflies_and_bees\\"+ x + "_ANNOT"
	annot_filetype = ".txt"
	img_foldername = "C:\\Users\\acer\\Desktop\\adhoc\\keropb\\butterflies_and_bees\\"+ x +"_CLONE"
	img_filetype = ".png"
	
	af.mass_convert_to_PASCAL_VOC_xml(annot_foldername, annot_filetype,
		img_foldername, img_filetype)  # note: img_foldername (defined above but otherwise unused) is presumably the intended argument here

By now, all the images required are available with annotations in PASCAL VOC, written in .xml files.
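For reference, a single-object PASCAL VOC annotation should look roughly like the following (the file name, sizes and coordinates here are made up purely for illustration):

<annotation>
	<folder>train</folder>
	<filename>imgBUT1.png</filename>
	<size>
		<width>640</width>
		<height>480</height>
		<depth>3</depth>
	</size>
	<object>
		<name>butterfly</name>
		<bndbox>
			<xmin>120</xmin>
			<ymin>85</ymin>
			<xmax>310</xmax>
			<ymax>260</ymax>
		</bndbox>
	</object>
</annotation>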

Train and test split

Now we split the images (with annotations) into a training folder (90% of all the images) and a test folder (the remaining 10%). We could further split 10% out of the 90% for external testing if preferred. This tutorial will use 90% for training and 10% for validation, without splitting anything further out of the 90%.

Run the following.

python do_move_ALL_to_train.py

do_move_ALL_to_train.py

import adhoc_functions, os

train_folder = "C:\\Users\\acer\\Desktop\\adhoc\\myproject\\images\\train"
test_folder = "C:\\Users\\acer\\Desktop\\adhoc\\myproject\\images\\test"
if not os.path.exists(train_folder):
	print("Creating train folder.")
	os.mkdir(train_folder)
if not os.path.exists(test_folder):
	print("Creating test folder.")
	os.mkdir(test_folder)

for i in ["Butterflies","Bees"]:
	folder_name_CLONE="C:\\Users\\acer\\Desktop\\adhoc\\keropb\\butterflies_and_bees\\"+i+"_CLONE"
	folder_name_ANNOT="C:\\Users\\acer\\Desktop\\adhoc\\keropb\\butterflies_and_bees\\"+i+"_ANNOT"
	rotate_angle_set=[30,60,90,120,150,180]
	folder_target = train_folder
	adhoc_functions.move_ALL_to_train(folder_name_CLONE,folder_name_ANNOT,rotate_angle_set, folder_target)

print("Done.")

The folder train will be populated. Now we move 10% of the files into the folder test. Run the following.

python do_move_a_fraction.py

do_move_a_fraction.py

import adhoc_functions as af

src= "C:\\Users\\acer\\Desktop\\adhoc\\myproject\\images\\train"
tgt= "C:\\Users\\acer\\Desktop\\adhoc\\myproject\\images\\test"
af.move_some_percent(src,tgt)

Done! To proceed with the next step, which is model training, see the section “Conversion to tfrecords” (and skip “Train and test split” section) in Part III of bees and butterflies object detection.

clone_to_annotate_faster()

home > kero > Documentation

Given a folder of images and a tag folder containing the same images, each of which is marked with white color, RGB=(255,255,255), this function creates

  1. a clone of the folder,
  2. a folder of its ground truth images based on the tag folder,
  3. a folder containing the images with bounding boxes drawn, based on the ground truth images, along with text files whose lines have the format
    image_label img-width img-height xmin ymin xmax ymax
    where (xmin,ymin) is the top-left corner of the bounding box and (xmax,ymax) is its bottom-right corner.
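For instance, with annotation_name set to "butterfly", a single line in one of the generated .txt files might look like the following (the numbers are made up purely for illustration):

butterfly 640 480 120 85 310 260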
kero.ImageProcessing.photoBox.py

def clone_to_annotate_faster(self,this_folder, tag_folder,starting_label,annotation_name,
        order_name="img",
        tag_name="imggt",
        check_missing_mode=False,
        rotate_angle_set=None,
        skip_ground_truth=False,
        significant_fraction=0.01,
        thresh=254,
        scale="Auto",
        discard_edge=True,
        edge_offset=2):
  return

 

this_folder String. The name of the folder with images.
tag_folder String. The name of the folder with images. If each image in the tag_folder has a corresponding image of the same name and type, then clone folder, clone tag folder, ground truth image folder and annotation folder will be created with their corresponding contents.
starting_label Integer. The sequence of numbers will start with this integer.
annotation_name String. The image label for the white region marked in the tag folder. This will be saved in annotation .txt files.

In example 1, this is “butterfly”. This means that we label the object marked with white a “butterfly”.

order_name String. The clone of this_folder will be relabelled with prefix specified by this string.

Default value =”img”

tag_name String. The clone of tag_folder will be relabelled with prefix specified by this string.

Default value =”imggt”

check_missing_mode Boolean. If True, the cloning process of the folders is not performed. The file names of images in this_folder that do not have corresponding images in the tag_folder will be printed.

Default value = False

rotate_angle_set List of float. Each float is an angle in degrees by which an image is rotated.

Default value =None

skip_ground_truth Boolean. Set to True if the ground truth images have already been created in the manner spawn_ground_truth() spawns them; the function will then continue directly with the annotations.
significant_fraction Float. This argument takes values between 0 and 1.0. It specifies the minimum fraction of the whole image area that a connected component must occupy to be considered an object; anything smaller is treated as noise. This is an argument to the function get_connected_components().

Default value = 0.01

thresh Integer, from 0 to 255. A pixel in the tag image whose R, G and B values all exceed thresh is treated as ground-truth white, RGB=(255,255,255); otherwise the pixel is converted to black, RGB=(0,0,0).

For example, if the value is set to 244, a pixel with (255,254,246) is converted to white since 255, 254 and 246 all exceed 244, while a pixel with (20,40,120) is converted to black.

Default value = 254

scale “Auto”, (Integer,Integer) or None. Annotations are computed from down-scaled copies of the images to improve the processing speed (unless scale=None). If set to “Auto”, the image is downsized to (200,150) for processing.

This is an argument to the function multiple_scaled_box_positions().

Default value = “Auto”

Note: The actual image is not changed; only the annotation positions are computed on the down-scaled image and then re-scaled back, with a potential loss of accuracy. For instance, a 2000 by 1500-pixel image processed at (200,150) has its box coordinates scaled back up by a factor of 10, so each coordinate may be off by up to roughly 10 pixels.

discard_edge Boolean. If True, objects whose bounding boxes touch the edge of the image are not annotated. More technically, if any coordinate of a bounding box lies within edge_offset pixels of the image border, the box is ignored. For example, if the image width is 400 pixels and edge_offset=5, then a bounding box that reaches x position 396 or 2 will be ignored.

Default value = True

edge_offset Integer. The number of pixels from the image border that is considered the edge region. If discard_edge=True, any object whose bounding box is at least partly inside the edge region will be ignored.

Default value = 2

Tips: If annotation fails for one reason or another after ground truth image generation is complete, then make sure to set skip_ground_truth=True before rerunning the function, so that we do not waste time re-spawning the ground truth images.

Tips: For images whose objects are very small, setting a small scale might be a bad choice, since the positions of the annotation boxes might lose a lot of accuracy during rescaling.

Example usage 1.

Download the example here and put them in the working directory under the folder /bb.

import kero.ImageProcessing.photoBox as kip

this_folder = "bb\\Butterflies"
tag_folder =  "bb\\Butterflies_canvas"

gsw=kip.GreyScaleWorkShop()
rotate_angle_set = [30,60,90,120,150,180] # None
annotation_name = "butterfly"
gsw.clone_to_annotate_faster(this_folder, tag_folder,1,annotation_name, 
	order_name="img",
        tag_name="img",
	check_missing_mode=False,
	rotate_angle_set=rotate_angle_set,
	skip_ground_truth=False,
	thresh=250)

This function will call spawn_ground_truth(), i.e. create a ground truth image (figure 1 top-right) of each image (figure 1 top-left) based on the corresponding image from the tag folder (figure 1 top-center), and furthermore create annotations for each image. Samples of images from the folders are shown below.

clone_faster

Figure 1.

A folder containing rotated figures (figure 2) will be created. Also, a txt file will be created for each rotated copy of the image. The bounding boxes are the thin green rectangles.

clone_faster_out

Figure 2.

Speed comparison

This function is an alternative to clone_to_annotate(). It is much faster, though the precision of the bounding boxes for rotated images declines. Download the example here and put it in the working directory under the folder /bb.

import kero.ImageProcessing.photoBox as kip
import time

start = time.time()

this_folder = "bb\\Butterflies"
tag_folder =  "bb\\Butterflies_canvas"

gsw=kip.GreyScaleWorkShop()
rotate_angle_set = [0,30,60] # None
annotation_name = "butterfly"
gsw.clone_to_annotate(this_folder, tag_folder,1,annotation_name, 
	order_name="img",
        tag_name="img",
	check_missing_mode=False,
	skip_ground_truth=False,
	significant_fraction=0.01,
	rotate_angle_set=rotate_angle_set,
	thresh=250)

end = time.time()
elapsed = end - start
print(elapsed)

# 402.4 seconds (6 mins 40 seconds)

The following shows the same process using this faster function. It is generally about n times faster than the previous script, where n is the number of angles in rotate_angle_set (here 402.4 s / 143.5 s ≈ 2.8 for 3 angles). This is because the bounding boxes of the rotated images are derived from those of the original images, though this may lead to some loss of precision.

import kero.ImageProcessing.photoBox as kip
import time

start = time.time()

this_folder = "bb\\Butterflies"
tag_folder =  "bb\\Butterflies_canvas"

gsw=kip.GreyScaleWorkShop()
rotate_angle_set = [0,30,60] # None
annotation_name = "butterfly"
gsw.clone_to_annotate_faster(this_folder, tag_folder,1,annotation_name, 
	order_name="img",
        tag_name="img",
	check_missing_mode=False,
	skip_ground_truth=False,
	significant_fraction=0.01,
	rotate_angle_set=rotate_angle_set,
	thresh=250)

end = time.time()
elapsed = end - start
print(elapsed)

# 143.5 seconds ( 2 min 23.5 seconds)

kero version: 0.4.4 and above

rotate_rect()

home > kero > Documentation

Given a bounding box inside an image, this function returns the bounding box of the same object after the image is rotated.

kero.ImageProcessing.photoBox.py

def rotate_rect(coord, theta, img_width, img_height):
  return [xmin0,ymin0,xmax0,ymax0]
coord List of integers, coord = [xmin,ymin,xmax,ymax] is the coordinates of the rectangle in pixels. [xmin, ymin] is the top-left corner of the rectangle while [xmax, ymax] is its bottom-right corner.
theta Float. Angle of rotation in radians.
img_width Integer. Image width in number of pixels.
img_height Integer. Image height in number of pixels.
return coord_out List of integers, coord_out = [xmin,ymin,xmax,ymax] is the coordinates of the bounding box after rotation.

Example usage 1

Both images boxrotate.png and boxrotate2.png can be downloaded here. The results for both images are partially shown below. The original bounding box is in green. The images are then rotated, and the red bounding boxes are the new bounding boxes computed from the green boxes and the angles of rotation.

boxrot.png

import numpy as np
import kero.ImageProcessing.photoBox as kip
import cv2, os

filename_set= []
coord_set = []

filename = "boxrotate.png"
xmin = 42
ymin = 40
xmax = 145 # 200
ymax = 125 # 250
coord = [xmin,ymin,xmax,ymax]
filename_set.append(filename)
coord_set.append(coord)

filename = "boxrotate2.png"
coord = [66,58,244,193]
filename_set.append(filename)
coord_set.append(coord)

count = 1
foldercount = 1
for filename in filename_set:
	if not os.path.exists(''.join(("result",str(foldercount)))):
		os.mkdir(''.join(("result",str(foldercount))))
	coord = coord_set[foldercount-1]
	gg=cv2.imread(filename)

	img_height, img_width, _ = gg.shape

	cv2.rectangle(gg, (coord[0], coord[1]), (coord[2],coord[3]),(0,255,0),3)

	theta_set = np.linspace(10,350,20)
	theta_set = [x*np.pi/180 for x in theta_set]
	for theta in theta_set:
		gg2=cv2.imread(filename)
		rows,cols,_ = gg2.shape
		M = cv2.getRotationMatrix2D((cols/2,rows/2),theta*180/np.pi,1)
		dst = cv2.warpAffine(gg2,M,(cols,rows))
		[xmin0,ymin0,xmax0,ymax0] = kip.rotate_rect(coord, theta, img_width, img_height)
		cv2.rectangle(dst, (xmin0, ymin0), (xmax0,ymax0),(0,0,255),3)
		cv2.imwrite(''.join(("result",str(foldercount),"/rotate",str(count),".png")), dst)
		cv2.imwrite(''.join(("result",str(foldercount),"/orign.png")),gg)
		count = count+1
	foldercount = foldercount + 1

kero version: 0.4.2 and above

rotate_wrt_img_center()

home > kero > Documentation

kero.ImageProcessing.photoBox.py

def rotate_wrt_img_center(coord, theta, img_width, img_height):
  return coord_out

This function rotates a coordinate with respect to image center.

coord List of integers, coord = [x,y] is a pixel coordinate in the image. [0,0] is the top-left corner of the image while [width, height] is the bottom-right corner.
theta Float. Angle of rotation in radians.
img_width Integer. Image width in number of pixels.
img_height Integer. Image height in number of pixels.
return coord_out List of integers, coord_out = [x,y] is the coordinate after rotation.
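Under the hood this is presumably the standard rotation of a point about the image center; a small numpy sketch of that math (my approximation, not kero's actual source, and the sign convention may differ since the image y-axis points downwards) is:

import numpy as np

def rotate_point_about_center(x, y, theta, img_width, img_height):
    # rotate (x, y) by theta radians about the image center (cx, cy)
    cx, cy = img_width / 2.0, img_height / 2.0
    dx, dy = x - cx, y - cy
    xr = cx + dx * np.cos(theta) - dy * np.sin(theta)
    yr = cy + dx * np.sin(theta) + dy * np.cos(theta)
    return [int(round(xr)), int(round(yr))]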

Example Usage 1

import numpy as np
import kero.ImageProcessing.photoBox as kip
import cv2

gg=cv2.imread("boxrotate.png")
img_height, img_width, _ = gg.shape
# print(img_height, img_width)
xmin = 100
ymin = 50
xmax = 110 # 200
ymax = 60# 250
cv2.rectangle(gg, (xmin, ymin), (xmax,ymax),(0,255,0),1)

theta_set = np.linspace(10,330,20)
theta_set = [x*np.pi/180 for x in theta_set]

for theta in theta_set:
	[xmin0, ymin0] = kip.rotate_wrt_img_center([xmin,ymin,xmax,ymax], theta, img_width, img_height)
	print(xmin0,ymin0)
	cv2.rectangle(gg, (xmin0, ymin0), (xmin0+10,ymin0+10),(0,0,255),1)

cv2.imshow("aa",gg)
cv2.waitKey(0)

The above demonstrates how the green rectangle is rotated about the origin, which is taken to be the center of the image.

rotimg

kero version: 0.4.2 and above

clone_to_annotate()

home > kero > Documentation

Given a folder of images and a tag folder containing the same images, each of which is marked with white color, RGB=(255,255,255), this function creates

  1. a clone of the folder,
  2. a folder of its ground truth images based on the tag folder,
  3. a folder containing the images with bounding boxes drawn, based on the ground truth images, along with text files whose lines have the format
    image_label img-width img-height xmin ymin xmax ymax
    where (xmin,ymin) is the top-left corner of the bounding box and (xmax,ymax) is its bottom-right corner.

Tips. Consider using clone_to_annotate_faster().

kero.ImageProcessing.photoBox.py

def clone_to_annotate(self,this_folder, tag_folder,starting_label,annotation_name,
        order_name="img",
        tag_name="imggt",
        check_missing_mode=False,
        rotate_angle_set=None,
        skip_ground_truth=False,
        significant_fraction=0.01,
        thresh=254,
        scale="Auto"):
  return

 

this_folder String. The name of the folder with images.
tag_folder String. The name of the folder with images. If each image in the tag_folder has a corresponding image of the same name and type, then clone folder, clone tag folder, ground truth image folder and annotation folder will be created with their corresponding contents.
starting_label Integer. The sequence of numbers will start with this integer.
annotation_name String. The image label for the white region marked in the tag folder. This will be saved in annotation .txt files.

In example 1, this is “butterfly”. This means that we label the object marked with white a “butterfly”.

order_name String. The clone of this_folder will be relabelled with prefix specified by this string.

Default value =”img”

tag_name String. The clone of tag_folder will be relabelled with prefix specified by this string.

Default value =”imggt”

check_missing_mode Boolean. If True, the cloning process of the folders is not performed. The file names of images in this_folder that do not have corresponding images in the tag_folder will be printed.

Default value = False

rotate_angle_set List of float. Each float is an angle in degrees by which an image is rotated.

Default value =None

skip_ground_truth Boolean. Set to True if the ground truth images have already been created in the manner spawn_ground_truth() spawns them; the function will then continue directly with the annotations.
significant_fraction Float. This argument takes values between 0 and 1.0. It specifies the minimum fraction of the whole image area that a connected component must occupy to be considered an object; anything smaller is treated as noise. This is an argument to the function get_connected_components().

Default value = 0.01

thresh Integer, from 0 to 255. A pixel in the tag image whose R, G and B values all exceed thresh is treated as ground-truth white, RGB=(255,255,255); otherwise the pixel is converted to black, RGB=(0,0,0).

For example, if the value is set to 244, a pixel with (255,254,246) is converted to white since 255, 254 and 246 all exceed 244, while a pixel with (20,40,120) is converted to black.

Default value = 254

scale “Auto”, (Integer,Integer) or None. Annotations are computed from down-scaled copies of the images to improve the processing speed (unless scale=None). If set to “Auto”, the image is downsized to (200,150) for processing.

This is an argument to the function multiple_scaled_box_positions().

Default value = “Auto”

Note: The actual image is not changed; only the annotation positions are computed on the down-scaled image and then re-scaled back, with a potential loss of accuracy.

Tips: If annotation fails for one reason or another after ground truth image generation is complete, then make sure to set skip_ground_truth=True before rerunning the function, so that we do not waste time re-spawning the ground truth images.

Tips: For images whose objects are very small, setting a small scale might be a bad choice, since the positions of the annotation boxes might lose a lot of accuracy during rescaling.

Example usage 1.

Download the example here and put them in the working directory under the folder /bb.

import kero.ImageProcessing.photoBox as kip

this_folder = "bb\\Butterflies"
tag_folder =  "bb\\Butterflies_canvas"

gsw=kip.GreyScaleWorkShop()
rotate_angle_set = [0,30,60,90,120,150,180] # None
annotation_name = "butterfly"
gsw.clone_to_annotate(this_folder, tag_folder,1,annotation_name, check_missing_mode=False,rotate_angle_set=rotate_angle_set)

This function will call spawn_ground_truth(), i.e. create a ground truth image (figure 1 top-right) of each image (figure 1 top-left) based on the corresponding image from the tag folder (figure 1 top-center), and furthermore create annotations for each image. Samples of images from the folders are shown below.

gtspawn

Figure 1.

A folder containing rotated figures (figure 2) will be created. Also, a txt file will be created for each rotated copy of the image. The bounding boxes are the thin green rectangles.

annot

Figure 2.

kero version: 0.4.2 and above

spawn_ground_truth()

home > kero > Documentation

Given a folder of images and a tag folder containing the same images, marked with white color, RGB=(255,255,255), this function creates a clone of the folder and a folder of its ground truth images based on the tag folder.

kero.ImageProcessing.photoBox.py

class GreyScaleWorkShop:
  def spawn_ground_truth(self,this_folder, tag_folder,starting_label,
        order_name="img",
        tag_name="img",
        check_missing_mode=False,
        rotate_angle_set=None,
        thresh=254):
    return

 

this_folder String. The name of the folder with images.
tag_folder String. The name of the folder with images. If each image in the tag_folder has a corresponding image of the same name and type, then the rotated image clone and ground truth images will be created and relabelled accordingly.
starting_label Integer. The sequence of numbers will start with this integer.
order_name String. The clone of this_folder will be relabelled with prefix specified by this string.

Default value =”img”

tag_name String. The clone of tag_folder will be relabelled with prefix specified by this string.

Default value =”img”

check_missing_mode Boolean. If True, the cloning process of the folders is not performed. The file names of images in this_folder that do not have corresponding images in the tag_folder will be printed.

Default value = False

rotate_angle_set List of float. Each float is an angle in degrees by which an image is rotated.

Default value =None

thresh Integer, from 0 to 255. A pixel in the tag image whose R, G and B values all exceed thresh is treated as ground-truth white, RGB=(255,255,255); otherwise the pixel is converted to black, RGB=(0,0,0).

For example, if the value is set to 244, a pixel with (255,254,246) is converted to white since 255, 254 and 246 all exceed 244, while a pixel with (20,40,120) is converted to black.

Default value = 254

Example usage 1

Download the example here and put them in the working directory under the folder /bb.

import kero.ImageProcessing.photoBox as kip

this_folder = "bb\\Butterflies"
tag_folder =  "bb\\Butterflies_canvas"

gsw=kip.GreyScaleWorkShop()
rotate_angle_set = [0,30,60,90,120,150,180] # None
gsw.spawn_ground_truth(this_folder, tag_folder,1, check_missing_mode=False,rotate_angle_set=rotate_angle_set)

The _canvas folder (top-middle) marks out the butterfly with a white marker. The ground truth is produced as shown in the top-right.

gtspawn.JPG

kero version: 0.5.1 and above

tag_rename_clone()

home > kero > Documentation

kero.ImageProcessing.photoBox.py

def tag_rename_clone(this_folder, tag_folder,starting_label,
    order_name="img",tag_name="imggt",filetype=".png",
    clone_name="_CLONE",tag_clone_name="_TAG",
    dump_folder="this_dump",
    check_missing_mode=False,
    rotate_angle_set=None):
  return

Description. Clones folders for image pre-processing. See example usage 1.

Rotation function. When the variable rotate_angle_set is set to a list of float [a1, a2,…] , then each image in the folder will be cloned, and then copies of the image rotated with angles a1, a2, … will be created in the clone folder and labelled along the sequence of numbers as well.

this_folder String. The name of the folder with images.
tag_folder String. The name of the folder with images. If each image in the tag_folder has a corresponding image of the same name and type, then the image will be cloned into a clone folder and tag clone folder and relabelled accordingly.
starting_label Integer. The sequence of numbers will start with this integer.
order_name String. The clone of this_folder will be relabelled with prefix specified by this string.

Default value =”img”

tag_name String. The clone of tag_folder will be relabelled with prefix specified by this string.

Default value =”imggt”

filetype String. Each image in both clone folders will have file type specified by this string.

Default value =”.png”

clone_name String. The clone folder of this_folder will be named this_folder+clone_name

Default value =”_CLONE”

tag_clone_name String. The clone folder of tag_folder will be named this_folder+tag_clone_name

Default value =”_TAG”

dump_folder String. The name of dump folder. Dump folder will be filled with images from this_folder that do not have corresponding images with the same name and type in the tag_folder. If this folder is empty, it will be deleted at the end.

Default value =”this_dump”

check_missing_mode Boolean. If True, the cloning process of the folders is not performed. The file names of images in this_folder that do not have corresponding images in the tag_folder will be printed.

Default value = False

rotate_angle_set List of float. Each float is an angle in degrees by which an image is rotated.

Default value =None

 

Example usage 1

Given a folder of images and a tag folder, this function clones both folders and relabels the images in numerical sequence. Consider the folder shown below (left) and its tag folder (right), which is identical to the folder except that each corresponding butterfly object is marked white. Download the example here and put them in the working directory under the folder /bb.

import kero.ImageProcessing.photoBox as kip

this_folder = "bb\\Butterflies"
tag_folder =  "bb\\Butterflies_canvas"

kip.tag_rename_clone(this_folder, tag_folder, 1, check_missing_mode=False)

 

bbfolder.png

The cloned and relabelled folders are shown below.

bbfolder2.png

kero version: 0.4.3 and above