Summary so far: given images of butterflies and bees, mark the butterflies and the bees with white, RGB(255,255,255), create rotated copies of the images, annotate the positions of these butterflies and bees within the images in PASCAL VOC format, and finally put them into train and test folders.
Using the Python package kero (version 4.3.0), we can speed up image pre-processing with clone_to_annotate_faster(), which carries out the processes described in the summary above. We will use it as an alternative to the tutorial in Part II of bees and butterflies object detection and to the front section of Part III.
Tips: remember to activate the virtual environment if you have deactivated it. A virtual environment keeps the packages we install from interfering with the system or with other projects, which matters especially when we need older versions of some packages.
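If you are not sure whether the virtual environment is actually active, Python itself can tell you: inside a venv, sys.prefix points at the environment while sys.base_prefix still points at the base installation. A minimal sketch using only the standard library (the helper name in_virtualenv is hypothetical, not part of kero):

```python
import sys


def in_virtualenv() -> bool:
    """Return True when running inside a venv/virtualenv-style environment."""
    # In a venv, sys.prefix is the environment folder and sys.base_prefix
    # is the base Python installation; outside a venv they are equal.
    base = getattr(sys, "base_prefix", sys.prefix)
    # Older virtualenv releases set sys.real_prefix instead of base_prefix.
    return hasattr(sys, "real_prefix") or sys.prefix != base


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```

This check covers venv and virtualenv; conda environments are not detected this way.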
Create the following directory structure. I named the working folder keropb, but you can name it anything.
adhoc/myproject/images
    + train
    + test
adhoc/keropb
    + butterflies_and_bees
        + Butterflies
            - butterflyimage1.png
            - ...
        + Butterflies_canvas
            - butterflyimage1.png
            - ...
        + Bees
            - beeimage1.png
            - ...
        + Bees_canvas
            - beeimage1.png
            - ...
    + do_clone_to_annotate.py
    + do_convert_to_PASCALVOC.py
    + do_move_ALL_to_train.py
    + do_move_a_fraction.py
    + adhoc_functions.py
Note: the image folders and the corresponding canvas folders can be downloaded here. Also, do not worry: the last five Python files will be provided along the way.
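If you prefer to create the empty folders from a script rather than by hand, here is a small sketch with pathlib. It only creates the folders shown above (the images and the .py files still have to be copied in), and the function names layout_paths and make_layout are hypothetical:

```python
from pathlib import Path


def layout_paths(adhoc_root: Path) -> list:
    """Return the folders this tutorial expects, relative to adhoc_root."""
    images = adhoc_root / "myproject" / "images"
    data = adhoc_root / "keropb" / "butterflies_and_bees"
    return [images / "train", images / "test"] + [
        data / name
        for name in ("Butterflies", "Butterflies_canvas", "Bees", "Bees_canvas")
    ]


def make_layout(adhoc_root: Path) -> None:
    """Create any missing folders; existing folders and files are left alone."""
    for folder in layout_paths(adhoc_root):
        # parents=True creates intermediate folders, exist_ok=True makes reruns safe
        folder.mkdir(parents=True, exist_ok=True)
```

For example, make_layout(Path("C:/Users/acer/Desktop/adhoc")) would build the skeleton under the same adhoc folder used in the paths later on.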
We store all our butterfly images in the folder Butterflies and the bee images in the folder Bees. The _canvas folders start as exact replicas of the corresponding folders: simply copy-paste the Butterflies and Bees folders and rename the copies. In the canvas folders, however, we mark out the butterflies and the bees. In a sense, we are teaching the algorithm which objects in the pictures are butterflies and which are bees. To mark out a butterfly, block it out with white, RGB (255,255,255). This is easy to do: just use the good ol' Paint program and paint over the butterfly with white, or use the eraser. See the example below. Note that each canvas image has exactly the same file name as its original.
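To see why white works as a marker, here is a minimal sketch of how a white-painted region in a canvas image could be turned into a bounding box. This is not kero's implementation; it assumes that "white" means every RGB channel is at or above a threshold (250 here, mirroring the thresh=250 used later, although what kero does internally with that value is an assumption) and it returns a single box around all white pixels. It uses NumPy; the name white_bbox is hypothetical.

```python
import numpy as np


def white_bbox(canvas: np.ndarray, thresh: int = 250):
    """Return (xmin, ymin, xmax, ymax) around all near-white pixels, or None.

    canvas: H x W x 3 (or x 4) uint8 RGB array.
    A pixel counts as white when its R, G and B channels are all >= thresh.
    """
    mask = np.all(canvas[:, :, :3] >= thresh, axis=2)
    ys, xs = np.nonzero(mask)  # row (y) and column (x) indices of white pixels
    if xs.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())
```

Because this draws one box around every white pixel, two butterflies in one image would end up in a single box; separating them would need connected-component labelling (for example scipy.ndimage.label) before taking each component's extent.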
Tips: if an image already contains white patches, they might be wrongly detected as a butterfly too, which is bad. In that case, paint these irrelevant white patches in the canvas image with another obvious color, such as black.
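To find the images where this tip applies, you can measure how much of each unmarked original is already near-white, since those pixels carry straight over into the canvas copy. A sketch with NumPy, assuming the same "all channels at or above 250" notion of white as above; the function name stray_white_fraction is hypothetical:

```python
import numpy as np


def stray_white_fraction(original: np.ndarray, thresh: int = 250) -> float:
    """Fraction of pixels in an *unmarked* image that are already near-white.

    Such pixels survive into the _canvas copy and could be mistaken for a
    painted-over insect, so images with a noticeable fraction are worth
    opening in Paint and recolouring by hand.
    """
    # A pixel is "near-white" when every RGB channel is at or above thresh.
    near_white = np.all(original[:, :, :3] >= thresh, axis=2)
    return float(near_white.mean())
```

Assuming Pillow is installed, an image can be loaded for this check with np.asarray(Image.open(path).convert("RGB")). How large a fraction is worth fixing is a judgment call.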
Make sure kero is installed, or install/upgrade it with
pip install --upgrade kero
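To confirm which kero version actually ended up in the active environment (this tutorial uses 4.3.0), the standard library's importlib.metadata (Python 3.8+) can report it. A minimal sketch; installed_version is a hypothetical helper name:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("kero:", installed_version("kero"))
```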
Create the following script, do_clone_to_annotate.py, and run it from adhoc/keropb, i.e. in cmd.exe, cd into keropb and run the command python do_clone_to_annotate.py.
Tips: we have set check_missing_mode=False. It is a good idea to set it to True first: this checks whether each image in Butterflies has a corresponding image in Butterflies_canvas, so that we can identify missing images and fix them before processing. If everything is fine, "ALL GREEN. No missing files." will be printed. Then set it back to False and run the script again.
import kero.ImageProcessing.photoBox as kip

this_folder = "butterflies_and_bees\\Butterflies"  # "butterflies_and_bees\\Bees"
tag_folder = "butterflies_and_bees\\Butterflies_canvas"  # "butterflies_and_bees\\Bees_canvas"

gsw = kip.GreyScaleWorkShop()
rotate_angle_set = [30, 60, 90, 120, 150, 180]  # None
annotation_name = "butterfly"  # "bee"
gsw.clone_to_annotate_faster(this_folder, tag_folder, 1, annotation_name,
                             order_name="imgBUT",
                             tag_name="imgBUT",
                             check_missing_mode=False,
                             skip_ground_truth=False,
                             significant_fraction=0.01,
                             rotate_angle_set=rotate_angle_set,
                             thresh=250)
Redo the above for the Bees folder as well, changing the highlighted values accordingly (the commented-out alternatives in the script show the Bees values). Note that Bees_LOG.txt and Butterflies_LOG.txt are also created, listing how the image files have been renamed.
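Independently of kero, the idea behind check_missing_mode can be approximated with a plain file-name comparison between a folder and its _canvas twin. This is a sketch, not how kero implements the check; the helper names compare_names and check_folders are hypothetical:

```python
from pathlib import Path


def compare_names(image_names, canvas_names):
    """Return (missing_in_canvas, extra_in_canvas) as sorted lists of file names."""
    images, canvases = set(image_names), set(canvas_names)
    return sorted(images - canvases), sorted(canvases - images)


def _file_names(folder):
    """List the names of the regular files directly inside folder."""
    return [p.name for p in Path(folder).iterdir() if p.is_file()]


def check_folders(image_dir, canvas_dir):
    """Compare the file names in an image folder and its _canvas folder."""
    return compare_names(_file_names(image_dir), _file_names(canvas_dir))
```

For example, check_folders("butterflies_and_bees/Butterflies", "butterflies_and_bees/Butterflies_canvas") returning two empty lists means every image has a canvas partner with the same name.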
Create PASCAL VOC annotations in XML
Now, still in the keropb folder, create adhoc_functions.py as shown in Object Detection using Tensorflow: adhoc functions, adjust the paths in do_convert_to_PASCALVOC.py to match your own folders, and then run the following:
import adhoc_functions as af

for x in ["Butterflies", "Bees"]:
    # folder of .txt annotation files for this class
    annot_foldername = "C:\\Users\\acer\\Desktop\\adhoc\\keropb\\butterflies_and_bees\\" + x + "_ANNOT"
    annot_filetype = ".txt"
    # folder of the cloned .png images for this class
    img_foldername = "C:\\Users\\acer\\Desktop\\adhoc\\keropb\\butterflies_and_bees\\" + x + "_CLONE"
    img_filetype = ".png"
    af.mass_convert_to_PASCAL_VOC_xml(annot_foldername, annot_filetype, img_foldername, img_filetype)
By now, all the images required are available with annotations in PASCAL VOC, written in .xml files.
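For reference, a PASCAL VOC annotation is a small XML file holding the image's file name and size plus one object element (class name and bndbox pixel coordinates) per labelled insect. The sketch below builds such a file with the standard library's ElementTree; it is an illustration of the format, and the exact set of tags written by adhoc_functions may differ. The function name voc_xml is hypothetical.

```python
import xml.etree.ElementTree as ET


def voc_xml(filename, width, height, objects, depth=3):
    """Build a minimal PASCAL VOC annotation and return it as an XML string.

    objects: iterable of (name, xmin, ymin, xmax, ymax) in pixel coordinates.
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    for tag, value in (("width", width), ("height", height), ("depth", depth)):
        ET.SubElement(size, tag).text = str(value)
    for name, xmin, ymin, xmax, ymax in objects:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = name  # class label, e.g. "butterfly"
        ET.SubElement(obj, "difficult").text = "0"
        box = ET.SubElement(obj, "bndbox")
        for tag, value in (("xmin", xmin), ("ymin", ymin),
                           ("xmax", xmax), ("ymax", ymax)):
            ET.SubElement(box, tag).text = str(value)
    return ET.tostring(root, encoding="unicode")
```

Opening one of the generated .xml files next to this sketch is a quick way to check that the class names ("butterfly", "bee") and box coordinates look sensible.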
Train and test split
Now we split the images (with their annotations) into a training folder (90% of all the images) and a test folder (the remaining 10%). If you prefer, you can further set aside 10% of the 90% for external testing. Our tutorial uses 90% for training and 10% for validation, without splitting anything further out of the 90%.
Run the following.
import os
import adhoc_functions

train_folder = "C:\\Users\\acer\\Desktop\\adhoc\\myproject\\images\\train"
test_folder = "C:\\Users\\acer\\Desktop\\adhoc\\myproject\\images\\test"

# Create the train and test folders if they do not exist yet
if not os.path.exists(train_folder):
    print("Creating train folder.")
    os.mkdir(train_folder)
if not os.path.exists(test_folder):
    print("Creating test folder.")
    os.mkdir(test_folder)

for i in ["Butterflies", "Bees"]:
    folder_name_CLONE = "C:\\Users\\acer\\Desktop\\adhoc\\keropb\\butterflies_and_bees\\" + i + "_CLONE"
    folder_name_ANNOT = "C:\\Users\\acer\\Desktop\\adhoc\\keropb\\butterflies_and_bees\\" + i + "_ANNOT"
    # same rotate_angle_set as used in do_clone_to_annotate.py
    rotate_angle_set = [30, 60, 90, 120, 150, 180]
    folder_target = train_folder
    adhoc_functions.move_ALL_to_train(folder_name_CLONE, folder_name_ANNOT, rotate_angle_set, folder_target)
print("Done.")
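Before moving anything to test, it can be worth checking that the train folder holds an annotation for every image and vice versa. A sketch, assuming each .png and its .xml share the same base name (that naming is an assumption about adhoc_functions' output); the function name unpaired is hypothetical:

```python
from pathlib import Path


def unpaired(filenames, img_ext=".png", ann_ext=".xml"):
    """Return sorted base names that have an image but no annotation, or vice versa."""
    imgs = {Path(f).stem for f in filenames if Path(f).suffix.lower() == img_ext}
    anns = {Path(f).stem for f in filenames if Path(f).suffix.lower() == ann_ext}
    # Symmetric difference: names present on only one side
    return sorted(imgs ^ anns)
```

For example, unpaired(os.listdir(train_folder)) returning an empty list means every image in train has a matching .xml.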
The train folder is now populated. Next, we move 10% of it into the test folder. Run the following:
import adhoc_functions as af

src = "C:\\Users\\acer\\Desktop\\adhoc\\myproject\\images\\train"
tgt = "C:\\Users\\acer\\Desktop\\adhoc\\myproject\\images\\test"
af.move_some_percent(src, tgt)
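If you are curious what such a move involves, here is a sketch of a reproducible 10% split that keeps each image and its annotation together. It is not the code of move_some_percent; the names pick_fraction and move_fraction, the fixed seed, and the .png/.xml pairing by base name are all assumptions.

```python
import random
import shutil
from pathlib import Path


def pick_fraction(stems, fraction=0.1, seed=0):
    """Pick round(fraction * len(stems)) base names at random, reproducibly."""
    stems = sorted(stems)  # sort first so the same seed gives the same pick
    k = round(fraction * len(stems))
    return sorted(random.Random(seed).sample(stems, k))


def move_fraction(src, tgt, fraction=0.1, seed=0, exts=(".png", ".xml")):
    """Move the image and annotation files of a random fraction of base names."""
    src, tgt = Path(src), Path(tgt)
    tgt.mkdir(parents=True, exist_ok=True)
    stems = {p.stem for p in src.iterdir() if p.suffix.lower() == exts[0]}
    chosen = pick_fraction(stems, fraction, seed)
    for stem in chosen:
        for ext in exts:
            f = src / (stem + ext)
            if f.exists():
                shutil.move(str(f), str(tgt / f.name))
    return chosen
```

Moving by base name rather than by individual file is the design choice that matters here: picking 10% of all files independently could send an image to test while its .xml stays in train.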
Done! To proceed with the next step, which is model training, see the section “Conversion to tfrecords” (and skip “Train and test split” section) in Part III of bees and butterflies object detection.