Blender multiple rendering

Pretty often, when we create a 3D model, for example using Blender, we would like to automate the rendering of images from different camera angles and perspectives. See, for example, this pre-processing part of an image processing project. The Blender file containing the 3D model can be found here.

After creating the model, use the following code to run auto_capture.py for rendering. The code is to be entered into the Python console in Blender (to access it, press Shift+F4).

import os, sys, importlib
current_dir = "path\\to\\current\\directory"
sys.path.append(current_dir) # add current path
import auto_capture
# importlib.reload(auto_capture) # reload to rerun the script after the first import

auto_capture.py

import os, sys, importlib
current_dir = "path\\to\\current\\directory"
sys.path.append(current_dir) # add current path
# import auto_capture
# importlib.reload(auto_capture)

print("Hello!")

import bpy
obj = bpy.data.objects['Animal.001']
obj_cam = bpy.data.objects['Camera']

# camera settings: x, y, z, euler_x, euler_y, euler_z
pos = [
    [0, -3, 5, 30, 0, 0],
    [0, -3.5, 5, 40, 0, 10],
    [0, -4, 5, 45, 0, -5],
    [-2, -3, 3, 50, 0, -20],
]

count = 1

def convert_deg_to_rad(x):
    # note: math.radians(x) would be more precise than the 3.142 approximation
    return x * 3.142 / 180

for x in pos:
    obj_cam.location = x[0:3]
    obj_cam.rotation_euler = [convert_deg_to_rad(y) for y in x[3:]]
    print(obj_cam.location)
    print(obj_cam.rotation_euler)

    # render the scene and save it as a1.jpg, a2.jpg, ...
    bpy.data.scenes['Scene'].render.filepath = current_dir + '\\a' + str(count) + '.jpg'
    bpy.ops.render.render(write_still=True)
    count = count + 1

We use pos to specify the position of the camera (x, y, z) followed by the extrinsic Euler angles (in degrees) of the camera w.r.t. the absolute x, y, z axes. We have specified 4 positions, and of course you can automate this by creating a longer list (see the sketch below). The output is a set of 4 images, a1.jpg, a2.jpg, a3.jpg and a4.jpg, as shown in figure (D) below.

process
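
For instance, a longer pos list could be generated programmatically rather than typed by hand. The sketch below is not part of the original script; circle_positions and its default values are just illustrative assumptions, placing the camera on a circle around the origin at a fixed height.

import math

def circle_positions(radius=4.0, height=5.0, n=12, tilt_deg=45):
    # The Euler angles here are only illustrative; the tilt needed to keep the
    # object centred depends on the radius, the height and the object's position.
    pos = []
    for k in range(n):
        a = 2 * math.pi * k / n                 # angle around the z axis
        x, y = radius * math.cos(a), radius * math.sin(a)
        euler_z = math.degrees(a) + 90          # turn the camera back towards the origin
        pos.append([x, y, height, tilt_deg, 0, euler_z])
    return pos

pos = circle_positions()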

Ground Truth Images from 3D model

Look at this photo of a katydid. Amazing camouflage, isn’t it?

Katydid_camouflaged_in_basil_plant

By Jeff Kwapil – Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=50923289

Recently I have been tasked with a project related to the detection of a rather elusive object. Our aim is to use an image processing algorithm to detect the object, and the fact that we cannot obtain many samples of the real object is a problem. I mean, imagine you have to capture images of a katydid, searching high and low, and it is so easy to overlook! Perhaps we can automate the photo taking, and then, from the many photos obtained, use an algorithm to detect which photos actually contain a katydid. We need to train the model behind the image processing algorithm, but have too few samples. What are we to do? Let us assume we know the shape of this object.

Here, we use 3D modelling to generate many samples of the object seen from different perspectives against a camouflage background. For training, the images need to be marked with, say, a red box around the region where the elusive object is present, as shown above. We will be using ground truth images to help draw these red boxes. What we want to show here are the simple steps to create such a ground truth image. We will be using Blender and Python. The code can be found here.

process

Let us suppose we want to detect the elusive “animal” in figure (A). The model of the object might not be easy to create, but let us also assume we have created a 3D model that replicates the “real animal” well, as shown in figure (B).

  1. In Object mode (see the yellow box in figure (B)), select the object by right-clicking it. Once selected, change to Edit mode.
  2. Select the entire object. You can also do this by selecting one vertex, edge or face, and pressing A twice.
  3. Press Shift+D (duplicate the object), then left-click wherever the mouse cursor currently is. Press P and choose Selection. Notice that a separate object has been created. The name of my object is Animal, and the duplicated object generated is called Animal.001. This can be seen in the Outliner panel; see the yellow dashed rectangle in figure (C).
  4. Scale Animal.001 to be just slightly larger than the actual Animal object, so that Animal.001 completely covers it. Now change its material to a color very different from the background; I changed it to red. To do this, use the icon marked with the dashed yellow circle in figure (C).
  5. Render and save the images from different points of view, as shown in figure (D). You can use a Python script to automate the process (see the example here). The files are saved as a1.png, a2.png, a3.png and a4.png.
  6. Use the following code, ground.py, to generate the ground truth images b1.png, b2.png, b3.png and b4.png.
import cv2

# Set current_dir to the directory where the Blender file
# (and the rendered images) are located.
current_dir = "path\\to\\current\\directory"
for i in range(4):
    count = i + 1
    myimg = cv2.imread(current_dir + '\\a' + str(count) + '.png')
    myimg_hsv = cv2.cvtColor(myimg, cv2.COLOR_BGR2HSV)
    output = cv2.inRange(myimg_hsv, (0, 40, 50), (30, 255, 255))
    cv2.imwrite(current_dir + '\\b' +str(count) + '.png', output)

In this code, the image is converted to HSV format, and we extract the part of the image with red color, more specifically with hue from 0 to 30, saturation from 40 to 255 and value (brightness) from 50 to 255. This is done through inRange() with the arguments (0, 40, 50) and (30, 255, 255). The desired color is marked white and the rest is black; this is our ground truth image and we are done! Of course, these images are to be further processed for image processing, though that is not within the scope of this post. A small example of how such a mask might be used follows the figure below.

ground_truth.JPG
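
As an illustration of how such a ground truth image might be used, here is a minimal sketch (not part of the original code) that recovers the bounding box of the white region in b1.png and draws it as a red box on the corresponding render a1.png; the file names simply follow the convention above.

import cv2
import numpy as np

current_dir = "path\\to\\current\\directory"
mask = cv2.imread(current_dir + '\\b1.png', cv2.IMREAD_GRAYSCALE)  # ground truth mask
ys, xs = np.where(mask > 0)                    # coordinates of the white (object) pixels
x1, y1 = int(xs.min()), int(ys.min())
x2, y2 = int(xs.max()), int(ys.max())
img = cv2.imread(current_dir + '\\a1.png')     # the original render
cv2.rectangle(img, (x1, y1), (x2, y2), (0, 0, 255), 2)  # red box (BGR)
cv2.imwrite(current_dir + '\\a1_boxed.png', img)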

One final note: of course, we could simply set the object Animal to a red color. However, when we need to extract only a part of an object, use the method above: highlight the part, duplicate it, change its color to red and capture the ground truth image.

Image Processing #2: Common Blender Functions

We will list some common functions and shortcuts.

Navigation. Use the numpad keys to navigate, and also their combinations with Ctrl or Shift. Have fun trying them out!

Selecting multiple objects. Hold Shift and left-click each object.

Real time rendering.

Look at the red circles. We can change the editors in Blender’s interface by clicking on them, depending on our preferences. To edit a scene and see the changes in real time, set both to 3D View as shown below. In the bottom half, shown with the green rectangle, press Shift+Z, then edit in the top half. See how it changes; pretty neat!

blender3.PNG

Round edge

blender_round.png

Click the edge, press Ctrl+B, and then increase the number of segments as shown in red.

Bending

blender2

Highlight the region to bend, then press Shift+W. Place the cursor (crosshair) as the pivot, and then move the mouse around to bend the object.

Merging Vertices

You might be faced with a lot of meshes while working with Blender. Sometimes we might want to merge vertices. The following shows 4 vertices being merged into 1.

Select all four vertices simultaneously, press Alt+M, and choose the way you want to merge (in this example, we choose to merge the vertices towards the crosshair cursor). A scripted equivalent is sketched after the figure below.

blender4.png
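
For reference, the same merge can also be done from a script. This is a minimal sketch assuming the object is already in Edit Mode with the four vertices selected; bpy.ops.mesh.merge mirrors the menu choices above.

import bpy

# assumes Edit Mode with the vertices already selected
bpy.ops.mesh.merge(type='CENTER')   # or type='CURSOR' to merge at the crosshair cursor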

Image Processing #1: Using Blender

I am embarking on a project on image processing now. The first task was to generate some images to help with model training, since actual images for training are hard to come by. Let us use Blender to generate these images.

In this example, I will create the following directory structure. The folder image_save is empty; this is where we will save the rendered images. The folder somemodule shows how to import an external module we would like to include in the project. The file practice.blend is created through Blender. It starts off with a scene containing the following objects: Camera, Cube, Lamp, World and RenderLayers. Also, the __init__.py file is, as usual, the file Python requires in order to use an external module such as somemodule. It can be left empty, or be used just like any other Python script.

blender
/image_save
/somemodule
//__init__.py
//test_mod.py
/practice.blend
/mytest.py

What to expect? We show how to

  1. render an image of a cube using Python from inside Blender, then save the image.
  2. tilt the same cube, render the image, and then save it as well.

mytest.py

print("test")

import bpy
obj = bpy.data.objects['Cube']
current_dir = "your\\directory\\to\\blender"

# render and save the cube in its initial pose
print(" - location = ", obj.location)
print(" - angle = ", obj.rotation_euler)
bpy.data.scenes['Scene'].render.filepath = current_dir + '\\image_save\\mytest\\ggwp.jpg'
bpy.ops.render.render(write_still=True)

# move and tilt the cube, then render and save again
# (note: rotation_euler is specified in radians)
obj.location[0] = 1.0
obj.rotation_euler[0] = 30
print(" - location (after) = ", obj.location)
print(" - angle (after) = ", obj.rotation_euler)
bpy.data.scenes['Scene'].render.filepath = current_dir + '\\image_save\\mytest\\ggwp2.jpg'
bpy.ops.render.render(write_still=True)

test_mod.py

print("inside test_mod")

Now we are ready. Inside Blender, after creating the new file practice.blend, press SHIFT+F4 to access Blender’s internal Python console.

>>> import os, sys, importlib
>>> cur_dir = "your\\directory\\to\\blender"
>>> sys.path.append(cur_dir) # add current path
>>> import somemodule.test_mod
inside test_mod
>>> import mytest
test
 - location =  <Vector (0.0000, 0.0000, 0.0000)>
 - angle =  <Euler (x=0.0000, y=0.0000, z=0.0000), order='XYZ'>
 - location (after) =  <Vector (1.0000, 0.0000, 0.0000)>
 - angle (after) =  <Euler (x=30.0000, y=0.0000, z=0.0000), order='XYZ'>

and two images, ggwp.png and ggwp2.png, are created in image_save/mytest, as shown below.

blender1

We have run the script by importing it, for example through import mytest. In case we need to rerun the script, use instead

importlib.reload(mytest)

since calling import mytest a second time will not re-execute the script.
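
If this gets repetitive, a small convenience helper can be used. This is just a sketch; load_or_reload is not part of Blender or of the project files.

import importlib, sys

def load_or_reload(name):
    # import the module the first time; reload it on subsequent calls
    if name in sys.modules:
        return importlib.reload(sys.modules[name])
    return importlib.import_module(name)

mytest = load_or_reload("mytest")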

Deep Neural Network Regression part 2.2


DNN Regressor

  1. Part 1. Synthetic Regression
  2. Part 2.1. Synthetic Regression. Train and save the model.
  3. Part 2.2. Synthetic Regression. Load model and predict.

The following code can be found here, in the folder DNN regression, python, under the names synregMVar_cont.ipynb (Jupyter notebook) and synregMVar_cont.py.

Continuing from part 2.1 in this link, we load the models saved under output1 and output2 for prediction. They will perform badly since they were trained with a small number of data points. We then train them further with more data points and obtain better predictions.

import kero.DataHandler.RandomDataFrame as RDF
import kero.DataHandler.DataTransform as dt
from kero.DataHandler.Generic import *

import numpy as np
import pandas as pd
import tensorflow as tf
import itertools
import matplotlib.pyplot as plt
import matplotlib
from sklearn.model_selection import train_test_split
from scipy.stats.stats import pearsonr
from pylab import rcParams

In this code, make sure the numbers of layers and hidden units are the same as the values used in the first round of training. Likewise, make sure the activation functions are the same. The number of training steps in the later part of the code can be set much higher.

hiddenunit_set=[[32,16,8],[32,16,8]]
step_set= [2400, 2400] # [6400,6400] # [None, None]
activationfn_set=[tf.nn.relu, tf.nn.relu]

no_of_new_training_set=2000
new_test_size_frac = 0.5

rdf = RDF.RandomDataFrame()
####################################################
# Specify the input variables here
####################################################
FEATURES = ["first","second","third", "fourth","bool1", "bool2", "bool3", "bool4"]
output_label="output1" # !! List all the output column names
output_label2= "output2"

col1 = {"column_name": FEATURES[0], "items": list(range(4))}
col2 = {"column_name": FEATURES[1], "items": list(np.linspace(10, 20, 8))}
col3 = {"column_name": FEATURES[2], "items": list(np.linspace(-100, 100, 1250))}
col4 = {"column_name": FEATURES[3], "items": list(np.linspace(-1, 1, 224))}
col5 = {"column_name": FEATURES[4], "items": [0, 1]}
col6 = {"column_name": FEATURES[5], "items": [0, 1]}
col7 = {"column_name": FEATURES[6], "items": [0, 1]}
col8 = {"column_name": FEATURES[7], "items": [0, 1]}

LABEL = [output_label, output_label2]

In the following code we load the training data set from part 2.1, drop all defective data points, and split it into a training part (20 data points) and a test part (980 data points), similar to, but not necessarily the same as, part 2.1.

df_train = pd.read_csv(r"regressionMVartest_train.csv")
print('df train shape =',df_train.shape)
# print("train data:\n", df_train.head())
cleanD_train, crippD_train, _ = dt.data_sieve(df_train)  # cleanD, crippD, origD'
cleanD_train.get_list_from_df()
colname_set_train = df_train.columns
df_train_clean = cleanD_train.clean_df
df_train_crippled = crippD_train.crippled_df
print("clean train data:\n", df_train.head())
print('df train clean shape =',df_train_clean.shape)
if df_train_crippled is not None:
    print('df train crippled shape =',df_train_crippled.shape)
else:
    print('df train: no defect')

# prepare
train = df_train_clean[:]
print(FEATURES," -size = ", len(FEATURES))
# Columns for tensorflow
feature_cols = [tf.contrib.layers.real_valued_column(k) for k in FEATURES]

# Training set and Prediction set with the features to predict
training_set = train[FEATURES]
prediction_set = train[LABEL]

# Train and Test 
x_train, x_test, y_train, y_test = train_test_split(training_set[FEATURES] , prediction_set, test_size=0.98, random_state=42)
y_train = pd.DataFrame(y_train, columns = LABEL)
training_set = pd.DataFrame(x_train, columns = FEATURES).merge(y_train, left_index = True, right_index = True)

# Training for submission
training_sub = training_set[FEATURES]
print(training_set.head()) # NOT YET PREPROCESSED
# Same thing but for the test set
y_test = pd.DataFrame(y_test, columns = LABEL)
testing_set = pd.DataFrame(x_test, columns = FEATURES).merge(y_test, left_index = True, right_index = True)
print(testing_set.head()) # NOT YET PREPROCESSED
print("training size = ", training_set.shape)
print("test size = ", testing_set.shape)

Then we do pre-processing. Once done, we are ready to feed these pre-processed data into the model for prediction.

range_second = [10,20]
range_third = [-100,100]
range_fourth = [-1,1]
# range_input_set = [range_second, range_third, range_fourth]
range_output1 = [-200,200]
range_output2 = [-600,600]
range_output_set = {'output1': range_output1, 'output2': range_output2}


conj_command_set = {FEATURES[0]: "",
                    FEATURES[1]: "cont_to_scale",
                    FEATURES[2]: "cont_to_scale",
                    FEATURES[3]: "cont_to_scale",
                    FEATURES[4]: "",
                    FEATURES[5]: "",
                    FEATURES[6]: "",
                    FEATURES[7]: "",
                    # OUTPUT
                    LABEL[0]: "cont_to_scale",
                    LABEL[1]: "cont_to_scale",
                   }
scale_output1 = [0,1]
scale_output2 = [0,1]
scale_output_set = {'output1': scale_output1, 'output2': scale_output2}
cont_to_scale_settings_second = {"scale": [-1, 1], "mode": "uniform", "original_scale":range_second}
cont_to_scale_settings_third = {"scale": [0, 1], "mode": "uniform", "original_scale":range_third}
cont_to_scale_settings_fourth = {"scale": [0, 1], "mode": "uniform", "original_scale":range_fourth}
cont_to_scale_settings_output1 = {"scale": scale_output1 , "mode": "uniform", "original_scale":range_output1}
cont_to_scale_settings_output2 = {"scale": scale_output2 , "mode": "uniform", "original_scale":range_output2}
conj_command_setting_set = {FEATURES[0]: None,
                            FEATURES[1]: cont_to_scale_settings_second,
                            FEATURES[2]: cont_to_scale_settings_third,
                            FEATURES[3]: cont_to_scale_settings_fourth,
                            FEATURES[4]: None,
                            FEATURES[5]: None,
                            FEATURES[6]: None,
                            FEATURES[7]: None,
                            # OUTPUT
                            LABEL[0]: cont_to_scale_settings_output1,
                            LABEL[1]: cont_to_scale_settings_output2,
                            }
# Model
tf.logging.set_verbosity(tf.logging.ERROR)
regressor_set = []
for i in range(len(LABEL)):
    regressor = tf.contrib.learn.DNNRegressor(feature_columns=feature_cols, 
                                              activation_fn = activationfn_set[i], hidden_units=hiddenunit_set[i],
                                             model_dir=LABEL[i])
#                                             optimizer = tf.train.GradientDescentOptimizer(learning_rate= 0.1))
    regressor_set.append(regressor)

# Reset the index of training
training_set.reset_index(drop = True, inplace =True)
def input_fn(data_set, one_label, pred = False):
    # one_label is the element of LABEL
    if pred == False:
        
        feature_cols = {k: tf.constant(data_set[k].values) for k in FEATURES}
        labels = tf.constant(data_set[one_label].values)
        
        return feature_cols, labels

    if pred == True:
        feature_cols = {k: tf.constant(data_set[k].values) for k in FEATURES}
        
        return feature_cols
# Conjugate

cleanD_testing_set = dt.clean_data()   
cleanD_testing_set.clean_df = testing_set
cleanD_testing_set.build_conj_dataframe(conj_command_set, conj_command_setting_set=conj_command_setting_set)

test_conj = cleanD_testing_set.clean_df_conj[:]
print("scaled test=\n", test_conj.head(10))

Loading Model for Prediction

Then we perform the prediction on the 980 data points set aside for testing.

# Evaluation on the test set created by train_test_split
print("Final Loss on the testing set: ")
predictions_prev_set = []
for i in range(len(LABEL)):
    ev = regressor_set[i].evaluate(input_fn=lambda: input_fn(test_conj,LABEL[i]), steps=1)
    loss_score1 = ev["loss"]
    print(LABEL[i],"{0:f}".format(loss_score1))
    # Predictions
    y = regressor_set[i].predict(input_fn=lambda: input_fn(test_conj,LABEL[i]))
    predictions_prev = list(itertools.islice(y, test_conj.shape[0]))
    predictions_prev_set.append(predictions_prev)
print("predictions_prev_set length = ",len(predictions_prev_set))
corrcoeff_set = []
predictions_set = []
reality_set = []
print("pearson correlation coefficients  =  ")

for i in range(len(LABEL)):
    # print(LABEL[i]," : ",init_scale_max ,init_scale_min)
    # need to inverse transform #
    # since prediction is in conj form
    initial_scale = [range_output_set[LABEL[i]][0],range_output_set[LABEL[i]][1]]
    orig_scale = [scale_output_set[LABEL[i]][0],scale_output_set[LABEL[i]][1]]
    pred_inv = dt.conj_from_cont_to_scaled(predictions_prev_set[i], scale=initial_scale, mode="uniform",original_scale=orig_scale)
    #############################
    predictions = pd.DataFrame(pred_inv,columns = ['Prediction'])
    predictions_set = predictions_set + [pred_inv] # a list, or column
    # predictions.head()
    
    reality = testing_set[LABEL[i]].values # a list, or column
    reality_set = reality_set + [reality]
    # reality.head()
    corrcoeff=pearsonr(list(predictions.Prediction), list(reality))
    corrcoeff_set.append(corrcoeff)
    print(LABEL[i], " : ", corrcoeff)
matplotlib.rc('xtick', labelsize=20) 
matplotlib.rc('ytick', labelsize=20) 
for i in range(len(LABEL)):    
    fig, ax = plt.subplots()
#     plt.style.use('ggplot')
    plt.scatter(predictions_set[i], reality_set[i],s=3, c='r', lw=0) # ,'ro'
    plt.xlabel('Predictions', fontsize = 20)
    plt.ylabel('Reality', fontsize = 20)
    plt.title('Predictions x Reality on dataset Test: '+LABEL[i], fontsize = 20)

    plt.plot([reality_set[i].min(), reality_set[i].max()], [reality_set[i].min(), reality_set[i].max()], 'k--', lw=2)

deep_neural_network_poor

As shown above, the prediction performance is poor.

Further Training

Now, we create more data for further training.

no_of_data_points = [no_of_new_training_set, None] # number of rows for training and testing data sets to be generated.
puncture_rate=0.001

rdf = RDF.RandomDataFrame()
####################################################
# Specify the input variables here
####################################################
FEATURES = ["first","second","third", "fourth","bool1", "bool2", "bool3", "bool4"]
# output_label= # !! List all the output column names
# output_label2='output2'
LABEL = ['output1','output2']

col1 = {"column_name": FEATURES[0], "items": list(range(4))}
col2 = {"column_name": FEATURES[1], "items": list(np.linspace(10, 20, 8))}
col3 = {"column_name": FEATURES[2], "items": list(np.linspace(-100, 100, 1250))}
col4 = {"column_name": FEATURES[3], "items": list(np.linspace(-1, 1, 224))}
col5 = {"column_name": FEATURES[4], "items": [0, 1]}
col6 = {"column_name": FEATURES[5], "items": [0, 1]}
col7 = {"column_name": FEATURES[6], "items": [0, 1]}
col8 = {"column_name": FEATURES[7], "items": [0, 1]}

rdf.initiate_random_table(no_of_data_points[0], col1, col2, col3, col4, col5, col6, col7, col8, panda=True)
# print("clean\n", rdf.clean_df)

df_temp = rdf.clean_df
listform, column_name_list = dt.dataframe_to_list(df_temp)

########################################################
# Specify the system of equations which determines
# the output variables.
########################################################
tempcol = []
tempcol2 = []
gg = listform[:]
column_name_list = list(column_name_list)

########## Specifiy the name(s) of the output variable(s) ##########
column_name_list = column_name_list + LABEL

listform = list(listform)
for i in range(len(listform[0])):
    # example 0 (very easy)
#             temp = gg[0][i] + gg[1][i] + gg[2][i] + gg[3][i] + gg[4][i] + gg[5][i] + gg[6][i] + gg[7][i]
#             temp2 = gg[0][i] - gg[1][i] + gg[2][i] - gg[3][i] + gg[4][i] - gg[5][i] + gg[6][i] - gg[7][i]

    # example 1
    temp = gg[0][i]**2 + gg[1][i] + gg[2][i] + (gg[4][i] + gg[5][i])*gg[3][i] + gg[6][i] + gg[7][i]
    temp2 = gg[0][i] - gg[1][i]**2 + gg[2][i] - gg[3][i]*(0.5*(gg[6][i] - gg[7][i])) + gg[4][i] - gg[5][i] 
    ########################################
    tempcol = tempcol + [temp]
    tempcol2 = tempcol2 + [temp2]
listform = listform + [tempcol, tempcol2]
# for i in range(len(listform)):
#     print(column_name_list[i], '-', listform[i])
########################################################

listform = transpose_list(listform)
# print(listform)
# print(column_name_list)
temp_df = pd.DataFrame(listform, columns=column_name_list)
rdf.clean_df = temp_df
# print(rdf.clean_df)

rdf.crepify_table(rdf.clean_df, rate=puncture_rate)
# print("post crepfify\n", rdf.crepified_df)
rdf.crepified_df.to_csv("regressionMVartest_train_more.csv", index=False)

We load the new training set, split it into training and test parts (this time 50% each), and perform the pre-processing on the training part of the new training set.

df_train = pd.read_csv(r"regressionMVartest_train_more.csv")
print('df train shape =',df_train.shape)
# print("train data:\n", df_train.head())
cleanD_train, crippD_train, _ = dt.data_sieve(df_train)  # cleanD, crippD, origD'
cleanD_train.get_list_from_df()
colname_set_train = df_train.columns
df_train_clean = cleanD_train.clean_df
df_train_crippled = crippD_train.crippled_df
print("clean train data:\n", df_train.head())
print('df train clean shape =',df_train_clean.shape)
if df_train_crippled is not None:
    print('df train crippled shape =',df_train_crippled.shape)
else:
    print('df train: no defect')
# prepare
dftr=df_train_clean[:]
dftr.head()
train = dftr
print(FEATURES," -size = ", len(FEATURES))
feature_cols = [tf.contrib.layers.real_valued_column(k) for k in FEATURES]

# Training set and Prediction set with the features to predict
training_set = train[FEATURES]
prediction_set = train[LABEL]

# Train and Test 
x_train, x_test, y_train, y_test = train_test_split(training_set[FEATURES] , prediction_set, test_size=new_test_size_frac , random_state=42)
y_train = pd.DataFrame(y_train, columns = LABEL)
training_set = pd.DataFrame(x_train, columns = FEATURES).merge(y_train, left_index = True, right_index = True)

# Training for submission
training_sub = training_set[FEATURES]
print(training_set.head()) # NOT YET PREPROCESSED
# Same thing but for the test set
y_test = pd.DataFrame(y_test, columns = LABEL)
testing_set = pd.DataFrame(x_test, columns = FEATURES).merge(y_test, left_index = True, right_index = True)
print(testing_set.head()) # NOT YET PREPROCESSED
print("training size = ", training_set.shape)
print("test size = ", testing_set.shape)
range_second = [10,20]
range_third = [-100,100]
range_fourth = [-1,1]
# range_input_set = [range_second, range_third, range_fourth]
range_output1 = [-200,200]
range_output2 = [-600,600]
range_output_set = {'output1': range_output1, 'output2': range_output2}


conj_command_set = {FEATURES[0]: "",
                    FEATURES[1]: "cont_to_scale",
                    FEATURES[2]: "cont_to_scale",
                    FEATURES[3]: "cont_to_scale",
                    FEATURES[4]: "",
                    FEATURES[5]: "",
                    FEATURES[6]: "",
                    FEATURES[7]: "",
                    # OUTPUT
                    LABEL[0]: "cont_to_scale",
                    LABEL[1]: "cont_to_scale",
                   }
scale_output1 = [0,1]
scale_output2 = [0,1]
scale_output_set = {'output1': scale_output1, 'output2': scale_output2}
cont_to_scale_settings_second = {"scale": [-1, 1], "mode": "uniform", "original_scale":range_second}
cont_to_scale_settings_third = {"scale": [0, 1], "mode": "uniform", "original_scale":range_third}
cont_to_scale_settings_fourth = {"scale": [0, 1], "mode": "uniform", "original_scale":range_fourth}
cont_to_scale_settings_output1 = {"scale": scale_output1 , "mode": "uniform", "original_scale":range_output1}
cont_to_scale_settings_output2 = {"scale": scale_output2 , "mode": "uniform", "original_scale":range_output2}
conj_command_setting_set = {FEATURES[0]: None,
                            FEATURES[1]: cont_to_scale_settings_second,
                            FEATURES[2]: cont_to_scale_settings_third,
                            FEATURES[3]: cont_to_scale_settings_fourth,
                            FEATURES[4]: None,
                            FEATURES[5]: None,
                            FEATURES[6]: None,
                            FEATURES[7]: None,
                            # OUTPUT
                            LABEL[0]: cont_to_scale_settings_output1,
                            LABEL[1]: cont_to_scale_settings_output2,
                            }
cleanD_training_set = dt.clean_data()   
cleanD_training_set.clean_df = training_set
cleanD_training_set.build_conj_dataframe(conj_command_set, conj_command_setting_set=conj_command_setting_set)

train_conj = cleanD_training_set.clean_df_conj[:]
print("scaled train=\n", train_conj.head(10))

We define the model here, perform the training, pre-process the test part of the training set, and predict the outcome of that test part.

# Model
tf.logging.set_verbosity(tf.logging.ERROR)
regressor_set = []
for i in range(len(LABEL)):
    regressor = tf.contrib.learn.DNNRegressor(feature_columns=feature_cols, 
                                              activation_fn = activationfn_set[i], hidden_units=hiddenunit_set[i],
                                             model_dir=LABEL[i])
#                                             optimizer = tf.train.GradientDescentOptimizer(learning_rate= 0.1))
    regressor_set.append(regressor)

# Reset the index of training
training_set.reset_index(drop = True, inplace =True)
def input_fn(data_set, one_label, pred = False):
    # one_label is the element of LABEL
    if pred == False:
        
        feature_cols = {k: tf.constant(data_set[k].values) for k in FEATURES}
        labels = tf.constant(data_set[one_label].values)
        
        return feature_cols, labels

    if pred == True:
        feature_cols = {k: tf.constant(data_set[k].values) for k in FEATURES}
        
        return feature_cols
    
# TRAINING HERE
for i in range(len(LABEL)):
    regressor_set[i].fit(input_fn=lambda: input_fn(train_conj, LABEL[i]), steps=step_set[i])
# Conjugate testing part of the training set

cleanD_testing_set = dt.clean_data()   
cleanD_testing_set.clean_df = testing_set
cleanD_testing_set.build_conj_dataframe(conj_command_set, conj_command_setting_set=conj_command_setting_set)

test_conj = cleanD_testing_set.clean_df_conj[:]
print("scaled test=\n", test_conj.head(10))
# Evaluation on the test set created by train_test_split
print("Final Loss on the testing set: ")
predictions_prev_set_new = []
for i in range(len(LABEL)):
    ev = regressor_set[i].evaluate(input_fn=lambda: input_fn(test_conj,LABEL[i]), steps=1)
    loss_score1 = ev["loss"]
    print(LABEL[i],"{0:f}".format(loss_score1))
    # Predictions
    y = regressor_set[i].predict(input_fn=lambda: input_fn(test_conj,LABEL[i]))
    predictions_prev = list(itertools.islice(y, test_conj.shape[0]))
    predictions_prev_set_new.append(predictions_prev)
print("predictions_prev_set_new length = ",len(predictions_prev_set))
corrcoeff_set_new = []
predictions_set_new = []
reality_set_new = []
print("pearson correlation coefficients  =  ")

for i in range(len(LABEL)):
    # print(LABEL[i]," : ",init_scale_max ,init_scale_min)
    # need to inverse transform #
    # since prediction is in conj form
    initial_scale = [range_output_set[LABEL[i]][0],range_output_set[LABEL[i]][1]]
    orig_scale = [scale_output_set[LABEL[i]][0],scale_output_set[LABEL[i]][1]]
    pred_inv = dt.conj_from_cont_to_scaled(predictions_prev_set_new[i], scale=initial_scale, mode="uniform",original_scale=orig_scale)
    #############################
    predictions = pd.DataFrame(pred_inv,columns = ['Prediction'])
    predictions_set_new = predictions_set_new + [pred_inv] # a list, or column
    # predictions.head()
    
    reality = testing_set[LABEL[i]].values # a list, or column
    reality_set_new = reality_set_new + [reality]
    # reality.head()
    corrcoeff=pearsonr(list(predictions.Prediction), list(reality))
    corrcoeff_set_new.append(corrcoeff)
    print(LABEL[i], " : ", corrcoeff)
matplotlib.rc('xtick', labelsize=20) 
matplotlib.rc('ytick', labelsize=20) 

for i in range(len(LABEL)):    
    fig2, ax2 = plt.subplots()
#     plt.style.use('ggplot')
    plt.scatter(predictions_set_new[i], reality_set_new[i],s=3, c='r', lw=0) # ,'ro'
    plt.xlabel('Predictions', fontsize = 20)
    plt.ylabel('Reality', fontsize = 20)
    plt.title('Predictions x Reality on dataset Test: '+LABEL[i], fontsize = 20)

    ax2.plot([reality_set_new[i].min(), reality_set_new[i].max()], [reality_set_new[i].min(), reality_set_new[i].max()], 'k--', lw=2)

deep_neural_network_train_test

As shown above, the outcome of the new training is better, i.e. the predictions are closer to the true theoretical values.

Finally, we use the newly trained model to predict the outcome of our test data. First we do pre-processing.

df_test = pd.read_csv(r"regressionMVartest_test.csv")
# print('df test shape =',df_test.shape)
# print("\ntest data", df_test.head())
cleanD_test, crippD_test, _ = dt.data_sieve(df_test) 
cleanD_test.get_list_from_df()
colname_set_train = df_train.columns
df_test_clean = cleanD_test.clean_df
df_test_crippled = crippD_test.crippled_df

conj_command_set_test = {FEATURES[0]: "",
                    FEATURES[1]: "cont_to_scale",
                    FEATURES[2]: "cont_to_scale",
                    FEATURES[3]: "cont_to_scale",
                    FEATURES[4]: "",
                    FEATURES[5]: "",
                    FEATURES[6]: "",
                    FEATURES[7]: "",
                   }
conj_command_setting_set_test = {FEATURES[0]: None,
                            FEATURES[1]: cont_to_scale_settings_second,
                            FEATURES[2]: cont_to_scale_settings_third,
                            FEATURES[3]: cont_to_scale_settings_fourth,
                            FEATURES[4]: None,
                            FEATURES[5]: None,
                            FEATURES[6]: None,
                            FEATURES[7]: None,
                            }
# Same thing (preprocessing) but for the test set

cleanD_test.build_conj_dataframe(conj_command_set_test, conj_command_setting_set=conj_command_setting_set_test)

test_predict_conj = cleanD_test.clean_df_conj[:]
print("scaled test=\n", test_predict_conj.head(10))
print(df_test.shape)
print(test_predict_conj.shape)

Next we write the predictions to a separate file, synregMVar_submission.csv. The result is compared with the true solutions recorded in regressionMVartest_test_correctans.csv, generated in part 2.1.

filename = "synregMVar_submission.csv"
df_test_correct_ans = pd.read_csv(r"regressionMVartest_test_correctans.csv")

y_predict_inv_set = []
for i in range(len(LABEL)):
    y_predict = regressor_set[i].predict(input_fn=lambda: input_fn(test_predict_conj , LABEL[i], pred = True))
    # need to transform back
    y_predict_before = list(itertools.islice(y_predict, df_test.shape[0]))
    # !!
    initial_scale = [range_output_set[LABEL[i]][0],range_output_set[LABEL[i]][1]]
    orig_scale = [scale_output_set[LABEL[i]][0],scale_output_set[LABEL[i]][1]]
    y_predict_inv = dt.conj_from_cont_to_scaled(y_predict_before, scale=initial_scale, mode="uniform",original_scale=orig_scale)
    y_predict_inv_set = y_predict_inv_set + [y_predict_inv]
    
    fig2, ax2 = plt.subplots()
    real_test = np.array(list(df_test_correct_ans[LABEL[i]]))
    plt.scatter(y_predict_inv, real_test,s=3, c='r', lw=0) # ,'ro'
    plt.xlabel('Predictions', fontsize = 20)
    plt.ylabel('Reality', fontsize = 20)
    plt.title('Predictions x Reality on dataset Test: '+LABEL[i], fontsize = 20)

    ax2.plot([real_test.min(), real_test.max()], [real_test.min(), real_test.max()], 'k--', lw=4)
y_predict_inv_set = transpose_list(y_predict_inv_set)
# print(y_predict_inv_set)
y_predict_for_csv = pd.DataFrame(y_predict_inv_set, columns = LABEL)
y_predict_for_csv.to_csv(filename, index=False)

deep_neural_network_test

The prediction is stored in synregMVar_submission.csv.
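
As a quick sanity check (not part of the original notebook), the submission file can be compared against the correct answers generated earlier, for example:

import pandas as pd
from scipy.stats.stats import pearsonr

pred_df = pd.read_csv("synregMVar_submission.csv")
true_df = pd.read_csv("regressionMVartest_test_correctans.csv")
for label in ["output1", "output2"]:
    r, _ = pearsonr(pred_df[label], true_df[label])
    mae = (pred_df[label] - true_df[label]).abs().mean()
    print(label, ": pearson r =", r, ", mean absolute error =", mae)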

Deep Neural Network Regression part 2.1


DNN Regressor

  1. Part 1. Synthetic Regression
  2. Part 2.1. Synthetic Regression. Train and save the model.
  3. Part 2.2. Synthetic Regression. Load model and predict.

The following code can be found here, in the folder DNN regression, python, under the names synregMVar_init.ipynb (Jupyter notebook) and synregMVar_init.py.

In the previous example, we used Deep Neural Network Regression to solve a regression problem. Read it first to get the overall idea of what we are doing here. In this example, we similarly create random data for training and prediction. The difference is that we split the problem into parts 2.1 and 2.2, and we go through part 2.1 here. We use a small number of training data points to train the model, save the model, and load this model in part 2.2. We will see that the model performs poorly, after which we train it further and use it for prediction with better performance.

import kero.DataHandler.RandomDataFrame as RDF
import kero.DataHandler.DataTransform as dt
from kero.DataHandler.Generic import *

import numpy as np
import pandas as pd
import tensorflow as tf
import itertools
import matplotlib.pyplot as plt
import matplotlib
from sklearn.model_selection import train_test_split
from scipy.stats.stats import pearsonr
from pylab import rcParams

Here, 3 hidden layers are used for each of the two outputs. Increase the number of steps for each output to allow the algorithm to update the model further over the course of training.

hiddenunit_set=[[32,16,8],[32,16,8]]
step_set=[2400,2400] # [None, None]
activationfn_set=[tf.nn.relu, tf.nn.relu]

Preparing Data

The following section creates the random data. We initiate 1000 points for training, but intentionally set test_size_frac=0.98, i.e. only 0.02 * 1000 = 20 points are used for training and 980 are used for testing. This is to demonstrate how the model performs poorly if we do not train it further with more data.

no_of_training_data = 1000
no_of_test_data = 1000
test_size_frac = 0.98 # fraction of training data used for validation
puncture_rate=0.001

no_of_data_points = [no_of_training_data, no_of_test_data] # number of rows for training and testing data sets to be generated.
rdf = RDF.RandomDataFrame()
####################################################
# Specify the input variables here
####################################################
FEATURES = ["first","second","third", "fourth","bool1", "bool2", "bool3", "bool4"]
output_label="output1" # !! List all the output column names
output_label2= "output2"

col1 = {"column_name": FEATURES[0], "items": list(range(4))}
col2 = {"column_name": FEATURES[1], "items": list(np.linspace(10, 20, 8))}
col3 = {"column_name": FEATURES[2], "items": list(np.linspace(-100, 100, 1250))}
col4 = {"column_name": FEATURES[3], "items": list(np.linspace(-1, 1, 224))}
col5 = {"column_name": FEATURES[4], "items": [0, 1]}
col6 = {"column_name": FEATURES[5], "items": [0, 1]}
col7 = {"column_name": FEATURES[6], "items": [0, 1]}
col8 = {"column_name": FEATURES[7], "items": [0, 1]}

LABEL = [output_label, output_label2]
for toggley in [0, 1]: # once for train.csv, once for test.csv
    rdf.initiate_random_table(no_of_data_points[toggley], col1, col2, col3, col4, col5, col6, col7, col8, panda=True)
    # print("clean\n", rdf.clean_df)

    df_temp = rdf.clean_df
    listform, _ = dt.dataframe_to_list(df_temp)
    
    
    ########################################################
    # Specify the system of equations which determines
    # the output variables.
    ########################################################
    tempcol = []
    tempcol2 = []
    gg = listform[:]

    ########## Specifiy the name(s) of the output variable(s) ##########
    

    listform = list(listform)
    for i in range(len(listform[0])):
        # example 0 (very easy)
#             temp = gg[0][i] + gg[1][i] + gg[2][i] + gg[3][i] + gg[4][i] + gg[5][i] + gg[6][i] + gg[7][i]
#             temp2 = gg[0][i] - gg[1][i] + gg[2][i] - gg[3][i] + gg[4][i] - gg[5][i] + gg[6][i] - gg[7][i]

        # example 1
        temp = gg[0][i]**2 + gg[1][i] + gg[2][i] + (gg[4][i] + gg[5][i])*gg[3][i] + gg[6][i] + gg[7][i]
        temp2 = gg[0][i] - gg[1][i]**2 + gg[2][i] - gg[3][i]*(0.5*(gg[6][i] - gg[7][i])) + gg[4][i] - gg[5][i] 
        ########################################
        tempcol = tempcol + [temp]
        tempcol2 = tempcol2 + [temp2]
    if toggley==0:
        listform = listform + [tempcol, tempcol2]
        column_name_list = FEATURES + LABEL
    else:
        correct_test_df = pd.DataFrame(np.transpose([tempcol, tempcol2]),columns=LABEL)
        correct_test_df.to_csv("regressionMVartest_test_correctans.csv", index=False)
        column_name_list = FEATURES
    # for i in range(len(listform)):
    #     print(column_name_list[i], '-', listform[i])
    ########################################################
    

    listform = transpose_list(listform)
    # print(listform)
    # print(column_name_list)
    temp_df = pd.DataFrame(listform, columns=column_name_list)
    rdf.clean_df = temp_df
    # print(rdf.clean_df)

    if toggley==0:
        rdf.crepify_table(rdf.clean_df, rate=puncture_rate)
        # print("post crepfify\n", rdf.crepified_df)
        rdf.crepified_df.to_csv("regressionMVartest_train.csv", index=False)
    else:
        rdf.crepify_table(rdf.clean_df, rate=0)
        # print("post crepfify\n", rdf.crepified_df)
        rdf.crepified_df.to_csv("regressionMVartest_test.csv", index=False)

Pre-processing

We pre-process the data for training. In this part of the code, the training set is split into training and testing parts of 20 and 980 data points respectively, as mentioned before.

df_train = pd.read_csv(r"regressionMVartest_train.csv")
print('df train shape =',df_train.shape)
# print("train data:\n", df_train.head())
cleanD_train, crippD_train, _ = dt.data_sieve(df_train)  # cleanD, crippD, origD'
cleanD_train.get_list_from_df()
colname_set_train = df_train.columns
df_train_clean = cleanD_train.clean_df
df_train_crippled = crippD_train.crippled_df
print("clean train data:\n", df_train.head())
print('df train clean shape =',df_train_clean.shape)
if df_train_crippled is not None:
    print('df train crippled shape =',df_train_crippled.shape)
else:
    print('df train: no defect')
# prepare
dftr=df_train_clean[:]
train = dftr
print(FEATURES," -size = ", len(FEATURES))
feature_cols = [tf.contrib.layers.real_valued_column(k) for k in FEATURES]

# Training set and Prediction set with the features to predict
training_set = train[FEATURES]
prediction_set = train[LABEL]

# Train and Test 
x_train, x_test, y_train, y_test = train_test_split(training_set[FEATURES] , prediction_set, test_size=test_size_frac, random_state=42)
y_train = pd.DataFrame(y_train, columns = LABEL)
training_set = pd.DataFrame(x_train, columns = FEATURES).merge(y_train, left_index = True, right_index = True)

training_sub = training_set[FEATURES]
print(training_set.head()) # NOT YET PREPROCESSED
# Same thing but for the test part of the training set
y_test = pd.DataFrame(y_test, columns = LABEL)
testing_set = pd.DataFrame(x_test, columns = FEATURES).merge(y_test, left_index = True, right_index = True)
print(testing_set.head()) # NOT YET PREPROCESSED
print("training size = ", training_set.shape)
print("test size = ", testing_set.shape)

Here we do the pre-processing of the training part of the training set.

range_second = [10,20]
range_third = [-100,100]
range_fourth = [-1,1]
# range_input_set = [range_second, range_third, range_fourth]
range_output1 = [-200,200]
range_output2 = [-600,600]
range_output_set = {'output1': range_output1, 'output2': range_output2}


conj_command_set = {FEATURES[0]: "",
                    FEATURES[1]: "cont_to_scale",
                    FEATURES[2]: "cont_to_scale",
                    FEATURES[3]: "cont_to_scale",
                    FEATURES[4]: "",
                    FEATURES[5]: "",
                    FEATURES[6]: "",
                    FEATURES[7]: "",
                    # OUTPUT
                    LABEL[0]: "cont_to_scale",
                    LABEL[1]: "cont_to_scale",
                   }
scale_output1 = [0,1]
scale_output2 = [0,1]
scale_output_set = {'output1': scale_output1, 'output2': scale_output2}
cont_to_scale_settings_second = {"scale": [-1, 1], "mode": "uniform", "original_scale":range_second}
cont_to_scale_settings_third = {"scale": [0, 1], "mode": "uniform", "original_scale":range_third}
cont_to_scale_settings_fourth = {"scale": [0, 1], "mode": "uniform", "original_scale":range_fourth}
cont_to_scale_settings_output1 = {"scale": scale_output1 , "mode": "uniform", "original_scale":range_output1}
cont_to_scale_settings_output2 = {"scale": scale_output2 , "mode": "uniform", "original_scale":range_output2}
conj_command_setting_set = {FEATURES[0]: None,
                            FEATURES[1]: cont_to_scale_settings_second,
                            FEATURES[2]: cont_to_scale_settings_third,
                            FEATURES[3]: cont_to_scale_settings_fourth,
                            FEATURES[4]: None,
                            FEATURES[5]: None,
                            FEATURES[6]: None,
                            FEATURES[7]: None,
                            # OUTPUT
                            LABEL[0]: cont_to_scale_settings_output1,
                            LABEL[1]: cont_to_scale_settings_output2,
                            }
cleanD_training_set = dt.clean_data()   
cleanD_training_set.clean_df = training_set
cleanD_training_set.build_conj_dataframe(conj_command_set, conj_command_setting_set=conj_command_setting_set)

train_conj = cleanD_training_set.clean_df_conj[:]
print("scaled train=\n", train_conj.head(10))

Now we begin writing the model and do the training.

# Model
tf.logging.set_verbosity(tf.logging.ERROR)
regressor_set = []
for i in range(len(LABEL)):
    regressor = tf.contrib.learn.DNNRegressor(feature_columns=feature_cols, 
                                              activation_fn = activationfn_set[i], hidden_units=hiddenunit_set[i],
                                             model_dir=LABEL[i])
#                                             optimizer = tf.train.GradientDescentOptimizer(learning_rate= 0.1))
    regressor_set.append(regressor)

# Reset the index of training
training_set.reset_index(drop = True, inplace =True)
def input_fn(data_set, one_label, pred = False):
    # one_label is the element of LABEL
    if pred == False:
        
        feature_cols = {k: tf.constant(data_set[k].values) for k in FEATURES}
        labels = tf.constant(data_set[one_label].values)
        
        return feature_cols, labels

    if pred == True:
        feature_cols = {k: tf.constant(data_set[k].values) for k in FEATURES}
        
        return feature_cols
    
# TRAINING HERE
for i in range(len(LABEL)):
    regressor_set[i].fit(input_fn=lambda: input_fn(train_conj, LABEL[i]), steps=step_set[i])

We are done here! The models are saved in the output1 and output2 folders. We will load the models in the next part, at this link.


Rotation3D

Initial motivation. This class was created with the motivation of quickly rotating an object towards the z-axis. This is reflected in the functions rotate_towards_z_axis() and rotate_all_towards_z_axis(). It is important to note that rotating an object towards the z-axis can be performed in a number of ways. In this implementation, the rotation is done first through the azimuth angle, then through the polar angle. Many variations exist, including rotating all points about the centre of mass rather than the mean position, etc.

kero.werithmetic.rotation.py

class Rotation3D:
  def __init__(self):
    self.x_set
    self.y_set
    self.z_set
    self.xrot_set
    self.yrot_set
    self.zrot_set
    self.x0
    self.y0
    self.z0
  def get_spherical_coords(self, x, y, z, smallxy=True, precisionxy=1e-15):
    # see example 1
    return r, phi, theta
  def rotate_towards_z_axis(self, state, current_phi, current_theta):
    # see example 2     
    return out
  def rotate_all_towards_z_axis(self):
    # see example 3
    return
  
  # extrinsic rotations
  def rotate_wrt_x(self, state, angle):
    return out
  def rotate_wrt_y(self, state, angle):
    return out
  def rotate_wrt_z(self, state, angle):
    return out

 

Properties and descriptions:

  x_set, y_set, z_set: Lists of the x, y and z positions of all particles before rotation, in Cartesian coordinates.
  xrot_set, yrot_set, zrot_set: Lists of the x, y and z positions of all particles after rotation, in Cartesian coordinates.
  x0, y0, z0: Mean positions of all particles, in Cartesian coordinates.

Example 1: get_spherical_coords()

  x, y, z: Double. Position in Cartesian coordinates.
  smallxy: Boolean. If true, the function checks whether x and y are both below the value precisionxy; if so, the azimuth angle is set to zero immediately. Set this to true if we anticipate a singularity or some precision issue. If such a situation arises, be extra careful or consider a different approach.
  precisionxy: Double. Set this to a very small number; it defaults to 1e-15.
  return r, phi, theta: Double. Position in spherical coordinates.

import kero.werithmetic.rotation as rot

x, y, z = 0.1, 0.1, 10
ro = rot.Rotation3D()
r, phi, theta = ro.get_spherical_coords(x, y, z)
print("r, phi, theta =", r, phi, theta)

The output is the following, where r is in the same units as x, y and z, while phi (azimuthal angle) and theta (polar angle) are in radians.

r, phi, theta = 10.000999950005 0.7853981633974483 0.014141192927807654
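
For reference, the textbook spherical-coordinate conversion reproduces these numbers. This is only a plain-NumPy check of the same quantities, not kero's implementation.

import numpy as np

x, y, z = 0.1, 0.1, 10
r = np.sqrt(x**2 + y**2 + z**2)              # radius
phi = np.arctan2(y, x)                       # azimuthal angle
theta = np.arctan2(np.sqrt(x**2 + y**2), z)  # polar angle, measured from the z axis
print(r, phi, theta)  # approx. 10.001 0.7853981633974483 0.014141192927807654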

 

Example 2: rotate_towards_z_axis()

  state: Numpy matrix, 3 by 1. This represents an initial 3D position vector that we want to rotate through, first, the azimuth angle, specified by the negative of current_phi, and second, the polar angle, specified by the negative of current_theta.
  current_phi: Double. Current azimuth angle of the vector.
  current_theta: Double. Current polar angle of the vector.
  return out: Numpy matrix, 3 by 1. This is the rotated vector; see the description of state.

import kero.werithmetic.rotation as rot
import numpy as np

ro = rot.Rotation3D()
x, y, z = -0.1, 0.54, 2
state = np.matrix([[x], [y], [z]])
r, phi, theta = ro.get_spherical_coords(x, y, z)
print("BEFORE: r, phi, theta =", r, phi, theta)
current_phi = phi
current_theta = theta
state_after = ro.rotate_towards_z_axis(state, current_phi, current_theta)
# print(state_after)
x1 = state_after.item(0)
y1 = state_after.item(1)
z1 = state_after.item(2)
print("x,y,z = ",x,y,z)
r1, phi1, theta1 = ro.get_spherical_coords(x1, y1, z1)
print("AFTER: r, phi, theta =", r1, phi1, theta1)
print("x,y,z = ",x1, y1, z1)

The output is the following.

BEFORE: r, phi, theta = 2.0740298937093455 -1.3876855095324125 0.2679855592098951
x,y,z = -0.1 0.54 2
AFTER: r, phi, theta = 2.074029893709346 0.0 0.5359711184197905
x,y,z = -1.0591577496072508 -2.7755575615628914e-17 1.7831951271374942

Example 3 is like this example applied to many particles, but the rotation is done through the angles of the mean position of the particles.

Example 3: rotate_all_towards_z_axis()

This function is to be used after the x_set, y_set and z_set properties of the Rotation3D object have been set. The function does not take any arguments, nor does it return values. Instead, it sets the xrot_set, yrot_set and zrot_set properties of the object, which are the rotated positions.

In this example, we show how a number of particles are rotated via rotate_all_towards_z_axis(). The example generates 100 particles whose positions are assigned to the x_set, y_set and z_set properties of the Rotation3D object, and computes their mean position. The polar and azimuth angles of the mean position are computed. Finally, each particle is first rotated through the negative of the azimuth angle of the mean position w.r.t. the z axis, and then through the negative of the polar angle of the mean position w.r.t. the y axis.

import kero.werithmetic.rotation as rot
import numpy as np

ro = rot.Rotation3D()
no_of_particles = 100
ro.x_set = np.random.normal(loc=1, scale=0.05, size=no_of_particles)
ro.y_set = np.random.normal(loc=1, scale=0.05, size=no_of_particles)
ro.z_set = np.random.normal(loc=1, scale=0.5, size=no_of_particles)
ro.rotate_all_towards_z_axis()
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
print(ro.xrot_set)
print(ro.yrot_set)
print(ro.zrot_set)
fig = plt.figure()
fig.patch.set_facecolor('white')
ax = fig.add_subplot(111, projection='3d')
ax.scatter(ro.x_set, ro.y_set, ro.z_set)
ax.scatter(ro.xrot_set, ro.yrot_set, ro.zrot_set, c='r')
ax.scatter(np.linspace(-2 ,2 ,100), np.zeros(100) , np.zeros(100), c='k', s=1)
ax.scatter( np.zeros(100), np.linspace(-2, 2, 100) ,np.zeros(100), c='k', s=1)
ax.scatter(np.zeros(100), np.zeros(100) ,np.linspace(-2, 2, 100),  c='k', s=1)
ax.set_xlabel('X Label')
ax.set_ylabel('Y Label')
ax.set_zlabel('Z Label')
# plt.xlim([-2,2])
# plt.ylim([-2, 2])
ax.set_xlim([-2 ,2])
ax.set_ylim([-2, 2])
ax.set_zlim([-2, 2])
plt.show()

The example rotation is shown in the left figure, while the right figure shows the two steps done to achieve the said rotation.

rotation_to_z.JPG

 

Example 4: extrinsic rotations

The functions rotate_wrt_x(self, state, angle), rotate_wrt_y(self, state, angle) and rotate_wrt_z(self, state, angle) perform extrinsic rotations.

Consider an initial coordinate frame XYZ and points or vectors defined w.r.t. XYZ. For extrinsic rotation, we have an absolute coordinate frame xyz that initially coincides with XYZ, and we always rotate with respect to the x/y/z axes of xyz. The xyz frame itself remains unchanged. Note that in this description, points or vectors are constant w.r.t. XYZ, but rotated w.r.t. xyz.

  state: Numpy matrix, 3 by 1. Initial 3D position vector.
  angle: Double. The angle of rotation with respect to the absolute x/y/z axis.
  out: Numpy matrix, 3 by 1. Final 3D position vector.

In this example, we show a single point (yellow arrow) rotated through almost 360 degrees, or 2*pi radians, w.r.t. the x (blue), y (green) and z (red) axes respectively. A minimal usage sketch follows the figure below.

exrot.PNG
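
The sweep in the figure could be reproduced with a short loop. This is only a usage sketch; the starting point, the number of steps and the printing are illustrative assumptions, while rotate_wrt_z follows the signature documented above.

import numpy as np
import kero.werithmetic.rotation as rot

ro = rot.Rotation3D()
state = np.matrix([[1.0], [0.0], [0.0]])        # a point on the x axis
for angle in np.linspace(0, 2 * np.pi, 36, endpoint=False):
    out = ro.rotate_wrt_z(state, angle)         # extrinsic rotation about the absolute z axis
    print(out.item(0), out.item(1), out.item(2))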

kero version: 0.3 and above

index arithmetic

kero.werithmetic.index_arithmetic.py

class index_arith:
  def __init__(self, full_list): 
    self.listform  
    self.N_set
    self.no_of_elements  # total number of element combinations (used in the examples below)
    self.max_index
    self.linear_to_index_map
    self.index_to_linear_map
    return
  def get_index_by_linear(self, linear_index):
    # see example 2
    return
  def get_linear_by_index(self, index):
    # see example 2
    return
  def get_element_by_index(self, index):
    # see example 3
    return
  def get_element_by_linear_index(self, linear_index):
    # see example 3
    return

 

Properties and descriptions:

  listform: List of lists. This is the list whose indices will be computed. The property is set when the index_arith object is initiated.
  N_set: List of integers. The k-th entry of N_set is the number of elements in the k-th entry of listform.
  no_of_elements: Integer. The total number of element combinations, i.e. the product of the entries of N_set.
  max_index: List of integers. This is the same as N_set, except every entry is reduced by 1.
  linear_to_index_map: List. The k-th entry of the list is the corresponding tuple index. See example 1 for a clearer understanding.
  index_to_linear_map: Mapping indexed by tuple index (accessed as index_to_linear_map[tuple(index)]); each entry is the corresponding linear index. See example 1 for a clearer understanding.

Given a list, for example [[1,2],[10,20]], we may want to access the elements in an orderly manner. A natural way to arrange this linearly is, for example, to assign 0 to 1, 1 to 2, 2 to 10 and 3 to 20. But we can also arrange it using tuple indices: (0,0) to 1, (1,0) to 2, (0,1) to 10 and (1,1) to 20. For whatever reason, we might want to convert between these two kinds of indices.

Example 1

import kero.werithmetic.index_arithmetic as wer
import numpy as np

mylist = [[1, 2, 3, 4], [0.1 ,0.3, 0.3], [66 ,77, 88]]
ia = wer.index_arith(mylist)

for i in range(ia.no_of_elements):
    temp = ia.linear_to_index_map[i]
    temp2 = ia.index_to_linear_map[tuple(temp)]
    print(temp2, " : " ,temp)

linear_to_index_map is the property of this object that stores the tuple indices, for example [0,0,0], [1,0,0] etc., and the key used to access each of these is called the linear index, for example 0, 1, 2 etc. On the other hand, index_to_linear_map is the reverse; it is the property of this object that stores the linear indices 0, 1, 2 etc. with the tuple indices [0,0,0], [1,0,0] etc. as the keys. Since mylist has sublists of lengths 4, 3 and 3, there are 4 * 3 * 3 = 36 combinations, so the linear indices run from 0 to 35.

The output is the following.

0 : [0, 0, 0]
1 : [1, 0, 0]
2 : [2, 0, 0]
3 : [3, 0, 0]
4 : [0, 1, 0]
5 : [1, 1, 0]
6 : [2, 1, 0]
7 : [3, 1, 0]
8 : [0, 2, 0]
9 : [1, 2, 0]
10 : [2, 2, 0]
11 : [3, 2, 0]
12 : [0, 0, 1]
13 : [1, 0, 1]
14 : [2, 0, 1]
15 : [3, 0, 1]
16 : [0, 1, 1]
17 : [1, 1, 1]
18 : [2, 1, 1]
19 : [3, 1, 1]
20 : [0, 2, 1]
21 : [1, 2, 1]
22 : [2, 2, 1]
23 : [3, 2, 1]
24 : [0, 0, 2]
25 : [1, 0, 2]
26 : [2, 0, 2]
27 : [3, 0, 2]
28 : [0, 1, 2]
29 : [1, 1, 2]
30 : [2, 1, 2]
31 : [3, 1, 2]
32 : [0, 2, 2]
33 : [1, 2, 2]
34 : [2, 2, 2]
35 : [3, 2, 2]
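
As an aside (this uses NumPy, not kero), the same mapping can be reproduced with np.unravel_index and np.ravel_multi_index in Fortran ('F') order, since the first index varies fastest here:

import numpy as np

shape = (4, 3, 3)  # lengths of the sublists of mylist
print(np.unravel_index(1, shape, order='F'))              # (1, 0, 0)
print(np.ravel_multi_index((3, 2, 0), shape, order='F'))  # 11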

Example 2

In this example, we want to compute the tuple index given the linear index (specified in the variable choose_linear_index) and vice versa. These are done using get_index_by_linear() and get_linear_by_index() respectively.

import kero.werithmetic.index_arithmetic as wer
import numpy as np

mylist = [[1, 2, 3, 4], [0.1, 0.3, 0.3], [66, 77, 88]]
ia = wer.index_arith(mylist)
choose_linear_index = [0, 1, 11, 12, 20, 21]
for x in choose_linear_index:
    ind = ia.get_index_by_linear(x)
    linear_index = ia.get_linear_by_index(ind)
    print("index = ", ind, "// linear index = ", linear_index)

The output is the following.

index = [0, 0, 0] // linear index = 0
index = [1, 0, 0] // linear index = 1
index = [3, 2, 0] // linear index = 11
index = [0, 0, 1] // linear index = 12
index = [0, 2, 1] // linear index = 20
index = [1, 2, 1] // linear index = 21

Example 3

In this example, we show how to get the element of the list given either the linear index or the tuple index. These are done using get_element_by_linear_index() and get_element_by_index() respectively.

import kero.werithmetic.index_arithmetic as wer
import numpy as np

mylist = [[1, 2, 3, 4], [0.1, 0.3, 0.3], [66, 77, 88]]
ia = wer.index_arith(mylist)
choose_linear_index = range(ia.no_of_elements)
# choose_linear_index = [0, 1, 11, 12, 20, 21]
# choose_linear_index = [0]
for x in choose_linear_index:
    ind = ia.get_index_by_linear(x)
    linear_index = ia.get_linear_by_index(ind)
    print("index = ", ind, "// linear index = ", linear_index)
    print(" - by linear : ", ia.get_element_by_linear_index(x))
    print(" - by index  : ", ia.get_element_by_index(ind))

The first few lines of the output are the following.

index = [0, 0, 0] // linear index = 0
- by linear : [1, 0.1, 66]
- by index : [1, 0.1, 66]
index = [1, 0, 0] // linear index = 1
- by linear : [2, 0.1, 66]
- by index : [2, 0.1, 66]
index = [2, 0, 0] // linear index = 2
- by linear : [3, 0.1, 66]
- by index : [3, 0.1, 66]
index = [3, 0, 0] // linear index = 3
- by linear : [4, 0.1, 66]
- by index : [4, 0.1, 66]
index = [0, 1, 0] // linear index = 4
- by linear : [1, 0.3, 66]
- by index : [1, 0.3, 66]
index = [1, 1, 0] // linear index = 5
- by linear : [2, 0.3, 66]
- by index : [2, 0.3, 66]
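For reference, retrieving an element from the tuple index amounts to picking one entry from each sub-list. A small sketch consistent with the output above (not the kero code):

mylist = [[1, 2, 3, 4], [0.1, 0.3, 0.3], [66, 77, 88]]

def element_by_index(index, nested):
    # pick the i-th entry from each sub-list, as given by the tuple index
    return [sub[i] for i, sub in zip(index, nested)]

print(element_by_index([1, 1, 0], mylist))  # [2, 0.3, 66]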

kero version: 0.1 and above

Deep Neural Network Regressor

home > kero > Machine Learning

DNN Regressor

  1. Part 1. Synthetic Regression
  2. Part 2.1. Synthetic Regression. Train and save the model.
  3. Part 2.2. Synthetic Regression. Load model and predict.

The following code can be found here, in the folder DNN regression, python, as synregMVar.ipynb (Jupyter notebook) and as synregMVar.py.

Prerequisites: make sure tensorflow works on your machine.

Sometimes we have a large volume of data that we can use to predict continuous values. For example, as seen here, given the features of a house, we may want to predict its price. This is as opposed to, say, MNIST, which is a classification problem. In MNIST, given an image of a handwritten digit, we want machine learning to tell us which digit it is. This is not a trivial task for the machine, since, for example, the number 4, when written quickly or blurred, can look like 9. In the MNIST problem, we are classifying data (images) into either 0, 1, … or 9. For the house pricing problem, however, the price is continuous: it could be $1,234,000 or $623,691 or anything in between. This is a regression problem.

Synthetic Regression

We call this synthetic regression because it is not based on any real data. We randomly generate the data points and create arbitrary formulas, as shown below.

DNN reg.png

Figure 1. Randomly generated data table.

Our tasks are:

  1. generate a lot of data like the table above and use them to train the machine learning model,
  2. create more data like the above, except without the columns output1 and output2. We then use the model we have trained to predict output1 and output2. At the end of this post, a graph is plotted to show how close the predicted values are to the theoretically correct values.

Preparing Data

This section does only one thing: prepare training and testing data tables like the one shown in figure 1 above.

To simulate a regression problem, we create a table (data frame) whose columns are either boolean or double, taking a range of continuous values. This code will save regressionMVartest_train.csv and regressionMVartest_test.csv, the data sets for training and testing respectively. Note that we create a table of 8 columns plus two extra columns computed via the variables temp and temp2 in the code below. These two columns, "output1" and "output2", are the outputs of the table and depend on the other 8 columns.

I recommend using Jupyter so that we can slowly see the output of each process in steps.

Also, note that hiddenunit_set, step_set and activationfn_set are lists of 2 different settings, one for "output1" and the other for "output2". The variable no_of_data_points = [2500, 1000] specifies the number of training and test rows we want to generate: in this example regressionMVartest_train.csv will have 2500 rows of data and regressionMVartest_test.csv will have 1000 rows. Do play around with the other variables to obtain a better prediction (ideally we should do hyper-parameter tuning); for example, set the number of steps in step_set higher, or to None, to allow for more model updates.

Note that regressionMVartest_test.csv does not have the output columns, since the outputs are to be predicted by the model. However, we know the theoretically correct values of the outputs since we specify the exact formulas. These values are saved separately in regressionMVartest_test_correctans.csv for double checking.
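For reference, the "example 1" formulas used in the code below (the variables temp and temp2) work out to:

output1 = first^2 + second + third + (bool1 + bool2)*fourth + bool3 + bool4
output2 = first - second^2 + third - fourth*(0.5*(bool3 - bool4)) + bool1 - bool2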

import kero.DataHandler.RandomDataFrame as RDF
import kero.DataHandler.DataTransform as dt
from kero.DataHandler.Generic import *

import numpy as np
import pandas as pd
import tensorflow as tf
import itertools
import matplotlib.pyplot as plt
import matplotlib
from sklearn.model_selection import train_test_split
from scipy.stats.stats import pearsonr
from pylab import rcParams

# CONTROL PANEL

# example 1
hiddenunit_set=[[8,4],[8,4]]
step_set=[6400, 6400]
activationfn_set=[tf.nn.relu, tf.nn.relu]

no_of_data_points = [2500, 1000] # number of rows for training and testing data sets to be generated.
puncture_rate=0.001

rdf = RDF.RandomDataFrame()
####################################################
# Specify the input variables here
####################################################
FEATURES = ["first","second","third", "fourth","bool1", "bool2", "bool3", "bool4"]
output_label="output1" # !! List all the output column names
output_label2= "output2"
col1 = {"column_name": "first", "items": list(range(4))}
col2 = {"column_name": "second", "items": list(np.linspace(10, 20, 8))}
col3 = {"column_name": "third", "items": list(np.linspace(-100, 100, 1250))}
col4 = {"column_name": "fourth", "items": list(np.linspace(-1, 1, 224))}
col5 = {"column_name": "bool1", "items": [0, 1]}
col6 = {"column_name": "bool2", "items": [0, 1]}
col7 = {"column_name": "bool3", "items": [0, 1]}
col8 = {"column_name": "bool4", "items": [0, 1]}


LABEL = [output_label, output_label2]

for toggley in [0, 1]: # once for train.csv once for test.csv
    rdf.initiate_random_table(no_of_data_points[toggley], col1, col2, col3, col4, col5, col6, col7, col8, panda=True)
    # print("clean\n", rdf.clean_df)

    df_temp = rdf.clean_df
    listform, _ = dt.dataframe_to_list(df_temp)
    
    
    ########################################################
    # Specify the system of equations which determines
    # the output variables.
    ########################################################
    tempcol = []
    tempcol2 = []
    gg = listform[:]

    ########## Specify the name(s) of the output variable(s) ##########
    

    listform = list(listform)
    for i in range(len(listform[0])):
        # example 0 (very easy)
#             temp = gg[0][i] + gg[1][i] + gg[2][i] + gg[3][i] + gg[4][i] + gg[5][i] + gg[6][i] + gg[7][i]
#             temp2 = gg[0][i] - gg[1][i] + gg[2][i] - gg[3][i] + gg[4][i] - gg[5][i] + gg[6][i] - gg[7][i]

        # example 1
        temp = gg[0][i]**2 + gg[1][i] + gg[2][i] + (gg[4][i] + gg[5][i])*gg[3][i] + gg[6][i] + gg[7][i]
        temp2 = gg[0][i] - gg[1][i]**2 + gg[2][i] - gg[3][i]*(0.5*(gg[6][i] - gg[7][i])) + gg[4][i] - gg[5][i] 
        ########################################
        tempcol = tempcol + [temp]
        tempcol2 = tempcol2 + [temp2]
    if toggley==0:
        listform = listform + [tempcol, tempcol2]
        column_name_list = FEATURES + LABEL
    else:
        correct_test_df = pd.DataFrame(np.transpose([tempcol, tempcol2]),columns=LABEL)
        correct_test_df.to_csv("regressionMVartest_test_correctans.csv", index=False)
        column_name_list = FEATURES
    # for i in range(len(listform)):
    #     print(column_name_list[i], '-', listform[i])
    ########################################################
    

    listform = transpose_list(listform)
    # print(listform)
    # print(column_name_list)
    temp_df = pd.DataFrame(listform, columns=column_name_list)
    rdf.clean_df = temp_df
    # print(rdf.clean_df)

    if toggley==0:
        rdf.crepify_table(rdf.clean_df, rate=puncture_rate)
        # print("post crepfify\n", rdf.crepified_df)
        rdf.crepified_df.to_csv("regressionMVartest_train.csv", index=False)
    else:
        rdf.crepify_table(rdf.clean_df, rate=0)
        # print("post crepfify\n", rdf.crepified_df)
        rdf.crepified_df.to_csv("regressionMVartest_test.csv", index=False)

Pre-processing

Alright, we have created some random data tables. In the following code, we first load the csv files we have created and store them as pandas data frames. We load the training data set, extract only the clean part (i.e. we drop all defective rows), and use it to train our model. The training data are further split into training and testing sets using train_test_split() from scikit-learn. We then have the variables x_train, x_test, y_train, y_test, where x refers to the 8 feature columns and y to the 2 outputs, each split into the training and testing parts of the training set. The testing part is used along the way to validate the result of the training we have performed.

df_train = pd.read_csv(r"regressionMVartest_train.csv")
print('df train shape =',df_train.shape)
# print("train data:\n", df_train.head())
cleanD_train, crippD_train, _ = dt.data_sieve(df_train)  # cleanD, crippD, origD'
cleanD_train.get_list_from_df()
df_train_clean = cleanD_train.clean_df
df_train_crippled = crippD_train.crippled_df

print("clean train data:\n", df_train.head())
print('df train clean shape =',df_train_clean.shape)
if df_train_crippled is not None:
    print('df train crippled shape =',df_train_crippled.shape)
else:
    print('df train: no defect')

colname_set_train = df_train.columns

# prepare
dftr=df_train_clean[:]
print(list(dftr.columns))
train = dftr


COLUMNS = FEATURES + LABEL
feature_cols = [tf.contrib.layers.real_valued_column(k) for k in FEATURES]
training_set = train[COLUMNS]
prediction_set = train[LABEL]


x_train, x_test, y_train, y_test = train_test_split(training_set[FEATURES] , prediction_set, test_size=0.33, random_state=42)
y_train = pd.DataFrame(y_train, columns = LABEL)
training_set = pd.DataFrame(x_train, columns = FEATURES).merge(y_train, left_index = True, right_index = True)
print(training_set.head()) # NOT YET PREPROCESSED
y_test = pd.DataFrame(y_test, columns = LABEL)
testing_set = pd.DataFrame(x_test, columns = FEATURES).merge(y_test, left_index = True, right_index = True)
print(testing_set.head()) # NOT YET PREPROCESSED
print("training size = ", training_set.shape)
print("test size = ", testing_set.shape)

Here we show some rows of the training data.

    first     second      third    fourth  bool1  bool2  bool3  bool4  \
0    0.0  10.000000 -41.072858 -0.345291    1.0    1.0    1.0    1.0   
1    3.0  11.428571  49.719776 -0.076233    0.0    1.0    1.0    1.0   
2    0.0  20.000000  -4.243395 -0.766816    0.0    1.0    0.0    1.0   
3    3.0  17.142857  15.292234 -0.811659    0.0    1.0    0.0    0.0   
4    1.0  20.000000 -55.164131 -0.363229    1.0    0.0    0.0    1.0
     output1     output2  
0 -29.763441 -141.072858  
1  72.072114  -78.892469  
2  15.989789 -405.626803  
3  40.623432 -276.585317  
4 -33.527360 -453.345746

The next part of the code does the main pre-processing. In this example we scale the columns "second", "third" and "fourth" down. The original_scale [xmin,xmax] tells the algorithm to treat the column as having minimum xmin and maximum xmax; in this example we specify them through range_second etc. The scale specifies the final scale for the column. For example, if the column is a temperature in K ranging from 0 to 1000, we can set original_scale=[0,1000] and scale it down to scale=[0,1]. But if we anticipate further incoming data to take higher temperatures, we can set original_scale=[0,5000] instead, and the scaling will be performed accordingly. In this section, we transform the original data frame into the conjugate data frame, the data frame that the machine can understand better or process more easily. Details can be found here. We do this transformation for both the training and testing parts of the training data set.
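As an aside, the "uniform" mode is essentially a linear rescaling from original_scale to scale. A minimal sketch of that mapping (an assumed formula for illustration; the actual transformation is performed by build_conj_dataframe below):

def scale_uniform(x, original_scale, scale):
    # linearly map x from [a, b] (original_scale) onto [c, d] (scale)
    (a, b), (c, d) = original_scale, scale
    return c + (x - a) * (d - c) / (b - a)

print(scale_uniform(500, [0, 1000], [0, 1]))  # 0.5
print(scale_uniform(15, [10, 20], [-1, 1]))   # 0.0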

range_second = [10,20]
range_third = [-100,100]
range_fourth = [-1,1]
# range_input_set = [range_second, range_third, range_fourth]
range_output1 = [-200,200]
range_output2 = [-600,600]
range_output_set = {'output1': range_output1, 'output2': range_output2}
conj_command_set = {COLUMNS[0]: "",
                    COLUMNS[1]: "cont_to_scale",
                    COLUMNS[2]: "cont_to_scale",
                    COLUMNS[3]: "cont_to_scale",
                    COLUMNS[4]: "",
                    COLUMNS[5]: "",
                    COLUMNS[6]: "",
                    COLUMNS[7]: "",
                    # OUTPUT
                    COLUMNS[8]: "cont_to_scale",
                    COLUMNS[9]: "cont_to_scale",
                   }
scale_output1 = [0,1]
scale_output2 = [0,1]
scale_output_set = {'output1': scale_output1, 'output2': scale_output2}
cont_to_scale_settings_second = {"scale": [-1, 1], "mode": "uniform", "original_scale":range_second}
cont_to_scale_settings_third = {"scale": [0, 1], "mode": "uniform", "original_scale":range_third}
cont_to_scale_settings_fourth = {"scale": [0, 1], "mode": "uniform", "original_scale":range_fourth}
cont_to_scale_settings_output1 = {"scale": scale_output1 , "mode": "uniform", "original_scale":range_output1}
cont_to_scale_settings_output2 = {"scale": scale_output2 , "mode": "uniform", "original_scale":range_output2}
conj_command_setting_set = {COLUMNS[0]: None,
                            COLUMNS[1]: cont_to_scale_settings_second,
                            COLUMNS[2]: cont_to_scale_settings_third,
                            COLUMNS[3]: cont_to_scale_settings_fourth,
                            COLUMNS[4]: None,
                            COLUMNS[5]: None,
                            COLUMNS[6]: None,
                            COLUMNS[7]: None,
                            # OUTPUT
                            COLUMNS[8]: cont_to_scale_settings_output1,
                            COLUMNS[9]: cont_to_scale_settings_output2,
                            }
cleanD_training_set = dt.clean_data()   
cleanD_training_set.clean_df = training_set
cleanD_training_set.build_conj_dataframe(conj_command_set, conj_command_setting_set=conj_command_setting_set)

train_conj = cleanD_training_set.clean_df_conj[:]
print("scaled train=\n", train_conj.head(10))

# Same thing (preprocessing) but for the test part of the training set

cleanD_testing_set = dt.clean_data() 
cleanD_testing_set.clean_df = testing_set
cleanD_testing_set.build_conj_dataframe(conj_command_set, conj_command_setting_set=conj_command_setting_set)

test_conj = cleanD_testing_set.clean_df_conj[:]
print("scaled test=\n", test_conj.head(10))
print(test_conj.shape)
print(testing_set.shape)

Just a reminder: what are we doing above? Our data may come in many sorts of shapes and value ranges. What we are doing is pre-processing, i.e. scaling the numbers down, typically to the range -1 to 1, so that the algorithm can treat each feature fairly at the start**. Here we show some rows of the transformed table.

    first    second     third    fourth  bool1  bool2  bool3  bool4   output1  \
0    1.0 -0.142857  0.739792  0.139013    1.0    0.0    1.0    1.0  0.661305   
1    2.0  0.428571  0.343475  0.865471    0.0    1.0    0.0    0.0  0.476422   
2    0.0 -0.428571  0.633307  0.228700    0.0    0.0    0.0    0.0  0.598796   
3    1.0  1.000000  0.392314  0.941704    1.0    1.0    0.0    0.0  0.503074   
4    0.0 -0.428571  0.034428  0.448430    0.0    0.0    0.0    0.0  0.299357   
5    3.0  1.000000  0.973579  0.627803    1.0    0.0    1.0    0.0  0.812428   
6    1.0  0.714286  0.351481  0.385650    1.0    0.0    1.0    0.0  0.476597   
7    1.0  0.714286  0.052042  0.789238    1.0    1.0    0.0    1.0  0.330342   
8    2.0  0.428571  0.898319  0.170404    0.0    1.0    0.0    1.0  0.752868   
9    1.0 -0.142857  0.498799  0.035874    0.0    1.0    0.0    0.0  0.535293   

    output2  
0  0.371564  
1  0.229848  
2  0.384463  
3  0.149552  
4  0.284649  
5  0.248823  
6  0.189594  
7  0.139000  
8  0.322047  
9  0.329732

Training Machine Learning Model

The next portion does the model training. This is where the Deep Neural Network comes in. Notice that we define input_fn(), an artefact of tensorflow that requires a function to be defined for feeding the data set into the algorithm. Do not sweat over it.

# Model
tf.logging.set_verbosity(tf.logging.ERROR)
regressor_set = []
for i in range(len(LABEL)): # in our example, we create 2 regressors, one for output1 and the other for output2.
    regressor = tf.contrib.learn.DNNRegressor(feature_columns=feature_cols, 
                                              activation_fn = activationfn_set[i], hidden_units=hiddenunit_set[i])#,
#                                             optimizer = tf.train.GradientDescentOptimizer(learning_rate= 0.1))
    regressor_set.append(regressor)

# Reset the index of training
training_set.reset_index(drop = True, inplace =True)
def input_fn(data_set, one_label, pred = False):
    # one_label is the element of LABEL
    if pred == False:
        
        feature_cols = {k: tf.constant(data_set[k].values) for k in FEATURES}
        labels = tf.constant(data_set[one_label].values)
        
        return feature_cols, labels

    if pred == True:
        feature_cols = {k: tf.constant(data_set[k].values) for k in FEATURES}
        
        return feature_cols

# TRAINING HERE
for i in range(len(LABEL)):
    regressor_set[i].fit(input_fn=lambda: input_fn(train_conj, LABEL[i]), steps=step_set[i])
    
# Evaluation on the test set created by train_test_split
print("Final Loss on the testing set: ")
predictions_prev_set = []
for i in range(len(LABEL)):
    ev = regressor_set[i].evaluate(input_fn=lambda: input_fn(test_conj,LABEL[i]), steps=1)
    loss_score1 = ev["loss"]
    print(LABEL[i],"{0:f}".format(loss_score1))
    # Predictions
    y = regressor_set[i].predict(input_fn=lambda: input_fn(test_conj,LABEL[i]))
    predictions_prev = list(itertools.islice(y, test_conj.shape[0]))
    predictions_prev_set.append(predictions_prev)
print("predictions_prev_set length = ",len(predictions_prev_set))

Recall that we have separated our training data set into training and testing parts. We use the training part for DNN model training, and the testing part to test the result of the training. The plots are shown below; since they plot real vs predicted values, the closer the points are to the line y=x, the better the prediction. Pearson correlation coefficients are used as auxiliary measures: a good prediction has a coefficient close to 1.0. However, a prediction with systematic error can still have a coefficient of 1.0, for example when the values are shifted by a constant. Note that the predictions come out in the scaled (conjugate) form, so in the code below they are transformed back to the original scale before being compared with the actual values.
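To see the caveat concretely: a constant offset leaves the Pearson coefficient at 1.0 even though every prediction is off by the same amount (a quick standalone illustration, not part of the pipeline):

import numpy as np
from scipy.stats import pearsonr

truth = np.array([1.0, 2.0, 3.0, 4.0])
shifted = truth + 10.0              # systematically off by +10
print(pearsonr(truth, shifted)[0])  # 1.0: perfect correlation despite the bias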

corrcoeff_set = []
predictions_set = []
reality_set = []
print("pearson correlation coefficients  =  ")

for i in range(len(LABEL)):
    # print(LABEL[i]," : ",init_scale_max ,init_scale_min)
    # need to inverse transform #
    # since prediction is in conj form
    initial_scale = [range_output_set[LABEL[i]][0],range_output_set[LABEL[i]][1]]
    orig_scale = [scale_output_set[LABEL[i]][0],scale_output_set[LABEL[i]][1]]
    pred_inv = dt.conj_from_cont_to_scaled(predictions_prev_set[i], scale=initial_scale, mode="uniform",original_scale=orig_scale)
    #############################
    predictions = pd.DataFrame(pred_inv,columns = ['Prediction'])
    predictions_set = predictions_set + [pred_inv] # a list, or column
    # predictions.head()
    
    reality = testing_set[LABEL[i]].values # a list, or column
    reality_set = reality_set + [reality]
    # reality.head()
    corrcoeff=pearsonr(list(predictions.Prediction), list(reality))
    corrcoeff_set.append(corrcoeff)
    print(LABEL[i], " : ", corrcoeff)

matplotlib.rc('xtick', labelsize=20) 
matplotlib.rc('ytick', labelsize=20) 
for i in range(len(LABEL)):    
    fig, ax = plt.subplots()
#     plt.style.use('ggplot')
    plt.scatter(predictions_set[i], reality_set[i],s=3, c='r', lw=0) # ,'ro'
    plt.xlabel('Predictions', fontsize = 20)
    plt.ylabel('Reality', fontsize = 20)
    plt.title('Predictions x Reality on dataset Test: '+LABEL[i], fontsize = 20)

    ax.plot([reality_set[i].min(), reality_set[i].max()], [reality_set[i].min(), reality_set[i].max()], 'k--', lw=4)

The predictions for output 1 and 2 are shown here.

symvaroutput.PNG

Finally, the following code saves the prediction output in a separate csv file called synregMVar_submission.csv. The result is compared with the theoretically correct outputs and plotted below.

df_test = pd.read_csv(r"regressionMVartest_test.csv")
print('df test shape =',df_test.shape)
# print("\ntest data", df_test.head())
cleanD_test, crippD_test, _ = dt.data_sieve(df_test) 
cleanD_test.get_list_from_df()
df_test_clean = cleanD_test.clean_df
df_test_crippled = crippD_test.crippled_df

print("\ntest data", df_test.head())
if df_test_crippled is not None:
    print('df test crippled shape =',df_test_crippled.shape)
else:
    print('df test: no defect')
print('df test clean shape =',df_test_clean.shape)

conj_command_set_test = {FEATURES[0]: "",
                    FEATURES[1]: "cont_to_scale",
                    FEATURES[2]: "cont_to_scale",
                    FEATURES[3]: "cont_to_scale",
                    FEATURES[4]: "",
                    FEATURES[5]: "",
                    FEATURES[6]: "",
                    FEATURES[7]: "",
                   }
conj_command_setting_set_test = {FEATURES[0]: None,
                            FEATURES[1]: cont_to_scale_settings_second,
                            FEATURES[2]: cont_to_scale_settings_third,
                            FEATURES[3]: cont_to_scale_settings_fourth,
                            FEATURES[4]: None,
                            FEATURES[5]: None,
                            FEATURES[6]: None,
                            FEATURES[7]: None,
                            }

# Same thing (preprocessing) but for the test set

cleanD_test.build_conj_dataframe(conj_command_set_test, conj_command_setting_set=conj_command_setting_set_test)

test_predict_conj = cleanD_test.clean_df_conj[:]
print("scaled test=\n", test_predict_conj.head(10))
print(df_test.shape)
print(test_predict_conj.shape)

filename = "synregMVar_submission.csv"
df_test_correct_ans = pd.read_csv(r"regressionMVartest_test_correctans.csv")

y_predict_inv_set = []
for i in range(len(LABEL)):
    y_predict = regressor_set[i].predict(input_fn=lambda: input_fn(test_predict_conj , LABEL[i], pred = True))
    # need to transform back
    y_predict_before = list(itertools.islice(y_predict, df_test.shape[0]))
    # !!
    initial_scale = [range_output_set[LABEL[i]][0],range_output_set[LABEL[i]][1]]
    orig_scale = [scale_output_set[LABEL[i]][0],scale_output_set[LABEL[i]][1]]
    y_predict_inv = dt.conj_from_cont_to_scaled(y_predict_before, scale=initial_scale, mode="uniform",original_scale=orig_scale)
    y_predict_inv_set = y_predict_inv_set + [y_predict_inv]
    
    fig2, ax2 = plt.subplots()
    real_test = np.array(list(df_test_correct_ans[LABEL[i]]))
    plt.scatter(y_predict_inv, real_test,s=3, c='r', lw=0) # ,'ro'
    plt.xlabel('Predictions', fontsize = 20)
    plt.ylabel('Reality', fontsize = 20)
    plt.title('Predictions x Reality on dataset Test: '+LABEL[i], fontsize = 20)

    ax2.plot([real_test.min(), real_test.max()], [real_test.min(), real_test.max()], 'k--', lw=4)
y_predict_inv_set = transpose_list(y_predict_inv_set)
# print(y_predict_inv_set)
y_predict_for_csv = pd.DataFrame(y_predict_inv_set, columns = LABEL)
y_predict_for_csv.to_csv(filename, index=False)

symvaroutput_subm

In the next installment, we split similar code into two parts. The first part will train the model and save it externally. In the second part, the model will be loaded, used for prediction, trained further, and used for prediction again. Cheers!

Some comments

** We scale the data with the intention of letting the algorithm train better. However, this is contestable and further research is necessary. Some columns may carry more weight than others, and it might be the case that more significant or “heavier” columns may be easier to process if they are scaled at larger values.

repair_table

home > kero > Documentation

kero.DataHandler.DataTransform.py

class original_data:
  def repair_table(self, repair_command_settings=None): 
    return
repair_command_settings Dictionary. The key specifies the name of the column in the data frame to repair, and the value specifies the mode by which we repair the column. See the usage example below and the details of the mode here.

The object original_data stores the original data frame, possibly with defects. We call this function to create the repaired version of the data frame, stored as the property repaired_df of the original_data object.

Example Usage 1

First, we create a randomly generated table with unique ID.

import numpy as np
import pandas as pd
import kero.DataHandler.RandomDataFrame as RDF
import kero.DataHandler.DataTransform as dt

rdf = RDF.RandomDataFrame()
# col0 : NOTE THAT IN THIS EXAMPLE we have column for unique ID
col1 = {"column_name": "first", "items": [1, 2, 3]}
itemlist = list(np.linspace(10, 20, 48))
col2 = {"column_name": "second", "items": itemlist}
col3 = {"column_name": "third", "items": ["gg", "not"]}
col4 = {"column_name": "fourth", "items": ["my", "sg", "id", "jp", "us", "bf"]}
col_out = {"column_name": "result", "items": ["classA", "classB", "classC"]}
rdf.initiate_random_table(20, col1, col2, col3, col4, col_out, panda=True, with_unique_ID="person")
rdf.crepify_table(rdf.clean_df, rate=0.1, column_index_exception=[0])  # do not crepify column 0
rdf.crepified_df.to_csv("check_repair_single_column.csv", index=False)

df = pd.read_csv(r"check_repair_single_column.csv")
cleanD, _, origD = dt.data_sieve(df)  # cleanD, crippD, origD

The randomly generated example table is shown below.

           ID  first     second third fourth  result
0    person0    2.0        NaN   not     us  classB
1    person1    2.0        NaN   not     sg  classC
2    person2    1.0  18.936170    gg     my  classC
3    person3    1.0        NaN   not     id  classC
4    person4    1.0  13.617021    gg     sg  classC
5    person5    2.0  13.617021   not     bf  classB
6    person6    1.0  13.404255    gg     my  classA
7    person7    1.0  19.148936   not     bf  classB
8    person8    2.0        NaN    gg     id  classB
9    person9    3.0  13.829787    gg    NaN  classB
10  person10    1.0  12.127660   NaN     id  classC
11  person11    3.0  18.510638   not     sg  classA
12  person12    1.0  19.148936   not     us     NaN
13  person13    1.0  16.595745    gg     sg  classC
14  person14    2.0  15.957447   not     sg  classC
15  person15    3.0  12.978723   not     bf  classB
16  person16    NaN  16.170213   not     my  classA
17  person17    1.0  18.510638    gg     bf  classB
18  person18    1.0        NaN   not     jp  classC
19  person19    1.0  18.723404    gg     my  classB

In the following, we show the repaired table, i.e. the table with all the defective entries replaced according to the commands we specify in the settings. For more detailed information, see repair_single_column, the function called by repair_table for each column.

# Repair Choices :
# 1. mean
# 2. mean_floor
# 3. mean_ceil
# 4. max_occuring
# 5. min_occuring
# 6. mid_occuring
repair_setting = {
    "ID": None,
    "first": "mean_floor",
    "second": "mean",
    "third": "max_occuring",
    "fourth": "max_occuring",
    "result": "mid_occuring"
}
origD.initialize_dataframe_repair()
print(origD.original_df, "\n\nCOMPARE: repaired\n")
origD.repair_table(repair_command_settings=repair_setting)
print(origD.repaired_df)

The repaired table is the following.

COMPARE: repaired

          ID  first     second third fourth  result
0    person0    2.0  16.085106   not     us  classB
1    person1    2.0  16.085106   not     sg  classC
2    person2    1.0  18.936170    gg     my  classC
3    person3    1.0  16.085106   not     id  classC
4    person4    1.0  13.617021    gg     sg  classC
5    person5    2.0  13.617021   not     bf  classB
6    person6    1.0  13.404255    gg     my  classA
7    person7    1.0  19.148936   not     bf  classB
8    person8    2.0  16.085106    gg     id  classB
9    person9    3.0  13.829787    gg     sg  classB
10  person10    1.0  12.127660   not     id  classC
11  person11    3.0  18.510638   not     sg  classA
12  person12    1.0  19.148936   not     us  classA
13  person13    1.0  16.595745    gg     sg  classC
14  person14    2.0  15.957447   not     sg  classC
15  person15    3.0  12.978723   not     bf  classB
16  person16    1.0  16.170213   not     my  classA
17  person17    1.0  18.510638    gg     bf  classB
18  person18    1.0  16.085106   not     jp  classC
19  person19    1.0  18.723404    gg     my  classB
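
As a sanity check, the "mean" repair on column "second" can be reproduced with plain pandas (an illustration of what the mode does, not the kero internals):

import pandas as pd

df = pd.read_csv("check_repair_single_column.csv")
print(df["second"].mean())                       # about 16.085106, the fill value used above
print(df["second"].fillna(df["second"].mean()))  # the repaired "second" column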

kero version: 0.1 and above