Utilities

Reading CSV files

Example usage: create a file “tryy.csv” containing rows and columns of decimal numbers, then run the following.

import csv
import numpy as np

def read_csv(filename, header_skip=0, get_the_first_N_rows=0):
    # filename : string, name of csv file without extension
    # header_skip : nonnegative integer, skip the first few rows
    # get_the_first_N_rows : integer, the first few rows after header_skip to be read
    #   if all rows are to be read, set to zero

    out = []
    with open(str(filename) + '.csv') as csv_file:
        data_reader = csv.reader(csv_file)
        count = 0
        i = 0
        for row in data_reader:
            if count < header_skip:
                count = count + 1
            else:
                out.append(row)
                if get_the_first_N_rows > 0:
                    i = i + 1
                    if i == get_the_first_N_rows:
                        break
    return out

# example
xx = read_csv('tryy', 0)
xx = [[float(x) for x in y] for y in xx] # convert the string entries to floats
print(xx)
print(np.sum(xx[0]))


MNIST Neural Network test 1


To test MNIST using kero 0.6.3, I will use a jupyter notebook in a virtual environment. In the project folder, place adhoc_utils.py containing the function read_csv() from here. I will use virtualenv as usual: see here. Then, after activating the virtual environment, simply:

pip install jupyter
pip install kero
pip install matplotlib
pip install opencv-python
jupyter notebook

Download the MNIST data set that has been converted into CSV form; I got it from this link. Now, create the python notebook mnist_dnn.ipynb (see below) and run all the cells. You can find this test run and similar test runs here.
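
For reference, the loading code in cell [2] below assumes that each CSV row is the label followed by the 784 flattened pixel values (0 to 255), and that there is one header row to skip. Schematically, a row looks like this (the values here are hypothetical):

label, p1, p2, ..., p784
5, 0, 0, ..., 18, 0, 0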

Unfortunately, it appears that the trained models predict one single output for any input (in one of the attempts the model predicted 6 for every image, which is bad). Several possible issues and remarks follow.

  1. There might be defective data points. Update: not likely; this is easy to check with a well-tested machine learning algorithm. I tried keras on the same data here; training and prediction were successful.
  2. Different loss functions may be more suitable; check out, for example, KL divergence. Update: there is certainly more to this than meets the eye. See a tutorial from Stanford here. MSE, an L2-type loss, appears to be harder to optimize for classification; a cross-entropy loss is usually preferred instead.
  3. This example uses no softmax layer at the end; in fact, using the default Neural Network from kero, the final layer is activated with the same activation function as the other layers (in this example, the sigmoid function). The maximum value at the output layer is taken as the predicted label (a minimal softmax/cross-entropy sketch is given after this list).
  4. The DNN has been treated like a black box; nobody quite knows what happens throughout the process in a coherent manner. It could simply be that the randomly initialized weights were not in the right range before training. This might be interesting to study in the future (hopefully the experts come up with new insights soon).
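
As a rough illustration of points 2 and 3, here is a standalone numpy sketch of a softmax output with cross-entropy loss. This is not the kero interface; the function names and values are mine.

import numpy as np

def softmax(z):
    # shift by the maximum for numerical stability
    e = np.exp(z - np.max(z))
    return e / np.sum(e)

def cross_entropy(prob, label):
    # prob: length-10 probability vector, label: integer 0,1,...,9
    return -np.log(prob[label] + 1e-12)

z = np.array([0.3, -1.2, 0.8, 0.1, 2.0, -0.5, 0.0, 1.1, -0.3, 0.4]) # raw output layer values
prob = softmax(z)
print(prob, np.sum(prob))     # probabilities that sum to 1
print(cross_entropy(prob, 4)) # loss if the true label is 4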

All the above said, the small modifications I made (short of adding softmax) include initializing all biases to zero instead of randomly, and allowing options to generate random weights in a normalized manner (depending on the number of neurons). I might change the interface a little, but in any case, it seems there is more work to do! That’s all for now, happy new year!
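
For concreteness, the kind of normalized initialization I mean can be sketched in plain numpy as follows. The function below is my own illustration, not the actual kero interface.

import numpy as np

def init_layer(n_in, n_out, bounds=(0, 1), layerwise_normalization=True):
    # weights drawn uniformly from bounds, optionally scaled by 1/sqrt(n_in)
    lo, hi = bounds
    W = np.random.uniform(lo, hi, size=(n_out, n_in))
    if layerwise_normalization:
        W = W / np.sqrt(n_in)
    b = np.zeros((n_out, 1)) # biases initiated to zero, as described above
    return W, b

W1, b1 = init_layer(784, 28)
W2, b2 = init_layer(28, 10)
print(W1.shape, b1.shape, W2.shape, b2.shape)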

mnist_dnn.ipynb

[1]

import numpy as np
import adhoc_utils as Aut
import matplotlib.pyplot as plt
import cv2, time

import kero.multib.NeuralNetwork as nn
import kero.utils.utils as ut

[2]

# Loading MNIST image data from csv files. 
# Also, binarize labels.
#
#

def simple_binarizer(mnist_label, bin_factor=1, bin_shift=0):
    # mnist_label: int, 0,1,2... or 9
    out = [0+bin_shift]*10
    out[mnist_label] = 1*bin_factor
    return np.transpose(np.matrix(out))
def convert_list_of_string_to_float(this_list):
    out = []
    for i in range(len(this_list)):
        out.append(float(this_list[i]))
    return out

bin_shift = 0
bin_factor = 1
img_width, img_height = 28, 28
pixel_normalizing_factor = 255
# read_csv returns list of list.
# good news is, the loaded data is already flattened.
mnist_train =  Aut.read_csv("mnist_train", header_skip=1,get_the_first_N_rows = 6400)
mnist_train_labels_binarized = [simple_binarizer(int(x[0]),bin_factor=bin_factor,bin_shift=bin_shift) for x in mnist_train]
mnist_train_data = [1/pixel_normalizing_factor*np.transpose(np.matrix(convert_list_of_string_to_float(x[1:]))) for x in mnist_train]
# 

# Print a few binarized labels to check the binarizer.
#
for i in range(5):
    print(mnist_train[i][0] ,":",ut.numpy_matrix_to_list(mnist_train_labels_binarized[i]))

[3]

# Uncomment this to see the flattened image profile
#
# temp = mnist_train_data[0]
# print("max = ", np.max(temp))
# print("min = ", np.min(temp))
# mean_val = np.mean(temp)
# print("mean = ", mean_val)
# fig0 = plt.figure()
# ax0 = fig0.add_subplot(111)
# ax0.plot(range(len(temp)),temp)
# ax0.plot(range(len(temp)),[mean_val]*len(temp))

[4]

# To visualize the loaded data, uncomment and run this section.
#
# 

# mnist_train_labels = [x[0] for x in mnist_train]
# mnist_train_data_image_form = [np.array(x[1:]).reshape(img_height,img_width).astype(np.uint8) for x in mnist_train]

# data_length = len(mnist_train_data)
# for i in range(10):
#     if i < data_length:
#         print(mnist_train_data_image_form[i].shape,end=",")

# #  
# count=0
# title_set = []
# for label,img_data in zip(mnist_train_labels,mnist_train_data_image_form):
#     title = "count: "+str(count)+"| label: "+str(label)
#     title_set.append(title)
#     cv2.imshow(title, img_data)
#     cv2.resizeWindow(title, 300,300)
#     count = count + 1
#     if count == 5:
#         break
# cv2.waitKey(0)
# for title in title_set:
#     cv2.destroyWindow(title)

[5]

# input_set: list of numpy matrix [x], 
#   where each x is a column vector m by 1, m the size of input layer.
# Y0_set: list of numpy matrix [Y0],
#   where each Y0 is a column vector N by 1, N the size of output layer.
#   This is equal to 10, since it corresponds to labels 0,1,...,9.
#
#

input_set = mnist_train_data
Y0_set = mnist_train_labels_binarized
number_of_neurons = [784,28,10]
lower_bound, upper_bound = 0 ,1
bounds = [lower_bound, upper_bound]
bulk = {
    "number_of_neurons" : number_of_neurons,
    "bounds": bounds,
    "layerwise_normalization": True,
}

NeuralNetwork = nn.NeuralNetwork()
NeuralNetwork.learning_rate = 1
NeuralNetwork.initiate_neural_network(bulk, mode="UniformRandom",
    verbose = False,
    verbose_init_mode=False,
    verbose_consistency=False)

nu = nn.NetworkUpdater()
nu.set_settings(method="RegularStochastic",
method_specific_settings={
        "batch_size":8,
        "no_of_epoch":32,
        "shuffle_batch":True,
})
nu.set_training_data(input_set,Y0_set)
nu.set_neural_network(NeuralNetwork)

[6]

AF = nn.activationFunction(func = "Sigmoid")
start = time.time()
weights_next, biases_next, mse_list = nu.update_wb(input_set, Y0_set, 
                NeuralNetwork.weights, NeuralNetwork.biases, AF,
                mse_mode="compute_and_print", verbose=11)
end = time.time()
elapsed = end - start

[7]

print("epoch | mse value ")
mark = 1
for i in range(len(mse_list)):
    if mark >= 0.1*len(mse_list) or i==0:
        print(" + epoch {",i ,"} ", mse_list[i])
        mark = 1
    else:
        mark = mark + 1

fig = plt.figure()
ax1 = fig.add_subplot(211)
plt.plot(range(len(mse_list)), mse_list)

[8]

print("time taken [s] = ", elapsed)
print("time taken [min] = ", elapsed/60)
print("time taken [hr] = ", elapsed/3600)
print("time taken at 10k x [s] = ", elapsed*1e4)
print("time taken at 10k x [min] = ", elapsed*1e4/(60))
print("time taken at 10k x [hr] = ", elapsed*1e4/(3600))
print("time taken at 10k x [day] = ", elapsed*1e4/(3600*24))

[9]

no_of_images_to_test=500
mnist_test =  Aut.read_csv("mnist_test", header_skip=1,get_the_first_N_rows = no_of_images_to_test)
mnist_test_labels = [int(x[0]) for x in mnist_test]
mnist_test_data = [1/pixel_normalizing_factor*np.transpose(np.matrix(convert_list_of_string_to_float(x[1:]))) for x in mnist_test]

hit_list = []
predict_list = []
predict_val = []
for i in range(no_of_images_to_test):
    a_1 = mnist_test_data[i]
    test_a_l_set, _ = nu.feed_forward(weights_next, biases_next, a_1, AF,
        verbose=False,
        matrix_formatting="%6.2f")
    Y_after = test_a_l_set[-1]
    predicted_label = int(np.argmax(Y_after))
    actual_label= mnist_test_labels[i]
    # print(Y_after)
#     print("predicted vs actual = ", predicted_label,"/",actual_label)
    predict_list.append(predicted_label)
    predict_val.append(Y_after)
    if actual_label==predicted_label:
        hit_list.append(1)
    else:
        hit_list.append(0)
print("predict list = ")
print(predict_list)
print("predict values = ")
for i in range(10):
#     print(ut.numpy_matrix_to_list(predict_val[i]))
    ut.print_numpy_matrix(np.transpose(predict_val[i]),formatting="%9.6f",no_of_space=20)
print("hit list = ")
print(hit_list)
print("percentage correct = ", 100* np.sum(hit_list)/len(hit_list))


kero version 0.6.3

Side Quest: parameter estimation for differential equations

Hi all,

I got a little side quest from a friend doing a PhD at NTU. To put it simply, there is a cubic differential equation whose parameters are to be fitted to existing data. Since I was playing around with gradient descent recently, I decided to give it a try. You can view it in this link: Parameter Estimation for Differential Equations using Gradient Descent, which includes a little more technical note. For that I also wrote a very simple Runge-Kutta wrapper in this link.

Update: I tested kero 0.6.3 Neural Network library on MNIST here; more work needs to be done!

‘Til next time!


Runge-Kutta with wrapper


See the codes for each example at the end of the post.

Example 1

Let us solve the differential equation \frac{dx}{dt}=-x. The following wrapper uses the Runge-Kutta solver RK45 from scipy. The purpose of the wrapper is to compute the solution at the specified points \{(x_i,t_i)\}.

  1. Input time_checkpoints into RK45_wrapper. time_checkpoints is the list [t1,t2,…,tN] of times at which we want the values of x to be computed.
  2. x0 is the initial value, i.e. the value of x at t1. This is the initial condition for the initial value problem.
  3. stepsize. This sets the maximum step size between tk and t(k+1). For example, if tk=1 and t(k+1)=2, stepsize can be set to 0.1 and the solution will be computed at intermediate points no more than 0.1 apart. By default stepsize = None. When set to None, the step size is one-hundredth of the whole time span from t1 to tN (as in the code at the end of the post).

As seen in the figure below, the green “check points” are the values that we ask the wrapper to compute via time_checkpoints. Note that t_checkpoint_new is also returned, since the solver does NOT compute the values at exactly t_checkpoint. However, by making the stepsize smaller, t_checkpoint_new gets closer to t_checkpoint. We print max checkpt diff to show the difference between t_checkpoint_new and t_checkpoint. In this example, the default stepsize is sufficiently accurate, i.e. max checkpt diff=0.

Red circles are the solutions generated by RK45 solver, and blue curve is the solution plotted analytically.

[Figure: rk45wrapper]

Example 2

Now we do the same with the pendulum equation, following the example here. This shows that essentially the same method works for vector-valued functions as well. The equation is given by

\ddot{\theta}(t) + b \cdot \dot{\theta}(t) + c \cdot \sin(\theta(t)) = 0

which can be rewritten as a system of first-order equations:

\dot{\theta}(t) = \omega(t)
\dot{\omega}(t) = -b \cdot \omega(t) - c \cdot \sin(\theta(t))

We get the following. Each color corresponds to one component of the system. The scattered dots are the solutions computed by RK45; the crosses and circles are, respectively, the checkpoints we want to compute and the corresponding checkpoints produced by RK45.

[Figure: rk45wrap2.JPG]

Code for Example 1

import numpy as np
from scipy.integrate import RK45
import matplotlib.pyplot as plt

def f(t,y):
	# dy/dt = f(t,y)
	return np.array([-1*y[0]])

x0 = np.array([1.0]) # initial value
# t0 = 0  # initial time
# t1 = 10 # final time

def RK45_wrapper(x0, time_checkpoints, f, stepsize=None):

	if stepsize is None:
		stepsize = 0.01*(time_checkpoints[-1]-time_checkpoints[0])
	t_set = [time_checkpoints[0]]
	x_set = [x0]
	t_checkpoint_new = [time_checkpoints[0]]
	x_checkpoint = [x0]

	for i in range(len(time_checkpoints)-1):	
		t1 = time_checkpoints[i+1]
		if i>0:
			x0 = ox.y
			t0 = ox.t
			
		else:
			t0 = time_checkpoints[i]
		ox = RK45(f,t0,x0,t1,max_step=stepsize)
		
		while ox.t < t1:
			msg = ox.step()
			t_set.append(ox.t)
			x_set.append(ox.y)
		x_checkpoint.append(ox.y)
		t_checkpoint_new.append(ox.t)
	return x_set, t_set, x_checkpoint, t_checkpoint_new

# all data by DE (differential equation)
time_checkpoints = np.linspace(0,10,11)
x_set, t_set, x_checkpoint, t_checkpoint_new = RK45_wrapper(x0, time_checkpoints, f, stepsize=None)

t_checkpt_diff = []
for t,t_new in zip(time_checkpoints, t_checkpoint_new):
	t_checkpt_diff.append(abs(t-t_new))
print("max checkpt diff = ", np.max(t_checkpt_diff))

# all data, analytic
t0 = time_checkpoints[0]
t1 = time_checkpoints[-1]
t_set2 = np.linspace(t0,t1,100)
f_int = lambda t: np.exp(-t)
x_set2 = [f_int(t_set2[i]) for i in range(len(t_set2))]

fig = plt.figure()
ax=fig.add_subplot(111)
ax.scatter(t_set,x_set,facecolors='none',edgecolors='r',label="all data by DE")
ax.scatter(t_checkpoint_new,x_checkpoint,20,'g',label="check points")
ax.plot(t_set2,x_set2,label="all data, analytic")
ax.set_xlabel("t")
ax.set_ylabel("x")
ax.legend()
plt.show()

Code for Example 2

import numpy as np
from scipy.integrate import RK45
import matplotlib.pyplot as plt

def f(t,y, f_args):
	theta, omega = y
	dydt = [omega, -f_args[0]*omega - f_args[1]*np.sin(theta)]
	return dydt

def RK45_wrapper(x0, time_checkpoints, f, f_args, stepsize=None):
	if stepsize is None:
		stepsize = 0.01*(time_checkpoints[-1]-time_checkpoints[0])
	t_set = [time_checkpoints[0]]
	x_set = [x0]
	t_checkpoint_new = [time_checkpoints[0]]
	x_checkpoint = [x0]

	f_temp = lambda t,y : f(t,y, f_args)

	for i in range(len(time_checkpoints)-1):	
		t1 = time_checkpoints[i+1]
		if i>0:
			x0 = ox.y
			t0 = ox.t
			
		else:
			t0 = time_checkpoints[i]
		ox = RK45(f_temp,t0,x0,t1,max_step=stepsize)
		
		while ox.t < t1:
			msg = ox.step()
			t_set.append(ox.t)
			x_set.append(ox.y)
		x_checkpoint.append(ox.y)
		t_checkpoint_new.append(ox.t)
	return x_set, t_set, x_checkpoint, t_checkpoint_new

b = 0.25
c = 5.0
f_args = [b,c]
x0 = [np.pi - 0.1, 0.0]
time_checkpoints = np.linspace(0, 10, 11)
x_set, t_set, x_checkpoint, t_checkpoint_new = RK45_wrapper(x0, time_checkpoints, f, f_args, stepsize=None)

# print(np.array(x_set).shape) # (112, 2)
fig=plt.figure()
ax = fig.add_subplot(111)
ax.scatter(t_set,[x[0] for x in x_set],3,label="RK45 x[0]")
ax.scatter(t_set,[x[1] for x in x_set],3,label="RK45 x[1]")
ax.scatter(t_checkpoint_new,[x[0] for x in x_checkpoint],50,label="output ckpt x[0]",facecolors='none',edgecolors='b')
ax.scatter(t_checkpoint_new,[x[1] for x in x_checkpoint],50,label="output ckpt x[1]",facecolors='none',edgecolors='r')
ax.scatter(time_checkpoints,[x[0] for x in x_checkpoint],30,label="manual ckpt x[0]",marker="x",color="b")
ax.scatter(time_checkpoints,[x[1] for x in x_checkpoint],30,label="manual ckpt x[1]",marker="x",color="r")

ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
plt.show()

Parameter Estimation for Differential Equations using Gradient Descent


The post is about parameter estimation based on gradient descent specifically for the differential equation \frac{dn}{dt}=G-k_1 n-k_2 n^2-k_3 n^3, i.e. we are looking for a good (G,k_1,k_2,k_3) that best fits the experimental data. We will go through the methods of choosing initial parameter guesses and running the optimization. This post is more fully explained in this note: Differential Equation Parameter Estimations. The data used here is a fraction of the sample data I was given. The codes and sample data can also be found here.
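
In a nutshell, the update implemented in param_estimate.py below (DE_param_estimator.update_p) is plain gradient descent on the mean squared error between the simulated curve and the experimental data: p \leftarrow p - \eta \, \nabla_p \mathrm{MSE}(p), where p = (G,k_1,k_2,k_3) and \eta is the learning_rate attribute of the estimator.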

Consider this link instead: Parameter Estimation for Differential Equations using Scipy Least Square, since it utilizes a more robust optimization component from scipy.

Place fitting.py, fitting_post_processing.py and param_estimate.py (the code is at the end of the post) in the project folder. The required python packages include matplotlib, scipy and kero; pip install them.

  1. In the command line, cd into the directory where the project is.
  2. Preliminary. Prepare sample_data.csv containing data points (t,n). First column is for t (horizontal axis), second column is n (vertical axis). Set the following toggles in fitting.py.
    plot_data True
    do_initial_testing False
    start_op False

    Run python fitting.py. This plots the sample data, shown in figure 1. In this context, the relevant data points are the data after the peak. The orange curve shows the plot of only the extracted relevant data.

  3. Initial testing stage. Set the following toggles in fitting.py.
    plot_data False
    do_initial_testing True
    start_op False

    Set collection_of_p_init. You can place as many guesses as you want. In this example,

    collection_of_p_init =[
      [0,1e-3,0.5e-3,1.2e-7], # guess 1
      [0,1e-2,0,1e-7], # guess 2
      [0,0,1.5e-3,1e-7], # guess 3
    ]

    we try 3 different initial parameters, where each of them is of the format [G,k1,k2,k3], corresponding to the 4 parameters that we want to fit. Then, run the command: python fitting.py. The results are shown in figure 2. It can be seen that the first and third guesses are closer to the experimental data; we might want to use parameters near these values for the random initialization in the optimization stage (but for demonstration's sake, we will not do it here).

  4. Optimization stage. Set the following toggles in fitting.py.
    plot_data False
    do_initial_testing False
    start_op True

    Set the preferred settings. p_init_min and p_init_max define the range from which random initial parameters are drawn. We can set the number of attempts via the variable no_of_tries; each attempt starts with a random initial parameter drawn from the range we just specified. Run the command: python fitting.py. You will see the optimization in progress with decreasing MSE values.

  5. Viewing the optimization result. We want to load all 3 fitting_data files we have created. Thus, in fitting_post_processing.py, set
    filename = 'fitting_data'
    save_data_series = [1,2,3]

    Run the command: python fitting_post_processing.py. Figure 3 shows how, after every iteration, the resulting curve approaches the experimental data.

Finally, we want to see the best parameter p. When viewing the optimization result, the parameters are printed. We are more or less done!

What else to do?

In this post I have only run a few iterations over a small number of random parameter initializations. To obtain better results, we should run more iterations and try more random initial parameters.

That’s all, have fun~



Figure 1. A plot of n-t from the sample data is in blue. The orange curve shows only the extracted relevant data, i.e. the data after the peak.

Figure 2. The curves show 3 different plots from 3 different guesses of initial parameters. The red curve is the experimental data. The console shows the recommended setting for the learning_rate variable.


Figure 3. As the algorithm is iterated, the curve goes from red to blue. The figures generally show that the curve approaches the experimental data and that the MSE decreases with the iteration index i.

fitting.py

import numpy as np
import kero.utils.utils as ut
import param_estimate as pe
import matplotlib.pyplot as plt
from scipy.integrate import RK45
import csv
import pickle

es = pe.DE_param_estimator()

print("----------------- Load Data -------------------------------")
Xr, Yr = pe.load_data(plot_data=0)

x0 = np.array([Yr[0]])
t_set_expt = [np.array([x]) for x in Xr]
n_set_expt = [np.array([x]) for x in Yr]
print("x0 = ",x0)
print("Xr[:5] = ",Xr[:5])
print("Yr[:5] = ",Yr[:5])
print("t_set_expt[:5] = ",t_set_expt[:5])
print("n_set_expt[:5] = ",n_set_expt[:5])

#.#

print("----------------- Prepare Model ---------------------------") 
# MODEL used to fit this data
#
# We are estimating the parameters of the following differential equations:
#   dn/dt = F(n, p) = G - k1 * n - k2 * n**2 - k3 * n**3
#   where p is the parameter (G,k1,k2,k3)
def F(y,p):
	return p[0] - p[1] * y -p[2] * y**2 - p[3] * y**3
diff_part_G = lambda x: 1
diff_part_k1 = lambda x: -x 
diff_part_k2 = lambda x: -x**2 
diff_part_k3 = lambda x: -x**3 
list_of_derivatives = [diff_part_G, diff_part_k1, diff_part_k2, diff_part_k3]
print("  Model prepared.")
#.#

do_initial_testing = 0
if do_initial_testing:
	print("---------------- Prepare Initial guess --------------------")
	# Before optimizing, try playing around the p_init values here
	# We will choose the range of uniform random values for guess p based on tests conducted here
	# Write down choices that seem good and any remarks here 
	# 
	#   1. p_init = [0,5e-2,2e-3,1.1e-7] # near the real value. all n less than expt data
	#   2. p_init = [0,5e-2,1e-3,1.1e-7] # seems to be better than 1
	#   3. p_init = [0,5e-2,0,1.1e-7] # all n above expt data 
	#   4. p_init = [0,1e-1,1e-4,1.1e-7] # small n above expt data, larger n below
	#   5. p_init = [0,1e-2,1e-3,1.1e-7] # exceedingly close! Let's vary around these values instead
	#
	#
	#

	collection_of_p_init =[
		[0,1e-3,0.5e-3,1.2e-7], # guess 1
		[0,1e-2,0,1e-7], # guess 2
		[0,0,1.5e-3,1e-7], # guess 3
	]

	fig = plt.figure()
	ax = fig.add_subplot(111)
	no_of_test = len(collection_of_p_init)
	for i in range(no_of_test):
		p_init = collection_of_p_init[i] # guess p
		time_checkpoints = t_set_expt
		def f(t,y):
			return np.array([F(y[0],p_init)])
		_,_, x_checkpoint, t_checkpoint_new = pe.RK45_wrapper(x0, time_checkpoints, f, stepsize=None)
		ax.plot(t_checkpoint_new,x_checkpoint, label="guess "+str(i+1), color=(0,1-1*(i+1)/no_of_test,1*(i+1)/no_of_test))

		# To find what is the best learning_rate
		print(" testing initial guess parameters: ",i+1)
		es.update_p(t_set_expt, n_set_expt, F, p_init, list_of_derivatives, verbose=11)
	ax.plot(Xr,Yr, color = "r", label="expt values")
	ax.legend( )
	ax.set_xlabel("t")
	ax.set_ylabel("n")

	plt.show()
#.#


# ------------------------------------------------------------------------------- #
start_op = 1
if start_op:	
	print("----------------- Start Optimizing ---------------------------")
	
	# ********* Settings *************
	p_init_min = [1e-10, 0, 0, 0.9e-7]       # [Gmin, k1min, k2min, k3min]
	p_init_max = [1e-10, 2e-2, 2e-3, 1.2e-7] # [Gmax, k1max, k2max, k3max]
	no_of_tries = 3
	save_name = 'fitting_data'
	es.learning_rate= 1e-16
	no_of_iterations = 10
	save_interval = 1
	# ********************************

	n_set_expt_MAT = pe.RK45_output_to_list(n_set_expt) # MATRIX list form
	
	for j in range(no_of_tries):
		p_init = []
		for a,b in zip(p_init_min, p_init_max):
			p_init.append(np.random.uniform(a,b))
		p_now= p_init
		save_count = 0
		SAVE_DATA = []
		print("TRY ", j+1," / ",no_of_tries)
		mse_list = []
		for i in range(no_of_iterations + 1):
			if i>0:
				# for i =0 , initial state, do not iterate yet
				p_now = es.update_p(t_set_expt, n_set_expt, F, p_now, list_of_derivatives, verbose=0)
			
			#------------------------ FOR REALTIME OBSERVATION -----------
			def f(t,y):
				return np.array([F(y[0],p_now)])
			time_checkpoints = t_set_expt
			_,_, n_set_next, t_set_next = pe.RK45_wrapper(x0, time_checkpoints, f, stepsize=None)
			n_set_next_MAT = pe.RK45_output_to_list(n_set_next)
			mse = pe.MSE(n_set_next_MAT,n_set_expt_MAT)
			mse_list.append(mse)
			print(" i = ", i , " -- > mse = ", mse)
			# print("   p_new = ", p_now)

			#------------------------ TO BE SAVED -------------------------
			save_count = save_count + 1
			if save_count == save_interval or i==0:
				save_count = 0
				data_store = {
				"t_set":t_set_next,
				"n_set":n_set_next,
				"p":p_now,
				"learning_rate":es.learning_rate,
				"mse_list": mse_list
				}
				SAVE_DATA.append(data_store)
		output = open(save_name+"_"+ str(j+1) +".par", 'wb')
		pickle.dump(SAVE_DATA, output)
		output.close()

print("\nClosing Program...")

fitting_post_processing.py

import param_estimate as pe
import matplotlib.pyplot as plt
import numpy as np
import time
import pickle

filename = 'fitting_data'
save_data_series = [1,2,3]

for j in save_data_series:
	pkl_file = open(filename+'_'+str(j)+'.par', 'rb')
	SAVE_DATA = None
	SAVE_DATA = pickle.load(pkl_file)
	pkl_file.close()


	fig = plt.figure(j)
	ax = None
	ax2 = None
	ax = fig.add_subplot(211)

	# ------------------- EXPT DATA ---------------------------------=
	Xr, Yr = pe.load_data(plot_data=0)
	ax.scatter(Xr,Yr,3,marker="x",c="g")

	# ------------------- DATA FROM OPTIMIZATION ---------------------
	color_scheme = np.linspace(0,1,len(SAVE_DATA))
	for i in range(len(SAVE_DATA)):
		data_store = SAVE_DATA[i]
		n_set = data_store["n_set"]
		t_set = data_store["t_set"]
		p = data_store["p"]
		mse_list = data_store["mse_list"]
		print(" i = ", i)
		print("   p = ", p )
		if i==0:
			this_label = "init curve" 
		else:
			this_label = str(i)
		ax.plot(t_set,n_set, color=(1-color_scheme[i],0,color_scheme[i]), label=this_label)
		if i ==0:
			this_title = "initial G,k1,k2,k3 = " + str(p)
	
	plt.title(this_title)
	ax2 = fig.add_subplot(212)
	ax2.plot(range(len(mse_list)),mse_list)
	ax.legend()
	ax.set_xlabel("t")
	ax.set_ylabel("n")
	ax2.set_xlabel("i")
	ax2.set_ylabel("MSE")
plt.show()	

print("\nClosing Post-Processing Program...")

param_estimate.py

# We are estimating the parameters of the following differential equations:
#   dn/dt = F(n, p) = G - k1 * n - k2 * n**2 - k3 * n**3
#   where p is the parameter (G,k1,k2,k3)

import numpy as np
import kero.utils.utils as ut
from scipy.integrate import RK45
import matplotlib.pyplot as plt
import csv


class DE_param_estimator:
	def __init__(self):
		self.learning_rate = 1e-21
		return
	# l2
	def update_p(self,t_set_expt, n_set_expt, F, p_now, list_of_derivatives,
		verbose = False):
		# t_set_expt: list of numpy array
		#   for example: [array([25.2047]), array([25.279]), array([25.3533]), array([25.4277]),...]
		# n_set_expt: list of numpy array
		#   for example: [array([789.696]), array([784.109]), array([763.528]), array([726.782]),...]
		# p_now: list of parameters
		# 
		#
		# return
		#   p_next: list of parameters
		nabla_mse = self.nabla_MSE(t_set_expt, n_set_expt, F, p_now, list_of_derivatives)
		if verbose > 10:
			# temp = self.learning_rate*nabla_mse
			temp2 = np.transpose(np.matrix(p_now))
			for_recommend = []
			if verbose > 30:
				print("    nabla_mse = ")
				ut.print_numpy_matrix(nabla_mse, formatting="%s", no_of_space=10)
				print("    p_now =")
				ut.print_numpy_matrix(temp2, formatting="%s", no_of_space=10)
			det_list = []
			with np.errstate(divide='ignore'):
				for k in range(len(p_now)):
					det = 1e-2 * p_now[k] / nabla_mse.item(k)
					if verbose > 20:
						print("    p_now[k]/(nabla_mse[k]*100) =", det)
					det2 = np.log10(np.abs(det))
					if (det2 != np.inf) and (det2 != -np.inf) and (not np.isnan(det2)):
						det_list.append(det2)
			if verbose > 20:
				print("      det_list = ", det_list)
			print("       recommended learning rate ~ ", 10**np.min(det_list))
		p_next = np.transpose(np.matrix(p_now)) - self.learning_rate * nabla_mse
		p_next = np.transpose(p_next).tolist()
		p_next = p_next[0]
		return p_next
	# L3	
	def dkdMSE(self, t_set_expt, n_set_expt, F, p_now, diff_part):
		# lambda function : d/dk d/dt of n
		# t_set_expt, n_set_expt : experimental data
		N = len(t_set_expt)
		dkmse = 0 

		def f(t,y):
			return np.array([F(y[0],p_now)])

		x0 = n_set_expt[0] # array
		time_checkpoints= t_set_expt
		x_set, t_set, x_checkpoint, t_checkpoint_new = RK45_wrapper(x0, time_checkpoints, f, stepsize=None)

		for i in range(1,len(n_set_expt)):			
			ni = x_checkpoint[i][0]
			# print(" ni =", ni)
			m = i+1
			diff_term = 0
			for j in range(m):
				diff_term = diff_term + diff_part(x_checkpoint[j][0])
			diff_term = diff_term / m 
			dkmse = dkmse + ( ni - n_set_expt[i][0])* diff_term
		dkmse = 1/N*dkmse 
		return dkmse

	def nabla_MSE(self, t_set_expt, n_set_expt, F, p_now, list_of_derivatives,
		verbose = False):
		# list_of_derivatives: list of lambda functions

		dGdMSE = self.dkdMSE(t_set_expt, n_set_expt, F, p_now, list_of_derivatives[0])
		dk1mse = self.dkdMSE(t_set_expt, n_set_expt, F, p_now, list_of_derivatives[1])
		dk2mse = self.dkdMSE(t_set_expt, n_set_expt, F, p_now, list_of_derivatives[2])
		dk3mse = self.dkdMSE(t_set_expt, n_set_expt, F, p_now, list_of_derivatives[3])
		# print("dGdMSE=", dGdMSE)
		# print("dk1mse=",dk1mse)
		# print("dk2mse=",dk2mse)
		# print("dk3mse=",dk3mse)
		nabla_mse = np.transpose(np.matrix([dGdMSE, dk1mse, dk2mse, dk3mse]))
		return nabla_mse

def load_data(plot_data=False):
	print(" --+ load_data().")

	# load data
	filename = 'sample_data'
	all_data = read_csv(filename, header_skip=0)
	all_data = [[float(x) for x in y] for y in all_data]

	X=[x[0] for x in all_data]
	Y=[x[1] for x in all_data]

	# customize your data loading here
	max_index = np.argmax(Y)
	relevant_data= all_data[max_index:]
	Xr = [x[0] for x in relevant_data]
	Yr = [x[1] for x in relevant_data]

	print("     data size = ", len(Xr))
	print("     peak = ", X[max_index],",", Y[max_index])
	if plot_data:
		fig = plt.figure()
		ax=fig.add_subplot(111)
		ax.scatter(X,Y,3)
		ax.scatter(X[max_index],Y[max_index])
		ax.plot(Xr,Yr,'r')
		ax.set_xlabel("t")
		ax.set_ylabel("n")
		plt.show()

	return Xr, Yr

def read_csv(filename, header_skip=1,get_the_first_N_rows = 0):
    # filename : string, name of csv file without extension
    # headerskip : nonnegative integer, skip the first few rows
    # get_the_first_N_rows : integer, the first few rows after header_skip to be read
    #   if all rows are to be read, set to zero

    # example
    # xx = read_csv('tryy', 3)
    # xx = [[float(x) for x in y] for y in xx]
    # print(xx)
    # print(np.sum(xx[0]))
    
    out = []
    with open(str(filename)+'.csv') as csv_file:
        data_reader = csv.reader(csv_file)
        count = 0
        i = 0
        for row in data_reader:
            if count < header_skip:
                count = count + 1
            else:
                out.append(row)
                if get_the_first_N_rows > 0:
                    i = i + 1
                    if i == get_the_first_N_rows:
                        break
    return out

def MSE(Y_set,Y0_set):
	# Y_set is a list of Y
	# Y0_set is a list of Y0
	# Assume Y_set and Y0_set have the same size, otherwise error will be raised
	#
	# Y (numpy matrix) is a vector: computed data from model
	# Y0 (numpy matrix) is a vector: observed/true data
	# Assume Y and Y0 have the same size, otherwise error will be raised
	# Assume all entries are real
	
	mse = 0
	n = len(Y_set)
	if not (n==len(Y0_set)):
		print("MSE(). Error. Inconsistent dimension between Y_set and Y_set0. Return nothing.")
		return	
	for i in range(n):
		Y = Y_set[i]
		Y0 = Y0_set[i]
		Nout1 = len(Y.tolist())
		Nout2 = len(Y0.tolist())
		if not (Nout1==Nout2):
			print("MSE(). Error. Inconsistent dimension between Y and Y0. Return nothing.")
			return
		# now compute the norms between true and model output values
		for j in range(Nout1):
			mse = mse + (np.abs(Y0.item(j)-Y.item(j)))**2
	mse = mse / (2*n)

	return mse

def RK45_wrapper(x0, time_checkpoints, f, stepsize=None):
	# example
	#
	#
	#
	# def f(t,y):
	# 	# dx/dt = f(x,t)
	# 	return np.array([-1*y[0]])
	# x0 = np.array([1.0]) # initial value
	# time_checkpoints = np.linspace(0,10,11)
	# x_set, t_set, x_checkpoint, t_checkpoint_new = RK45_wrapper(x0, time_checkpoints, f, stepsize=None)

	# t_checkpt_diff = []
	# for t,t_new in zip(time_checkpoints, t_checkpoint_new):
	# 	t_checkpt_diff.append(abs(t-t_new))
	# print("max checkpt diff = ", np.max(t_checkpt_diff))

	if stepsize is None:
		stepsize = 0.01*(time_checkpoints[-1]-time_checkpoints[0])
	t_set = [time_checkpoints[0]]
	x_set = [x0]
	t_checkpoint_new = [time_checkpoints[0]]
	x_checkpoint = [x0]

	for i in range(len(time_checkpoints)-1):	
		t1 = time_checkpoints[i+1]
		if i>0:
			x0 = ox.y
			t0 = ox.t
			
		else:
			t0 = time_checkpoints[i]
		ox = RK45(f,t0,x0,t1,max_step=stepsize)
		
		while ox.t < t1:
			msg = ox.step()
			t_set.append(ox.t)
			x_set.append(ox.y)
		x_checkpoint.append(ox.y)
		t_checkpoint_new.append(ox.t)
	return x_set, t_set, x_checkpoint, t_checkpoint_new

def RK45_output_to_list(xx):
	return [np.matrix((x.tolist())[0]) for x in xx]

Using Virtual Environment


Using a virtual environment ensures that we do not mess up package versions for other projects. We use python 3.6.2, although similar versions should be fine.

Suppose we want to create a virtual environment project MyProject, at the directory ~/Desktop/MyProject.

cd "<directory to Desktop>"

Then, in the command line enter the following.

pip install virtualenv
virtualenv MyProject
cd MyProject
Scripts\activate

The first line installs the virtualenv package if it is not yet installed. The last line activates the virtual environment: whatever we install into python will not affect the global python settings. You will see (MyProject) in the prompt when it is activated. To deactivate the virtual environment, just type:

deactivate
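
Note that Scripts\activate is the Windows layout; on Linux or macOS the activation script lives under bin instead, so the equivalent command is:

source MyProject/bin/activate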

Test the following while virtual environment is activated:

pip install kero

and you will see that the package kero is installed in

MyProject/Lib/site-packages

but not in the global python library. You are already able to use kero in this virtual environment.

Convolutional Neural Network with VGG Architecture


This post seeks to illustrate the difference between a convolutional neural network (CNN) and a fully connected deep neural network (DNN), and hopes to add a little more clarity to the CNN process.

CNN vs DNN

Figure 1. Comparison between a regular deep neural network and a convolutional neural network.

CNN is a variant of DNN with the constraint that the input is an image, or image-like. More technically,

  1. In a DNN, every layer is fully connected to the layer before. This means every neuron in layer n+1 is affected by every neuron in layer n via linear combination (with bias, and then activation).
  2. For a CNN, such as the VGG architecture, only a few layers near the end are fully connected. Through its receptive field, a single neuron in layer n+1 is connected to only a small number of neurons in the previous layer, corresponding to a small region of the visual space.

The above is illustrated in figure 1. In a DNN, each neuron is fully connected to the image. With a large number of neurons, there will be a large number of weights to compute, not to mention the many more weights between neurons from one layer to the next. In a CNN, on the other hand, each neuron is “in charge of” only a small region of the image.
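
To make this concrete, here is a small back-of-the-envelope comparison in python; the layer sizes are illustrative assumptions, not taken from any particular network.

# Rough parameter count: fully connected layer vs convolutional layer
# on a 224 x 224 x 3 input image (illustrative numbers).
h, w, c = 224, 224, 3

# DNN: a fully connected layer of 100 neurons, each seeing every input pixel.
fc_neurons = 100
fc_weights = h * w * c * fc_neurons # 15,052,800 weights (plus 100 biases)

# CNN: a convolutional layer of 64 filters of size 3x3x3; each output neuron
# only sees a 3x3 patch, and the filter weights are shared across positions.
filters, k = 64, 3
conv_weights = filters * k * k * c  # 1,728 weights (plus 64 biases)

print("fully connected:", fc_weights, "weights")
print("convolutional  :", conv_weights, "weights")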

You might have seen illustrations of the VGG architecture like figure 2 (I took the images from here; do visit the original sources of the images). VGG is a CNN by the Visual Geometry Group, Oxford (official link here). Figure 3 illustrates the convolution process in the first layer, while figure 4 illustrates the process through the fully connected layers. In essence, both perform linear sums, weighted by the filters in figure 3 and by the usual weights (as in a DNN) in figure 4.


Figure 2. VGG architecture.


Figure 3. Illustration for the convolutional layer.

Figure 4. Illustration for fully connected layers.

print_numpy_matrix()


Print numpy matrix with nice indentation.

kero.utils.utils.py

def print_numpy_matrix(np_mat,formatting="%6.2f",no_of_space=20):
  return

Arguments/Return

np_mat Numpy matrix to be printed.

formatting String. The usual format for printing in python.

Default="%6.2f"

no_of_space Integer. Number of spaces from the left with which the matrix is indented.

Default=20

Example Usage 1

import kero.utils.utils as ut
import numpy as np

x=np.matrix(np.random.normal(size=(3,4)))
print("x = ")
print(x)
print("\nwith print_numpy_matrix(): ")
print("x = ")
ut.print_numpy_matrix(x,formatting="%6.5f",no_of_space=10)

whose output is

x =
[[ 1.61318393 -0.37102784 -0.89363802 -0.56962807]
[-0.08657515 1.20537268 -1.95176244 -1.84425267]
[-0.59647232 -0.96593909 1.02197755 0.39493434]]

with print_numpy_matrix():
x =
1.61318 -0.37103 -0.89364 -0.56963
-0.08658 1.20537 -1.95176 -1.84425
-0.59647 -0.96594 1.02198 0.39493

Example Usage 2

See Deep Learning and Neural Network with kero PART 1.

kero version: 0.6.2

numpy_matrix_to_list()


Convert a numpy matrix to a list.

kero.utils.utils.py

def numpy_matrix_to_list(np_mat):
  return out

Arguments/Return

np_mat Numpy matrix.
Return out List.

Example Usage 1

testUtils3.py

import kero.utils.utils as ut
import numpy as np

n_row = 3
n_col = 2
M = np.random.normal(size=(n_row,n_col))
print(M)
print("------convert to list-----")
out = ut.numpy_matrix_to_list(M)
print(out)
print("------convert back to matrix--------")
MM = np.matrix(out)
print(MM)

The output is the following.

[[-0.07988029  1.2434192 ]
 [-0.5162123  -0.59034063]
 [-1.02402997 -0.74755916]]
------convert to list-----
[[-0.07988028825737249, 1.24341920016077], [-0.5162122966577327, -0.5903406251819941], [-1.024029973127149, -0.7475591629901397]]
------convert back to matrix--------
[[-0.07988029  1.2434192 ]
 [-0.5162123  -0.59034063]
 [-1.02402997 -0.74755916]]

kero version: 0.6.2