## Prelude to Medical Imaging

Here is the repository for pre-processing of some medical imaging data.

## 1. Attention-Gated-Networks_auxiliary

Working on Attention-Gated-Networks from https://github.com/ozan-oktay/Attention-Gated-Networks, you might want to run the algorithm on the original datasets. One of them is PANCREAS CT-82 which can be found from https://wiki.cancerimagingarchive.net/display/Public/Pancreas-CT.

The .dcm files store 2D slices. This code combines (and normalize) the slices of each patient into 3D volume. Each patient is coded as PANCREAS_0001, PANCREAS_0002 etc.

## 2. Multiv

While working on medical images, for example in NIFTI formats, we might face memory problem. This is because a NIFTI volume might come in large sizes, for example 192x192x19 with many modalities. With large convolutional neural network, feeding the entire volume may result in out of memory error (at least my measly 4GB RAM does. Multi-view sampling is the way out of this. Using multi-view sampling, slices of the images (green rectangles) are sampled. The “multi” part of the multi-view can take the form of larger slice (red rectangles).

## Parameter Estimation for Differential Equations using Scipy Least Square

The post is about parameter estimation based on gradient descent specifically for the differential equation $\frac{dn}{dt}=G-k_1 n-k_2 n^2-k_3 n^3$, i.e. we are looking for a good $(G,k_1,k_2,k_3)$ that best fits the experimental data. We will go through the methods of choosing initial parameter guesses and running the optimization. This post is more fully explained in this note: Differential Equation Parameter Estimations. The data used here is a fraction of the sample data I was given. The codes and sample data can also be found here.

As usual, do use virtual environment for cleaner package management (you can see here). The packages needed can be installed via pip.

pip install numpy
pip install scipy
pip install matplotlib

Put the scripts scipyode_header.pycubicODE.py and cubicODE_post_processing.py into the project folder – in the virtual environment if you are using one. The code can be found at the end of the post.

1. PART 1. Loading data. In scipyode_header.py, customize load_data() as you see fit. As it is, it will load the data starting from a maximum point, and assume that the csv file contains 2 columns: column 1 is for variable t, column 2 is n. The file does not have any header.
Go into cubicODE.py, set the following toggles.

 plot_data True initial_guess_mode False optimize_mode False

Run cubicODE.py. You will see figure like figure 1. In this example, the relevant data points (blue) are the points that overlap with the orange plot.

2. PART 1. Define the ODE. Write the ODE in def f(t,n,f_args).
3. PART 2. Choosing initial guesses. Set any number of initial guesses you want to try out with by setting collection_of_initial_guesses.  Since we optimize [G,k1,k2,k3], an example guess is [0,1e-3,0.5e-3,1.2e-7]. Set the following toggles and then run cubicODE.py. Plot like figure 1 will be generated.
 plot_data False initial_guess_mode True optimize_mode False
4. PART 2. Optimization. Once the choices of initial guesses are decided upon, set the following toggles and then run cubicODE.py. The optimized parameters from each initial guess will be saved as a .par file, and the save name is set by the variable save_name. The code in our example used 3 initial guesses and save_name = “20190110_cubic”. Thus, 20190110_cubic_1.par, 20190110_cubic_2.par and 20190110_cubic_3.par will be created.
 plot_data False initial_guess_mode False optimize_mode True
5. Loading optimized parameters. Go to cubicODE_post_processing.py. If we want to load all the 3 saved .par files, then set load_name = “20190110_cubic” and serial_number = [1,2,3]. Then run cubicODE_post_processing.py. The plot results are shown in figure 2(A, B, C), showing good fit. The optimized parameters for each initial guess can be read off from the console, as in figure 2(D). The smaller cost value is, the better the fit is generally.

The optimized parameters seem to produce good fitting. We are done!

Figure 1.Green plot shows the experimental data points while the rest are generated using RK45 using 3 different initial guesses, given by collection_of_initial_guesses = [  [0,1e-3,0.5e-3,1.2e-7], [0,1e-2,0,1e-7], [0,0,1.5e-3,1e-7] ].

Figure 2. (A, B, C) shows the results of parameter estimation from 3 different initial guesses. (D) shows the results in the console. For each serial number, the optimized parameters are printed as [G, k1, k2, k3].

import numpy as np
from scipy.integrate import RK45
import matplotlib.pyplot as plt
import csv

def RK45_wrapper(x0, time_checkpoints, f, f_args, stepsize=None):
if stepsize is None:
stepsize = 0.01*(time_checkpoints[-1]-time_checkpoints[0])
t_set = [time_checkpoints[0]]
x_set = [x0]
t_checkpoint_new = [time_checkpoints[0]]
x_checkpoint = [x0]

f_temp = lambda t,y : f(t,y, f_args)

for i in range(len(time_checkpoints)-1):
t1 = time_checkpoints[i+1]
if i>0:
x0 = ox.y
t0 = ox.t

else:
t0 = time_checkpoints[i]
ox = RK45(f_temp,t0,x0,t1,max_step=stepsize)

while ox.t < t1:
msg = ox.step()
t_set.append(ox.t)
x_set.append(ox.y)
x_checkpoint.append(ox.y)
t_checkpoint_new.append(ox.t)
return x_set, t_set, x_checkpoint, t_checkpoint_new

# filename : string, name of csv file without extension
# headerskip : nonnegative integer, skip the first few rows
#   if all rows are to be read, set to zero

# example
# xx = [[np.float(x) for x in y] for y in xx] # nice here
# print(xx)
# print(np.sum(xx[0]))

out = []
with open(str(filename)+'.csv') as csv_file:
count = 0
i = 0
count = count + 1
else:
out.append(row)
if get_the_first_N_rows>0:
i = i + 1
if i == get_the_first_N_rows:
break
return out

filename = 'sample_data'
all_data = [[np.float(x) for x in y] for y in all_data]

X=[x[0] for x in all_data]
Y=[x[1] for x in all_data]

max_index = np.argmax(Y)
relevant_data= all_data[max_index:]
Xr = [x[0] for x in relevant_data]
Yr = [x[1] for x in relevant_data]

print("     data size = ", len(Xr))
print("     peak = ", X[max_index],",", Y[max_index])
if plot_data:
fig = plt.figure()
ax.scatter(X,Y,3)
ax.scatter(X[max_index],Y[max_index])
ax.plot(Xr,Yr,'r')
ax.set_xlabel("t")
ax.set_ylabel("n")
plt.show()

return Xr, Yr

cubicODE.py

import numpy as np
import matplotlib.pyplot as plt
import pickle
from scipy.optimize import least_squares

# TOGGLES:
initial_guess_mode = 0
optimize_mode = 1
# SETTINGS:
save_name = "20190110_cubic"

# --------------- PART 1 ----------------
#
# Load data, prepare the differential equation

def f(t,n, f_args): # function for pendulum motion with friction
dndt = [f_args[0] - f_args[1] * n - f_args[2] * n**2 - f_args[3] * n**3]
return dndt

x0 = np.array([Yr[0]])
t_set_expt = [np.array([x]) for x in Xr]
n_set_expt = [np.array([x]) for x in Yr]
print("x0 = ",x0)
print("Xr[:5] = ",Xr[:5])
print("Yr[:5] = ",Yr[:5])
print("t_set_expt[:5] = ",t_set_expt[:5])
print("n_set_expt[:5] = ",n_set_expt[:5])

# --------------- PART 2 ----------------
#
# try out initial guesses

# initial guess [G, k1, k2, k3]
collection_of_initial_guesses = [
[0,1e-3,0.5e-3,1.2e-7],
[0,1e-2,0,1e-7],
[0,0,1.5e-3,1e-7]
]

if initial_guess_mode:
print("\n------------- INITIAL TESTING ----------------\n")
color_scheme = np.linspace(0,1,len(collection_of_initial_guesses))
fig = plt.figure()
ax.scatter(Xr,Yr,3,color='g',label="expt data")
for i in range(len(collection_of_initial_guesses)):
f_args0 = collection_of_initial_guesses[i]
# print(f_args0)
this_label = "initial guess" + str(i+1)
_, _, x_test, t_test = sh.RK45_wrapper(x0, t_set_expt, f, f_args0, stepsize=None)
ax.scatter(t_test,x_test,3,color=(1-color_scheme[i],0,color_scheme[i]),label=this_label)
ax.legend()
ax.set_xlabel("t")
ax.set_ylabel("n")

if optimize_mode:
def target_function(f_args):
# "checkpoint" here refers to the intended data points
#   since RK45 will generate more points in between the checkpoints.
#   What happens is just that with smaller time steps in between the checkpoints,
#   we can compute the checkpoints more precisely.
_, _, x_checkpoint, t_checkpoint_new = sh.RK45_wrapper(x0, t_set_expt, f, f_args, stepsize=None)

collection_of_f_i = []
# each i is a data point in this particular example
for x1, x2 in zip(x_checkpoint,n_set_expt):
this_norm = np.linalg.norm(np.array(x1)-np.array(x2))
collection_of_f_i.append(this_norm)
return np.array(collection_of_f_i)
print("\n------------- OPTIMIZATION STAGE ----------------\n")

for i in range(len(collection_of_initial_guesses)):
f_args0 = collection_of_initial_guesses[i]
print(" initial guess = ", f_args0)
optimized_res = least_squares(target_function,f_args0)
f_args_optimized = optimized_res.x

SAVE_DATA = {
"initial_guess": f_args0,
"optimized": f_args_optimized,
"cost":optimized_res.cost
}

print(" + optimized result = ",f_args_optimized)
print(" + cost = ",optimized_res.cost)
output = open(save_name+"_"+ str(i+1) +".par", 'wb')
pickle.dump(SAVE_DATA, output)
output.close()

plt.show()

cubicODE_post_processing.py

import numpy as np
import matplotlib.pyplot as plt
import pickle

# SETTINGS
serial_number = [1,2,3]

def f(t,n, f_args): # function for pendulum motion with friction
dndt = [f_args[0] - f_args[1] * n - f_args[2] * n**2 - f_args[3] * n**3]
return dndt
x0 = np.array([Yr[0]])
t_set_expt = [np.array([x]) for x in Xr]
n_set_expt = [np.array([x]) for x in Yr]
print("x0 = ",x0)
print("Xr[:5] = ",Xr[:5])
print("Yr[:5] = ",Yr[:5])
print("t_set_expt[:5] = ",t_set_expt[:5])
print("n_set_expt[:5] = ",n_set_expt[:5])

print("\n------------- PLOTTING ----------------\n")
color_scheme = np.linspace(0,1,len(serial_number))

for j in range(len(serial_number)):
fig = plt.figure()
ax.scatter(Xr,Yr,3,color='g',label="expt data")
i = serial_number[j]
this_label = "initial guess" + str(i+1)
SAVE_DATA = None
pkl_file.close()
# print(SAVE_DATA)
f_args_initial = SAVE_DATA["initial_guess"]
f_args_optimized = SAVE_DATA["optimized"]
cost  = SAVE_DATA["cost"]
_, _, x_op, t_op = sh.RK45_wrapper(x0, t_set_expt, f, f_args_optimized, stepsize=None)
print("Serial number:",j+1, " - Optimized parameters = " )
print(" ",f_args_optimized)
print(" cost = ", cost)
ax.plot(t_op,x_op,color=(1-color_scheme[j],0,color_scheme[j]),label=this_label)
ax.legend()
ax.set_title(title_msg)
ax.set_xlabel("t")
ax.set_ylabel("n")

plt.show()

## Convolutional Neural Network with keras: MNIST

In this post we use Convolutional Neural Network, with VGG-like convnet structure for MNIST problem: i.e. we train the model to recognize hand-written digits. We mainly follow the official keras guide, in this link.

The jupyter notebook detailing the attempt in this post is found here by the name keras_test2.ipynb.

Starting with a conclusion: it works pretty well, for a very quick training, the model can recognize hand-written digit with 98% accuracy.

As shown below, our input is 28 by 28 with 1 channel (1 color), since the hand-written digit is stored in a 28 by 28-pixel greyscale image. The layers used are

1. 2x 2D convolutional layers with 32x 3 by 3 filters  followed by max pooling for each 2 by 2 block of pixels. Then dropout layer is used; this is to prevent over-fitting.
2. 2x 2D convolutional layers with 64x 3 by 3 filters followed by max pooling for each 2 by 2 block of pixels. Then dropout layer is used.
3. Flatten layer just reshapes 2D image-like output from the previous layer to a 1D list of values. The first denses layer has 256 neurons, followed by dropout layer and finally a dense layer of 10 neurons corresponding to 10 classes or 10 different digits in MNIST. All activation functions are ReLu except the last one, softmax, as usual.

See the link here on how the data is prepared for training (i.e. the missing code shown as … partial code… below).

# ... partial code ...

model = Sequential()
# input: 28x28 images with 1 channels -> (28 ,28, 1) tensors.

sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])

model.fit(x_train, y_train, batch_size=16, epochs=10)
model.evaluate(x_test, y_test, batch_size=32)

For a quick training, this model obtains a very high accuracy of 0.98, as shown below.

Epoch 1/10
6400/6400 [==============================] - 6s 904us/step - loss: 0.8208 - acc: 0.7206
Epoch 2/10
6400/6400 [==============================] - 2s 379us/step - loss: 0.2427 - acc: 0.9266
Epoch 3/10
6400/6400 [==============================] - 2s 379us/step - loss: 0.1702 - acc: 0.9483
Epoch 4/10
6400/6400 [==============================] - 2s 380us/step - loss: 0.1353 - acc: 0.9589
Epoch 5/10
6400/6400 [==============================] - 2s 373us/step - loss: 0.1117 - acc: 0.9650
Epoch 6/10
6400/6400 [==============================] - 2s 379us/step - loss: 0.1080 - acc: 0.9697
Epoch 7/10
6400/6400 [==============================] - 2s 374us/step - loss: 0.0881 - acc: 0.9734
Epoch 8/10
6400/6400 [==============================] - 2s 375us/step - loss: 0.0880 - acc: 0.9736 1s - los
Epoch 9/10
6400/6400 [==============================] - 2s 377us/step - loss: 0.0690 - acc: 0.9766
Epoch 10/10
6400/6400 [==============================] - 2s 373us/step - loss: 0.0686 - acc: 0.9800
100/100 [==============================] - 0s 940us/step

## Neural Network with keras: Remainder Problem

The problem we try to solve here is the remainder problem. We train our neural network to find the remainder of a number randomly drawn from 0 to 99 inclusive when it is divided by 17. For example, given 20, the remainder is 3.

The code (in Jupyter notebook) detailing the results of this post can be found here by the name keras_test1.ipynb. In all the tests, we use only 1 hidden layers made of 64 neurons and different input and output layers to take into account the context of the problem. With the context taken into account, we show that we can help the neural network model train better!

Test 1A and Test 1B

Note: See the corresponding sections in the Jupyter notebook.

We start with a much simpler problem. Draw a random number from 0 to 10 inclusive. We find their remainders when divided by 10, which is quite trivial. From test 1A, with 4 epochs, we see a steady improvement in prediction accuracy up to 82%. With 12 epochs in test 1B, our accuracy is approximately 100%. Good!

Test 2A and Test 2B

Now, we raise the hurdle. We draw wider range of random numbers, from 0 to 99 inclusive. To be fair we give the neural network more data points for training. We get pretty bad outcome; the trained model in test 2A suffers the problem of predicting only 1 outcome (it always predicts the remainder is 0). In test 2B, we perform the same training, but for longer epochs. The problem still occurs.

Test 3A

Now we solve the problem in test 2A and 2B by contextualizing the problem. Notice that in test 1A, 1B, 2A and 2B, there is only 1 input (i.e. 1 neuron in the input layer) which exactly corresponds to the random number whose remainder is to be computed.

Now, in this test, we convert it into 2 inputs, splitting the unit and tenth digits. For example, if the number is 64, the input to our neural network is now (6,4). If the number is 5, then it becomes (0,5). This is done using extract_digit() function. The possible “concept” that the neural network can learn is the fact that for division by 10, only the last digit matters. That is to say, if our input is (a,b) after the conversion, then only b matters.

What do we get? 100% accuracy! All is good.

Test 3B

Finally, we raise the complexity and solve our original problem. We draw from 0 to 99 inclusive, and find the remainder from division with 17. We use extract_digit() function here as well. Running it over 24 epochs, we get an accuracy of 96% (and it does look like it can be improved)!

Conclusion? First thing first, this is just a demonstration of neural network using keras. But more importantly, contextualizing the input does help!

The code for Test3B can be found in the following.

[1]

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

[2]

N = 100
D = 17

def simple_binarizer17(y, bin_factor=1, bin_shift=0):
out = [0+bin_shift]*17
out[y] = 1*bin_factor
return out

def extract_digit(x):
b = x%10
a = (x-b)/10
return [int(a),int(b)]

X0_train = np.random.randint(N+1,size=(256000,1))
Y_train = np.array([simple_binarizer17(x%D) for x in np.transpose(X0_train).tolist()[0]])
X0_test = np.random.randint(N+1,size=(100,1))
Y_test = np.array([simple_binarizer17(x%D) for x in np.transpose(X0_test).tolist()[0]])

X_train = np.array([extract_digit(X[0]) for X in X0_train])
X_test = np.array([extract_digit(X[0]) for X in X0_test])
for X0,X in zip(X0_train[:10],X_train[:10]):
print(X0,"->",X)

[3]

model = Sequential()

model.compile(loss='categorical_crossentropy',
optimizer='sgd',
metrics=['accuracy'])
model.fit(X_train, Y_train, epochs=24, batch_size=32)

loss_and_metrics = model.evaluate(X_test, Y_test, batch_size=10)
print("--LOSS and METRIC--")
print(loss_and_metrics)
print("--PREDICT--")
classes = model.predict(X_test, batch_size=16)

[4]

count = 0
correct_count = 0
for y0,y in zip(Y_test,classes):
count = count+1
correct_pred = False
if np.argmax(y0)==np.argmax(y):
correct_pred = True
correct_count = correct_count + 1
if count<20:
print(np.argmax(y0),"->",np.argmax(y), "(",correct_pred,")")
accuracy = correct_count/len(Y_test)
print("accuracy = ", accuracy)

## Utilities

Example usage: Create “tryy.csv” file containing rows and columns of decimal numbers. Run the following.

import csv
import numpy as np

# filename : string, name of csv file without extension
# headerskip : nonnegative integer, skip the first few rows
#   if all rows are to be read, set to zero

out = []
with open(str(filename)+'.csv') as csv_file:
count = 0
i = 0
count = count + 1
else:
out.append(row)
if get_the_first_N_rows>0:
i = i + 1
if i == get_the_first_N_rows:
break
return out

# example
xx = [[np.float(x) for x in y] for y in xx] # nice here
print(xx)
print(np.sum(xx[0]))

## MNIST Neural Network test 1

To test MNIST using kero 0.6.3, I will use jupyter notebook in a virtual environment. Also, in this folder, place adhoc_utils.py containing the function read_csv() from here. I will use virtualenv as usual: see here. Then after activating the virtual environment, simply:

pip install jupyter
pip install kero
pip install matplotlib
pip install opencv-python
jupyter notebook

Download MNIST file that has been converted into CSV form; I got it from this link. Now, create the python notebook  mnist_dnn.ipynb (see below) and run all the cellsYou can find this test run and similar test runs here.

Unfortunately, it appears that the trained models only predict one single output for any input (it predicts only 6 for any image in one of the attempts, which is bad). Several possible issues and remarks include the following.

1. There might be defective data points. Update: not likely, it is easy to check it with tested machine learning algorithm. I tried using keras on the same data here; training and prediction have been successful.
2. Different loss functions are more suitable, check out, for example, KL divergence. Update: this is certainly more than meets the eye. See a tutorial from Stanford here. Using MSE, which is L2, appears to be harder to optimize. Use instead L1 norms like cross-entropy loss.
3. This example uses no softmax layer at the end; in fact, using default Neural Network from kero, the final layer is activated using the same activation function (in this example, sigmoid function) as other layers. The maximum value at the output layer is taken as the predicted output.
4. DNN has been treated like a black box; nobody quite knows what happens throughout the process in a coherent manner; in fact it could be just that the randomly initialized weights before training were not chosen in the right range. This might be interesting to study in the future (hopefully the experts come out with new insights soon).

All the above said, the little modification I did (before softmax) includes initiating all biases to zero instead of random and allow for options to generate random weights in a normalized manner (that depend on the number of neurons). I might change the interface a little, but in any case, seems like there might be more works to do! That’s all for now, happy new year!

mnist_dnn.ipynb

[1]

import numpy as np
import matplotlib.pyplot as plt
import cv2, time

import kero.multib.NeuralNetwork as nn
import kero.utils.utils as ut

[2]

# Loading MNIST image data from csv files.
# Also, binarize labels.
#
#

def simple_binarizer(mnist_label, bin_factor=1, bin_shift=0):
# mnist_label: int, 0,1,2... or 9
out = [0+bin_shift]*10
out[mnist_label] = 1*bin_factor
return np.transpose(np.matrix(out))
def convert_list_of_string_to_float(this_list):
out = []
for i in range(len(this_list)):
out.append(float(this_list[i]))
return out

bin_shift = 0
bin_factor = 1
img_width, img_height = 28, 28
pixel_normalizing_factor = 255
# read_csv returns list of list.
mnist_train_labels_binarized = [simple_binarizer(int(x[0]),bin_factor=bin_factor,bin_shift=bin_shift) for x in mnist_train]
mnist_train_data = [1/pixel_normalizing_factor*np.transpose(np.matrix(convert_list_of_string_to_float(x[1:]))) for x in mnist_train]
#

# Uncomment this to print the binarized labels
#
for i in range(5):
print(mnist_train[i][0] ,":",ut.numpy_matrix_to_list(mnist_train_labels_binarized[i]))

[3]

# Uncomment this to see the flattened image profile
#
# temp = mnist_train_data[0]
# print("max = ", np.max(temp))
# print("min = ", np.min(temp))
# mean_val = np.mean(temp)
# print("mean = ", mean_val)
# fig0 = plt.figure()
# ax0.plot(range(len(temp)),temp)
# ax0.plot(range(len(temp)),[mean_val]*len(temp))

[4]

# To visualize the loaded data, uncomment and run this section.
#
#

# mnist_train_labels = [x[0] for x in mnist_train]
# mnist_train_data_image_form = [np.array(x[1:]).reshape(img_height,img_width).astype(np.uint8) for x in mnist_train]

# data_length = len(mnist_train_data)
# for i in range(10):
#     if i < data_length:
#         print(mnist_train_data_image_form[i].shape,end=",")

# #
# count=0
# title_set = []
# for label,img_data in zip(mnist_train_labels,mnist_train_data_image_form):
#     title = "count: "+str(count)+"| label: "+str(label)
#     title_set.append(title)
#     cv2.imshow(title, img_data)
#     cv2.resizeWindow(title, 300,300)
#     count = count + 1
#     if count == 5:
#         break
# cv2.waitKey(0)
# for title in title_set:
#     cv2.destroyWindow(title)

[5]

# input_set: list of numpy matrix [x],
#   where each x is a column vector m by 1, m the size of input layer.
# Y0_set: list of numpy matrix [Y0],
#   where each Y0 is a column vector N by 1, N the size of output layer.
#   This is equal to 10, since it corresponds to labels 0,1,...,9.
#
#

input_set = mnist_train_data
Y0_set = mnist_train_labels_binarized
number_of_neurons = [784,28,10]
lower_bound, upper_bound = 0 ,1
bounds = [lower_bound, upper_bound]
bulk = {
"number_of_neurons" : number_of_neurons,
"bounds": bounds,
"layerwise_normalization": True,
}

NeuralNetwork = nn.NeuralNetwork()
NeuralNetwork.learning_rate = 1
NeuralNetwork.initiate_neural_network(bulk, mode="UniformRandom",
verbose = False,
verbose_init_mode=False,
verbose_consistency=False)

nu = nn.NetworkUpdater()
nu.set_settings(method="RegularStochastic",
method_specific_settings={
"batch_size":8,
"no_of_epoch":32,
"shuffle_batch":True,
})
nu.set_training_data(input_set,Y0_set)
nu.set_neural_network(NeuralNetwork)

[6]

AF = nn.activationFunction(func = "Sigmoid")
start = time.time()
weights_next, biases_next, mse_list = nu.update_wb(input_set, Y0_set,
NeuralNetwork.weights, NeuralNetwork.biases, AF,
mse_mode="compute_and_print", verbose=11)
end = time.time()
elapsed = end - start

[7]

print("epoch | mse value ")
mark = 1
for i in range(len(mse_list)):
if mark >= 0.1*len(mse_list) or i==0:
print(" + epoch {",i ,"} ", mse_list[i])
mark = 1
else:
mark = mark + 1

fig = plt.figure()
plt.plot(range(len(mse_list)), mse_list)

[8]

print("time taken [s] = ", elapsed)
print("time taken [min] = ", elapsed/60)
print("time taken [hr] = ", elapsed/3600)
print("time taken at 10k x [s] = ", elapsed*1e4)
print("time taken at 10k x [min] = ", elapsed*1e4/(60))
print("time taken at 10k x [hr] = ", elapsed*1e4/(3600))
print("time taken at 10k x [day] = ", elapsed*1e4/(3600*24))

[9]

no_of_images_to_test=500
mnist_test_labels = [int(x[0]) for x in mnist_test]
mnist_test_data = [1/pixel_normalizing_factor*np.transpose(np.matrix(convert_list_of_string_to_float(x[1:]))) for x in mnist_test]

hit_list = []
predict_list = []
predict_val = []
for i in range(no_of_images_to_test):
a_1 = mnist_test_data[i]
test_a_l_set, _ = nu.feed_forward(weights_next, biases_next, a_1, AF,
verbose=False,
matrix_formatting="%6.2f")
Y_after = test_a_l_set[-1]
predicted_label = int(np.argmax(Y_after))
actual_label= mnist_test_labels[i]
# print(Y_after)
#     print("predicted vs actual = ", predicted_label,"/",actual_label)
predict_list.append(predicted_label)
predict_val.append(Y_after)
if actual_label==predicted_label:
hit_list.append(1)
else:
hit_list.append(0)
print("predict list = ")
print(predict_list)
print("predict values = ")
for i in range(10):
#     print(ut.numpy_matrix_to_list(predict_val[i]))
ut.print_numpy_matrix(np.transpose(predict_val[i]),formatting="%9.6f",no_of_space=20)
print("hit list = ")
print(hit_list)
print("percentage correct = ", 100* np.sum(hit_list)/len(hit_list))

kero version 0.6.3

## Runge-Kutta with wrapper

See the codes for each example at the end of the post.

Example 1

Let us solve the differential equations $\frac{dx}{dt}=-x$. The following wrapper uses Runge-Kutta solver from scipy. The purpose of the wrapper is to compute the specified points $\{(x_i,t_i)\}$.

1. Input time_checkpoints into RK45_wrapper. time_checkpoints is the list [t1,t2,…,tN] whose values of x we want to compute.
2. x0 is the initial value, i.e. the value of x at t1. This is the initial condition for the initial value problem.
3. stepsize. This sets the maximum step size between tk and t(k+1). For example, if tk=1 and t(k+1)=2, stepsize can be set to 0.1 and the values will be computed at 1.1, 1.2, …, 2.0. By default stepsize = None.  When set to None, the stepsize is a-hundreth of tk and t(k+1) interval.

As seen in the figure below, the green “check points” are the values that we specify for the wrapper to compute via time_checkpoints. Note that t_checkpoint_new will be computed, since the solver DOES NOT exactly computes the values at t_checkpoint. However, by setting the stepsize as small as possible, t_checkpoint_new will be closer to t_checkpoint. We print max checkpt diff to show the difference between t_checkpoint_new and t_checkpoint. In this example, using the default stepsize is sufficiently close, i.e. max checkpt diff=0.

Red circles are the solutions generated by RK45 solver, and blue curve is the solution plotted analytically.

Example 2

Now we try out similarly with pendulum equation example like the example here. This shows that essentially the same method works for multi-valued function as well. The equation is given by

$\ddot{\theta}(t)+ b \cdot \dot{\theta}(t) +c\cdot sin(\theta (t)) = 0$

which can be linearized as a system of equations:

$\dot{\theta}(t) = \omega(t)$
$\dot{\omega}(t) = -b\cdot \omega (t) - c\cdot sin(\theta (t))$

We get the following. Each color correspond to one component in the linearized system. The scatter dots are solutions solved by RK45, the cross and circle are respectively the checkpoints we want to compute and the same checkpoints produced by RK45.

Code for Example 1

import numpy as np
from scipy.integrate import RK45
import matplotlib.pyplot as plt

def f(t,y):
# dy/dt = f(t,y)
return np.array([-1*y[0]])

x0 = np.array([1.0]) # initial value
# t0 = 0  # initial time
# t1 = 10 # final time

def RK45_wrapper(x0, time_checkpoints, f, stepsize=None):

if stepsize is None:
stepsize = 0.01*(time_checkpoints[-1]-time_checkpoints[0])
t_set = [time_checkpoints[0]]
x_set = [x0]
t_checkpoint_new = [time_checkpoints[0]]
x_checkpoint = [x0]

for i in range(len(time_checkpoints)-1):
t1 = time_checkpoints[i+1]
if i>0:
x0 = ox.y
t0 = ox.t

else:
t0 = time_checkpoints[i]
ox = RK45(f,t0,x0,t1,max_step=stepsize)

while ox.t < t1:
msg = ox.step()
t_set.append(ox.t)
x_set.append(ox.y)
x_checkpoint.append(ox.y)
t_checkpoint_new.append(ox.t)
return x_set, t_set, x_checkpoint, t_checkpoint_new

# all data by DE (differential equation)
time_checkpoints = np.linspace(0,10,11)
x_set, t_set, x_checkpoint, t_checkpoint_new = RK45_wrapper(x0, time_checkpoints, f, stepsize=None)

t_checkpt_diff = []
for t,t_new in zip(time_checkpoints, t_checkpoint_new):
t_checkpt_diff.append(abs(t-t_new))
print("max checkpt diff = ", np.max(t_checkpt_diff))

# all data, analytic
t0 = time_checkpoints[0]
t1 = time_checkpoints[-1]
t_set2 = np.linspace(t0,t1,100)
f_int = lambda t: np.exp(-t)
x_set2 = [f_int(t_set2[i]) for i in range(len(t_set2))]

fig = plt.figure()
ax.scatter(t_set,x_set,facecolors='none',edgecolors='r',label="all data by DE")
ax.scatter(t_checkpoint_new,x_checkpoint,20,'g',label="check points")
ax.plot(t_set2,x_set2,label="all data, analytic")
ax.set_xlabel("t")
ax.set_ylabel("x")
ax.legend()
plt.show()

Code for Example 2

import numpy as np
from scipy.integrate import RK45
import matplotlib.pyplot as plt

def f(t,y, f_args):
theta, omega = y
dydt = [omega, -f_args[0]*omega - f_args[1]*np.sin(theta)]
return dydt

def RK45_wrapper(x0, time_checkpoints, f, f_args, stepsize=None):
if stepsize is None:
stepsize = 0.01*(time_checkpoints[-1]-time_checkpoints[0])
t_set = [time_checkpoints[0]]
x_set = [x0]
t_checkpoint_new = [time_checkpoints[0]]
x_checkpoint = [x0]

f_temp = lambda t,y : f(t,y, f_args)

for i in range(len(time_checkpoints)-1):
t1 = time_checkpoints[i+1]
if i>0:
x0 = ox.y
t0 = ox.t

else:
t0 = time_checkpoints[i]
ox = RK45(f_temp,t0,x0,t1,max_step=stepsize)

while ox.t < t1:
msg = ox.step()
t_set.append(ox.t)
x_set.append(ox.y)
x_checkpoint.append(ox.y)
t_checkpoint_new.append(ox.t)
return x_set, t_set, x_checkpoint, t_checkpoint_new

b = 0.25
c = 5.0
f_args = [b,c]
x0 = [np.pi - 0.1, 0.0]
time_checkpoints = np.linspace(0, 10, 11)
x_set, t_set, x_checkpoint, t_checkpoint_new = RK45_wrapper(x0, time_checkpoints, f, f_args, stepsize=None)

# print(np.array(x_set).shape) # (112, 2)
fig=plt.figure()
ax.scatter(t_set,[x[0] for x in x_set],3,label="RK45 x[0]")
ax.scatter(t_set,[x[1] for x in x_set],3,label="RK45 x[1]")
ax.scatter(t_checkpoint_new,[x[0] for x in x_checkpoint],50,label="output ckpt x[1]",facecolors='none',edgecolors='b')
ax.scatter(t_checkpoint_new,[x[1] for x in x_checkpoint],50,label="output ckpt x[1]",facecolors='none',edgecolors='r')
ax.scatter(time_checkpoints,[x[0] for x in x_checkpoint],30,label="manual ckpt x[1]",marker="x",color="b")
ax.scatter(time_checkpoints,[x[1] for x in x_checkpoint],30,label="manual ckpt x[1]",marker="x",color="r")

ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
plt.show()

## Parameter Estimation for Differential Equations using Gradient Descent

The post is about parameter estimation based on gradient descent specifically for the differential equation $\frac{dn}{dt}=G-k_1 n-k_2 n^2-k_3 n^3$, i.e. we are looking for a good $(G,k_1,k_2,k_3)$ that best fits the experimental data. We will go through the methods of choosing initial parameter guesses and running the optimization. This post is more fully explained in this note: Differential Equation Parameter Estimations. The data used here is a fraction of the sample data I was given. The codes and sample data can also be found here.

Consider this link instead Parameter Estimation for Differential Equations using Scipy Least Square since it utilizes more robust optimization component from scipy.

Place fitting.py, fitting_post_processing.py and param_estimate.py (the codes are at the end of the post) in the project folder. Some required python packages include matplotlib, scipy and kero; do pip install them.

1. In the command line, cd into the directory where the project is.
2. Preliminary. Prepare sample_data.csv containing data points (t,n). First column is for t (horizontal axis), second column is n (vertical axis). Set the following toggles in fitting.py.
 plot_data True do_initial_testing False start_op False

Run python fitting.py. This plots the sample data, shown in figure 1. In this context, the relevant data points are the data after the peak. The orange curve shows the plot of only the extracted relevant data.

3. Initial testing stage. Set the following toggles in fitting.py.
 plot_data False do_initial_testing True start_op False

Set collection_of_p_init. You can place as many guesses as you want. In this example,

collection_of_p_init =[
[0,1e-3,0.5e-3,1.2e-7], # guess 1
[0,1e-2,0,1e-7], # guess 2
[0,0,1.5e-3,1e-7], # guess 3
]

we try using 3 different initial parameters, where each of them is of the format [G,k1,k2,k3], corresponding to the 4 parameters that we want to fit. Then, run the command:  python fitting.py. The results are shown in figure 2. It can be seen that the first and third guesses are closer to the experimental data; we might want to use the parameters near these values for random initiation in the optimization stage (but for demonstration sake, we will not do it here).

4. Optimization stage. Set the following toggles in fitting.py.
 plot_data False do_initial_testing False start_op True

Set the preferred settings. p_init_max and p_init_min are the range of random initial parameters to try from. We can set the number of attempts via the variable no_of_tries, and of course each attempt starts with a random initial parameter based on the range we just specified. Run the command:  python fitting.py. You will see optimization in progress with decreasing mse values.

5. Viewing the optimization result. We want to load all 3 fitting_data files we have created. Thus, in fitting _post_processing.py, set
filename = 'fitting_data'
save_data_series = [1,2,3]

Run the command:  python fitting _post_processing.py. Figure 3 shows how after every iteration, the resulting curve is approaching the experimental data.

Finally, we want to see what is the best parameter p. When viewing the optimization result, the parameters will be printed. We are more or less done!

What else to do?

In this post I have only run a few iterations over a few number of random parameter initializations. To obtain the best results, we should run them over larger iterations and try more random initial parameters.

That’s all, have fun~

Figure 1. A plot of n-t from the sample data is in blue. The orange data is in

Figure 2. The curves show 3 different plots from 3 different guesses of initial parameters. The red curve is the experimental data. The console shows recommended setting for learning_rate variable.

Figure 3. As the algorithm is iterated, the curve goes from red to blue. The figures generally show that the curve approaches the experimental data and that MSE trend decreases along the iteration of i.

fitting.py

import numpy as np
import kero.utils.utils as ut
import param_estimate as pe
import matplotlib.pyplot as plt
from scipy.integrate import RK45
import csv
import pickle

es = pe.DE_param_estimator()

x0 = np.array([Yr[0]])
t_set_expt = [np.array([x]) for x in Xr]
n_set_expt = [np.array([x]) for x in Yr]
print("x0 = ",x0)
print("Xr[:5] = ",Xr[:5])
print("Yr[:5] = ",Yr[:5])
print("t_set_expt[:5] = ",t_set_expt[:5])
print("n_set_expt[:5] = ",n_set_expt[:5])

#.#

print("----------------- Prepare Model ---------------------------")
# MODEL used to fit this data
#
# We are estimating the parameters of the following differential equations:
#   dn/dt = F(n, p) = G - k1 * n - k2 * n**2 - k3 * n**3
#   where p is the parameter (G,k1,k2,k3)
def F(y,p):
return p[0] - p[1] * y -p[2] * y**2 - p[3] * y**3
diff_part_G = lambda x: 1
diff_part_k1 = lambda x: -x
diff_part_k2 = lambda x: -x**2
diff_part_k3 = lambda x: -x**3
list_of_derivatives = [diff_part_G, diff_part_k1, diff_part_k2, diff_part_k3]
print("  Model prepared.")
#.#

do_initial_testing = 0
if do_initial_testing:
print("---------------- Prepare Initial guess --------------------")
# Before optimizing, try playing around the p_init values here
# We will choose the range of uniform random values for guess p based on tests conducted here
# Write down choices that seem good and any remarks here
#
#   1. p_init = [0,5e-2,2e-3,1.1e-7] # near the real value. all n less than expt data
#   2. p_init = [0,5e-2,1e-3,1.1e-7] # seems to be better than 1
#   3. p_init = [0,5e-2,0,1.1e-7] # all n above expt data
#   4. p_init = [0,1e-1,1e-4,1.1e-7] # small n above expt data, larger n below
#   5. p_init = [0,1e-2,1e-3,1.1e-7] # exceedingly close! Let's vary around these values instead
#
#
#

collection_of_p_init =[
[0,1e-3,0.5e-3,1.2e-7], # guess 1
[0,1e-2,0,1e-7], # guess 2
[0,0,1.5e-3,1e-7], # guess 3
]

fig = plt.figure()
no_of_test = len(collection_of_p_init)
for i in range(no_of_test):
p_init = collection_of_p_init[i] # guess p
time_checkpoints = t_set_expt
def f(t,y):
return np.array([F(y[0],p_init)])
_,_, x_checkpoint, t_checkpoint_new = pe.RK45_wrapper(x0, time_checkpoints, f, stepsize=None)
ax.plot(t_checkpoint_new,x_checkpoint, label="guess "+str(i+1), color=(0,1-1*(i+1)/no_of_test,1*(i+1)/no_of_test))

# To find what is the best learning_rate
print(" testing initial guess parameters: ",i+1)
es.update_p(t_set_expt, n_set_expt, F, p_init, list_of_derivatives, verbose=11)
ax.plot(Xr,Yr, color = "r", label="expt values")
ax.legend( )
ax.set_xlabel("t")
ax.set_ylabel("n")

plt.show()
#.#

# ------------------------------------------------------------------------------- #
start_op = 1
if start_op:
print("----------------- Start Optimizing ---------------------------")

# ********* Settings *************
p_init_max = [1e-10,0 ,0,0.9e-7] # [Gmax,k1max,k2max,k3max]
p_init_min = [1e-10, 2e-2,2e-3,1.2e-7] # [Gmin ,k1min,k2min,k3min]
no_of_tries = 3
save_name = 'fitting_data'
es.learning_rate= 1e-16
no_of_iterations = 10
save_interval = 1
# ********************************

n_set_expt_MAT = pe.RK45_output_to_list(n_set_expt) # MATRIX list form

for j in range(no_of_tries):
p_init = []
for a,b in zip(p_init_min, p_init_max):
p_init.append(np.random.uniform(a,b))
p_now= p_init
save_count = 0
SAVE_DATA = []
print("TRY ", j+1," / ",no_of_tries)
mse_list = []
for i in range(no_of_iterations + 1):
if i>0:
# for i =0 , initial state, do not iterate yet
p_now = es.update_p(t_set_expt, n_set_expt, F, p_now, list_of_derivatives, verbose=0)

#------------------------ FOR REALTIME OBSERVATION -----------
def f(t,y):
return np.array([F(y[0],p_now)])
time_checkpoints = t_set_expt
_,_, n_set_next, t_set_next = pe.RK45_wrapper(x0, time_checkpoints, f, stepsize=None)
n_set_next_MAT = pe.RK45_output_to_list(n_set_next)
mse = pe.MSE(n_set_next_MAT,n_set_expt_MAT)
mse_list.append(mse)
print(" i = ", i , " -- > mse = ", mse)
# print("   p_new = ", p_now)

#------------------------ TO BE SAVED -------------------------
save_count = save_count + 1
if save_count == save_interval or i==0:
save_count = 0
data_store = {
"t_set":t_set_next,
"n_set":n_set_next,
"p":p_now,
"learning_rate":es.learning_rate,
"mse_list": mse_list
}
SAVE_DATA.append(data_store)
output = open(save_name+"_"+ str(j+1) +".par", 'wb')
pickle.dump(SAVE_DATA, output)
output.close()

print("\nClosing Program...")

fitting_post_processing.py

import param_estimate as pe
import matplotlib.pyplot as plt
import numpy as np
import time
import pickle

filename = 'fitting_data'
save_data_series = [1,2,3]

for j in save_data_series:
pkl_file = open(filename+'_'+str(j)+'.par', 'rb')
SAVE_DATA = None
pkl_file.close()

fig = plt.figure(j)
ax = None
ax2 = None

# ------------------- EXPT DATA ---------------------------------=
ax.scatter(Xr,Yr,3,marker="x",c="g")

# ------------------- DATA FROM OPTIMIZATION ---------------------
color_scheme = np.linspace(0,1,len(SAVE_DATA))
for i in range(len(SAVE_DATA)):
data_store = SAVE_DATA[i]
n_set = data_store["n_set"]
t_set = data_store["t_set"]
p = data_store["p"]
mse_list = data_store["mse_list"]
print(" i = ", i)
print("   p = ", p )
if i==0:
this_label = "init curve"
else:
this_label = str(i)
ax.plot(t_set,n_set, color=(1-color_scheme[i],0,color_scheme[i]), label=this_label)
if i ==0:
this_title = "initial G,k1,k2,k3 = " + str(p)

plt.title(this_title)
ax2.plot(range(len(mse_list)),mse_list)
ax.legend()
ax.set_xlabel("t")
ax.set_ylabel("n")
ax2.set_xlabel("i")
ax2.set_ylabel("MSE")
plt.show()

print("\nClosing Post-Processing Program...")

param_estimate.py

# We are estimating the parameters of the following differential equations:
#   dn/dt = F(n, p) = G - k1 * n - k2 * n**2 - k3 * n**3
#   where p is the parameter (G,k1,k2,k3)

import numpy as np
import kero.utils.utils as ut
from scipy.integrate import RK45
import matplotlib.pyplot as plt
import csv

class DE_param_estimator:
def __init__(self):
self.learning_rate = 1e-21
return
# l2
def update_p(self,t_set_expt, n_set_expt, F, p_now, list_of_derivatives,
verbose = False):
# t_set_expt: list of numpy array
#   for example: [array([25.2047]), array([25.279]), array([25.3533]), array([25.4277]),...]
# n_set_expt: list of numpy array
#   for example: [array([789.696]), array([784.109]), array([763.528]), array([726.782]),...]
# p_now: list of parameters
#
#
# return
#   p_next: list of parameters
nabla_mse = self.nabla_MSE(t_set_expt, n_set_expt, F, p_now, list_of_derivatives)
if verbose>10:
# temp = self.learning_rate*nabla_mse
temp2 = np.transpose(np.matrix(p_now))
for_recommend = []
if verbose>30:
print("    nabla_mse = ")
ut.print_numpy_matrix(nabla_mse,formatting="%s",no_of_space=10)
print("    p_now =")
ut.print_numpy_matrix(temp2,formatting="%s",no_of_space=10)
det_list = []
with np.errstate(divide='ignore'):
for k in range(len(p_now)):
det = 1e-2*p_now[k]/nabla_mse.item(k)
if verbose>20:
print("    p_now[k]/(nabla_mse[k]*100) =",det)
det2 = np.log10(np.abs(det))
if (det2!=np.inf) and (det2!=-np.inf) and (not np.isnan(det2)):
det_list.append(det2)
if verbose>20: print("      det_list = ",det_list)
print("       recommended learning rate ~ ", 10**np.min(det_list))
p_next= np.transpose(np.matrix(p_now)) - self.learning_rate * nabla_mse
p_next = np.transpose(p_next).tolist()
p_next = p_next[0]
return p_next
# L3
def dkdMSE(self, t_set_expt, n_set_expt, F, p_now, diff_part):
# lambda function : d/dk d/dt of n
# t_set_expt, n_set_expt : experimental data
N = len(t_set_expt)
dkmse = 0

def f(t,y):
return np.array([F(y[0],p_now)])

x0 = n_set_expt[0] # array
time_checkpoints= t_set_expt
x_set, t_set, x_checkpoint, t_checkpoint_new = RK45_wrapper(x0, time_checkpoints, f, stepsize=None)

for i in range(1,len(n_set_expt)):
ni = x_checkpoint[i][0]
# print(" ni =", ni)
m = i+1
diff_term = 0
for j in range(m):
diff_term = diff_term + diff_part(x_checkpoint[j][0])
diff_term = diff_term / m
dkmse = dkmse + ( ni - n_set_expt[i][0])* diff_term
dkmse = 1/N*dkmse
return dkmse

def nabla_MSE(self, t_set_expt, n_set_expt, F, p_now, list_of_derivatives,
verbose = False):
# list_of_derivatives: list of lambda functions

dGdMSE = self.dkdMSE(t_set_expt, n_set_expt, F, p_now, list_of_derivatives[0])
dk1mse = self.dkdMSE(t_set_expt, n_set_expt, F, p_now, list_of_derivatives[1])
dk2mse = self.dkdMSE(t_set_expt, n_set_expt, F, p_now, list_of_derivatives[2])
dk3mse = self.dkdMSE(t_set_expt, n_set_expt, F, p_now, list_of_derivatives[3])
# print("dGdMSE=", dGdMSE)
# print("dk1mse=",dk1mse)
# print("dk2mse=",dk2mse)
# print("dk3mse=",dk3mse)
nabla_mse = np.transpose(np.matrix([dGdMSE, dk1mse, dk2mse, dk3mse]))
return nabla_mse

filename = 'sample_data'
all_data = [[np.float(x) for x in y] for y in all_data]

X=[x[0] for x in all_data]
Y=[x[1] for x in all_data]

max_index = np.argmax(Y)
relevant_data= all_data[max_index:]
Xr = [x[0] for x in relevant_data]
Yr = [x[1] for x in relevant_data]

print("     data size = ", len(Xr))
print("     peak = ", X[max_index],",", Y[max_index])
if plot_data:
fig = plt.figure()
ax.scatter(X,Y,3)
ax.scatter(X[max_index],Y[max_index])
ax.plot(Xr,Yr,'r')
ax.set_xlabel("t")
ax.set_ylabel("n")
plt.show()

return Xr, Yr

# filename : string, name of csv file without extension
# headerskip : nonnegative integer, skip the first few rows
#   if all rows are to be read, set to zero

# example
# xx = [[np.float(x) for x in y] for y in xx] # nice here
# print(xx)
# print(np.sum(xx[0]))

out = []
with open(str(filename)+'.csv') as csv_file:
count = 0
i = 0
count = count + 1
else:
out.append(row)
if get_the_first_N_rows>0:
i = i + 1
if i == get_the_first_N_rows:
break
return out

def MSE(Y_set,Y0_set):
# Y_set is a list of Y
# Y0_set is a list of Y0
# Assume Y_set and Y0_set have the same size, otherwise error will be raised
#
# Y (numpy matrix) is a vector: computed data from model
# Y0 (numpy matrix) is a vector: observed/true data
# Assume Y and Y0 have the same size, otherwise error will be raised
# Assume all entries are real

mse = 0
n = len(Y_set)
if not (n==len(Y0_set)):
print("MSE(). Error. Inconsistent dimension between Y_set and Y_set0. Return nothing.")
return
for i in range(n):
Y = Y_set[i]
Y0 = Y0_set[i]
Nout1 = len(Y.tolist())
Nout2 = len(Y0.tolist())
if not (Nout1==Nout2):
print("MSE(). Error. Inconsistent dimension between Y and Y0. Return nothing.")
return
# now compute the norms between true and model output values
for j in range(Nout1):
mse = mse + (np.abs(Y0.item(j)-Y.item(j)))**2
mse = mse / (2*n)

return mse

def RK45_wrapper(x0, time_checkpoints, f, stepsize=None):
# example
#
#
#
# def f(t,y):
# 	# dx/dt = f(x,t)
# 	return np.array([-1*y[0]])
# x0 = np.array([1.0]) # initial value
# time_checkpoints = np.linspace(0,10,11)
# x_set, t_set, x_checkpoint, t_checkpoint_new = RK45_wrapper(x0, time_checkpoints, f, stepsize=None)

# t_checkpt_diff = []
# for t,t_new in zip(time_checkpoints, t_checkpoint_new):
# 	t_checkpt_diff.append(abs(t-t_new))
# print("max checkpt diff = ", np.max(t_checkpt_diff))

if stepsize is None:
stepsize = 0.01*(time_checkpoints[-1]-time_checkpoints[0])
t_set = [time_checkpoints[0]]
x_set = [x0]
t_checkpoint_new = [time_checkpoints[0]]
x_checkpoint = [x0]

for i in range(len(time_checkpoints)-1):
t1 = time_checkpoints[i+1]
if i>0:
x0 = ox.y
t0 = ox.t

else:
t0 = time_checkpoints[i]
ox = RK45(f,t0,x0,t1,max_step=stepsize)

while ox.t < t1:
msg = ox.step()
t_set.append(ox.t)
x_set.append(ox.y)
x_checkpoint.append(ox.y)
t_checkpoint_new.append(ox.t)
return x_set, t_set, x_checkpoint, t_checkpoint_new

def RK45_output_to_list(xx):
return [np.matrix((x.tolist())[0]) for x in xx]

## Convolutional Neural Network with VGG Architecture

This post seeks to illustrate the difference between Convolutional Neural Network (CNN) and deep neural network (DNN) and hopes to add a little bit more clarity in the CNN process.

Figure 1. Comparison between a regular deep neural network with the convolutional neural network.

CNN is a variant of DNN with the constraint that the input is an image, or image-like. More technically,

1. In a DNN, every layer is fully connected to the layer before. This means every neuron in layer n+1 is affected by every neuron in layer n via linear combination (with bias, and then activation).
2. For CNN, such as in VGG architecture, only a few layers near the end are fully connected. Using the receptive field, a single neuron in layer n+1 is connected to some small number of neurons in the previous layer, corresponding to a small region in the visual space.

The above is illustrated in figure 1. In DNN, each neuron is fully connected to the image. With a large number of neurons, there will be a large number of weights to compute, not to mention that there will be a lot more weights between neurons from one layer to the next. In CNN, on the other hand, each neuron will be “in charge of” a small region in the image.

You might have seen the illustration for VGG architecture like figure 2 (I took the images from here; do visit the original sources of the image). VGG is an implementation of CNN by the Visual Geometry Group, Oxford (official link here). Figure 3 illustrates the process of convolution in the first layer, while figure 4 illustrates the process through fully connected layers. In essence, both are performing linear sums weighted by filters in figure 3 and the usual weights in figure 4 (like DNN) respectively.

Figure 2. VGG architecture.

Figure 3. Illustration for convolutional layer.

Figure 4. Illustration for fully connected layers.