Convolutional Neural Network with keras: MNIST

home > Machine Learning

In this post we use a Convolutional Neural Network with a VGG-like convnet structure for the MNIST problem, i.e. we train the model to recognize hand-written digits. We mainly follow the official keras guide, in this link.

Download MNIST file that has been converted into CSV form; I got it from this link.

The jupyter notebook detailing the attempt in this post is found here by the name keras_test2.ipynb.

Starting with the conclusion: it works pretty well; with a very quick training run, the model can recognize hand-written digits with 98% accuracy.

As shown below, our input is 28 by 28 with 1 channel (1 color), since each hand-written digit is stored in a 28 by 28-pixel greyscale image. The layers used are

  1. 2x 2D convolutional layers with 32 3-by-3 filters each, followed by max pooling over each 2-by-2 block of pixels. A dropout layer is then used to prevent over-fitting.
  2. 2x 2D convolutional layers with 64 3-by-3 filters each, followed by max pooling over each 2-by-2 block of pixels and another dropout layer.
  3. A Flatten layer, which simply reshapes the 2D image-like output from the previous layer into a 1D list of values. The first dense layer has 256 neurons, followed by a dropout layer and finally a dense layer of 10 neurons corresponding to the 10 classes, i.e. the 10 different digits in MNIST. All activation functions are ReLU except the last one, which is softmax, as usual.

See the link here for how the data is prepared for training (i.e. the missing code shown as … partial code… below).
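For orientation only, here is a minimal sketch of what that preparation could look like. This is an assumption, not the notebook's actual code; the CSV file name, the header row, and the column layout (label first, then 784 pixel values) are guesses.

# Hypothetical sketch of the elided preparation step (names and file layout assumed):
# load the CSV, normalize pixel values, reshape to (28, 28, 1), and one-hot encode labels.
import numpy as np
from keras.utils import to_categorical

raw = np.loadtxt("mnist_train.csv", delimiter=",", skiprows=1)   # assumed: first row is a header
y_train = to_categorical(raw[:, 0].astype(int), num_classes=10)  # first column holds the label 0..9
x_train = (raw[:, 1:] / 255.0).reshape(-1, 28, 28, 1)            # remaining 784 columns are pixels
# x_test and y_test would be prepared the same way from the test CSV.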

# ... partial code ...

model = Sequential()
# input: 28x28 images with 1 channels -> (28 ,28, 1) tensors.
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(28,28,1)))
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))

sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])

model.fit(x_train, y_train, batch_size=16, epochs=10)
model.evaluate(x_test, y_test, batch_size=32)

For such a quick training run, this model obtains a very high accuracy of 0.98, as shown below.

Epoch 1/10
6400/6400 [==============================] - 6s 904us/step - loss: 0.8208 - acc: 0.7206
Epoch 2/10
6400/6400 [==============================] - 2s 379us/step - loss: 0.2427 - acc: 0.9266
Epoch 3/10
6400/6400 [==============================] - 2s 379us/step - loss: 0.1702 - acc: 0.9483
Epoch 4/10
6400/6400 [==============================] - 2s 380us/step - loss: 0.1353 - acc: 0.9589
Epoch 5/10
6400/6400 [==============================] - 2s 373us/step - loss: 0.1117 - acc: 0.9650
Epoch 6/10
6400/6400 [==============================] - 2s 379us/step - loss: 0.1080 - acc: 0.9697
Epoch 7/10
6400/6400 [==============================] - 2s 374us/step - loss: 0.0881 - acc: 0.9734
Epoch 8/10
6400/6400 [==============================] - 2s 375us/step - loss: 0.0880 - acc: 0.9736
Epoch 9/10
6400/6400 [==============================] - 2s 377us/step - loss: 0.0690 - acc: 0.9766
Epoch 10/10
6400/6400 [==============================] - 2s 373us/step - loss: 0.0686 - acc: 0.9800
100/100 [==============================] - 0s 940us/step

 

Neural Network with keras: Remainder Problem

home > Machine Learning

The problem we try to solve here is the remainder problem: we train our neural network to find the remainder of a number, randomly drawn from 0 to 99 inclusive, when it is divided by 17. For example, given 20, the remainder is 3.

The code (in a Jupyter notebook) detailing the results of this post can be found here by the name keras_test1.ipynb. In all the tests, we use only 1 hidden layer made of 64 neurons, and different input and output layers to take into account the context of the problem. With the context taken into account, we show that we can help the neural network model train better!

Test 1A and Test 1B

Note: See the corresponding sections in the Jupyter notebook.

We start with a much simpler problem. Draw a random number from 0 to 10 inclusive and find its remainder when divided by 10, which is quite trivial. From test 1A, with 4 epochs, we see a steady improvement in prediction accuracy up to 82%. With 12 epochs in test 1B, our accuracy is approximately 100%. Good!

Test 2A and Test 2B

Now we raise the hurdle: we draw from a wider range of random numbers, from 0 to 99 inclusive. To be fair, we give the neural network more data points for training. We get a pretty bad outcome; the trained model in test 2A suffers from the problem of predicting only 1 outcome (it always predicts that the remainder is 0). In test 2B, we perform the same training but over more epochs. The problem still occurs. (A sketch of the single-input model used in these tests is shown below.)
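For reference, a minimal sketch of the single-input model used in tests 1 and 2 might look like the following. This is illustrative only; the actual code is in the linked notebook, and everything except the 64-neuron hidden layer is an assumption.

# Sketch of the "raw number in, remainder out" model of tests 2A/2B (illustrative only).
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(units=64, activation='relu', input_dim=1))  # a single input: the raw number 0..99
model.add(Dense(units=17, activation='softmax'))            # 17 classes: remainders 0..16
model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])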

Test 3A

Now we solve the problem in test 2A and 2B by contextualizing the problem. Notice that in test 1A, 1B, 2A and 2B, there is only 1 input (i.e. 1 neuron in the input layer) which exactly corresponds to the random number whose remainder is to be computed.

Now, in this test, we convert it into 2 inputs by splitting the units and tens digits. For example, if the number is 64, the input to our neural network is now (6,4). If the number is 5, then it becomes (0,5). This is done using the extract_digit() function. The possible “concept” that the neural network can learn is the fact that for division by 10, only the last digit matters. That is to say, if our input is (a,b) after the conversion, then only b matters.

What do we get? 100% accuracy! All is good.

Test 3B

Finally, we raise the complexity and solve our original problem. We draw from 0 to 99 inclusive and find the remainder from division by 17. We use the extract_digit() function here as well. Running it over 24 epochs, we get an accuracy of 96% (and it does look like it could be improved further)!

Conclusion? First things first, this is just a demonstration of a neural network using keras. But more importantly, contextualizing the input does help!

The code for Test3B can be found in the following.

[1]

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

[2]

N = 100
D = 17

def simple_binarizer17(y, bin_factor=1, bin_shift=0):
    # One-hot encode the remainder y (0..16) into a length-17 vector.
    out = [0+bin_shift]*17
    out[y] = 1*bin_factor
    return out

def extract_digit(x):
    # Split x into its tens and units digits, e.g. 64 -> [6, 4], 5 -> [0, 5].
    b = x%10
    a = (x-b)/10
    return [int(a),int(b)]

X0_train = np.random.randint(N,size=(256000,1))   # integers drawn from 0 to 99 inclusive
Y_train = np.array([simple_binarizer17(x%D) for x in np.transpose(X0_train).tolist()[0]])  # one-hot remainders mod 17
X0_test = np.random.randint(N,size=(100,1))
Y_test = np.array([simple_binarizer17(x%D) for x in np.transpose(X0_test).tolist()[0]])

X_train = np.array([extract_digit(X[0]) for X in X0_train])
X_test = np.array([extract_digit(X[0]) for X in X0_test])
for X0,X in zip(X0_train[:10],X_train[:10]):
    print(X0,"->",X)

[3]

model = Sequential()

model.add(Dense(units=64, activation='relu', input_dim=2))
model.add(Dense(units=17, activation='softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])
model.fit(X_train, Y_train, epochs=24, batch_size=32)

loss_and_metrics = model.evaluate(X_test, Y_test, batch_size=10)
print("--LOSS and METRIC--")
print(loss_and_metrics)
print("--PREDICT--")
classes = model.predict(X_test, batch_size=16)

[4]

count = 0
correct_count = 0
for y0,y in zip(Y_test,classes):    
    count = count+1    
    correct_pred = False
    if np.argmax(y0)==np.argmax(y):
        correct_pred = True
        correct_count = correct_count + 1
    if count<20:
        print(np.argmax(y0),"->",np.argmax(y), "(",correct_pred,")")
accuracy = correct_count/len(Y_test)
print("accuracy = ", accuracy)

MNIST Neural Network test 1

home > Machine Learning

To test MNIST using kero 0.6.3, I will use jupyter notebook in a virtual environment. Also, in this folder, place adhoc_utils.py containing the function read_csv() from here. I will use virtualenv as usual: see here. Then after activating the virtual environment, simply:

pip install jupyter
pip install kero
pip install matplotlib
pip install opencv-python
jupyter notebook

Download the MNIST file that has been converted into CSV form; I got it from this link. Now, create the python notebook mnist_dnn.ipynb (see below) and run all the cells. You can find this test run and similar test runs here.

Unfortunately, it appears that the trained models only predict one single output for any input (in one of the attempts, the model predicts 6 for every image, which is bad). Several possible issues and remarks include the following.

  1. There might be defective data points. Update: not likely; it is easy to check this with a tested machine learning algorithm. I tried using keras on the same data here; training and prediction were successful.
  2. Different loss functions may be more suitable; check out, for example, KL divergence. Update: there is certainly more than meets the eye here. See a tutorial from Stanford here. MSE, which is an L2-type loss, appears to be harder to optimize; a cross-entropy loss is usually preferred for classification (see the short sketch after this list).
  3. This example uses no softmax layer at the end; in fact, using the default Neural Network from kero, the final layer is activated using the same activation function as the other layers (in this example, the sigmoid function). The maximum value at the output layer is taken as the predicted output.
  4. DNNs are often treated like a black box; nobody quite knows what happens throughout the process in a coherent manner. In fact, it could simply be that the randomly initialized weights were not chosen in the right range before training. This might be interesting to study in the future (hopefully the experts come up with new insights soon).
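To make point 2 concrete, here is a small numpy illustration (not kero code; the numbers are made up) of how an L2-style MSE loss and a cross-entropy loss react to a confidently wrong prediction: the cross-entropy loss stays large, while the MSE stays bounded.

import numpy as np

y_true = np.array([1.0, 0.0, 0.0])         # one-hot target: the true class is 0
y_pred = np.array([0.02, 0.90, 0.08])      # confidently wrong predicted probabilities

mse = 0.5 * np.sum((y_pred - y_true) ** 2)        # L2-style loss, bounded even when very wrong
cross_entropy = -np.sum(y_true * np.log(y_pred))  # cross-entropy, grows quickly when wrong

print("MSE           =", mse)              # about 0.89
print("cross-entropy =", cross_entropy)    # about 3.91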

All the above said, the small modifications I made (before adding softmax) include initializing all biases to zero instead of randomly, and allowing for options to generate random weights in a normalized manner (depending on the number of neurons), as sketched below. I might change the interface a little, but in any case it seems like there is more work to do! That’s all for now, happy new year!
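As an illustration of that idea only (this is not kero's implementation; the function name and arguments here are made up), fan-in-dependent normalization could look like this:

import numpy as np

def init_layer(n_in, n_out, lower=0.0, upper=1.0, layerwise_normalization=True):
    # Hypothetical sketch: uniform random weights, optionally scaled by the fan-in
    # so that pre-activation values do not grow with the width of the previous layer.
    w = np.random.uniform(lower, upper, size=(n_out, n_in))
    if layerwise_normalization:
        w = w / np.sqrt(n_in)
    b = np.zeros((n_out, 1))  # biases start at zero, as described above
    return w, b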

mnist_dnn.ipynb

[1]

import numpy as np
import adhoc_utils as Aut
import matplotlib.pyplot as plt
import cv2, time

import kero.multib.NeuralNetwork as nn
import kero.utils.utils as ut

[2]

# Loading MNIST image data from csv files. 
# Also, binarize labels.
#
#

def simple_binarizer(mnist_label, bin_factor=1, bin_shift=0):
    # mnist_label: int, 0,1,2... or 9
    out = [0+bin_shift]*10
    out[mnist_label] = 1*bin_factor
    return np.transpose(np.matrix(out))
def convert_list_of_string_to_float(this_list):
    out = []
    for i in range(len(this_list)):
        out.append(float(this_list[i]))
    return out

bin_shift = 0
bin_factor = 1
img_width, img_height = 28, 28
pixel_normalizing_factor = 255
# read_csv returns list of list.
# good news is, the loaded data is already flattened.
mnist_train =  Aut.read_csv("mnist_train", header_skip=1,get_the_first_N_rows = 6400)
mnist_train_labels_binarized = [simple_binarizer(int(x[0]),bin_factor=bin_factor,bin_shift=bin_shift) for x in mnist_train]
mnist_train_data = [1/pixel_normalizing_factor*np.transpose(np.matrix(convert_list_of_string_to_float(x[1:]))) for x in mnist_train]
# 

# Print the first few binarized labels as a sanity check.
#
for i in range(5):
    print(mnist_train[i][0] ,":",ut.numpy_matrix_to_list(mnist_train_labels_binarized[i]))

[3]

# Uncomment this to see the flattened image profile
#
# temp = mnist_train_data[0]
# print("max = ", np.max(temp))
# print("min = ", np.min(temp))
# mean_val = np.mean(temp)
# print("mean = ", mean_val)
# fig0 = plt.figure()
# ax0 = fig0.add_subplot(111)
# ax0.plot(range(len(temp)),temp)
# ax0.plot(range(len(temp)),[mean_val]*len(temp))

[4]

# To visualize the loaded data, uncomment and run this section.
#
# 

# mnist_train_labels = [x[0] for x in mnist_train]
# mnist_train_data_image_form = [np.array(x[1:]).reshape(img_height,img_width).astype(np.uint8) for x in mnist_train]

# data_length = len(mnist_train_data)
# for i in range(10):
#     if i < data_length:
#         print(mnist_train_data_image_form[i].shape,end=",")

# #  
# count=0
# title_set = []
# for label,img_data in zip(mnist_train_labels,mnist_train_data_image_form):
#     title = "count: "+str(count)+"| label: "+str(label)
#     title_set.append(title)
#     cv2.imshow(title, img_data)
#     cv2.resizeWindow(title, 300,300)
#     count = count + 1
#     if count == 5:
#         break
# cv2.waitKey(0)
# for title in title_set:
#     cv2.destroyWindow(title)

[5]

# input_set: list of numpy matrix [x], 
#   where each x is a column vector m by 1, m the size of input layer.
# Y0_set: list of numpy matrix [Y0],
#   where each Y0 is a column vector N by 1, N the size of output layer.
#   This is equal to 10, since it corresponds to labels 0,1,...,9.
#
#

input_set = mnist_train_data
Y0_set = mnist_train_labels_binarized
number_of_neurons = [784,28,10]
lower_bound, upper_bound = 0 ,1
bounds = [lower_bound, upper_bound]
bulk = {
    "number_of_neurons" : number_of_neurons,
    "bounds": bounds,
    "layerwise_normalization": True,
}

NeuralNetwork = nn.NeuralNetwork()
NeuralNetwork.learning_rate = 1
NeuralNetwork.initiate_neural_network(bulk, mode="UniformRandom",
    verbose = False,
    verbose_init_mode=False,
    verbose_consistency=False)

nu = nn.NetworkUpdater()
nu.set_settings(method="RegularStochastic",
method_specific_settings={
        "batch_size":8,
        "no_of_epoch":32,
        "shuffle_batch":True,
})
nu.set_training_data(input_set,Y0_set)
nu.set_neural_network(NeuralNetwork)

[6]

AF = nn.activationFunction(func = "Sigmoid")
start = time.time()
weights_next, biases_next, mse_list = nu.update_wb(input_set, Y0_set, 
                NeuralNetwork.weights, NeuralNetwork.biases, AF,
                mse_mode="compute_and_print", verbose=11)
end = time.time()
elapsed = end - start

[7]

print("epoch | mse value ")
mark = 1
for i in range(len(mse_list)):
    if mark >= 0.1*len(mse_list) or i==0:
        print(" + epoch {",i ,"} ", mse_list[i])
        mark = 1
    else:
        mark = mark + 1

fig = plt.figure()
ax1 = fig.add_subplot(211)
plt.plot(range(len(mse_list)), mse_list)

[8]

print("time taken [s] = ", elapsed)
print("time taken [min] = ", elapsed/60)
print("time taken [hr] = ", elapsed/3600)
print("time taken at 10k x [s] = ", elapsed*1e4)
print("time taken at 10k x [min] = ", elapsed*1e4/(60))
print("time taken at 10k x [hr] = ", elapsed*1e4/(3600))
print("time taken at 10k x [day] = ", elapsed*1e4/(3600*24))

[9]

no_of_images_to_test=500
mnist_test =  Aut.read_csv("mnist_test", header_skip=1,get_the_first_N_rows = no_of_images_to_test)
mnist_test_labels = [int(x[0]) for x in mnist_test]
mnist_test_data = [1/pixel_normalizing_factor*np.transpose(np.matrix(convert_list_of_string_to_float(x[1:]))) for x in mnist_test]

hit_list = []
predict_list = []
predict_val = []
for i in range(no_of_images_to_test):
    a_1 = mnist_test_data[i]
    test_a_l_set, _ = nu.feed_forward(weights_next, biases_next, a_1, AF,
        verbose=False,
        matrix_formatting="%6.2f")
    Y_after = test_a_l_set[-1]
    predicted_label = int(np.argmax(Y_after))
    actual_label= mnist_test_labels[i]
    # print(Y_after)
#     print("predicted vs actual = ", predicted_label,"/",actual_label)
    predict_list.append(predicted_label)
    predict_val.append(Y_after)
    if actual_label==predicted_label:
        hit_list.append(1)
    else:
        hit_list.append(0)
print("predict list = ")
print(predict_list)
print("predict values = ")
for i in range(10):
#     print(ut.numpy_matrix_to_list(predict_val[i]))
    ut.print_numpy_matrix(np.transpose(predict_val[i]),formatting="%9.6f",no_of_space=20)
print("hit list = ")
print(hit_list)
print("percentage correct = ", 100* np.sum(hit_list)/len(hit_list))

 

kero version 0.6.3

update_wb_regular_stochastic()

home > kero > Documentation

Perform regular stochastic mode of gradient descent. See Data Science, Review.

kero.multib.NeuralNetwork.py

class NetworkUpdater:
  def update_wb_regular_stochastic(self, input_set, Y0_set,
		weights, biases, AF,		
		verbose=False,
		verbose_feed_forward=False,
		verbose_compute_diff=False,
		verbose_delta_L=False,
		verbose_compute_delta=False)
    return weights_next, biases_next

Arguments/Return

input_set list of numpy matrix [x]. Each x is a column vector m x 1, where m is the number of neurons in the input layer.
Y0_set list of numpy matrix [Y0]. Each Y0 is a column vector n x 1, where n is the number of neurons in layer l=L. These are the true/observed values in the output layer corresponding to the input set. In other words, for each k=1,…,N, Y0_set[k] = f(x[k]), where f is the true function that our neural network is modelling and N is the number of data points.
weights The collection of weights in the neural network.

weights is a list [w_l], where w_l is the collection of weights between the (l-1)-th and l-th layers, for l=2,3,…,L, where l=1 is the input layer, l=2 the first hidden layer and l=L the output layer.

w_l is a matrix (list of list) so that w_l[i][j] is the weight between neuron j at layer l-1 and neuron i at layer l.

biases the collection of biases in the neural network.

biases is a list [b_l], where b_l is the collection of biases in the l-th layer for l=2,3,…,L

AF AF (activationFunction). Assume it is initiated.
verbose
verbose_feed_forward
verbose_compute_diff
verbose_delta_L
verbose_compute_delta
Bool False or integer

The larger the integer, the more information is printed. Set them to suitable integers for debugging.

 

All default=False

return weights_next Same as weights, but has undergone 1 gradient descent iteration.
return biases_next Same as biases, but has undergone 1 gradient descent iteration.
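To make the list layout concrete, here is a small hand-built illustration (not kero output) for layer sizes [3, 3, 2, 2], which reproduces the shapes printed under “weights and biases” in the example output below:

import numpy as np

layer_sizes = [3, 3, 2, 2]   # l = 1 (input), 2, 3, 4 (output)
# weights = [w_2, w_3, w_4]; w_l has shape (neurons in layer l, neurons in layer l-1)
weights = [np.random.rand(m, n) for n, m in zip(layer_sizes[:-1], layer_sizes[1:])]
# biases = [b_2, b_3, b_4]; b_l has shape (neurons in layer l, 1)
biases = [np.random.rand(m, 1) for m in layer_sizes[1:]]
for w, b in zip(weights, biases):
    print(w.shape, "|", b.shape)   # (3, 3) | (3, 1), then (2, 3) | (2, 1), then (2, 2) | (2, 1)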

 

Example Usage 1

Note that in order to use the following script, we need the prep() function, which is available in Deep Learning and Neural Network with kero PART 1.

testNNupdater2B.py

import testNNupdater2aux as taux
import kero.multib.NeuralNetwork as nn
import kero.utils.utils as ut
import numpy as np
import time

print("--- test2B ---")

# input_set : list of numpy matrix. 
# Y_set : list of numpy matrix. Output computed by NN
# Y0_set : list of numpy matrix. True/observed output
#  the grand objective is to train NN so that Y_set is equal to Y0_set 
# -------------------------------------------
# this is a collection of a_l_set and z_l_set over all data points
#   z_l_set is the collection of z values over all layers, l=2,3,...L
#   and a_l_set is the corresponding activated values
#   Recall: a_l_set and z_l_set each is a list of numpy matrices 

out = taux.prep(print_data=False)
input_set=out["input_set"]
Y_set=out["Y_set"]
Y0_set=out["Y0_set"]
collection_of_fed_forward_a_l=out["collection_of_fed_forward_a_l"]
collection_of_fed_forward_z_l=out["collection_of_fed_forward_z_l"]
weights=out["weights"]
biases=out["biases"]
NeuralNetwork=out["NeuralNetwork"]

a_L_set = Y_set
# weights = [w_2,w_3,w_4]
# biases = [b_2,b_3,b_4]


nu = nn.NetworkUpdater()
nu.set_settings(method="RegularStochastic",
		method_specific_settings={
		"batch_size":4,
		"no_of_epoch":1,
		"shuffle_batch":True,
		})
nu.set_training_data(input_set,Y0_set)
nu.set_neural_network(NeuralNetwork) 

L = len(weights) + 1
n = len(input_set)
AF = nn.activationFunction(func = "Sigmoid")
start = time.time()
weights_next, biases_next = nu.update_wb_regular_stochastic(input_set, Y0_set,
		weights, biases, AF,
		verbose=31,
		verbose_feed_forward=False,
		verbose_compute_diff=False,
		verbose_delta_L=False,
		verbose_compute_delta=False)

end = time.time()
elapsed = end - start

print("weights and biases:")
for W, B in zip(weights_next, biases_next):
	print(" > ", np.matrix(W).shape," | ", np.matrix(B).shape)


print("n (no of data points) = ",n)
print("time taken [s] = ", elapsed)
print("time taken at 10 k steps [s] = ", elapsed*1e4)
print("time taken at 10 k steps [min] = ", elapsed*1e4/(60))
print("time taken at 10 k steps [hr] = ", elapsed*1e4/(3600))
print("time taken at 10 k steps [day] = ", elapsed*1e4/(3600*24))

The output is the following. At a high verbose level, this function also prints each layer and its corresponding index in the list (for detailed debugging, if you are interested in tweaking the source code). We can see that the dimensions of the weight matrices and biases are indeed correct, as shown under “weights and biases”. The time taken in seconds, minutes, hours and days is also computed as if the process were repeated 10,000 times, just a rough estimate for scaling up the use of this function.

---aux---
Initializing a Neural Network object.
--- test2B ---
Initializing a Neural Network object.
 -+ update_wb_regular_stochastic().
    Layer (Output):  4  || i =  0 / 2
    Layer:  3  || i =  1 / 2
    Layer:  2  || i =  2 / 2
weights and biases:
 >  (3, 3)  |  (3, 1)
 >  (2, 3)  |  (2, 1)
 >  (2, 2)  |  (2, 1)
n (no of data points) =  24
time taken [s] =  0.015958786010742188
time taken at 10 k steps [s] =  159.58786010742188
time taken at 10 k steps [min] =  2.6597976684570312
time taken at 10 k steps [hr] =  0.04432996114095052
time taken at 10 k steps [day] =  0.0018470817142062718

kero version: 0.6.2

compute_delta_l_per_data_point()

home > kero > Documentation

Compute the value of \delta^l shown in Neural Network and Back Propagation.

kero.multib.NeuralNetwork.py

class NetworkUpdater:
  def compute_delta_l_per_data_point(self, w_l_plus_1, delta_l_plus_1, z_l, AF,
      verbose=False,
      print_format="%6.8f"):
    return delta_l

Arguments/Return

w_l_plus_1 numpy matrix. Matrix of size m x n, where m and n are the number of neurons in the (l+1)-th and l-th layers respectively. In the neural network, these are the weights between layer l and layer l+1.
delta_l_plus_1 numpy matrix. delta value from layer l+1. We are back-propagating using this function.
z_l numpy matrix. Vector of size m x 1, where m is the number of neurons in layer l. In the neural network this is the values at layer l before activation function.
AF AF (activationFunction). Assume it is initiated.
verbose Bool False or integer

The larger the integer, the more information is printed. Set them to suitable integers for debugging.

Default=False

print_format String. Format for printing numpy matrices when verbose is beyond some value.

Default=”%6.8f”

return delta_l numpy matrix. Vector of size m x 1 where m is the number of neurons in layer l.
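In the usual back-propagation notation (with \odot denoting the element-wise product), the returned value corresponds to

\delta^l = \left( (w^{l+1})^T \, \delta^{l+1} \right) \odot \sigma^\prime (z^l)

where \sigma^\prime is the derivative held by AF (property afp).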

Example Usage 1

See compute_delta_L_per_data_point().

kero version: 0.6.2

compute_delta_L_per_data_point()

home > kero > Documentation

Compute the value of \delta^L shown in Neural Network and Back Propagation.

kero.multib.NeuralNetwork.py

class NetworkUpdater:
  def compute_delta_L_per_data_point(self, z_L, Y0, Y_set, Y0_set, AF,
      verbose=False,
      print_format="%6.8f"):
    return delta_L

Arguments/Return

z_L numpy matrix. Vector of size m x 1, where m is the number of neurons in the output layer. In the neural network this is the values at the output layer before activation function.
Y0 numpy matrix. Observed/true output data. Vector of size m x 1, where m is the number of neurons in the output layer.
Y_set List of numpy matrix, [Y]. Each numpy matrix Y is a column vector. In the neural network, these are the values at the output layer predicted by the network.
Y0_set List of numpy matrix, [Y0]. Each numpy matrix Y0 is a column vector. There should be as many Y in Y_set as there are Y0 in Y0_set.
AF AF (activationFunction). Assume it is initiated.
verbose Bool False or integer

The larger the integer, the more information is printed. Set them to suitable integers for debugging.

Default=False

print_format String. Format for printing numpy matrices when verbose is beyond some value.

Default=”%6.8f”

return delta_L numpy matrix. Vector of size m x 1 where m is the number of neurons in layer L or output layer.
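Assuming the MSE cost documented under MSE() further below (this is an interpretation of the argument list, not a quotation of the source), the returned value corresponds to the standard output-layer term

\delta^L = \nabla \mathrm{MSE}(Y_set, Y0_set) \odot \sigma^\prime (z^L)

with \odot the element-wise product; the gradient term is the same quantity computed by nabla_MSE() in the MSE() example.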

Example Usage 1

Note that in order to use the following script, we need the prep() function, which is available in Deep Learning and Neural Network with kero PART 1.

testNNupdater2A.py

import testNNupdater2aux as taux
import kero.multib.NeuralNetwork as nn
import kero.utils.utils as ut
import numpy as np

print("--- test2A ---")

# input_set : list of numpy matrix. 
# Y_set : list of numpy matrix. Output computed by NN
# Y0_set : list of numpy matrix. True/observed output
#  the grand objective is to train NN so that Y_set is equal to Y0_set 
# -------------------------------------------
# this is a collection of a_l_set and z_l_set over all data points
#   z_l_set is the collection of z values over all layers, l=2,3,...L
#   and a_l_set is the corresponding activated values
#   Recall: a_l_set and z_l_set each is a list of numpy matrices 

out = taux.prep(print_data=False)
input_set=out["input_set"]
Y_set=out["Y_set"]
Y0_set=out["Y0_set"]
collection_of_fed_forward_a_l=out["collection_of_fed_forward_a_l"]
collection_of_fed_forward_z_l=out["collection_of_fed_forward_z_l"]
weights=out["weights"]
biases=out["biases"]
NeuralNetwork=out["NeuralNetwork"]

a_L_set = Y_set
# weights = [w_2,w_3,w_4]
# biases = [b_2,b_3,b_4]


nu = nn.NetworkUpdater()
AF = nn.activationFunction(func = "Sigmoid")
for i in range(len(input_set)): # for each data point
	z_l_set = collection_of_fed_forward_z_l[i]

	print("data point ", i," : ")
	# print("  z_L:")
	# ut.print_numpy_matrix(z_L,formatting="%6.2f",no_of_space=10)
	# print("  Y0:")
	# ut.print_numpy_matrix(Y0,formatting="%6.2f",no_of_space=10)
	
	# l = L = 4
	z_L = z_l_set[-1 - 0]
	Y0 = Y0_set[i]
	delta_L = nu.compute_delta_L_per_data_point(z_L, Y0, a_L_set, Y0_set, AF,
			verbose=11,
			print_format="%6.8f")

	# l = 3
	delta_l_plus_1 = delta_L
	w_l_plus_1 = weights[- 1 -0] # w_4
	z_l = z_l_set[-1 -1]
	delta_l = nu.compute_delta_l_per_data_point(w_l_plus_1, delta_l_plus_1, z_l, AF,
		verbose=11,
		print_format="%6.8f")

	# l = 2
	delta_l_plus_1 = delta_l
	w_l_plus_1 = weights[- 1 -1] # w_3
	z_l = z_l_set[-1 -2]
	delta_l = nu.compute_delta_l_per_data_point(w_l_plus_1, delta_l_plus_1, z_l, AF,
		verbose=11,
		print_format="%6.8f")
	# print("------------------------------------------")

We have prepared 24 data points using prep(), and the following shows the output for the first 2 data points.

---aux---
Initializing a Neural Network object.
--- test2A ---
Initializing a Neural Network object.
data point  0  :
 -+ compute_delta_L_per_data_point()
    delta_L =
          0.11825302
          0.12284915
 -+ compute_differential_term_at_l().
    np.array(w_l_plus_1).shape =  (2, 2)
    delta_l =
          0.00598236
          0.00598236
 -+ compute_differential_term_at_l().
    np.array(w_l_plus_1).shape =  (2, 3)
    delta_l =
          0.00029088
          0.00025446
          0.00029721
data point  1  :
 -+ compute_delta_L_per_data_point()
    delta_L =
          0.11825325
          0.12284961
 -+ compute_differential_term_at_l().
    np.array(w_l_plus_1).shape =  (2, 2)
    delta_l =
          0.00598275
          0.00598275
 -+ compute_differential_term_at_l().
    np.array(w_l_plus_1).shape =  (2, 3)
    delta_l =
          0.00029146
          0.00025704
          0.00029734
...

kero version: 0.6.3

select_activation_function()

home > kero > Documentation

kero.multib.NeuralNetwork.py

class activationFunction:
  def select_activation_function(self, func):
    return

Arguments/Return

func String.

If func==”Sigmoid”
Property af of activationFunction set to \sigma(x)=\frac{1}{1+e^{-x}}
Property afp of activationFunction set to \sigma^\prime (x) = \sigma(x) \times (1-\sigma(x))

add more variety of activation functions here

For any other (unlisted) value of func, the activation function is automatically set to “Sigmoid”.
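As a rough numpy sketch of what the Sigmoid pair amounts to (an illustration, not the kero source):

import numpy as np

# Illustrative only: the activation function and its derivative for func == "Sigmoid".
af  = lambda x: 1.0 / (1.0 + np.exp(-x))   # sigma(x)
afp = lambda x: af(x) * (1.0 - af(x))      # sigma'(x) = sigma(x) * (1 - sigma(x))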

Example Usage 1

See deep learning with kero part 1 and part 2.

kero version: 0.6.2 

activationFunction

home > kero > Documentation

This class of object is designed to contain a variety of activation functions.

kero.multib.NeuralNetwork.py

class activationFunction:
  def __init__(self, func = "Sigmoid"):
    self.af
    self.afp
    return
  def select_activation_function(self, func):
    return
Properties Description
af lambda function. The activation function.
afp lambda function. The derivative of activation function.

kero version: 0.6.2 

MSE()

home > kero > Documentation

MSE(Y,Y^0)=\frac{1}{2n}\sum_{i=1}^{n} \left\| Y_i-Y^0_i \right\|^2, where n is the number of data points (the length of Y_set) and each Y_i, Y^0_i is a column vector.

kero.multib.NeuralNetwork.py

def MSE(Y_set,Y0_set):
  return mse

Arguments/Return

Y_set List of numpy matrix, [Y]. Each numpy matrix Y is a column vector.
Y0_set List of numpy matrix, [Y0]. Each numpy matrix Y0 is a column vector. There should be as many Y in Y_set as there are Y0 in Y0_set.
mse The mean squared error computed by the formula above.

Example Usage 1

testMSE.py

Y0_set= [
  [1,2,3],
  [4,5,6]
]

Y_set = [[1.1*x for x in y] for y in Y0_set]
print("Y0_set=",Y0_set)
print("Y_set=",Y_set)

import numpy as np
# convert to list of numpy matrix
Y0_set = [np.transpose(np.matrix(x)) for x in Y0_set]
Y_set = [np.transpose(np.matrix(x)) for x in Y_set]

import kero.multib.NeuralNetwork as nn

mse=nn.MSE(Y_set,Y0_set)
print("MSE test:\n  mse=",mse)

# manual computation
mse_= 0.01*(1**2+2**2+3**2+4**2+5**2+6**2)
mse_=mse_/(2*2)
print("  compare: ",mse_)

import kero.utils.utils as ut
print("nabla MSE test:\n  nabla mse = ")

nabla_mse = nn.nabla_MSE(Y_set,Y0_set)
ut.print_numpy_matrix(nabla_mse,formatting="%6.2f",no_of_space=5)
nabla_mse_ = 1/2* ((Y_set[0]-Y0_set[0])+(Y_set[1]-Y0_set[1]))
print("  compare: ")
ut.print_numpy_matrix(nabla_mse_,formatting="%6.2f",no_of_space=5)

This will print

Y0_set= [[1, 2, 3], [4, 5, 6]]
Y_set= [[1.1, 2.2, 3.3000000000000003], [4.4, 5.5, 6.6000000000000005]]
MSE test:
  mse= 0.22750000000000029
  compare:  0.2275
nabla MSE test:
  nabla mse =
       0.25
       0.35
       0.45
  compare:
       0.25
       0.35
       0.45

kero version: 0.6.2