The review paper *A Survey on Explainable Artificial Intelligence (XAI): Towards Medical XAI* is up on arXiv! Enjoy~

# Category: Data Science

Here is the repository for pre-processing of some medical imaging data.

## 1. Attention-Gated-Networks_auxiliary

When working with Attention-Gated-Networks from https://github.com/ozan-oktay/Attention-Gated-Networks, you might want to run the algorithm on the original datasets. One of them is PANCREAS CT-82, which can be found at https://wiki.cancerimagingarchive.net/display/Public/Pancreas-CT.

The .dcm files store 2D slices. This code combines (and normalizes) the slices of each patient into a 3D volume. Each patient is coded as PANCREAS_0001, PANCREAS_0002, etc.
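The idea of combining slices can be sketched with numpy alone. This is only an illustration under assumptions: the slices are taken to be already loaded as 2D arrays together with a position along the scan axis (reading the .dcm files is not shown), and min-max scaling to [0, 1] is an assumed normalization, not necessarily the one used in the repository. The name `stack_slices` is hypothetical.

```python
import numpy as np

def stack_slices(slices, positions):
    """Stack 2D slices into a 3D volume ordered by position, then min-max normalize."""
    order = np.argsort(positions)  # sort slices along the scan axis
    volume = np.stack(
        [np.asarray(slices[i], dtype=np.float32) for i in order], axis=0
    )
    lo, hi = volume.min(), volume.max()
    if hi > lo:
        volume = (volume - lo) / (hi - lo)  # assumed normalization: scale to [0, 1]
    else:
        volume = np.zeros_like(volume)      # constant volume: avoid division by zero
    return volume

# Synthetic stand-in for one patient: three 4x4 slices given out of order.
rng = np.random.default_rng(0)
slices = [rng.integers(-1000, 1000, size=(4, 4)) for _ in range(3)]
positions = [2.5, 0.0, 1.25]
volume = stack_slices(slices, positions)
print(volume.shape)
```

The first axis of the result indexes the slices in sorted order, so `volume[0]` comes from the slice with the smallest position.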

## 2. Multiv

While working on medical images, for example in NIfTI format, we might face memory problems. This is because a NIfTI volume might be large, for example 192x192x19 with many modalities. With a large convolutional neural network, feeding the entire volume may result in an out-of-memory error (at least with my measly 4GB RAM it does). Multi-view sampling is a way out of this. Using multi-view sampling, slices of the images (green rectangles) are sampled. The “multi” part of multi-view can take the form of a larger slice (red rectangles).
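A minimal sketch of the sampling idea, assuming the volume is stored as (slice, height, width) and that both views are square patches taken from the same slice around the same center. The function name `sample_multiview` and the patch sizes 9 and 19 are hypothetical choices for illustration; the repository's actual sampling scheme may differ.

```python
import numpy as np

def sample_multiview(volume, center, small=(9, 9), large=(19, 19)):
    """Return a small and a larger 2D patch from one slice, both centered at (z, y, x).

    The slice is zero-padded so that patches near the border keep their full size.
    """
    z, y, x = center
    pad_y, pad_x = large[0] // 2, large[1] // 2
    padded = np.pad(volume[z], ((pad_y, pad_y), (pad_x, pad_x)), mode="constant")
    views = []
    for h, w in (small, large):
        top = y + pad_y - h // 2
        left = x + pad_x - w // 2
        views.append(padded[top:top + h, left:left + w])
    return views

# Synthetic volume with the 192x192x19 size mentioned above (slice axis first).
volume = np.random.default_rng(0).random((19, 192, 192)).astype(np.float32)
small_view, large_view = sample_multiview(volume, center=(5, 100, 50))
print(small_view.shape, large_view.shape)
```

Only the two small patches are fed to the network at a time, so memory use depends on the patch sizes rather than on the size of the whole volume.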

The following link is the extended summary for the machine learning course. Below a preview is shown.

In this post we use a Convolutional Neural Network with a VGG-like convnet structure for the MNIST problem, i.e. we train the model to recognize hand-written digits. We mainly follow the official keras guide, in this link.

Download MNIST file that has been converted into CSV form; I got it from this link.

The jupyter notebook detailing the attempt in this post is found **here** by the name **keras_test2.ipynb**.

**Starting with a conclusion**: it works pretty well. After a very quick training, the model can recognize hand-written digits with 98% accuracy.

As shown below, our input is 28 by 28 with 1 channel (1 color), since the hand-written digit is stored in a 28 by 28-pixel greyscale image. The layers used are

- 2x 2D convolutional layers with 32x 3 by 3 filters, followed by max pooling for each 2 by 2 block of pixels. Then a dropout layer is used; this is to prevent over-fitting.
- 2x 2D convolutional layers with 64x 3 by 3 filters, followed by max pooling for each 2 by 2 block of pixels. Then a dropout layer is used.
- A flatten layer just reshapes the 2D image-like output from the previous layer into a 1D list of values. The first dense layer has 256 neurons, followed by a dropout layer and finally a dense layer of 10 neurons corresponding to the 10 classes, i.e. the 10 different digits in MNIST. All activation functions are ReLU except the last one, softmax, as usual.
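To see what the flatten layer receives, the spatial size can be traced through the layers listed above. This sketch assumes the keras defaults for these layers (no padding and stride 1 for the convolutions, non-overlapping 2 by 2 pooling); the helper names are made up for the illustration.

```python
def conv_out(size, kernel=3):
    # 'valid' convolution with stride 1 shrinks each side by kernel - 1
    return size - kernel + 1

def pool_out(size, pool=2):
    # non-overlapping max pooling divides each side by the pool size
    return size // pool

size = 28
size = conv_out(conv_out(size))  # two 3x3 convolutions: 28 -> 26 -> 24
size = pool_out(size)            # 2x2 max pooling: 24 -> 12
size = conv_out(conv_out(size))  # two 3x3 convolutions: 12 -> 10 -> 8
size = pool_out(size)            # 2x2 max pooling: 8 -> 4
flattened = size * size * 64     # 64 filters in the last convolutional layer
print(size, flattened)           # 4 1024
```

So the first dense layer of 256 neurons is fed a vector of 1024 values.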

See the link here on how the data is prepared for training (i.e. the missing code shown as … partial code… below).

    from keras.models import Sequential
    from keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D
    from keras.optimizers import SGD

    # ... partial code ...

    model = Sequential()
    # input: 28x28 images with 1 channel -> (28, 28, 1) tensors.
    model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(28,28,1)))
    model.add(Conv2D(32, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))

    model.add(Conv2D(64, (3, 3), activation='relu'))
    model.add(Conv2D(64, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))

    model.add(Flatten())
    model.add(Dense(256, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(10, activation='softmax'))

    sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
    model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])

    model.fit(x_train, y_train, batch_size=16, epochs=10)
    model.evaluate(x_test, y_test, batch_size=32)

For a quick training, this model obtains a very high accuracy of 0.98, as shown below.

    Epoch 1/10
    6400/6400 [==============================] - 6s 904us/step - loss: 0.8208 - acc: 0.7206
    Epoch 2/10
    6400/6400 [==============================] - 2s 379us/step - loss: 0.2427 - acc: 0.9266
    Epoch 3/10
    6400/6400 [==============================] - 2s 379us/step - loss: 0.1702 - acc: 0.9483
    Epoch 4/10
    6400/6400 [==============================] - 2s 380us/step - loss: 0.1353 - acc: 0.9589
    Epoch 5/10
    6400/6400 [==============================] - 2s 373us/step - loss: 0.1117 - acc: 0.9650
    Epoch 6/10
    6400/6400 [==============================] - 2s 379us/step - loss: 0.1080 - acc: 0.9697
    Epoch 7/10
    6400/6400 [==============================] - 2s 374us/step - loss: 0.0881 - acc: 0.9734
    Epoch 8/10
    6400/6400 [==============================] - 2s 375us/step - loss: 0.0880 - acc: 0.9736
    Epoch 9/10
    6400/6400 [==============================] - 2s 377us/step - loss: 0.0690 - acc: 0.9766
    Epoch 10/10
    6400/6400 [==============================] - 2s 373us/step - loss: 0.0686 - acc: 0.9800
    100/100 [==============================] - 0s 940us/step

The problem we try to solve here is the **remainder problem**. We train our neural network to find the remainder of a number randomly drawn from 0 to 99 inclusive when it is divided by 17. For example, given 20, the remainder is 3.

The code (in Jupyter notebook) detailing the results of this post can be found **here** by the name **keras_test1.ipynb**. In all the tests, we use only 1 hidden layer made of 64 neurons, with different input and output layers to take into account the context of the problem. With the context taken into account, we show that we can help the neural network model train better!

**Test 1A and Test 1B**

*Note: See the corresponding sections in the Jupyter notebook.*

We start with a much simpler problem. Draw a random number from 0 to 10 inclusive and find its remainder when divided by 10, which is quite trivial. From **test 1A**, with 4 epochs, we see a steady improvement in prediction accuracy up to 82%. With 12 epochs in **test 1B**, our accuracy is approximately 100%. Good!

**Test 2A and Test 2B**

Now, we raise the hurdle. We draw from a wider range of random numbers, from 0 to 99 inclusive. To be fair, we give the neural network more data points for training. We get a pretty bad outcome; the trained model in **test 2A** suffers from the problem of predicting only 1 outcome (it always predicts that the remainder is 0). In **test 2B**, we perform the same training, but for more epochs. The problem still occurs.

**Test 3A**

Now we solve the problem in test 2A and 2B by contextualizing the problem. Notice that in test 1A, 1B, 2A and 2B, there is only 1 input (i.e. 1 neuron in the input layer) which exactly corresponds to the random number whose remainder is to be computed.

Now, in this test, we convert it into 2 inputs, splitting the tens and units digits. For example, if the number is 64, the input to our neural network is now (6,4). If the number is 5, then it becomes (0,5). This is done using the **extract_digit()** function. The possible “concept” that the neural network can learn is the fact that for division by 10, only the last digit matters. That is to say, if our input is (a,b) after the conversion, then only b matters.
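The digit split can be checked in a few lines. The sketch below is an equivalent rewrite of extract_digit() using `divmod`, not the notebook's exact code, and it verifies that the remainder modulo 10 is always the second input b.

```python
def extract_digit(x):
    # split a number in 0..99 into its tens digit a and units digit b
    a, b = divmod(x, 10)
    return [a, b]

for x in (64, 5):
    a, b = extract_digit(x)
    print(x, "->", (a, b), "| remainder mod 10 =", x % 10)

# For every number in range, the remainder mod 10 equals b alone.
only_b_matters = all(x % 10 == extract_digit(x)[1] for x in range(100))
print(only_b_matters)  # True
```

For division by 17 both digits matter, since x = 10a + b, but the two-digit input still gives the network a more structured view of the number.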

What do we get? 100% accuracy! All is good.

**Test 3B**

Finally, we raise the complexity and solve our original problem. We draw from 0 to 99 inclusive, and find the remainder from division by 17. We use the **extract_digit()** function here as well. Running it over 24 epochs, we get an accuracy of 96% (and it does look like it can be improved)!

**Conclusion**? First things first, this is just a demonstration of a neural network using keras. But more importantly, contextualizing the input does help!

The code for Test 3B follows.

[1]

    import numpy as np
    from keras.models import Sequential
    from keras.layers import Dense

[2]

    N = 100
    D = 17

    def simple_binarizer17(y, bin_factor=1, bin_shift=0):
        # one-hot encode a remainder y in 0..16
        out = [0+bin_shift]*17
        out[y] = 1*bin_factor
        return out

    def extract_digit(x):
        # split x into [tens digit, units digit]
        b = x%10
        a = (x-b)/10
        return [int(a),int(b)]

    # note: np.random.randint(N+1) draws from 0 to N inclusive, i.e. 0 to 100 here
    X0_train = np.random.randint(N+1,size=(256000,1))
    Y_train = np.array([simple_binarizer17(x%D) for x in np.transpose(X0_train).tolist()[0]])
    X0_test = np.random.randint(N+1,size=(100,1))
    Y_test = np.array([simple_binarizer17(x%D) for x in np.transpose(X0_test).tolist()[0]])

    X_train = np.array([extract_digit(X[0]) for X in X0_train])
    X_test = np.array([extract_digit(X[0]) for X in X0_test])
    for X0,X in zip(X0_train[:10],X_train[:10]):
        print(X0,"->",X)

[3]

    model = Sequential()

    model.add(Dense(units=64, activation='relu', input_dim=2))
    model.add(Dense(units=17, activation='softmax'))
    model.compile(loss='categorical_crossentropy',
                  optimizer='sgd',
                  metrics=['accuracy'])
    model.fit(X_train, Y_train, epochs=24, batch_size=32)

    loss_and_metrics = model.evaluate(X_test, Y_test, batch_size=10)
    print("--LOSS and METRIC--")
    print(loss_and_metrics)
    print("--PREDICT--")
    classes = model.predict(X_test, batch_size=16)

[4]

    count = 0
    correct_count = 0
    for y0,y in zip(Y_test,classes):
        count = count+1
        correct_pred = False
        if np.argmax(y0)==np.argmax(y):
            correct_pred = True
            correct_count = correct_count + 1
        if count<20:
            print(np.argmax(y0),"->",np.argmax(y), "(",correct_pred,")")
    accuracy = correct_count/len(Y_test)
    print("accuracy = ", accuracy)

To test MNIST using kero 0.6.3, I will use jupyter notebook in a virtual environment. Also, in this folder, place **adhoc_utils.py** containing the function read_csv() from here. I will use virtualenv as usual: see here. Then after activating the virtual environment, simply:

    pip install jupyter
    pip install kero
    pip install matplotlib
    pip install opencv-python
    jupyter notebook

Download MNIST file that has been converted into CSV form; I got it from this link. Now, create the python notebook mnist_dnn.ipynb (see below) and run all the cells. You can find this test run and similar test runs here.

Unfortunately, it appears that the trained models only predict one single output for any input (it predicts only 6 for any image in one of the attempts, which is bad). Several possible issues and remarks include the following.

- There might be defective data points.
  *Update*: not likely; it is easy to check this with a tested machine learning algorithm. I tried using keras on the same data here; training and prediction have been successful.
- Different loss functions are more suitable; check out, for example, KL divergence.
  *Update*: this is certainly more than meets the eye. See a tutorial from Stanford here. Using MSE, which is an L2 loss, appears to be harder to optimize here. A cross-entropy loss is the more usual choice for classification.
- This example uses no softmax layer at the end; in fact, using the default Neural Network from kero, the final layer is activated using the same activation function (in this example, the sigmoid function) as the other layers. The maximum value at the output layer is taken as the predicted output.
- DNN has been treated like a black box; nobody quite knows what happens throughout the process in a coherent manner. In fact, it could just be that the randomly initialized weights before training were not chosen in the right range. This might be interesting to study in the future (hopefully the experts come out with new insights soon).
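To make the remark on loss functions a little more concrete, here is a small numpy sketch comparing MSE and cross-entropy on a softmax output that is confidently wrong. It is a generic illustration and does not use or reproduce kero; the logits are made up.

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def cross_entropy(p, target):
    # small constant guards against log(0)
    return -float(np.sum(target * np.log(p + 1e-12)))

def mse(p, target):
    return float(np.mean((p - target) ** 2))

target = np.zeros(10)
target[6] = 1.0   # true label: 6
logits = np.zeros(10)
logits[3] = 5.0   # made-up network output that strongly favors 3

p = softmax(logits)
print("predicted label:", int(np.argmax(p)))
print("cross-entropy:", cross_entropy(p, target))
print("MSE:", mse(p, target))
```

The cross-entropy grows without bound as the probability assigned to the true class approaches zero, whereas the MSE over probabilities stays below 1, so a confident mistake is penalized much more strongly by cross-entropy.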

All the above said, the little modification I did (before softmax) includes initializing all biases to zero instead of randomly, and allowing options to generate random weights in a normalized manner (depending on the number of neurons). I might change the interface a little, but in any case, it seems like there is more work to do! That’s all for now, happy new year!
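As an illustration of what normalized weights could mean, the sketch below draws uniform weights and scales them by the square root of the number of incoming neurons, with zero biases. This is an assumed scheme for illustration only; it is not kero's actual formula, and `init_layer` is a hypothetical helper.

```python
import numpy as np

def init_layer(n_in, n_out, rng, normalize=True):
    """Uniform random weights, optionally scaled by 1/sqrt(n_in); biases start at zero."""
    w = rng.uniform(-1.0, 1.0, size=(n_out, n_in))
    if normalize:
        w = w / np.sqrt(n_in)  # assumed normalization, keeps weighted sums moderate
    b = np.zeros((n_out, 1))
    return w, b

rng = np.random.default_rng(0)
layers = [784, 28, 10]
params = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(layers[:-1], layers[1:])]
for w, b in params:
    print(w.shape, b.shape, float(np.abs(w).max()))
```

With 784 inputs, unscaled weights can produce very large weighted sums, which pushes a sigmoid into its flat region; this may be one reason a network ends up predicting the same output for every input.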

**mnist_dnn.ipynb**

[1]

    import numpy as np
    import adhoc_utils as Aut
    import matplotlib.pyplot as plt
    import cv2, time

    import kero.multib.NeuralNetwork as nn
    import kero.utils.utils as ut

[2]

    # Loading MNIST image data from csv files.
    # Also, binarize labels.
    #
    #

    def simple_binarizer(mnist_label, bin_factor=1, bin_shift=0):
        # mnist_label: int, 0,1,2... or 9
        out = [0+bin_shift]*10
        out[mnist_label] = 1*bin_factor
        return np.transpose(np.matrix(out))

    def convert_list_of_string_to_float(this_list):
        out = []
        for i in range(len(this_list)):
            out.append(float(this_list[i]))
        return out

    bin_shift = 0
    bin_factor = 1
    img_width, img_height = 28, 28
    pixel_normalizing_factor = 255
    # read_csv returns list of list.
    # good news is, the loaded data is already flattened.
    mnist_train = Aut.read_csv("mnist_train", header_skip=1,get_the_first_N_rows = 6400)
    mnist_train_labels_binarized = [simple_binarizer(int(x[0]),bin_factor=bin_factor,bin_shift=bin_shift) for x in mnist_train]
    mnist_train_data = [1/pixel_normalizing_factor*np.transpose(np.matrix(convert_list_of_string_to_float(x[1:]))) for x in mnist_train]

    #
    # Uncomment this to print the binarized labels
    # for i in range(5): print(mnist_train[i][0] ,":",ut.numpy_matrix_to_list(mnist_train_labels_binarized[i]))

[3]

    # Uncomment this to see the flattened image profile
    #
    # temp = mnist_train_data[0]
    # print("max = ", np.max(temp))
    # print("min = ", np.min(temp))
    # mean_val = np.mean(temp)
    # print("mean = ", mean_val)
    # fig0 = plt.figure()
    # ax0 = fig0.add_subplot(111)
    # ax0.plot(range(len(temp)),temp)
    # ax0.plot(range(len(temp)),[mean_val]*len(temp))

[4]

    # To visualize the loaded data, uncomment and run this section.
    #
    #
    # mnist_train_labels = [x[0] for x in mnist_train]
    # mnist_train_data_image_form = [np.array(x[1:]).reshape(img_height,img_width).astype(np.uint8) for x in mnist_train]
    # data_length = len(mnist_train_data)
    # for i in range(10):
    #     if i < data_length:
    #         print(mnist_train_data_image_form[i].shape,end=",")
    #
    #
    # count=0
    # title_set = []
    # for label,img_data in zip(mnist_train_labels,mnist_train_data_image_form):
    #     title = "count: "+str(count)+"| label: "+str(label)
    #     title_set.append(title)
    #     cv2.imshow(title, img_data)
    #     cv2.resizeWindow(title, 300,300)
    #     count = count + 1
    #     if count == 5:
    #         break
    # cv2.waitKey(0)
    # for title in title_set:
    #     cv2.destroyWindow(title)

[5]

    # input_set: list of numpy matrix [x],
    #   where each x is a column vector m by 1, m the size of input layer.
    # Y0_set: list of numpy matrix [Y0],
    #   where each Y0 is a column vector N by 1, N the size of output layer.
    #   This is equal to 10, since it corresponds to labels 0,1,...,9.
    #
    #

    input_set = mnist_train_data
    Y0_set = mnist_train_labels_binarized
    number_of_neurons = [784,28,10]
    lower_bound, upper_bound = 0 ,1
    bounds = [lower_bound, upper_bound]
    bulk = {
        "number_of_neurons" : number_of_neurons,
        "bounds": bounds,
        "layerwise_normalization": True,
    }

    NeuralNetwork = nn.NeuralNetwork()
    NeuralNetwork.learning_rate = 1
    NeuralNetwork.initiate_neural_network(bulk, mode="UniformRandom",
        verbose = False,
        verbose_init_mode=False,
        verbose_consistency=False)

    nu = nn.NetworkUpdater()
    nu.set_settings(method="RegularStochastic",
        method_specific_settings={
            "batch_size":8,
            "no_of_epoch":32,
            "shuffle_batch":True,
        })
    nu.set_training_data(input_set,Y0_set)
    nu.set_neural_network(NeuralNetwork)

[6]

    AF = nn.activationFunction(func = "Sigmoid")
    start = time.time()
    weights_next, biases_next, mse_list = nu.update_wb(input_set, Y0_set,
        NeuralNetwork.weights, NeuralNetwork.biases, AF,
        mse_mode="compute_and_print", verbose=11)
    end = time.time()
    elapsed = end - start

[7]

    print("epoch | mse value ")
    mark = 1
    for i in range(len(mse_list)):
        if mark >= 0.1*len(mse_list) or i==0:
            print(" + epoch {",i ,"} ", mse_list[i])
            mark = 1
        else:
            mark = mark + 1

    fig = plt.figure()
    ax1 = fig.add_subplot(211)
    plt.plot(range(len(mse_list)), mse_list)

[8]

    print("time taken [s] = ", elapsed)
    print("time taken [min] = ", elapsed/60)
    print("time taken [hr] = ", elapsed/3600)
    print("time taken at 10k x [s] = ", elapsed*1e4)
    print("time taken at 10k x [min] = ", elapsed*1e4/(60))
    print("time taken at 10k x [hr] = ", elapsed*1e4/(3600))
    print("time taken at 10k x [day] = ", elapsed*1e4/(3600*24))

[9]

    no_of_images_to_test=500
    mnist_test = Aut.read_csv("mnist_test", header_skip=1,get_the_first_N_rows = no_of_images_to_test)
    mnist_test_labels = [int(x[0]) for x in mnist_test]
    mnist_test_data = [1/pixel_normalizing_factor*np.transpose(np.matrix(convert_list_of_string_to_float(x[1:]))) for x in mnist_test]

    hit_list = []
    predict_list = []
    predict_val = []
    for i in range(no_of_images_to_test):
        a_1 = mnist_test_data[i]
        test_a_l_set, _ = nu.feed_forward(weights_next, biases_next, a_1, AF,
            verbose=False,
            matrix_formatting="%6.2f")
        Y_after = test_a_l_set[-1]
        predicted_label = int(np.argmax(Y_after))
        actual_label= mnist_test_labels[i]
        # print(Y_after)
        # print("predicted vs actual = ", predicted_label,"/",actual_label)
        predict_list.append(predicted_label)
        predict_val.append(Y_after)
        if actual_label==predicted_label:
            hit_list.append(1)
        else:
            hit_list.append(0)

    print("predict list = ")
    print(predict_list)
    print("predict values = ")
    for i in range(10):
        # print(ut.numpy_matrix_to_list(predict_val[i]))
        ut.print_numpy_matrix(np.transpose(predict_val[i]),formatting="%9.6f",no_of_space=20)
    print("hit list = ")
    print(hit_list)
    print("percentage correct = ", 100* np.sum(hit_list)/len(hit_list))

*kero version 0.6.3*

This post seeks to illustrate the difference between Convolutional Neural Network (CNN) and deep neural network (DNN) and hopes to add a little bit more clarity in the CNN process.

Figure 1. Comparison between a regular deep neural network with the convolutional neural network.

CNN is a variant of DNN with the constraint that the input is an image, or image-like. More technically,

- In a DNN, every layer is fully connected to the layer before. This means every neuron in layer n+1 is affected by every neuron in layer n via linear combination (with bias, and then activation).
- For CNN, such as in VGG architecture, only a few layers near the end are fully connected. Using the receptive field, a single neuron in layer n+1 is connected to some small number of neurons in the previous layer, corresponding to a small region in the visual space.

The above is illustrated in figure 1. In DNN, each neuron is fully connected to the image. With a large number of neurons, there will be a large number of weights to compute, not to mention that there will be a lot more weights between neurons from one layer to the next. In CNN, on the other hand, each neuron will be “in charge of” a small region in the image.
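The difference in the number of weights can be made concrete with a quick count. The layer sizes below (a dense layer of 256 neurons on a flattened 28 by 28 image, and a convolutional layer of 32 filters of size 3 by 3 on a single channel) are chosen only for illustration.

```python
def dense_params(n_in, n_out):
    # every output neuron has one weight per input, plus one bias
    return n_in * n_out + n_out

def conv_params(kernel, channels_in, filters):
    # every filter has kernel*kernel weights per input channel, plus one bias
    return kernel * kernel * channels_in * filters + filters

print(dense_params(28 * 28, 256))  # 200960
print(conv_params(3, 1, 32))       # 320
```

The convolutional layer reuses the same small filters across the whole image, which is why its parameter count does not depend on the image size.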

You might have seen the illustration for VGG architecture like figure 2 (I took the images from here; do visit the original sources of the image). VGG is an implementation of CNN by the Visual Geometry Group, Oxford (official link here). Figure 3 illustrates the process of convolution in the first layer, while figure 4 illustrates the process through fully connected layers. In essence, both are performing linear sums weighted by filters in figure 3 and the usual weights in figure 4 (like DNN) respectively.
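Both operations can be written as weighted sums in a few lines of numpy. This is a bare sketch without bias or activation; the 4 by 4 image, the averaging filter and the dense weights are made up for the illustration, and the loop follows the sliding-window convention (no filter flipping) that CNN libraries usually call convolution.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the filter over the image; each output is a weighted sum over one small region."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)

# Convolutional layer: one 3x3 averaging filter, each output sees a 3x3 region.
kernel = np.ones((3, 3)) / 9.0
conv_result = conv2d_valid(image, kernel)

# Fully connected layer: 2 neurons, each sees all 16 pixels.
weights = np.ones((2, 16)) / 16.0
dense_result = weights @ image.reshape(-1)

print(conv_result.shape, dense_result.shape)
```

The only structural difference is which inputs each output is allowed to see and whether the weights are shared across positions.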

Figure 2. VGG architecture.

Figure 3. Illustration for convolutional layer.

Figure 4. Illustration for fully connected layers.


Perform regular stochastic mode of gradient descent. See Data Science, Review.

    kero.multib.NeuralNetwork.py

    class NetworkUpdater:
        def update_wb_regular_stochastic(self, input_set, Y0_set,
                weights, biases, AF,
                verbose=False,
                verbose_feed_forward=False,
                verbose_compute_diff=False,
                verbose_delta_L=False,
                verbose_compute_delta=False):
            return weights_next, biases_next

**Arguments/Return**

| Name | Description |
| --- | --- |
| input_set | list of numpy matrix [x]. Each x is a column vector m x 1, where m is the number of neurons in the input layer. |
| Y0_set | list of numpy matrix [Y0]. Each Y0 is n x 1, where n is the number of neurons in layer l=L. The true/observed values in the output layer corresponding to the input set. In other words, for each k=1,…,N, Y0_set[k] = f(x[k]) where f is the true function that our neural network is modelling and N is the number of data points. |
| weights | The collection of weights in the neural network. weights is a list [w_l], where w_l is the collection of weights between the (l-1)-th and l-th layer, for l=2,3,…,L, where l=1 is the input layer, l=2 is the first hidden layer and l=L is the output layer. w_l is a matrix (list of list) so that w_l[i][j] is the weight between neuron j at layer l-1 and neuron i at layer l. |
| biases | The collection of biases in the neural network. biases is a list [b_l], where b_l is the collection of biases in the l-th layer for l=2,3,…,L. |
| AF | AF (activationFunction). Assume it is initiated. |
| verbose, verbose_feed_forward, verbose_compute_diff, verbose_delta_L, verbose_compute_delta | Bool False or integer. The larger the integer, the more information is printed. Set them to suitable integers for debugging. All default=False. |
| return weights_next | Same as weights, but has undergone 1 gradient descent iteration. |
| return biases_next | Same as biases, but has undergone 1 gradient descent iteration. |
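For readers unfamiliar with the stochastic mode, the sketch below shows the general pattern of shuffling the data, splitting it into batches and applying one gradient descent update per batch. To keep it short it trains a linear model under mean squared error rather than a neural network, so it is not kero's implementation; the arguments `batch_size` and `shuffle` only mirror the "batch_size" and "shuffle_batch" settings by analogy, and the data are synthetic.

```python
import numpy as np

def minibatch_indices(n, batch_size, shuffle, rng):
    """Yield index arrays that together cover all n data points once."""
    order = rng.permutation(n) if shuffle else np.arange(n)
    for start in range(0, n, batch_size):
        yield order[start:start + batch_size]

def sgd_epoch(w, b, X, Y, lr, batch_size, shuffle, rng):
    """One epoch: one gradient descent update of (w, b) per mini-batch."""
    for idx in minibatch_indices(len(X), batch_size, shuffle, rng):
        err = X[idx] @ w + b - Y[idx]             # prediction error on this batch
        w = w - lr * (X[idx].T @ err) / len(idx)  # gradient of 0.5 * mean(err**2) w.r.t. w
        b = b - lr * err.mean()                   # gradient w.r.t. b
    return w, b

rng = np.random.default_rng(0)
X = rng.normal(size=(24, 2))            # 24 synthetic data points
Y = X @ np.array([1.5, -2.0]) + 0.5     # noiseless linear target
w, b = np.zeros(2), 0.0
for epoch in range(200):
    w, b = sgd_epoch(w, b, X, Y, lr=0.1, batch_size=4, shuffle=True, rng=rng)
print(w, b)
```

In a neural network the per-batch gradients come from back propagation instead of the closed-form expressions used here, but the loop over shuffled batches has the same shape.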

**Example Usage 1**

Note that in order to use the following script, we need prep() function which is available in Deep Learning and Neural Network with kero PART 1.

testNNupdater2B.py

    import testNNupdater2aux as taux
    import kero.multib.NeuralNetwork as nn
    import kero.utils.utils as ut
    import numpy as np
    import time

    print("--- test2B ---")

    # input_set : list of numpy matrix.
    # Y_set : list of numpy matrix. Output computed by NN
    # Y0_set : list of numpy matrix. True/observed output
    #   the grand objective is to train NN so that Y_set is equal to Y0_set
    # -------------------------------------------
    # this is a collection of a_l_set and z_l_set over all data points
    #   z_l_set is the collection of z values over all layers, l=2,3,...L
    #   and a_l_set is the corresponding activated values
    #   Recall: a_l_set and z_l_set each is a list of numpy matrices

    out = taux.prep(print_data=False)
    input_set=out["input_set"]
    Y_set=out["Y_set"]
    Y0_set=out["Y0_set"]
    collection_of_fed_forward_a_l=out["collection_of_fed_forward_a_l"]
    collection_of_fed_forward_z_l=out["collection_of_fed_forward_z_l"]
    weights=out["weights"]
    biases=out["biases"]
    NeuralNetwork=out["NeuralNetwork"]

    a_L_set = Y_set
    # weights = [w_2,w_3,w_4]
    # biases = [b_2,b_3,b_4]

    nu = nn.NetworkUpdater()
    nu.set_settings(method="RegularStochastic",
        method_specific_settings={
            "batch_size":4,
            "no_of_epoch":1,
            "shuffle_batch":True,
        })
    nu.set_training_data(input_set,Y0_set)
    nu.set_neural_network(NeuralNetwork)

    L = len(weights) + 1
    n = len(input_set)
    AF = nn.activationFunction(func = "Sigmoid")
    start = time.time()
    weights_next, biases_next = nu.update_wb_regular_stochastic(input_set, Y0_set,
        weights, biases, AF,
        verbose=31,
        verbose_feed_forward=False,
        verbose_compute_diff=False,
        verbose_delta_L=False,
        verbose_compute_delta=False)
    end = time.time()
    elapsed = end - start

    print("weights and biases:")
    for W, B in zip(weights_next, biases_next):
        print(" > ", np.matrix(W).shape," | ", np.matrix(B).shape)

    print("n (no of data points) = ",n)
    print("time taken [s] = ", elapsed)
    print("time taken at 10 k steps [s] = ", elapsed*1e4)
    print("time taken at 10 k steps [min] = ", elapsed*1e4/(60))
    print("time taken at 10 k steps [hr] = ", elapsed*1e4/(3600))
    print("time taken at 10 k steps [day] = ", elapsed*1e4/(3600*24))

The output is the following. At a high verbose level, this function also prints each layer and its corresponding index in the list (for detailed debugging, if you are interested in tweaking the source code). The dimensions of the weight matrices and biases are indeed correct, as shown under “weights and biases”. The time taken in seconds, minutes, hours and days is also extrapolated to 10,000 repetitions of the process, as a rough estimate for scaling up the use of this function.

    ---aux---
    Initializing a Neural Network object.
    --- test2B ---
    Initializing a Neural Network object.
    -+ update_wb_regular_stochastic().
    Layer (Output): 4 || i = 0 / 2
    Layer: 3 || i = 1 / 2
    Layer: 2 || i = 2 / 2
    weights and biases:
     > (3, 3) | (3, 1)
     > (2, 3) | (2, 1)
     > (2, 2) | (2, 1)
    n (no of data points) = 24
    time taken [s] = 0.015958786010742188
    time taken at 10 k steps [s] = 159.58786010742188
    time taken at 10 k steps [min] = 2.6597976684570312
    time taken at 10 k steps [hr] = 0.04432996114095052
    time taken at 10 k steps [day] = 0.0018470817142062718

*kero version: 0.6.2*

Compute the value of delta_l (the delta term at layer l) shown in Neural Network and Back Propagation.

    kero.multib.NeuralNetwork.py

    class NetworkUpdater:
        def compute_delta_l_per_data_point(self, w_l_plus_1, delta_l_plus_1, z_l, AF,
                verbose=False,
                print_format="%6.8f"):
            return delta_l

**Arguments/Return**

| Name | Description |
| --- | --- |
| w_l_plus_1 | numpy matrix. Matrix of size m x n, where m and n are the number of neurons in the (l+1)-th and l-th layers respectively. In the neural network, this is the weights between layer l and layer l+1. |
| delta_l_plus_1 | numpy matrix. delta value from layer l+1. We are back-propagating using this function. |
| z_l | numpy matrix. Vector of size m x 1, where m is the number of neurons in layer l. In the neural network this is the values at layer l before activation function. |
| AF | AF (activationFunction). Assume it is initiated. |
| verbose | Bool False or integer. The larger the integer, the more information is printed. Set them to suitable integers for debugging. Default=False |
| print_format | String. Format for printing numpy matrices when verbose is beyond some value. Default=”%6.8f” |
| return delta_l | numpy matrix. Vector of size m x 1 where m is the number of neurons in layer l. |
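In standard back propagation, the delta at layer l is obtained from the delta at layer l+1 by multiplying with the transposed weights and then, element-wise, with the derivative of the activation function at z_l. The sketch below shows this textbook step with plain numpy arrays and a sigmoid activation; it is an illustration under those assumptions, not kero's source code, and the numbers are made up.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def delta_l(w_l_plus_1, delta_l_plus_1, z_l):
    # (w_{l+1})^T delta_{l+1}, then element-wise product with sigma'(z_l)
    return (w_l_plus_1.T @ delta_l_plus_1) * sigmoid_prime(z_l)

# 2 neurons in layer l+1 and 3 neurons in layer l, so w_{l+1} is 2 x 3.
w = np.array([[0.1, 0.2, 0.3],
              [0.4, 0.5, 0.6]])
d_next = np.array([[0.5], [-1.0]])  # delta at layer l+1, a 2 x 1 column vector
z = np.zeros((3, 1))                # pre-activation values at layer l
d = delta_l(w, d_next, z)
print(d.shape)
```

The shapes match the table above: a weight matrix of size m x n and a delta of size m x 1 give a delta of size n x 1 for layer l.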

**Example Usage 1**

See compute_delta_L_per_data_point().

*kero version: 0.6.2*

Compute the value of delta_L (the delta term at the output layer L) shown in Neural Network and Back Propagation.

    kero.multib.NeuralNetwork.py

    class NetworkUpdater:
        def compute_delta_L_per_data_point(self, z_L, Y0, Y_set, Y0_set, AF,
                verbose=False,
                print_format="%6.8f"):
            return delta_L

**Arguments/Return**

| Name | Description |
| --- | --- |
| z_L | numpy matrix. Vector of size m x 1, where m is the number of neurons in the output layer. In the neural network this is the values at the output layer before activation function. |
| Y0 | numpy matrix. Observed/true output data. Vector of size m x 1, where m is the number of neurons in the output layer. |
| Y_set | List of numpy matrix, [Y]. Each numpy matrix Y is a column vector. In a neural network, this is the values at the output layer predicted by the neural network. |
| Y0_set | List of numpy matrix, [Y0]. Each numpy matrix Y0 is a column vector. There should be as many Y in Y_set as Y0 in Y0_set. |
| AF | AF (activationFunction). Assume it is initiated. |
| verbose | Bool False or integer. The larger the integer, the more information is printed. Set them to suitable integers for debugging. Default=False |
| print_format | String. Format for printing numpy matrices when verbose is beyond some value. Default=”%6.8f” |
| return delta_L | numpy matrix. Vector of size m x 1 where m is the number of neurons in layer L or output layer. |
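For orientation, the textbook output-layer delta for a squared-error cost on a single data point is the difference between the activated output and the true value, multiplied element-wise by the derivative of the activation at z_L. The sketch below assumes that per-point cost and a sigmoid activation. Since kero's function also takes Y_set and Y0_set, its cost may involve the whole data set, which is not reproduced here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def delta_L(z_L, Y0):
    # assumed cost: 0.5 * ||sigmoid(z_L) - Y0||^2 for one data point
    return (sigmoid(z_L) - Y0) * sigmoid_prime(z_L)

z_L = np.zeros((2, 1))         # pre-activation values at the output layer
Y0 = np.array([[1.0], [0.0]])  # true output for this data point
d_L = delta_L(z_L, Y0)
print(d_L)
```

The sign of each entry tells which way the corresponding output should move: negative where the prediction is below the true value, positive where it is above.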

**Example Usage 1**

Note that in order to use the following script, we need prep() function which is available in Deep Learning and Neural Network with kero PART 1.

testNNupdater2A.py

    import testNNupdater2aux as taux
    import kero.multib.NeuralNetwork as nn
    import kero.utils.utils as ut
    import numpy as np

    print("--- test2A ---")

    # input_set : list of numpy matrix.
    # Y_set : list of numpy matrix. Output computed by NN
    # Y0_set : list of numpy matrix. True/observed output
    #   the grand objective is to train NN so that Y_set is equal to Y0_set
    # -------------------------------------------
    # this is a collection of a_l_set and z_l_set over all data points
    #   z_l_set is the collection of z values over all layers, l=2,3,...L
    #   and a_l_set is the corresponding activated values
    #   Recall: a_l_set and z_l_set each is a list of numpy matrices

    out = taux.prep(print_data=False)
    input_set=out["input_set"]
    Y_set=out["Y_set"]
    Y0_set=out["Y0_set"]
    collection_of_fed_forward_a_l=out["collection_of_fed_forward_a_l"]
    collection_of_fed_forward_z_l=out["collection_of_fed_forward_z_l"]
    weights=out["weights"]
    biases=out["biases"]
    NeuralNetwork=out["NeuralNetwork"]

    a_L_set = Y_set
    # weights = [w_2,w_3,w_4]
    # biases = [b_2,b_3,b_4]

    nu = nn.NetworkUpdater()
    AF = nn.activationFunction(func = "Sigmoid")
    for i in range(len(input_set)): # for each data point
        z_l_set = collection_of_fed_forward_z_l[i]

        print("data point ", i," : ")
        # print(" z_L:")
        # ut.print_numpy_matrix(z_L,formatting="%6.2f",no_of_space=10)
        # print(" Y0:")
        # ut.print_numpy_matrix(Y0,formatting="%6.2f",no_of_space=10)

        # l = L = 4
        z_L = z_l_set[-1 - 0]
        Y0 = Y0_set[i]
        delta_L = nu.compute_delta_L_per_data_point(z_L, Y0, a_L_set, Y0_set, AF,
            verbose=11,
            print_format="%6.8f")

        # l = 3
        delta_l_plus_1 = delta_L
        w_l_plus_1 = weights[- 1 -0] # w_4
        z_l = z_l_set[-1 -1]
        delta_l = nu.compute_delta_l_per_data_point(w_l_plus_1, delta_l_plus_1, z_l, AF,
            verbose=11,
            print_format="%6.8f")

        # l = 2
        delta_l_plus_1 = delta_l
        w_l_plus_1 = weights[- 1 -1] # w_3
        z_l = z_l_set[-1 -2]
        delta_l = nu.compute_delta_l_per_data_point(w_l_plus_1, delta_l_plus_1, z_l, AF,
            verbose=11,
            print_format="%6.8f")
        # print("------------------------------------------")

We have prepared 24 data points using prep(), and the following shows the output for the first 2 data points.

    ---aux---
    Initializing a Neural Network object.
    --- test2A ---
    Initializing a Neural Network object.
    data point 0 :
    -+ compute_delta_L_per_data_point()
    delta_L =
     0.11825302
     0.12284915
    -+ compute_differential_term_at_l().
    np.array(w_l_plus_1).shape = (2, 2)
    delta_l =
     0.00598236
     0.00598236
    -+ compute_differential_term_at_l().
    np.array(w_l_plus_1).shape = (2, 3)
    delta_l =
     0.00029088
     0.00025446
     0.00029721
    data point 1 :
    -+ compute_delta_L_per_data_point()
    delta_L =
     0.11825325
     0.12284961
    -+ compute_differential_term_at_l().
    np.array(w_l_plus_1).shape = (2, 2)
    delta_l =
     0.00598275
     0.00598275
    -+ compute_differential_term_at_l().
    np.array(w_l_plus_1).shape = (2, 3)
    delta_l =
     0.00029146
     0.00025704
     0.00029734
    ...

*kero version: 0.6.3*