Convolutional Neural Network with VGG Architecture


This post illustrates the difference between a convolutional neural network (CNN) and a regular deep neural network (DNN), and aims to bring a little more clarity to the CNN process.


Figure 1. Comparison between a regular deep neural network and a convolutional neural network.

A CNN is a variant of a DNN with the constraint that the input is an image, or image-like. More technically,

  1. In a DNN, every layer is fully connected to the layer before it. This means every neuron in layer n+1 is affected by every neuron in layer n via a linear combination (plus a bias, followed by an activation).
  2. In a CNN, such as the VGG architecture, only a few layers near the end are fully connected. Through its receptive field, a single neuron in layer n+1 is connected to only a small number of neurons in the previous layer, corresponding to a small region of the visual space.

The above is illustrated in figure 1. In a DNN, each neuron is fully connected to the image. With a large number of neurons, there is a large number of weights to compute, not to mention the many more weights between neurons from one layer to the next. In a CNN, on the other hand, each neuron is “in charge of” only a small region of the image.
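To get a feel for the difference in weight counts, here is a rough back-of-the-envelope comparison. The numbers (a 224×224×3 input, a 4096-neuron fully connected layer, 3×3 filters with 64 output channels) are my own illustrative choices based on typical VGG dimensions, not figures from this post:

```python
# Fully connected: every neuron sees every input value.
image_pixels = 224 * 224 * 3      # a VGG-sized RGB input
fc_neurons = 4096                 # one hypothetical fully connected layer
fc_weights = image_pixels * fc_neurons
print(f"fully connected: {fc_weights:,} weights")   # ~616 million

# Convolutional: each neuron sees only a small receptive field, and the
# same filter weights are shared across every position in the image.
kernel, in_ch, out_ch = 3, 3, 64  # 3x3 filters, RGB in, 64 filters out
conv_weights = kernel * kernel * in_ch * out_ch
print(f"convolutional:   {conv_weights:,} weights")  # 1,728
```

The gap comes from two constraints working together: each conv neuron only connects to a small region, and the filter's weights are reused at every spatial position.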

You might have seen illustrations of the VGG architecture like figure 2 (I took the images from here; do visit the original sources). VGG is an implementation of a CNN by the Visual Geometry Group, Oxford (official link here). Figure 3 illustrates the convolution process in the first layer, while figure 4 illustrates the process through the fully connected layers. In essence, both perform weighted linear sums: weighted by the filters in figure 3, and by the usual weights (as in a DNN) in figure 4.


Figure 2. VGG architecture.

Figure 3. Illustration for the convolutional layer.

Figure 4. Illustration for fully connected layers.
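The "linear sum weighted by a filter" in figure 3 can be sketched in a few lines. This is a minimal valid-padding, stride-1 convolution over a single channel; real VGG layers additionally sum over input channels, add a bias, and apply an activation, but the core operation is the same:

```python
import numpy as np

def conv2d(image, kernel):
    """Naive single-channel 2D convolution (valid padding, stride 1)."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Each output neuron is a linear sum over its small
            # receptive field, weighted by the (shared) filter.
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.ones((3, 3)) / 9.0     # a simple averaging filter
print(conv2d(image, kernel))       # 2x2 output of local averages
```

A fully connected layer (figure 4) is the same idea without the locality constraint: a single matrix–vector product `out = W @ x + b` where every input value contributes to every output neuron.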