Chapter 6
Are you wondering how the module in Chapter 0 works? Let's harness the knowledge you have acquired so far and take a look at how the interactive module from Chapter 0 recognizes handwritten digits. Keep scrolling!
In this chapter, we will explain how to build a practical network that recognizes handwritten digits, just like the one in Chapter 0. We will be using the MNIST dataset, a collection of images of handwritten digits made available on computer scientist Yann LeCun's website.
Each image in the MNIST dataset is 28 pixels by 28 pixels. These images are used as training data to teach our feedforward neural network to recognize handwritten digits.
Each pixel has a value from 0 to 1 indicating how white it is: 0 is black, 1 is white, and anything in between is a shade of gray. These 28 × 28 = 784 pixels are flattened into a single array and fed into the input layer.
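As a concrete illustration, here is a minimal NumPy sketch of that preprocessing step. The `prepare_inputs` name and the `images` argument are our own; we assume the raw MNIST pixels arrive as integers from 0 (black) to 255 (white), which is how the dataset is commonly distributed.

```python
import numpy as np

def prepare_inputs(images):
    # Assumes `images` has shape (num_images, 28, 28) with integer pixel
    # values from 0 (black) to 255 (white).
    scaled = images.astype(np.float32) / 255.0   # map every pixel into [0, 1]
    return scaled.reshape(len(images), 28 * 28)  # flatten each 28x28 image to 784 values
```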
The activations from the input layer are propagated to the 500 nodes in the hidden layer, which transform the input into a new 500-number representation. For a multilayer network to work well, we need to break linearity, so we use ReLU as the hidden layer's activation function.
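Continuing the sketch above, the hidden layer could look like this. The parameter names `W1` and `b1` are hypothetical; their shapes follow from the 784 inputs and 500 hidden nodes.

```python
def relu(x):
    # ReLU keeps positive values and zeroes out negatives; this is what breaks linearity.
    return np.maximum(0.0, x)

def hidden_layer(inputs, W1, b1):
    # inputs: (batch, 784), W1: (784, 500), b1: (500,)
    # Each image becomes a new 500-number representation.
    return relu(inputs @ W1 + b1)
```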
These 500 nodes feed their outputs to a final output layer of 10 nodes. We will interpret the outputs as the probability that the input is each of the ten possible digits, so we apply the softmax activation function to the output layer.

Each of these 10 output nodes corresponds to one digit, from 0 to 9.
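Softmax turns the 10 raw output scores into probabilities that sum to 1. Below is a sketch of a common, numerically stable way to write it; `W2` and `b2` are again hypothetical parameter names.

```python
def softmax(logits):
    # Subtract each row's maximum before exponentiating for numerical stability,
    # then normalize so every row sums to 1 and reads as a probability distribution.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exps = np.exp(shifted)
    return exps / exps.sum(axis=1, keepdims=True)

def output_layer(hidden, W2, b2):
    # hidden: (batch, 500), W2: (500, 10), b2: (10,)
    return softmax(hidden @ W2 + b2)
```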
Because of softmax, each output node now holds a probability value indicating how likely the feedforward neural network thinks it is that the input image shows the corresponding digit.
The node with the highest output value (probability) marks the digit the model thinks is most probable; that digit is the final classification of the input handwritten digit.
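In code, picking the most probable digit is a single argmax over the 10 probabilities, continuing the sketch above:

```python
def predict_digit(probs):
    # probs: (batch, 10) softmax outputs; the index of the largest
    # probability is the digit the network considers most likely.
    return probs.argmax(axis=1)
```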
For training, we split the MNIST images into batches of 100. During training, the model uses its current weights and biases to generate its best guess for each input. This process is called forward propagation.
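One simple way to produce those batches of 100 is sketched below; shuffling the data once per pass is our own common-practice choice, not something this chapter prescribes.

```python
def iterate_batches(x, y, batch_size=100):
    # Visit the training set in a random order, 100 examples at a time.
    order = np.random.permutation(len(x))
    for start in range(0, len(x), batch_size):
        idx = order[start:start + batch_size]
        yield x[idx], y[idx]
```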
After comparing its guess with the correct answer and computing the loss, the training algorithm goes back through the network and uses gradients to update the weights and biases so that the outputs move closer to the ground truth. This process is called backpropagation.
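Here is a sketch of one forward-and-backward pass through this two-layer network, reusing the `relu` and `softmax` helpers from above. We assume a cross-entropy loss on the softmax outputs with one-hot labels, since the chapter does not name its loss; the gradient formulas below follow from that choice.

```python
def forward_backward(x, y_onehot, W1, b1, W2, b2):
    # Forward propagation: the network's best guess with the current parameters.
    h = relu(x @ W1 + b1)             # hidden activations, shape (batch, 500)
    probs = softmax(h @ W2 + b2)      # predicted probabilities, shape (batch, 10)

    # Cross-entropy loss against the ground-truth one-hot labels (assumed loss).
    batch = len(x)
    loss = -np.sum(y_onehot * np.log(probs + 1e-12)) / batch

    # Backpropagation: gradients of the loss with respect to every parameter.
    d_logits = (probs - y_onehot) / batch      # softmax + cross-entropy gradient
    dW2 = h.T @ d_logits
    db2 = d_logits.sum(axis=0)
    d_hidden = (d_logits @ W2.T) * (h > 0)     # ReLU passes gradient only where it was active
    dW1 = x.T @ d_hidden
    db1 = d_hidden.sum(axis=0)
    return loss, (dW1, db1, dW2, db2)
```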
To train the model, we first need to initialize its parameters (weights and biases). In this case, we initialize them to random values clustered around 0, which is usually a good starting point.
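One common way to get random values clustered around 0 is to draw the weights from a Gaussian with a small standard deviation and start the biases at 0; the 0.01 scale below is our own illustrative choice.

```python
def init_params(n_in=784, n_hidden=500, n_out=10, scale=0.01):
    # Small random weights centered on 0; biases start at exactly 0.
    rng = np.random.default_rng(0)
    W1 = rng.normal(0.0, scale, size=(n_in, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, scale, size=(n_hidden, n_out))
    b2 = np.zeros(n_out)
    return W1, b1, W2, b2
```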
To train the network, we use Stochastic Gradient Descent (SGD) with a learning rate of 0.7. This learning rate was chosen through experimentation and seemed to work best for our network; practitioners often try a few different values before settling on a reasonable one.
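The SGD update itself is just a small step against each gradient, scaled by the learning rate; a minimal sketch using the gradients computed above:

```python
def sgd_update(params, grads, lr=0.7):
    # Move every weight and bias a small step opposite its gradient.
    return tuple(p - lr * g for p, g in zip(params, grads))
```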
After going over the entire dataset 2 to 3 times, our model is good enough to be used: it achieves around 97% accuracy with only 2 passes over the dataset. On an average laptop, training takes less than a minute.
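Putting the pieces together, the whole training-and-evaluation procedure might look like the sketch below (2 passes over the data, batches of 100, learning rate 0.7). The `train_images`, `train_labels`, `test_images`, and `test_labels` arrays are assumed to be already-loaded MNIST data; the helper names come from the earlier sketches.

```python
def one_hot(labels, num_classes=10):
    # Turn digit labels (0-9) into one-hot rows for the cross-entropy loss.
    out = np.zeros((len(labels), num_classes))
    out[np.arange(len(labels)), labels] = 1.0
    return out

def train_and_evaluate(train_images, train_labels, test_images, test_labels, epochs=2):
    train_x = prepare_inputs(train_images)
    test_x = prepare_inputs(test_images)
    params = init_params()
    for _ in range(epochs):                               # 2 passes over the full dataset
        for xb, yb in iterate_batches(train_x, train_labels):
            _, grads = forward_backward(xb, one_hot(yb), *params)
            params = sgd_update(params, grads)
    # Accuracy: the fraction of test images whose most probable digit is correct.
    W1, b1, W2, b2 = params
    probs = output_layer(hidden_layer(test_x, W1, b1), W2, b2)
    return (predict_digit(probs) == test_labels).mean()
```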
This is the feedforward neural network that we used in our interactive module in Chapter 0. For those who want to experiment with the model in more depth, we have written easy-to-understand code that you can play around with online.
Get Coding
In summary, we applied the knowledge gained in previous chapters to build a functional neural network that can recognize handwritten digits. AI can do far more powerful things than this, and we hope these chapters serve as a starting point for you to explore the world of AI further!
MNIST Dataset
Deep Learning: An MIT Press Book
A Neural Network Playground
Softmax: A generalization of the sigmoid function that generates the probabilities of a set of categories.
Forward Propagation: The process of making the neural network generate a prediction based on input.
Backpropagation: The process of using gradients to adjust parameters and improve the performance of neural networks.