When I studied neural networks as an undergraduate, they didn’t receive the focus that they enjoy now, having become synonymous with artificial intelligence (AI). Loosely inspired by the workings of the brain, these statistical models were conceived back in the 1940s for classifying data. But then they underwent the so-called AI winter, receiving little attention from the broader research community, apart from a few notable exceptions.
But now neural networks are back with gusto under the term deep learning. To an outsider, which is most of the world, this term sounds perhaps a bit mysterious. To dispel all the mystery and shed light on these now ubiquitous models, Catherine F. Higham and Desmond J. Higham recently wrote the tutorial:
- Higham and Higham (2019), Deep Learning: An Introduction for Applied Mathematicians.
Recommended
I recommend this paper if you want to understand the basics of deep learning. It looks at a simple feedforward (neural) network, also called a multilayer perceptron, though the paper uses neither term. This is an unadorned neural network without any of the fancy bells and whistles that modern networks possess.
And that’s all you need to understand the general gist of deep learning.
What the paper covers
When building up a neural network, a so-called activation function is needed. The network works by applying this function again and again (and again) to the results of matrix operations.
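As a minimal sketch (in Python/NumPy rather than the paper's MATLAB, with the layer structure left generic), the repeated application looks something like this:

```python
import numpy as np

def sigmoid(z):
    # The activation function, applied componentwise.
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    # The network: matrix multiplication, bias shift, activation -- again and again.
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)
    return a
```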
The paper goes through the essential (for training neural networks) procedure of backpropagation, showing that it is a clever and compact way, based on the chain rule from calculus, of obtaining the derivatives of the model. These derivatives are then used in a gradient-based optimization method for fitting the model. (Optimizing a function and fitting a statistical model often amount to the same thing.)
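For a single training point with a least-squares cost C = ½‖a⁽ᴸ⁾ − y‖², the backpropagation recursion takes the following standard form (my notation, which may differ slightly from the paper's), where z⁽ˡ⁾ = W⁽ˡ⁾ a⁽ˡ⁻¹⁾ + b⁽ˡ⁾, a⁽ˡ⁾ = σ(z⁽ˡ⁾) and ∘ denotes the componentwise product:

```latex
\[
\delta^{[L]} = \sigma'\!\bigl(z^{[L]}\bigr) \circ \bigl(a^{[L]} - y\bigr),
\qquad
\delta^{[l]} = \sigma'\!\bigl(z^{[l]}\bigr) \circ \bigl(W^{[l+1]}\bigr)^{T} \delta^{[l+1]},
\]
\[
\frac{\partial C}{\partial W^{[l]}} = \delta^{[l]} \bigl(a^{[l-1]}\bigr)^{T},
\qquad
\frac{\partial C}{\partial b^{[l]}} = \delta^{[l]}.
\]
```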
Obviously written for (applied) mathematicians, the paper attempts to clarify some of the confusing terminology, which has arisen because much of AI and machine learning has been developed by computer scientists. For example, what is commonly called the sigmoid function may be better known as the logistic function. And what is linear in machine learning land is often actually affine.
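For concreteness (these definitions are mine, not the paper's wording):

```latex
\[
\sigma(x) = \frac{1}{1 + e^{-x}}
\quad \text{(sigmoid, i.e.\ logistic)},
\qquad
x \mapsto W x + b
\quad \text{(affine; linear only when } b = 0\text{)}.
\]
```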
Worked example with code
The paper includes a worked example with code (in MATLAB) of a simple four-layer feedforward network (or multilayer perceptron). This model is then fitted or trained using a simple stochastic gradient descent method, hence the need for derivatives. For training, the so-called cost function is a least-squares (squared-error) function, though most neural networks now use cost functions based on maximum likelihood. For the activation function, the paper uses the sigmoid function, but practitioners now often use the rectified linear unit (ReLU) instead.
The problem is a simple binary classification problem: identifying in which of two regions points lie within a square in the two-dimensional (Cartesian) plane.
In the code, the structure of the neural network is hard coded, so if you want to modify it, you’ll have to do a bit of work. Such coding practices are usually frowned upon, but the authors did this for brevity and clarity.
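To give a flavour, here is a rough Python/NumPy analogue of that setup, not the authors' MATLAB code: the layer sizes, training points, labelling rule, step size and iteration count below are all illustrative choices of mine.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Ten labelled points in the unit square, split into two classes by an
# illustrative rule (the paper picks its own points and regions).
X = rng.random((2, 10))                        # one point per column
y = np.zeros((2, 10))
y[0, X[1] > X[0]] = 1.0                        # class one: above the diagonal
y[1, X[1] <= X[0]] = 1.0                       # class two: on or below it

# Layer widths 2 -> 2 -> 3 -> 2 (illustrative), random initial weights and biases.
sizes = [2, 2, 3, 2]
W = [rng.standard_normal((sizes[l + 1], sizes[l])) for l in range(3)]
b = [rng.standard_normal((sizes[l + 1], 1)) for l in range(3)]

eta = 0.05                                     # learning rate
for step in range(100_000):
    k = rng.integers(10)                       # stochastic: one random point per step
    x, t = X[:, [k]], y[:, [k]]

    # Forward pass: affine map then activation, layer after layer.
    a = [x]
    for l in range(3):
        a.append(sigmoid(W[l] @ a[l] + b[l]))

    # Backward pass: chain rule, for the least-squares cost 0.5 * ||a[3] - t||^2.
    delta = a[3] * (1 - a[3]) * (a[3] - t)
    for l in reversed(range(3)):
        grad_W, grad_b = delta @ a[l].T, delta
        if l > 0:
            delta = a[l] * (1 - a[l]) * (W[l].T @ delta)
        W[l] -= eta * grad_W                   # gradient descent step on the weights
        b[l] -= eta * grad_b                   # and on the biases
```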
The rest of the paper
The next section of the paper involves using a pre-written (but not official) MATLAB library that implements a convolutional neural network. (The word convolution is used, but these networks actually hinge upon filters.) Such networks have become essential for processing image data and are now found on (smart)phones and computers for identifying the contents of photos. There is less to gain here in terms of intuition about how neural networks work.
The last section of the paper details what the authors didn’t cover, which includes regularization methods (to prevent overfitting) and why the steepest descent method works, despite it being advised against outside of this field.
Code one up yourself
If you want to learn how these networks work, I would strongly suggest coding one up yourself (preferably first without looking at the code given by Higham and Higham).
Of course, if you want to develop a good, functioning neural network, you wouldn’t use the code here. There are libraries for that, such as TensorFlow and (Py)Torch.
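For instance, a comparable network takes only a few lines in PyTorch (a sketch with illustrative layer sizes and training data; the library computes the derivatives for you via automatic differentiation):

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(2, 2), nn.Sigmoid(),   # note: "Linear" layers are in fact affine maps
    nn.Linear(2, 3), nn.Sigmoid(),
    nn.Linear(3, 2), nn.Sigmoid(),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
loss_fn = nn.MSELoss()               # a least-squares cost

x = torch.rand(10, 2)                # ten random points in the unit square
y = torch.stack([(x[:, 1] > x[:, 0]).float(),
                 (x[:, 1] <= x[:, 0]).float()], dim=1)  # illustrative labels

for step in range(10_000):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()                  # derivatives via backpropagation
    optimizer.step()                 # gradient descent step
```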
Further reading
There is just too much literature, especially on the internet, covering this topic. For a readable (lots of words), recent book, I recommend:
- Goodfellow, Bengio and Courville (2016), Deep Learning, MIT Press.
As I said, the book is recent, being published in 2016. It’s a fast (too fast?) moving field, but this text gets you up to speed with the main research topics in neural learning — sorry — deep learning.