ML One
Lecture 07
Introduction to (artificial) neural networks
+
Multi-Layer Perceptron
Welcome
By the end of this lecture, we'll have learnt about:
The theoretical:
- Neurons in biological neural networks
- Neurons in artificial neural networks (ANNs)
- Neurons grouped into layers in an artificial neural network
- Using functions and matrix multiplication to describe what happens between layers in an ANN
- A simple neural network: the Multilayer Perceptron (MLP)
The practical:
- An MLP implemented in Python
First of all, don't forget to confirm your attendance on
Seats App!
Recap
Today we are going to see how dots (adding/multiplying matrices, functions) are connected!!!
Scalar, vector and matrix
- how to describe their shapes
-- number of rows x number of columns
Scalar, vector and matrix
- how to multiply a row vector and a column vector?
-- dot product which results in a scalar
-- the shape rule: these two vectors have to be of the same length.
Scalar, vector and matrix
- how to multiply two matrices?
-- the shape rule:
-- the shapes of the two matrices should be: M x K and K x N
-- the shape of the product matrix would be: M x N
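To make the shape rules above concrete, here is a minimal sketch in Python with NumPy (the numbers are made up for illustration):

```python
# Minimal NumPy sketch of the recap: dot product and the matrix shape rule
import numpy as np

# Dot product: two vectors of the same length -> a scalar
row = np.array([1.0, 2.0, 3.0])
col = np.array([4.0, 5.0, 6.0])
print(np.dot(row, col))       # 1*4 + 2*5 + 3*6 = 32.0

# Matrix multiplication: (M x K) @ (K x N) -> (M x N)
A = np.ones((2, 3))           # M=2, K=3
B = np.ones((3, 4))           # K=3, N=4
print((A @ B).shape)          # (2, 4)
```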
Functions
- A function relates an input to an output.
-- Chain functions together to make a new function
-- Function graphs
-- Exp, sigmoid, quadratic, relu, sine, tanh (each with its own characteristics)
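As a small taste of things to come, here is a sketch of two of these functions written in Python with NumPy (we will meet sigmoid and ReLU again as activation functions later today):

```python
# Two of the recap functions, written with NumPy
import numpy as np

def sigmoid(x):
    # squashes any input into the range (0, 1)
    return 1 / (1 + np.exp(-x))

def relu(x):
    # zero for negative inputs, identity for positive inputs
    return np.maximum(0, x)

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x))    # approx [0.12, 0.5, 0.88]
print(relu(x))       # [0. 0. 2.]
print(np.tanh(x))    # approx [-0.96, 0.0, 0.96]
```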
end of recap
An artificial neural network is fun, computationally capable, and made up of smaller components, including neurons.
We'll meet quite a few new terms today - they are easy concepts, just have faith in perceptual adaptation through repetition!
Let's forget about math for now
the story starts from a real biological neuron (a simulation)
As humans, we have roughly 86 billion (some say 100 billion) neurons. A neuron is an electrically excitable cell that fires electric signals across a neural network. [wikipedia]
It is the fundamental unit of the brain and nervous system.
The cells are responsible for receiving sensory input, for sending motor commands to our muscles, and for transforming and relaying the electrical signals at every step in between.
Neurons are connected in some structure.
Connected neurons communicate with each other via electrical impulses. ⚡️
one neuron with dendrites, axon and transmitters
Think of your happiest moment in memory, and
this is probably what was going on in your brain during that moment.
Recap of the simulated neural process:
-- A neuron is charged by signals from other connected neurons.
-- We can refer to the level of accumulated charges in one neuron as its activation.
-- A neuron receives different levels of signals from different neurons.
-- Once a neuron is sufficiently charged, it fires off a charge to the next neurons.
The myth of the grandma neuron:
A hypothetical neuron that has high activation
when a person "sees, hears, or otherwise sensibly discriminates" a specific entity, such as their grandmother.
But does the grandma neuron actually look like a grandma?
Nope, the information it carries is encoded as its conditional activation,
which can be loosely described as a number that increases when you see your grandma and decreases when you don't.
What are the mathsy parts in the neural process?
Recap of the simulated neural process:
-- A neuron is charged by signals from other connected neurons.
-- We can refer to the level of accumulated charges in one neuron as its activation value.
-- There are usually different levels of signals emitted from different neurons.
-- Once a neuron is sufficiently charged, it fires off a signal to the next neurons.
let's do something quite interdisciplinary
--- extracting maths ideas from dat biology class --->
Maths extraction 00
-- Numberify each neuron's activation:
a number representing how much electrical charge a neuron receives and fires
Maths extraction 01
-- View the charging process through arithmetic:
accumulation, addition
Maths extraction 02
-- A neuron does NOT fire immediately upon receiving charge;
instead it waits until it is sufficiently charged before firing:
a sense of thresholding
hint hint function: relu, sigmoid
Maths extraction 03
-- A bird's-eye view of neuron connectivity:
there is a hierarchical process where neurons are both receivers of signals from preceding neurons and transmitters to the next ones.
Recall how function chaining works? It is routing one function's output to be the next function's input.
Maths extraction 04
-- Numberify the connection strength:
Not every two neurons are connected with equal strength.
Perhaps we can use a number for each connection, referred to as a weight, to depict the different strengths?
Introducing now: (artificial) neural networks, the anatomy
finally!!!
⚠️ Note:
For this lecture, we are not looking at what the numbers actually mean or how to interpret them.
We are only looking at how this neural process is portrayed computationally.
Today's road map:
1. Introduction to neurons: what a neuron does
2. Introduction to layers: what a layer does and what computation happens underneath
3. Introduction to the MLP: a combination of what we have introduced so far
Starting from the (artificial) neuron {
One neuron holds a single number indicating its activation
on whiteboard
Neurons are grouped in layers, reflecting the hierarchical structure (the order is from left to right)
let's draw another two layers of neurons because I want to
Connectivity between !consecutive! layers
ps: it is actually connectivity between neurons in !consecutive! layers
A neuron receives signals (numbers, or activations) from all neurons in the previous layer, let's draw out the links
and so does every single neuron!
⚠️ note: neurons inside the same layer are NOT connected
Different connection strengths: every link indicates a different connection strength
that is to say every link also indicates a number, let's call it a weight
Note that a weight is different from an activation that is stored in each neuron
One activation is contextualised in one single neuron,
whereas one weight is contextualised in the link between two connected neurons
/*end of (artificial) neuron */
}
Now that we know what neurons are and that they are grouped in layers,
let's look from the perspective of layers and build our first multilayer perceptron by hand
Layers and Multilayer Perceptron (MLP) {
aka vanilla neural networks,
aka fully connected feedforward neural networks (don't memorise this; MLP sounds way cooler, but this lengthy name has some meaning, as we'll see shortly)
Let's contextualise the MLP in an example image classification task.
summoning the "hello world" of ML: MNIST, handwritten digit recognition
It is a dataset comprising 28*28 images of handwritten digits. Each image is labelled with its digit class (from 0 to 9).
The task is to take an input image and output its digit class.
Since an MLP is characterised by its layer types, let's introduce layers.
Through the lens of layers {
Neurons are holding numbers (activations), so in one layer there is a vertical layout of one column of numbers. Does this one vertical column of numbers sound familiar?
Neuron activations in one layer form a vector; let's call this the "layer vector" or "layer activation vector"
There are different types of layers:
First we have input and output layers
The input layer is where the input data is loaded (e.g. one neuron holds one pixel's grayscale value)
The number of neurons in the input layer is pre-defined by the specific task and data.
How many neurons should there be in the input layer for MNIST (which is a dataset of 28*28 images)? hint: one neuron for one pixel
28*28 = 784
Because the input has to be a vertical column vector for an MLP, the flattening giant has stepped in...
For instance, a 2*2 image/matrix after flattening becomes a 4*1 column vector
What is the shape of the input layer vector in MNIST (which is a dataset of 28*28 images)?
784*1
What if we have a dataset of small images of size 20*20? How many neurons should we put in the input layer and what should the new shape of the input layer vector be?
400*1
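Here is a minimal sketch of the flattening step in Python with NumPy; the random array is just a stand-in for a real MNIST image:

```python
# Flattening an image matrix into a column vector
import numpy as np

image = np.random.rand(28, 28)              # stand-in for one MNIST image
col_vector = image.reshape(784, 1)          # flatten to a 784 x 1 column vector
print(image.shape, "->", col_vector.shape)  # (28, 28) -> (784, 1)

# Same idea for a 20*20 image: a 400 x 1 column vector
small = np.random.rand(20, 20)
print(small.reshape(-1, 1).shape)           # (400, 1)
```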
The output layer is where the output is held
For classification tasks, the output is categorical, and how do we encode categorical data?
One-hot encoding: it depends on how many classes there are
Another way to interpret one-hot encoding output: each neuron holds the "probability" of the output belonging to that class
It is just another number container anyway
What is the shape of the output layer vector for MNIST?
10*1 (10 classes of digits)
What if my task changes to recognising whether the digit is zero or non-zero? How many neurons should we put in the output layer and what should the new shape of the output layer vector be?
2*1 (only 2 classes of digits!)
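Here is a minimal sketch of one-hot encoding in Python, assuming 10 digit classes (the one_hot helper is made up for illustration):

```python
# One-hot encoding a digit label as a column vector
import numpy as np

def one_hot(label, num_classes=10):
    vec = np.zeros((num_classes, 1))  # a num_classes x 1 column vector
    vec[label] = 1.0                  # switch on the neuron for this class
    return vec

print(one_hot(3).ravel())  # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
```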
The numbers of neurons in the input and output layers are determined by the task and the dataset.
Next:
Hidden layers: any layer in between the input and output layers
How many neurons should we put in each hidden layer? Is it pre-defined by the task and the dataset like input/output layers?
No, it is all up to you, woo hoo! It is part of the fun neural net designing process
Here I choose...
Let's connect these layers following our previous connection rule: only consecutive layers are *directly* linked
ATTENTION
The last piece of the puzzle
Recall the biological process of charging, accumulation and firing
Let's simulate the ANN process from biological analogies, from input layer to output layer
⚠️ Note:
For this lecture, we are not looking at what the numbers actually mean or how to interpret them.
We are only looking at how this neural process is portrayed computationally.
Recall that each link has a number ("weight") for connection strength
Activations in each layer's activation vector are computed using the previous layer's activation vector and the corresponding connection weights
For example, to calculate the first neuron's activation in the first hidden layer: the input layer's neuron activations are multiplied by the corresponding connection weights and summed up
Wait, did that look just like a dot product?
Indeed, we can simulate the "charging" and "accumulating" process using matrix multiplication
A layer's weights matrix: made of the weights of every connection link this layer has with the *previous* layer
Demonstration of the first hidden layer's weights matrix multiplied with the input layer's activation vector on whiteboard
What is the shape of the weights matrix?
# of neurons in THIS layer
x
# of neurons in PREVIOUS layer
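A quick NumPy shape check of that rule; the hidden layer size of 16 is just a number picked for illustration:

```python
# Weights matrix shape: (# neurons in THIS layer) x (# neurons in PREVIOUS layer)
import numpy as np

n_prev, n_this = 784, 16
W = np.random.randn(n_this, n_prev)   # (16, 784)
v_in = np.random.rand(n_prev, 1)      # previous layer's activation vector, (784, 1)

raw = W @ v_in                        # (16, 784) @ (784, 1) -> (16, 1)
print(W.shape, v_in.shape, raw.shape)
```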
For the "wait till sufficiently charging or thresholding" part, let's introduce bias vector and activation function โ๏ธ
Recall the graph of the ReLU function: what is its active zone? aka the range of inputs that maps to non-zero outputs
from 0 to ∞!
The effect of the bias vector:
What if I want to have an active zone from 1 to ∞?
Add a bias of -1 to the input.
(you can take some time to ponder and wonder here later)
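A tiny numeric check of that idea, assuming ReLU and some made-up input values:

```python
# A bias of -1 shifts ReLU's active zone from (0, inf) to (1, inf)
import numpy as np

def relu(x):
    return np.maximum(0, x)

x = np.array([0.5, 1.0, 2.0])
print(relu(x))      # [0.5 1.  2. ]  -> non-zero for any x > 0
print(relu(x - 1))  # [0.  0.  1. ]  -> now non-zero only for x > 1
```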
Clarifying the input and output of the activation function:
The raw activation vector after matrix multiplication and bias vector addition is the input to the activation function (e.g. ReLU).
The activation function's output is the actual activation (what fires to the next layer)
Demonstration of the layer vector added with the bias vector on whiteboard (adding or removing extra difficulty for a neuron's activation to reach the "active zone" of the activation function)
What is the shape of the bias vector?
# of neurons in THIS layer
x
1
Demonstration of the layer vector wrapped with the activation function on whiteboard
Puzzle almost finished!
/*end of through the lens of layers */
}
Let's write down what just happened using function expressions
1. Wrap each layer's charging (aka weights matrix multiplication) and thresholding (bias vector and activation function) process as a function
-- function input: a vector, previous layer's activation vector
-- function output: a vector, this layer's activation vector
-- the function body: the input multiplied by this layer's weights matrix, added with the bias vector, wrapped with the activation function
V_output =
ReLU(WeightsMat * V_input + Bias)
Next, how to connect different layers using function expression?
Function chaining!!! demonstration on whiteboard
Puzzle finished, recall that a model is roughly a big function?
An MLP is a model, and a function. Let's write down the final BIG function for this neural network
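Here is a minimal sketch of that BIG function in Python with NumPy, assuming made-up layer sizes (784 -> 16 -> 16 -> 10), random weights and biases, and ReLU everywhere for simplicity:

```python
# Forward pass of a small MLP as chained layer functions
import numpy as np

def relu(x):
    return np.maximum(0, x)

def layer(v_in, W, b):
    # one layer as a function: charging (W @ v_in) then thresholding (+ b, ReLU)
    return relu(W @ v_in + b)

sizes = [784, 16, 16, 10]
weights = [np.random.randn(m, n) for n, m in zip(sizes[:-1], sizes[1:])]
biases  = [np.random.randn(m, 1) for m in sizes[1:]]

v = np.random.rand(784, 1)           # input layer vector: a flattened image
for W, b in zip(weights, biases):    # function chaining, layer by layer
    v = layer(v, W, b)
print(v.shape)                       # (10, 1): the output layer vector
```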
Done!
Note that I made up all the numbers in the weights matrices and bias vectors during the demo
In practice, these numbers are learned through the training process.
We'll leave the training process to next week's lecture.
The process we talked about today, with assumed weights matrices and bias vectors, is the forward pass of a neural network,
aka how information (activations) is propagated from input to output ⏩.
The training process will be about how information is propagated backwards ⏪,
from output to input, to find the proper numbers in the weights matrices and bias vectors that make the neural network work.
That's quite a lot, congrats!
Here is a video from 3B1B that explains MLP from another perspective, very nice.
Next, we are going to:
- take a look at how an MLP can be implemented in Python with help from NumPy and PyTorch (a very popular deep learning library in Python)!
Alert: you are going to see quite advanced Python and neural network programming stuff; we are not expected to understand it all at the moment.
Let's take a look at how some ideas we talked about today are reflected in the code,
especially how we set up a layer by specifying how many neurons it should have.
Let's take a look at the notebook!
- 1. Make sure you have saved a copy to your GDrive or opened it in the playground.
- 2. Most parts are beyond the scope of the content we have covered so far.
- 3. We only need to take a look at the few lines in the "Defining the Model" section (a sketch of what they might look like follows this list).
- 4. IMPORTANT: In practice, we just need to specify the number of neurons in each layer and all the computation is left to computers.
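For reference, here is a hedged sketch of what a "Defining the Model" cell might look like in PyTorch; the layer sizes (784 -> 128 -> 10) are just an illustration and the notebook's actual code may differ:

```python
# Defining an MLP by specifying only the number of neurons per layer
import torch.nn as nn

model = nn.Sequential(
    nn.Flatten(),         # 28 x 28 image -> 784 values
    nn.Linear(784, 128),  # input layer -> hidden layer
    nn.ReLU(),            # activation function
    nn.Linear(128, 10),   # hidden layer -> output layer (10 digit classes)
)
print(model)
```

All the matrix multiplications, bias additions and activations we drew by hand today are handled inside these layer objects.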
Today we have looked at:
- Neurons as number containers (activations)
- Neurons are grouped in layers
- Layers are connected hierarchically (from left to right)
- Input layer
- Output layer
- Hidden layer
- Weights matrix
- Bias vector
- Activation function
- Writing the MLP as one big function
We'll see you next Thursday same time and same place!