ML One
Lecture 07
Introduction to (artificial) neural networks
+
Multi-Layer Perceptron
Welcome 👩‍🎤🧑‍🎤👨‍🎤
By the end of this lecture, we'll have learnt about:
The theoretical:
- Neurons in biological neural networks
- Neurons in artificial neural networks (ANNs)
- Neurons grouped in layers in artificial neural networks
- Using functions and matrix multiplication to describe what happens between layers in an ANN
- A simple neural network: the Multilayer Perceptron (MLP)
The practical:
- MLP implemented in Python
First of all, don't forget to confirm your attendance on Seats App!
introducing "repeated exposure": the amazing built-in perceptual adaptation effect
Recap
Today we are going to see how dots (adding/multiplying matrices, functions) are connected!!!
Scalar, vector and matrix 🧑‍🎨
- how to describe their shapes
-- number of rows x number of columns
Scalar, vector and matrix 🧑‍🎨
- how to multiply a row vector and a column vector?
-- dot product which results in a scalar
-- the shape rule: these two vectors have to be of the same length.
Scalar, vector and matrix 🧑‍🎨
- how to multiply two matrices?
-- the shape rule:
-- the shapes of the two matrices should be: M x K and K x N
-- the shape of the product matrix would be: M x N
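Here's a tiny NumPy sketch of both recapped shape rules (the numbers are made up, just for illustration):

import numpy as np

row = np.array([[1., 2., 3.]])       # a row vector, shape 1 x 3
col = np.array([[4.], [5.], [6.]])   # a column vector, shape 3 x 1
print(row @ col)                     # dot product -> shape 1 x 1, the scalar [[32.]]

A = np.ones((2, 3))                  # M x K = 2 x 3
B = np.ones((3, 4))                  # K x N = 3 x 4
print((A @ B).shape)                 # M x N -> (2, 4)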
Functions 🧑‍🎨
- A function relates an input to an output.
-- Chain functions together to make a new function
-- Function graphs
-- Exp, sigmoid, quadratic, relu, sine, tanh (each with its own characteristics)
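If you fancy the refresher in code, here is how two of these functions could be sketched in NumPy (our own throwaway definitions, nothing official):

import numpy as np

def sigmoid(x):
    # squashes any input into the range (0, 1)
    return 1 / (1 + np.exp(-x))

def relu(x):
    # 0 for negative inputs, the input itself otherwise
    return np.maximum(0, x)

print(sigmoid(0.0))            # 0.5
print(relu(-3.0), relu(2.0))   # 0.0 2.0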
end of recap
An artificial neural network is fun, computationally capable, and made up of smaller components, including neurons.
We'll meet quite a few new terms today - they are easy concepts, just have faith in perceptual adaptation through repetition!
Let's forget about math for now
the story starts from a real biological neuron (a simulation) 🤘
As humans, we have roughly 86 billion (some say 100 billion) neurons. A neuron is an electrically excitable cell that fires electric signals across a neural network. [Wikipedia]
It is the fundamental unit of the brain and nervous system.
The cells are responsible for receiving sensory input, for sending motor commands to our muscles, and for transforming and relaying the electrical signals at every step in between.
Neurons are connected in some structure.
Connected neurons communicate with each other via electrical impulses. ⚡️
one neuron with dendrites, axon and transmitters
when did you last have a biology lesson?
Think of your happiest moment in memory, and this is probably what was going on in your brain during that moment.
Recap of the simulated neural process:
-- A neuron is charged by signals from other connected neurons.
-- We can refer to the level of accumulated charges in one neuron as its activation.
-- A neuron receives different levels of signals from different neurons.
-- Once a neuron is sufficiently charged, it fires off a charge to the next neurons.
The myth of the grandma neuron ⭐️:
A hypothetical neuron that has high activation
when a person "sees, hears, or otherwise sensibly discriminates" a specific entity, such as their grandmother.
But does the grandma neuron actually look like a grandma? 😜
Nope, the information it carries is encoded as its conditional activation,
which can be loosely depicted as a number that increases when you see your grandma and decreases when you don't see your grandma.
What are the mathsy parts in the neural process? 🧮
Recap of the simulated neural process:
-- A neuron is charged by signals from other connected neurons.
-- We can refer to the level of accumulated charges in one neuron as its activation value.
-- There are usually different levels of signals emitted from different neurons.
-- Once a neuron is sufficiently charged, it fires off a signal to the next neurons.
let's do something quite interdisciplinary
--- extracting maths ideas from dat biology class ---> 💡👾🧪🧮💡
Maths extraction 00
-- Numberify each neuron's activation:
a number representing how much electrical charge a neuron receives and fires ☝️
Maths extraction 01
-- View the charging process through arithmetic:
accumulation, addition ➕
Maths extraction 02
-- A neuron does NOT fire immediately upon whatever charge it receives,
instead it waits until it is sufficiently charged before firing:
a sense of thresholding 🪜
hint hint function: relu, sigmoid
Maths extraction 03
-- A bird's-eye view of neuron connectivity:
there is a hierarchical process where neurons are both receivers from preceding neurons and transmitters to the next ones,
recall how function chaining works? It routes one function's output to be the next function's input. ⛓️
Maths extraction 04
-- Numberify the connection strength:
Not every two neurons are connected with equal strength.
Perhaps we can use one number per connection, referred to as a weight, to depict the different strengths? 🔋
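Putting the four extractions together, one artificial neuron could be sketched like this (a toy illustration, all numbers made up):

import numpy as np

def relu(x):
    # thresholding: only fire when sufficiently charged (extraction 02)
    return np.maximum(0, x)

incoming = np.array([0.2, 0.9, 0.4])   # activations from connected neurons (extraction 00)
weights  = np.array([0.5, -1.0, 2.0])  # one weight per connection (extraction 04)
charge = np.dot(weights, incoming)     # accumulate by weighted addition (extraction 01)
activation = relu(charge)              # the signal fired to the next neurons
print(activation)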
โ˜๏ธ๐ŸŒซ๏ธโ˜๏ธ
Introducing now: (artificial) neural networks, the anatomy
finally!!! 🔥
โš ๏ธ Note:
For this lecture, we are not looking at what do the numbers actually mean and how to interpret them.
We are only looking at how this neural process portrayed computationally.
Today's road map:
1. Introduction to neurons: what they do
2. Introduction to layers: what they do and what's the computation underneath
3. Introduction to MLP: a combination of what we have introduced so far
Starting from (artificial) neuron {
One neuron holds a single number indicating its activation 🪫🔋
on whiteboard
Neurons are grouped in layers, reflecting the hierarchical structure (the order is from left to right) 🏘️
let's draw another two layers of neurons because I want to
Connectivity between !consecutive! layers
ps: it is actually connectivity between neurons in !consecutive! layers
A neuron receives signals (numbers, or activations) from all neurons in the previous layer, let's draw out the links 🔗
and so does every single neuron!
โš ๏ธ note: neurons inside the same layer are NOT connected
Different connection strengths: every link indicates a different connection strength 🔌
that is to say every link also indicates a number, let's call it a weight
Note that a weight is different from an activation that is stored in each neuron
One activation is contextualised in one single neuron,
whereas one weight is contextualised in the link between two connected neurons
/*end of (artificial) neuron */
}
Now that we know what neurons are and that they are grouped in layers 🥰
let's look from the perspective of layers and build our first multilayer perceptron by hand 🤑
layers and Multilayer perceptron MLP {
aka vanilla neural networks,
aka fully connected feedforward neural network (don't memorise this, MLP sounds way cooler, but this lengthy name has some meanings we'll see shortly)
Let's contextualise MLP in an example image classification task.
summoning the "hello world" of ML: MNIST, handwritten digit recognition
It is a dataset comprising 28*28 images of handwritten digits. Each image is labelled with its digit class (from 0 to 9).
The task is to take an input image and output its digit class.
Since an MLP is characterised by its layer types, let's introduce layers.
Through the lens of layers{
Neurons hold numbers (activations), so in one layer there is a vertical layout of one column of numbers. Does this one vertical column of numbers sound familiar?
Neuron activations in one layer form a vector, let's call this the "layer vector" or "layer activation vector"
There are different types of layers:
First we have the input and output layers
The input layer is where the input data is loaded (e.g. one neuron holds one pixel's grayscale value)
The number of neurons in the input layer is pre-defined by the specific task and data.
🌶️🌶️🌶️ How many neurons should there be in the input layer for MNIST (which is a dataset of 28*28 images)? hint: one neuron for one pixel
28*28 = 784
Because it has to be a vertical column vector for MLP, the flattening giant has stepped over...
For instance, a 2*2 image/matrix after flattening becomes a 4*1 column vector
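A quick NumPy sketch of that flattening step (pixel values made up):

import numpy as np

img = np.array([[0, 255],
                [128, 64]])   # a tiny 2*2 "image"
col = img.reshape(-1, 1)      # flatten into a 4*1 column vector
print(col.shape)              # (4, 1)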
๐ŸŒถ๏ธ๐ŸŒถ๏ธ What is the shape of the input layer vector in MNIST (which is a dataset of 28*28 images)?
784*1
๐ŸŒถ๏ธ๐ŸŒถ๏ธ๐ŸŒถ๏ธ What if we have a dataset of small images of size 20*20? How many neurons should we put in the input layer and what should the new shape of the input layer vector be?
400*1
The output layer is where the output is held
For classification tasks, the output is categorical, and how do we encode categorical data?
One-hot encoding: it depends on how many classes there are
Another way to interpret the one-hot encoded output: each neuron holds the "probability" of the input belonging to that class
It is just another number container anyway 🤪
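A tiny sketch of one-hot encoding for the 10 MNIST digit classes (assuming labels 0 to 9):

import numpy as np

label = 3                     # the digit class of one example
one_hot = np.zeros((10, 1))   # one neuron per class: a 10*1 column vector
one_hot[label] = 1.0          # switch on the neuron for class 3
print(one_hot.ravel())        # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]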
๐ŸŒถ๏ธ๐ŸŒถ๏ธ What is the shape of the output layer vector for MNIST?
10*1 (10 classes of digits)
๐ŸŒถ๏ธ๐ŸŒถ๏ธ๐ŸŒถ๏ธ What if my task changes to recognise if the digit is zero or non-zero? How many neurons should we put in the output layer and what should the new shape of the output layer vector be?
2*1 (only 2 classes of digits!)
The numbers of neurons in the input and output layers are determined by the task and the dataset.
Next:
Hidden layers: any layer in between the input and output layers 😅
How many neurons should we put in each hidden layer? Is it pre-defined by the task and the dataset like input/output layers?
No, it is all up to you, woo hoo! It is part of the fun neural net designing process 🥰
Here I choose...
Let's connect these layers following our previous connection rule: only consecutive layers are *directly* linked
ATTENTION 💿
The last piece of the puzzle 🧩
Recall the biological process of charging, accumulation and firing
Let's simulate the ANN process from biological analogies, from input layer to output layer
โš ๏ธ Note:
For this lecture, we are not looking at what do the numbers actually mean and how to interpret them.
We are only looking at how this neural process is portrayed computationally.
Recall that each link has a number ("weight") for connection strength
Activations in each layer's activation vector are computed using the previous layer's activation vector and the corresponding connection weights
For example, to calculate the first neuron's activation in the first hidden layer: the input layer's neuron activations are multiplied by the corresponding connection weights and summed up
Wait, did that look just like a dot product? 💿
Indeed, we can simulate the "charging" and "accumulating" process using matrix multiplication ✌️😎
A layer's weights matrix: made of the weights of every connection link this layer has with the *previous* layer 🧮
Demonstration of the first hidden layer's weights matrix multiplied with the input layer's activation vector on the whiteboard 🤘
๐ŸŒถ๏ธ๐ŸŒถ๏ธ๐ŸŒถ๏ธ๐ŸŒถ๏ธ What is the shape of the weights matrix?
๐ŸŒถ๏ธ๐ŸŒถ๏ธ๐ŸŒถ๏ธ๐ŸŒถ๏ธ What is the shape of the weights matrix?
# of neurons in THIS layer
x
# of neurons in PREVIOUS layer
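We can sanity-check this shape rule in NumPy (the layer sizes below are just example choices):

import numpy as np

n_prev, n_this = 784, 16             # e.g. MNIST input layer -> a 16-neuron hidden layer
W = np.random.randn(n_this, n_prev)  # weights matrix: THIS x PREVIOUS
a_prev = np.random.randn(n_prev, 1)  # previous layer's activation vector, 784*1
print((W @ a_prev).shape)            # (16, 1): one raw activation per neuron in THIS layer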
For the "wait till sufficiently charging or thresholding" part, let's introduce bias vector and activation function โœŒ๏ธ
๐ŸŒถ๏ธ๐ŸŒถ๏ธ Recall the graph of ReLU function, what is its activate zone? aka the range of input that relates to non-zero output
๐ŸŒถ๏ธ๐ŸŒถ๏ธ Recall the graph of ReLU function, what is its activate zone? aka the range of input that relates to non-zero output
from 0 to ∞!
๐ŸŒถ๏ธ๐ŸŒถ๏ธ๐ŸŒถ๏ธ๐ŸŒถ๏ธ The effect of the bias vector:
What if I want to have an activate zone from 1 to โˆž?
Add a bias of -1 to the input.
(you can take some time ponder and wonder here later)
Clarifying the input and output of the activation function:
The raw activation vector after matrix multiplication and bias vector addition is the input to the activation function (e.g. ReLU).
The activation function's output is the actual activation (fired to the next layer)
Demonstration of the layer vector added with the bias vector on the whiteboard (adding or removing extra difficulty for a neuron's activation to reach the "active zone" of the activation function)
๐ŸŒถ๏ธ๐ŸŒถ๏ธ๐ŸŒถ๏ธ๐ŸŒถ๏ธ What is the shape of the bias vector?
# of neurons in THIS layer
x
1
Demonstration of the layer vector wrapped with the activation function on the whiteboard
Puzzle almost finished!
/*end of through the lens of layers */
}
Let's write down what just happened using function expressions 🤘😎
1. wrap each layer's charging (aka weights matrix multiplication) and thresholding (bias vector and activation function) process as a function
-- function input: a vector, the previous layer's activation vector
-- function output: a vector, this layer's activation vector
-- the function body: the input multiplied by this layer's weights matrix, added with the bias vector, wrapped with the activation function
V_output =
ReLU(WeightsMat * V_input + Bias)
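In NumPy, that one-layer function could be sketched like this (the name `layer` is our own, not from any library):

import numpy as np

def relu(x):
    return np.maximum(0, x)

def layer(v_input, weights, bias):
    # charging (matrix multiplication) + thresholding (bias addition and activation function)
    return relu(weights @ v_input + bias)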
Next, how to connect different layers using function expression?
Function chaining!!! demonstration on whiteboard ⛓⛓⛓
Puzzle finished! Recall that a model is roughly a big function?
An MLP is a model, and a function. Let's write down the final BIG function for this neural network
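As a sketch, chaining our layer function gives the whole MLP as one big function (one hidden layer of 16 neurons here; the random numbers stand in for learned weights and biases):

import numpy as np

def relu(x):
    return np.maximum(0, x)

def layer(v_input, weights, bias):
    return relu(weights @ v_input + bias)

W1, b1 = np.random.randn(16, 784), np.random.randn(16, 1)  # input layer -> hidden layer
W2, b2 = np.random.randn(10, 16), np.random.randn(10, 1)   # hidden layer -> output layer

def mlp(v_input):
    # the final BIG function: one layer's output routed into the next
    return layer(layer(v_input, W1, b1), W2, b2)

x = np.random.rand(784, 1)   # a flattened 28*28 "image" with fake pixel values
print(mlp(x).shape)          # (10, 1): one number per digit class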
🎉🥂🎊
Done! ✌️😤
Note that I made up all the numbers in the weights matrices and bias vectors during the demo 🙂
In practice, these numbers are learned through the training process.
We'll leave the training process to next week's lecture.
The process we talked about today, with assumed weights matrices and bias vectors, is the forward pass of a neural network,
aka how information (activations) is propagated from input to output ⏩.
The training process will be about how information is propagated backwards ⏪,
from output to input, to find the proper numbers in the weights matrices and bias vectors to make the neural network work.
That's quite a lot, congrats! 🎉
Here is a video from 3B1B that explains MLP from another perspective, very nice.
Next, we are going to:
- take a look at how an MLP can be implemented in Python with help from NumPy and PyTorch (a very popular deep learning library in Python)!
Alert: you are going to see quite advanced Python and neural network programming stuff; we are not expected to understand it all at the moment.
Let's take a look at how some ideas we talked about today are reflected in the code,
especially how we set up a layer by specifying how many neurons it should have.
A prepared Google Colab notebook
1. click on the link and open this Google Colab notebook
Let's take a look at the notebook!

- 1. Make sure you have saved a copy to your GDrive or opened it in playground mode. 🎉
- 2. Most parts are beyond the range of the content we have covered so far.
- 3. We only need to take a look at a few lines in the "Defining the Model" section.
- 4. IMPORTANT: In practice, we just need to specify the number of neurons in each layer and all the computation is left to computers.
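For reference, a model definition along these lines in PyTorch might look like the sketch below (layer sizes follow our MNIST example; the actual notebook code may differ):

import torch.nn as nn

model = nn.Sequential(
    nn.Flatten(),        # flatten each 28*28 image into a 784 vector
    nn.Linear(784, 16),  # input layer -> 16-neuron hidden layer (weights matrix + bias vector)
    nn.ReLU(),           # activation function
    nn.Linear(16, 10),   # hidden layer -> output layer, one neuron per digit class
)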
Today we have looked at:
- Neurons as a number container (activation)
- Neurons are grouped in layers
- Layers are connected hierarchically (from left to right)
- Input layer
- Output layer
- Hidden layer
- Weights matrix
- Bias vector
- Activation function
- Writing the MLP as one big function
We'll see you next Thursday same time and same place!