YouTube: But what is a neural network? | Deep learning chapter 1

Key Takeaway

A neural network is a function made of layers of neurons

Each neuron computes a weighted sum of its inputs, adds a bias, and passes the result through an activation function

The network learns by adjusting those weights and biases — nothing more

This chapter answers:

What is a neural network, structurally and mathematically?
Neural networks ⊂ Machine Learning ⊂ AI

Introduction

A neural network is a mathematical model that learns patterns from data to solve complex tasks.

A neural network can be viewed as a function that takes in numbers as input and produces numbers as output.
During training, it automatically adjusts its weights and biases so that its predictions become more accurate over time.
Common applications include:
- image recognition
- speech recognition
- language translation

The Challenge of Image Recognition

Background:
- Recognizing handwritten digits, such as 3, is easy for humans.
- However, to a computer, such an image is simply a 28 × 28 grid of pixel brightness values (784 numbers).
Challenge:
- Because handwriting styles vary greatly from person to person, it is very hard to write a fixed set of if-else rules that can correctly identify every digit.
Solution:
- Neural networks solve this by learning from examples instead of following hand-written rules. Their multiple layers of abstraction break complex visual patterns into simpler features, an idea loosely inspired by the brain.

Structure of the Network

A neural network usually has:
- Input layer:
  - Each input neuron holds one pixel value (its brightness), so a 28 × 28 image needs 784 input neurons.
- Hidden layer(s):
  - Hidden layers process the input step by step and learn useful features.
- Output layer:
  - If the task is digit recognition, the output layer often has 10 neurons, representing digits 0 to 9.
Weights and biases:
- Every connection between two adjacent layers has its own weight, and every neuron outside the input layer has its own bias.
Cost:
- For each training example, the network produces one final cost: a single number measuring how wrong the output is. This cost is computed from the final output, and the final output depends on the weights and biases of every layer.
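To make the structure concrete, here is a minimal sketch that counts the weights and biases of a fully connected network. It assumes layer sizes 784-16-16-10 (two hidden layers of 16 neurons, the sizes used in the video; they are an illustrative choice, not a requirement). The helper name `count_parameters` is hypothetical.

```python
def count_parameters(layer_sizes):
    """Count weights and biases in a fully connected network.

    layer_sizes lists the number of neurons per layer, input layer first.
    Between a layer of size n and the next layer of size m there is an
    m x n weight matrix and an m-entry bias vector.
    """
    weights = 0
    biases = 0
    for n_current, n_next in zip(layer_sizes, layer_sizes[1:]):
        weights += n_next * n_current  # one weight per connection
        biases += n_next               # one bias per neuron in the next layer
    return weights, biases


sizes = [28 * 28, 16, 16, 10]  # 784 input pixels, two hidden layers, 10 digit outputs
w, b = count_parameters(sizes)
print(w, b, w + b)  # 12960 42 13002
```

Every one of these 13,002 numbers is something training can adjust.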

What is a neuron?

Simple explanation: a neuron holds a number, called its activation. In the video's simplified picture this number lies between 0 and 1 (true for sigmoid; ReLU activations can exceed 1).
More precisely, a neuron is a small computational unit that computes and outputs a value.

A neuron computes a weighted sum of its inputs, adds a bias, and then applies an activation function.
During training, the network adjusts its weights and biases to improve its predictions.
The output of a neuron is called its activation:
- A high activation means the neuron is strongly responding.
- A low activation means it is not responding much.
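A minimal sketch of one neuron, assuming sigmoid as the activation function. The function names `sigmoid` and `neuron` and all numbers are illustrative, not taken from the video.

```python
import math


def sigmoid(x):
    """Squash any real number into the open interval (0, 1).

    Split into two branches so math.exp never overflows for large |x|.
    """
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def neuron(inputs, weights, bias):
    """One neuron: weighted sum of inputs, plus bias, through sigmoid."""
    z = sum(w * a for w, a in zip(weights, inputs)) + bias  # pre-activation
    return sigmoid(z)                                        # activation


# A large weighted sum gives a strong response (activation near 1) ...
print(neuron([1.0, 1.0], [3.0, 2.0], -1.0))   # z = 4, activation ≈ 0.982
# ... and a very negative one gives almost no response (near 0).
print(neuron([1.0, 1.0], [-3.0, -2.0], 1.0))  # z = -4, activation ≈ 0.018
```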

Matrix Form of a Neural Network Layer

Each neuron is still computed individually. The matrix formula is just a compact way to write all neurons in the layer at once.

For one neuron:
- Linear combination before activation: $z = w_{1} a_{1} + w_{2} a_{2} + \dots + w_{n} a_{n} + b$
- The activation function takes $z$ as input and produces the neuron's output: $a = \sigma(z)$
For a whole layer: $a^{(l + 1)} = \sigma(W^{(l)} a^{(l)} + b^{(l)})$
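The layer formula can be sketched in plain Python, using lists instead of a matrix library to keep it self-contained. `layer_forward` is a hypothetical helper: each output entry is computed exactly like the single-neuron formula, using one row of $W^{(l)}$, which is why the matrix form is only compact notation.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def relu(x):
    return max(0.0, x)


def layer_forward(W, a, b, activation=sigmoid):
    """Compute a_next = activation(W a + b).

    W: list of rows, shape (n_next, n_current)
    a: activations of the current layer, length n_current
    b: biases of the next layer, length n_next
    """
    if len(W) != len(b) or any(len(row) != len(a) for row in W):
        raise ValueError("shape mismatch between W, a and b")
    # Row j of W holds the weights feeding neuron j of the next layer.
    z = [sum(w * x for w, x in zip(row, a)) + b_j for row, b_j in zip(W, b)]
    return [activation(z_j) for z_j in z]


# One next-layer neuron: z = 1(2) + (-1)(3) + 0.5 = -0.5, so ReLU gives 0.0.
print(layer_forward([[1.0, -1.0]], [2.0, 3.0], [0.5], activation=relu))  # [0.0]
```

The shape check mirrors the rule that $W^{(l)}$ needs one row per next-layer neuron and one column per current-layer neuron.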

What does each symbol mean?

1. `a` activation

$a^{(l)}$: activation vector of layer $l$.
It contains the outputs of all neurons in the current layer.

For example:

If layer $l$ has 3 neurons: $a^{(l)} = \begin{bmatrix} 0.2 \\ 0.7 \\ 0.1 \end{bmatrix}$
This means the three neurons in the current layer output 0.2, 0.7, and 0.1.

2. `W` weight

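$W^{(l)}$: weight matrix connecting layer $l$ to layer $l + 1$.
A weight sets how strongly, and with which sign, one input affects a neuron in the next layer.
If the current layer has 3 neurons and the next layer has 2 neurons, $W^{(l)}$ has shape $2 \times 3$:
- 2 rows → one row for each neuron in the next layer
- 3 columns → one column for each neuron in the current layer
- The entry $W^{(l)}_{jk}$ (row $j$, column $k$) is the weight from neuron $k$ of layer $l$ to neuron $j$ of layer $l + 1$.

Example: $W^{(l)} = \begin{bmatrix} 0.5 & -0.2 & 0.1 \\ 0.8 & 0.4 & -0.6 \end{bmatrix}$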
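A small sketch of the row/column convention, with illustrative values for a 3-neuron layer feeding a 2-neuron layer. `W[j][k]` (zero-based indices) is the weight from current-layer neuron `k` to next-layer neuron `j`.

```python
# Weight matrix from a 3-neuron layer (current) to a 2-neuron layer (next).
# Illustrative values: row j = next-layer neuron j, column k = current-layer neuron k.
W = [
    [0.5, -0.2, 0.1],   # weights feeding next-layer neuron 0
    [0.8, 0.4, -0.6],   # weights feeding next-layer neuron 1
]

n_next = len(W)        # number of rows
n_current = len(W[0])  # number of columns
print(f"shape: {n_next} x {n_current}")  # shape: 2 x 3

# W[j][k] is the weight on the connection from current neuron k to next neuron j.
for j, row in enumerate(W):
    for k, w in enumerate(row):
        print(f"current neuron {k} -> next neuron {j}: weight {w}")
```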

3. `b` bias

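$b^{(l)}$: bias vector for layer $l + 1$.
The bias is an extra number added after the weighted sum. It shifts how large the weighted sum must be before the neuron becomes active.
If the next layer has 2 neurons: $b^{(l)} = \begin{bmatrix} 0.1 \\ -0.3 \end{bmatrix}$
Each neuron in the next layer has its own bias.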

4. σ activation function

$\sigma$ is the activation function, for example:
- sigmoid
- ReLU (Rectified Linear Unit)
It is applied element-wise to the vector.
For example, if $z = \begin{bmatrix} 1.2 \\ -0.5 \end{bmatrix}$, then $\sigma(z) = \begin{bmatrix} \sigma(1.2) \\ \sigma(-0.5) \end{bmatrix}$
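Element-wise application can be sketched as a plain map over the vector. `apply_elementwise` is a hypothetical helper name; the vector $z = [1.2, -0.5]$ is the one from the example.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def apply_elementwise(f, z):
    """Apply a scalar activation f to every entry of the vector z."""
    return [f(z_j) for z_j in z]


z = [1.2, -0.5]
print(apply_elementwise(sigmoid, z))                # [sigmoid(1.2), sigmoid(-0.5)]
print(apply_elementwise(lambda x: max(0.0, x), z))  # ReLU: [1.2, 0.0]
```

Each entry is transformed independently; no entry of $z$ influences another.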

Sigmoid and ReLU (activation functions)

| Function | Formula | Output range | Meaning |
| --- | --- | --- | --- |
| Sigmoid | $\sigma(x) = \frac{1}{1 + e^{-x}}$ | $(0, 1)$ | squashes the input into the range 0 to 1 |
| ReLU | $\mathrm{ReLU}(x) = \max(0, x)$ | $[0, \infty)$ | keeps positive values, cuts negatives to 0 |

input → weighted sum + bias → nonlinear activation function → output
Activation functions add nonlinearity. Without them, stacking layers would collapse into a single linear (affine) transformation, so the network could not learn complex patterns.
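To see why the nonlinearity matters, take two layers and remove $\sigma$ (use the identity as the activation), in this document's notation:

$$
\begin{aligned}
a^{(2)} &= W^{(1)} a^{(1)} + b^{(1)} \\
        &= W^{(1)} \left( W^{(0)} a^{(0)} + b^{(0)} \right) + b^{(1)} \\
        &= \underbrace{W^{(1)} W^{(0)}}_{W'} \, a^{(0)} + \underbrace{W^{(1)} b^{(0)} + b^{(1)}}_{b'}
\end{aligned}
$$

The result has the same form as a single layer with weights $W'$ and bias $b'$, so any stack of purely linear layers can be replaced by one layer. A nonlinear $\sigma$ between the layers prevents this collapse.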

Sigmoid
- $\sigma(x) = \frac{1}{1 + e^{-x}}$
- It squashes each entry of the pre-activation vector $z = W^{(l)} a^{(l)} + b^{(l)}$ into a value between 0 and 1.
ReLU
- $\mathrm{ReLU}(x) = \max(0, x)$
- if $x < 0$, output = 0
- if $x \ge 0$, output = $x$
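A short sketch that prints both activation functions side by side for a few sample inputs (the sample points are arbitrary).

```python
import math


def sigmoid(x):
    """Logistic sigmoid: output always in (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def relu(x):
    """Rectified Linear Unit: 0 for negative x, x otherwise."""
    return max(0.0, x)


print(f"{'x':>5} {'sigmoid':>8} {'ReLU':>6}")
for x in [-4.0, -1.0, 0.0, 1.0, 4.0]:
    print(f"{x:>5} {sigmoid(x):>8.3f} {relu(x):>6}")
```

Sigmoid equals 0.5 at 0 and flattens toward 0 and 1 at the extremes, while ReLU passes positive values through unchanged and has no upper bound.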

Why Are Layers Useful?

Layers help the network turn low-level features into high-level features.
Each layer transforms the representation a little further, which makes the final classification easier.
In image recognition, earlier layers may capture edges or curves, while later layers combine them into shapes and finally into whole digits.
- Pixels → edges/curves → shapes → digit
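Stacking layers just means feeding each layer's output into the next. The sketch below runs one forward pass through a 784-16-16-10 network. It assumes sigmoid everywhere and uses random weights and biases as placeholders, so the output is meaningless until the network is trained; `init_layers` and `forward` are hypothetical names.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def init_layers(sizes, seed=0):
    """Create random (W, b) pairs for each pair of adjacent layers.

    Random values are placeholders: an untrained network's output is meaningless.
    """
    rng = random.Random(seed)
    layers = []
    for n_current, n_next in zip(sizes, sizes[1:]):
        W = [[rng.uniform(-1, 1) for _ in range(n_current)] for _ in range(n_next)]
        b = [rng.uniform(-1, 1) for _ in range(n_next)]
        layers.append((W, b))
    return layers


def forward(layers, a):
    """Feed activations through every layer: a <- sigmoid(W a + b)."""
    for W, b in layers:
        a = [sigmoid(sum(w * x for w, x in zip(row, a)) + b_j) for row, b_j in zip(W, b)]
    return a


layers = init_layers([784, 16, 16, 10])
pixels = [0.0] * 784                              # a blank 28 x 28 image, flattened
output = forward(layers, pixels)                  # 10 activations, one per digit
guess = max(range(10), key=lambda d: output[d])   # index of the most active output neuron
print(len(output), guess)
```

The whole network is therefore one function from 784 numbers to 10 numbers, built by composing layers.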

Example

Example 1: Layer-to-Layer Calculation

Each neuron in the next layer computes a weighted sum of all activations from the previous layer, adds a bias, and then applies an activation function.

From a 3-neuron layer to a 2-neuron layer

Layer to layer:
- Each connection between neurons has its own weight.
- Each neuron has its own bias.

Suppose

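Suppose the current layer has 3 neurons: $a^{(l)} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$
Weight matrix: $W^{(l)} = \begin{bmatrix} 0.1 & 0.2 & 0.3 \\ 0.4 & 0.5 & 0.6 \end{bmatrix}$
Bias vector: $b^{(l)} = \begin{bmatrix} 0.5 \\ -0.5 \end{bmatrix}$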

Step 1: Compute the linear part

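$z^{(l + 1)} = W^{(l)} a^{(l)} + b^{(l)}$
First do the matrix multiplication: $W^{(l)} a^{(l)} = \begin{bmatrix} 0.1 & 0.2 & 0.3 \\ 0.4 & 0.5 & 0.6 \end{bmatrix} \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$
- First row: $0.1 (1) + 0.2 (2) + 0.3 (3) = 0.1 + 0.4 + 0.9 = 1.4$
- Second row: $0.4 (1) + 0.5 (2) + 0.6 (3) = 0.4 + 1.0 + 1.8 = 3.2$
- So: $W^{(l)} a^{(l)} = \begin{bmatrix} 1.4 \\ 3.2 \end{bmatrix}$
Now add the bias: $z^{(l + 1)} = \begin{bmatrix} 1.4 \\ 3.2 \end{bmatrix} + \begin{bmatrix} 0.5 \\ -0.5 \end{bmatrix} = \begin{bmatrix} 1.9 \\ 2.7 \end{bmatrix}$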
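Step 1 can be checked with a few lines of plain Python, using exactly the numbers from this example.

```python
# Example 1, Step 1: z = W a + b with the numbers from the text.
W = [
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
]
a = [1.0, 2.0, 3.0]
b = [0.5, -0.5]

Wa = [sum(w * x for w, x in zip(row, a)) for row in W]  # matrix-vector product
z = [wa_j + b_j for wa_j, b_j in zip(Wa, b)]            # add the bias

# Round only for display: floating-point sums can carry tiny errors.
print([round(v, 10) for v in Wa])  # [1.4, 3.2]
print([round(v, 10) for v in z])   # [1.9, 2.7]
```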

Step 2: Apply the activation function

If we use sigmoid: $a^{(l + 1)} = \sigma(z^{(l + 1)})$
That means: $a^{(l + 1)} = \begin{bmatrix} \sigma(1.9) \\ \sigma(2.7) \end{bmatrix}$
Sigmoid formula: $\sigma(x) = \frac{1}{1 + e^{-x}}$
Approximate values:
- $\sigma(1.9) \approx 0.87$
- $\sigma(2.7) \approx 0.94$
So: $a^{(l + 1)} \approx \begin{bmatrix} 0.87 \\ 0.94 \end{bmatrix}$
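Step 2 in code: apply sigmoid to each entry of $z^{(l+1)}$. Both inputs are positive and small here, so the plain formula is safe from overflow.

```python
import math

z = [1.9, 2.7]
a_next = [1.0 / (1.0 + math.exp(-v)) for v in z]  # sigmoid, element-wise
print([round(v, 4) for v in a_next])  # more digits than the text shows
print([round(v, 2) for v in a_next])  # [0.87, 0.94]
```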

How to view each neuron separately

First neuron in the next layer

It uses the first row of the weight matrix:
- $z_{1} = 0.1 (1) + 0.2 (2) + 0.3 (3) + 0.5 = 1.9$
- $a_{1} = σ (1.9)$

Second neuron in the next layer

It uses the second row of the weight matrix:
- $z_{2} = 0.4 (1) + 0.5 (2) + 0.6 (3) - 0.5 = 2.7$
- $a_{2} = σ (2.7)$
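A quick check that the neuron-by-neuron view and the whole-layer matrix view are the same arithmetic, using the Example 1 numbers.

```python
import math

W = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
a = [1.0, 2.0, 3.0]
b = [0.5, -0.5]

# Neuron by neuron: neuron j uses row j of W and its own bias b[j].
per_neuron = []
for j in range(len(W)):
    z_j = sum(W[j][k] * a[k] for k in range(len(a))) + b[j]
    per_neuron.append(1.0 / (1.0 + math.exp(-z_j)))  # sigmoid; z_j is small here

# Whole layer at once, written as a single expression over the rows of W.
whole_layer = [1.0 / (1.0 + math.exp(-(sum(w * x for w, x in zip(row, a)) + b_j)))
               for row, b_j in zip(W, b)]

same = all(math.isclose(p, q) for p, q in zip(per_neuron, whole_layer))
print(same)  # True
```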

Example 2: Weight

Learning → finding the right weights and biases
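As a toy illustration of what one neuron's weights can encode (hand-picked, hypothetical values on a 1 × 6 row of pixels, not from the video): positive weights on two pixels and negative weights on their neighbors make the neuron respond to a bright stroke in that spot, and the bias sets how large the weighted sum must be before it turns on. Learning means finding such values automatically instead of choosing them by hand.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


# Hypothetical hand-picked weights for a 1 x 6 row of pixels (brightness in [0, 1]):
# reward brightness in pixels 2 and 3, penalize it in the pixels next to them.
weights = [0.0, -2.0, 3.0, 3.0, -2.0, 0.0]
bias = -3.0  # the weighted sum must exceed 3 before the activation rises above 0.5

inputs = {
    "stroke_here": [0.0, 0.0, 1.0, 1.0, 0.0, 0.0],       # bright stroke in the middle
    "stroke_elsewhere": [1.0, 1.0, 0.0, 0.0, 1.0, 1.0],  # bright pixels around it
    "all_bright": [1.0] * 6,
}

activations = {}
for name, pixels in inputs.items():
    z = sum(w * p for w, p in zip(weights, pixels)) + bias
    activations[name] = sigmoid(z)
    print(f"{name:17} z = {z:5.1f}  activation = {activations[name]:.3f}")
```

Here $z$ is 3.0, -7.0 and -1.0 respectively, so only the stroke in the middle produces a high activation.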
