Core Idea
Cost tells us how bad the current parameters are.
Gradient tells us how to change the parameters to make the cost smaller.
1. Single-Parameter Case
This is the simplest case, used only to illustrate the idea.

Suppose the cost function is

$$J(w) = (w - 3)^2$$

- Assume this cost function is smallest when $w = 3$
- $w$ is the parameter
- $J(w)$ is the cost
- the gradient is the derivative: $\frac{dJ}{dw} = 2(w - 3)$

If $w = 1$, then

$$J(1) = (1 - 3)^2 = 4$$

The derivative is $2(w - 3)$, so at $w = 1$, $\frac{dJ}{dw} = 2(1 - 3) = -4$.
This means:
- current cost = 4
- current gradient = -4
The update rule is

$$w \leftarrow w - \eta \frac{dJ}{dw}$$

where $\eta$ is the learning rate.

Since the current gradient is $-4$, the slope is negative at $w = 1$.
- A negative gradient means that increasing $w$ will reduce the cost.
- So gradient descent moves in the opposite direction of the gradient.
Using the update rule $w \leftarrow w - \eta \frac{dJ}{dw}$:
- if, for example, $\eta = 0.1$, then $w \leftarrow 1 - 0.1 \times (-4) = 1.4$

This means:
- the parameter moves from $w = 1$ to $w = 1.4$
- it moves closer to $w = 3$, which is the value that minimizes the cost
- after the update, the cost becomes smaller: $J(1.4) = (1.4 - 3)^2 = 2.56 < 4$
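The single-parameter example above can be sketched in a few lines of Python. This is a minimal sketch, assuming the cost is the quadratic $J(w) = (w - 3)^2$, which matches the numbers above (cost 4 and gradient $-4$ at $w = 1$), with a learning rate of 0.1:

```python
# Gradient descent on the one-parameter cost J(w) = (w - 3)^2.
# Its derivative is dJ/dw = 2 * (w - 3), and the minimum is at w = 3.

def cost(w):
    return (w - 3) ** 2

def gradient(w):
    return 2 * (w - 3)

w = 1.0    # starting parameter value
eta = 0.1  # learning rate (an illustrative choice)

print(cost(w), gradient(w))  # at w = 1: cost 4.0, gradient -4.0

for _ in range(100):
    w = w - eta * gradient(w)  # move opposite the gradient

print(round(w, 3))  # w has moved very close to the minimizer 3
```

Each step multiplies the distance to the minimizer by $1 - 2\eta = 0.8$, so the parameter approaches 3 geometrically.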
2. Multi-Parameter Case
This is the real neural network case.
- The cost depends on many parameters: $J(w_1, w_2, \dots, w_n)$
- So we use partial derivatives instead of a single ordinary derivative.
- A partial derivative measures how a function changes with respect to one variable while keeping all other variables fixed.
- The gradient vector is built one component at a time:
- $\frac{\partial J}{\partial w_1}$: change only $w_1$, keep others fixed
- $\frac{\partial J}{\partial w_2}$: change only $w_2$, keep others fixed
- $\frac{\partial J}{\partial w_n}$: change only $w_n$, keep others fixed

Then we stack them together:

$$\nabla J = \begin{bmatrix} \frac{\partial J}{\partial w_1} \\ \frac{\partial J}{\partial w_2} \\ \vdots \\ \frac{\partial J}{\partial w_n} \end{bmatrix}$$
At the current parameter values, the cost function gives a single number.
From this one cost function, we compute a gradient vector by measuring how that cost changes with respect to each parameter, one parameter at a time.
- cost = a scalar (a single numerical value): one number that measures how bad the current parameters are
- gradient = a vector: a collection of partial derivatives showing how that same cost changes with respect to each parameter
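The idea of building the gradient one partial derivative at a time can be shown numerically. A minimal sketch, assuming a made-up two-parameter cost $J(w_1, w_2) = w_1^2 + 3 w_2^2$ chosen only for illustration:

```python
# Estimate each partial derivative by nudging one parameter
# while keeping all the others fixed (central finite differences).

def cost(w):
    # Example cost with two parameters: J(w1, w2) = w1^2 + 3 * w2^2
    w1, w2 = w
    return w1 ** 2 + 3 * w2 ** 2

def numerical_gradient(f, w, h=1e-6):
    grad = []
    for i in range(len(w)):
        w_plus = list(w)
        w_plus[i] += h    # change only w[i] ...
        w_minus = list(w)
        w_minus[i] -= h   # ... keep the others fixed
        grad.append((f(w_plus) - f(w_minus)) / (2 * h))
    return grad  # the stacked vector of partial derivatives

g = numerical_gradient(cost, [1.0, 2.0])
print(g)  # analytically: dJ/dw1 = 2*w1 = 2, dJ/dw2 = 6*w2 = 12
```

The loop makes the "change one variable, hold the rest fixed" definition concrete: the gradient is just these per-parameter sensitivities stacked into one vector.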
3. Relationship Between Cost and Gradient
- cost tells us how bad the current parameters are
- gradient tells us how each parameter should change to reduce the cost
In short:
- cost = current error
- gradient = direction for improvement
4. Training Process
- Randomly initialize weights and biases
- Run a forward pass
- Compute the cost
- Compute the gradient for each parameter
- Update parameters using gradient descent
For all parameters together:

$$\mathbf{w} \leftarrow \mathbf{w} - \eta \, \nabla J(\mathbf{w})$$
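The five steps above can be sketched as a training loop. This is a minimal sketch on a toy one-neuron model $\hat{y} = wx + b$ with mean squared error; the data and learning rate are made up for illustration:

```python
import random

# Toy data generated from y = 2x + 1 (made up for illustration).
data = [(x, 2 * x + 1) for x in [0.0, 1.0, 2.0, 3.0]]

random.seed(0)
w, b = random.random(), random.random()  # 1. randomly initialize
eta = 0.05                               # learning rate

for step in range(500):
    # 2. forward pass + 3. cost (mean squared error, one scalar)
    cost = sum((w * x + b - y) ** 2 for x, y in data) / len(data)
    # 4. gradient: one partial derivative per parameter
    dw = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
    db = sum(2 * (w * x + b - y) for x, y in data) / len(data)
    # 5. gradient descent update: move opposite the gradient
    w, b = w - eta * dw, b - eta * db

print(round(w, 2), round(b, 2))  # should approach w = 2, b = 1
```

Real frameworks compute the gradients with backpropagation rather than hand-derived formulas, but the loop structure is the same.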
5. Important Distinction
Do not confuse parameter values with gradient values.
For example:
- parameter values: e.g. $w = 1$ from the example above
- gradient values: e.g. $\frac{dJ}{dw} = -4$ at that point
The parameters are the current values inside the model.
The gradient values tell us how sensitive the cost is to those parameters.
6. Final Summary
- single parameter → derivative
- multiple parameters → gradient vector
- Cost measures current performance.
- Gradient shows how to adjust the parameters to make the cost smaller.