Given inputs X1 = 0.5 and X2 = 0.3. The weights are denoted W1 through W6: W1–W4 connect the inputs to the two hidden neurons H1 and H2, while W5 and W6 connect the hidden neurons to the output O1.
Z1 = X1 * W1 + X2 * W3 = 0.47.
H1 = sigmoid(Z1) = 0.615.
Z2 = X1 * W2 + X2 * W4 = 0.582.
H2 = sigmoid(Z2) ≈ 0.642.
Z3 = H1 * W5 + H2 * W6 = 0.6.
O1 = sigmoid(Z3) = 0.645.
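As a quick check, here is a minimal Python sketch of this forward pass. The individual weight values are not given in the walkthrough, so the ones below are assumptions chosen so that the weighted sums roughly reproduce the numbers above (Z1 = 0.47, Z2 = 0.582, Z3 ≈ 0.6).

```python
import math

def sigmoid(z):
    """Logistic activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Inputs from the walkthrough.
x1, x2 = 0.5, 0.3

# Hypothetical weights (not given in the article), chosen so the
# weighted sums approximately match Z1 = 0.47, Z2 = 0.582, Z3 ≈ 0.6.
w1, w2, w3, w4 = 0.7, 0.9, 0.4, 0.44
w5, w6 = 0.5, 0.456

# Hidden layer.
z1 = x1 * w1 + x2 * w3          # 0.47
h1 = sigmoid(z1)                # ≈ 0.615
z2 = x1 * w2 + x2 * w4          # 0.582
h2 = sigmoid(z2)                # ≈ 0.642

# Output layer.
z3 = h1 * w5 + h2 * w6          # ≈ 0.6
o1 = sigmoid(z3)                # ≈ 0.645

print(z1, h1, z2, h2, z3, o1)
```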
Next, compute the Mean Squared Error (MSE) cost, where y is the target and ŷ is the network's prediction:
C = (1/n) * Σ (y - ŷ)^2
Assuming ŷ = 0.645 and C = 0.126 for a single output (n = 1), (y - 0.645)^2 = 0.126, so solving for y gives y ≈ 1.000 or y ≈ 0.290.
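For a single training example, the cost reduces to one squared difference. A small sketch, using the prediction above and assuming the target is y = 1.0:

```python
def mse(targets, predictions):
    """Mean squared error over paired targets and predictions."""
    n = len(targets)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(targets, predictions)) / n

# Single-sample case from the walkthrough; the target 1.0 is an assumption.
cost = mse([1.0], [0.645])
print(cost)   # (1.0 - 0.645)^2 ≈ 0.126
```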
In the backpropagation stage, you aim to update the weights of the neural network to minimize the cost function using the gradient descent algorithm. Here's a summary of the process:
Wnew = Wold - η * (∂C/∂W), where η is the learning rate.
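In code, this update is a one-liner applied to each weight. A minimal sketch with an assumed learning rate and an assumed gradient value (how the gradient itself is computed is covered next):

```python
def gradient_step(w_old, grad, lr=0.1):
    """One gradient-descent update: move the weight against the gradient of the cost."""
    return w_old - lr * grad

# Hypothetical numbers for illustration: a weight of 0.5 whose gradient is -0.02.
w_new = gradient_step(0.5, -0.02, lr=0.1)
print(w_new)   # 0.502 — the weight moves in the direction that lowers the cost
```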
The gradient of C with respect to W is computed using calculus, leveraging the chain rule. For instance, for a weight W5 in the output layer:
∂C/∂O1 = -(y - O1). (Strictly, differentiating C = (y - O1)^2 gives -2(y - O1); the derivation here follows the common convention C = ½ (y - O1)^2, which cancels the factor of 2.)
∂O1/∂Z3 = O1 * (1 - O1).
∂Z3/∂W5 = H1.
∂C/∂W5 = ∂C/∂O1 * ∂O1/∂Z3 * ∂Z3/∂W5 = -(y - O1) * O1 * (1 - O1) * H1.
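Plugging in the values from the walkthrough (O1 ≈ 0.645, H1 ≈ 0.615, with y = 1.0 as the assumed target) gives a concrete number for this gradient. A sketch, using the ½-convention derivative above and hypothetical values for W5 and the learning rate:

```python
# Values from the walkthrough; y = 1.0 is the assumed target.
y, o1, h1 = 1.0, 0.645, 0.615
w5 = 0.5          # hypothetical current value of W5
eta = 0.1         # hypothetical learning rate

# Chain rule, term by term.
dC_dO1 = -(y - o1)            # ≈ -0.355
dO1_dZ3 = o1 * (1 - o1)       # ≈ 0.229  (derivative of the sigmoid)
dZ3_dW5 = h1                  # ≈ 0.615

dC_dW5 = dC_dO1 * dO1_dZ3 * dZ3_dW5   # ≈ -0.050
w5_new = w5 - eta * dC_dW5             # ≈ 0.505
print(dC_dW5, w5_new)
```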
Differentiating the cost function with respect to the weights and biases tells you how much the cost would change if you made a small change in each parameter. Gradient descent uses exactly this information to adjust the weights in the direction that reduces the cost.
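That intuition can be checked numerically: nudge one weight by a tiny amount, recompute the cost, and compare the change to the analytic gradient. A minimal finite-difference sketch, reusing the hypothetical weight and activation values from the earlier snippets:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost(w5, y=1.0, h1=0.615, h2=0.642, w6=0.456):
    """Cost as a function of W5 only, holding everything else fixed.

    Uses the 1/2 convention so the analytic derivative has no factor of 2.
    """
    o1 = sigmoid(h1 * w5 + h2 * w6)
    return 0.5 * (y - o1) ** 2

w5, eps = 0.5, 1e-6
numeric_grad = (cost(w5 + eps) - cost(w5 - eps)) / (2 * eps)

# Analytic gradient from the chain rule above.
o1 = sigmoid(0.615 * w5 + 0.642 * 0.456)
analytic_grad = -(1.0 - o1) * o1 * (1 - o1) * 0.615

print(numeric_grad, analytic_grad)   # the two values agree to several decimal places
```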