Given inputs X1 = 0.5 and X2 = 0.3. The weights are denoted W1 through W6: W1–W4 connect the inputs to the two hidden neurons H1 and H2, while W5 and W6 connect the hidden neurons to the output O1.
Z1 = X1 * W1 + X2 * W3 = 0.47.
H1 = sigmoid(Z1) = 0.615.
Z2 = X1 * W2 + X2 * W4 = 0.582.
H2 = sigmoid(Z2) ≈ 0.642.
Z3 = H1 * W5 + H2 * W6 = 0.6.
O1 = sigmoid(Z3) = 0.645.
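As a quick check, here is a minimal Python sketch of this forward pass. The individual weight values are not given in the walkthrough, so the ones below are assumptions chosen so that the weighted sums roughly reproduce the numbers above (Z1 = 0.47, Z2 = 0.582, Z3 ≈ 0.6).

```python
import math

def sigmoid(z):
    """Logistic activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Inputs from the walkthrough.
x1, x2 = 0.5, 0.3

# Hypothetical weights (not given in the article), chosen so the
# weighted sums approximately match Z1 = 0.47, Z2 = 0.582, Z3 ≈ 0.6.
w1, w2, w3, w4 = 0.7, 0.9, 0.4, 0.44
w5, w6 = 0.5, 0.456

# Hidden layer.
z1 = x1 * w1 + x2 * w3          # 0.47
h1 = sigmoid(z1)                # ≈ 0.615
z2 = x1 * w2 + x2 * w4          # 0.582
h2 = sigmoid(z2)                # ≈ 0.642

# Output layer.
z3 = h1 * w5 + h2 * w6          # ≈ 0.6
o1 = sigmoid(z3)                # ≈ 0.645

print(z1, h1, z2, h2, z3, o1)
```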
Next, compute the Mean Squared Error (MSE) cost, where y is the target and ŷ is the network's prediction:
C = (1/n) * Σ (y - ŷ)^2
Assuming ŷ = 0.645 and C = 0.126 for a single output (n = 1), (y - 0.645)^2 = 0.126, so solving for y gives y ≈ 1.000 or y ≈ 0.290.
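For a single training example, the cost reduces to one squared difference. A small sketch, using the prediction above and assuming the target is y = 1.0:

```python
def mse(targets, predictions):
    """Mean squared error over paired targets and predictions."""
    n = len(targets)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(targets, predictions)) / n

# Single-sample case from the walkthrough; the target 1.0 is an assumption.
cost = mse([1.0], [0.645])
print(cost)   # (1.0 - 0.645)^2 ≈ 0.126
```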
In the backpropagation stage, you aim to update the weights of the neural network to minimize the cost function using the gradient descent algorithm. Here's a summary of the process:
Wnew = Wold - η * (∂C/∂W), where η is the learning rate.
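In code, this update is a one-liner applied to each weight. A minimal sketch with an assumed learning rate and an assumed gradient value (how the gradient itself is computed is covered next):

```python
def gradient_step(w_old, grad, lr=0.1):
    """One gradient-descent update: move the weight against the gradient of the cost."""
    return w_old - lr * grad

# Hypothetical numbers for illustration: a weight of 0.5 whose gradient is -0.02.
w_new = gradient_step(0.5, -0.02, lr=0.1)
print(w_new)   # 0.502 — the weight moves in the direction that lowers the cost
```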
The gradient of C with respect to W is computed using calculus, leveraging the chain rule. For instance, for a weight W5 in the output layer:
∂C/∂O1 = -(y - O1). (Strictly, differentiating C = (y - O1)^2 gives -2(y - O1); the derivation here follows the common convention C = ½ (y - O1)^2, which cancels the factor of 2.)
∂O1/∂Z3 = O1 * (1 - O1).
∂Z3/∂W5 = H1.
∂C/∂W5 = ∂C/∂O1 * ∂O1/∂Z3 * ∂Z3/∂W5 = -(y - O1) * O1 * (1 - O1) * H1.
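Plugging in the values from the walkthrough (O1 ≈ 0.645, H1 ≈ 0.615, with y = 1.0 as the assumed target) gives a concrete number for this gradient. A sketch, using the ½-convention derivative above and hypothetical values for W5 and the learning rate:

```python
# Values from the walkthrough; y = 1.0 is the assumed target.
y, o1, h1 = 1.0, 0.645, 0.615
w5 = 0.5          # hypothetical current value of W5
eta = 0.1         # hypothetical learning rate

# Chain rule, term by term.
dC_dO1 = -(y - o1)            # ≈ -0.355
dO1_dZ3 = o1 * (1 - o1)       # ≈ 0.229  (derivative of the sigmoid)
dZ3_dW5 = h1                  # ≈ 0.615

dC_dW5 = dC_dO1 * dO1_dZ3 * dZ3_dW5   # ≈ -0.050
w5_new = w5 - eta * dC_dW5             # ≈ 0.505
print(dC_dW5, w5_new)
```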
Differentiating the cost function with respect to the weights and biases tells you how much the cost would change if you made a small change in each parameter. Gradient descent uses exactly this information to adjust the weights in the direction that reduces the cost.
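That intuition can be checked numerically: nudge one weight by a tiny amount, recompute the cost, and compare the change to the analytic gradient. A minimal finite-difference sketch, reusing the hypothetical weight and activation values from the earlier snippets:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost(w5, y=1.0, h1=0.615, h2=0.642, w6=0.456):
    """Cost as a function of W5 only, holding everything else fixed.

    Uses the 1/2 convention so the analytic derivative has no factor of 2.
    """
    o1 = sigmoid(h1 * w5 + h2 * w6)
    return 0.5 * (y - o1) ** 2

w5, eps = 0.5, 1e-6
numeric_grad = (cost(w5 + eps) - cost(w5 - eps)) / (2 * eps)

# Analytic gradient from the chain rule above.
o1 = sigmoid(0.615 * w5 + 0.642 * 0.456)
analytic_grad = -(1.0 - o1) * o1 * (1 - o1) * 0.615

print(numeric_grad, analytic_grad)   # the two values agree to several decimal places
```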