In the MSE/SSE loss function, should it be y_true - y_pred or y_pred - y_true?

Something that's been bugging me: different sources teaching backpropagation define the MSE loss function differently. Some use y_true - y_pred, others y_pred - y_true.
The order affects whether a negative sign comes out after differentiating with respect to y_pred.

This makes me unsure whether the weight update should add or subtract dLoss/dWeight.
I'm confused about the math. Writing the difference either way gives the same loss, since it's squared, but presumably only one of them is actually correct in backpropagation, and I don't know which one or how to derive that.
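For example, this quick check (made-up toy values, NumPy) shows the two orderings really do give the same loss:

```python
import numpy as np

# Toy values, just for illustration
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.5, 1.5, 2.0])

# (a - b)^2 == (b - a)^2, so both orderings give the same MSE
mse_a = np.mean((y_true - y_pred) ** 2)
mse_b = np.mean((y_pred - y_true) ** 2)
print(mse_a, mse_b)  # prints the same number twice
```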

Hey, Han.

Have you revisited the chain rule?

Here’s a simplified example of what may be going on. Consider the functions J_0, J_1, g_0, g_1 from \mathbb R^2 to \mathbb R, defined by the following expressions for all real numbers x and y:

\begin{align} J_0(x,y) &= (x-y)^2\\ J_1(x,y) &= (y-x)^2\\ g_0(x,y) &=x-y\\ g_1(x,y) &=y-x \end{align}

Now let f(x) be x^2 for all real numbers x, and recall that f'(x)=2x for all x\in \mathbb R.

We can look at J_0 and J_1 as f\circ g_0 and f\circ g_1, respectively. Or, in other symbols:

\begin{align} J_0(x,y) &= f(g_0(x,y)),\\ J_1(x,y) &= f(g_1(x,y)), \end{align}

for all (x,y) in \mathbb R^2.

Now let us compute the partial derivatives of these functions with respect to the first variable:

\begin{align} \color{orange}{\frac{\partial J_0}{\partial x}(x,y)} &= f'(g_0(x,y))\dfrac{\partial g_0}{\partial x}(x,y) \tag{Chain rule}\\ &= 2g_0(x,y)\cdot 1\\ &= 2(x-y)\\ \end{align}
\begin{align} \frac{\partial J_1}{\partial x}(x,y) &= f'(g_1(x,y))\dfrac{\partial g_1}{\partial x}(x,y) \tag{Chain rule}\\ &= 2g_1(x,y)\cdot (-1)\\ &= 2(y-x)\cdot (-1)\\ &= 2(x-y)\\ &= \color{orange}{\frac{\partial J_0}{\partial x}(x,y)} \end{align}
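If you'd like to convince yourself numerically, here's a minimal sketch (plain Python with a central finite difference; the names are just for illustration, not anything from your code):

```python
def J0(x, y):
    return (x - y) ** 2

def J1(x, y):
    return (y - x) ** 2

def dJ_dx(J, x, y, h=1e-6):
    # Central finite-difference approximation of dJ/dx
    return (J(x + h, y) - J(x - h, y)) / (2 * h)

x, y = 3.0, 5.0
print(dJ_dx(J0, x, y))  # ~ 2*(x - y) = -4.0
print(dJ_dx(J1, x, y))  # same value: the inner derivative's -1 cancels the swapped sign
```

Whichever ordering you pick, the gradient you backpropagate is the same, so the usual update weight -= learning_rate * dLoss/dWeight works for both.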

Hope this helps.

Thanks, I get it now. I totally missed that differentiating the inner function in f(g(x,y)) produces another negative sign.
