Hello, guys! I would start a new thread, but I think it is worth using this one to ask another question about the consequences of using the fillna() method.
In this exercise:
We have to check whether filling the missing values of a column with its mean would change the column mean itself. In the proposed case, the mean stays the same. In the next part of the exercise, it was decided to drop the rows with missing values, because keeping them would affect the distribution.
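To make that concrete, here is a small sketch with made-up numbers (the column name `score` and the values are only assumptions for illustration, not the exercise data). The mean survives the fill, but the spread of the column shrinks, which I guess is the "distribution" effect the exercise refers to:

```python
import numpy as np
import pandas as pd

# Made-up values; "score" is a hypothetical column name
s = pd.Series([4.0, 6.0, np.nan, 8.0, np.nan, 2.0], name="score")

# Replace each missing value with the mean of the non-missing ones
filled = s.fillna(s.mean())

mean_before = s.mean()      # NaN values are skipped by default
mean_after = filled.mean()
std_before = s.std()
std_after = filled.std()

print("mean:", mean_before, "->", mean_after)
print("std: ", std_before, "->", std_after)
```

The filled values sit exactly on the mean, so they add nothing to the sum of squared deviations while increasing the number of rows. That is why the standard deviation goes down.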
But what if we decided to keep the values filled with the mean? I thought it would affect other kinds of analysis where we would like to compare the trend of specific values among different series.
For example, if I wanted to check the differences between the evolution of happiness scores in each country per year, I would end up with values that were originally missing but were replaced by the mean of the entire series. And this would be reflected in a plot with different lines representing the trend for each country.
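Here is a rough sketch of the situation I have in mind. The countries, years, and scores are invented, and the column names `country`, `year`, and `score` are assumptions. Filling by each country's own mean is shown only as one possible alternative for comparison, not as what the exercise does:

```python
import numpy as np
import pandas as pd

# Invented data: two countries with very different score levels
df = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "B"],
    "year": [2015, 2016, 2017, 2015, 2016, 2017],
    "score": [7.0, np.nan, 7.4, 3.0, 3.2, np.nan],
})

# Fill with the mean of the entire column (all countries together)
global_fill = df["score"].fillna(df["score"].mean())

# One possible alternative: fill with the mean of each country's own values
country_mean = df.groupby("country")["score"].transform("mean")
group_fill = df["score"].fillna(country_mean)

print(df.assign(global_fill=global_fill, group_fill=group_fill))
```

With the global fill, country A gets about 5.15 for 2016, between 7.0 and 7.4, so its line would show a sharp dip that never happened. Country B's line would show a jump in 2017 for the same reason.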
Is this question really worth our attention during a data cleaning process? Or is it a factor that in most cases won’t have any bad impact on our EDA?
Wish you all a nice weekend!