Unable to understand the content on visualizing missing data

Screen Link: https://app.dataquest.io/m/370/working-with-missing-data/5/visualizing-missing-data-with-plots

While learning how to visualize null values, and the correlation between columns that have null values, my entire attention went into understanding lines of code I was seeing for the very first time, like creating a triangular mask:

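    import numpy as np

    # create a triangular mask to avoid repeated values and make
    # the plot easier to read
    missing_corr = missing_corr.iloc[1:, :-1]  # drop the first row and the last column
    mask = np.triu(np.ones_like(missing_corr), k=1)  # 1 = hidden cell (strictly above the diagonal)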
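
For context, the snippet operates on a correlation matrix of missingness. Below is a minimal sketch of how such a matrix can be built, assuming a small hypothetical DataFrame `df` (the column names and values are made up for illustration). Each column is turned into a True/False "is null" indicator with `isnull()`, and `corr()` then measures how strongly nulls in one column coincide with nulls in another.

```python
import numpy as np
import pandas as pd

# Hypothetical data: NaN marks a missing value
df = pd.DataFrame({
    "vehicle_1": [1.0, 2.0, np.nan, 4.0, np.nan, 6.0],
    "vehicle_2": [1.0, np.nan, np.nan, 4.0, np.nan, 6.0],
    "vehicle_3": [np.nan, 2.0, np.nan, 4.0, 5.0, 6.0],
})

# Keep only the columns that have at least one missing value
cols_with_missing = df.columns[df.isnull().sum() > 0]

# Correlate the boolean null indicators (True = missing).
# The result is square and symmetric, with 1.0 on the diagonal.
missing_corr = df[cols_with_missing].isnull().corr()
print(missing_corr.round(2))
```

Because the matrix is symmetric, every off-diagonal value appears twice, which is exactly what the triangular mask is meant to hide.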

Can someone please help me understand how I should approach the core concept, and whether concepts like the triangular mask will be covered in the future?


Hi @hora.amit,

This is a mistake on our end; we haven't introduced this term before. I will get this issue logged. A triangular mask basically hides the repeated values in a correlation table:

https://app.dataquest.io/m/370/working-with-missing-data/5/visualizing-missing-data-with-plots

Without it, the values in the lower triangle would be repeated in the upper triangle.
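
To see what the mask itself contains, here is a small sketch that uses a hypothetical 4×4 matrix of ones in place of a real correlation matrix. `np.triu(..., k=1)` keeps the ones strictly above the main diagonal and sets everything else to zero. When the result is passed as `mask=` to seaborn's `heatmap`, cells where the mask is nonzero are not drawn, so only the diagonal and the lower triangle remain visible.

```python
import numpy as np

# Stand-in for a 4x4 correlation matrix (only its shape matters for the mask)
corr_like = np.ones((4, 4))

# k=1 starts one diagonal above the main one, so the main diagonal stays 0
mask = np.triu(np.ones_like(corr_like), k=1)
print(mask)

# Cells where mask == 0 are the ones a heatmap would still display
visible = mask == 0
print(visible.sum(), "of", visible.size, "cells visible")  # prints: 10 of 16 cells visible
```

With the default `k=0` the diagonal would be hidden as well; the lesson keeps `k=1` and avoids the 1.0 self-correlations by trimming the first row and last column instead.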

Best,
Sahil


Can you please explain why we are doing this:

    missing_corr = missing_corr.iloc[1:, :-1]

It seems like we are excluding the 1st row and last column. But why?


Same here, very confused with this step :S

Hi @priyankamaran.e24, @NicoGuglielmo,

We are excluding them to avoid plotting correlations of a column with itself, which are always 1.0. Here is what the plot would look like if we didn't exclude the first row and last column.

To make it easier to understand what is going on, let's print a subset of the correlation DataFrame:
    print(missing_corr.iloc[0:5, 0:5])

               vehicle_1  vehicle_2  vehicle_3  vehicle_4  vehicle_5
    vehicle_1   1.000000   0.151516   0.019972   0.008732   0.004425
    vehicle_2   0.151516   1.000000   0.131813   0.057631   0.029208
    vehicle_3   0.019972   0.131813   1.000000   0.437214   0.221585
    vehicle_4   0.008732   0.057631   0.437214   1.000000   0.506810
    vehicle_5   0.004425   0.029208   0.221585   0.506810   1.000000

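In the table above, you will notice that the first row is the same as the first column and the last row is the same as the last column, because a correlation matrix is symmetric. Once the upper triangle is masked, the first row would contain only one visible cell, vehicle_1 against itself (1.0), and the last column would likewise contain only the last column against itself (1.0). Removing the first row and the last column drops those uninformative cells, and the mask then hides the remaining duplicates above the diagonal: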

               vehicle_1  vehicle_2  vehicle_3  vehicle_4
    vehicle_2   0.151516   1.000000   0.131813   0.057631
    vehicle_3   0.019972   0.131813   1.000000   0.437214
    vehicle_4   0.008732   0.057631   0.437214   1.000000
    vehicle_5   0.004425   0.029208   0.221585   0.506810
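
The effect of the two steps can be checked directly on the 5×5 values printed above. This sketch rebuilds that subset as a DataFrame and applies the same `np.triu(..., k=1)` mask to both the full and the trimmed matrix. `visible_cells` is a hypothetical helper (not part of the lesson) that replaces masked cells with NaN, so only what a heatmap would draw remains.

```python
import numpy as np
import pandas as pd

names = ["vehicle_1", "vehicle_2", "vehicle_3", "vehicle_4", "vehicle_5"]
values = [
    [1.000000, 0.151516, 0.019972, 0.008732, 0.004425],
    [0.151516, 1.000000, 0.131813, 0.057631, 0.029208],
    [0.019972, 0.131813, 1.000000, 0.437214, 0.221585],
    [0.008732, 0.057631, 0.437214, 1.000000, 0.506810],
    [0.004425, 0.029208, 0.221585, 0.506810, 1.000000],
]
missing_corr = pd.DataFrame(values, index=names, columns=names)

def visible_cells(corr):
    """Hypothetical helper: keep only the cells an upper-triangle (k=1) mask leaves visible."""
    mask = np.triu(np.ones_like(corr), k=1).astype(bool)
    return corr.where(~mask)  # masked cells become NaN

# Without trimming: the first row and last column each show only a self-correlation (1.0)
full = visible_cells(missing_corr)
print(full.iloc[0].dropna())
print(full.iloc[:, -1].dropna())

# With trimming: every visible cell pairs two different columns
trimmed = visible_cells(missing_corr.iloc[1:, :-1])
print(trimmed)
```

In the trimmed version the 10 visible cells are exactly the 10 distinct pairs among the five columns (5 × 4 / 2), with no self-correlations left.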

Best,
Sahil


Can you please answer this question?