Working with Missing Data - 9. Missing Data in the "Location" Columns

In the worked example the following code is used:
sorted_location_data = location_data.sort_values(loc_cols)
plot_null_matrix(sorted_location_data)

I tried to run plot_null_matrix outside of the DataQuest platform and got NameError: name 'plot_null_matrix' is not defined. I searched for another example of this function and couldn’t find anything, so I tried the following:

sns.heatmap(sorted_location_data.isnull(), cbar=False)

This appears to produce the same result.

Also, for anyone interested, there’s a library called missingno with a few more useful features, such as a bar chart summarizing the missing values in each column and a dendrogram generated from the correlation of missing value locations.
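For readers who want the numbers behind those two plots without installing anything extra, here is a minimal pandas-only sketch. The toy DataFrame and its column names are hypothetical stand-ins for the mission's location columns, and the missingno calls are shown only as comments, on the assumption that the library is installed.

```python
import pandas as pd

# Hypothetical toy data; not the mission's dataset.
df = pd.DataFrame({
    "borough": ["BRONX", None, "QUEENS", None],
    "on_street": ["A ST", None, None, "D ST"],
    "off_street": [None, "B AVE", "C AVE", None],
})

# Number of missing values in each column.
null_counts = df.isnull().sum()

# Correlation between the missing-value patterns of each pair of columns.
# Values near 1 mean the columns tend to be missing together;
# values near -1 mean one is usually present when the other is missing.
null_corr = df.isnull().astype(int).corr()

print(null_counts)
print(null_corr.round(2))

# With missingno installed, the plots mentioned above would be roughly:
# import missingno as msno
# msno.bar(df)
# msno.dendrogram(df)
```

In this toy data, on_street and off_street are never missing in the same row, which is the kind of relationship the dendrogram is meant to surface.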

Hi @otto.roberson,

Because we defined the plot_null_matrix() function in the mission backend, its code isn’t visible from within the mission. Here is the code for that function:

import matplotlib.pyplot as plt
import seaborn as sns

def plot_null_matrix(df, figsize=(18,15)):
    # initiate the figure
    plt.figure(figsize=figsize)
    # create a boolean dataframe based on whether values are null
    df_null = df.isnull()
    # create a heatmap of the inverted boolean dataframe (True = value present)
    sns.heatmap(~df_null, cbar=False, yticklabels=False)
    # rotate the column labels so long names stay readable
    plt.xticks(rotation=90, size='x-large')
    plt.show()
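One detail worth noting when comparing this function with the sns.heatmap(sorted_location_data.isnull(), cbar=False) one-liner earlier in the thread: the function plots ~df_null, while the one-liner plots the null mask itself. The pattern of missing data is the same, but the two colors should come out swapped. A small sketch of the relationship between the two masks, using a hypothetical toy frame:

```python
import pandas as pd

# Hypothetical toy frame; the column names are illustrative only.
df = pd.DataFrame({
    "latitude": [40.7, None, 40.8],
    "longitude": [None, None, -73.9],
})

null_mask = df.isnull()     # True where a value is missing (what the one-liner plots)
present_mask = ~null_mask   # True where a value is present (what plot_null_matrix plots)

# Every cell is True in exactly one of the two masks.
masks_are_complements = bool((null_mask ^ present_mask).all().all())
print(masks_are_complements)
print(int(null_mask.sum().sum()), int(present_mask.sum().sum()))
```

The function also passes yticklabels=False, so the row index labels shown by the one-liner do not appear in the platform's plot.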

Best,
Sahil

Thanks Sahil for sharing the function!

I usually try to run the code in the worked examples, as it keeps me actively participating and practicing, and I find that it helps me understand and retain the material. As constructive feedback, it might be useful to update the mission so that the code is included, to avoid any confusion about where the results come from. In this case, though, I was able to reproduce a similar result through my own research.
