What is null correlation?

Screen url: https://app.dataquest.io/m/370/working-with-missing-data/5/visualizing-missing-data-with-plots

As far as I know, correlation is computed between two numerical columns. However, columns like vehicle_1, vehicle_2, … and cause_vehicle_1 all contain nominal data. How, then, is it possible to calculate a correlation for nominal data?

Also, what is a null correlation? I’m familiar with “correlation” but have never heard of “null correlation”.

Update: While searching the web for “null correlation”, I came across the “nullity correlation matrix”. Can someone please explain what it means and how to interpret it?

Hello @prateek, which mission are you referring to?
Kindly provide us with the link.

I’ve updated the question; please take a look.

@prateek

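The correlation is not calculated on the nominal values themselves but on whether each value is missing. For that purpose, False equals 0 and True equals 1.

missing_corr = df[cols_with_missing_vals].isnull()

The above dataframe contains only True and False values, so the correlation is computed on numbers after all.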
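
To make that concrete, here is a minimal sketch of the whole step. The toy dataframe and its values are made up for illustration (only the column names echo the ones in the question), and the explicit astype(int) is there just to make the True/False to 1/0 mapping visible.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the values are invented, only the column names
# mirror the ones mentioned in the question.
df = pd.DataFrame({
    "vehicle_3": ["car", np.nan, np.nan, "taxi", np.nan, "van"],
    "cause_vehicle_3": ["speeding", np.nan, np.nan, "fatigue", np.nan, "other"],
    "location": ["a", "b", np.nan, "d", "e", "f"],
})

# Keep only the columns that have at least one missing value.
cols_with_missing_vals = df.columns[df.isnull().sum() > 0]

# True where a value is missing, False where it is present.
null_flags = df[cols_with_missing_vals].isnull()

# Treat True as 1 and False as 0, then compute ordinary Pearson correlation
# between the "is missing" indicators of each pair of columns.
missing_corr = null_flags.astype(int).corr()

print(null_flags)
print(missing_corr.round(2))
```

In this toy data vehicle_3 and cause_vehicle_3 are missing in exactly the same rows, so their nullity correlation comes out as 1, while location goes missing independently of them and lands somewhere in between.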

Thanks, that makes sense.
Can you shed some light on how to interpret the null correlation matrix like the one below?

Specifically, what does it mean when we say:

  • the nullity correlation is very close to 0.
  • the nullity correlation is very close to 1.

hi @prateek
I am not @Sahil by the way. Similarly we can say this is not a nullity correlation.

If you can understand it from this, great! If not, no worries, me neither 😛 Please take it up again once you understand matrices and vectors; they should come up later in the DS track.

I also found a twin of your question here

Coming to interpreting the data (not sure what’s the reference for the nullity correlation though), it’s explained in the next mission itself.

In simplest terms, if vehicle_3 is missing, chances are high that the cause_vehicle_3 data is also missing, based on the 0.96 correlation.

Hope that helps.

@Rucha

Thanks for the reply.

Correct me if I am wrong.

  • A perfect positive correlation (r = 1) means that whenever one value is missing, the other is missing too.
  • A perfect negative correlation (r = -1) means that whenever one value is missing, the other is present.

Coming to interpreting the data (not sure what’s the reference for the nullity correlation though), it’s explained in the next mission itself.

Where is it explained? Can you provide me the link?

Perhaps the content of the same mission, page 6 - https://app.dataquest.io/m/370/working-with-missing-data/6/analyzing-correlations-in-missing-data

@Rucha

You didn’t comment on my conclusion:

Correct me if I am wrong.

  • A perfect positive correlation (r = 1) means that whenever one value is missing, the other is missing too.
  • A perfect negative correlation (r = -1) means that whenever one value is missing, the other is present.

Also, I still don’t understand what’s the purpose behind creating the following plot.

Can you explain it in easy terms?

Hi @prateek

Yup for the conclusion.

It’s just to help understand the correlation table better using visualization.

A dark blue box indicates a strong positive correlation, whereas a dark red box indicates a strong negative correlation. The light-colored boxes indicate a weaker correlation (positive or negative, depending on whether the tint is blue or red).

It’s just a visual aid to understand if the columns are somehow related to each other.
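
For anyone who wants to reproduce that kind of picture, here is a rough sketch. It is not the mission's own plotting code: it uses plain matplotlib with the "RdBu" colormap as a stand-in, and the missing-value flags are invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical "is missing" flags (1 = missing, 0 = present); values are made up.
null_flags = pd.DataFrame({
    "on_street":       [0, 1, 0, 1, 0, 1, 0, 1],
    "off_street":      [1, 0, 1, 0, 1, 0, 1, 1],
    "vehicle_3":       [1, 1, 0, 1, 1, 0, 1, 1],
    "cause_vehicle_3": [1, 1, 0, 1, 1, 0, 1, 1],
}).astype(bool)

# Nullity correlation matrix: Pearson correlation of the 0/1 flags.
missing_corr = null_flags.astype(int).corr()

fig, ax = plt.subplots(figsize=(5, 4))

# "RdBu" runs from red (low) through white to blue (high); pinning the range
# to [-1, 1] makes dark red mean strongly negative and dark blue strongly positive.
im = ax.imshow(missing_corr.to_numpy(), cmap="RdBu", vmin=-1, vmax=1)

ax.set_xticks(range(len(missing_corr.columns)))
ax.set_xticklabels(missing_corr.columns, rotation=45, ha="right")
ax.set_yticks(range(len(missing_corr.index)))
ax.set_yticklabels(missing_corr.index)

# Write the correlation value inside each box.
for i in range(missing_corr.shape[0]):
    for j in range(missing_corr.shape[1]):
        ax.text(j, i, f"{missing_corr.iloc[i, j]:.2f}", ha="center", va="center")

fig.colorbar(im, ax=ax, label="nullity correlation")
fig.tight_layout()
```

Call fig.savefig with a file name of your choice (or drop the Agg line and use plt.show()) to actually look at the result.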

Consider the off_street and on_street columns, which show a -0.99 correlation for missing values.
The dark red box helps us understand that if the collision happened off the street, we won’t find data for the on_street column: the value will be present for off_street but not for on_street.
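
A tiny made-up example of that pattern, in case it helps. The street names are invented, and every row has exactly one of the two columns filled in, which is the idealised version of the -0.99 case.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: each collision has either an on_street or an off_street
# value, never both (the street names are invented).
streets = pd.DataFrame({
    "on_street":  ["MAIN ST", np.nan, "5 AVE", np.nan, "BROADWAY", np.nan],
    "off_street": [np.nan, "12 ELM RD", np.nan, "7 OAK LN", np.nan, "3 PINE CT"],
})

# 1 where the value is missing, 0 where it is present.
nullity = streets.isnull().astype(int)

# Pearson correlation between the two "is missing" indicators.
r = nullity["on_street"].corr(nullity["off_street"])
print(round(r, 2))
```

Because the two indicator columns are exact opposites here, r is -1 (up to floating-point rounding). In the real data a few rows break the pattern, which is why the plot shows -0.99 instead.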