As far as I know, correlation is computed between two numerical columns. However, columns like vehicle_1, vehicle_2… and cause_vehicle_1 all contain nominal data. How, then, is it possible to calculate a correlation for nominal data?
Also, what is a null correlation? I’m familiar with “correlation” but have never heard of “null correlation”.
Update: While searching the web for “null correlation”, I came across the “nullity correlation matrix”. Can someone please explain what it means and how to interpret it?
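The heatmap is just a visualization that makes the nullity correlation table easier to read. The correlation is computed on whether each value is missing or present, not on the values themselves, which is why it also works for nominal columns.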
A dark blue box indicates a strong positive correlation, whereas a dark red box indicates a strong negative correlation. Lighter boxes indicate a weaker correlation, with the hue (blue or red) still showing the sign.
It’s just a visual aid to understand if the columns are somehow related to each other.
Consider the off_street and on_street columns showing -0.99 correlation for missing values.
The dark red box helps us understand that if the collision happened off the street, we won’t find data for the on_street column: the value will be present in the off_street column but missing in on_street.
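To make this concrete, here is a minimal sketch of how a nullity correlation can be computed by hand with pandas. The toy rows below are invented for illustration (only the column names come from the question), and I'm assuming the matrix you found is built the same way: turn each column into a missing/present indicator, then take the ordinary Pearson correlation of those indicators.

```python
import pandas as pd

# Hypothetical toy data: column names echo the question, values are made up.
df = pd.DataFrame({
    "on_street": ["MAIN ST", None, "BROADWAY", None, "5 AVE", None],
    "off_street": [None, "12 PARK LOT", None, "8 DEPOT RD", None, "3 MALL DR"],
    "vehicle_1": ["SEDAN", "TAXI", None, "BUS", "SEDAN", None],
    "cause_vehicle_1": ["SPEEDING", "UNSPECIFIED", None, "FATIGUE", "SPEEDING", None],
})

# Step 1: replace every cell with an indicator: 1 if missing, 0 if present.
# The nominal values themselves are discarded at this point.
missing = df.isna().astype(int)

# Step 2: ordinary Pearson correlation between the 0/1 indicator columns.
nullity_corr = missing.corr()

print(nullity_corr.round(2))
```

In this toy table on_street and off_street are never filled in together, so their nullity correlation comes out at -1, while vehicle_1 and cause_vehicle_1 are always missing together, so theirs is +1. Real data is rarely that clean, which is how you end up with values like -0.99.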