Unable to understand pandas.crosstab()


I was going through the https://app.dataquest.io/m/26/clustering-basics/7/exploring-the-clusters chapter, which talks about the crosstab() function. The crosstab() of
is_smoker = [0,1,1,0,0,1]

and

has_lung_cancer = [1,0,1,0,1,0]

is

has_lung_cancer  0  1
smoker
0                1  2
1                2  1

I’m unable to understand the concept. Can anyone explain this or give a better example?

Thank you for your question. To give a better intuition for crosstab(), I wrote some Python code and converted the data from your question into a pandas DataFrame.


crosstab() computes the cross-frequency of every combination of values. If you look at the data in the Lung Cancer and Smoker columns, each column has two unique values, 0 and 1. The crosstab() function counts how many rows have the pair (0, 0), that is, 0 in both columns at the same position, and then does the same for (0, 1), (1, 0) and (1, 1). For example, the pair (smoker = 0, has_lung_cancer = 1) appears in the first and fifth rows, so that cell of the table is 2. You can do this for more than two columns as well.
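Here is a minimal sketch of that idea, using the two lists from the question. The variable names (df, table, pair_counts) are just my own choices for illustration. The manual count with Counter is not how pandas does it internally; it is only there to show that crosstab() is counting pairs.

```python
from collections import Counter

import pandas as pd

# Data from the question, one row per person
df = pd.DataFrame({
    "smoker": [0, 1, 1, 0, 0, 1],
    "has_lung_cancer": [1, 0, 1, 0, 1, 0],
})

# Rows of the result come from the first argument, columns from the second
table = pd.crosstab(df["smoker"], df["has_lung_cancer"])
print(table)

# The same counts done by hand: pair up the values row by row and count each pair
pair_counts = Counter(zip(df["smoker"], df["has_lung_cancer"]))
for (smoker, cancer), count in sorted(pair_counts.items()):
    print(f"smoker={smoker}, has_lung_cancer={cancer}: {count}")

# Each cell of the table is the count of one (smoker, has_lung_cancer) pair
print(table.loc[0, 1])  # non-smokers with lung cancer
```

table.loc[0, 1] reads the cell in row smoker = 0 and column has_lung_cancer = 1, which is the 2 in the top-right corner of the table in the question.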


Thank you so much for the great explanation. Helped me understand well.


@siddhantjawa18, you are welcome.