Entropy formula


Please explain the 2/5 and 3/5 in the entropy formula. Also, for prob_0 and prob_1 in the solution, why is it .shape[0] / income.shape[0]?

Hi vroomvroom

We use the data from the previous screen.
As per the entropy formula, we iterate through each unique value in the high_income column. In this case the unique values are 0 and 1.

In that data, 2 of the 5 rows have high_income equal to 0 and 3 of the 5 rows have it equal to 1.

Hence, the probability of 0 is 2/5,
and the probability of 1 is 3/5.
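
To see how these two probabilities feed into the formula, here is a minimal sketch. It assumes the base-2 form of entropy, -sum(p * log2(p)), and the variable names are only illustrative:

```python
import math

# Class probabilities from the example: 2 of 5 rows are 0, 3 of 5 rows are 1.
probabilities = [2 / 5, 3 / 5]

# Assumed base-2 entropy: sum p * log2(p) over the unique values, then negate.
entropy = -sum(p * math.log2(p) for p in probabilities)

print(round(entropy, 4))
```

With these values the entropy comes out to roughly 0.971.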

In the solution as well,
prob_0 is the probability of 0 in the high_income column of the income dataset.
prob_1 is the probability of 1 in the high_income column of the income dataset.

income[income["high_income"] == 0] will return all the rows where high_income is 0.
income[income["high_income"] == 0].shape[0] is the number of rows in which high_income is 0.

income.shape[0] is the number of rows in the income dataset.

So income[income["high_income"] == 0].shape[0] / income.shape[0] gives us prob_0, which is (number of rows with 0 as the high_income value) / (number of rows in the dataset).
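
You can try this on a tiny example. The DataFrame below is a made-up stand-in for the income dataset (only the 2-to-3 split of 0s and 1s matters), so treat it as a sketch rather than the actual data:

```python
import pandas as pd

# Hypothetical stand-in for the income data: 2 rows with 0, 3 rows with 1.
income = pd.DataFrame({"high_income": [1, 1, 0, 0, 1]})

# Boolean filtering keeps only the rows where high_income is 0.
rows_with_0 = income[income["high_income"] == 0]

# shape is (rows, columns), so shape[0] is the row count.
prob_0 = rows_with_0.shape[0] / income.shape[0]
prob_1 = income[income["high_income"] == 1].shape[0] / income.shape[0]

print(prob_0, prob_1)
```

Here prob_0 is 2 / 5 and prob_1 is 3 / 5, the same fractions that appear in the entropy formula.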

Is it clear now?
