Screen Link:
Please explain the 2/5 and 3/5 in the entropy formula. Also, for prob_0 and prob_1 in the solution, why is it .shape[0] / income.shape[0]?
Hi vroomvroom
From the previous screen, we have the following data:
As per the entropy formula, we iterate through each unique value in the high_income column. In this case, the unique values are 0 and 1. Hence, the probability of 0 would be 2/5 and the probability of 1 would be 3/5, since 0 appears in 2 of the 5 rows and 1 appears in the other 3.
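To see where those two fractions go in the formula, here is a minimal sketch that plugs them in by hand. It assumes the entropy is measured with a base-2 logarithm; the variable names p0, p1, and entropy are only for illustration and are not from the solution.

```python
import math

# Proportions quoted above: two 0s and three 1s out of five rows.
p0 = 2 / 5
p1 = 3 / 5

# Entropy: minus the sum of p * log2(p) over each unique value (assumes base 2).
entropy = -(p0 * math.log2(p0) + p1 * math.log2(p1))
print(round(entropy, 4))  # 0.971
```

Each unique value contributes one p * log2(p) term, which is why there is one term for 2/5 and one for 3/5.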
In the solution also, prob_0 is the probability of 0 in the high_income column of the income dataset.
prob_1 is the probability of 1 in the high_income column of the income dataset.
income[income["high_income"] == 0] will return all the rows where high_income is 0.
income[income["high_income"] == 0].shape[0] is the number of rows in which high_income is 0.
income.shape[0] is the number of rows in the income dataset.
So, income[income["high_income"] == 0].shape[0] / income.shape[0] will give us prob_0, which is (number of rows with 0 as the high_income value) / (number of rows in the dataset).
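The row-count division can be tried on a tiny example. This is a rough sketch, not the course solution: the five-row DataFrame below is made up so that it has two 0s and three 1s, and the names rows_with_0 and probs are hypothetical.

```python
import pandas as pd

# Made-up stand-in for the income dataset: two 0s and three 1s.
income = pd.DataFrame({"high_income": [1, 1, 0, 0, 1]})

# Filter the rows, then count them with .shape[0].
rows_with_0 = income[income["high_income"] == 0]
prob_0 = rows_with_0.shape[0] / income.shape[0]
prob_1 = income[income["high_income"] == 1].shape[0] / income.shape[0]
print(prob_0, prob_1)  # 0.4 0.6

# Cross-check: value_counts(normalize=True) returns the same proportions.
probs = income["high_income"].value_counts(normalize=True)
```

Here .shape is a (rows, columns) tuple, so .shape[0] picks out the row count in both the filtered frame and the full one.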
Is it clear now?
Thanks.