# Entropy formula

Please explain the 2/5 and 3/5 in the entropy formula. Also, for `prob_0` and `prob_1` in the solution, why is it

```
income[income["high_income"] == 0].shape[0] / income.shape[0]
```

Using the data from the previous screen: the entropy formula iterates over each unique value in the `high_income` column. In this case the unique values are `0` and `1`.

Hence, the probability of `0` is `2/5` (the fraction of rows where `high_income` is `0`),
and the probability of `1` is `3/5`.
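
To make the arithmetic concrete, here is how those two probabilities plug into the entropy formula. This assumes the base-2 logarithm that is usual for decision trees, and the symbol $H$ is just a label used here for the entropy.

$$
H = -\sum_{i} P(x_i)\log_2 P(x_i) = -\left(\frac{2}{5}\log_2\frac{2}{5} + \frac{3}{5}\log_2\frac{3}{5}\right) \approx 0.971
$$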

The solution does the same thing:
`prob_0` is the probability of `0` in the `high_income` column of the `income` dataset.
`prob_1` is the probability of `1` in the `high_income` column of the `income` dataset.

`income[income["high_income"] == 0]` returns all the rows where `high_income` is `0`.
`income[income["high_income"] == 0].shape[0]` is the number of rows in which `high_income` is `0`. (`.shape` is a `(rows, columns)` tuple, so `[0]` picks out the row count.)

`income.shape[0]` is the number of rows in the `income` dataset.

So, `income[income["high_income"] == 0].shape[0] / income.shape[0]` gives us `prob_0`, which is (number of rows with `0` as the `high_income` value) / (number of rows in the dataset).
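
If it helps, here is a minimal sketch you can run yourself. The five-row `income` DataFrame below is a made-up stand-in (two `0`s and three `1`s, chosen to match the 2/5 and 3/5 above), not the real dataset, and the base-2 logarithm is an assumption.

```python
import math

import pandas as pd

# Hypothetical stand-in for the `income` dataset: two 0s and three 1s.
income = pd.DataFrame({"high_income": [1, 1, 0, 0, 1]})

# .shape is a (rows, columns) tuple, so [0] gives the number of rows.
total_rows = income.shape[0]

# Filter the rows for each value, count them, and divide by the total.
prob_0 = income[income["high_income"] == 0].shape[0] / total_rows
prob_1 = income[income["high_income"] == 1].shape[0] / total_rows

# Entropy from the two probabilities (base-2 logarithm assumed).
entropy = -(prob_0 * math.log(prob_0, 2) + prob_1 * math.log(prob_1, 2))

print(prob_0, prob_1)     # 0.4 0.6
print(round(entropy, 3))  # 0.971
```

Dropping the `[0]` would try to divide one tuple by another, which raises a `TypeError`.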

Is it clear now?
Thanks.