Question about the notation in the Information Gain formula

Hi guys,

So I’m learning about decision trees, and to pick the feature to split on, we find the one with the highest information gain. The formula for information gain is this:

$$IG(T, A) = Entropy(T) - \sum_{v \in A} \frac{|T_v|}{|T|} \cdot Entropy(T_v)$$

As explained in the DQ course:

We’re computing information gain (IG) for a given target variable (T), as well as a given variable we want to split on (A).

To compute it, we first calculate the entropy for T. Then, for each unique value v in the variable A, we compute the number of rows in which A takes on the value v, and divide it by the total number of rows. Next, we multiply the results by the entropy of the rows where A is v. We add all of these subset entropies together, then subtract from the overall entropy to get information gain.
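To make those steps concrete, here is a rough Python sketch of how I read the description. The function names (`entropy`, `information_gain`) and the toy `rows` data are made up for illustration and are not from the course. Note that the weight is simply the number of rows where A equals v divided by the total number of rows, and the entropy of each subset is computed on the target values of those rows.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(rows, target, feature):
    """Entropy of the target minus the weighted entropy of each subset."""
    all_labels = [row[target] for row in rows]
    # Group the target labels by the value the split feature takes (the rows where A is v).
    subsets = {}
    for row in rows:
        subsets.setdefault(row[feature], []).append(row[target])
    # Weight each subset's entropy by (rows in subset) / (total rows).
    weighted = sum(len(s) / len(all_labels) * entropy(s) for s in subsets.values())
    return entropy(all_labels) - weighted

# Hypothetical toy data: "outlook" separates "play" perfectly, "windy" tells us nothing.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "yes"},
]
ig_outlook = information_gain(rows, "play", "outlook")
ig_windy = information_gain(rows, "play", "windy")
print(ig_outlook, ig_windy)
```

In this toy data, splitting on "outlook" leaves only pure subsets, so its gain should equal the full entropy of the target, while "windy" leaves each subset as mixed as the whole table, so its gain should be zero.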

Also, @the_doctor did a great job explaining it with an example here.

Here comes my question:

First, I’m no expert on mathematical notation. I’m confused by the part that describes the weight in the formula, |Tv|/|T|. Using the notation for the target T here isn’t very intuitive to me. To my understanding, |Av|/|A| would be more appropriate.

I would really appreciate some clarification on this one. Thanks in advance!

Hi @veratsien,

I was also confused by the way entropy and IG are explained in this mission. While searching for related information, I found what is, in my opinion, a better explanation here

I hope this info is helpful to you.



@Daniel_H Thank you for sharing! This is a really good read and additional learning material on this module. You should share it under the mission screen tag in the Resources category!