Hi guys,
So I’m learning about decision trees, and to pick the feature to split on, we find the one with the highest information gain. The formula for information gain is:

$$IG(T, A) = \text{Entropy}(T) - \sum_{v \in A} \frac{|T_v|}{|T|} \cdot \text{Entropy}(T_v)$$
As explained in the DQ course:
We’re computing information gain (IG) for a given target variable (T), as well as a given variable we want to split on (A).
–
To compute it, we first calculate the entropy for T. Then, for each unique value v in the variable A, we count the number of rows in which A takes on the value v and divide it by the total number of rows. Next, we multiply that fraction by the entropy of the rows where A is v. We add all of these weighted subset entropies together, then subtract the sum from the overall entropy to get the information gain.
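To make sure I follow the steps, here’s a minimal sketch in Python of how I’d compute it. The DataFrame, column names, and helper functions are just made up for illustration, not the course’s actual code:

```python
import numpy as np
import pandas as pd

def entropy(target):
    # Fraction of rows belonging to each unique class of the target
    probs = target.value_counts(normalize=True)
    return -(probs * np.log2(probs)).sum()

def information_gain(df, split_col, target_col):
    # Entropy of the full target column T
    total_entropy = entropy(df[target_col])
    weighted_subset_entropy = 0.0
    for v in df[split_col].unique():
        subset = df[df[split_col] == v]   # rows where A takes the value v
        weight = len(subset) / len(df)    # the |T_v| / |T| term
        weighted_subset_entropy += weight * entropy(subset[target_col])
    # IG = overall entropy minus the weighted sum of subset entropies
    return total_entropy - weighted_subset_entropy

# Made-up toy data just to show the call
df = pd.DataFrame({
    "outlook": ["sunny", "sunny", "rain", "rain", "overcast"],
    "play":    [0, 0, 1, 1, 1],
})
print(information_gain(df, "outlook", "play"))
```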
Also, @the_doctor did a great job explaining it with an example here.
Here’s my question:
First, I’m no expert in mathematical notation, and I’m confused by the part of the formula that describes the weight, |T_v| / |T|. It doesn’t feel intuitive to use the notation for the target T here; to my understanding, |A_v| / |A| would be more appropriate.
I would really appreciate some clarification on this one. Thanks in advance!