I don't understand the dictionary for the decision tree (the printed version as well)

The decision tree dictionary is really unclear to me. Even when we print out each step I still can’t quite get it. For example:
age > 37.5
    age > 25.0
        age > 22.5
            Leaf: Label 0
            Leaf: Label 1
        Leaf: Label 1
    age > 55.0
        age > 47.5
            Leaf: Label 0
            Leaf: Label 1
        Leaf: Label 0

So the first node is age > 37.5 and the next node is age > 25? Don’t all entries with age > 37.5 automatically have an age higher than 25? I think I’m missing something really simple here, but so far it doesn’t make any sense.
Thanks!


Hi @VitaliyNechay,

You can think of the tree as a long chain of nested if/else statements.

So, for the first node (the root), we set the condition to “age > 37.5”. If the age is above 37.5, the row is assigned to the right node; otherwise (age of 37.5 or less) it is assigned to the left node.

In the algorithm, the rows that fail a node’s condition are handled first, so the left branch is always processed (and printed) before the right one. That is why “age > 25.0” appears directly under “age > 37.5”: it is the left child, so it only splits the rows whose age is 37.5 or less.
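To make the if/else picture concrete, here is a minimal hand-written sketch of the printed tree as nested conditions. It is not the code from the lesson: the function name `predict_high_income` is made up for illustration, and the nesting assumes the printed lines are listed left branch first, as described above.

```python
def predict_high_income(age):
    """Hand-written if/else version of the printed tree (illustrative only)."""
    if age > 37.5:
        # Right side of the root: rows that PASSED "age > 37.5".
        if age > 55.0:
            return 0
        else:
            if age > 47.5:
                return 1
            else:
                return 0
    else:
        # Left side of the root: rows that FAILED "age > 37.5".
        # "age > 25.0" is only ever asked about these rows.
        if age > 25.0:
            return 1
        else:
            if age > 22.5:
                return 1
            else:
                return 0


print(predict_high_income(35))  # goes left at the root, then right at "age > 25.0"
```

The printout lists the "else" branch first, which is the opposite of how the code above is written, but the logic is the same.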


The steps are something like this, node by node:

Node 1: This is the root node, so it has all the values (all ages). The median of all ages is 37.5. Set the node condition to “age > 37.5”; any value of 37.5 or less goes to the left, while those above 37.5 go to the right. Process the left first.

Node 2 (Left of Node 1): The values that are less than 37.5 are 20, 25, and 35. The median of the 3 numbers is 25, so the condition for this node is “age > 25”. Pass 20 and 25 to the left (25 is not greater than 25) and 35 to the right. Process the left first.

Node 3 (Left of Node 2): We now only have 20 and 25 with a median of 22.5. Set “age > 22.5” as the condition. Pass 20 to the left and 25 to the right. Process the left first.

Node 4 (Left of Node 3): There’s only one value here: 20. Thus, this node becomes a leaf and is assigned a high-income label. The row with age 20 has a high-income value of “0”. Since there’s only one value in this node, go back to Node 3 to process its right node.

Node 5 (Right of Node 3): This node comes from Node 3 and the only value available is 25, so it becomes a leaf. The row with age 25 has a high-income label of “1”. Node 3 has now processed both its left and right nodes, so go back one more step to Node 2 and process Node 2’s right.

Node 6 (Right of Node 2): This node comes from Node 2 and the only value available is 35, so it becomes a leaf. The row with age 35 has a high-income label of “1”. Node 2 has now processed both its left and right nodes, so go back to Node 1 and process its right node.

Node 7 (Right of Node 1): Now we’re on the right side of the initial node (the “age > 55.0” line in the printout). Repeat the whole process again, but now with the age values above 37.5.

And so forth.
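The steps above can be sketched as a small recursive function. This is a simplified illustration, not the lesson's algorithm: it only ever splits on age, and it stops when a node holds a single row, as in the walkthrough. The ages 20, 25 and 35 and their labels come from the steps above; the rows for 40, 55 and 60 are assumptions chosen so that the medians and leaves match the printed tree. The names `ROWS` and `walk` are made up for this example.

```python
from statistics import median

# (age, high_income) pairs. The last three rows are assumed, not taken from the post.
ROWS = [(20, 0), (25, 1), (35, 1), (40, 0), (55, 1), (60, 0)]


def walk(rows, depth=0, lines=None):
    """Build the printed tree, visiting the left (failed) branch first."""
    lines = [] if lines is None else lines
    indent = "    " * depth
    if len(rows) == 1:
        # Only one row left: this node is a leaf with that row's label.
        lines.append(f"{indent}Leaf: Label {rows[0][1]}")
        return lines
    split = float(median(age for age, _ in rows))
    lines.append(f"{indent}age > {split}")
    left = [row for row in rows if row[0] <= split]   # failed the condition
    right = [row for row in rows if row[0] > split]   # passed the condition
    walk(left, depth + 1, lines)    # process the left first
    walk(right, depth + 1, lines)   # then come back for the right
    return lines


print("\n".join(walk(ROWS)))
```

Each level of indentation in the output is one step deeper in the tree, so a line only applies to the rows that reached its parent.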


I’m on my phone right now, so the above is the best explanation I can muster for a somewhat visual topic. I can make a visual step-by-step picture based on the decision tree itself if you need more guidance, but I need my computer for that, and it could take a while before I have access to it.

One more thing worth noting: the decision tree in its Python dictionary form is nested rather than linear. Each node’s dictionary contains the dictionaries of its left and right children, so it does not read as a flat, step-by-step list.
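As a rough illustration of that nesting, here is what such a dictionary could look like for the printed tree. The key names (`threshold`, `left`, `right`, `label`) and the `predict` helper are assumptions for this sketch and may not match the keys used in the lesson.

```python
# Each node is a dict; internal nodes hold their children inside themselves.
tree = {
    "threshold": 37.5,
    "left": {
        "threshold": 25.0,
        "left": {
            "threshold": 22.5,
            "left": {"label": 0},
            "right": {"label": 1},
        },
        "right": {"label": 1},
    },
    "right": {
        "threshold": 55.0,
        "left": {
            "threshold": 47.5,
            "left": {"label": 0},
            "right": {"label": 1},
        },
        "right": {"label": 0},
    },
}


def predict(node, age):
    """Follow left/right children until a leaf (a dict with a label) is reached."""
    while "label" not in node:
        node = node["right"] if age > node["threshold"] else node["left"]
    return node["label"]


print(predict(tree, 35))
```

Reading the dictionary top to bottom gives the same order as the printout: a node, then everything under its left child, then everything under its right child.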
