You are right that there are some inconsistencies in the content here. We are currently working hard on updating the DE path to match our new standards. Improving this mission is part of our roadmap but as you can imagine, it takes some time to go over the content and improve it.
Your understanding of it seems to be correct, to be one the safe side, I will explain it in other words.
A node in a b-tree has two components:
- A list of keys
- A list of children nodes
For a node with keys 4, 15, 37 we want to have one child for:
- keys lower than 4
- keys between 4 and 15
- keys between 15 and 37
- keys higher than 37
As you can see, we need four children in this case. In general, if a node has k keys, then we will need k + 1 children to represents all ranges of keys.
Here is a figure illustrating this:
If you think about it, this is a generalization of what happened with binary trees. In a binary tree, we had a single key per node. Therefore we only needed to represent two ranges of values in the children: the values less than that key and the values larger than that key. When allowing more keys in a node, we need more children to represent the ranges of keys in between.
In terms of implementation, the child at position i in the
children array of a node will represent the keys that lie between the keys at indexes i - 1 and i. Let’s see a concrete example:
In the above figure, you see that the child at index 3 will represent keys that lie between
keys[3 - 1] = keys = 37 and
keys = 55.
When we add a new key, we start from the root and search of a leaf where that key fits. This search is similar to the one used for binary trees. The only difference here is that, since a node represents several ranges, we need to iterate over the keys to find the right one.
The following figure shows where keys 5, 12 and 20 would be inserted in an example b-tree.
Starting at the root node, we compare the new key with the keys until we find the range were the new key fits. Then we go to the child that represents that key and continue the search in the same way until we reach a leaf node (a node with no children).
The resulting three of inserting these values would be:
Now, in order to have efficient operations, we need to limit the number of keys a node can hold. Otherwise, we would simply insert everything on the root and get a list structure rather than a tree. This is the t factor mentioned in the mission.
When we create a b-tree, we set a factor t and set a rule that no node can contain 2t keys or more. When a node reaches 2t keys, we split it into two giving the middle value to the parent node. Imagine that our b-tree from the previous example has t = 2. Then when a node reaches 2t = 4 keys we will split it.
Let’s add the value 14 to the b-tree to see this.
This result in a node with four keys: 11, 12, 13, 14 which violates our b-tree condition:
To fix it, we will take the middle key and bring it to the parent node. Here there are two middle keys: 12 and 13. We can choose any of the in them. Let’s say we move 13 up.
This operation will result in the following b-tree:
The mission mentions a minimum number of keys that a non-root node can contain. This value is t - 1. But this actually is more of a consequence of the implementation than a restriction. Since we split a node when it reaches 2t keys, the resulting nodes will have at least t - 1 keys. This is a simpler way of thinking about it.
Note than when pushing a value to its parent, it might be the case that this node also becomes over populated. In this case a new split needs to happen. This process goes on going up the tree and fixing the nodes along the way.
I hope this helps you better understanding b-tree. I will be happy to answer any further questions that you might have.