Trees, B-Trees, B+Trees and H-Trees

J

Jarret W. Buse

Guest
B+Tree

B-Trees are used to help search various data. Some file systems use B+Tree to search directories, extent descriptors, file allocation and even file retrieval. In the future, B+Tree may be used to search through more types of data within the file systems.

First, let's look at a Tree, and we'll use letters to represent files (C, F, L, N, P and Z). A Tree is a tree structure (upside down as most people say), which consists of one root node, children nodes and leaf nodes. Each node contains a key and pointer. The key is the search item, such as a file name. The data pointer for the key points to the actual data. So, looking at Figure 1, let's assume File L is entered first in the Tree.

Figure 1.JPG

FIGURE 1​

The next File added is File C. File C is added under the root node and to the left since the value is less than the root (C<Z). Greater values go to the right, so if File Z is added, it goes to the right of the root as shown in Figure 2.

Figure 2.JPG

FIGURE 2

We have 3 files left to add, so let's add File F, N and P as shown in Figure 3.

Figure 3.JPG

FIGURE 3

In this case, the Tree is not balanced; that is, there are more nodes to the right of the root than to the left. In some cases, this may be fine. If you access File L more than the others, the search will occur very quickly. Of course with the number of files on some file systems being over 10,000, this Tree would be quite large.

From Figure 3, Node L is the root node. Node C is a child node (as is Node Z and N). Nodes F and P are Leaf Nodes. Leaf Nodes are the nodes on the end that have no child nodes. Node C is considered a parent of Nodes F. Node L (root) is the parent of Node C and Z.

In an extreme case, the root could be File Z, followed by a child node of File P, then N, L, F and finally C as shown in Figure 4. The layout would produce a very unbalanced tree that most likely would cause lengthy searches.

Figure 4.JPG

FIGURE 4

Now, B-Tree uses more keys within a node, otherwise referred to as the order. If a B-Tree node contains five pointers, it is a B-Tree of order 5. Let's look at an example of a B-Tree root as shown in Figure 5.

Figure 5.JPG

FIGURE 5

The dots represent the pointers while the numbers represent the key values (file names, etc.). Notice that for this node, each Key has a partnered Pointer: Key-1 and Pointer-1, Key-2 and Pointer-2, etc. If you look at Figure 5, you notice there is an extra Pointer (Pointer-0). The tree works in a similar way as a regular tree. If the search item we are looking for is less than Key-1, Pointer-0 is followed. If the search item is greater than or equal to Key-1 and less than Key-2, we follow Pointer-1. If the search item is greater than or equal to Key-2 and less than Key-3, we follow Pointer-2.

For example, if we were searching for number 1, we would follow Pointer-0. If we were searching for 12, we would follow Pointer-2. By following these pointers, we are led to another node to perform the same task and so on until we reach a leaf which contains the search value. The search value points to our file we are searching for or the search item is not found and an error message is returned.

Take a look at Figure 6 for the following example.

Figure 6.JPG

FIGURE 6

Let's say we are searching for 5. We start at the root node and determine if the search value is less than Key-1 (3). Since it is not, we see if the search value is between Key-1 (3) and Key-2 (9), which it is so we follow Pointer-1. At this node, we check if the search value is less than Key-1 (5), it is not. Our search value is equal to Key-1 (5) so we follow Pointer-1 and find two leaf nodes (5 and 6). The first value matches our search, so we use the value in the leaf node to get to the file for which we have been searching.

Now, let's say we are searching for File-18. We start at the root and follow Pointer-2 since our search value is greater than or equal to Key-2 (9). At the next node, we have three key values to check: 10, 15 and 22. We know that 18 is greater than 10 and 15, so we can skip Pointer-0 and Pointer-1, and we follow Pointer-2. At the leaf nodes, we find two leaves, 15 and 20. File-18 does not exist and a message can alert the user that there is no such file.

NOTE: Be aware that these searches typically are extremely fast. You can see how a tree can allow for faster searching than going through a whole sequential file.

Now to move on to the more difficult parts: insertion and deletion. For Figure 6, let's add File-12. The first thing that is done when doing an insertion or deletion is to search for the entry that is being inserted or deleted. This is done for specific reasons:

  • Insertion - the entry must not already exist
  • Deletion - the entry must already exist
If the entry to be inserted exists, or the entry to delete does not exist, a message is generated. In the case of an indexed file listing for a file system, if a file exists that we are copying to the harddisk, we get a query asking to overwrite the file (it was found in the B-Tree). If we want to delete a file that does not exist, we get an error that the file does not exist.

Now, to get to the details of inserting File-12; if you look at Figure 6, we follow Pointer-2 to the next node. File-12 is greater than Key-1 (9) and less than Key-2 (15), so we follow Pointer-1. Now we find two leaf nodes (10 and 14). File-12 should be inserted between the two as shown in Figure 7. File-12 is placed between the leaves since it goes there, and since it falls between Leaf-10 and Leaf-14, no entries need to be made in a node.

Figure 7.JPG

FIGURE 7

Now, let's look at the possibility of adding File-8. Looking at Figure 7, we can see the File-8 would be searched for, from the root, down Pointer-1. File-8 does not fit in this node anywhere, so another key must be made: Key-3 (8) as shown in Figure 8.

Figure 8.JPG

FIGURE 8

Now we can try another. Let's add File-30. Following Pointer-2 from the root, we get to a node that has one space left, and File-30 is added there as shown in Figure 9.

Figure 9.JPG

FIGURE 9

What happens if we want to add File-40? Well, if the B-tree is an order 5, we can only have five pointers per node. By adding File-40, we would create a node with more than five pointers. To accomplish the insertion, we take the node that is full and remove half the keys and pointer pairs. These entries will then be placed in a new node. All associated leaves will be moved as well, as shown in Figure 10.

NOTE: Each node must contain at least 2 keys (the root is the exception).

Figure 10.JPG

FIGURE 10

The keys (22 and 30) are moved to a new node. The largest leaf value (20) is added to the previous node Key-3 so we now have a new high end for the node. Key-2 will now become File-40 when inserting the new key. The first Key (30) of the new node must be placed in the root and a pointer associated with it. As you can see, File-22 and File-27 are placed in leaf nodes with Pointer-0 pointing to it.

NOTE: When something changes, the effect can "ripple" up to the root node. This is extremely true for large B-Trees which may have fuller nodes.

Looking at Figure 10, if one of the following entries were to be deleted (6, 9, 12, 14, 22, or 27), these could be removed with no further actions. A search would be performed and once the entry was found, the leaf would be deleted.

NOTE: If File-22 were removed, then the root would consist of entries 3, 9 and now 27 (the smallest leaf in the node).

For example, to delete File-9 from the B-Tree would result in Figure 11.

Figure 11.jpg

FIGURE 11

Since the smallest file is now File-10, the root node would change from 9 to 10.

Now, let’s look at what happens if we remove File-5. File-5 can easily be removed, but it is also a key. Here the key would be removed as well. Keys (7 and 8) to the right of the key would be moved to the left. Any leaf nodes not being removed (6) would be moved to be with the leaves (3) to the left as well as shown in Figure 12.

Figure 12.jpg

FIGURE 12

If we removed File-30, the same would happen, but the key in the root would change to the new Key-1 for the node as shown in Figure 13.

Figure 13.jpg

FIGURE 13

If we also removed File-40, the last node would be removed as well as Key-3 in the root node as shown in Figure 14. The leaves 22 and 27 can be moved to the left node.

Figure 14.jpg

FIGURE 14

The difference between the B-Tree and B+Tree is that a B+Tree allows for data to be stored in the leaves only, while a B-Tree can store data in the Nodes. B+Trees can also store keys with the same data to allow for redundant data, but B-Trees cannot do this.

Note: Another type of Tree is the H-Tree. The H-Tree is the same as a B+Tree except that the keys are not a file name, directory name or whatever is being searched, but the keys are hashes. A hash is made of the key being placed into the H-Tree.
 

Attachments

  • slide.jpg
    slide.jpg
    8.6 KB · Views: 123,407


Hello,

I think there is a mistake.
If you go back to figure 10 and look for File-22:
- from the root node, we find that 22 is between 10 and 30, so we follow Pointer-2,
- from there we cannot find File-22.

I suspect that the root node should have been updated with 22 instead of 30 when File-40 has been added.
Am I wrong ?
 
Hello,

I think there is a mistake.
If you go back to figure 10 and look for File-22:
- from the root node, we find that 22 is between 10 and 30, so we follow Pointer-2,
- from there we cannot find File-22.

I suspect that the root node should have been updated with 22 instead of 30 when File-40 has been added.
Am I wrong ?


You are correct, you have a good eye for B-Trees. I'll have to update the figures. After writing the article and looking at those nodes for so long, everything looks right and then starts to look wrong. The numbers mess with you after a while. Thanks.

I just update the figures and the text.
 

Members online


Top