
# Putting it Together - Practices

Trees are remarkably useful and powerful data structures, with many applications. Mohamed El Dawy explains.

February 01, 2005


So, what is good about this weird representation we have? First, given a word to find, it is easy to search for it. How? Just start at the root. See if it is the word you need. If it isn't, decide whether to go left or right. This is easy, because we know all the elements to the left come alphabetically before the element at the root, and all those to the right come alphabetically after the one at the root. So, with a single comparison, we can decide whether to go left or right.

Even better, because the same condition applies recursively, we can repeat the same at each step, till we either find the element we are looking for, or we hit a null pointer, where we know the element does not exist. This is as good as binary search (assuming the tree is balanced), because with every comparison, you reject half the remaining elements (by deciding to go left, you reject all the elements to the right, or vice versa), just as you would in a binary search.

So searching is easy. What about insertion? Insertion is pretty easy too. In fact, it is similar to searching. To insert a new word, we take the same steps we used to search for it (start at the root, decide to go left or right by comparing the word you are trying to insert with the word at the root, and repeat). When we find a null pointer, we insert a new node with the new word at this place.

So, insertion and searching are both easy. This is not something to take lightly. As we saw earlier, neither linked lists nor arrays could offer the same kind of behavior. By the way, deletion is easy too, as is range searching (listing all the words that fall between two given words). But both are outside the scope of this article.

So, let's try to pull everything together. We start by writing the source code necessary to implement searching and insertion. Let's start with searching, since it is the easier of the two.

    class wordMeaningPair
    {
        String word;
        String meaning;
        wordMeaningPair left;
        wordMeaningPair right;
    }

    class Dictionary
    {
        wordMeaningPair dict;

        String lookup(String word)
        {   //given a word, return the meaning of it. Or null if not found
            wordMeaningPair srchNode=dict; //start at top
            while(srchNode!=null)
            {
                if(srchNode.word.compareTo(word)==0)
                    return srchNode.meaning;
                if(srchNode.word.compareTo(word)<0)
                    srchNode=srchNode.right;
                else
                    srchNode=srchNode.left;
            }
            return null;
        }
    }

All we do is look at the node, and decide whether to go left or right based on the comparison. If we find the word we are looking for, we return its meaning immediately (this is the return statement inside the loop). If the while loop runs till the end (a null pointer is found), we know the word is not there, and we return null.
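Since the same rule applies at every node, the loop can also be written as a recursive method. Here is a minimal sketch of that idea. The names WordNode and RecursiveLookup are made up for this illustration and are not part of the Dictionary class above, and the three-word tree is built by hand just to have something to search.

```java
// Hypothetical names: WordNode and RecursiveLookup are illustrative only.
class WordNode {
    String word;
    String meaning;
    WordNode left;
    WordNode right;

    WordNode(String word, String meaning) {
        this.word = word;
        this.meaning = meaning;
    }
}

class RecursiveLookup {
    // Returns the meaning of word in the subtree rooted at node, or null if absent.
    static String lookup(WordNode node, String word) {
        if (node == null) {
            return null;                 // fell off the tree: the word is not there
        }
        int cmp = word.compareTo(node.word);
        if (cmp == 0) {
            return node.meaning;         // found it
        }
        // Same decision at every level: later words live to the right.
        return lookup(cmp > 0 ? node.right : node.left, word);
    }

    public static void main(String[] args) {
        // Hand-built tree: "mango" at the root, "apple" to the left, "pear" to the right.
        WordNode root = new WordNode("mango", "a tropical fruit");
        root.left = new WordNode("apple", "a crunchy fruit");
        root.right = new WordNode("pear", "a juicy fruit");

        System.out.println(lookup(root, "pear")); // prints a juicy fruit
        System.out.println(lookup(root, "kiwi")); // prints null
    }
}
```

The loop version and the recursive version visit exactly the same nodes; the loop simply avoids the extra method calls.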

Insertion is only slightly trickier. Let's have a look at the code. This method should be a member of the Dictionary class above.

    void insert(String word, String meaning)
    {
        wordMeaningPair newWord=new wordMeaningPair();
        newWord.word=word;
        newWord.meaning=meaning;
        newWord.left=newWord.right=null;
        wordMeaningPair srchNode=dict;
        wordMeaningPair prev=null;
        if(dict==null)
        {   //empty tree: the new node becomes the root
            dict=newWord;
            return;
        }

        while(srchNode!=null)
        {
            prev=srchNode; //remember the last real node we visited
            if(word.compareTo(srchNode.word)>0)
                srchNode=srchNode.right; //new word comes after this one: go right
            else
                srchNode=srchNode.left;
        }
        //srchNode is null here; prev is the node the new one hangs from
        if(word.compareTo(prev.word)>0)
            prev.right=newWord;
        else
            prev.left=newWord;
    }

What does this code do? Actually, it's very similar to the searching code. First, the code creates a new node, then it inserts the word and its meaning into the node. Up till that point, the node is totally isolated from the tree! It is not connected to anything. So the next thing we need to do is get it into place.

First, if the whole tree is empty (dict is null), we simply create a new tree containing the one newly created node, and return immediately. (This means that we let dict point to newWord, and return).

Next, we use the same searching mechanism. We start at the root, and compare the word we are trying to insert to the word at the root, and decide whether to move left or right. We keep doing this until we hit a null pointer.

The only new part of the code is the introduction of the prev reference. What is that? This is simple. If we keep following the nodes of the tree, as we did in the searching example, we will end up with a null pointer, and nothing to do! Which is pretty funny, but totally useless.

Instead, what we actually want is this: when we meet a null pointer, the last node we visited before it should be made to point to the newly created node. Remembering that last node is exactly the job of prev.

Take a look at the figure below. It will make things clearer.

Fig 9. Nodes visited before inserting a new node

When we try to insert a new element with a key of six, the code passes through the three highlighted nodes (exactly as the searching code would). It ends up following the right link of the node "5," which is null. At this point, srchNode is null, but prev still points to the node "5," so we change the "right" reference of "5" to make it point to the newly created node "6."
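To watch prev at work, here is a small self-contained sketch that uses integer keys and records which nodes are visited during an insertion. The class names IntNode and IntTree are hypothetical, and the starting keys 8, 3 and 5 are made up so that the path has three nodes and ends at "5"; they are not necessarily the keys shown in the figure.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names and made-up keys, for illustration only.
class IntNode {
    int key;
    IntNode left;
    IntNode right;

    IntNode(int key) {
        this.key = key;
    }
}

class IntTree {
    IntNode root;

    // Inserts key and returns the keys of the nodes visited on the way down.
    List<Integer> insert(int key) {
        List<Integer> visited = new ArrayList<>();
        IntNode newNode = new IntNode(key);
        if (root == null) {          // empty tree: the new node becomes the root
            root = newNode;
            return visited;
        }
        IntNode srchNode = root;
        IntNode prev = null;
        while (srchNode != null) {
            prev = srchNode;         // last real node seen so far
            visited.add(srchNode.key);
            srchNode = (key > srchNode.key) ? srchNode.right : srchNode.left;
        }
        // srchNode is null here, so attach the new node to prev.
        if (key > prev.key) {
            prev.right = newNode;
        } else {
            prev.left = newNode;
        }
        return visited;
    }

    public static void main(String[] args) {
        IntTree tree = new IntTree();
        tree.insert(8);
        tree.insert(3);
        tree.insert(5);
        System.out.println(tree.insert(6)); // prints [8, 3, 5]
    }
}
```

After the last call, the new node "6" hangs from the right link of "5", which is the node prev pointed to when the loop stopped.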

Now, trees are great. But is life really that good? Unfortunately, the answer is no. Sometimes trees cause all sorts of problems. Searching a tree and inserting into it are fast and efficient, but only when the tree is nearly balanced. If the tree is badly unbalanced, both operations can end up visiting almost every node, which is no better than scanning a linked list.

Unfortunately, how balanced a tree ends up depends on the order in which the elements are inserted. Let's imagine you have 7 elements: 1, 2, 3, 4, …, 7. If you insert them in that exact order, you will end up with a tree that looks like this…

Fig 10. Trees can behave really badly if you aren't careful

Ewww... that was pretty bad, wasn't it? Every node has only a right child, so the tree now looks just like a linked list, and a search has to walk through the nodes one by one.
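You can measure the damage by computing the height of the tree, that is, the number of nodes on the longest path from the root down. The sketch below is only an illustration: the names KeyNode and SkewDemo are hypothetical, and the second insertion order (4, 2, 6, 1, 3, 5, 7) is one order I picked because it happens to give a perfectly balanced tree.

```java
// Hypothetical names, for illustration only.
class KeyNode {
    int key;
    KeyNode left;
    KeyNode right;

    KeyNode(int key) {
        this.key = key;
    }
}

class SkewDemo {
    // Plain insertion with no rebalancing; returns the root of the subtree.
    static KeyNode insert(KeyNode node, int key) {
        if (node == null) {
            return new KeyNode(key);
        }
        if (key > node.key) {
            node.right = insert(node.right, key);
        } else {
            node.left = insert(node.left, key);
        }
        return node;
    }

    // Number of nodes on the longest path from this node down to a leaf.
    static int height(KeyNode node) {
        if (node == null) {
            return 0;
        }
        return 1 + Math.max(height(node.left), height(node.right));
    }

    // Builds a tree by inserting the keys in the order given.
    static KeyNode build(int... keys) {
        KeyNode root = null;
        for (int key : keys) {
            root = insert(root, key);
        }
        return root;
    }

    public static void main(String[] args) {
        System.out.println(height(build(1, 2, 3, 4, 5, 6, 7))); // prints 7
        System.out.println(height(build(4, 2, 6, 1, 3, 5, 7))); // prints 3
    }
}
```

Same seven elements, same insertion code, but the worst-case search costs seven comparisons in the first tree and only three in the second.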

Is there a way we can prevent trees from skewing this way? Luckily, there is! But explaining this will get complex. I will give an overview near the end. Meanwhile, let's turn our attention to a new problem, one where the data itself is best represented as a tree. This is the situation when you're dealing with an Internet browser.
