# Basic Algorithms: Lecture 16


#### The new methods: closest Key/Item Before/After

Extend the data structure by storing at each node a bit saying whether this node is the left or right child of its parent.

Problem set #3, problem 2a: How can we eliminate the bit just introduced? That is, give a Θ(1) algorithm leftOrRight(T,v) that determines whether node v of binary tree T is a left or right child (it signals an error if v is the root).

With the new bit (or with your solution to problem 2a) we can find the closestKeyBefore (and friends).

Look at the figure to the right. What should we do if we arrived at 10 and are looking for the closest key before 5?
Ans: Get the internal node that occurs just before 10 in an inorder traversal.

What if we want the closest key before 15?
Ans: It is node 10.

What if we arrived at 30 and wanted closest key before 25?
Ans: It is the node 20.

What if we wanted the closest key before 35?
Ans: It is the node 30.

What if we wanted the closest key after something?
Ans: It is one of 10, 20, 30, or the next internal node after 30 in an inorder traversal.

To state this in general, assume that we are searching for a key k that is not present in a binary search tree T. The search will end at a leaf l. The last key comparison we made was between k and key(v), with v the parent of l.

Theorem:

1. If k<key(v) and T.left(v) is a leaf, then key(w)<k<key(v), where w is the node preceding v in an inorder traversal (i.e., w is the node with the next smaller key than v's).

2. If key(v)<k and T.right(v) is a leaf, then key(v)<k<key(w), where w is the node following v in an inorder traversal (i.e., w is the node with the next larger key than v's).

Proof: We just prove the first; the second is similar.

We know k<key(v) and w contains the next smaller key. Note that w must be exactly as shown to the right (the dashed line can be zero length, i.e. x could be v). Since the search for k reached v, we went right at w so k>key(w) as desired.

Corollary: Finding the closest Key/Item Before/After simply requires a search followed by prevInternalInInorder or nextInternalInInorder, respectively.
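As a concrete illustration of the corollary, here is a minimal Python sketch. The modeling choices are my own, not the notes' pseudocode: None children stand in for the item-less leaves, and each node carries a parent pointer, which serves the same purpose as the extra left-or-right bit introduced above.

```python
class Node:
    def __init__(self, key, parent=None):
        self.key = key
        self.parent = parent
        self.left = None    # None plays the role of an item-less leaf
        self.right = None

def insert(root, key):
    """Plain unbalanced BST insert, used only to build examples."""
    if root is None:
        return Node(key)
    v = root
    while True:
        if key < v.key:
            if v.left is None:
                v.left = Node(key, v)
                return root
            v = v.left
        else:
            if v.right is None:
                v.right = Node(key, v)
                return root
            v = v.right

def prev_internal_in_inorder(v):
    """Inorder predecessor of v, found via parent pointers."""
    if v.left is not None:          # go left once, then right to the bottom
        v = v.left
        while v.right is not None:
            v = v.right
        return v
    while v.parent is not None and v.parent.left is v:
        v = v.parent                # climb while we are a left child
    return v.parent                 # None means v held the smallest key

def closest_key_before(root, k):
    """Largest key <= k, or None if every key in the tree exceeds k."""
    v, last = root, None
    while v is not None:
        last = v
        if k < v.key:
            v = v.left
        elif k > v.key:
            v = v.right
        else:
            return v.key            # k itself is present
    if k > last.key:                # search went right at the last node
        return last.key
    return (w.key if (w := prev_internal_in_inorder(last)) is not None
            else None)
```

Building the tree with keys 20, 10, 30 reproduces the classroom examples: the closest key before 15 is 10, before 25 is 20, before 35 is 30, and before 5 there is none.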

Problem set #3, problem 2b: Write the algorithm nextInternalInInorder(tree T, internal node v) that finds the next internal node after the internal node v in an inorder traversal (signal an error if v is the last internal node in an inorder traversal). You should use the extra bit added above to determine if a node is a left or right child.

Problem set #3, problem 2c (end of problem 2): Write an algorithm for the method closestKeyBefore(k).

### 3.1.4 Insertion in a Binary Search Tree

To insert an item with key k, first execute w←TreeSearch(k,T.root()). Recall that if w is internal, k is already in w, and if w is a leaf, k "belongs" in w. So we proceed as follows.

• If w is a leaf, replace w with an internal node containing k (having two leaves as children).
• If w is internal and duplicate keys are not permitted, signal an error.
• If w is internal and duplicate keys are permitted, call w←TreeSearch(k,T.leftChild(w)) or w←TreeSearch(k,T.rightChild(w)) and proceed as above.

Draw examples on the board showing both cases (leaf and internal returned).

Once again we perform a constant amount of work per level of the tree implying that the complexity is O(height).
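The leaf-replacement step can be sketched in Python. This version keeps the notes' convention of explicit item-less leaves (modeled here as nodes with key None); the names tree_search and insert_item are mine, and duplicates are simply rejected rather than re-searched.

```python
class Node:
    """Internal nodes hold a key; leaves hold key=None and no children."""
    def __init__(self, key=None):
        self.key = key
        self.left = None
        self.right = None

def is_leaf(w):
    return w.key is None

def tree_search(k, w):
    """Return the internal node holding k, or the leaf where k belongs."""
    while not is_leaf(w):
        if k < w.key:
            w = w.left
        elif k > w.key:
            w = w.right
        else:
            return w            # found an internal node holding k
    return w                    # fell off the tree: k is absent

def insert_item(root, k):
    """Insert k; duplicates are not permitted in this sketch."""
    w = tree_search(k, root)
    if not is_leaf(w):
        raise KeyError("duplicate key")
    # Replace leaf w with an internal node holding k,
    # having two (item-less) leaves as children.
    w.key = k
    w.left, w.right = Node(), Node()
```

Starting from an empty tree (a single leaf), `insert_item(root, 20)` turns that leaf into the internal node 20 with two fresh leaves, exactly as in the bullet above.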

### 3.1.5 Removal in a Binary Search Tree

This is the trickiest part, especially in one case as we describe below. The key concern is that we cannot simply remove an item from an internal node and leave a hole, as this would make future searches fail. The beginning of the removal technique is familiar: w←TreeSearch(k,T.root()). If w is a leaf, k is not present, which we signal.

If w is internal, we have found k, but now the fun begins. Returning the element with key k is easy, it is the element stored in w. We need to actually remove w, but we cannot leave a hole. There are three cases.

1. If we are lucky both of w's children are leaves. Then we can simply replace w with a leaf. (Recall that leaves do not contain items.) This is the trivial case.

2. The next case is when one child of w is a leaf and the other, call it z, is an internal node. In this case we can simply replace w by z; that is have the parent of w now point to z. This removes w as desired and also removes the leaf child of w, which is OK since leaves do not contain items. This is the easy case.

Note that the above two cases can be handled identically: in both we notice that one child of w is a leaf and replace w by the other child (and its descendants, if any).

3. Now we get to the difficult case: both children of w are internal nodes. What we will do is replace the item in w with the item that has the next larger key.

• First we must find the item with the next larger key. But that is simply the next item in an inorder traversal. So we go right once and then keep going left until we reach a leaf. The parent of this leaf is the node we seek. Call this parent y.

• Store the item in y in the node w. This removes the old item of w, which we wanted to do.
• Does the tree still have its items in the correct order? That is, is each item still bigger than (or equal to, if we permit duplicate keys) all items in its left subtree and smaller than all items in its right subtree?
• Yes. The only new parent is the item from y, which has now moved to node w. But this is the item right after the old item in w: since it came from w's right subtree it is bigger than everything in the left subtree, and since it was the smallest item in the right subtree, it is smaller than everything remaining there.

• But what about the old node y? Its left child is a leaf, so this is the easy or trivial case and we just replace y by its other child and that child's descendants.

It is again true that the complexity is O(height), but this is not quite as easy to see as before. We start with a TreeSearch, which is O(height); this gets us to w. The most difficult case is the last one, where w has two internal children. We spend a non-constant amount of time on node w because we need to find the next item, but that search is only O(height) since it simply descends deeper into the tree.
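The three removal cases can be sketched compactly in Python. As in the earlier sketch, None children stand in for the notes' explicit item-less leaves (my own modeling choice), which makes the trivial and easy cases collapse into one, just as the remark above observes; the recursion replays the TreeSearch descent.

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(v, k):
    """Plain unbalanced insert, used only to build examples."""
    if v is None:
        return Node(k)
    if k < v.key:
        v.left = insert(v.left, k)
    else:
        v.right = insert(v.right, k)
    return v

def inorder(v):
    return [] if v is None else inorder(v.left) + [v.key] + inorder(v.right)

def remove_element(v, k):
    """Remove k from the subtree rooted at v; return the new subtree root."""
    if v is None:
        raise KeyError(k)              # search ended at a leaf: k is absent
    if k < v.key:
        v.left = remove_element(v.left, k)
    elif k > v.key:
        v.right = remove_element(v.right, k)
    else:
        # Trivial/easy cases: at least one child is a leaf (None here),
        # so we replace v by the other child and its descendants.
        if v.left is None:
            return v.right
        if v.right is None:
            return v.left
        # Difficult case: two internal children. Find the item with the
        # next larger key (go right once, then left to the bottom) ...
        y = v.right
        while y.left is not None:
            y = y.left
        v.key = y.key                  # ... store y's item in node v ...
        v.right = remove_element(v.right, y.key)   # ... and splice out y.
    return v
```

Removing 20 from the tree built with keys 20, 10, 30, 25, 35 exercises the difficult case: 25 moves up into the root and the old node holding 25 is spliced out.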

### 3.1.6 Performance of Binary Search Trees

Time complexity of the binary search tree ADT. We use h for the height and s for the number of elements in the tree.

| Method | Time |
| ------ | ---- |
| size, isEmpty | O(1) |
| findElement, insertItem, removeElement | O(h) |
| findAllElements, removeAllElements | O(h+s) |

We have seen that findElement, insertItem, and removeElement have complexity O(height). It is also true, but we will not show it, that one can implement findAllElements and removeAllElements in time O(height+numberOfElements). You might think removeAllElements should be constant time since the resulting tree is just a root so we can make it in constant time. But removeAllElements must also return an iterator that when invoked must generate each of the elements removed.

#### Comments on average vs worst case behavior

In a sense that we will not make precise, binary search trees have logarithmic performance since "most" trees have logarithmic height.

Nonetheless we know that there are trees with height Θ(n). You produced several of them for problem set #2. For these trees searching takes linear time, i.e., is slow. Our goal now is to fancy up the implementation so that the trees are never very high. We can do this since the trees are not handed to us; instead they are built up using our insertItem method.
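A quick experiment (my own illustration, not part of the notes) shows how insertion order alone produces the Θ(n)-height trees mentioned above: inserting keys in sorted order yields one long right spine, while a random order typically stays much flatter.

```python
import random

class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(v, k):
    """Plain unbalanced insert: the tree's shape depends on arrival order."""
    if v is None:
        return Node(k)
    if k < v.key:
        v.left = insert(v.left, k)
    else:
        v.right = insert(v.right, k)
    return v

def height(v):
    """Height convention: an empty tree (a leaf) has height -1."""
    return -1 if v is None else 1 + max(height(v.left), height(v.right))

# Sorted insertion: every new key becomes a right child -> a spine.
sorted_root = None
for k in range(200):
    sorted_root = insert(sorted_root, k)
print(height(sorted_root))       # 199: height is n-1, i.e. Theta(n)

# Random insertion order: the height is typically far below 199.
keys = list(range(200))
random.shuffle(keys)
random_root = None
for k in keys:
    random_root = insert(random_root, k)
print(height(random_root))
```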

Allan Gottlieb