# Scapegoat tree

In computer science, a scapegoat tree is a self-balancing binary search tree, invented by Igal Galperin and Ronald L. Rivest. It provides worst-case O(log n) lookup time and O(log n) amortized insertion and deletion time.

Unlike other self-balancing binary search trees that provide worst-case O(log n) lookup time, scapegoat trees have no additional per-node overhead compared to a regular binary search tree. [Galperin, Igal; Rivest, Ronald L. (1993). "Scapegoat trees". Proceedings of the fourth annual ACM-SIAM Symposium on Discrete algorithms, pp. 165-174. http://portal.acm.org/citation.cfm?id=313676] This makes scapegoat trees easier to implement and, due to data structure alignment, can reduce node overhead by up to one-third.

Theory

A binary search tree is said to be weight-balanced if half the nodes are on the left of the root and half on the right. A node is therefore defined as α-weight-balanced if it meets the following conditions:

    size(left) <= α*size(node)
    size(right) <= α*size(node)

where size can be defined recursively as:

    function size(node)
        if node = nil
            return 0
        else
            return size(node->left) + size(node->right) + 1
    end
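The size function and the two balance conditions above can be sketched in Python. This is a minimal illustration, not a reference implementation: the `Node` class and the name `is_alpha_weight_balanced` are assumptions made for this example, and the check applies to a single node only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    # Hypothetical node type: a key plus two child links, no balance data.
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def size(node: Optional[Node]) -> int:
    """Number of nodes in the subtree rooted at node (0 for an empty subtree)."""
    if node is None:
        return 0
    return size(node.left) + size(node.right) + 1


def is_alpha_weight_balanced(node: Node, alpha: float) -> bool:
    """True if neither child subtree holds more than alpha * size(node) nodes."""
    limit = alpha * size(node)
    return size(node.left) <= limit and size(node.right) <= limit


# A right-leaning chain 1 -> 2 -> 3 and a three-node tree with two leaves.
chain = Node(1, right=Node(2, right=Node(3)))
bushy = Node(2, left=Node(1), right=Node(3))

print(is_alpha_weight_balanced(chain, 1.0))  # the chain passes when alpha is 1
print(is_alpha_weight_balanced(chain, 0.5))  # but fails when alpha is 0.5
print(is_alpha_weight_balanced(bushy, 0.5))
```

Note that `size` walks the whole subtree on every call, which is why the check is only performed when it is actually needed.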

An α of 1 would therefore describe a linked list as balanced, whereas an α of 0.5 would only match almost complete binary trees.

A binary search tree that is α-weight-balanced must also be α-height-balanced, that is:

    height(tree) <= log_{1/α}(NodeCount)
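The implication can be seen with a short argument. As a sketch, write n for NodeCount and d for the depth of some node u (both symbols are introduced here for illustration), and assume every node of the tree is α-weight-balanced with α < 1. Each step from a node to one of its children shrinks the subtree size by a factor of at most α, so:

$\mathrm{size}(u) \le \alpha^{d} \, n$

A subtree that contains u has size at least 1, so:

$1 \le \alpha^{d} \, n \;\Longrightarrow\; (1/\alpha)^{d} \le n \;\Longrightarrow\; d \le \log_{1/\alpha} n$

Since this holds for the deepest node as well, the height of the tree is at most log_{1/α}(NodeCount).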

Scapegoat trees are not guaranteed to keep α-weight-balance at all times, but are always loosely α-height-balanced, in that:

    height(scapegoat tree) <= log_{1/α}(NodeCount) + 1

This makes scapegoat trees similar to red-black trees in that both have restrictions on their height. They differ greatly, though, in how they determine where the rotations (or, in the case of scapegoat trees, rebalances) take place. Whereas red-black trees store additional 'color' information in each node to determine the location, scapegoat trees find a scapegoat, a node that isn't α-weight-balanced, and perform the rebalance operation on it. This is loosely similar to AVL trees, in that the actual rotations depend on the 'balances' of nodes, but the means of determining the balance differs greatly. Since AVL trees check the balance value on every insertion and deletion, it is typically stored in each node; scapegoat trees are able to calculate it only as needed, which is only when a scapegoat needs to be found.

Unlike most other self-balancing search trees, scapegoat trees are entirely flexible as to their balancing. They support any α such that 0.5 <= α < 1. A high α value results in fewer rebalances, making insertion quicker but lookups and deletions slower, and vice versa for a low α. Therefore, in practical applications, α can be chosen depending on how frequently these actions are expected to be performed.
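To make the height bounds and the effect of α more concrete, the following sketch evaluates log_{1/α}(NodeCount) for a few values of α. The function names `height_bound` and `is_loosely_height_balanced` are assumptions chosen for this example; the tree size of 1000 is arbitrary.

```python
import math


def height_bound(node_count: int, alpha: float) -> float:
    """Return log base 1/alpha of node_count, the alpha-height-balance limit."""
    return math.log(node_count) / math.log(1 / alpha)


def is_loosely_height_balanced(height: int, node_count: int, alpha: float) -> bool:
    """The looser condition a scapegoat tree always meets: one extra level is allowed."""
    return height <= height_bound(node_count, alpha) + 1


# Height limits for a tree of 1000 nodes under several choices of alpha.
for alpha in (0.5, 0.6, 0.75, 0.9):
    print(f"alpha={alpha}: height <= {height_bound(1000, alpha):.2f}")
```

A larger α permits a much taller tree for the same number of nodes, which is the source of the trade-off between cheaper insertions and slower lookups.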

Operations

Insertion

Insertion is implemented very similarly to insertion into an unbalanced binary search tree, but with a few significant changes.

When finding the insertion point, the depth of the new node must also be recorded. This is implemented via a simple counter that gets incremented during each iteration of the lookup, effectively counting the number of edges between the root and the inserted node. If this node violates the α-height-balance property (defined above), a rebalance is required.
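The depth counter and the height check can be sketched as follows. This is an illustrative sketch under stated assumptions: `Node`, `insert_with_depth`, and `needs_rebalance` are hypothetical names, duplicate keys are sent to the right subtree, and the caller is assumed to keep track of the node count.

```python
import math


class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert_with_depth(root, key):
    """Plain BST insertion that also returns the depth (edge count) of the new node."""
    new_node = Node(key)
    if root is None:
        return new_node, 0
    depth = 0
    current = root
    while True:
        depth += 1  # one more edge between the root and the insertion point
        if key < current.key:
            if current.left is None:
                current.left = new_node
                return root, depth
            current = current.left
        else:
            if current.right is None:
                current.right = new_node
                return root, depth
            current = current.right


def needs_rebalance(depth, node_count, alpha):
    """True if the new node sits deeper than log base 1/alpha of node_count."""
    return depth > math.log(node_count) / math.log(1 / alpha)


# Inserting sorted keys builds a chain; with alpha = 0.6 the fourth
# insertion is the first one that violates the height condition.
root, count, results = None, 0, []
for key in (1, 2, 3, 4):
    root, depth = insert_with_depth(root, key)
    count += 1
    results.append((depth, needs_rebalance(depth, count, 0.6)))
print(results)
```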

To rebalance, an entire subtree rooted at a scapegoat undergoes a balancing operation. The scapegoat is defined as being an ancestor of the inserted node which isn't α-weight-balanced. There will always be at least one such ancestor. Rebalancing any of them will restore the α-height-balanced property.

One way of finding a scapegoat is to climb from the new node back up to the root and select the first node that isn't α-weight-balanced.

Climbing back up to the root requires either O(log n) storage space, usually allocated on the stack, or parent pointers. This can be avoided by pointing each child at its parent on the way down and repairing the pointers on the walk back up.

To determine whether a potential node is a viable scapegoat, we need to check its α-weight-balanced property. To do this we can go back to the definition:

    size(left) <= α*size(node)
    size(right) <= α*size(node)

However, a large optimisation can be made by realising that we already know two of the three sizes, leaving only the third to be calculated.

Consider the following example to demonstrate this. Assuming that we're climbing back up to the root:

    size(parent) = size(node) + size(brother) + 1

But as:

    size(inserted node) = 1

the case is trivialized down to:

    size[x+1] = size[x] + size(brother) + 1

where x = this node, x + 1 = parent, and size(brother) is the only function call actually required.
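The climb with incrementally computed sizes can be sketched like this. It assumes the insertion recorded the path from the root to the new node in a list (one of the storage options mentioned earlier); `find_scapegoat` and the `path` argument are names invented for this example.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def size(node):
    """Number of nodes in the subtree rooted at node."""
    if node is None:
        return 0
    return size(node.left) + size(node.right) + 1


def find_scapegoat(path, alpha):
    """Climb from the new node (last entry of path) towards the root (first entry).

    Returns (scapegoat, subtree_size) for the first ancestor that is not
    alpha-weight-balanced, or None if every ancestor is balanced.
    """
    child_size = 1  # size(inserted node) = 1
    for i in range(len(path) - 2, -1, -1):
        parent, child = path[i], path[i + 1]
        sibling = parent.right if parent.left is child else parent.left
        sibling_size = size(sibling)  # the only size that must be computed
        parent_size = child_size + sibling_size + 1
        if child_size > alpha * parent_size or sibling_size > alpha * parent_size:
            return parent, parent_size
        child_size = parent_size  # becomes size[x] for the next step up
    return None


# A right-leaning chain 1 -> 2 -> 3 -> 4, where 4 was inserted last.
n1, n2, n3, n4 = Node(1), Node(2), Node(3), Node(4)
n1.right, n2.right, n3.right = n2, n3, n4
scapegoat, subtree_size = find_scapegoat([n1, n2, n3, n4], 0.6)
print(scapegoat.key, subtree_size)
```

In this chain the node with key 3 still satisfies the condition (1 <= 0.6 * 2), so the climb continues and stops at the node with key 2, whose subtree of three nodes has two of them on one side.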

Once the scapegoat is found, a standard binary search tree rebalance operation is performed.
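One simple way to rebalance a subtree is to flatten it into a sorted list of nodes and rebuild it from the middle outwards. The sketch below takes that approach for clarity and uses O(n) extra space for the list; it is an assumption of this example rather than the specific procedure of the original paper, and the function names are hypothetical.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def flatten(node, out):
    """Append the nodes of the subtree to out in sorted (in-order) order."""
    if node is not None:
        flatten(node.left, out)
        out.append(node)
        flatten(node.right, out)


def build_balanced(nodes, lo, hi):
    """Relink nodes[lo:hi] into a balanced subtree and return its root."""
    if lo >= hi:
        return None
    mid = (lo + hi) // 2
    root = nodes[mid]
    root.left = build_balanced(nodes, lo, mid)
    root.right = build_balanced(nodes, mid + 1, hi)
    return root


def rebuild(scapegoat):
    """Rebalance the subtree rooted at scapegoat; the caller relinks the result."""
    nodes = []
    flatten(scapegoat, nodes)
    return build_balanced(nodes, 0, len(nodes))


def height(node):
    """Number of edges on the longest downward path; -1 for an empty subtree."""
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))


# Build a degenerate chain of 7 nodes, then rebuild it.
chain = Node(1)
tail = chain
for key in range(2, 8):
    tail.right = Node(key)
    tail = tail.right

new_root = rebuild(chain)
print(new_root.key, height(new_root))
```

The returned root must be attached to the scapegoat's former parent (or become the tree root), a step this sketch leaves to the caller.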

As rebalance operations take O(n) time, dependent on the number of nodes in the subtree, insertion has a worst-case performance of O(n) time. However, its amortized time is O(log n).

Sketch of proof for cost of insertion

Define the imbalance of a node v to be the absolute value of the difference in size between its left and right subtrees, minus 1, or 0, whichever is greater. In other words:

I(v) = max(|left(v) - right(v)| - 1, 0)

Immediately after rebuilding a subtree rooted at v, I(v) = 0.

Lemma: Immediately before rebuilding the subtree rooted at v, I(v) = Ω(|v|)

Proof of lemma:

Let v_0 be the root of a subtree immediately after rebuilding. h(v_0) = log(|v_0| + 1). If there are Ω(|v_0|) degenerate insertions (that is, where each inserted node increases the height by 1), then I(v) = Ω(|v_0|), h(v) = h(v_0) + Ω(|v_0|) and log(|v|) ≤ log(|v_0| + 1) + 1.

Since I(v) = Ω(|v|) before rebuilding, there were Ω(|v|) insertions into the subtree rooted at v that did not result in rebuilding. Each of these insertions can be performed in O(log n) time. The final insertion that causes rebuilding costs O(|v|). Using aggregate analysis it becomes clear that the amortized cost of an insertion is O(log n):

$\frac{\Omega(|v|) \cdot O(\log n) + O(|v|)}{\Omega(|v|)} = O(\log n)$

The deletion operation

Scapegoat trees are unusual in that deletion is easier than insertion. To enable deletion, scapegoat trees need to store an additional value with the tree data structure. This property, which we will call MaxNodeCount, simply represents the highest achieved NodeCount. It is set to NodeCount whenever the entire tree is rebalanced, and after insertion it is set to max(MaxNodeCount, NodeCount).

To perform a deletion, we simply remove the node as we would in a simple binary search tree, but if

    NodeCount <= MaxNodeCount / 2

then we rebalance the entire tree about the root, remembering to set MaxNodeCount to NodeCount.
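The deletion rule can be sketched as follows. This is a simplified illustration: the class `ScapegoatTree`, its fields, and the helper functions are hypothetical names, the tree is built directly from a sorted key list rather than by repeated insertion, and the rebuild creates fresh nodes instead of relinking existing ones.

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None


def bst_delete(node, key):
    """Standard BST deletion; returns (new_subtree_root, removed_flag)."""
    if node is None:
        return None, False
    if key < node.key:
        node.left, removed = bst_delete(node.left, key)
        return node, removed
    if key > node.key:
        node.right, removed = bst_delete(node.right, key)
        return node, removed
    if node.left is None:
        return node.right, True
    if node.right is None:
        return node.left, True
    # Two children: copy the in-order successor's key, then remove that node.
    successor = node.right
    while successor.left is not None:
        successor = successor.left
    node.key = successor.key
    node.right, _ = bst_delete(node.right, successor.key)
    return node, True


def in_order(node, out):
    """Append the keys of the subtree to out in sorted order."""
    if node is not None:
        in_order(node.left, out)
        out.append(node.key)
        in_order(node.right, out)


def build_balanced(keys, lo, hi):
    """Build a balanced subtree from the sorted slice keys[lo:hi]."""
    if lo >= hi:
        return None
    mid = (lo + hi) // 2
    node = Node(keys[mid])
    node.left = build_balanced(keys, lo, mid)
    node.right = build_balanced(keys, mid + 1, hi)
    return node


class ScapegoatTree:
    def __init__(self, keys=()):
        ordered = sorted(keys)
        self.root = build_balanced(ordered, 0, len(ordered))
        self.node_count = len(ordered)
        self.max_node_count = self.node_count
        self.rebuilds = 0  # bookkeeping for the demonstration only

    def delete(self, key):
        self.root, removed = bst_delete(self.root, key)
        if not removed:
            return False
        self.node_count -= 1
        if self.node_count <= self.max_node_count / 2:
            keys = []
            in_order(self.root, keys)
            self.root = build_balanced(keys, 0, len(keys))
            self.max_node_count = self.node_count
            self.rebuilds += 1
        return True


tree = ScapegoatTree(range(8))
for key in (0, 1, 2):
    tree.delete(key)  # node counts 7, 6, 5: no rebuild yet
rebuilds_before = tree.rebuilds
tree.delete(3)  # node count 4 <= 8 / 2: the whole tree is rebuilt
print(rebuilds_before, tree.rebuilds, tree.max_node_count)
```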

This gives deletion a worst-case performance of O(n) time; however, its amortized time is O(log n).

Sketch of proof for cost of deletion

Suppose the scapegoat tree has n elements and has just been rebuilt (in other words, it is a complete binary tree). At most n/2 - 1 deletions can be performed before the tree must be rebuilt. Each of these deletions takes O(log n) time (the amount of time to search for the element and flag it as deleted). The (n/2)th deletion causes the tree to be rebuilt and takes O(log n) + O(n) (or just O(n)) time. Using aggregate analysis it becomes clear that the amortized cost of a deletion is O(log n):

$\frac{\sum_{1}^{n/2} O(\log n) + O(n)}{n/2} = \frac{\frac{n}{2} O(\log n) + O(n)}{n/2} = O(\log n)$

Lookup

Lookup is not modified from a standard binary search tree, and has a worst-case time of O(log n). This is in contrast to splay trees, which have a worst-case time of O(n). The reduced node overhead compared to other self-balancing binary search trees can further improve locality of reference and caching.
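Because lookup is the ordinary binary search tree search, a sketch needs nothing specific to scapegoat trees. The `Node` class and the function name `lookup` below are assumptions made for this example.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def lookup(root, key):
    """Return the node holding key, or None if the key is absent."""
    node = root
    while node is not None:
        if key == node.key:
            return node
        # Smaller keys live in the left subtree, larger keys in the right.
        node = node.left if key < node.key else node.right
    return None


tree = Node(4, Node(2, Node(1), Node(3)), Node(6, Node(5), Node(7)))
print(lookup(tree, 5) is not None, lookup(tree, 8) is None)
```

The number of loop iterations is bounded by the height of the tree, which is where the loose α-height-balance guarantee gives the O(log n) worst case.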


See also

* splay tree

* [http://people.ksp.sk/~kuko/bak/index.html Scapegoat Tree Applet] by Kubo Kovac
* [http://cg.scs.carleton.ca/~morin/teaching/5408/refs/gr93.pdf Scapegoat Trees: the original publication describing scapegoat trees]
* [http://publications.csail.mit.edu/lcs/pubs/pdf/MIT-LCS-TR-700.pdf On Consulting a Set of Experts and Searching (full version paper)]

Wikimedia Foundation. 2010.
