Package-merge algorithm

The Package-Merge Algorithm is an "O(nL)"-time algorithm for finding anoptimal length-limited Huffman code for a given distribution on a given alphabet of size "n", where no code word is longer than "L". It is a greedy algorithm, and a generalization of Huffman's original algorithm. Package-merge works by reducing the code construction problem to the binary Coin Collector's problem.L. L. Larmore and D. S. Hirschberg. A fast algorithm for optimal length-limited Huffman codes. Journal of the Association for Computing Machinery, V 37 No. 3:464--473, 1990.]

The Coin Collector's Problem

Suppose a coin collector has a number of coins of various denominations, each of which has a numismatic value. The coin collector has run outof money and needs to use some of his coin collection to buy something of cost "N". He wishes to select a subset of coins from his collection of minimum numismatic value whose denominations total "N". The binary version of this problem is that all denominations are powers of 2, that is, 1, 1/2, 1/4, etc. dollars.

Description of the Package-Merge Algorithm

Assume that the largest denomination is 1 dollar, and that N is an integer. (The algorithm works even if these assumptions do not hold, by trivial modifications.) The coin collector first separates his coins into lists, one for each denomination, sorted by numismatic value. He then packages the smallest denomination coins in pairs, starting from the pair of smallest total numismatic value. If there is one coin left over, it will be the coin of highest numismatic value of that denomination, and it is set aside and ignored henceforth. These packages are then merged into the list of coins of the next smallest denomination, again in order of numismatic value. The items in that list are then packaged in pairs, and merged into the next smallest list, and so forth. Finally, there is a list of items, each of which is a 1 dollar coin or a package consisting of two or more smaller coins whose denominations total 1 dollar. They are also sorted in order of numismatic value. The coin collector then selects the least value N of them. Note that the time of the algorithm is linear in the number of coins.

Reduction of Length-Limited Huffman Coding to the Coin Collector's Problem

Let "L" be the maximum length any code word is permitted to have.Let "p"1, …, "pn" be the frequencies of thesymbols of the alphabet to be encoded. We first sort the symbols so that "p""i" ≤ "p""i"+1. Create "L" coins for each symbol, of denominations 2−1, …, 2−"L", each of numismatic value "pi". Use the package-merge algorithm to select the set of coins of minimum numismatic value whose denominations total "n" − 1. Let "hi" be the number of coins of numismatic value "pi" selected. The optimal length-limited Huffman code will encode symbol "i" with a bit string of length "hi". The canonical Huffman code can easily be constructed by a simple bottom-up greedy method, given that the "hi" are known, and this can be the basis for fast data compressionA. Moffat and A. Turpin. On the implementation of minimum redundancy prefix codes. IEEE Transactions on Communications. V 45 No. 10:1200--1207, Oct 1997. ( [http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=634683|link] )] .

Performance Improvements and Generalizations

With this reduction, the algorithm is "O(nL)"-time and "O(nL)"-space. However, the original paper, "A fast algorithm for optimal length-limited Huffman codes," shows how this can be improved to "O(nL)"-time and "O(n)"-space. The idea is to run the algorithm a first time, only keeping enough data to be able to determine two equivalent subproblems that sum to half the size of the original problem. This is done recursively, resulting in an algorithm that takes about twice as long but requires only linear space.

Many other improvements have been made to the Package-Merge Algorithm to reduce the multiplicative constant and to make it faster in special cases, such as those problems having repeated "pi"sI. H. Witten and A. Moffat and T. Bell. "Managing Gigabytes". Second Edition. Morgan Kaufmann Publishers, 1999. ISBN 978-1-55860-570-1.] . The Package-Merge approach has also been adapted to related problems such as alphabetic codingL. L. Larmore and T. M. Przytycka. A Fast Algorithm for Optimal Height-Limited Alphabetic Binary-Trees. SIAM Journal on Computing, V 23 No. 6:1283--1312, 1994.] .

Methods involving graph theory have been shown to have better asymptotic space complexity than the Package-Merge Algorithm, but these have not seen as much practical application.

External links

* Michael B. Baer " [http://132.236.180.11/pdf/cs.IT/0602085 Twenty (or so) Questions: $D$-ary Length-Bounded Prefix Coding.] " (PDF)
* Alistair Moffat and Andrew Turpin " [http://www.cs.mu.oz.au/~alistair/abstracts/dcc95.pm.html Space-Efficient Construction of Optimal Prefix Codes.] "
* An implementation of the package-merge algorithm " [http://www.koders.com/delphi/fid65738C1383AB6F1E0D556659A91CB5F85B09C941.aspx?s=algorithm] "

References


Wikimedia Foundation. 2010.

Look at other dictionaries:

  • Merge — See Help:Merging for the usage of Merge in Wikipedia. Contents 1 Concepts 2 Computer science 3 Music …   Wikipedia

  • Huffman coding — Huffman tree generated from the exact frequencies of the text this is an example of a huffman tree . The frequencies and codes of each character are below. Encoding the sentence with this code requires 135 bits, as opposed of 288 bits if 36… …   Wikipedia

  • List of mathematics articles (P) — NOTOC P P = NP problem P adic analysis P adic number P adic order P compact group P group P² irreducible P Laplacian P matrix P rep P value P vector P y method Pacific Journal of Mathematics Package merge algorithm Packed storage matrix Packing… …   Wikipedia

  • Lawrence L. Larmore — Professor Lawrence L. Larmore is a theoretical computer scientist, and a professor at University of Nevada Las Vegas. He is best known for his work with competitive analysis of online algorithms, particularly for the k server problem. His… …   Wikipedia

  • List of algorithms — The following is a list of the algorithms described in Wikipedia. See also the list of data structures, list of algorithm general topics and list of terms relating to algorithms and data structures.If you intend to describe a new algorithm,… …   Wikipedia

  • Software tools for molecular microscopy — There are a large number of software tools or software applications that have been specifically developed for the field sometimes referred to as molecular microscopy or cryo electron microscopy or cryoEM. Several special issues of the Journal of… …   Wikipedia

  • Quadratic sieve — The quadratic sieve algorithm (QS) is a modern integer factorization algorithm and, in practice, the second fastest method known (after the general number field sieve). It is still the fastest for integers under 100 decimal digits or so, and is… …   Wikipedia

  • diff — This article is about the file comparison utility. For other uses, see DIFF (disambiguation). Diffs redirects here. For the American punk rock group, see The Diffs. In computing, diff is a file comparison utility that outputs the differences… …   Wikipedia

  • Compiz — Screenshot showing the Cube plugin for Compiz on Fedora. Dev …   Wikipedia

  • Mathematics and Physical Sciences — ▪ 2003 Introduction Mathematics       Mathematics in 2002 was marked by two discoveries in number theory. The first may have practical implications; the second satisfied a 150 year old curiosity.       Computer scientist Manindra Agrawal of the… …   Universalium


Share the article and excerpts

Direct link
Do a right-click on the link above
and select “Copy Link”

We are using cookies for the best presentation of our site. Continuing to use this site, you agree with this.