 Pattern matching

In computer science, pattern matching is the act of checking some sequence of tokens for the presence of the constituents of some pattern. In contrast to pattern recognition, the match usually has to be exact. The patterns generally have the form of either sequences or tree structures. Uses of pattern matching include outputting the locations (if any) of a pattern within a token sequence, to output some component of the matched pattern, and to substitute the matching pattern with some other token sequence (i.e., search and replace).
Sequence patterns (e.g., a text string) are often described using regular expressions and matched using techniques such as backtracking.
Tree patterns are used in some programming languages as a general tool to process data based on its structure, e.g., Haskell, ML and the symbolic mathematics language Mathematica have special syntax for expressing tree patterns and a language construct for conditional execution and value retrieval based on it. For simplicity and efficiency reasons, these tree patterns lack some features that are available in regular expressions.
Often it is possible to give alternative patterns that are tried one by one, which yields a powerful conditional programming construct. Pattern matching sometimes include support for guards.
Term rewriting and Graph rewriting languages rely on pattern matching for the fundamental way a program evaluates into a result.
Contents
History
The first computer programs to use pattern matching were text editors. At Bell Labs, Ken Thompson extended the seeking and replacing features of the QED editor to accept regular expressions. Early programming languages with pattern matching constructs include SNOBOL from 1962, SASL from 1976, NPL from 1977, and KRC from 1981. The first programming language with treebased pattern matching features was Fred McBride's extension of LISP, in 1970.^{[1]}
See also: Regular expression#HistoryPrimitive patterns
The simplest pattern in pattern matching is an explicit value or a variable. For an example, consider a simple function definition in Haskell syntax (function parameters are not in parentheses but are separated by spaces, = is not assignment but definition):
f 0 = 1
Here, 0 is a single value pattern. Now, whenever f is given 0 as argument the pattern matches and the function returns 1. With any other argument, the matching and thus the function fail. As the syntax supports alternative patterns in function definitions, we can continue the definition extending it to take more generic arguments:
f n = n * f (n1)
Here, the first
n
is a single variable pattern, which will match absolutely any argument and bind it to name n to be used in the rest of the definition. In Haskell (unlike at least Hope), patterns are tried in order so the first definition still applies in the very specific case of the input being 0, while for any other argument the function returnsn * f (n1)
with n being the argument.The wildcard pattern (often written as
_
) is also simple: like a variable name, it matches any value, but does not bind the value to any name.Tree patterns
More complex patterns can be built from the primitive ones of the previous section, usually in the same way as values are built by combining other values. The difference then is that with variable and wildcard parts, a pattern doesn't build into single value, but matches a group of values that are the combination of the concrete elements and the elements that are allowed to vary within the structure of the pattern.
A tree pattern describes a part of a tree by starting with a node and specifying some branches and nodes and leaving some unspecified with a variable or wildcard pattern. It may help to think of the abstract syntax tree of a programming language and algebraic data types.
In Haskell, the following line defines an algebraic data type
Color
that has a single data constructorColorConstructor
that wraps an integer and a string.data Color = ColorConstructor Integer String
The constructor is a node in a tree and the integer and string are leaves in branches.
When we want to write functions to make
Color
an abstract data type, we wish to write functions to interface with the data type, and thus we want to extract some data from the data type, for example, just the string or just the integer part ofColor
.If we pass a variable that is of type Color, how can we get the data out of this variable? For example, for a function to get the integer part of
Color
, we can use a simple tree pattern and write:integerPart (ColorConstructor theInteger _) = theInteger
As well:
stringPart (ColorConstructor _ theString) = theString
The creations of these functions can be automated by Haskell's data record syntax.
Filtering data with patterns
Pattern matching can be used to filter data of a certain structure. For instance, in Haskell a list comprehension could be used for this kind of filtering:
[A xA x < [A 1, B 1, A 2, B 2]]
evaluates to
[A 1, A 2]
Pattern matching in Mathematica
In Mathematica, the only structure that exists is the tree, which is populated by symbols. In the Haskell syntax used thus far, this could be defined as
data SymbolTree = Symbol String [SymbolTree]
An example tree could then look like
Symbol "a" [Symbol "b" [], Symbol "c" []]
In the traditional, more suitable syntax, the symbols are written as they are and the levels of the tree are represented using [], so that for instance
a[b,c]
is a tree with a as the parent, and b and c as the children.A pattern in Mathematica involves putting "_" at positions in that tree. For instance, the pattern
A[_]
will match elements such as A[1], A[2], or more generally A[x] where x is any entity. In this case,
A
is the concrete element, while_
denotes the piece of tree that can be varied. A symbol prepended to_
binds the match to that variable name while a symbol appended to_
restricts the matches to nodes of that symbol.The Mathematica function
Cases
filters elements of the first argument that match the pattern in the second argument:Cases[{a[1], b[1], a[2], b[2]}, a[_] ]
evaluates to
{a[1], a[2]}
Pattern matching applies to the structure of expressions. In the example below,
Cases[ {a[b], a[b, c], a[b[c], d], a[b[c], d[e]], a[b[c], d, e]}, a[b[_], _] ]
returns
{a[b[c],d], a[b[c],d[e]]}
because only these elements will match the pattern
a[b[_],_]
above.In Mathematica, it is also possible to extract structures as they are created in the course of computation, regardless of how or where they appear. The function
Trace
can be used to monitor a computation, and return the elements that arise which match a pattern. For example, we can define the Fibonacci sequence asfib[01]:=1 fib[n_]:= fib[n1] + fib[n2]
Then, we can ask the question: Given fib[3], what is the sequence of recursive Fibonacci calls?
Trace[fib[3], fib[_]]
returns a structure that represents the occurrences of the pattern
fib[_]
in the computational structure:{fib[3],{fib[2],{fib[1]},{fib[0]}},{fib[1]}}
Declarative programming
In symbolic programming languages, it is easy to have patterns as arguments to functions or as elements of data structures. A consequence of this is the ability to use patterns to declaratively make statements about pieces of data and to flexibly instruct functions how to operate.
For instance, the Mathematica function
Compile
can be used to make more efficient versions of the code. In the following example the details do not particularly matter; what matters is that the subexpression {{com[_], Integer}} instructsCompile
that expressions of the formcom[_]
can be assumed to be integers for the purposes of compilation:com[i_] := Binomial[2i, i] Compile[{x, {i, _Integer}}, x^com[i], {{com[_], Integer}}]
Mailboxes in Erlang also work this way.
The CurryHoward correspondence between proofs and programs relates MLstyle pattern matching to case analysis and proof by exhaustion.
Pattern matching and strings
By far the most common form of pattern matching involves strings of characters. In many programming languages, a particular syntax of strings is used to represent regular expressions, which are patterns describing string characters.
However, it is possible to perform some string pattern matching within the same framework that has been discussed throughout this article.
Tree patterns for strings
In Mathematica, strings are represented as trees of root StringExpression and all the characters in order as children of the root. Thus, to match "any amount of trailing characters", a new wildcard ___ is needed in contrast to _ that would match only a single character.
In Haskell and functional programming languages in general, strings are represented as functional lists of characters. A functional list is defined as an empty list, or an element constructed on an existing list. In Haskell syntax:
[]  an empty list x:xs—an element x constructed on a list xs
The structure for a list with some elements is thus
element:list
. When pattern matching, we assert that a certain piece of data is equal to a certain pattern. For example, in the function:head (element:list) = element
we assert that the first element of
head
's argument is called element, and the function returns this. We know that this is the first element because of the way lists are defined, a single element constructed onto a list. This single element must be the first. The empty list would not match the pattern at all, as an empty list does not have a head (the first element that is constructed).In the example, we have no use for
list
, so we can disregard it, and thus write the function:head (element:_) = element
The equivalent Mathematica transformation is expressed as
head[element , ]:=element
Example string patterns
In Mathematica, for instance,
StringExpression["a", ]
will match a string that has two characters and begins with "a".
The same pattern in Haskell:
['a', _]
Symbolic entities can be introduced to represent many different classes of relevant features of a string. For instance,
StringExpression[LetterCharacter, DigitCharacter]
will match a string that consists of a letter first, and then a number.
In Haskell, guards could be used to achieve the same matches:
[letter, digit]  isAlpha letter && isDigit digit
The main advantage of symbolic string manipulation is that it can be completely integrated with the rest of the programming language, rather than being a separate, special purpose subunit. The entire power of the language can be leveraged to built up the patterns themselves or analyze and transform the programs that contain them.
SNOBOL
Main article: SNOBOLSNOBOL (String Oriented Symbolic Language) is a computer programming language developed between 1962 and 1967 at AT&T Bell Laboratories by David J. Farber, Ralph E. Griswold and Ivan P. Polonsky.
SNOBOL4 stands apart from most programming languages by having patterns as a firstclass data type (i.e. a data type whose values can be manipulated in all ways permitted to any other data type in the programming language) and by providing operators for pattern concatenation and alternation. Strings generated during execution can be treated as programs and executed.
SNOBOL was quite widely taught in larger US universities in the late 1960s and early 1970s and was widely used in the 1970s and 1980s as a text manipulation language in the humanities.
Since SNOBOL's creation, newer languages such as Awk and Perl have made string manipulation by means of regular expressions fashionable. SNOBOL4 patterns, however, subsume BNF grammars, which are equivalent to Contextfree grammars and more powerful than regular expressions ^{[2]}
See also
 AIML for an AI language based on matching patterns in speech
 AWK language
 Coccinelle pattern matches C source code
 glob (programming)
 Pattern recognition for fuzzy patterns
 PCRE Perl Compatible Regular Expressions, a common modern implementation of string pattern matching ported to many languages
 REBOL parse dialect for pattern matching used to implement language dialects
 Tom (pattern matching language)
 SNOBOL for a programming language based on one kind of pattern matching
 Unification, a similar concept in Prolog
References
 The Mathematica Book, chapter Section 2.3: Patterns
 The Haskell 98 Report, chapter 3.17 Pattern Matching.
 Python Reference Manual, chapter 6.3 Assignment statements.
 The Pure Programming Language, chapter 4.3: Patterns
 ^ http://www.cs.nott.ac.uk/~ctm/view.ps.gz
 ^ Gimpel, J. F. 1973. A theory of discrete patterns and their implementation in SNOBOL4. Commun. ACM 16, 2 (Feb. 1973), 91100. DOI=http://doi.acm.org/10.1145/361952.361960
External links
 A Gentle Introduction to Haskell: Patterns
 Views: An Extension to Haskell Pattern Matching
 Nikolaas N. Oosterhof, Philip K. F. Hölzenspies, and Jan Kuper. Application patterns. A presentation at Trends in Functional Programming, 2005
 JMatch: the Java programming language extended with pattern matching
 An incomplete history of the QED Text Editor by Dennis Ritchie  provides the history of regular expressions in computer programs
 The Implementation of Functional Programming Languages, pages 53103 Simon Peyton Jones, published by Prentice Hall, 1987.
 Nemerle, pattern matching.
 Erlang, pattern matching.
 Prop: a C++ based pattern matching language, 1999
 Temur Kutsia. Flat Matching. Journal of Symbolic Computation 43(12):858—873. Describes in details flat matching in Mathematica.
Categories: Pattern matching
 Conditional constructs
 Articles with example Haskell code
Wikimedia Foundation. 2010.
Look at other dictionaries:
Pattern matching — Pat tern match ing, n. [See {pattern}.] a technique in automated data analysis, usually performed on a computer, by which a group of characteristic properties of an unknown object is compared with the comparable groups of characteristics of a set … The Collaborative International Dictionary of English
Pattern matching — (engl. für Musterabgleich) oder musterbasierte Suche ist ein Begriff für symbolverarbeitende Verfahren, die anhand eines vorgegebenen Musters diskrete Strukturen oder Teilmengen einer diskreten Struktur identifizieren. Inhaltsverzeichnis 1… … Deutsch Wikipedia
Pattern Matching — (engl. für Musterabgleich) oder musterbasierte Suche ist ein Begriff für symbolverarbeitende Verfahren, die anhand eines vorgegebenen Musters diskrete Strukturen oder Teilmengen einer diskreten Struktur identifizieren. Inhaltsverzeichnis 1… … Deutsch Wikipedia
Pattern Matching — [dt. »Musterabgleich«], OCR … UniversalLexikon
pattern matching — palyginimas su pavyzdžiu statusas T sritis automatika atitikmenys: angl. pattern matching vok. Vergleich mit dem Muster, m rus. сопоставление с образцом, n; сравнение с эталоном, n pranc. concordance par exemple, f … Automatikos terminų žodynas
Pattern matching — Сопоставление с образцом, отождествление … Краткий толковый словарь по полиграфии
pattern matching — ● ►en loc. m. Version anglaise de filtrage, avec un filtre du premier sens. Sinon, pour un filtre de la deuxième définition, les anglais et les anglo saxons en général disent Filter ou Filtering … Dictionnaire d'informatique francophone
Compressed pattern matching — In computer science Compressed Pattern Matching or CPM is the process of searching for pattern in compressed data with little or no decompression. Searching in a compressed string is faster than searching an uncompressed string and requires less… … Wikipedia
Tom (pattern matching language) — Infobox Software name = Tom paradigm = Pattern matching caption = developer = latest release version = 2.5 latest release date = 2007 07 09 latest preview version = latest preview date = operating system = Cross platform platform = genre =… … Wikipedia
Pattern recognition — is a sub topic of machine learning. It is the act of taking in raw data and taking an action based on the category of the data .citation neededdate=September 2008 Most research in pattern recognition is about methods for supervised learning and… … Wikipedia