Conditional random field

A conditional random field (CRF) is a statistical modelling method often applied in pattern recognition. More specifically it is a type of discriminative undirected probabilistic graphical model. It is used to encode known relationships between observations and construct consistent interpretations. It is often used for labeling or parsing of sequential data, such as natural language text or biological sequences^{[1]} and in computer vision^{[2]} . Specifically, CRFs find applications in shallow parsing^{[3]} , named entity recognition^{[4]} and gene finding, among other tasks, being an alternative to the related hidden Markov models. In computer vision, CRFs are often used for object recognition and image segmentation.
Contents
Description
Lafferty, McCallum and Pereira (2001) define a CRF on observations and random variables as follows:
Let G = (V,E) be a graph such that , so that is indexed by the vertices of G. Then is a conditional random field in case, when conditioned on , the random variables obey the Markov property with respect to the graph: , where w∼v means that w and v are neighbors in G.
What this means is that a CRF is an undirected graphical model whose nodes can be divided into exactly two disjoint sets and , the observed and output variables, respectively; the conditional distribution is then modeled.
Inference
For general graphs, the problem of exact inference in CRFs is intractable. The inference problem for a CRF is basically the same as for an MRF and the same arguments hold.^{[5]} However there exist special cases for which exact inference is feasible:
 If the graph is a chain or a tree, message passing algorithms yield exact solutions. The algorithms used in these cases are analogous to the forwardbackward and Viterbi algorithm for the case of HMMs.
 If the CRF only contains pairwise potentials and the energy is submodular, combinatorial min cut/max flow algorithms yield exact solutions.
If exact inference is impossible, several algorithms can be used to obtain approximate solutions. These include:
 Loopy belief propagation
 Alpha expansion
 Mean field inference
 Linear programming relaxations
Parameter Learning
Learning the parameters θ is usually done by maximum likelihood learning for p(Y_{i}  X_{i};θ). If all nodes have exponential family distributions and all nodes are observed during training, this optimization is convex.^{[5]} It can be solved for example using gradient descent algorithms QuasiNewton methods, such as the LBFGS algorithm. On the other hand, if some variables are unobserved, the inference problem has to be solved for these variables. This is intractable to do exact in general graphs, so approximations have to be used.
Examples
In sequence modeling, the graph of interest is usually a chain graph. An input sequence of observed variables X represents a sequence of observations and Y represents a hidden (or unknown) state variable that needs to be inferred given the observations. The Y_{i} are structured to form a chain, with an edge between each Y_{i − 1} and Y_{i}. As well as having a simple interpretation of the Y_{i} as "labels" for each element in the input sequence, this layout admits efficient algorithms for:
 model training, learning the conditional distributions between the Y_{i} and feature functions from some corpus of training data.
 inference, determining the probability of a given label sequence Y given X.
 decoding, determining the most likely label sequence Y given X.
The conditional dependency of each Y_{i} on X is defined through a fixed set of feature functions of the form f(i,Y_{i − 1},Y_{i},X), which can informally be thought of as measurements on the input sequence that partially determine the likelihood of each possible value for Y_{i}. The model assigns each feature a numerical weight and combines them to determine the probability of a certain value for Y_{i}.
Linearchain CRFs have many of the same applications as conceptually simpler hidden Markov models (HMMs), but relax certain assumptions about the input and output sequence distributions. An HMM can loosely be understood as a CRF with very specific feature functions that use constant probabilities to model state transitions and emissions. Conversely, a CRF can loosely be understood as a generalization of an HMM that makes the constant transition probabilities into arbitrary functions that vary across the positions in the sequence of hidden states, depending on the input sequence.
Notably in contrast to HMMs, CRFs can contain any number of feature functions, the feature functions can inspect the entire input sequence X at any point during inference, and the range of the feature functions need not have a probabilistic interpretation.
Higherorder CRFs and semiMarkov CRFs
CRFs can be extended into higher order models by making each Y_{i} dependent on a fixed number o of previous variables Y_{i − o},...,Y_{i − 1}. Training and inference are only practical for small values of o (such as o ≤ 5),^{[citation needed]} since their computational cost increases exponentially with o. Largemargin models for structured prediction, such as the structured Support Vector Machine can be seen as an alternative training procedure to CRFs.
There exists another generalization of CRFs, the semiMarkov conditional random field (semiCRF), which models variablelength segmentations of the label sequence Y.^{[6]} This provides much of the power of higherorder CRFs to model longrange dependencies of the Y_{i}, at a reasonable computational cost.
Software
This is a partial list of software that implement generic CRF tools.
 GCO CRFs with submodular energy functions (C++, Matlab)
 GRMM General CRFs (Java)
 CRFall General CRFs (Matlab)
 Sarawagi's CRF Linearchain CRFs (Java)
 HCRF library Hiddenstate CRFs (C++, Matlab)
 Wapiti Fast linearchain CRFs (C)
 CRFSuite Fast restricted linearchain CRFs (C)
 CRF++ Linearchain CRFs (C++)
 Monte Python Linearchain CRFs (Python)
This is a partial list of software that implement CRF related tools.
 Conrad CRF based gene predictor (Java)
 Stanford NER Named Entity Recognizer (Java)
 BANNER Named Entity Recognizer (Java)
See also
 Graphical model
 Markov random field
 Maximum entropy Markov model (MEMM)
References
 ^ Lafferty, J., McCallum, A., Pereira, F. (2001). "Conditional random fields: Probabilistic models for segmenting and labeling sequence data". Proc. 18th International Conf. on Machine Learning. Morgan Kaufmann. pp. 282–289. http://www.cis.upenn.edu/~pereira/papers/crf.pdf.
 ^ He, X. and Zemel, R.S. and CarreiraPerpinñán, M.A. (2004). "Multiscale conditional random fields for image labeling". IEEE Computer Society. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.3.7826&rep=rep1&type=pdf.
 ^ Sha, F., Pereira, F. (2003). "shallow parsing with conditional random fields". http://portal.acm.org/ft_gateway.cfm?id=1073473&type=pdf&CFID=4684435&CFTOKEN=39459323.
 ^ Settles, B. (2004). "Biomedical named entity recognition using conditional random fields and rich feature sets". Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications. pp. 104107. http://acl.ldc.upenn.edu/coling2004/W1/pdf/21.pdf.
 ^ ^{a} ^{b} Sutton, Charles; McCallum, Andrew (2010). "An Introduction to Conditional Random Fields". arXiv:1011.4088v1 [stat.ML].
 ^ Sarawagi, Sunita; William W. Cohen (2005). "SemiMarkov conditional random fields for information extraction". In Lawrence K. Saul, Yair Weiss, Léon Bottou (eds.). Advances in Neural Information Processing Systems 17. Cambridge, MA: MIT Press. pp. 1185–1192.
Further reading
 McCallum, A.: Efficiently inducing features of conditional random fields. In: Proc. 19th Conference on Uncertainty in Artificial Intelligence. (2003)
 Wallach, H.M.: Conditional random fields: An introduction. Technical report MSCIS0421, University of Pennsylvania (2004)
 Sutton, C., McCallum, A.: An Introduction to Conditional Random Fields for Relational Learning. In "Introduction to Statistical Relational Learning". Edited by Lise Getoor and Ben Taskar. MIT Press. (2006) Online PDF
 Klinger, R., Tomanek, K.: Classical Probabilistic Models and Conditional Random Fields. Algorithm Engineering Report TR072013, Department of Computer Science, Dortmund University of Technology, December 2007. ISSN 18644503. Online PDF
Categories: Theoretical computer science
 Machine learning
 Graphical models
Wikimedia Foundation. 2010.
Look at other dictionaries:
Conditional Random Field — Ein Conditional Random Field (CRF) ist ein ungerichtetes graphisches Modell. Oft werden CRFs zum Taggen von sequentiellen Daten verwendet. Das bedeutet, das CRF erhält eine Sequenz X als Eingabe und gibt eine gleichlange Sequenz Y aus. Im… … Deutsch Wikipedia
Random field — A random field is a generalization of a stochastic process such that the underlying parameter need no longer be a simple real, but can instead be a multidimensional vector space or even a manifold.At its most basic, discrete case, a random field… … Wikipedia
Markov random field — A Markov random field, Markov network or undirected graphical model is a set of variables having a Markov property described by an undirected graph. A Markov random field is similar to a Bayesian network in its representation of dependencies. It… … Wikipedia
Hardware random number generator — This SSL Accelerator computer card uses a hardware random number generator to generate cryptographic keys to encrypt data sent over computer networks. In computing, a hardware random number generator is an apparatus that generates random numbers… … Wikipedia
List of mathematics articles (C) — NOTOC C C closed subgroup C minimal theory C normal subgroup C number C semiring C space C symmetry C* algebra C0 semigroup CA group Cabal (set theory) Cabibbo Kobayashi Maskawa matrix Cabinet projection Cable knot Cabri Geometry Cabtaxi number… … Wikipedia
Campo aleatorio condicional — Un campo aleatorio condicional (Conditional Random Field o CRF en inglés) es un modelo estocástico utilizado habitualmente para etiquetar y segmentar secuencias de datos o extraer información de documentos. En algunos contextos también se les… … Wikipedia Español
Markov network — A Markov network, or Markov random field, is a model of the (full) joint probability distribution of a set mathcal{X} of random variables having the Markov property. A Markov network is similar to a Bayesian network in its representation of… … Wikipedia
Hidden Markov model — Probabilistic parameters of a hidden Markov model (example) x mdash; states y mdash; possible observations a mdash; state transition probabilities b mdash; output probabilitiesA hidden Markov model (HMM) is a statistical model in which the system … Wikipedia
Oneshot learning — is an object categorization problem of current research interest in computer vision. Whereas most machine learning based object categorization algorithms require training on hundreds or thousands of images and very large datasets, one shot… … Wikipedia
Activity recognition — aims to recognize the actions and goals of one or more agents from a series of observations on the agents actions and the environmental conditions. Since the 1980s, this research field has captured the attention of several computer science… … Wikipedia