Tree-adjoining grammar (TAG) is a grammar formalism defined by Aravind Joshi. Tree-adjoining grammars are somewhat similar to context-free grammars, but the elementary unit of rewriting is the tree rather than the symbol. Whereas context-free grammars have rules for rewriting symbols as strings of other symbols, tree-adjoining grammars have rules for rewriting the nodes of trees as other trees (see tree (graph theory) and tree data structure).

History

TAG originated in investigations by Joshi and his students into the family of adjunction grammars (AG),cite paper
last = Joshi
first = Aravind
coauthors = S. R. Kosaraju, H. Yamada
year = 1969
publisher = Proceedings Tenth Annual Symposium on Automata Theory, Waterloo, Canada
] the "string grammar" of Zellig Harris. AGs handle endocentric properties of language in a natural and effective way, but do not have a good characterization of exocentric constructions; the converse is true of rewrite grammars, or phrase-structure grammar (PSG). In 1969, Joshi introduced a family of grammars that exploits this complementarity by mixing the two types of rules. A few very simple rewrite rules suffice to generate the vocabulary of strings for adjunction rules. This family is distinct from the Chomsky hierarchy but intersects it in interesting and linguistically relevant ways.cite paper
last = Joshi
first = Aravind
title = Properties of Formal Grammars with Mixed Types of Rules and Their Linguistic Relevance
year = 1969
publisher = Proceedings Third International Symposium on Computational Linguistics, Stockholm, Sweden
]

Description

The rules in a TAG are trees with a special leaf node known as the "foot node", which is anchored to a word. There are two types of basic trees in TAG: "initial" trees (often represented as '$alpha$') and "auxiliary" trees (''). Initial trees represent basic valency relations, while auxiliary tree allow for recursion.cite book
last = Jurafsky
first = Daniel
coauthors = James H. Martin
title = Speech and Language Processing
year = 2000
pages = 354
publisher = Prentice Hall
location = Upper Saddle River, NJ
] Auxiliary trees have the root (top) node and foot node labeled with the same symbol.A derivation starts with an initial tree, combining via either "substitution" or "adjunction". Substitution replaces a frontier node with another tree whose top node has the same label. Adjunction inserts an auxiliary tree into the center of another tree.cite conference
last = Joshi
first = Aravind
coauthors = Owen Rambow
title = A Formalism for Dependency Grammar Based on Tree Adjoining Grammar
year = 2003
booktitle = Proceedings of the Conference on Meaning-Text Theory
url = http://www1.cs.columbia.edu/~rambow/papers/joshi-rambow-2003.pdf
] .The root/foot label of the auxiliary tree must match the label of the node at which it adjoins.

Other variants of TAG allow multi-component trees, trees with multiple foot nodes, and other extensions.

Complexity and application

Tree-adjoining grammars are often described as mildly context-sensitive, meaning that they possess certain properties that make them more powerful (in terms of weak generative capacity) than context-free grammars, but less powerful than indexed or context-sensitive grammars. Mildly context-sensitive grammars are conjectured to be powerful enough to model natural languages while remaining efficiently parseable in the general case.cite book
last = Joshi
first = Aravind
chapter = How much context-sensitivity is necessary for characterizing structural descriptions
year = 1985
publisher = Cambridge University Press
pages = 206–250
title = Natural Language Processing: Theoretical, Computational, and Psychological Perspectives
editor = D. Dowty, L. Karttunen, and A. Zwicky, (eds.)
location = New York, NY
]

Due to their formal weakness, TAG is often used in computational linguistics and natural language processing.

A TAG can describe the language of squares (in which some arbitrary string is repeated), and the language $\left\{a^n b^n c^n d^n | 1 le n \right\}$. This type of processing can be represented by an embedded pushdown automaton.

Languages with cubes (ie triplicated strings) or with more than four distinct character strings of equal length cannot be generated by tree-adjoining grammars.

For these reasons, languages generated by tree-adjoining grammars are referred to as "mildly context-sensitive language"s.

References

* [http://www.cis.upenn.edu/~xtag/ The XTAG project] , which uses a TAG for natural language processing.
* [http://www.let.rug.nl/~vannoord/papers/diss/diss/node59.html A tutorial on TAG]
* [http://www.computing.dcu.ie/~yguo/doc/talk/TAG.pdf Another tutorial] with focus on comparison with Lexical Functional Grammar and grammars extraction from Treebank
* [http://wiki.loria.fr/wiki/SemConst/Documentation#Background SemConst Documentation] A quick survey on Syntax and Semantic Interface problematic within the TAG framework.
* [http://sourcesup.cru.fr/tulipa/ The TuLiPa project] The Tübingen Linguistic Parsing Architecture (TuLiPA) is a multi-formalism syntactic (and semantic) parsing environment, designed mainly for Multi-Component Tree Adjoining Grammars with Tree Tuples
* [http://mgkit.gforge.inria.fr/ The Metagrammar Toolkit] which provides several tools to edit and compile MetaGrammars into TAGs. It also include a wide coverage French Metagrammars.
* [http://www.loria.fr/~azim/LLP2/help/fr/index.html LLP2] A Lexicalized Tree Adjoining Grammar parser which provides an easy to use graphical environment (page in french)

Wikimedia Foundation. 2010.

### Look at other dictionaries:

• Tree Adjoining Grammar — Tree adjoining grammars (TAG) sind Grammatiken, die von Aravind Joshi eingeführt wurden und in der Computerlinguistik und bei der Verarbeitung von natürlichen Sprachen verwendet werden. TAGs ähneln kontextfreien Grammatiken, verwenden aber einen… …   Deutsch Wikipedia

• Grammar framework — In theoretical linguistics, the following fundamental approaches towards constructing grammar frameworks for natural languages are distinguished:*Generative grammar: algorithmic (phrase structure grammars) **Transformational grammar (1960s)… …   Wikipedia

• Generative grammar — In theoretical linguistics, generative grammar refers to a particular approach to the study of syntax. A generative grammar of a language attempts to give a set of rules that will correctly predict which combinations of words will form… …   Wikipedia

• Formal grammar — In formal semantics, computer science and linguistics, a formal grammar (also called formation rules) is a precise description of a formal language ndash; that is, of a set of strings over some alphabet. In other words, a grammar describes which… …   Wikipedia

• Dependency grammar — Hybrid constituency/dependency tree from the Quranic Arabic Corpus Dependency grammar (DG) is a class of syntactic theories developed by Lucien Tesnière. It is distinct from phrase structure grammars, as it lacks phrasal nodes. Structure is… …   Wikipedia

• Transformational grammar — In linguistics, a transformational grammar, or transformational generative grammar (TGG), is a generative grammar, especially of a natural language, that has been developed in a Chomskyan tradition. Additionally, transformational grammar is the… …   Wikipedia

• Lexical functional grammar — (LFG) is a grammar framework in theoretical linguistics, a variety of generative grammar. The development of the theory was initiated by Joan Bresnan and Ronald Kaplan in the 1970s, in reaction to the direction research in the area of… …   Wikipedia

• Context-free grammar — In formal language theory, a context free grammar (CFG) is a formal grammar in which every production rule is of the form V → w where V is a single nonterminal symbol, and w is a string of terminals and/or nonterminals (w can be empty). The… …   Wikipedia

• Context-sensitive grammar — A context sensitive grammar (CSG) is a formal grammar in which the left hand sides and right hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context sensitive grammars are more general than… …   Wikipedia

• Controlled grammar — Controlled grammars[1] are a class of grammars that extend, usually, the context free grammars with additional controls on the derivations of a sentence in the language. A number of different kinds of controlled grammars exist, the four main… …   Wikipedia

### Share the article and excerpts

Do a right-click on the link above