 Nested word

In computer science, more specifically in automata and formal language theory, nested words are a concept proposed by Alur and Madhusudan as a joint generalization of words, as traditionally used for modelling linearly ordered structures, and of ordered unranked trees, as traditionally used for modelling hierarchical structures. Finitestate acceptors for nested words, socalled nested word automata, then give a more expressive generalization of finite automata on words. The linear encodings of languages accepted by finite nested word automata gives the class of visibly pushdown languages. The latter language class lies properly between the regular languages and the deterministic contextfree languages. Since their introduction in 2004, these concepts have triggered much research in that area.^{[1]}
Contents
Formal definition
To define nested words, we first need to define matching relation. As usual, for a nonnegative integer , we use the notation to denote the set , with the special case .
A matching relation ↝ of length is a subset of such that:
(i) all nesting edges are forward, that is, if i ↝ j then i<j;
(ii) nesting edges never have a finite position in common, that is, for , there is at most one position h such that h ↝ i, and there is at most one position j such that i ↝ j; and
(iii) nesting edges never cross, that is, we can't find i<i'≤j<j' such that both i ↝ j and i' ↝ j'.
i is referred to as a call position, if i ↝ j for some j, as a pending call if i ↝ ∞, as a return position, if h ↝ i for some h and as a "pending return" if ∞ ↝ i.A nested word of length over an alphabet Σ is a pair (w,↝), where w is a word of length over Σ (in the usual sense) and ↝ is a matching relation of length .
Encoding nested words into ordinary words
Nested words over the alphabet can be encoded into "ordinary" words over the tagged alphabet , in which each symbol a from Σ has three tagged counterparts: the symbol ⟨a for encoding a call position in a nested word labelled with a, the symbol a⟩ for encoding a return position labelled with a, and finally the symbol a itself for representing an internal position labelled with a. More precisely, let φ be the function mapping nested words over Σ to words over such that each nested word (,↝) is mapped to the word , where the letter x_{i} equals ⟨a, a, or a⟩, respectively, if w_{i} = a and i is a call position, an internal position, or a return position, respectively.
Example
For illustration, let n=(w,↝) be the nested word over an ternary alphabet with w=abaabccca and matching relation ↝ = {(∞,1),(2,∞),(3,4),(5,7),(8,∞)}. Then its encoding as word reads as φ(n) = a⟩⟨b⟨aa⟩⟨bcc⟩⟨ca.
Automata
Nested word automaton
A nested word automaton has a finite number of states, and operates in almost the same way as a deterministic finite automaton on classical strings: a classical finite automaton reads the input word from left to right, and the state of the automaton after reading the jth letter w_{j} depends on the state in which the automaton was before reading w_{j}.
In a nested word automaton, the position j in the nested word (w,↝) might be a return position; if so, the state after reading w_{j} will not only depend on the linear state in which the automaton was before reading w_{j}, but also on a hierarchical state propagated by the automaton at the time it was in the corresponding call position. In analogy to regular languages of words, a set L of nested words is called regular if it is accepted by some (finitestate) nested word automaton.
Visibly pushdown automaton
Nested word automata are an automaton model accepting nested words. There is an equivalent automaton model operating on (ordinary) words. Namely, the notion of a deterministic visibly pushdown automaton is a restriction of the notion of a deterministic pushdown automaton.
Following Alur and Madhusudan,^{[2]} a deterministic visibly pushdown automaton is formally defined as a 6tuple where
 is a finite set of states,
 is the input alphabet, which – in contrast to that of ordinary pushdown automata – is partitioned into three sets Σ_{c}, Σ_{r}, and Σ_{int}. The alphabet Σ_{c} denotes the set of call symbols, Σ_{r} contains the return symbols, and the set Σ_{int} contains the internal symbols,
 is a finite set which is called the stack alphabet, which containins a special symbol denoting the empty stack,
 is the transition function, which is partitioned into three parts corresponding to call transitions, return transitions, and internal transitions, namely
 , the call transition function
 , the return transition function
 , the internal transition function,
 is the initial state, and
 is the set of accepting states.
The notion of computation of a visibly pushdown automaton is a restriction of the one used for pushdown automata. Visibly pushdown automata only add a symbol to the stack when reading a call symbol , they only remove the top element from the stack when reading a return symbol and they do not alter the stack when reading an internal event . A computation ending in an accepting state is an accepting computation.
As a result, a visibly pushdown automaton cannot push to and pop from the stack with the same input symbol. Thus the language cannot be accepted by a visibly pushdown automaton for any partition of Σ, however there are pushdown automata accepting this language.
If a language L over a tagged alphabet is accepted by a deterministic visibly pushdown automaton, then L is called a visibly pushdown language.
Nondeterministic visibly pushdown automata
Nondeterministic visibly pushdown automata are as expressive as deterministic ones. Hence one can transform a nondeterministic visibly pushdown automaton into a deterministic one, but if the nondeterministic automaton had s states, the deterministic one may have up to states.^{[3]}
Decision problems
Let  A  be the size of the description of an automaton A, then it is possible to check if a word n is accepted by the automaton in time . In particular, the emptiness problem is solvable in time O(  A  ^{3}). If A is fixed, it is decidable in time and space O(d) where d is the depth of n in a streaming seeing. It is also decidable with space and time , and by a uniform boolean circuit of depth O(log l).^{[2]}
For two nondeterministic automata A and B, deciding whether the set of words accepted by A is a subset of the word accepted by B is EXPTIMEcomplete. It is also EXPTIMEcomplete to figure out if there is a word that is not accepted.^{[2]}
Languages
As the definition of visibly pushdown automata shows, deterministic visibly pushdown automata can be seen as a special case of deterministic pushdown automata; thus the set VPL of visibly pushdown languages over forms a subset of the set DCFL of deterministic contextfree languages over the set of symbols in . In particular, the function that removes matching relation from nested words transforms regular languages over nested words into contextfree languages.
Closure properties
The set of visibly pushdown languages is closed under the following operations:^{[3]}
 set operations:
 union
 intersection
 complement, thus giving rise to a boolean algebra.
 Kleene star
 concatenation
For the intersection operation, one can construct a VPA M simulating two given VPAs M_{1} and M_{2} by a simple product construction (Alur & Madhusudan 2004): For i = 1,2, assume M_{i} is given as . Then for the automaton M, the set of states is , the initial state is , the set of final states is , the stack alphabet is given by , and the initial stack symbol is (Z_{1},Z_{2}).
If M is in state (p_{1},p_{2}) on reading a call symbol , then M pushes the stack symbol (γ_{1},γ_{2}) and goes to state (q_{1},q_{2}), where γ_{i} is the stack symbol pushed by M_{i} when transitioning from state p_{i} to q_{i} on reading input .
If M is in state (p_{1},p_{2}) on reading an internal symbol a, then M goes to state (q_{1},q_{2}), whenever M_{i} transitions from state p_{i} to q_{i} on reading a.
If M is in state (p_{1},p_{2}) on reading a return symbol , then M pops the symbol (γ_{1},γ_{2}) from the stack and goes to state (q_{1},q_{2}), where γ_{i} is the stack symbol popped by M_{i} when transitioning from state p_{i} to q_{i} on reading .
Correctness of the above construction crucially relies on the fact that the push and pop actions of the simulated machines M_{1} and M_{2} are synchronized along the input symbols read. In fact, a similar simulation is no longer possible for deterministic pushdown automata, as the larger class of deterministic contextfree languages is no longer closed under intersection.
In contrast to the construction for concatenation shown above, the complementation construction for visibly pushdown automata parallels the standard construction^{[4]} for deterministic pushdown automata.
Moreover, the class of visibly pushdown languages is closed under
 prefix closure
 suffix closure
 reversal.
Relation to other language classes
Alur & Madhusudan (2004) point out that the visibly pushdown languages are more general than the parenthesis languages suggested in McNaughton (1967). As shown by Reghizzi & Mandrioli (2009), the VPL in turn are strictly contained in the class of languages described by operatorprecedence grammars, which were introduced by Floyd (1963). In comparison to conjunctive grammars, a generalization of contextfree grammars, Okhotin (2011) shows that the linear conjunctive languages form a superclass of the visibly pushdown languages. The following table puts the family of visibly pushdown languages in relation to other language families in the Chomsky hierarchy.
Automata theory: formal languages and formal grammars Chomsky hierarchy Type0—Type1———Type2——Type3—Grammars (no common name)Linear contextfree rewriting systems etc.Treeadjoining etc.Visibly pushdown—Languages Visibly pushdownMinimal automaton Thread automataVisibly pushdownEach category of languages is a proper subset of the category directly above it.  Any automaton and any grammar in each category has an equivalent automaton or grammar in the category directly above it. Other models of description
Visibly pushdown grammars
Visibly pushdown languages are exactly the languages that can be described by visibly pushdown grammars.^{[2]}
Visibly pushdown grammars can be defined as a restriction of contextfree grammars. A visibly pushdown grammars G is defined by the 4tuple:
where
 and are disjoint finite set; each element is called a nonterminal character or a variable. Each variable represents a different type of phrase or clause in the sentence. Each variable defines a sublanguage of the language defined by , and the sublanguages of are the one without pending calls or pending returns.
 is a finite set of terminals, disjoint from , which make up the actual content of the sentence. The set of terminals is the alphabet of the language defined by the grammar .
 is the start variable (or start symbol), used to represent the whole sentence (or program). It must be an element of .
 is a finite relation from to such that . The members of are called the (rewrite) rules or productions of the grammar. There are three kinds of rewrite rules. For , and
 and if then and
 and if then
Here, the asterisk represents the Kleene star operation and is the empty word.
Uniform Boolean circuits
The problem whether a word n is accepted by a given nested word automaton can be solved by uniform Boolean circuits of depth .^{[2]}
Logical description
Regular languages over nested words are exactly the set of languages described by Monadic SO_(complexity) with two unary predicates call and return, linear successor and the matching relation ↝.^{[2]}
Notes
 ^ Google Scholar search results for "nested words" OR "visibly pushdown"
 ^ ^{a} ^{b} ^{c} ^{d} ^{e} ^{f} Alur & Madhusudan (2009)
 ^ ^{a} ^{b} Alur & Madhusudan (2004)
 ^ Hopcroft & Ullman (1979), p. 238 f.
See also
References
 Floyd, R. W. (1963). "Syntactic Analysis and Operator Precedence". Journal of the ACM 10: 316. doi:10.1145/321172.321179.
 McNaughton, R. (1967). "Parenthesis Grammars". Journal of the ACM 14: 490. doi:10.1145/321406.321411.
 Alur, R.; Madhusudan, P. (2004). Visibly pushdown languages. pp. 202. doi:10.1145/1007352.1007390.
 Alur, R.; Arenas, M.; Barcelo, P.; Etessami, K.; Immerman, N.; Libkin, L. (2008). Grädel, Erich. ed. "FirstOrder and Temporal Logics for Nested Words". Logical Methods in Computer Science 4 (4). doi:10.2168/LMCS4(4:11)2008.
 Alur, R.; Madhusudan, P. (2009). "Adding nesting structure to words". Journal of the ACM 56 (3): 1–43. doi:10.1145/1516512.1516518.
 Stefano Crespi Reghizzi, Dino Mandrioli: Algebraic properties of structured contextfree languages: old approaches and novel developments, 7th International Conference on Words (WORDS 2009), arXiv:0907.2130.
 Okhotin, Alexander: Comparing linear conjunctive languages to subfamilies of the contextfree languages, 37th International Conference on Current Trends in Theory and Practice of Computer Science (SOFSEM 2011).
External links
Categories: Words
 Formal languages
 Automata theory
Wikimedia Foundation. 2010.
Look at other dictionaries:
Nested — For a definition of the word nested , see the Wiktionary entry nested. This article is about the pop album. For other uses, see Nesting (disambiguation). Nested Studio album by … Wikipedia
nest — I UK [nest] / US noun [countable] Word forms nest : singular nest plural nests ** 1) a structure that birds make to keep their eggs and babies in build a nest: Ducks usually build their nests on the ground. a) a place that insects or small… … English dictionary
nest — nest1 [ nest ] noun count ** 1. ) a structure that birds make to keep their eggs and babies in: build a nest: Ducks usually build their nests on the ground. a ) a place that insects or small animals such as mice make to live in: an ants nest b )… … Usage of the words and phrases in modern English
Acronym and initialism — For acronyms used on Wikipedia, see Wikipedia:Acronyms. Acronyms and initialisms are abbreviations formed from the initial components in a phrase or name. These components may be individual letters (as in CEO) or parts of words (as in Benelux and … Wikipedia
Hierarchy — A hierarchy (Greek: hierarchia (ἱεραρχία), from hierarches, leader of sacred rites ) is an arrangement of items (objects, names, values, categories, etc.) in which the items are represented as being above, below, or at the same level as one… … Wikipedia
NonEnglish usage of quotation marks — A Non English usage of quotation marks Punctuation apostrophe ( … Wikipedia
Nesting (computing) — In computing science and informatics, the word nesting may denote several different constructions and activities where information is organized in layers or objects contain other similar objects. The rather general term is thus used in quite… … Wikipedia
Kilogram — Kg redirects here. For other uses, see Kg (disambiguation). Kilogram A computer generated image of the international prototype kilogram (IPK). The IPK is the kilogram. The IPK, which is roughly the size of a golf ball, sits here alongside a ruler … Wikipedia
Scribal abbreviation — Sigla redirects here. For the village in Poland, see Sigła. Text sample from an early 15th century Bible manuscript. Scribal abbreviations (sigla [plural], siglum and sigil [singular]) are the abbreviations used by ancient and mediæval scribes… … Wikipedia
arts, East Asian — Introduction music and visual and performing arts of China, Korea, and Japan. The literatures of these countries are covered in the articles Chinese literature, Korean literature, and Japanese literature. Some studies of East Asia… … Universalium