Programming language specification
A programming language specification can take several forms, including the following:
* An explicit definition of the syntax and semantics of the language. While syntax is commonly specified using a formal grammar, semantic definitions may be written in natural language (e.g., the C language) or in a formal semantics (e.g., the Standard ML [R. Milner, M. Tofte, R. Harper and D. MacQueen. The Definition of Standard ML (Revised). MIT Press, 1997. ISBN 0-262-63181-4] and Scheme [Richard Kelsey, William Clinger and Jonathan Rees. Section 7.2 Formal semantics. "Revised⁵ Report on the Algorithmic Language Scheme", February 1998. http://www.schemers.org/Documents/Standards/R5RS/HTML/r5rs-Z-H-10.html#%_sec_7.2, accessed 2006-06-09] specifications).
* A description of the behavior of a translator for the language (e.g., the C++ and Fortran specifications). The syntax and semantics of the language have to be inferred from this description, which may be written in natural language or in a formal language.
* A "model" implementation, sometimes written in the language being specified (e.g., Prolog). The syntax and semantics of the language are explicit in the behavior of the model implementation.
Syntax in a programming language is usually described using a combination of regular expressions, to describe lexemes, and context-free grammars, to describe how lexemes may be combined to form a valid program.
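As an illustration, the following Python sketch specifies a hypothetical toy language of integer arithmetic in this two-level way: each lexeme class is defined by a regular expression, and a context-free grammar (written in EBNF in the comments) is recognized by a hand-written recursive-descent parser with one method per grammar rule. The token names, grammar and tree encoding are assumptions made for the example; real specifications use far larger grammars.

```python
import re
from typing import List, Optional, Tuple

# Lexical structure: each lexeme class of a hypothetical toy language
# is defined by one regular expression.
TOKEN_SPEC = [
    ("NUMBER", r"[0-9]+"),
    ("OP", r"[+\-*/()]"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

Token = Tuple[str, str]  # (lexeme class, matched text)


def tokenize(source: str) -> List[Token]:
    """Split source text into lexemes, rejecting characters no regular expression matches."""
    tokens: List[Token] = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        if match.lastgroup != "SKIP":  # whitespace separates lexemes but is discarded
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


# Phrase structure: a context-free grammar in EBNF, one parsing method per rule.
#   expr   ::= term (("+" | "-") term)*
#   term   ::= factor (("*" | "/") factor)*
#   factor ::= NUMBER | "(" expr ")"
class Parser:
    def __init__(self, tokens: List[Token]) -> None:
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> Optional[Token]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self) -> Token:
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    def at(self, *texts: str) -> bool:
        """True if the next lexeme is one of the given operator symbols."""
        token = self.peek()
        return token is not None and token[0] == "OP" and token[1] in texts

    def expect(self, text: str) -> None:
        if not self.at(text):
            raise SyntaxError(f"expected {text!r}, found {self.peek()!r}")
        self.take()

    def expr(self) -> tuple:
        node = self.term()
        while self.at("+", "-"):
            op = self.take()[1]
            node = (op, node, self.term())
        return node

    def term(self) -> tuple:
        node = self.factor()
        while self.at("*", "/"):
            op = self.take()[1]
            node = (op, node, self.factor())
        return node

    def factor(self) -> tuple:
        token = self.peek()
        if token is not None and token[0] == "NUMBER":
            return ("num", int(self.take()[1]))
        self.expect("(")
        node = self.expr()
        self.expect(")")
        return node


def parse(source: str) -> tuple:
    """Return a nested-tuple syntax tree, or raise SyntaxError for an invalid program."""
    parser = Parser(tokenize(source))
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected trailing token {parser.peek()!r}")
    return tree


print(parse("1 + 2 * (3 - 4)"))
```

The call prints `('+', ('num', 1), ('*', ('num', 2), ('-', ('num', 3), ('num', 4))))`. Because `term` sits below `expr` in the grammar, `*` and `/` bind more tightly than `+` and `-`: operator precedence is a property of the grammar, not of the lexical rules.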
Formulating a rigorous semantics of a large, complex, practical programming language is a daunting task even for experienced specialists, and the resulting specification can be difficult for anyone but experts to understand. The following are some of the ways in which programming language semantics can be described; all languages use at least one of these description methods, and some languages combine more than one:
* Natural language: Description by human natural language.
* Formal semantics: Description by mathematics.
* Reference implementations: Description by a computer program.
* Test suites: Description by examples of programs and their expected behaviors. While few language specifications start off in this form, the evolution of some language specifications has been influenced by the semantics of a test suite (e.g., in the past, the specification of Ada has been modified to match the behavior of the Ada Conformity Assessment Test Suite).
Most widely used languages are specified using natural language descriptions of their semantics. This description usually takes the form of a "reference manual" for the language. These manuals can run to hundreds of pages. For example, the print version of "The Java Language Specification, 3rd Ed." is 596 pages long.
The imprecision of natural language as a vehicle for describing programming language semantics can lead to problems with interpreting the specification. For example, the semantics of Java threads were specified in English, and it was later discovered that the specification did not provide adequate guidance for implementors. [William Pugh. The Java Memory Model is Fatally Flawed. "Concurrency: Practice and Experience" 12(6):445-455, August 2000]
Formal semantics are grounded in mathematics. As a result, they can be more precise and less ambiguous than semantics given in natural language. However, supplemental natural language descriptions of the semantics are often included to aid understanding of the formal definitions. For example, the ISO standard for Modula-2 contains both a formal and a natural language definition on opposing pages.
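To make the idea concrete, the sketch below gives a hypothetical big-step operational semantics for a tiny expression language with integers, booleans, addition and a conditional. The rules are written as inference rules in the comments, and a Python evaluator mirrors them with one case per rule. The tuple encoding and the rule names are assumptions for this example; a formal definition states such rules mathematically, and the function only illustrates them.

```python
from typing import Any, Tuple, Union

Expr = Tuple[Any, ...]
Value = Union[int, bool]

# Big-step rules, read "premises ⟹ conclusion"; e ⇓ v means "e evaluates to v".
#   (E-Int)  ("int", n) ⇓ n
#   (E-Bool) ("bool", b) ⇓ b
#   (E-Add)  e1 ⇓ n1,  e2 ⇓ n2,  n1 and n2 integers  ⟹  ("add", e1, e2) ⇓ n1 + n2
#   (E-IfT)  c ⇓ true,  t ⇓ v   ⟹  ("if", c, t, f) ⇓ v
#   (E-IfF)  c ⇓ false, f ⇓ v   ⟹  ("if", c, t, f) ⇓ v
# An expression to which no rule applies has no meaning ("is stuck").


def evaluate(expr: Expr) -> Value:
    tag = expr[0]
    if tag in ("int", "bool"):                      # E-Int, E-Bool
        return expr[1]
    if tag == "add":                                # E-Add
        n1, n2 = evaluate(expr[1]), evaluate(expr[2])
        # type(...) is int excludes bool, which Python treats as a subclass of int
        if type(n1) is int and type(n2) is int:
            return n1 + n2
    elif tag == "if":                               # E-IfT, E-IfF
        condition = evaluate(expr[1])
        if condition is True:
            return evaluate(expr[2])
        if condition is False:
            return evaluate(expr[3])
    raise RuntimeError(f"no rule applies (stuck): {expr!r}")


program = ("if", ("bool", False), ("int", 1), ("add", ("int", 2), ("int", 3)))
print(evaluate(program))  # 5
```

Here the rule set, rather than the behavior of any particular compiler, fixes what a program means: for `("add", ("bool", True), ("int", 1))` no rule applies, so that program has no value.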
Programming languages whose semantics are described formally can reap many benefits. For example:
* Formal semantics enable mathematical proofs of program correctness;
* Formal semantics facilitate the design of type systems, and proofs about the soundness of those type systems;
* Formal semantics can establish unambiguous and uniform standards for implementations of the language.
Automatic tool support can help to realize many of these benefits. For example, an automated theorem prover or theorem checker can increase a programmer's (or language designer's) confidence in the correctness of proofs about programs (or the language itself). The power and scalability of these tools vary widely: full formal verification is computationally intensive, rarely scales beyond programs containing a few hundred lines, and may require considerable manual assistance from a programmer; more lightweight tools such as model checkers require fewer resources and have been used on programs containing tens of thousands of lines; many compilers apply static type checks to any program they compile.
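A static type check of the kind compilers perform can be sketched for a hypothetical toy language of integer and boolean expressions encoded as tuples. The typing rules in the comments are assumptions for this example. A soundness proof for such a type system would show that every expression accepted by `type_of` evaluates to a value of the reported type without getting stuck; the code below implements only the checker and proves nothing.

```python
from typing import Any, Tuple

Expr = Tuple[Any, ...]

# Typing rules, read "premises ⟹ conclusion"; e : T means "e has type T".
#   (T-Int)  ("int", n) : Int
#   (T-Bool) ("bool", b) : Bool
#   (T-Add)  e1 : Int,  e2 : Int  ⟹  ("add", e1, e2) : Int
#   (T-If)   c : Bool,  t : T,  f : T  ⟹  ("if", c, t, f) : T


def type_of(expr: Expr) -> str:
    """Return "Int" or "Bool" for a well-typed expression; raise TypeError otherwise."""
    tag = expr[0]
    if tag == "int":                                # T-Int
        return "Int"
    if tag == "bool":                               # T-Bool
        return "Bool"
    if tag == "add":                                # T-Add
        if type_of(expr[1]) == "Int" and type_of(expr[2]) == "Int":
            return "Int"
        raise TypeError(f"operands of add must be Int: {expr!r}")
    if tag == "if":                                 # T-If
        if type_of(expr[1]) != "Bool":
            raise TypeError(f"condition must be Bool: {expr!r}")
        then_type, else_type = type_of(expr[2]), type_of(expr[3])
        if then_type != else_type:
            raise TypeError(f"branches have types {then_type} and {else_type}: {expr!r}")
        return then_type
    raise TypeError(f"unknown expression form: {expr!r}")


# The checker rejects this program without ever running it.
try:
    type_of(("add", ("bool", True), ("int", 1)))
except TypeError as err:
    print("rejected:", err)
```

The check is cheap because it inspects each subexpression once and never executes the program, which is why compilers can afford to apply checks of this kind to every program they compile.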
A reference implementation is a single implementation of a programming language that is designated as authoritative. The behavior of this implementation is held to define the proper behavior of a program written in the language. This approach has several attractive properties. First, it is precise, and requires no human interpretation: disputes as to the meaning of a program can be settled simply by executing the program on the reference implementation (provided that the implementation behaves deterministically for that program).
On the other hand, defining language semantics through a reference implementation also has several potential drawbacks. Chief among them is that it conflates limitations of the reference implementation with properties of the language. For example, if the reference implementation has a bug, then that bug must be considered to be an authoritative behavior. Another drawback is that programs written in this language may rely on quirks in the reference implementation, hindering portability across different implementations.
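The first drawback can be sketched in Python. Both "implementations" below are hypothetical: a reference implementation that divides integers by truncating the quotient toward zero, and a second implementation that rounds toward negative infinity, as Python's own `//` operator does. Because the reference is authoritative by definition, the second implementation is judged wrong on negative operands, even though its rounding rule is not mathematically incorrect; the language has simply inherited whatever choice the reference happened to make.

```python
from typing import Callable, Iterable, List, Tuple

Program = Tuple[int, int]  # hypothetical toy "program": compute a / b (b != 0)


def reference_divide(program: Program) -> int:
    """Hypothetical reference implementation: truncates the quotient toward zero."""
    a, b = program
    quotient = abs(a) // abs(b)
    return -quotient if (a < 0) != (b < 0) else quotient


def candidate_divide(program: Program) -> int:
    """Hypothetical second implementation: rounds the quotient toward negative infinity."""
    a, b = program
    return a // b


def disagreements(candidate: Callable[[Program], int],
                  reference: Callable[[Program], int],
                  programs: Iterable[Program]) -> List[Program]:
    """Return the programs on which the candidate's result differs from the reference's.

    The reference's answer is authoritative, so every disagreement counts
    against the candidate, whatever the reason for the difference.
    """
    return [p for p in programs if candidate(p) != reference(p)]


programs = [(7, 2), (-7, 2), (7, -2), (6, 3)]
print(disagreements(candidate_divide, reference_divide, programs))
```

This prints `[(-7, 2), (7, -2)]`: the reference yields `-3` for both, the second implementation `-4`. The comparison is only meaningful when the reference behaves deterministically on the programs being compared.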
Nevertheless, several languages have successfully used the reference implementation approach. For example, the Perl interpreter is considered to define the authoritative behavior of Perl programs. In the case of Perl, the open source model of software distribution has contributed to the fact that nobody has ever produced another implementation of the language, so the issues involved in using a reference implementation to define the language semantics are moot.
Defining the semantics of a programming language in terms of a test suite involves writing a number of example programs in the language, and then describing how those programs ought to behave — perhaps by writing down their correct outputs. The programs, plus their outputs, are called the "test suite" of the language. Any correct language implementation must then produce exactly the correct outputs on the test suite programs.
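A test-suite runner can be sketched in a few lines of Python. The toy language here (postfix arithmetic, where `3 4 +` outputs `7`), the suite entries and the function names are all hypothetical; the sketch only shows that conformance is checked by running each program and comparing its output exactly with the recorded one.

```python
import operator
from typing import Callable, Dict, List, Tuple

# Hypothetical test suite: test name -> (program text, expected output).
TEST_SUITE: Dict[str, Tuple[str, str]] = {
    "addition": ("3 4 +", "7"),
    "nested": ("2 3 4 * +", "14"),
    "operand-order": ("10 4 -", "6"),
}

OPERATORS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def postfix_implementation(program: str) -> str:
    """One hypothetical implementation: evaluate a postfix program, output the top of the stack."""
    stack: List[int] = []
    for word in program.split():
        if word in OPERATORS:
            right = stack.pop()
            left = stack.pop()
            stack.append(OPERATORS[word](left, right))
        else:
            stack.append(int(word))
    return str(stack[-1])


def run_suite(implementation: Callable[[str], str],
              suite: Dict[str, Tuple[str, str]]) -> List[str]:
    """Return the names of the tests whose actual output differs from the expected output."""
    failures = []
    for name, (program, expected) in suite.items():
        try:
            actual = implementation(program)
        except Exception as exc:  # a crash also counts as a failing test
            actual = f"<error: {exc}>"
        if actual != expected:
            failures.append(name)
    return failures


print(run_suite(postfix_implementation, TEST_SUITE))  # []
print(run_suite(lambda program: "7", TEST_SUITE))     # ['nested', 'operand-order']
```

An implementation that merely looked up each program in a table of the recorded outputs would also report no failures, which shows how little a test suite by itself says about programs outside it.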
The chief advantage of this approach to semantic description is that it is easy to determine whether a language implementation passes a test suite. The user can simply execute all the programs in the test suite, and compare the outputs to the desired outputs. However, when used by itself, the test suite approach has major drawbacks as well. For example, users want to run their own programs, which are not part of the test suite; indeed, a language implementation that could "only" run the programs in its test suite would be largely useless. But a test suite does not, by itself, describe how the language implementation should behave on any program not in the test suite; determining that behavior requires some extrapolation on the implementor's part, and different implementors may disagree. In addition, it is difficult to use a test suite to test behavior that is intended or allowed to be nondeterministic.
Therefore, in common practice, test suites are used only in combination with one of the other language specification techniques, such as a natural language description or a reference implementation.
Programming language reference
A few examples of official or draft language specifications:
* [http://www.schemers.org/Documents/Standards/ Scheme R5RS]
* [http://www.masswerk.at/algol60/report.htm Algol 60 report]
* [http://www.adahome.com/rm95/ Ada 95 reference manual]
* [http://java.sun.com/docs/books/jls/ Java language specification]
* [http://www.csci.csusb.edu/dick/c++std/cd2/index.html Draft C++ standard]
Wikimedia Foundation. 2010.