Undefined behavior

In computer programming, undefined behavior is a feature of some programming languages, most famously C.[1] To simplify the specification and allow some flexibility in implementation, the specification of such a language deliberately leaves the results of certain operations undefined.

For example, in C the use of any automatic variable before it has been initialized yields undefined behavior, as do division by zero and indexing an array outside of its defined bounds (see buffer overflow). This frees the compiler to do whatever is easiest or most efficient when such a program is submitted. Once undefined behavior has occurred, the behavior of the rest of the program is also undefined. In particular, the compiler is never required to diagnose undefined behavior, so a program that invokes it may appear to compile and even run without errors at first, only to fail on another system or even on another date. As far as the language specification is concerned, anything could happen when undefined behavior occurs, including nothing at all.
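The operations named above can each be guarded by an explicit check before the operation is attempted. The following is a minimal sketch rather than a standard facility: the helper names checked_div and checked_get are hypothetical and chosen only for illustration. The division helper also rejects INT_MIN / -1, which is undefined in C for the same reason as division by zero is dangerous to leave unchecked: the result cannot be represented in an int.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: stores num / den in *out and returns true only
   when the division is defined. Division by zero and INT_MIN / -1
   (whose result does not fit in an int) are both rejected. */
bool checked_div(int num, int den, int *out)
{
    assert(out != NULL);
    if (den == 0 || (num == INT_MIN && den == -1))
        return false;
    *out = num / den;
    return true;
}

/* Hypothetical helper: stores arr[idx] in *out and returns true only
   when idx lies inside the array of len elements. */
bool checked_get(const int *arr, size_t len, size_t idx, int *out)
{
    assert(arr != NULL && out != NULL);
    if (idx >= len)
        return false;
    *out = arr[idx];
    return true;
}
```

A caller would typically initialize the result variable at its declaration (for instance int q = 0;) and read it only after the helper returns true, which also avoids using an automatic variable before it has been given a value.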

Under some circumstances there can be specific restrictions on undefined behavior. For example, the instruction set specification of a CPU might leave the behavior of some forms of an instruction undefined. If the CPU supports memory protection, however, the specification will probably include a blanket rule stating that no user-accessible instruction may open a hole in the operating system's security. An actual CPU would then be permitted to corrupt any or all user registers in response to such an instruction, but would not be allowed to, for example, switch into supervisor mode.

Examples in C and C++

Attempting to modify a string literal causes undefined behavior:[2]

char * p = "wikipedia"; // in C++, this requires deprecated implicit conversion from const char[] to char*
p[0] = 'W'; // undefined behavior

One way to prevent this is to define p as an array instead of a pointer, so that the string is copied into modifiable storage.

char p[] = "wikipedia"; /* RIGHT */
p[0] = 'W';

In C++, one can use the standard library's std::string instead.

std::string s = "wikipedia"; /* RIGHT */
s[0] = 'W';

Division by zero results in undefined behavior:[3]

return x/0; // undefined behavior

Certain pointer operations may result in undefined behavior:[4]

int arr[4] = {0, 1, 2, 3};
int* p = arr + 5;  // undefined behavior
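By contrast, a pointer one past the last element (arr + 4 for the array above) may be formed and compared, as long as it is not dereferenced. The sketch below relies on that rule to walk an array; the function name sum_ints is hypothetical and used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Sums len ints starting at arr. The pointer end = arr + len points one
   past the last element: forming it and comparing against it is allowed,
   but it is never dereferenced. */
int sum_ints(const int *arr, size_t len)
{
    assert(arr != NULL);
    const int *end = arr + len;   /* one past the end: valid to form */
    int total = 0;
    for (const int *p = arr; p != end; ++p)
        total += *p;              /* p is strictly before end here */
    return total;
}
```

Called as sum_ints(arr, 4) on the four-element array above, the loop visits only arr[0] through arr[3].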

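Defining the main function with a return type other than int may cause undefined behavior in C unless the implementation documents that form (in C++, such a program is ill-formed):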

void main() /* undefined behavior */
{
}

In Section 2.12, The C Programming Language by Kernighan and Ritchie cites the following examples of code that have undefined behavior:

printf("%d %d\n", ++n, power(2, n));    /* WRONG */

and

a[i] = i++;
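Both statements become well defined once the modification of the variable is moved into a statement of its own, so that it is sequenced before or after the other uses. The following is a sketch under assumptions: power here is a hypothetical stand-in for the book's function, the wrapper names are invented for illustration, and the second rewrite assumes the intent was to store the old value of i before incrementing it.

```c
#include <assert.h>

/* Hypothetical stand-in for the power function in the book's example:
   returns base raised to the non-negative exponent n. */
int power(int base, int n)
{
    assert(n >= 0);
    int result = 1;
    for (int i = 0; i < n; ++i)
        result *= base;
    return result;
}

/* Well-defined variant of the printf example: n is incremented in its
   own statement before power is called. Returns the incremented n and
   stores power(2, n) in *pow_out. */
int increment_then_power(int n, int *pow_out)
{
    ++n;
    *pow_out = power(2, n);
    return n;
}

/* Well-defined variant of a[i] = i++, assuming the old value of i is
   the one to store: assign first, then increment in a separate
   statement. Returns the new value of i. */
int store_then_increment(int a[], int i)
{
    a[i] = i;
    i++;
    return i;
}
```

The two values produced by increment_then_power can then be passed to printf in any order, since neither argument expression modifies n.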

Risks of undefined behavior

HTML versions 4 and earlier left error handling undefined. Over time, pages started relying on the unspecified error recovery implemented in popular browsers. This caused difficulties for vendors of less popular browsers, who were forced to reverse-engineer and implement bug-compatible error recovery. The result was a de facto standard that was much more complicated than it would have been if this behavior had been specified from the start.

Compiler easter eggs

In some languages (including C), even the compiler is not bound to behave in a sensible manner once undefined behavior has been invoked. A related Easter egg is the behavior of early versions of the GCC C compiler when given a program containing the #pragma directive, whose behavior is implementation-defined according to the C standard. ("Implementation-defined" is more restrictive than "undefined", requiring the implementation to document what it does.) In practice, many C implementations recognize, for example, #pragma once as a rough equivalent of #include guards. GCC 1.21, however, upon finding a #pragma directive, would instead attempt to launch commonly distributed Unix games such as NetHack and Rogue, or start Emacs running a simulation of the Towers of Hanoi.[5]

References

  1. ^ Lattner, Chris (May 13, 2011). "What Every C Programmer Should Know About Undefined Behavior". LLVM Project Blog. LLVM.org. http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html. Retrieved May 24, 2011. 
  2. ^ ISO/IEC (2003). ISO/IEC 14882:2003(E): Programming Languages - C++ §2.13.4 String literals [lex.string] para. 2
  3. ^ ISO/IEC (2003). ISO/IEC 14882:2003(E): Programming Languages - C++ §5.6 Multiplicative operators [expr.mul] para. 4
  4. ^ ISO/IEC (2003). ISO/IEC 14882:2003(E): Programming Languages - C++ §5.7 Additive operators [expr.add] para. 5
  5. ^ "A Pragmatic Decision" quotes the March 1988 issue of UNIX Review magazine, which referred to GCC version 1.17 but got the order wrong. "Everything2: #pragma" gives the correct order. The actual code is in file "cccp.c" in the GCC 1.21 distribution: http://www.oldlinux.org/Linux.old/gnu/gcc-1/


Wikimedia Foundation. 2010.
