Position-independent code

In computing, position-independent code (PIC) is machine code that executes properly regardless of where in memory it resides; an executable built entirely from such code is a position-independent executable (PIE). PIC is commonly used for shared libraries, so that the same library code can be loaded at a location in each program's address space where it does not overlap any other use of memory (for example, other shared libraries). PIC was also used on older computer systems lacking an MMU, so that the operating system could keep applications away from each other even within the single address space of an MMU-less system.

Position-independent code can be copied to any memory location without modification and executed, unlike relocatable code, which requires special processing by a link editor or program loader to make it suitable for execution at a given location. Code must generally be written or compiled in a special fashion in order to be position-independent. Instructions that refer to specific memory addresses, such as absolute branches, must be replaced with equivalent program-counter-relative instructions. The extra indirection may make position-independent code less efficient, although modern processors are designed to make this more tolerable.
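The difference between absolute and program-counter-relative branches can be illustrated with a toy machine. The sketch below is only a model written for this article: the opcodes (`ADD`, `JMP_ABS`, `JMP_REL`, `HALT`) and the helper names are hypothetical and do not correspond to any real instruction set.

```python
# Toy machine: memory is a list of (opcode, operand) tuples.
def run_toy(memory, start):
    """Execute from address `start` until HALT; return the accumulator."""
    pc, acc = start, 0
    while True:
        op, arg = memory[pc]
        if op == "ADD":          # add a constant to the accumulator
            acc += arg
            pc += 1
        elif op == "JMP_ABS":    # jump to a fixed memory address
            pc = arg
        elif op == "JMP_REL":    # jump relative to the next instruction
            pc = pc + 1 + arg
        elif op == "HALT":
            return acc
        else:
            raise ValueError(f"bad opcode {op!r} at address {pc}")

def load_at(program, base, size=64):
    """Copy `program`, unmodified, into fresh memory at address `base`."""
    memory = [("HALT", 0)] * size
    memory[base:base + len(program)] = program
    return memory

# Both programs are meant to skip the "ADD 100" instruction.
# ABSOLUTE was built on the assumption that it is loaded at address 0.
ABSOLUTE = [("ADD", 1), ("JMP_ABS", 3), ("ADD", 100), ("ADD", 10), ("HALT", 0)]
RELATIVE = [("ADD", 1), ("JMP_REL", 1), ("ADD", 100), ("ADD", 10), ("HALT", 0)]
```

In this model, `RELATIVE` produces the same result at any load address, while `ABSOLUTE` only works at the address it was built for: loaded elsewhere, its jump lands outside the program.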

History

In early computers, code was position-dependent: each program was built to be loaded into, and run from, a particular address. In order to run multiple jobs using separate programs at the same time, an operator had to carefully schedule the jobs so that no two simultaneous jobs would run programs that required the same load addresses. For example, if both the payroll program and the accounts receivable program were built to run at address 32K, the operator could not run both at the same time. Sometimes, an operator would keep multiple versions of a program around, each built for a different load address, to allow more scheduling options.

To make things more flexible, position-independent code was invented. Position-independent code could run from any address at which the operator chose to load it.

The invention of dynamic address translation (the function provided by an MMU) made position-independent code unnecessary for this purpose, because every job could have its own separate address space, each with its own address 32K. The programmer could build all programs to run at address 32K, and they could still all run at the same time. Because position-independent code is less efficient than position-dependent code, this was a better solution to the problem.

The next problem to be attacked was the memory waste that happens when the same code is loaded multiple times to be used by multiple simultaneous jobs. If two jobs run entirely identical programs, dynamic address translation provides a solution by allowing the system simply to map two different jobs' address 32K to the same bytes of real memory, containing the single copy of the program.

But more often, the programs are different and merely share a lot of common code. For example, the payroll program and the accounts receivable program probably both contain an identical sort subroutine. So designers invented shared modules (a shared library is a form of shared module). While the main payroll and accounts receivable programs get loaded into separate memory, the shared module gets loaded once and simply mapped into the two address spaces.

But this introduces a memory allocation problem similar to the one that position-independent code solved above: If a program can have one shared module, it can have lots of them. What if a single program, in a single address space, wants to use two shared modules, both built to run at the same address? The system cannot load both at the same time, so there is no way to load the program. To work around this, programmers made sure they never built two shared modules to run at the same address if they might both have to be used by the same program. Sometimes, they made multiple versions of a shared module, each built to run at a different address.

This is obviously not a desirable situation. It's a lot of manual work and wastes address space. Position-independent code comes to the rescue again, because if a shared module can run from any address, then the program loader can simply load it into any free address. The sort subroutine might run at address 32K in a payroll job, but 48K in a simultaneous accounts receivable job. Both addresses refer to the same real memory; there is only one copy of the sort subroutine in real memory.

Position-independent code has been used not only to coordinate the work of user-level applications, but within operating systems as well. The earliest paging systems did not use virtual memory address spaces; instead, the operating system would explicitly load individual modules of itself as needed, overwriting less needed ones (the memory available for the operating system was much smaller than the operating system). A module had to be capable of running in whatever memory was free at the time it was needed, so individual operating system modules were made of position-independent code.

The invention of virtual memory obsoleted that method, because the operating system could have a virtual address space so big that every module of the operating system could have its own permanent virtual address.

Technical details

Procedure calls inside a shared library are typically made through small procedure linkage table stubs, which then call the definitive function. This notably allows a shared library to inherit certain function calls from previously loaded libraries rather than using its own versions.
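The role of the linkage stubs can be sketched with a small model. This is an illustration under stated assumptions, not a description of any particular dynamic linker: the `Process` and `PltStub` classes are hypothetical, and resolving the target on the first call (lazy binding) is just one common strategy that is assumed here.

```python
class Process:
    """Toy process: libraries are searched in the order they were loaded."""

    def __init__(self):
        self.libraries = []              # list of (name, {symbol: function})

    def load(self, name, symbols):
        self.libraries.append((name, symbols))

    def resolve(self, symbol):
        # The first definition found in load order wins, so an earlier
        # library can supply a function that a later library also defines.
        for _name, symbols in self.libraries:
            if symbol in symbols:
                return symbols[symbol]
        raise LookupError(symbol)


class PltStub:
    """Stands in for one linkage-table entry for a single symbol."""

    def __init__(self, process, symbol):
        self.process = process
        self.symbol = symbol
        self.target = None               # filled in on first use (assumed lazy)
        self.lookups = 0

    def __call__(self, *args):
        if self.target is None:
            self.target = self.process.resolve(self.symbol)
            self.lookups += 1
        return self.target(*args)        # forward to the definitive function
```

If a library loaded first defines `greet` and a second library defines it as well, a `PltStub(process, "greet")` used by the second library ends up calling the first library's version, and the lookup happens only once.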

Data references from position-independent code are usually made indirectly, through global offset tables (GOTs), which store the addresses of all accessed global variables. There is one GOT per compilation unit or object module, and it is located at a fixed offset from the code (although this offset is not known until the library is linked). When a linker links modules to create a shared library, it merges the GOTs and sets the final offsets in code. It is not necessary to adjust the offsets when loading the shared library later.
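The double indirection through a GOT can be modeled in a few lines. In this sketch, memory is a plain dictionary, `GOT_OFFSET` is an arbitrary illustrative value, and the function names are hypothetical. The model also assumes that the loader writes the absolute variable addresses into the GOT slots at load time, which is data patching and leaves the code untouched.

```python
GOT_OFFSET = 16   # hypothetical link-time distance from the code to its GOT


def load_library(memory, base, variable_addresses):
    """Place a library at `base` and fill its GOT with absolute addresses.

    Only GOT slots (data) receive load-specific values in this model; the
    code would be byte-for-byte identical at every base address.
    """
    got_base = base + GOT_OFFSET
    for slot, address in enumerate(variable_addresses):
        memory[got_base + slot] = address
    return got_base


def read_global(memory, code_base, slot):
    """Model of a PIC data access: GOT lookup, then an indirect load."""
    got_base = code_base + GOT_OFFSET    # fixed offset, no relocation needed
    address = memory[got_base + slot]    # first load: pointer from the GOT
    return memory[address]               # second load: the variable itself
```

For example, with a variable stored at address 500 and the library loaded at base 100, `read_global(memory, 100, 0)` finds the pointer in slot 116 and then reads the value at 500. Loading the same library at a different base changes only where the GOT lives, not how the code reaches it.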

Position-independent functions accessing global data start by determining the absolute address of the GOT given their own current program counter value. This often takes the form of a fake function call in order to obtain the return address on the stack (x86) or in a special register (PowerPC, SPARC, probably at least some other RISC processors, ESA/390), which can then be stored in a predefined standard register. Some processor architectures, such as the Motorola 68000, Motorola 6809, ARM and AMD64, allow referencing data by offset from the program counter. This is specifically targeted at making position-independent code smaller, less register-demanding and hence more efficient.
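The fake-call technique can be traced on a toy stack machine. The following is a sketch only: the opcodes (`CALL_REL`, `POP`, `ADD_R1`, `HALT`), the register name `r1` and the value of `GOT_DISTANCE` are all hypothetical, chosen to mirror the steps described above rather than any real processor.

```python
GOT_DISTANCE = 8   # hypothetical link-time distance from the POP to the GOT

PROGRAM = [
    ("CALL_REL", 0),            # fake call: target is the very next instruction
    ("POP", "r1"),              # r1 = return address = address of this POP
    ("ADD_R1", GOT_DISTANCE),   # r1 = absolute address of the GOT
    ("HALT", None),
]


def execute(memory, start):
    """Run the toy program and return the register file at HALT."""
    pc, stack, regs = start, [], {"r1": 0}
    while True:
        op, arg = memory[pc]
        if op == "CALL_REL":    # push the return address, then jump
            stack.append(pc + 1)
            pc = pc + 1 + arg
        elif op == "POP":       # move the top of the stack into a register
            regs[arg] = stack.pop()
            pc += 1
        elif op == "ADD_R1":    # add a constant to r1
            regs["r1"] += arg
            pc += 1
        elif op == "HALT":
            return regs
        else:
            raise ValueError(f"bad opcode {op!r} at address {pc}")


def find_got(base):
    """Load PROGRAM unmodified at `base` and return the GOT address it computes."""
    memory = {base + i: instruction for i, instruction in enumerate(PROGRAM)}
    return execute(memory, base)["r1"]
```

Whatever the load address, the program computes `base + 1 + GOT_DISTANCE` without containing any absolute address itself.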

Windows DLLs

Microsoft Windows DLLs are not shared libraries in the Unix sense and do not use position-independent code. This means they cannot have their routines overridden by previously loaded DLLs, and they require small tricks for sharing selected global data. Code has to be relocated after it has been loaded from disk, making it potentially non-shareable between processes; sharing mostly occurs on disk.

To alleviate this limitation, almost all Windows system DLLs are pre-mapped at different fixed addresses in such a way that there is no conflict. It is not necessary to relocate the libraries before using them and memory can be shared. Even pre-mapped DLLs still contain information which allows them to be loaded at arbitrary addresses if necessary.

A sharing technique Windows calls "memory mapping" (not to be confused with memory-mapped I/O) is sometimes able to allow multiple processes to share an instance of a DLL loaded into memory. However, the reality is that Windows is not always able to share one instance of a DLL loaded by multiple processes. [Rick Anderson, "The End of DLL Hell", Microsoft Developer Network, January 2000, http://msdn2.microsoft.com/en-us/library/ms811694.aspx#dlldanger1_topic2, accessed 2007-04-26. Quote: "Sharing common DLLs does provide a significant savings on memory load. But Windows is not always able to share one instance of a DLL that is loaded by multiple processes."] Windows requires each compiled program to know where in "its" address space each DLL will be accessed; there is no support for position independence.

A DLL specifies its "desired" base address when it is created (Visual C++ defaults to an offset of 0x10000000) but if multiple DLLs have the same desired base address, a program cannot relocate them all to that base offset and must specify new offsets when linking. When the Windows loader loads an executable into memory for execution, it checks to see if each DLL has already been loaded with the offset used when the executable was created (not the DLL). If the DLL is not already loaded with that offset, it is relocated to the base requested by the executable. Note that this will provide sharing across multiple processes of the "same" executable (e.g. if started in different accounts via Fast User Switching), but not necessarily across different programs that link to the same DLL.
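The effect of relocating a DLL away from its desired base can be sketched as follows. This is a simplified model, not the actual Windows loader or file format: the `DllImage` structure, the word-sized image and the sample contents are hypothetical, and only the 0x10000000 default base comes from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DllImage:
    preferred_base: int
    words: tuple          # image contents, one integer per word
    relocations: tuple    # indices of words that hold absolute addresses


def load_dll(image, actual_base):
    """Return the in-memory words of `image` when mapped at `actual_base`."""
    delta = actual_base - image.preferred_base
    if delta == 0:
        return image.words            # no fixups: memory matches the file
    patched = list(image.words)
    for index in image.relocations:
        patched[index] += delta       # rebase every stored absolute address
    return tuple(patched)


# Hypothetical image: words 0 and 2 hold addresses inside the DLL itself.
SAMPLE = DllImage(
    preferred_base=0x10000000,
    words=(0x10000008, 7, 0x1000000C, 99),
    relocations=(0, 2),
)
```

Loaded at its desired base, the image is identical to the copy on disk and can be shared. Loaded at 0x20000000 instead, two of its words are rewritten, so that copy differs from both the file and any copy mapped at another base.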

Other platforms such as Mac OS X and Linux support forms of prebinding as well. On Mac OS X the mechanism is called prebinding; on Linux it is implemented by a program called prelink. This is vastly different from memory mapping.

Position-independent executables

Position-independent executables (PIE) are executable binaries made entirely from position-independent code. While some systems only run PIC executables, there are other reasons they are used. PIE binaries are used in some security-focused Linux distributions so that PaX or Exec Shield can apply address space layout randomization. Randomizing the load address prevents attackers from knowing where existing executable code resides, which defeats exploits that rely on knowing the location of that code, such as return-to-libc attacks.
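Why a randomized load address hinders such attacks can be shown with a small simulation. Every number here is an assumption made for illustration: the fixed base, the function offset, the page size and the 16 bits of randomness are hypothetical values, not those of PaX, Exec Shield or any real system.

```python
import random

FIXED_BASE = 0x400000      # hypothetical load address of a non-PIE binary
FUNCTION_OFFSET = 0x1234   # hypothetical offset of the code the attacker wants
PAGE_SIZE = 0x1000
RANDOM_PAGES = 2 ** 16     # assumed number of possible randomized bases


def choose_base(pie, rng):
    """Pick the load address: fixed for non-PIE, page-aligned random for PIE."""
    if not pie:
        return FIXED_BASE
    return FIXED_BASE + rng.randrange(RANDOM_PAGES) * PAGE_SIZE


def attack_success_rate(pie, trials=1000, seed=0):
    """Fraction of runs in which a hard-coded target address is correct."""
    rng = random.Random(seed)
    guess = FIXED_BASE + FUNCTION_OFFSET   # address the exploit was built with
    hits = sum(
        choose_base(pie, rng) + FUNCTION_OFFSET == guess
        for _ in range(trials)
    )
    return hits / trials
```

In this model the hard-coded address is always right for the fixed-address binary, and almost never right once the base is randomized, because the attacker's guess only works when the random base happens to equal the fixed one.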

See also

* Dynamic linker

References

Further reading

* John R. Levine (October 1999). "Chapter 8: Loading and overlays" (http://www.iecc.com/linker/linker08.html). Linkers and Loaders. Morgan Kaufmann. ISBN 1-55860-496-0.

External links

* [http://www.gentoo.org/proj/en/hardened/pic-guide.xml Introduction to Position Independent Code]
* [http://www.gentoo.org/proj/en/hardened/pic-internals.xml Position Independent Code internals]
* [http://linux4u.jinr.ru/usoft/WWW/www_debian.org/Documentation/elf/node21.html Programming in Assembly Language with PIC]


Wikimedia Foundation. 2010.
