Digital signal processor
Digital signal processing algorithms typically require a large number of mathematical operations to be performed quickly and repetitively on a set of data. Signals (perhaps from audio or video sensors) are continuously converted from analog to digital, manipulated digitally, and then converted back to analog form. Many DSP applications have constraints on latency; that is, for the system to work, the DSP operation must be completed within some fixed time, and deferred (or batch) processing is not viable.
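The latency constraint can be made concrete with a small calculation. The following is a minimal sketch, assuming block-based processing in which each block must be finished before the next one arrives; the function names and the figures (64-sample blocks, 48 kHz sampling, a 100 MHz clock) are hypothetical and chosen only for illustration.

```python
def block_deadline_s(block_size: int, sample_rate_hz: float) -> float:
    """Time available to process one block before the next block arrives."""
    return block_size / sample_rate_hz


def meets_deadline(cycles_per_block: int, clock_hz: float,
                   block_size: int, sample_rate_hz: float) -> bool:
    """True if the work for one block fits inside the block period."""
    processing_time_s = cycles_per_block / clock_hz
    return processing_time_s <= block_deadline_s(block_size, sample_rate_hz)


# Hypothetical figures: 64-sample blocks at 48 kHz leave about 1.33 ms per block.
deadline = block_deadline_s(64, 48_000)

# 100,000 cycles at 100 MHz take 1 ms, which fits; 200,000 cycles take 2 ms, which does not.
fits = meets_deadline(100_000, 100e6, 64, 48_000)
too_slow = meets_deadline(200_000, 100e6, 64, 48_000)
```

If the check fails, no amount of later catching up helps: the next block has already arrived, which is why batch processing is not an option in such systems.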
Most general-purpose microprocessors and operating systems can execute DSP algorithms successfully, but are not suitable for use in portable devices such as mobile phones and PDAs because of power supply and space constraints. A specialized digital signal processor, however, will tend to provide a lower-cost solution, with better performance, lower latency, and no requirements for specialized cooling or large batteries.
The architecture of a digital signal processor is optimized specifically for digital signal processing. Most also support some of the features of an applications processor or microcontroller, since signal processing is rarely the only task of a system. Some useful features for optimizing DSP algorithms are outlined below.
By the standards of general-purpose processors, DSP instruction sets are often highly irregular. One implication for software architecture is that hand-optimized assembly-code routines are commonly packaged into libraries for re-use, instead of relying on advanced compiler technologies to handle essential algorithms.
Hardware features visible through DSP instruction sets commonly include:
- Hardware modulo addressing, allowing circular buffers to be implemented without having to constantly test for wrapping.
- A memory architecture designed for streaming data, using DMA extensively and expecting code to be written with knowledge of cache hierarchies and the associated delays.
- Driving multiple arithmetic units may require memory architectures to support several accesses per instruction cycle
- Separate program and data memories (Harvard architecture), and sometimes concurrent access on multiple data buses
- Special SIMD (single instruction, multiple data) operations
- Some processors use VLIW techniques so each instruction drives multiple arithmetic units in parallel
- Special arithmetic operations, such as fast multiply–accumulates (MACs). Many fundamental DSP algorithms, such as FIR filters or the fast Fourier transform (FFT), depend heavily on multiply–accumulate performance.
- Bit-reversed addressing, a special addressing mode useful for calculating FFTs
- Special loop controls, such as architectural support for executing a few instruction words in a very tight loop without overhead for instruction fetches or exit testing
- Deliberate exclusion of a memory management unit. DSPs frequently use multi-tasking operating systems, but have no support for virtual memory or memory protection. Operating systems that use virtual memory require more time for context switching among processes, which increases latency.
- Floating-point unit integrated directly into the datapath
- Pipelined architecture
- Highly parallel multiplier–accumulators (MAC units)
- Hardware-controlled looping, to reduce or eliminate the overhead required for looping operations
- DSPs often use special memory architectures that are able to fetch multiple data and/or instructions at the same time
- Use of direct memory access
- Memory-address calculation unit
- Saturation arithmetic, in which operations that produce overflows clamp at the maximum (or minimum) value that the register can hold rather than wrapping around (maximum+1 does not overflow to minimum as in many general-purpose CPUs; instead it stays at maximum). Sometimes various sticky-bit operation modes are available.
- Fixed-point arithmetic is often used to speed up arithmetic processing
- Single-cycle operations to increase the benefits of pipelining
- Multiply–accumulate (MAC, including fused multiply–add, FMA) operations, which are used extensively in all kinds of matrix operations, such as convolution for filtering, dot product, or even polynomial evaluation (see Horner scheme)
- Instructions to increase parallelism: SIMD, VLIW, superscalar architecture
- Specialized instructions for modulo addressing in ring buffers and bit-reversed addressing mode for FFT cross-referencing
- Digital signal processors sometimes use time-stationary encoding to simplify hardware and increase coding efficiency.
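Several of the features listed above can be modeled in software to show what the hardware does for free. The following is a minimal sketch, not a description of any particular DSP: it emulates modulo (circular-buffer) addressing, a multiply–accumulate loop for an FIR filter, 16-bit saturation, and bit-reversed indexing. All names (`CircularFir`, `saturate16`, `bit_reverse`) are hypothetical, and saturating only the final result is a simplification, since Python integers never overflow and real devices typically use a wider accumulator.

```python
INT16_MAX = 32767
INT16_MIN = -32768


def saturate16(value: int) -> int:
    """Clamp to the 16-bit signed range instead of wrapping around."""
    return max(INT16_MIN, min(INT16_MAX, value))


class CircularFir:
    """FIR filter whose delay line is a circular buffer (illustrative only)."""

    def __init__(self, coeffs):
        self.coeffs = list(coeffs)
        self.delay = [0] * len(self.coeffs)
        self.head = 0  # index of the newest sample

    def step(self, sample: int) -> int:
        n = len(self.delay)
        # Modulo addressing: the write index wraps without an explicit test.
        self.head = (self.head + 1) % n
        self.delay[self.head] = sample

        acc = 0
        idx = self.head
        for c in self.coeffs:
            acc += c * self.delay[idx]  # one multiply-accumulate per tap
            idx = (idx - 1) % n         # step back to the next-older sample
        return saturate16(acc)


def bit_reverse(index: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of `index`, as used to reorder FFT data."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


# Feeding an impulse through the filter returns its coefficients in order.
fir = CircularFir([1, 2, 3])
impulse_response = [fir.step(x) for x in (1, 0, 0, 0)]

# Index order for an 8-point FFT (3 address bits).
fft_order = [bit_reverse(i, 3) for i in range(8)]
```

On a DSP the index wrap, the multiply–accumulate, the clamp, and the bit reversal are each typically handled by the addressing unit or datapath rather than by separate instructions, which is where the speed advantage comes from.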
Prior to the advent of stand-alone DSP chips discussed below, most DSP applications were implemented using bit-slice processors. The AMD 2901 bit-slice chip with its family of components was a very popular choice. There were reference designs from AMD, but very often the specifics of a particular design were application-specific. These bit-slice architectures would sometimes include a peripheral multiplier chip. Examples of these multipliers were a series from TRW including the TDC1008 and TDC1010, some of which included an accumulator, providing the requisite multiply–accumulate (MAC) function.
In 1978, Intel released the 2920 as an "analog signal processor". It had an on-chip ADC/DAC with an internal signal processor, but it did not have a hardware multiplier and was not successful in the market. In 1979, AMI released the S2811. It was designed as a microprocessor peripheral, and it had to be initialized by the host. The S2811 was likewise not successful in the market.
In 1980 the first stand-alone, complete DSPs – the NEC µPD7720 and AT&T DSP1 – were presented at the International Solid-State Circuits Conference '80. Both processors were inspired by the research in PSTN telecommunications.
The Altamira DX-1 was another early DSP, utilizing quad integer pipelines with delayed branches and branch prediction.
The first DSP produced by Texas Instruments (TI), the TMS32010 presented in 1983, proved to be an even bigger success. It was based on the Harvard architecture, and so had separate instruction and data memory. It already had a special instruction set, with instructions like load-and-accumulate or multiply-and-accumulate. It could work on 16-bit numbers and needed 390 ns for a multiply–add operation. TI is now the market leader in general-purpose DSPs. Another successful design was the Motorola 56000.
About five years later, the second generation of DSPs began to spread. They had three memories for storing two operands simultaneously and included hardware to accelerate tight loops; they also had an addressing unit capable of loop-addressing. Some of them operated on 24-bit variables, and a typical model required only about 21 ns for a MAC. Members of this generation included the AT&T DSP16A and the Motorola DSP56001.
The main improvement in the third generation was the appearance of application-specific units and instructions in the data path, or sometimes as coprocessors. These units allowed direct hardware acceleration of very specific but complex mathematical problems, such as the Fourier transform or matrix operations. Some chips, like the Motorola MC68356, even included more than one processor core to work in parallel. Other DSPs from 1995 are the TI TMS320C541 and the TMS320C80.
The fourth generation is best characterized by the changes in the instruction set and the instruction encoding/decoding. SIMD extensions were added, and VLIW and superscalar architectures appeared. As always, clock speeds increased, and a 3 ns MAC became possible.
Modern signal processors yield greater performance; this is due in part to technological and architectural advances such as smaller design rules, fast-access two-level cache, (E)DMA circuitry and a wider bus system. Not all DSPs provide the same speed, and many kinds of signal processors exist, each better suited to a specific task, ranging in price from about US$1.50 to US$300.
Texas Instruments produces the C6000 series DSPs, which have clock speeds of 1.2 GHz and implement separate instruction and data caches. They also have an 8 MiB second-level cache and 64 EDMA channels. The top models are capable of as many as 8000 MIPS (million instructions per second), use VLIW (very long instruction word), perform eight operations per clock cycle and are compatible with a broad range of external peripherals and various buses (PCI/serial/etc). TMS320C6474 chips each have three such DSPs, and the newest generation C6000 chips support floating-point as well as fixed-point processing.
Freescale produces a multi-core DSP family, the MSC81xx. The MSC81xx is based on StarCore Architecture processors and the latest MSC8144 DSP combines four programmable SC3400 StarCore DSP cores. Each SC3400 StarCore DSP core has a clock speed of 1 GHz.
CEVA, Inc. produces and licenses three distinct families of DSPs. Perhaps the best known and most widely deployed is the CEVA-TeakLite DSP family, a classic memory-based architecture, with 16-bit or 32-bit word widths and single or dual MACs. The CEVA-X DSP family offers a combination of VLIW and SIMD architectures, with different members of the family offering dual or quad 16-bit MACs. The CEVA-XC DSP family targets software-defined radio (SDR) modem designs and leverages a combination of VLIW and vector architectures with 32 16-bit MACs.
Analog Devices produces the SHARC-based DSPs, which range in performance from 66 MHz/198 MFLOPS (million floating-point operations per second) to 400 MHz/2400 MFLOPS. Some models support multiple multipliers and ALUs, SIMD instructions and audio processing-specific components and peripherals. The Blackfin family of embedded digital signal processors combines the features of a DSP with those of a general-purpose processor. As a result, these processors can run simple operating systems like μClinux, velOSity and Nucleus RTOS while operating on real-time data.
NXP Semiconductors produces DSPs based on TriMedia VLIW technology, optimized for audio and video processing. In some products the DSP core is embedded as a fixed-function block in an SoC, but NXP also provides a range of flexible single-core media processors. The TriMedia media processors support both fixed-point and floating-point arithmetic, and have specific instructions to deal with complex filters and entropy coding.
Most DSPs use fixed-point arithmetic, because in real-world signal processing the additional range provided by floating point is not needed, and there is a large speed and cost benefit due to reduced hardware complexity. Floating-point DSPs may be invaluable in applications where a wide dynamic range is required. Product developers might also use floating-point DSPs to reduce the cost and complexity of software development in exchange for more expensive hardware, since it is generally easier to implement algorithms in floating point.
Generally, DSPs are dedicated integrated circuits; however, DSP functionality can also be produced by using field-programmable gate array chips (FPGAs).
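The trade-off between fixed and floating point can be illustrated with a common fixed-point convention. The following is a minimal sketch, assuming the Q15 format (a 16-bit signed integer interpreted as a fraction in the range -1 to just under 1); the text above does not name a specific format, and the helper names are hypothetical.

```python
Q15_ONE = 1 << 15  # scale factor: the integer 32768 represents 1.0


def float_to_q15(x: float) -> int:
    """Convert to Q15, clamping values outside the representable range."""
    scaled = int(round(x * Q15_ONE))
    return max(-32768, min(32767, scaled))


def q15_to_float(q: int) -> float:
    """Interpret a Q15 integer as a real value."""
    return q / Q15_ONE


def q15_mul(a: int, b: int) -> int:
    """Multiply two Q15 values: integer multiply, round, shift, saturate."""
    product = a * b        # intermediate result has 30 fractional bits
    product += 1 << 14     # add half an output LSB for rounding
    return max(-32768, min(32767, product >> 15))


half = float_to_q15(0.5)
quarter = float_to_q15(0.25)
eighth = q15_to_float(q15_mul(half, quarter))  # 0.5 * 0.25

# +1.0 is not representable in Q15 and clamps to the largest value, 32767.
clamped_one = float_to_q15(1.0)

# (-1.0) * (-1.0) would be +1.0, so the product saturates as well.
corner_case = q15_mul(float_to_q15(-1.0), float_to_q15(-1.0))
```

The multiply needs only an integer multiplier and a shift, which is the source of the hardware savings. The cost is that the programmer must keep every intermediate value scaled within the representable range, which is the extra development effort that floating point avoids.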
Wikimedia Foundation. 2010.