Shotgun sequencing

In genetics, shotgun sequencing, also known as shotgun cloning, is a method used for sequencing long DNA strands. It is named by analogy with the rapidly expanding, quasi-random firing pattern of a shotgun.

Since the chain termination method of DNA sequencing can only be used for fairly short strands (100 to 1000 basepairs), longer sequences must be subdivided into smaller fragments and subsequently re-assembled to give the overall sequence. Two principal methods are used for this: chromosome walking, which progresses through the entire strand piece by piece, and shotgun sequencing, which uses random fragments and is faster but more complex.

In shotgun sequencing [Staden R (1979). "A strategy of DNA sequencing employing computer programs". Nucleic Acids Research 6 (7): 2601–10. PMID 461197. doi:10.1093/nar/6.7.2601] [Anderson S (1981). "Shotgun DNA sequencing using cloned DNase I-generated fragments". Nucleic Acids Research 9 (13): 3015–27. PMID 6269069. doi:10.1093/nar/9.13.3015], DNA is broken up randomly into numerous small segments, which are sequenced using the chain termination method to obtain "reads". Multiple overlapping reads for the target DNA are obtained by performing several rounds of this fragmentation and sequencing. Computer programs then use the overlapping ends of different reads to assemble them into a contiguous sequence.

For example, consider the following two rounds of shotgun reads:

In this extremely simplified example, none of the reads covers the full length of the original sequence, but the four reads can be assembled into the original sequence by using the overlap of their ends to align and order them. In reality, this process involves enormous amounts of data that are rife with ambiguities and sequencing errors. Assembly of complex genomes is further complicated by the great abundance of repetitive sequence, which means that similar short reads could come from completely different parts of the sequence.
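The overlap-based assembly described above can be sketched in a few lines of Python. This is only an illustrative greedy merge, not the method used by any particular assembler: the function names, the `min_len` threshold, and the four toy reads are all assumptions made for the example, and the sketch ignores sequencing errors, reverse-complement reads, and repeats.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if the longest such overlap is shorter than `min_len`.
    """
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two sequences with the largest end overlap."""
    unique = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Discard reads that lie wholly inside another read; they add no new sequence.
    seqs = [r for r in unique if not any(r != s and r in s for s in unique)]
    while len(seqs) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(seqs):
            for j, b in enumerate(seqs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left to merge: the remaining sequences stay separate
        merged = seqs[best_i] + seqs[best_j][best_k:]
        seqs = [s for idx, s in enumerate(seqs) if idx not in (best_i, best_j)]
        seqs.append(merged)
    return seqs


# Hypothetical toy reads from two rounds of fragmentation (illustration only).
round_one = ["AGCATGCTGCAGTCATGCT", "TAGGCTA"]
round_two = ["AGCATG", "CTGCAGTCATGCTTAGGCTA"]
print(greedy_assemble(round_one + round_two))
```

A greedy strategy like this can go wrong when a repeat produces a long but misleading overlap, which is one reason real assembly is so much harder than the toy case.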

Many overlapping reads for each segment of the original DNA are necessary to overcome these difficulties and accurately assemble the sequence. For example, to complete the Human Genome Project, most of the human genome was sequenced at 12X or greater "coverage"; that is, each base in the final sequence was present, on average, in 12 reads. Even so, current methods have failed to isolate or assemble reliable sequence for approximately 1% of the (euchromatic) human genome.

Whole genome shotgun sequencing

Whole genome shotgun sequencing for small (4000 to 7000 basepair) genomes was already in use in 1979. Broader application benefited from pairwise end sequencing, known colloquially as "double-barrel shotgun sequencing". As sequencing projects began to take on longer and more complicated DNAs, multiple groups began to realize that useful information could be obtained by sequencing both ends of a fragment of DNA. Although sequencing both ends of the same fragment and keeping track of the paired data was more cumbersome than sequencing a single end of two distinct fragments, the knowledge that the two sequences were oriented in opposite directions and were about the length of a fragment apart from each other was valuable in reconstructing the sequence of the original target fragment. The first published description of the use of paired ends was in 1990 [Edwards A, Voss H, Rice P, Civitello A, Stegemann J, Schwager C, Zimmerman J, Erfle H, Caskey T, Ansorge W (1990). "Automated DNA sequencing of the human HPRT locus". Genomics 6: 593–608. PMID 2341149. doi:10.1016/0888-7543(90)90493-E] as part of the sequencing of the human HPRT locus, although the use of paired ends was limited to closing gaps after the application of a traditional shotgun sequencing approach. The first theoretical description of a pure pairwise end sequencing strategy, assuming fragments of constant length, was in 1991 [Edwards A, Caskey T (1991). "Closure strategies for random DNA sequencing". Methods: A Companion to Methods in Enzymology 3 (1): 41–47. doi:10.1016/S1046-2023(05)80162-8]. At the time, there was community consensus that the optimal fragment length for pairwise end sequencing would be three times the sequence read length. In 1995 Roach et al. [Roach JC, Boysen C, Wang K, Hood L (1995). "Pairwise end sequencing: a unified approach to genomic mapping and sequencing". Genomics 26: 345–353. PMID 7601461. doi:10.1016/0888-7543(95)80219-C] introduced the innovation of using fragments of varying sizes, and demonstrated that a pure pairwise end-sequencing strategy would be possible on large targets. The strategy was subsequently adopted by The Institute for Genomic Research (TIGR) to sequence the genome of the bacterium Haemophilus influenzae in 1995 [Fleischmann RD et al. (1995). "Whole-genome random sequencing and assembly of Haemophilus influenzae Rd". Science 269 (5223): 496–512. PMID 7542800. doi:10.1126/science.7542800], and then by Celera Genomics to sequence the fruit fly genome in 2000 [Adams MD et al. (2000). "The genome sequence of Drosophila melanogaster". Science 287 (5461): 2185–95. PMID 10731132. doi:10.1126/science.287.5461.2185], and subsequently the human genome.

To apply the strategy, high-molecular-weight DNA is sheared into random fragments, size-selected (usually 2, 10, 50, and 150 kb), and cloned into an appropriate vector. The clones are then sequenced from both ends using the chain termination method, yielding two short sequences. Each sequence is called an "end-read" or "read", and two reads from the same clone are referred to as "mate pairs". Since the chain termination method can usually only produce reads between 500 and 1000 bases long, mate pairs will rarely overlap in all but the smallest clones.

The original sequence is reconstructed from the reads using sequence assembly software. First, overlapping reads are collected into longer composite sequences known as "contigs". Contigs can be linked together into "scaffolds" by following connections between mate pairs. The distance between contigs can be inferred from the mate pair positions if the average fragment length of the library is known and has a narrow window of deviation.
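As a rough illustration of how a distance between contigs might be inferred from mate pairs, the sketch below averages simple per-pair estimates. The function name, the coordinate convention, and the numbers are hypothetical. It assumes two contigs already ordered and oriented on the same strand, one read of each pair in the left contig and its mate in the right contig, and a library whose clone lengths stay close to the stated average.

```python
from statistics import mean


def estimate_gap(left_contig_len: int,
                 links: list[tuple[int, int]],
                 insert_len: int) -> float:
    """Estimate the gap between two ordered contigs from linking mate pairs.

    Each link is (start_in_left, end_in_right):
      start_in_left - position in the left contig where the clone begins
      end_in_right  - position in the right contig where the clone ends
    The clone spans the tail of the left contig, the gap, and the head of the
    right contig, so: insert_len = (left_contig_len - start) + gap + end.
    """
    estimates = [
        insert_len - (left_contig_len - start) - end
        for start, end in links
    ]
    return mean(estimates)


# Hypothetical numbers: a 5 kb contig, a 2 kb library, two linking mate pairs.
gap = estimate_gap(left_contig_len=5000,
                   links=[(4200, 900), (4500, 1250)],
                   insert_len=2000)
print(gap)
```

A negative estimate would suggest that the two contigs actually overlap. The wider the spread of clone lengths in the library, the less reliable any single pair's estimate becomes, which is why a narrow size distribution matters.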

Redundancy (sometimes erroneously referred to as "coverage") is the average number of reads representing a given nucleotide in the reconstructed sequence. It can be calculated from the length of the original genome ("G"), the number of reads ("N"), and the average read length ("L") as NL/G. For example, a hypothetical genome with 2,000 base pairs reconstructed from 8 reads with an average length of 500 nucleotides will have 2x redundancy. This parameter also enables one to estimate other quantities, such as the percentage of the genome covered by reads (the coverage). The subject of DNA sequencing theory addresses the relationships of such quantities.
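The calculation above, and one common way of turning redundancy into an estimate of coverage, can be sketched as follows. The coverage estimate is not given in the text: it uses the idealized Poisson approximation of the Lander-Waterman model, which assumes reads land uniformly at random and are short relative to the genome, so the chance that a given base is missed by every read is about e^(-NL/G). The function names are illustrative.

```python
import math


def redundancy(n_reads: int, read_len: float, genome_len: int) -> float:
    """Average number of reads covering each base: N * L / G."""
    return n_reads * read_len / genome_len


def expected_fraction_covered(c: float) -> float:
    """Expected fraction of bases hit by at least one read at redundancy c.

    Assumption (idealized Poisson model): a base is missed by every read
    with probability exp(-c).
    """
    return 1.0 - math.exp(-c)


# The example from the text: 8 reads of 500 nucleotides on a 2,000 bp genome.
c = redundancy(n_reads=8, read_len=500, genome_len=2000)
print(c)                                        # 2.0
print(round(expected_fraction_covered(c), 3))   # 0.865
```

Under these assumptions, 2x redundancy leaves roughly 13.5% of bases unsequenced, which illustrates why redundancy and coverage are distinct quantities.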

Proponents of this approach argue that it is possible to sequence the whole genome at once using large arrays of sequencers, which makes the whole process much more efficient than more traditional approaches. Detractors argue that although the technique quickly sequences large regions of DNA, its ability to correctly link these regions is suspect, particularly for genomes with repeating regions. As sequence assembly programs become more sophisticated and computing power becomes cheaper, it may be possible to overcome this limitation.



Wikimedia Foundation. 2010.
