Sequence database

In bioinformatics, a sequence database is a large collection of nucleic acid sequences, protein sequences, or other biological sequences stored in digital form on a computer. A database may contain sequences from a single organism (e.g., a database of all proteins in Saccharomyces cerevisiae), or it may include sequences from every organism whose DNA has been sequenced.


Search issues

Sequence databases can be searched in several ways. Probably the most common is to look for sequences similar to a query protein or gene whose sequence the user already knows. The BLAST program performs this kind of similarity search.
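To make the idea of a similarity search concrete, the sketch below scores a query against every entry of a tiny in-memory database using Smith-Waterman local alignment with a linear gap penalty, then ranks the hits. This is a deliberately swapped-in technique: BLAST itself uses fast heuristic word seeding and statistical scoring rather than this exhaustive dynamic programming. The scoring values (match +2, mismatch -1, gap -2), the `min_score` cutoff, and the example records are illustrative assumptions, not parameters of any real tool.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (linear gap penalty).

    Assumed scoring scheme for illustration only; real tools use
    substitution matrices and affine gap penalties.
    """
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            # Local alignment: scores never drop below zero.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


def search(query: str, database: dict[str, str], min_score: int = 1):
    """Score query against every record; return hits sorted best-first."""
    hits = [(name, smith_waterman_score(query, seq))
            for name, seq in database.items()]
    hits = [hit for hit in hits if hit[1] >= min_score]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical toy database for illustration.
db = {"seqA": "ACGTACGTGG", "seqB": "TTTTTTTT", "seqC": "GGACGTTCGA"}
print(search("ACGTACG", db, min_score=5))  # [('seqA', 14), ('seqC', 11)]
```

Here `seqA` contains the query exactly, `seqC` matches with one mismatch, and `seqB` falls below the cutoff. Scanning every record this way costs time proportional to query length times database size, which is why heuristic tools such as BLAST are used on large databases.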

Many inputs create inconsistencies

A major problem with all the large genetic sequence databases is that records are deposited in them from a wide range of sources, from individual researchers to large genome sequencing centers. As a result, the sequences themselves, and especially the biological annotations attached to them, vary widely in quality. There is also considerable redundancy, because multiple labs often submit sequences that are identical, or nearly identical, to others already in the databases.
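As a rough illustration of the redundancy described above, the following sketch collapses records whose sequences are exactly identical after removing whitespace and normalizing case, and computes a simple ungapped percent identity to flag near-identical pairs of equal length. The record IDs, the normalization rules, and both function names are hypothetical; real redundancy reduction usually relies on alignment-based sequence clustering rather than these shortcuts.

```python
from collections import defaultdict


def collapse_identical(records):
    """Group record IDs whose sequences are identical after normalization.

    records: iterable of (record_id, sequence) pairs.
    Normalization (an assumption): drop whitespace, convert to upper case.
    """
    groups = defaultdict(list)
    for rec_id, seq in records:
        key = "".join(seq.split()).upper()
        groups[key].append(rec_id)
    return dict(groups)


def ungapped_identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("ungapped identity needs equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Hypothetical submissions from different sources.
records = [
    ("lab1_x", "acgt ACGT"),
    ("center_y", "ACGTACGT"),
    ("lab2_z", "ACGTACGA"),
]
print(collapse_identical(records))
# {'ACGTACGT': ['lab1_x', 'center_y'], 'ACGTACGA': ['lab2_z']}
print(ungapped_identity("ACGTACGT", "ACGTACGA"))  # 0.875
```

Exact matching catches only the first kind of redundancy; the two remaining groups differ at a single position (87.5% identity), which is the "nearly identical" case that requires a similarity threshold to resolve.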

Many annotations are based not on laboratory experiments but on the results of similarity searches against previously annotated sequences. Once a sequence has been annotated by similarity to others and deposited in the database, it can in turn become the basis for future annotations. This leads to the transitive annotation problem: several annotation transfers by sequence similarity may separate a particular database record from any actual wet-lab experimental evidence. Biological annotations in the major sequence databases should therefore be regarded with considerable skepticism unless they can be verified against published papers describing high-quality experimental data, or at least against a human-curated sequence database.
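The transitive annotation problem can be pictured as a chain of provenance links. The sketch below assumes a hypothetical record layout in which each entry stores its annotation, an evidence label ("experimental" or "similarity"), and the ID of the record its annotation was copied from. It follows those links and counts how many similarity-based transfers separate a record from experimental evidence, returning `None` when the chain breaks or loops. No real database exposes exactly these fields; the layout is an assumption for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    annotation: str
    evidence: str                 # hypothetical label: "experimental" or "similarity"
    source: Optional[str] = None  # ID of the record the annotation was transferred from


def transfer_depth(record_id: str, db: dict[str, Record]) -> Optional[int]:
    """Number of similarity transfers between a record and experimental evidence.

    Returns 0 for an experimentally supported record, or None if the chain
    ends without experimental evidence, points to a missing record, or loops.
    """
    seen = set()
    hops = 0
    current = record_id
    while current is not None and current not in seen:
        seen.add(current)
        rec = db.get(current)
        if rec is None:
            return None  # provenance link points outside the database
        if rec.evidence == "experimental":
            return hops
        current = rec.source
        hops += 1
    return None  # no source recorded, or a cycle of mutual annotations


# Hypothetical records illustrating a two-step transfer chain.
db = {
    "P1": Record("kinase", "experimental"),
    "P2": Record("kinase", "similarity", "P1"),
    "P3": Record("kinase", "similarity", "P2"),
    "P4": Record("kinase", "similarity"),
}
print(transfer_depth("P3", db))  # 2
print(transfer_depth("P4", db))  # None
```

A record such as `P3` looks just as authoritative as `P1` when read in isolation; only by following the provenance does it become clear that its annotation is two similarity transfers removed from an experiment, and `P4` cannot be traced to experimental evidence at all.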

See also

Database formats

Distributed computing

Wikimedia Foundation. 2010.
