Genome project
Genome projects are scientific endeavours that ultimately aim to determine the complete genome sequence of an organism (be it an animal, a plant, a fungus, a bacterium, an archaean, a protist or a virus) and to annotate protein-coding genes and other important genome-encoded features. The genome sequence of an organism includes the collective DNA sequences of each chromosome in the organism. For a bacterium containing a single chromosome, a genome project will aim to determine the sequence of that chromosome. For the human species, whose genome includes 22 pairs of autosomes and 2 sex chromosomes, a complete sequence of an individual's diploid genome will involve 46 separate chromosome sequences.
The Human Genome Project was a landmark genome project that is already having a major impact on research across the life sciences, with potential for spurring numerous medical and commercial developments.
In 2011, ICAR scientists were the first in the world to sequence the pigeon pea genome. It was a purely indigenous effort by 31 scientists led by Nagendra Kumar Singh of NRCPB. The first draft of the sequence was published in J. Plant Biochem. Biotechnol.
Genome assembly refers to the process of taking a large number of short DNA sequences and putting them back together to create a representation of the original chromosomes from which the DNA originated. In a shotgun sequencing project, all the DNA from a source (usually a single organism, anything from a bacterium to a mammal) is first fractured into millions of small pieces. These pieces are then "read" by automated sequencing machines, which can read up to 900 nucleotides or bases at a time. (The four bases are adenine, guanine, cytosine, and thymine, represented as AGCT.) A genome assembly algorithm works by taking all the pieces and aligning them to one another, and detecting all places where two of the short sequences, or reads, overlap. These overlapping reads can be merged together, and the process continues.
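The overlap-and-merge idea described above can be illustrated with a minimal sketch. It assumes error-free reads from a single strand and uses a simple greedy strategy (always merge the pair with the longest overlap). The function names `overlap` and `greedy_assemble` and the `min_len` parameter are hypothetical, chosen for illustration; production assemblers are far more elaborate.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 when no overlap of at least min_len bases exists.
    """
    max_len = min(len(a), len(b))
    for n in range(max_len, min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two sequences with the longest overlap.

    Assumes error-free reads from one strand (an idealisation).
    Stops when no remaining pair overlaps by min_len or more bases.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return contigs
        # Join the pair, writing the shared bases only once.
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


reads = ["AGCTTAGG", "TAGGCATC", "CATCGGA"]
print(greedy_assemble(reads))  # ['AGCTTAGGCATCGGA']
```

The all-against-all comparison is quadratic in the number of reads, which is acceptable for a toy example but not for the millions of reads in a real project.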
Genome assembly is a very difficult computational problem, made more difficult because many genomes contain large numbers of identical sequences, known as repeats. These repeats can be thousands of nucleotides long, and some occur in thousands of different locations, especially in the large genomes of plants and animals.
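A small sketch can show why repeats cause trouble: a read that lies entirely inside a repeat matches more than one location, so overlap information alone cannot say which copy it came from. The toy genome, the reads and the helper `occurrences` below are all hypothetical.

```python
def occurrences(genome: str, read: str) -> list[int]:
    """Return every start position at which read occurs in genome."""
    positions = []
    start = genome.find(read)
    while start != -1:
        positions.append(start)
        start = genome.find(read, start + 1)  # allow overlapping matches
    return positions


# Hypothetical toy genome: the same 8-base repeat appears twice,
# separated and flanked by unique sequence.
REPEAT = "ACGTACGT"
genome = "TTGCA" + REPEAT + "GGATC" + REPEAT + "CCTAG"

unique_read = "GCAACG"  # spans the left flank and the first repeat copy
repeat_read = "CGTACG"  # lies entirely inside the repeat

print(occurrences(genome, unique_read))  # [2]
print(occurrences(genome, repeat_read))  # [6, 19]
```

Reads that extend from a repeat into unique flanking sequence, like `unique_read` here, are what allow an assembler to place a repeat copy unambiguously.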
The resulting (draft) genome sequence is produced by combining the information from sequenced contigs and then employing linking information to create scaffolds. Scaffolds are positioned along the physical map of the chromosomes, creating a "golden path".
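As a rough illustration of scaffolding, the sketch below chains contigs using linking information. It assumes each link is a tuple (left contig, right contig, estimated gap size) and that the links form a single unbranched chain; unknown bases in the gaps are written as runs of N, a common convention. The function name `build_scaffold` and the data layout are hypothetical.

```python
def build_scaffold(contigs: dict[str, str],
                   links: list[tuple[str, str, int]]) -> str:
    """Join contigs into one scaffold, padding each gap with N characters.

    links holds (left_contig, right_contig, estimated_gap) tuples and is
    assumed to describe exactly one linear chain (an idealisation).
    """
    next_of = {left: (right, gap) for left, right, gap in links}
    has_predecessor = {right for _, right, _ in links}
    starts = [name for name in next_of if name not in has_predecessor]
    if len(starts) != 1:
        raise ValueError("links must form exactly one linear chain")

    name = starts[0]
    parts = [contigs[name]]
    while name in next_of:
        name, gap = next_of[name]
        parts.append("N" * gap)      # bases of unknown identity between contigs
        parts.append(contigs[name])
    return "".join(parts)


contigs = {"c1": "AGCT", "c2": "GGA", "c3": "TTC"}
links = [("c2", "c3", 2), ("c1", "c2", 3)]
print(build_scaffold(contigs, links))  # AGCTNNNGGANNTTC
```

Real linking data can be contradictory or branching, so an actual scaffolder has to resolve conflicts rather than reject them as this sketch does.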
Originally, most large-scale DNA sequencing centers developed their own software for assembling the sequences that they produced. However, this has changed as the software has grown more complex and as the number of sequencing centers has increased. An example of such an assembler is the Short Oligonucleotide Analysis Package (SOAP), developed by BGI for de novo assembly of human-sized genomes, alignment, SNP detection, resequencing, indel finding, and structural variation analysis.
Genome annotation is the process of attaching biological information to sequences. It involves:
- identifying elements on the genome, a process called gene prediction, and
- attaching biological information to these elements.
Automatic annotation tools try to perform all this by computer analysis, as opposed to manual annotation (a.k.a. curation) which involves human expertise. Ideally, these approaches co-exist and complement each other in the same annotation pipeline.
The most basic level of annotation uses BLAST to find sequence similarities and then annotates genomes based on those matches. However, more and more additional information is now being added to annotation platforms. This additional information allows manual annotators to resolve discrepancies between genes that are given the same annotation. Some databases use genome context information, similarity scores, experimental data, and integrations of other resources to provide genome annotations through their Subsystems approach. Other databases (e.g. Ensembl) rely on both curated data sources and a range of different software tools in their automated genome annotation pipeline.
Structural annotation consists of the identification of genomic elements.
- ORFs and their localisation
- gene structure
- coding regions
- location of regulatory motifs
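The first item in the list above, locating ORFs, can be sketched as a scan for a start codon followed by an in-frame stop codon. The sketch assumes the standard stop codons (TAA, TAG, TGA), ATG as the only start codon, and searches the forward strand only; the function name `find_orfs` and the `min_codons` threshold are hypothetical. Real gene prediction, especially in eukaryotes with introns, needs considerably more than this.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 2) -> list[tuple[int, int, str]]:
    """Return (start, end, sequence) for ORFs in the three forward frames.

    An ORF here runs from an ATG to the first in-frame stop codon.
    start is 0-based, end is exclusive and includes the stop codon.
    min_codons counts the codons before the stop, including the ATG.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, seq[start:i + 3]))
                start = None                   # look for the next ATG
    return sorted(orfs)


print(find_orfs("CCATGAAATTTTAGGG"))  # [(2, 14, 'ATGAAATTTTAG')]
```

Searching the reverse strand as well would require running the same scan on the reverse complement of the sequence.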
Functional annotation consists of attaching biological information to genomic elements.
- biochemical function
- biological function
- regulation and interactions involved
These steps may involve both biological experiments and in silico analysis.
A variety of software tools have been developed to permit scientists to view and share genome annotations.
Genome annotation is the next major challenge for the Human Genome Project, now that the genome sequences of human and several model organisms are largely complete. Identifying the locations of genes and other genetic control elements is often described as defining the biological "parts list" for the assembly and normal operation of an organism. Scientists are still at an early stage in the process of delineating this parts list and in understanding how all the parts "fit together".
Genome annotation is an active area of investigation and involves a number of different organizations in the life science community which publish the results of their efforts in publicly available biological databases accessible via the web and other electronic means. Here is an alphabetical listing of on-going projects relevant to genome annotation:
- ENCyclopedia Of DNA Elements (ENCODE)
- Entrez Gene
- Gene Ontology Consortium
- Vertebrate and Genome Annotation Project (Vega)
At Wikipedia, genome annotation has started to become automated under the auspices of the Gene Wiki portal which operates a bot that harvests gene data from research databases and creates gene stubs on that basis.
When is a genome project finished?
When sequencing a genome, there are usually regions that are difficult to sequence (often regions with highly repetitive DNA). Thus, 'completed' genome sequences are rarely ever complete, and terms such as 'working draft' or 'essentially complete' have been used to more accurately describe the status of such genome projects. Even when every base pair of a genome sequence has been determined, there are still likely to be errors present because DNA sequencing is not a completely accurate process. It could also be argued that a complete genome project should include the sequences of mitochondria and (for plants) chloroplasts as these organelles have their own genomes.
It is often reported that the goal of sequencing a genome is to obtain information about the complete set of genes in that particular genome sequence. The proportion of a genome that encodes genes may be very small (particularly in eukaryotes such as humans, where coding DNA may account for only a few percent of the entire sequence). However, it is not always possible (or desirable) to sequence only the coding regions. Also, as scientists understand more about the role of this noncoding DNA (often referred to as junk DNA), it will become more important to have a complete genome sequence as a background to understanding the genetics and biology of any given organism.
In many ways genome projects do not confine themselves to only determining a DNA sequence of an organism. Such projects may also include gene prediction to find out where the genes are in a genome, and what those genes do. There may also be related projects to sequence ESTs or mRNAs to help find out where the genes actually are.
Historical and technological perspectives
Historically, when sequencing eukaryotic genomes (such as the worm Caenorhabditis elegans) it was common to first map the genome to provide a series of landmarks across the genome. Rather than sequence a chromosome in one go, it would be sequenced piece by piece (with the prior knowledge of approximately where that piece is located on the larger chromosome). Changes in technology, and in particular improvements to the processing power of computers, mean that genomes can now be 'shotgun sequenced' in one go (although there are caveats to this approach when compared to the traditional approach).
Improvements in DNA sequencing technology have meant that the cost of sequencing a new genome has steadily fallen (in terms of cost per base pair), and newer technology has also meant that genomes can be sequenced far more quickly.
When research agencies decide which new genomes to sequence, the emphasis has been on species that are of high importance as model organisms, that have relevance to human health (e.g. pathogenic bacteria or vectors of disease such as mosquitoes), or that have commercial importance (e.g. livestock and crop plants). Secondary emphasis is placed on species whose genomes will help answer important questions in molecular evolution (e.g. the common chimpanzee).
In the future, it is likely that it will become even cheaper and quicker to sequence a genome. This will allow for complete genome sequences to be determined from many different individuals of the same species. For humans, this will allow us to better understand aspects of human genetic diversity.
Example genome projects
Many organisms have genome projects that have either been completed or will be completed shortly, including:
- Humans, Homo sapiens; see Human Genome Project
- Palaeo-Eskimo, an ancient human
- Neanderthal, Homo neanderthalensis (partial); see Neanderthal Genome Project
- Common Chimpanzee, Pan troglodytes; see Chimpanzee Genome Project
- Domestic Cow
- Bovine Genome
- Honey Bee Genome Sequencing Consortium
- Human microbiome project
- International Grape Genome Program
- International HapMap Project
- Joint Genome Institute
- Model organism
- National Center for Biotechnology Information
- Illumina, private company involved in genome sequencing
- Knome, private company offering genome analysis & sequencing
- "Potential Benefits of Human Genome Project Research". Department of Energy, Human Genome Project Information. 2009-10-09. http://www.ornl.gov/sci/techresources/Human_Genome/project/benefits.shtml. Retrieved 2010-06-18.
- Nagendra Kumar Singh et al. (2011). "The First Draft of pigeon pea genome sequence". J. Plant Biochem. Biotechnol. doi:10.1007/s13562-011-0088-8.
- Li, R.; Zhu, H.; Ruan, J.; Qian, W.; Fang, X.; Shi, Z.; Li, Y.; Li, S. et al. (Feb 2010). "De novo assembly of human genomes with massively parallel short read sequencing". Genome Res 20 (2): 265–72. doi:10.1101/gr.097261.109. PMC 2813482. PMID 20019144. http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=2813482.
- Rasmussen, M.; Li, Y.; Lindgreen, S.; Pedersen, JS.; Albrechtsen, A.; Moltke, I.; Metspalu, M.; Metspalu, E. et al. (Feb 2010). "Ancient human genome sequence of an extinct Palaeo-Eskimo". Nature 463 (7282): 757–62. doi:10.1038/nature08835. PMID 20148029.
- Wang, J.; Wang, W.; Li, R.; Li, Y.; Tian, G.; Goodman, L.; Fan, W.; Zhang, J. et al. (Nov 2008). "The diploid genome sequence of an Asian individual". Nature 456 (7218): 60–5. doi:10.1038/nature07484. PMC 2716080. PMID 18987735. http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=2716080.
- Stein, L. (2001). "Genome annotation: from sequence to biology". Nature Reviews Genetics 2 (7): 493–503. doi:10.1038/35080529. PMID 11433356.
- "Ensembl's genome annotation pipeline online documentation". http://www.ensembl.org/info/about/docs/index.html.
- Huss, Jon W.; Orozco, C; Goodale, J; Wu, C; Batalov, S; Vickers, TJ; Valafar, F; Su, AI (2008). "A Gene Wiki for Community Annotation of Gene Function". PLoS Biology 6 (7): e175. doi:10.1371/journal.pbio.0060175. PMC 2443188. PMID 18613750. http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=2443188.
- http://www.eurekalert.org/pub_releases/2009-04/uoia-wma041709.php
- GOLD:Genomes OnLine Database
- Genome Project Database
- The Protein Naming Utility
- The sea urchin genome database
Wikimedia Foundation. 2010.