Protein subcellular localization prediction

Protein subcellular localization prediction involves the computational prediction of where a protein resides in a cell. Prediction of protein subcellular localization is an important component of bioinformatics-based prediction of protein function and genome annotation, and it can aid the identification of drug targets.

Most eukaryotic proteins are encoded in the nuclear genome and synthesized in the cytosol, but many must be sorted further before they reach their final destination. In prokaryotes, proteins are synthesized in the cytoplasm, and some must be targeted to other locations such as a cell membrane or the extracellular environment. A protein must be localized to the appropriate subcellular compartment to perform its function.

Experimentally determining the subcellular localization of a protein is laborious and time-consuming. New approaches in computer science, coupled with a growing dataset of proteins of known localization, mean that computational tools can now provide fast and accurate localization predictions for many organisms. As a result, subcellular localization prediction has become one of the problems that bioinformatics addresses successfully. Many protein subcellular localization prediction methods now exceed the accuracy of some high-throughput laboratory methods for the identification of protein subcellular localization. [Rey, S., J.L. Gardy, and F.S.L. Brinkman (2005). "Assessing the precision of high-throughput computational and laboratory approaches for the genome-wide identification of protein subcellular localization in bacteria." BMC Genomics 6:162]
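Comparisons of this kind are commonly expressed, for each localization site, as precision (the fraction of predictions for that site that are correct) and recall, also called sensitivity (the fraction of proteins truly at that site that are recovered). The following is a minimal sketch of that calculation in Python. The function name, the site labels, and the toy data are made up for illustration and are not taken from any of the cited studies.

```python
def precision_recall(true_sites, predicted_sites, site):
    """Precision and recall for a single localization site.

    A prediction of None means the method made no call for that protein;
    it lowers recall for the true site but does not affect precision.
    """
    pairs = list(zip(true_sites, predicted_sites))
    tp = sum(1 for t, p in pairs if p == site and t == site)  # correct calls
    fp = sum(1 for t, p in pairs if p == site and t != site)  # wrong calls
    fn = sum(1 for t, p in pairs if t == site and p != site)  # missed proteins
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    return precision, recall


# Toy, made-up data: known localization versus a hypothetical predictor's output.
true_sites = ["cytoplasmic", "outer membrane", "periplasmic",
              "cytoplasmic", "extracellular", "outer membrane"]
predicted = ["cytoplasmic", "outer membrane", "cytoplasmic",
             "cytoplasmic", None, "periplasmic"]

for site in ("cytoplasmic", "outer membrane"):
    p, r = precision_recall(true_sites, predicted, site)
    print(f"{site}: precision={p:.2f} recall={r:.2f}")
```

A method that declines to make a call when unsure tends to trade recall for precision, which is one reason two methods cannot be compared on a single accuracy figure alone.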


Several computational tools for predicting the subcellular localization of a protein are publicly available, a few of which are listed below. The number of subcellular localizations predicted varies from method to method, as does accuracy, so the appropriate method depends on what is to be predicted and on how sensitive or specific the analysis needs to be. Methods for predicting bacterial protein localization, and their accuracy, have been reviewed. [Gardy, J.L., and F.S.L. Brinkman (2006). "Methods for predicting bacterial protein subcellular localization." Nature Reviews Microbiology 4:741-751.] See also the [] portal for a more extensive list of localization predictors for both bacteria and eukaryotes:

* [ BaCelLo] : Prediction of eukaryotic protein subcellular localization. Unlike other methods, the predictions are balanced among different classes and all the localizations that are predicted are considered as equiprobable, to avoid mispredictions. [Pierleoni A., Martelli P.L., et al. (2006). "BaCelLo: a balanced subcellular localization predictor." Bioinformatics 22:e408-16.]

* [ CELLO] : CELLO uses a two-level Support Vector Machine system to assign localizations to both prokaryotic and eukaryotic proteins. [Yu, C.S., Lin, C.J., Hwang, J.K. (2004). "Predicting subcellular localization of proteins for Gram-negative bacteria by support vector machines based on n-peptide compositions." Protein Sci 13:1402-6.] [Yu, C.S., Chen, Y.C., Lu, C.H., Hwang, J.K. (2006). "Prediction of protein subcellular localization." Proteins 64:643-51.]

* [ HSLpred] : Prediction of the subcellular localization of human proteins. The method combines composition-based SVM models with PSI-BLAST similarity searches. [J. Biol. Chem. 2005, 280:14427–14432.]

* [ LOCtree] : Prediction based on mimicking the cellular sorting mechanism using a hierarchical implementation of support vector machines. LOCtree is a comprehensive predictor incorporating predictions based on PROSITE/PFAM signatures as well as SwissProt keywords. [Nair, R. & Rost,B. (2005). "Mimicking cellular sorting improves prediction of Subcellular Localization." J Mol Biol 348:85-100.]

* [ MultiLoc] : An SVM-based prediction engine for a wide range of subcellular locations. [A. Höglund, P. Dönnes, T. Blum, H.-W. Adolph, O. Kohlbacher (2006). MultiLoc: prediction of protein subcellular localization using N-terminal targeting sequences, sequence motifs, and amino acid composition. Bioinformatics 22(10):1158-65.]

* [ PSORT] : The first widely used method for protein subcellular localization prediction, developed under the leadership of Kenta Nakai. [Nakai, K., and M. Kanehisa (1991). Expert system for predicting protein localization sites in gram-negative bacteria. Proteins. 11:95-110.] Researchers are now encouraged to use other PSORT programs, such as WoLF PSORT and PSORTb, for making predictions for certain types of organisms (see below). The prediction performance of PSORT is lower than that of more recently developed predictors.

* [ PSORTb] : Prediction of bacterial protein localization. [Gardy, J.L., C. Spencer, et al. (2003). "PSORT-B: Improving protein subcellular localization prediction for Gram-negative bacteria." Nucleic Acids Res 31:3613-7.] [Gardy, J.L., M. Laird, F. Chen, S. Rey, C.J. Walsh, G.E. Tusnády, M. Ester, F.S.L. Brinkman (2005). "PSORT-B v.2.0: Expanded prediction of bacterial protein subcellular localization and insights gained from comparative proteome analysis." Bioinformatics. 21:617-623.]

* [ PredictNLS] : Prediction of nuclear localization signals. [Nair, R., P. Carter, et al. (2003). "NLSdb: database of nuclear localization signals." Nucleic Acids Res 31: 397-9.]

* [ Proteome Analyst] : Prediction of protein localization for both prokaryotes and eukaryotes using a text mining approach. [Lu, Z., D. Szafron, et al. (2004). "Predicting subcellular localization of proteins using machine-learned classifiers." Bioinformatics 20:547-56.]

* [ SecretomeP] : Prediction of eukaryotic proteins that are secreted via a non-traditional secretory mechanism. [Bendtsen J.D., L.J. Jensen, et al. (2004). "Feature-based prediction of non-classical and leaderless protein secretion." Protein Eng Des Sel. 17:349-56.]

* [ SherLoc] : An SVM-based predictor combining MultiLoc with text-based features derived from PubMed abstracts. [H. Shatkay, A. Höglund, S. Brady, T. Blum, P. Dönnes, O. Kohlbacher (2007). SherLoc: high-accuracy prediction of protein subcellular localization by integrating text and protein sequence data. Bioinformatics 23(11):1410-1417.]

* [ TargetP] : Prediction of N-terminal sorting signals. [Emanuelsson, O., H. Nielsen, et al. (2000). "Predicting subcellular localization of proteins based on their N-terminal amino acid sequence." J Mol Biol 300:1005-16.]

* [ WoLF PSORT] : An updated version of PSORT/PSORT II for predicting the localization of eukaryotic proteins. [Horton, P., K.-J. Park, T. Obayashi and K. Nakai (2006). "Protein Subcellular Localization Prediction with WoLF PSORT." Proceedings of Asian Pacific Bioinformatics Conference 2006, Taipei, Taiwan.]
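Several of the tools listed above describe their input as amino acid or n-peptide composition fed to a support vector machine. As an illustration only, the sketch below computes such a composition vector in Python. It is not the implementation used by CELLO, HSLpred, or any other listed tool; the function name, the handling of non-standard residues, and the example sequence are assumptions made for this example.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def n_peptide_composition(sequence, n=1):
    """Relative frequency of every possible n-peptide, in a fixed order.

    Returns a list of length 20**n. Windows containing a non-standard
    residue code (for example X) are skipped; that is an assumption of
    this sketch, not a rule taken from any particular tool.
    """
    seq = sequence.upper()
    windows = [seq[i:i + n] for i in range(len(seq) - n + 1)]
    valid = [w for w in windows if all(c in AMINO_ACIDS for c in w)]
    counts = Counter(valid)
    total = len(valid)
    keys = ["".join(p) for p in product(AMINO_ACIDS, repeat=n)]
    return [counts[k] / total if total else 0.0 for k in keys]


toy_sequence = "MKKLLAAAGW"  # made-up sequence, not a real protein
aa_composition = n_peptide_composition(toy_sequence, n=1)   # 20 values
dipeptide_composition = n_peptide_composition(toy_sequence, n=2)  # 400 values

# Concatenated feature vector that a classifier such as an SVM could take.
features = aa_composition + dipeptide_composition
print(len(features))
```

Because the vector has the same length for every protein regardless of sequence length, it can be passed directly to a standard classifier. The trade-off is that composition alone discards the position of residues, which is why some of the methods above add N-terminal signal or motif information.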


Determining subcellular localization is important for understanding protein function and is a critical step in genome annotation.

Knowledge of the subcellular localization of a protein can significantly improve target identification during the drug discovery process. For example, secreted proteins and plasma membrane proteins are easily accessible by drug molecules due to their localization in the extracellular space or on the cell surface.

Bacterial cell surface and secreted proteins are also of interest for their potential as vaccine candidates or as diagnostic targets.

Aberrant subcellular localization of proteins has been observed in cells affected by several diseases, such as cancer and Alzheimer’s disease.

Secreted proteins from some archaea that can survive in unusual environments have industrially important applications.


Further reading

* Alberts, B., D. Bray, et al. (1994). Molecular Biology of the Cell. New York and London, Garland Publishing.

* Bork, P., T. Dandekar, et al. (1998). "Predicting function: from genes to genomes and back." J Mol Biol 283:707-25.

* Emanuelsson, O. (2002). "Predicting protein subcellular localisation from amino acid sequence information." Briefings in Bioinformatics 3:361-376.

* Gardy, J.L., and F.S.L. Brinkman (2006). "Methods for predicting bacterial protein subcellular localization." Nature Reviews Microbiology 4:741-751.

* Schneider, G., and U. Fechner (2004). "Advances in the prediction of protein targeting signals." Proteomics 4:1571-1580.

* Nakai, K. (2000). "Protein sorting signals and prediction of subcellular localization." Adv. Protein Chem. 54:277-344.

External links

* [ BaCelLo] - Balanced subCellular Localization predictor
* [ CELLO] - subCELlular LOcalization predictor for prokaryotes and eukaryotes
* [ MultiLoc] - MultiLoc prediction webserver
* Protein Analysis Subcellular Localization Prediction
* [] - A portal for protein subcellular localization predictors
* [ SherLoc] - SherLoc prediction webserver

Wikimedia Foundation. 2010.
