Chemical similarity

Chemical similarity (or molecular similarity) refers to the similarity of chemical elements, molecules or chemical compounds with respect to either structural or functional qualities, i.e. the effect that the chemical compound has on reaction partners in inorganic or biological settings. Biological effects, and thus the similarity of effects, are usually quantified using the biological activity of a compound. More generally, function can be related to the chemical activity of compounds, among other properties.

The notion of chemical similarity (or molecular similarity) is one of the most important concepts in chemoinformatics.[1][2] It plays an important role in modern approaches to predicting the properties of chemical compounds, designing chemicals with a predefined set of properties and, especially, in conducting drug design studies by screening large databases containing structures of available (or potentially available) chemicals. These studies are based on the similar property principle of Johnson and Maggiora, which states: similar compounds have similar properties.[1]


Similarity Measures

Chemical similarity is often described as the inverse of a distance measure in descriptor space: the smaller the distance between two compounds, the more similar they are. Distance measures can be classified into Euclidean measures and non-Euclidean measures depending on whether the triangle inequality holds.
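The relationship between distance and similarity can be illustrated with a minimal sketch. It assumes that each compound is represented by a vector of numeric descriptors and uses the mapping s = 1 / (1 + d), which is only one of several conventions for turning a distance into a similarity. The function names and descriptor values below are hypothetical and chosen purely for illustration.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    if len(a) != len(b):
        raise ValueError("descriptor vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_to_similarity(d):
    """Map a distance in [0, inf) to a similarity in (0, 1].

    Assumption: s = 1 / (1 + d) is one common convention, not the only one.
    """
    return 1.0 / (1.0 + d)


# Hypothetical descriptor vectors for two compounds (made-up values).
mol_a = [1.0, 2.0, 2.0]
mol_b = [1.0, 5.0, 6.0]

d = euclidean_distance(mol_a, mol_b)
s = distance_to_similarity(d)
print(f"distance = {d}, similarity = {s:.3f}")
```

Identical descriptor vectors have distance 0 and therefore similarity 1 under this mapping, and the similarity decreases toward 0 as the distance grows.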

Similarity Search and Virtual Screening

Similarity-based virtual screening[3] (a kind of ligand-based virtual screening) assumes that all compounds in a database that are similar to a query compound have similar biological activity. Although this hypothesis is not always valid,[4] quite often the set of retrieved compounds is considerably enriched with actives.[5] To achieve high efficacy of similarity-based screening of databases containing millions of compounds, molecular structures are usually represented by molecular screens (structural keys) or by fixed-size or variable-size molecular fingerprints. Molecular screens and fingerprints can contain both 2D and 3D information. However, 2D fingerprints, which are a kind of binary fragment descriptor, dominate in this area. Fragment-based structural keys, such as MDL keys,[6] are sufficiently good for handling small and medium-sized chemical databases, whereas large databases are processed with fingerprints that have a much higher information density. Fragment-based Daylight,[7] BCI,[8] and UNITY 2D (Tripos[9]) fingerprints are the best-known examples. The most popular similarity measure for comparing chemical structures represented by fingerprints is the Tanimoto (or Jaccard) coefficient T. Two structures are usually considered similar if T > 0.85 (for Daylight fingerprints).
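For binary fingerprints, the Tanimoto coefficient is T = c / (a + b - c), where a and b are the numbers of bits set in the two fingerprints and c is the number of bits set in both. The following minimal sketch assumes that each fingerprint is stored as a set of on-bit indices. The compound names, bit positions and the `similarity_search` helper are hypothetical, and no real fingerprinting toolkit is used.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) coefficient of two fingerprints.

    Each fingerprint is given as an iterable of on-bit indices.
    """
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        # Assumption: two empty fingerprints are treated as dissimilar.
        return 0.0
    return len(a & b) / union


def similarity_search(query, database, threshold=0.85):
    """Return (name, T) pairs with T > threshold, most similar first."""
    scored = ((name, tanimoto(query, fp)) for name, fp in database.items())
    hits = [(name, t) for name, t in scored if t > threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical query and database; bit indices are made up.
query = {1, 2, 3, 4, 5, 6, 7}
database = {
    "cmpd_1": {1, 2, 3, 4, 5, 6, 7},     # identical to the query
    "cmpd_2": {1, 2, 3, 4, 5, 6, 7, 8},  # one extra bit, T = 7/8
    "cmpd_3": {1, 2, 3, 9, 10},          # T = 3/9
}

for name, t in similarity_search(query, database):
    print(name, round(t, 3))
```

With the threshold of 0.85 quoted above for Daylight fingerprints, only `cmpd_1` and `cmpd_2` are retrieved in this toy example. A threshold suited to one fingerprint type does not necessarily carry over to another.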


  1. A. M. Johnson, G. M. Maggiora (1990). Concepts and Applications of Molecular Similarity. New York: John Wiley & Sons. ISBN 0471621757.
  2. N. Nikolova, J. Jaworska (2003). "Approaches to Measure Chemical Similarity - a Review". QSAR & Combinatorial Science 22 (9-10): 1006–1026. doi:10.1002/qsar.200330831.
  3. S. A. Rahman, M. Bashton, G. L. Holliday, R. Schrader, J. M. Thornton (2009). "Small Molecule Subgraph Detector (SMSD) toolkit". Journal of Cheminformatics 1: 12. doi:10.1186/1758-2946-1-12.
  4. H. Kubinyi (1998). "Similarity and Dissimilarity: A Medicinal Chemist’s View". Persp. Drug Discov. Design 9-11: 225–252. doi:10.1023/A:1027221424359.
  5. Y. C. Martin, J. L. Kofron, L. M. Traphagen (2002). "Do structurally similar molecules have similar biological activity?". J. Med. Chem. 45 (19): 4350–4358. doi:10.1021/jm020155c. PMID 12213076.
  6. J. L. Durant, B. A. Leland, D. R. Henry, J. G. Nourse (2002). "Reoptimization of MDL Keys for Use in Drug Discovery". J. Chem. Inf. Comput. Sci. 42 (6): 1273–1280. PMID 12444722.
  7. "Daylight Chemical Information Systems Inc.".
  8. "Barnard Chemical Information Ltd.".
  9. "Tripos Inc.".

Wikimedia Foundation. 2010.
