Lexicostatistics is an approach to comparative linguistics that involves quantitative comparison of lexical cognates. Lexicostatistics is related to the comparative method but does not reconstruct a proto-language. It is to be distinguished from glottochronology, which attempts to use lexicostatistical methods to estimate the length of time since two or more languages diverged from a common earlier proto-language. This is merely one application of lexicostatistics, however, and other applications of it may not share the assumption of a constant rate of change for basic lexical items.

The term "lexicostatistics" is misleading in that mathematical equations are used but not statistics. Features of a language other than the lexicon may also be used, though this is unusual. Whereas the comparative method uses identified shared innovations to determine sub-groups, lexicostatistics does not identify these. Lexicostatistics is a distance-based method, whereas the comparative method considers language characters directly. The lexicostatistic method is simple and fast relative to the comparative method but has limitations, which are discussed below. It can be validated by cross-checking the trees produced by both methods.


History

Lexicostatistics was developed by Morris Swadesh in a series of articles in the 1950s, based on earlier ideas. The concept seems to have originated with Dumont d'Urville, who in 1834 compared various "Oceanic" languages and proposed a method for calculating a coefficient of relationship. Hymes (1960) and Embleton (1986) both review the history of lexicostatistics.


Create word list

The aim is to generate a list of universal, culture-free meanings. Words are then collected for these meaning slots for each language being considered. Swadesh originally reduced a larger set of meanings to 200. He later found it necessary to reduce the list further, but could include some meanings that were not in his original list, giving his later 100-item list. The "Swadesh List" in Wiktionary gives the combined total of 207 meanings in a number of languages. Alternative lists have been generated for particular purposes; for example, Dyen, Kruskal and Black have 200 meanings for 84 Indo-European languages in digital form.

Determine cognacies

Cognacy decisions need to be made by a trained and experienced linguist, and they may need to be refined as the state of knowledge increases. However, lexicostatistics does not rely on all the decisions being correct. For each pair of lists the cognacy of a form can be positive, negative or indeterminate. Sometimes a language has two words for one meaning, e.g. "small" and "little" for "not big".

Calculate lexicostatistic percentages

The lexicostatistic percentage for a language pair is the proportion of meanings that are cognate, taken relative to the total number of meanings excluding those whose cognacy is indeterminate. This value is entered into an N × N table of distances, where N is the number of languages being compared. When complete, this table is half-filled, in triangular form. The higher the proportion of cognacy, the more closely related the languages are.
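The calculation described above can be sketched in Python. This is only an illustration under stated assumptions: the representation of judgments (`True` for positive, `False` for negative, `None` for indeterminate), the function names `cognate_percentage` and `percentage_table`, and the toy data for languages "A", "B" and "C" are all hypothetical and do not come from any published study.

```python
from itertools import combinations


def cognate_percentage(judgments):
    """Percentage of positive judgments among the determinate ones.

    judgments: one entry per meaning slot, True (cognate),
    False (not cognate) or None (indeterminate, excluded from the total).
    """
    decided = [j for j in judgments if j is not None]
    if not decided:
        raise ValueError("no determinate cognacy judgments for this pair")
    return 100.0 * sum(decided) / len(decided)


def percentage_table(languages, pair_judgments):
    """Build the triangular table: one entry per unordered language pair."""
    table = {}
    for a, b in combinations(languages, 2):
        table[(a, b)] = cognate_percentage(pair_judgments[(a, b)])
    return table


# Invented toy judgments for five meaning slots (hypothetical data).
languages = ["A", "B", "C"]
pair_judgments = {
    ("A", "B"): [True, True, False, None, True],
    ("A", "C"): [True, False, False, False, None],
    ("B", "C"): [False, True, False, None, None],
}

table = percentage_table(languages, pair_judgments)
for pair, pct in table.items():
    print(pair, round(pct, 1))
```

Only one entry is stored per unordered pair, which corresponds to the half-filled triangular table. For N languages there are N(N-1)/2 such entries.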

Create family tree

Creation of the language tree is based solely on the table found above. Various sub-grouping methods can be used, but the one adopted by Dyen, Kruskal and Black was:
* all lists are placed in a pool
* the two closest members are removed and form a nucleus which is placed in the pool
* this step is repeated
* under certain conditions a nucleus becomes a group
* this is repeated until the pool only contains one group.

Lexical percentages also need to be calculated for each nucleus and group.
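The pooling procedure listed above can be approximated with a short Python sketch. It is a simplification and not the Dyen, Kruskal and Black procedure itself: the conditions under which a nucleus becomes a group are not given here, so the sketch omits that step. The percentage between two nuclei is assumed to be the plain average over all their member pairs, which is a swapped-in average-linkage rule. The function name `build_tree` and the toy percentages are hypothetical.

```python
def build_tree(languages, table):
    """Merge the two closest pool members until one nested tuple remains.

    languages: non-empty list of language names (strings).
    table: dict mapping an unordered pair (a, b) to its cognate percentage.
    """
    def pct(a, b):
        # The table stores each unordered pair once, in either order.
        return table[(a, b)] if (a, b) in table else table[(b, a)]

    def leaves(node):
        # A node is either a language name or a pair of sub-nodes.
        return [node] if isinstance(node, str) else leaves(node[0]) + leaves(node[1])

    def average_pct(x, y):
        # Assumed rule: mean percentage over all cross pairs of members.
        pairs = [(a, b) for a in leaves(x) for b in leaves(y)]
        return sum(pct(a, b) for a, b in pairs) / len(pairs)

    pool = list(languages)
    while len(pool) > 1:
        # Pick the pair of pool members with the highest percentage.
        i, j = max(
            ((i, j) for i in range(len(pool)) for j in range(i + 1, len(pool))),
            key=lambda ij: average_pct(pool[ij[0]], pool[ij[1]]),
        )
        nucleus = (pool[i], pool[j])
        # Remove the two members and return the nucleus to the pool.
        pool = [m for k, m in enumerate(pool) if k not in (i, j)] + [nucleus]
    return pool[0]


# Invented percentages for four languages (hypothetical data).
toy_table = {
    ("A", "B"): 80.0, ("C", "D"): 70.0,
    ("A", "C"): 30.0, ("A", "D"): 20.0,
    ("B", "C"): 25.0, ("B", "D"): 35.0,
}
tree = build_tree(["A", "B", "C", "D"], toy_table)
print(tree)
```

With these toy figures, A and B are merged first, then C and D, and the two nuclei are joined last. Other linkage rules would be equally easy to substitute in `average_pct`.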


Applications

A leading exponent of the application of lexicostatistics has been Isidore Dyen. He used lexicostatistics to classify Austronesian languages as well as Indo-European ones. A major study of the latter was reported by Dyen, Kruskal and Black (1992). Studies have also been carried out on Amerindian and African languages.


Criticisms

Researchers such as Hoijer (1956) have shown that it is difficult to find equivalents for the meaning items, and many have found it necessary to modify Swadesh's lists. Gudschinsky (1956) questioned whether it was possible to obtain a universal list.

Factors such as borrowing, tradition and taboo can skew the results, as with other methods. Lexicostatistics has sometimes been applied using lexical similarity rather than cognacy to find resemblances; this is then equivalent to mass comparison.

The choice of meaning slots is subjective, as is the choice of synonyms.

Improved methods

Some modern computational methods based on statistical hypothesis testing can be regarded as improvements on lexicostatistics, in that they use similar word lists and distance measures.


References

* Dobson, Annette (1969). Lexicostatistical Grouping. Anthropological Linguistics 7, 216-221.
* Dobson, Annette and Black, Paul (1979). Multidimensional Scaling of some Lexicostatistical Data. Mathematical Scientist 1979/4, 55-61.
* Dyen, Isidore (1962). The Lexicostatistically Determined Relationship of a Language Group. International Journal of American Linguistics 28/3.
* Dyen, Isidore (1963). Lexicostatistically Determined Borrowing and Taboo. Language 39, 60-66.
* Dyen, Isidore (1965). A Lexicostatistical Classification of the Austronesian Languages. International Journal of American Linguistics, Memoir 19.
* Dyen, Isidore, Editor (1973). Lexicostatistics in Genetic Linguistics. The Hague, Mouton.
* Dyen, Isidore (1975). Linguistic Subgrouping and Lexicostatistics. The Hague, Mouton.
* Dyen, Isidore, Kruskal, Joseph and Black, Paul (1992). An Indoeuropean Classification, a Lexicostatistical Experiment. Transactions of the American Philosophical Society 82/5.
* Embleton, Sheila (1986). Statistics in Historical Linguistics. Bochum.
* Gudschinsky, Sarah (1956). The ABCs of lexicostatistics (glottochronology).
* Hoijer, Harry (1956). Lexicostatistics: a critique. Language 32, 49-60.
* Hymes, Dell (1960). Lexicostatistics so far. Current Anthropology 1/1, 3-44.
* McMahon, April and McMahon, Robert (2005). Language Classification by Numbers. Oxford University Press.
* Rea, John (1990). Lexicostatistics. In "Trends in Linguistics", edited by Polome, Edgar.
* Sankoff, David (1970). On the Rate of Replacement of Word-Meaning Relationships. Language 46, 564-569.
* Swadesh, Morris (1950). Salish Internal Relationships. International Journal of American Linguistics 16, 157-167.
* Swadesh, Morris (1952). Lexicostatistical Dating of Prehistoric Ethnic Contacts. Proceedings of American Philosophical Society 96, 452-463.
* Swadesh, Morris (1955). Towards Greater Accuracy in Lexicostatistic Dating. International Journal of American Linguistics 21, 121-137.
* Wittmann, Henri (1969). A lexico-statistic inquiry into the diachrony of Hittite. Indogermanische Forschungen 74, 1-10. [http://homepage.mac.com/noula/ling/1969a-lexstatHitt.pdf]
* Wittmann, Henri (1973). The lexicostatistical classification of the French-based Creole languages. In "Lexicostatistics in genetic linguistics: Proceedings of the Yale conference, April 3-4, 1971", edited by Isidore Dyen, 89-99. The Hague, Mouton. [http://homepage.mac.com/noula/ling/1973f-lexstatFC.pdf]

See also

*Swadesh list
*Mass lexical comparison
*Basic English
*Historical linguistics
*Indo-European studies
*Comparative linguistics
*Comparative method

External links

* [http://www.ntu.edu.au/education/langs/ielex IE database]
* [http://www.specgram.com/CLIV.1/08.phlogiston.cartoon.jiu.html A simplified explanation of the difference between glottochronology and lexicostatistics.]

Wikimedia Foundation. 2010.
