Urheimat (a German compound of Ur- "primitive, original" and Heimat "home, homeland"; German pronunciation: [ˈʔuːɐ̯ˌhaɪmaːt], English: /ˈʊərhaɪmɑːt/) is a linguistic term denoting the original homeland of the speakers of a proto-language. The homelands of many, but not all, major language families are summarized in this article.


Language families predominantly found in Europe, North Asia and South Asia

Indo-European homeland

Scholars have tried to identify the homeland of the Proto-Indo-European language, to which the term Urheimat is most frequently applied. Possibly relevant geographical indicators are the reconstructed common words for "beech" and "salmon". By contrast, there is no common word for "lion": the fact that so many European words for "lion" are similar-looking cognates is due to more recent borrowing. Many hypotheses for an Urheimat have been proposed, and Mallory (1989:143) said, “One does not ask ‘where is the Indo-European homeland?’ but rather ‘where do they put it now?’”

Mallory (1997:106) states that current discussion of the Indo-European homeland problem is largely confined to four basic models, with variations; these are, in chronological order:

Another theory holds that the original Indo-European homeland was in India, from which the languages later spread outwards. This theory contrasts with the mainstream models but is widespread among Hindu nationalists.

Other, less-widely accepted models include the Armenian hypothesis (suggested by Soviet scholars in the 1980s), the Paleolithic Continuity Theory (suggested by Italian "paleolinguist" Mario Alinei in the 1990s), and the Out of India theory (historically suggested by Friedrich Schlegel).

Indo-Iranian homeland

The Proto-Indo-Iranians are widely identified with the bearers of the Andronovo horizon of the late 3rd and early 2nd millennia BC.

Approximate extent of the Corded Ware horizon with adjacent 3rd millennium cultures (after EIEC).

Balto-Slavic homeland

The Balto-Slavic homeland largely corresponds to the historical distribution of Baltic and Slavic, with Proto-Baltic likely having emerged in the eastern parts of the Corded Ware horizon.

The Slavic homeland likely corresponds to the distribution of the oldest recognisably Slavic hydronyms, found in northern and western Ukraine and southern Belarus.

Balkan dialects

Ancient Greek and Roman writers report that the following languages were spoken on the Balkan Peninsula: Ancient Greek, Ancient Macedonian, Dacian, Illyrian, Liburnian, Messapic, Paeonian, Phrygian, Thracian, and Venetic.

The history of the Daco-Thracian and Thraco-Illyrian dialects of the Balkans is obscure, in part because the written record of these languages is fragmentary. One of these languages may be the ancestor of modern Albanian.

The Phrygian, Macedonian, and Greek proto-languages likely also originated in the Balkans. Proto-Armenian may also be of Balkan (Greco-Phrygian) origin, or at least strongly influenced by a Phrygian substrate. The Phrygian influence on [pre-]Proto-Armenian would date to about the 7th century BC, in the context of the declining kingdom of Urartu.

Centum dialects

Celtic homeland

The Proto-Celtic homeland is usually located in the Early Iron Age Hallstatt culture of northern Austria. There is a broad consensus that the center of the La Tène culture lay on the northwest edges of the Hallstatt culture. Pre-La Tène (6th to 5th century BC) Celtic expansions reached Great Britain and Ireland (Insular Celtic) and Gaul. La Tène groups expanded in the 4th century BC to Hispania, the Po Valley, the Balkans, and even as far as Galatia[citation needed] in Asia Minor, in the course of several major migrations.

Germanic homeland

Pre-Germanic cultures were the bearers of the Nordic Bronze Age. Proto-Germanic proper is hypothesized by some to have developed in the Jastorf culture of the Pre-Roman Iron Age.[1]

Map of the Nordic Bronze Age culture, c. 1200 BC

Italic homeland

Candidates for the first introduction of Proto-Italic speakers to Italy are the Terramare culture (1500 BC) or the Villanovan culture (1100 BC), although the latter is now usually identified with the non-Italic (indeed, non-Indo-European) Etruscan civilisation. Both are culturally derived from or strongly influenced by the Urnfield culture and its predecessor, the Tumulus culture of Central Europe (1600 BC), so that the latter is a likely candidate for the homeland of an Italo-Celtic proto-language or dialect continuum.

The Romance languages all derive from Latin, a member of the Italic subfamily of Indo-European. Latin had its roots in the Italic dialect spoken in and around the capital, Rome, and was the common language of the Western Roman Empire until the empire collapsed in the 5th century CE.


Dravidian homeland

Current location of Dravidian languages

The Dravidian languages have been found mainly in South India since at least the second century BCE (inscriptions, ed. I. Mahadevan 2003). It is, however, a widely held hypothesis that Dravidian speakers may have been more widespread throughout India, including the northwest region[2] before the arrival of Indo-European speakers. A map showing where Dravidian languages are spoken today appears to the left.

Historical records suggest that the South Dravidian language group had separated from Proto-Dravidian no later than 700 BCE; linguistic evidence suggests that it probably became distinct around 1100 BCE; and some scholars using linguistic methods put the deepest divisions in the language family at roughly 3000 BCE.[3] Russian linguist M.S. Andronov puts the split between Tamil (a written South Dravidian language) and Telugu (a written South-Central Dravidian language) at 1500 to 1000 BCE.[4]

Southworth identifies late Proto-Dravidian with the Southern Neolithic culture in the lower Godavari River basin of South Central India, which first appeared ca. 2500 BCE, on the basis of Proto-Dravidian agricultural vocabulary, while noting that this "would not preclude the possibility that speakers of an earlier stage of Dravidian entered the subcontinent from western or central Asia, as has often been suggested."[5]

Speculation regarding the original homeland has centered on the Indus Valley Civilization, or, in an Elamo-Dravidian hypothesis, on Elam, whose language was spoken in the hills to the east of the ancient Sumerian civilization (with which the Indus Valley Civilization traded and shared domesticated species); the results, however, have not been convincing. The possibility that Dravidian is indigenous to the area where it is now spoken and is a true language isolate has also not been ruled out.

Late Indus script found on pottery at Bet Dwarka dated to 1528 BC based on thermoluminescence dating.

Prof. Asko Parpola (University of Helsinki), the Jesuit priest Father Heras in the 1930s, and other scholars (such as Iravatham Mahadevan, an expert on Indian and early Tamil epigraphy, and Prof. Walter A. Fairservis Jr.) conclude that the Indus sign system represented an ancient Dravidian language, a view that they regard as supported by Tamil artifacts discovered in 2006.[6] In Parpola's view, then, the Urheimat of Dravidian would be in the Indus River Valley. However, Harvard Indologist Michael Witzel has put forward a view, which has received serious academic consideration (ca. 2004 CE), that is critical both of an Indus Valley Civilization homeland for Dravidian and of the widely held view that the inscriptions of the Indus Valley Civilization even constitute a written language.[7] In the essay "Substrate Languages in Old Indo-Aryan" (in which RV refers to the Rigveda, i.e. early Indo-Aryan), Witzel says "As we can no longer reckon with Dravidian influence on the early RV, this means that the language of the pre-Rigvedic Indus civilization, at least in the Panjab, was of (Para-) Austro-Asiatic nature." There is no written evidence of Austro-Asiatic languages having been spoken farther west than Central India during the historical era (i.e. the era for which we have written records).

Recent studies of the distribution of alleles on the Y chromosome,[8] microsatellite DNA,[9] and mitochondrial DNA[10] in India have cast doubt on the existence of a biological Dravidian "race" distinct from non-Dravidians in the Indian subcontinent;[11] other recent genetic studies have found evidence of Aryan, Dravidian and pre-Dravidian (original Asian) strata in South Asian populations.[12] Geneticist Luigi Luca Cavalli-Sforza proposes that Dravidian-speaking peoples were preceded in India by Austro-Asiatic peoples and were present before the arrival of Indo-Aryan language speakers.[13]

Uralic homeland

Neolithic period

The Uralic homeland is unknown. A possible locus is the Comb Ceramic culture of ca. 4200 to ca. 2000 BC (shown on the map to the right). This is suggested by the high linguistic diversity around the middle Volga River, where three highly distinct branches of the Uralic family (Mordvinic, Mari, and Permic) are located. Reconstructed plant and animal names (including spruce, Siberian pine, Siberian fir, Siberian larch, brittle willow, elm, and hedgehog) are consistent with this location. This area is adjacent to the proposed homeland for Proto-Indo-European under the Kurgan hypothesis.

French anthropologist Bernard Sergent, in La Genèse de l'Inde (1997),[14] argued that Finno-Ugric (Uralic) may share a genetic source with, or have borrowed significantly from, Proto-Dravidian or a predecessor language of West African origin. Some linguists see Uralic (Hungarian, Finnish) as linguistically related both to the Altaic (Turkic, Mongolic) language groups[15] (as in the outdated Ural-Altaic hypothesis) and to the Dravidian languages. The theory that the Dravidian languages display similarities with the Uralic language group, suggesting a prolonged period of contact in the past,[16] is popular amongst Dravidian linguists and has been supported by a number of scholars, including Robert Caldwell,[17] Thomas Burrow,[18] Kamil Zvelebil,[19] and Mikhail Andronov.[20] This theory has, however, been rejected by some specialists in Uralic languages,[21] and has recently also been criticised by other Dravidian linguists such as Bhadriraju Krishnamurti.[22]

Altaic homeland

Some linguists recognize a proposed language family called the Altaic languages, held by its proponents to include the Turkic, Mongolic, and Tungusic language families, and sometimes also the Japonic language family and the Korean language isolate.[23] Mongolic languages are spoken in Mongolia and Inner Mongolia, in nearby regions such as Xinjiang, Gansu and Qinghai (China) and Buryatia (Russian Federation), and in Kalmykia (Russian Federation). Tungusic languages are spoken by Tungusic peoples in Eastern Siberia and Manchuria. The proposed linguistic group is named after the Altai Mountains, a mountain range in Central Asia.

Before the last 2000 years or so, the Turkic, Mongolic, and Tungusic languages, which form the proposed core of the Altaic family, would all have been found only in Eastern Siberia and Manchuria, in the areas north and west of the early Chinese dynasties.

The Turkic peoples began to expand sometime after that, probably reaching Europe by the 4th century CE in the form of the Huns, as discussed more fully below.

The Mongols expanded into present-day Mongolia sometime after the demise of the Karasuk culture (1500–300 BC), whose bearers were an Indo-European population that was, according to ancient DNA, genetically Western Eurasian.[24] Starting around 1206 CE, Genghis Khan waged a series of military campaigns that, together with those of his successors, stretched from present-day Poland in the west to Korea in the east, and from Siberia in the north to the Gulf of Oman and Vietnam in the south. The empire ultimately collapsed, leaving little lasting linguistic impact outside the core Mongolian area.[25]

The Tungusic peoples never expanded far beyond Eastern Siberia and Manchuria.

The core three populations in the Altaic classification show autosomal population genetic commonalities.[26] These core three populations also show lexical affinities in their languages.[27]

Turkic homeland

Countries and autonomous regions where a Turkic language has official status.

There is considerable dispute over the time and place of origin of the Turkic languages, but it is undisputed that they did not originate in or near Anatolia, which makes up most of modern Turkey, the country named after the Turkic peoples. The people of Anatolia spoke Indo-European languages from at least the time of the Hittite Empire (whose expansion over most of Anatolia began ca. 2000 BCE), which provides the earliest historically attested evidence of Indo-European languages in the region, until the collapse of the Persian Sassanid Empire in 651 CE. Some non-Indo-European languages were spoken in at least parts of Anatolia for substantial periods before the Hittite Empire.

The Turkic languages are now spoken in Turkey, Central Asia and Siberia. As noted in the Wikipedia article on Turkic migration, the Turkic peoples originated in "the Far East including North China, especially Xinjiang Province and Inner Mongolia with parts of Mongolia and Siberia possibly as far west as Lake Baikal and the Altai Mountains. They may have been among the peoples of the multi-ethnic historical Saka known as early as the Greek writer Herodotus. Certainly identified Turkic tribes were known by the 6th century and by the 10th century most of Central Asia, formerly dominated by Iranian peoples, was settled by Turkic tribes. The Seljuk Turks from the 11th century invaded Anatolia, ultimately resulting in permanent Turkic settlement there and the establishment of the nation of Turkey."

The first possibly Turkic people to arrive in Europe were the Huns, who were at war with the Roman Empire in the 4th century CE. Confusingly, Hungarian is not a Turkic language (it is a Uralic language related to Finnish and Estonian) and was not spoken by the Huns.

Prior to the Turkic migration, Indo-European languages were spoken in Anatolia and Central Asia as far as the Tarim Basin.

The inferred population genetic contributions of Turkic populations show a cline, from a high in the east to a low in the west.[28] In Turkey, the Turkic contribution to the local population genetic mix is about 6%.[29]

The deeper origins of the Turkic languages, in relation to other language families and in time and place, are deeply disputed. The lack of written records before the earliest Chinese accounts, and the fact that the early Turkic peoples were nomadic pastoralists and hence mobile, make localizing and dating the earliest Turkic homeland difficult. Linguists are currently divided between those who see Turkic as a top-level language family with no further established linguistic relationships and those who see it as part of an Altaic language family rooted among neighboring "barbarian" peoples in Manchuria and in other areas beyond the span of Chinese rule.

Language families predominantly found in Africa and Southwest Asia

Map showing the distribution of major African language families

Khoisan homeland

The Khoisan languages, the click languages of Africa, do not form a language family and so do not, as a group, have a single homeland. However, limited genetic evidence from some Khoisan-language speakers in southern Africa suggests an origin "along the African rift and a possible wider East African range."[30]

Afro-Asiatic homeland

The Afro-Asiatic languages include Arabic, Hebrew, Berber, and a variety of other languages now found mostly in Northeast Africa, although the exact boundaries of this language family are disputed.

The limited area of the Afro-Asiatic Sprachraum (before its expansion to new areas in the historical era) constrains where that family's Urheimat could have been. Broadly, two proposals have been developed: that Afro-Asiatic arose in Southwest Asia (the Middle East), in the same area proposed as the Semitic Urheimat, or that it arose in northeast Africa (generally either between Darfur and Tibesti, or in Ethiopia and the other countries of the Horn of Africa). The African hypothesis is currently considered rather more likely, because of the greater diversity there of languages that are only distantly related to each other.

Semitic homeland

There has also been speculation about the homeland of the Semitic subfamily of Afro-Asiatic, again with the Horn of Africa and Southwest Asia (specifically the Levant) as the most common proposals. The large number of Semitic languages in the Horn of Africa seems at first glance to support the hypothesis that the Semitic homeland lies there. However, the Semitic languages of the Horn of Africa all belong to the South Semitic subfamily, while the East and Central Semitic languages are native solely to Asia. These facts, together with the presence of certain common Semitic words for items that did not arrive in Africa until after Semitic languages reached Ethiopia, have lent weight to the Levantine theory.

Nilo-Saharan homeland

Genetic studies of Nilo-Saharan-speaking populations are in general agreement with archaeological evidence and linguistic studies that argue for a Nilo-Saharan homeland in eastern Sudan before 6000 BCE, with subsequent migration events northward to the eastern Sahara, westward to the Chad Basin, and southeastward into Kenya and Tanzania.[31]

Linguist Roger Blench has suggested that the Nilo-Saharan languages and the Niger–Congo languages may be branches of the same macro-language family.[32][33] Earlier proposals along these lines were made by linguist Edgar Gregersen in 1972.[34] These proposals have not reached a linguistic consensus, however, and such a connection presupposes that all of the Nilo-Saharan languages are actually related in a single family, which has not been definitively established.

Razib Khan, based on analysis of the autosomal genetics of the Tutsi ethnic group of Africa, suggests that "the Tutsi were in all likelihood once a Nilotic speaking population, who switched to the language of the Bantus amongst whom they settled."[35][36]

Niger–Congo homeland

The homeland of the Niger–Congo languages, which has as its subfamily the Benue–Congo languages, which in turn includes the Bantu languages, is not known in time or place, beyond the fact that it probably originated in or near the area where these languages were spoken prior to Bantu expansion (i.e. West Africa or Central Africa) and probably predated the Bantu expansion of ca. 3000 BCE by many thousands of years.[37] Its expansion may have been associated with the expansion of Sahel agriculture in the African Neolithic period.[37]

According to linguist Roger Blench, as of 2004 all specialists in Niger–Congo languages believe the languages to have a common origin, rather than merely constituting a typological classification, for reasons including their shared noun-class system, their shared verbal extensions, and their shared basic lexicon.[38][39] Similar classifications have been made ever since Diedrich Westermann first proposed one in 1922.[40] Joseph Greenberg continued that tradition, making it the starting point for modern linguistic classification in Africa; some of his most notable publications appeared from the 1960s onward.[41] However, there has been active debate for many decades over the appropriate subclassification of the languages in the family, which is a key tool for localizing a language family's place of origin.[38] No definitive Proto-Niger–Congo lexicon or grammar has been developed for the family as a whole.

An important unresolved issue in determining when and where the Niger–Congo languages originated, and what their range was before recorded history, is the family's relationship to the Kordofanian languages now spoken in the Nuba Mountains of Sudan. This area is not contiguous with the rest of the Niger–Congo-speaking region and lies at its northeasternmost extent. The current prevailing linguistic view is that the Kordofanian languages belong to the Niger–Congo family, and that they may be the oldest of the many languages still spoken in that region.[42] The evidence is insufficient to determine whether this outlying group of Niger–Congo speakers is a remnant of a larger prehistoric Niger–Congo range that has since contracted as other languages intruded, or instead descends from Niger–Congo speakers who migrated to the area at some point in prehistory and were an isolated linguistic community from the beginning.

The prehistoric range of the Niger–Congo languages has implications not only for the history of that family but also for the origins of the Afro-Asiatic and Nilo-Saharan languages, whose homelands some have hypothesized to overlap with the Niger–Congo linguistic range before recorded history. If the consensus view regarding the origins of the Nilo-Saharan languages that came to East Africa is adopted, and a North African or Southwest Asian origin for the Afro-Asiatic languages is assumed, the linguistic affiliation of East Africa before the arrival of Nilo-Saharan and Afro-Asiatic languages is left open. The overlap between the potential areas of origin of these languages in East Africa is particularly notable because it includes the region from which the anatomically modern humans who left Africa in the Out of Africa migration (the ancestors of Eurasian populations), and presumably their original proto-language or languages, originated.

However, there is more agreement about the place of origin of the Benue–Congo subfamily, the largest subfamily of the group, and both the place of origin of the Bantu languages and the time at which they began to expand are known with great specificity.

Benue–Congo homeland

Roger Blench, relying particularly on prior work by Professor Kay Williamson of the University of Port Harcourt and by the linguist P. De Wolf, who both took the same position, has argued that the Benue–Congo subfamily of Niger–Congo, which includes the Bantu languages and other related languages and would be the largest branch of Niger–Congo, is an empirically supported grouping that probably originated near the confluence of the Benue and Niger Rivers in central Nigeria.[38][43][44][45][46][47] These estimates of the place of origin of the Benue–Congo languages do not fix a date for the start of their expansion, other than that it must have begun early enough before the Bantu expansion to allow for the diversification of the languages within this subfamily that includes Bantu.

Bantu homeland

There is a widespread consensus among linguistic scholars that Bantu languages of the Niger–Congo family have a homeland near the coastal boundary of Nigeria and Cameroon, prior to a rapid expansion from that homeland starting about 3000 BCE.[31][37][48][49][50][51][52]

Linguistic, archaeological and genetic evidence also indicates that "independent waves of migration of western African and East African Bantu-speakers into southern Africa occurred."[31] In some places, genetic evidence suggests that Bantu language expansion was largely the result of substantial population replacement.[53] In other places, Bantu language expansion, like that of many other languages, has been shown by population genetic evidence to have occurred by means other than complete or predominant population replacement (e.g. via language shift and admixture of incoming and existing populations). For example, one study found this to be the case for Bantu-speaking African Pygmies and for Bantu speakers in Mozambique,[53] while another population genetic study found the same for the Bantu-speaking Lemba of Zimbabwe.[54] Where Bantu was adopted via language shift by existing populations, it displaced earlier African languages, probably belonging to African language families that are now lost except as substrate influences on local Bantu languages (such as click sounds).

Malagasy language homeland

The Malagasy language of Madagascar is not related to nearby African languages; it is instead the westernmost member of the Malayo-Polynesian branch of the Austronesian language family (whose origins are described separately in this article), a fact noted as long ago as 1708 by the Dutch scholar Adriaan van Reeland.[55] It is related to the Malayo-Polynesian languages of Indonesia, Malaysia, and the Philippines, and most closely to the Southeast Barito languages spoken in Borneo, except for its Polynesian morphophonemics.[56] Malagasy shares much of its basic vocabulary with Ma'anyan, a language from the region of the Barito River in southern Borneo. This indicates that Madagascar was first settled by Austronesian people from the Malay Archipelago who had passed through Borneo. This happened between approximately 0 and 500 CE; before then, the island had no human inhabitants.[37] Later, the original Austronesian settlers must have mixed with East Africans and Arabs, among others.[57] Malagasy also includes some borrowings from Arabic and from Bantu languages (notably Swahili). Whole-genome analysis of a limited sample of Malagasy individuals shows that the African component of the Malagasy genome is most similar to that of modern Bantu-speaking East African populations.[58]

Language families predominantly found in Southeast Asia, East Asia and Oceania

Sino-Tibetan homeland

According to the Sino-Tibetan Etymological Dictionary and Thesaurus project of the University of California at Berkeley[59] (in which "ST" refers to the Sino-Tibetan language family):

"The Proto-Sino-Tibetan (PST) homeland seems to have been somewhere on the Himalayan plateau, where the great rivers of East and Southeast Asia (including the Yellow, Yangtze, Mekong, Brahmaputra, Salween, and Irrawaddy) have their source. The time of hypothetical ST unity, when the Proto-Han (= Proto-Chinese) and Proto-Tibeto-Burman (PTB) peoples formed a relatively undifferentiated linguistic community, must have been at least as remote as the Proto-Indo-European period, perhaps around 4000 B.C."

Some scholars place the Tibeto-Burman homeland in the area encompassing western Sichuan, northern Yunnan and eastern Tibet.[60]

Population genetic evidence favors an origin for the Proto-Sino-Tibetan languages in the upper and middle Yellow River basin, with part of that source population branching off to settle in the Himalayas. On this account, the population ancestral to Chinese split from the population ancestral to the Tibeto-Burman languages during the East Asian Neolithic:[61]

"[T]he closest relatives of the Tibetans are the Yi people, who live in the Hengduan Mountains and were originally formed through fusion with natives along their migration routes into the mountains. The Tibetan and Yi languages belong to the Tibeto-Bruman language group and their ancestries can be traced back to an ancient tribe, the Di-Qiang . . . After the ancestors of Sino-Tibetans reached the upper and middle Yellow River basin, they divided into two subgroups: Proto-Tibeto-Burman and Proto-Chinese. . . . The ancestral component which was dominant in Tibetan and Yi arose from the Proto-Tibeto-Burman subgroup, which marched on to south-west China and later, through one of its branches, became the ancestor of modern Tibetans. Proto-Tibeto-Burmans also spread over the Hengduan Mountains where the Yi have lived for hundreds of generations. Taking the optimal living condition and the easiest migration route into account, we favor the single-route hypothesis; it is more likely that their migration into the Tibetan Plateau through the Hengduan Mountain valleys occurred after Tibetan ancestors separated from the other Proto-Tibeto-Burman groups and diverged to form the modern Tibetan population."

One of the earliest Neolithic cultures of China in the upper to middle Yellow River basin was the Peiligang culture of 7000–5000 BCE, so the population genetic reference in the quoted material is to a date during or after this period. The Neolithic era in the Yellow River valley ended around 1500 BCE. This is not inconsistent with the linguistically based estimate from the Sino-Tibetan Etymological Dictionary and Thesaurus project. The origin of the Chinese branch of the Sino-Tibetan language family is associated with the early and middle Zhou Dynasty (1122–256 BCE) in northern China, where the Chinese spoken at the Zhou court became the standard dialect for that kingdom.[62]

In contrast, the other main language families of East and Southeast Asia, including Austro-Asiatic, Austronesian, Hmong–Mien and Tai–Kadai, are generally believed to have originated, or to have passed through some stage of their development, in Southern China.

Austro-Asiatic languages

Austro-Asiatic homeland

The homeland of the Austro-Asiatic languages (e.g. Vietnamese, Cambodian), which are found from Southeast Asia to India, is hypothesized to have been located in "the hills of southern Yunnan in China," between 4000 and 2000 BCE,[63] with influences from Indo-Aryan and Dravidian languages at the western edge of its range in India, and from Chinese at the eastern edge. The disjoint distribution of the Austro-Asiatic languages suggests that they were once spoken in most of the areas where the Tai–Kadai languages are now dominant.

However, Paul Sidwell has recently advocated a homeland in Southeast Asia instead,[64] preferring a late date of dispersal of about 2000 BCE.[65]

There is a strong correlation between the distribution of Y-chromosome haplogroup O2b-M95 and that of Austro-Asiatic language speakers.[66]

Hmong–Mien homeland

The most likely homeland of the Hmong–Mien languages (aka Miao–Yao languages) is in Southern China between the Yangtze and Mekong rivers, but speakers of these languages may have migrated from Central China either as part of the Han Chinese expansion or as a result of exile from an original homeland by Han Chinese.[67] Migration of people speaking these languages from South China to Southeast Asia took place ca. 1600-1700 CE. Ancient DNA evidence suggests that the ancestors of the speakers of the Hmong–Mien languages were a population genetically distinct from that of the Tai–Kadai and Austronesian language source populations at a location on the Yangtze River.[68] Recent Y-DNA phylogeny evidence supports the proposition that people who speak the Hmong-Mien languages are descended from the population that now speaks Austro-Asiatic Mon-Khmer languages.[69]

Austronesian homeland

The homeland of the Austronesian languages is Taiwan. On this island the deepest divisions in Austronesian are found, among the families of the native Formosan languages. According to Blust (1999), the Formosan languages form nine of the ten primary branches of the Austronesian language family. Comrie (2001:28) noted this when he wrote:

... the internal diversity among the... Formosan languages... is greater than that in all the rest of Austronesian put together, so there is a major genetic split within Austronesian between Formosan and the rest... Indeed, the genetic diversity within Formosan is so great that it may well consist of several primary branches of the overall Austronesian family.

Archaeological evidence (e.g., Bellwood 1997) suggests that speakers of pre-Proto-Austronesian spread from the South Chinese mainland to Taiwan at some time around 6000 BCE. Evidence from historical linguistics suggests that it is from this island that seafaring peoples migrated, perhaps in distinct waves separated by millennia, to the entire region encompassed by the Austronesian languages (Diamond 2000). It is believed that this migration began around 4000 BCE (Blust 1999). However, evidence from historical linguistics cannot bridge the gap between those two periods.

The specific origins of the most far-flung member of this language family, the Malagasy language of Madagascar off the coast of Africa, are described above in the part of this article concerning African languages.

The Austro-Tai hypothesis suggests a common origin for the Austronesian languages and the Tai–Kadai languages whose hypothesized place of origin is geographically close to Taiwan.

Tai–Kadai homeland

The Tai–Kadai languages today

Many scholars have addressed the question of the origins of the Tai–Kadai languages.[70][71][72][73][74]

There is a consensus that the Tai–Kadai languages have their origins in Southern China or on major nearby islands (such as Taiwan or Hainan).

The leading hypothesis is that the likely homeland of Proto-Tai–Kadai was coastal Fujian or Guangdong, as part of the Neolithic Longshan culture (c. 3000–2000 BCE). The spread of the Tai–Kadai peoples may have been aided by agriculture, but those who remained near the coast were eventually absorbed by the Chinese. Weera Ostapirat is one scholar who articulates this position.[75]

Laurent Sagart, on the other hand, holds that Tai–Kadai is a branch of Austronesian which migrated back to the mainland from northeastern Formosa (i.e. Taiwan) long after Formosa was settled, but probably before the expansion of Malayo-Polynesian out of Formosa.[76][77][78] The language was then largely relexified from what he believes may have been an Austro-Asiatic language. Sagart suggests that Austro-Tai is ultimately related to the Sino-Tibetan languages and has its origin in the Neolithic communities of the coastal regions of prehistoric North China or East China.

Ostapirat, by contrast, sees connections with the Austro-Asiatic languages (in Austric), as has Benedict.[79][80][81] Reid notes that the two approaches are not incompatible, if Austric is valid and can be connected to Sino-Tibetan.[82]

Robert Blust (1999) suggests that proto-Tai–Kadai speakers originated in the northern Philippines and migrated from there to Hainan (hence the diversity of Tai–Kadai languages on that island), and were radically restructured following contact with Hmong–Mien and Sinitic. However, Ostapirat maintains that Tai–Kadai could not descend from Malayo-Polynesian in the Philippines, and likely not from the languages of eastern Formosa either. His evidence is in the Tai–Kadai sound correspondences, which reflect Austronesian distinctions that were lost in Malayo-Polynesian and even Eastern Formosan.

Genetic evidence corroborates the oral traditions of Kadai-speaking peoples, which place a Kadai homeland on Hainan.[83] Ancient DNA evidence also shows a connection between Tai–Kadai-speaking and Austronesian-speaking populations,[68] and points to a genetically distinct population at a different location on the Yangtze River as a possible source of the Hmong–Mien languages.[68]

Japanese and Korean language homelands

Today, a single Korean language is spoken in Korea, and a small family of related languages called Japonic is spoken in Japan. The Ainu language is also spoken by an ethnic minority in northern Japan.

There were multiple languages spoken in Manchuria and the Korean Peninsula prior to Korea's unification, and there is dispute over which of those languages gave rise to modern Korean sometime in the first millennium CE, and what relationship that proto-language may have had to the proposed family of Altaic languages.

There is also dispute over the extent, if any, to which one of the languages of the pre-unification Korean peninsula gave rise to the Japanese language and, if so, which of them was the language of the Yayoi, part of the founding population of modern Japan. The Yayoi may also have been linguistically influenced by China. Japanese links to the Altaic languages, if they exist, could have arisen via an Altaic source for a Korean peninsular language spoken by the Yayoi, via Altaic influences on the Ainu languages through contacts between the Ainu people and Siberia, or both.

The Ainu language, or another extinct language of the Jomon, the indigenous people of Japan, may also have been a formative element in the Japanese language, as the Yayoi and the Jomon merged into a common Japanese ethnicity around 2,300 years ago.

Both Korean and Japanese have made use of Chinese characters in their writing systems, and the Chinese origin of these characters is not disputed. However, neither spoken language is closely related to spoken Chinese, and neither need be, because the characters can represent meaning independently of pronunciation.


Korea in 576 CE.

The Korean language is spoken in Korea and among emigrants from Korea. Conservative historical linguists tend to classify Korean as a language isolate, although others suggest a relationship to the Altaic languages or to the Japonic languages.

Old Korean is attested in Chinese histories from the Three Kingdoms period of Korea (conventionally 57 BCE to 668 CE), when the Silla kingdom (in southeastern Korea), the Baekje kingdom (in southwestern Korea), and the Goguryeo kingdom (in northern Korea) coexisted on the Korean peninsula. Korean did not become a literary language until later: the Idu script dates to the 6th century CE, and the hangul script was invented in the 15th century CE.

There was a group of similar languages, called the Buyeo languages, in the northern Korean peninsula, southern Manchuria, and possibly Japan. According to Chinese records, this group included the languages of Buyeo, Goguryeo, Baekje, Dongye, Okjeo, and possibly Gojoseon; it may have been a sister family to the language of the Xianbei in Manchuria and eastern Mongolia, but it was distinct from ancient Manchu languages such as the Mohe language.

Gojoseon was a kingdom in northern Korea that, by tradition, was founded in 2333 BCE (archaeological evidence and Chinese histories support a cultural civilization from around 1500 BCE, and a kingdom formed from a federation of smaller states around the 7th century BCE). It was conquered by Han dynasty China in 108 BCE and re-emerged from Chinese rule as the Kingdom of Buyeo. The Three Kingdoms-era states of Goguryeo and Baekje were successors to the Kingdom of Buyeo. Dongye was a vassal state of Goguryeo in northeastern Korea, founded in the 3rd century BCE and eventually absorbed by Goguryeo around the 5th century CE. Okjeo was a minor state in northern Korea, north of Dongye; it was subordinate to Gojoseon from the 3rd century BCE to 108 BCE, then came under Han rule, and later became a subordinate state of Goguryeo. None of these Buyeo-speaking kingdoms ever included Silla, which was a small kingdom on the southern coast of Korea until the Three Kingdoms period, during which it expanded and conquered the other two kingdoms.

Some linguists, including Christopher Beckwith, argue that Japanese descends from Goguryeo and Korean from the Silla language, based on lexical similarities between Goguryeo and Japanese and on Silla's ultimate triumph in the struggle for political control of Korea. Other linguists, including Kim Banghan, Alexander Vovin, and J. Marshall Unger, argue that Japanese is related to the pre-Goguryeo language of the central and southern Korean peninsula, including what would become the Kingdom of Silla, and that Old Korean is Goguryeo with a pre-Goguryeo Japonic substrate. Part of their evidence is that Japanese-like toponyms found in the historical homeland of Silla are also distributed across the southern Korean peninsula, but are not found in the northern peninsula or in southwestern Manchuria.[84] None of the extinct languages is attested in writing well enough to resolve the debate definitively.


The Japonic languages are spoken in Japan and among emigrants from Japan. Japanese is attested in native writing from the 8th century CE, and in imperfect Chinese transcriptions from the late 5th century CE. Conservative historical linguists tend to classify Japanese and the small number of languages related to it as a language family of their own.

There are similarities in lexicon and grammatical features between Japanese and Korean, but there is dispute over whether these reflect a common origin or merely borrowing within a sprachbund of neighboring languages. Samuel E. Martin, Roy Andrew Miller, and Sergei Starostin are among the linguists who have argued for a common origin.[85][86][87][88][89] In contrast, Alexander Vovin has argued for a regional borrowing model to explain the similarities.[90]

According to the Wikipedia article on the classification of Japonic, one "hypothesis proposes that Japanese is a relative of the extinct languages spoken by the Buyeo-Goguryeo cultures of Korea, southern Manchuria, and Liaodong", of which the best attested is the extinct Goguryeo language.[91][92][93] This proposal is attributed to Shinmura Izuru, who put it forward in 1916. According to proponents of this hypothesis, modern Korean, in contrast, appears to have stronger connections to the Silla language, spoken in the ancient kingdom of Silla (57 BCE – 935 CE), one of the Three Kingdoms of Korea, whose similarity to the Goguryeo language is not clearly established.

The earliest Chinese historical records concerning the "Wa" in Japan indicate that they were divided into many warring states. However, modern Japanese dialects point to a single common origin rather than a "bushy" one. It is therefore possible that there were many Yayoi dialects before Old Japanese emerged, and that the dialect of the warring state that prevailed politically when the Japanese state was unified superseded the other early Yayoi languages or dialects.[94]

As noted in the Wikipedia article on the Ainu people: "After a new wave of immigration, probably from the Korean Peninsula some 2,300 years ago, of the Yayoi people, the Jōmon were pushed into northern Japan. Genetic data suggest that modern Japanese are descended from both the Yayoi and the Jōmon." Tradition, as documented in the Nihon Shoki, a legendary account of Japan's history, dates the Yayoi arrival in Japan to 660 BCE. Chinese historical records mention the Yayoi (called "Wa") starting in 57 BCE. The existing Japanese language has its origins at approximately this time, if not earlier, to the extent that it derives primarily either from the language of the Bronze Age Yayoi as it existed before their arrival in Japan, or from a language of the Jomon of that period, rather than being a creole of some sort. Skeletal remains suggest that the two cultures had fused into a group with a homogeneous physical appearance in southern Japan by 250 CE.[95] It is possible that the Japanese language has roots related to the Ainu language or to the historical language of the Yayoi, whatever that may have been, or that it was a creole of both. It is also possible that Japanese has roots in a now-lost language once spoken in southern Japan.[96]

Location of Ezo

The Ainu people are genetic descendants of the Jomon, with some contribution from the Okhotsk people.[97] The Ainu languages are now spoken by Ainu minorities in Hokkaidō; they were formerly spoken in southern and central Sakhalin and the Kuril Islands (an area also known as Ezo), and perhaps in northern Honshū by the Emishi people (until approximately 1000 CE). They are associated with the founding Jomon people of Japan, of more than 14,000 years ago, and with the Satsumon culture of Hokkaidō. However, the Ainu also had contact with the Paleo-Siberian Okhotsk culture, whose modern descendants include the Nivkh people (whose original homeland was mostly occupied by Tungusic peoples), and this contact could have influenced the Ainu language.[98] Because of this important outside cultural influence, it is impossible to know with certainty how similar the original language of the Jomon people was to the language spoken by the Ainu people today.

Some linguists have suggested other family connections for the Ainu language: Shafer has suggested a distant connection to the Austro-Asiatic languages,[99] a suggestion that Vovin viewed as merely preliminary.[100] Japanese linguist Shichirō Murayama tried to link Ainu to the Austronesian languages, which include the languages of the Philippines, Taiwan, and Indonesia, through both vocabulary and cultural comparisons. There is no consensus, however, that the Ainu languages derive from any other known language, and the unique population genetics of the Ainu people support the hypothesis that they were largely isolated from the rest of the world for many thousands of years.

The Yayoi people had strong physical, genetic and cultural similarities to the people of Jiangsu province, on China's eastern coast, during the Western Han dynasty (202 BCE – 8 CE).[101] The Yayoi also had strong cultural similarities to the Koreans of that period.[102][103]

Location of Ryukyu Islands

Some linguists, such as Turchin,[27] see a connection between Japanese and Korean and an Altaic language family or a similar larger grouping, with its speakers coming from an area north of Korea, based in part on similarities in lexical roots. The statistical method used by Turchin, however, would not discriminate between Jomon and Yayoi sources for any Altaic linguistic affinities. Turchin's analysis also did not examine the various proposed ancient predecessors of Korean in Korea, or the relationship of those languages to any of the proto-Altaic languages, even though the hypothesis would require one of those ancient Korean peninsular languages to be intermediate between Japanese and one of the proto-Altaic languages. When first attested, Old Japanese had eight vowels rather than the current five (the extra distinctions were lost within a century of the oldest preserved writings), a system close to the vowel systems seen in Uralic and Altaic languages.[104] Old Japanese also had more grammatical similarity to Altaic languages than modern Japanese does.

These classifications of the origins of the Japanese language ignore significant borrowing from other languages in more recent times. Current estimates are that "wago" (words attributable to the original Yayoi language) make up 33.8% of the Japanese lexicon; "kango" (words with roots borrowed from Chinese since the 5th century CE) make up 49.1%, in addition to the Chinese characters used in written Japanese; foreign loanwords called "gairaigo" make up 8.8%; and "konshugo", hybrid words that draw on multiple languages, make up 8.3%.[105] This account attributes only a small number of words in modern Japanese to Ainu roots.

The six Ryukyuan languages, spoken in the islands south of Japan's main islands, share a common origin with Japanese and began to diverge from it around the 7th century CE, but they are not mutually intelligible with Japanese (with which they share about 72% of their words) or with each other. These islands were united in the Ryukyu Kingdom from 1429 CE; before that, there were multiple divided kingdoms, which were tributary states of China after 1372 CE. The kingdom remained a tributary state of China until 1609, when it became a vassal state of Japan, and it was annexed by Japan in 1879. The Ryukyuan languages were subsequently suppressed, and while they have about a million native speakers, relatively few native speakers are under the age of twenty. They are now effectively minority languages in their own homeland.

Languages spoken predominantly in North and South America



The Na-Dene languages have been linked linguistically to the Yeniseian languages of the Ket people of central Siberia, suggesting either a homeland in Siberia or a back-migration of Na-Dene speakers from Beringia. The Na-Dene family includes languages spoken by Alaska Natives and by some First Nations peoples of western Canada and the Pacific Northwest, as well as the Southern Athabaskan languages of the American Southwest (e.g. Apache and Navajo). The consensus is that Na-Dene-speaking peoples migrated from the Pacific Northwest to the American Southwest around 1000 CE.

There is dispute over when the Yeniseian languages separated from the Na-Dene languages. One possibility is that the split is contemporaneous with the initial peopling of the Americas. However, linguistic evidence alone does not rule out a more recent connection.

Other indigenous languages of North and South America

Other than Na-Dene, no indigenous languages of North or South America have been convincingly linked to languages of Eurasia, Africa, or other parts of the world. Many indigenous American languages are currently classified as language isolates, and linguists tend either to lump the bulk of the languages of the Americas into one family, as Joseph Greenberg did, or to identify many smaller language families with no clear relationship to each other. Population-genetic evidence suggests that the non-circumpolar indigenous peoples of the Americas descend from a common founder population, and there is no evidence of any enduring outside linguistic influence in the Americas before the arrival of Columbus in 1492 CE. Nevertheless, linguists have not been able to reconstruct a common origin for the indigenous languages of the Americas, mostly because of the time depth involved, and there is no way to know whether the founding population(s) of the Americas spoke one language or more than one.

Implications of current research

The Out of Africa theory of human origins marshals archeological, genetic, and ancient climate evidence to suggest that all modern humans share a common origin in Africa, from which they dispersed about 70,000 years ago. Farming and herding originated much later, about 8,000 to 10,000 years ago.[106]

We also have some idea of the time depth of these languages. For example, the Urheimats of the proto-languages of the subfamilies of the Indo-European language family necessarily date to a more recent period than Proto-Indo-European itself. Similarly, the proto-language of a superfamily must have been spoken no more recently than the oldest proto-language among its member families. The Urheimats of the proto-languages of the language families spoken by most people alive today are in many cases much more recent than either the Out of Africa date or the origin of farming and herding. The relatively shallow time depth of modern language families can arise from at least two factors: earlier languages went extinct as other languages expanded,[37] and some language families may have deeper connections at a greater time depth.

It will probably never be possible to know with any great confidence what the linguistic landscape of the world looked like 18,000 years ago, and even determining what it looked like 8,000 years ago is a profound challenge and a highly controversial undertaking. It is also unlikely that we can reconstruct a historical "Tower of Babel" community in which all humans spoke a common language, or gain very specific insight into the language spoken by the original proto-Eurasians or the earliest modern humans. (We can say with confidence, however, that large stone edifices built by large organized communities, which date to the Neolithic era at the earliest, were not built by any culture on Earth until many tens of thousands of years after any hypothetical common language of all humans.) The absence of writing older than about 5,500 years, despite the extensive recovery of earlier artifacts and art from prehistory, makes it unlikely that earlier humans had anything approaching a complete written language; the proto-linguistic markings used in trade are only a few thousand years older.

Evidence from pre-Columbian languages in the Americas, and from places such as Papua New Guinea and Australia that were isolated during periods of linguistic consolidation elsewhere in the world, suggests that societies before the Neolithic Revolution had a great many languages relative to their populations, most of which are now irrevocably lost.

The expansion of particular major language families is frequently associated with a group's adoption of superior food production, military technology, or social organization, which allowed it to expand and dominate neighboring societies, either ruling them or replacing them. For example, the domestication of the horse is frequently associated with the expansion of the Indo-European language family (other linguists favor an earlier expansion date, which they attribute to the spread of farming and herding); the expansion of the Chinese language is sometimes associated first with millet farming and later with rice farming; and the development of crops and domesticated animals that can thrive in tropical environments may have been one factor in the Bantu expansion. Some of these expansions, such as those of Hungarian, Turkish, Arabic, and Chinese, are historically documented. Other language replacement events are lost to history and must be inferred.

Limitations of the concept of Urheimat

It is only meaningful to speak of an Urheimat for a language or language family when the family has a single genetic source: a particular population that spoke the proto-language, from which isolated populations speaking descendant languages diverged over time.

This is not always the case. Creole languages, for example, are hybrids of separate languages that sometimes do not belong to the same language family, and creoles share similarities that arise from common aspects of the creole formation process rather than from a common origin. A creole language will often lack significant inflectional morphology, tone on monosyllabic words, and semantically opaque word formation, even if these features are found in all of its parent languages.[107][108]

Other circumstances can also complicate matters. In places where language families meet, such as the interface between the Nilo-Saharan and Afro-Asiatic language families in western Ethiopia, the relationship between a group that speaks a language and the Urheimat of that language is complicated by "processes of migration, language shift and group absorption [that] are documented by linguists and ethnographers" in groups that are themselves "transient and plastic."[109]

Also, over a sufficiently long period, and in the absence of evidence of the intermediate steps, it may be impossible to observe links between languages that share an Urheimat. This general concern is one manifestation of the larger problem of "time depth" in historical linguistics.[110] For example, while evidence from genetics, archeology, and historical climate change points strongly to a relatively small number of migration waves from Asia to the Americas within a fairly short period,[111] the classification of the indigenous languages of the Americas remains intensely controversial, because there is little direct evidence: all but a couple of those languages were unwritten in the pre-Columbian era. The same is true of Australia and New Guinea, whose history of human migration and contact is also well documented,[112] but where thousands of languages were spoken, none of them written before European contact.[113] Given enough time, natural change in isolated languages can obliterate any meaningful linguistic evidence of a known common genetic source.

See also


