Esperanto phonology

The creator of Esperanto, L. L. Zamenhof, did not specify phonemic-phonetic correspondences for his language. Instead, he simply described the orthography as "one letter, one sound". Literally interpreted, this is impossible, since every language has allophonic variation. As a result there are disagreements, for example, as to whether voicing assimilation is allowed, expected, or forbidden in sequences like "kz" (found in "ekzemple" "for example").

Zamenhof also failed to explicitly lay out Esperanto phonotactics, only saying that borrowings "need to conform to Esperanto orthography". Therefore spellings have been adopted that appear to violate his intentions, such as "poŭpo, ŭato, jida, matĉo". However, many of these coinages have proven to be unstable, and have either fallen out of use or been replaced with pronunciations more in keeping with the original Esperanto vocabulary, such as "pobo" and "vato" for "poŭpo" and "ŭato."

Orthography and pronunciation

Zamenhof suggested using Italian as a model for Esperanto pronunciation.

The Esperanto alphabet is nearly phonemic and corresponds closely to the International Phonetic Alphabet. The letters, along with their IPA and nearest English equivalents, are,

There are also six diphthongs: /ai̯/, /oi̯/, /ui̯/, /ei̯/, /au̯/, and /eu̯/.


A syllable in Esperanto is generally of the form (s/ŝ)(C)(C)V(C)(C). That is, it "may" have an onset of up to three consonants; "must" have a nucleus of a single vowel or diphthong (except in onomatopoeic words such as "zzz!"); and "may" have a coda of one (occasionally two) consonants.
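The template above can be sketched as a regular expression. This is only an illustration under stated assumptions: the helper name `fits_template` is hypothetical, the diphthongs are taken in their usual spellings "aj, ej, oj, uj, aŭ, eŭ", and "ŭ" is treated only as a diphthong offglide. The pattern checks the shape of a syllable, not whether a particular cluster is actually attested, so it accepts more than the language does.

```python
import re
import unicodedata

# Consonant letters that may fill a C slot. "ŭ" is left out because this
# sketch only allows it as the offglide of "aŭ" and "eŭ" (an assumption).
CONSONANTS = "bcĉdfgĝhĥjĵklmnprsŝtvz"

# (s/ŝ)(C)(C)V(C)(C), where V is a single vowel or one of the six diphthongs.
SYLLABLE = re.compile(
    rf"[sŝ]?[{CONSONANTS}]{{0,2}}"   # optional sibilant plus up to two consonants
    r"(?:[aeou]j|[ae]ŭ|[aeiou])"     # nucleus: diphthong or plain vowel
    rf"[{CONSONANTS}]{{0,2}}"        # coda of up to two consonants
)


def fits_template(syllable: str) -> bool:
    """Return True if the syllable matches the (s/ŝ)(C)(C)V(C)(C) shape."""
    normalized = unicodedata.normalize("NFC", syllable.lower())
    return SYLLABLE.fullmatch(normalized) is not None


for candidate in ["kvar", "cent", "post", "kaj", "zzz", "sklra"]:
    print(candidate, fits_template(candidate))
```

The onomatopoeic "zzz" fails because it has no nucleus, and a made-up four-consonant onset such as "sklra" fails because the template allows at most three onset consonants.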

Any consonant may occur initially, with the exception of "j" before "i" (though there is now one word that violates this restriction, "jida" "Yiddish", which contrasts with "ida" "of an offspring").

Any consonant except "h" may close a syllable, though coda "ĝ" and "ĵ" are rare in monomorphemes (for example, "ĵ" in the name of the conlang "Loĵbano" "Lojban"). Within a morpheme, there may be a maximum of four sequential consonants, as for example in "instruas" "teaches", "dekstren" "to the right". Long clusters generally include a sibilant such as "s" or one of the liquids "l" or "r".

Geminate consonants generally only occur in polymorphemic words, such as "mal-longa" "short", "ek-kuŝi" "to flop down", "mis-skribi" "to mis-write"; in ethnonyms such as "finno" "a Finn", "gallo" "a Gaul" (now more commonly "gaŭlo"); in proper names such as "Ŝillero" "Schiller", "Buddo" "Buddha" (now more commonly "Budho"); and in a handful of unstable borrowings such as "matĉo" "a sports match".

Word-final consonants occur, though final voiced obstruents are dispreferred. For example, Latin "ad" "to" became Esperanto "al", and Polish "od" "of, by, than" became Esperanto "ol" "than". Sonorants and voiceless obstruents, on the other hand, are found in many of the numerals: "cent" "hundred", "ok" "eight", "sep" "seven", "ses" "six", "kvin" "five", "kvar" "four"; also "dum" "during", "eĉ" "even". Even the poetic elision of final "-o" is rarely seen if it would leave a final voiced obstruent. A very few words with final voiced obstruents do occur, such as "sed" "but" and "apud" "next to", but in such cases there is no minimal-pair contrast with a voiceless counterpart (that is, there is no "*set" or "*aput" to cause confusion with "sed" or "apud"). This is because speakers of many languages, including the Slavic languages and German, do not contrast voicing in final obstruents.

Syllabic consonants occur only as interjections and onomatopoeia: "fr!, sss!, ŝŝ!, hm!".

All triconsonantal onsets begin with a sibilant, "s" or "ŝ". Disregarding proper names such as "Vladimiro", the following initial consonant clusters occur:

*Plosive + liquid — "bl, br; pl, pr; dr; tr; gl, gr; kl, kr"
*Voiceless fricative + liquid — "fl, fr; sl; ŝl, ŝr"
*Voiceless sibilant + voiceless plosive (+ liquid) — "sc" [sts], "sp, spl, spr; st, str; sk, skl, skr; ŝp, ŝpr; ŝt, ŝtr"
*Obstruent + nasal — "gn, kn, sm, sn, ŝm, ŝn"
*Obstruent + /v/ — "gv, kv, sv, ŝv"

And more marginally:
*Consonant + /j/ — "(tj), ĉj, fj, vj, nj"
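The inventory of initial clusters listed above lends itself to a simple lookup. The following is a minimal sketch, not a complete phonotactic model: the names `LISTED_ONSETS`, `MARGINAL_ONSETS`, `initial_cluster`, and `classify_onset` are hypothetical, and the sets contain only the clusters given in the list, so rarer clusters from learned vocabulary are reported as "unlisted".

```python
VOWELS = set("aeiou")

# Clusters copied from the list above, grouped as in the text.
LISTED_ONSETS = {
    # plosive + liquid
    "bl", "br", "pl", "pr", "dr", "tr", "gl", "gr", "kl", "kr",
    # voiceless fricative + liquid
    "fl", "fr", "sl", "ŝl", "ŝr",
    # voiceless sibilant + voiceless plosive (+ liquid)
    "sc", "sp", "spl", "spr", "st", "str", "sk", "skl", "skr",
    "ŝp", "ŝpr", "ŝt", "ŝtr",
    # obstruent + nasal
    "gn", "kn", "sm", "sn", "ŝm", "ŝn",
    # obstruent + v
    "gv", "kv", "sv", "ŝv",
}
MARGINAL_ONSETS = {"tj", "ĉj", "fj", "vj", "nj"}


def initial_cluster(word: str) -> str:
    """Return the consonant letters that precede the first vowel."""
    cluster = ""
    for ch in word.lower():
        if ch in VOWELS:
            break
        cluster += ch
    return cluster


def classify_onset(word: str) -> str:
    """Label a word-initial onset as simple, listed, marginal, or unlisted."""
    cluster = initial_cluster(word)
    if len(cluster) < 2:
        return "simple"
    if cluster in LISTED_ONSETS:
        return "listed"
    if cluster in MARGINAL_ONSETS:
        return "marginal"
    return "unlisted"


for word in ["kvar", "knabo", "gnomo", "ĉjanja", "tlaspo", "pfenigo"]:
    print(word, classify_onset(word))
```

The same data can be used to confirm the statement that every three-consonant onset in the list begins with "s" or "ŝ", for example with `all(c[0] in "sŝ" for c in LISTED_ONSETS if len(c) == 3)`.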

The affectionate suffixes "-ĉj-" and "-nj-", which retain remnants of the Slavic palatalized consonants, may very occasionally be used as words in their own right, as in "mia ĉjanja popolo" "my dear nation", in which case they may be word initial and not just syllable initial.

Although it does not occur initially, the sequence "dz" is pronounced as a cluster if not as an affricate, as in "edzo" [e.dzo] / [e.ʣo] "a husband" with an open first syllable [e], not as *[ed.zo].

In addition, initial "pf-" occurs in German-derived "pfenigo" "penny", "kŝ-" in Sanskrit "kŝatrio" "kshatriya", and several additional uncommon initial clusters occur in technical words of Greek origin: "mn-, pn-, ks-, ps-, sf-, ft-, kt-, pt-, bd-", as in "sfinktero" "a sphincter" (which also has the coda "nk"). Quite a few more clusters turn up in sufficiently obscure words, such as "tl" in "tlaspo" "Thlaspi" (a genus of herb), and in the names of Aztec deities such as "Tlaloko" "Tlaloc". (The /l/ phonemes are presumably devoiced in these words.)

As this might suggest, greater phonotactic diversity and complexity is tolerated in learned than in quotidian words, almost as if "difficult" phonotactics were an iconic indication of "difficult" vocabulary. Diconsonantal codas, for example, generally only occur in technical terms, proper names, and in geographical and ethnic terms: "konjunkcio" "a conjunction", "arkta" "Arctic", "istmo" "isthmus".

However, there is a strong tendency for more basic terms to avoid such clusters, although "cent" "hundred", "post" "after", "sankta" "holy", and the prefix "eks-" "ex-" (which can be used as an interjection: "Eks la reĝo!" "Down with the king!") are exceptions. Even when coda clusters occur in the source languages, they are often eliminated in Esperanto. For instance, many European languages have words relating to "body" with a root of "korps-". This root gave rise to two words in Esperanto, neither of which keeps the full cluster: "korpuso" "a military corps" (retaining the original Latin "u"), and "korpo" "a biological body" (losing the "s").

Many ordinary roots end in two or three consonants, such as "cikl-o" "a (bi)cycle", "ŝultr-o" "a shoulder", "pingl-o" "a needle", "tranĉ-i" "to cut". However, these roots do not normally entail coda clusters except when followed by another consonant in compounds, or with poetic elision of the final "-o". Even then, only sequences with decreasing sonority are possible, so while poetic "tranĉ’" occurs, *"cikl’", *"ŝultr’", and *"pingl’" do not. (Note that the humorous jargon "Esperant’" does not follow this restriction, as it elides the grammatical suffix of all nouns no matter how awkward the result.)
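The sonority restriction on elided forms can be illustrated with a small check. This is a sketch under assumptions: the text does not give a sonority scale, so the ranking used here (plosives and affricates lowest, then fricatives, nasals, liquids, and glides) is a conventional one chosen for illustration, and the function names are hypothetical.

```python
VOWELS = set("aeiou")

# Assumed sonority ranks (higher = more sonorous); not taken from the source.
SONORITY = {}
for letters, rank in (
    ("pbtdkgcĉĝ", 1),   # plosives and affricates
    ("fvszŝĵhĥ", 2),    # fricatives
    ("mn", 3),          # nasals
    ("lr", 4),          # liquids
    ("jŭ", 5),          # glides
):
    for letter in letters:
        SONORITY[letter] = rank


def final_cluster(root: str) -> str:
    """Return the consonant letters that follow the last vowel of a root."""
    root = root.lower()
    i = len(root)
    while i > 0 and root[i - 1] not in VOWELS:
        i -= 1
    return root[i:]


def allows_elision(root: str) -> bool:
    """True if the root-final consonants strictly decrease in sonority."""
    ranks = [SONORITY[ch] for ch in final_cluster(root)]
    return all(a > b for a, b in zip(ranks, ranks[1:]))


for root in ["tranĉ", "cikl", "ŝultr", "pingl", "kor"]:
    print(root, allows_elision(root))
```

Under this scale "tranĉ" passes because the nasal outranks the affricate, while "cikl", "ŝultr", and "pingl" fail because each ends in a liquid that is more sonorous than the consonant before it.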

Within compounds, an epenthetic vowel is added to break up what would otherwise be unacceptable clusters of consonants. This vowel is most commonly the nominal affix "-o," regardless of number or case, as in "kant-o-birdo" "a songbird" (the root "kant-" "to sing" is inherently a verb), but other part-of-speech endings may be used when "-o-" is judged to be grammatically inappropriate, as in "mult-e-kosta" "expensive".

There is a great deal of variation as to when an epenthetic vowel is used, since what is "acceptable" varies from speaker to speaker, and it also appears to depend on the frequency of the compound word, or perhaps the medium of expression (spoken vs. written). For example, the rather dry (and usually written) compound of "vorto" "word" and "provizo" "stock" for "vocabulary" turned up as "vortprovizo," with an unbroken "rtpr" cluster, in 97% of Google hits. (In the accusative case the cluster occurs only 78% of the time.) A similar compound of "parto" "part" and "preni" "to take", for "to participate", could produce the same "rtpr" cluster. However, this compound is much more frequent (25 times as many hits on Google) and is part of people's basic speaking vocabulary. Here the epenthetic form "partopreni" is nearly universal; "partpreni" and its conjugations occur only 0.1% of the time.

Allophonic variation

With only five oral vowels and no nasal or long vowels, Esperanto allows a fair amount of allophonic variation, though the distinction between /e/ and /ei̯/, and arguably /o/ and /ou̯/, is phonemic. Disregarding assimilation for the moment, the most noticeable allophony among the consonants involves /r/ and /v/. The /r/ may be pronounced as either an alveolar flap [ɾ] or an alveolar trill [r], in free variation but with the flap more common. The /v/ may be a labiodental fricative [v] or a labiodental approximant [ʋ], again in free variation, but with [v] considered normative. Alveolar consonants "t, d, n, l" are acceptably either apical (as in English) or laminal (as in French, generally but incorrectly called "dental"). Postalveolars "ĉ, ĝ, ŝ, ĵ" may be "palato-alveolar" (semi-palatalized) [t̠ʃ, d̠ʒ, ʃ, ʒ] as in English and French, or "retroflex" (non-palatalized) [t̠ʂ, d̠ʐ, ʂ, ʐ] as in Polish, Russian, and Mandarin Chinese. "H" and "ĥ" may be voiced [ɦ, ɣ], especially between vowels. However, aspiration or incomplete voicing of consonants as in English or Mandarin is considered substandard, as are the English diphthongized "long" vowels [ij, ɛj, uw, ɔw].

Vowel length and quality

Vowels may be lengthened in open syllables or when stressed, and vowel quality often correlates with length, though the details vary with the language background of the speaker. (Zamenhof recommended pronouncing the vowels "e" and "o" as mid vowels at all times, but he himself pronounced them as open-mid vowels.) Adjacent stressed syllables are not allowed in compound words, and when stress disappears in such situations, it may leave behind a residue of vowel length.

Vowel length is sometimes presented as an argument for the phonemic status of the affricates, because vowels tend to be short before most consonant clusters (excepting plosives plus "l" or "r," as in many European languages), but long before "ĉ, ĝ, c," and "dz."

Kalocsay & Waringhien recommend pronouncing unstressed vowels short, even in open syllables, with stressed "e, o" as short open-mid [ɛ, ɔ] in closed syllables and long close-mid [eˑ, oˑ] in open syllables. When syllables of compound words lose their stress, they recommend that the vowel should be long and open-mid: "liber-tempo" [libɛˑrˈtɛmpo], "or-ĉeno" [ɔˑrˈtʃeˑno]. However, this is widely considered unduly elaborate, and such minor details of pronunciation generally reflect speakers' backgrounds.


Epenthetic glottal stops in vowel sequences such as "boao" "boa" are non-phonemic, but allowed for the comfort of the speaker. They are especially common with sequences of identical vowels, such as "heroo" [heˈroˑʔo] "hero" and "praavo" [praˈʔɑˑʋo] "great-grandfather". It is also very common to pronounce an epenthetic [j] between an /i/ and a following vowel ("mia" [ˈmiˑja], "mielo" [miˈjɛˑlo]), but this is avoided in careful enunciation.

Poetic elision

Vowel elision is allowed with the grammatical suffix "-o" of singular nominative nouns, and the "a" of the article "la", though this rarely occurs outside of poetry: "de l’ kor’" "from the heart".

Normally semivowels are restricted to offglides in diphthongs. However, poetic meter may force the reduction of unstressed /i/ and /u/ to semivowels before a stressed vowel: "kormilionoj" [kɔɾmiˈli̯oˑnɔi̯]; "buduaro" [buˈdu̯ɑˑɾo].


Zamenhof recognized two types of regressive assimilation in Esperanto:
*Place assimilation among nasals, and
*Voicing assimilation among obstruents.

However, he stated that "severely regular" speech would not have assimilation, and this has led to debate over whether it "should" occur.

An example of the first type is assimilation of "n" before a velar, as in "banko" [ˈbaŋko] "bank" or "sango" [ˈsaŋɡo] "blood". "N" may also palatalize before palatal /j/, as in "panjo" [ˈpɑˑɲjo] "mommy" and "sinjoro" [siˈɲjoˑro] "sir". However, although the desirability of these may be debated, the question almost never arises as to whether the "m" in "emfazi" should remain bilabial or should assimilate to labiodental "f" ([ɛɱˈfɑˑzi]), as this assimilation is nearly universal in human language. Indeed, where the orthography allows, we see that assimilation does occur. For example, original "bonbono" "bonbon" has over time become "bombono" even in dictionaries.
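The nasal place assimilations described above can be written as three rewrite rules. This is a minimal sketch, assuming the rules apply wherever the letters are adjacent; it ignores morpheme boundaries, stress, vowel quality, and the fact that the assimilation is optional. The function name `assimilate_nasals` is hypothetical, and the output is ordinary spelling with only the nasal replaced by an IPA symbol.

```python
import re


def assimilate_nasals(word: str) -> str:
    """Apply the regressive place assimilations of nasals named in the text."""
    w = word.lower()
    w = re.sub(r"n(?=[kg])", "ŋ", w)   # n becomes velar before k, g
    w = re.sub(r"n(?=j)", "ɲ", w)      # n becomes palatal before j
    w = re.sub(r"m(?=f)", "ɱ", w)      # m becomes labiodental before f
    return w


for word in ["banko", "sango", "panjo", "sinjoro", "emfazi"]:
    print(word, "->", assimilate_nasals(word))
```

Lookahead assertions are used so that the consonant triggering the change stays in place and only the nasal is rewritten.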

The debate on voicing assimilation is likewise dependent on speakers' language backgrounds. The question of assimilation is almost never an issue with words that maintain Latinate orthography, such as "absolute" [apsoˈluˑte] "absolutely" or "obtuza" [ɔpˈtuˑza] "obtuse", despite the fact that potentially contrastive voiceless equivalents such as "apsido" "apsis" and "optiko" "optics" occur. Instead, the debate centers on the non-Latinate orthographic sequence "kz", frequently found in Latinate words like "ekzemple" "for example" and "ekzisti" "to exist". It is often claimed that "kz" is properly pronounced as written, with mixed voicing, [kz], despite the fact that Zamenhof recognized that the "k" may assimilate to the "z" for [ɛɡˈzɛmple, ɛɡˈzisti], as in Slavic, English, French, and many other languages. The two opinions are called "ekzismo" and "egzismo" in Esperanto. (Orthographic "gz" does not occur in Esperanto, except in the nonce word "egzismo" itself.) In practice, most Esperanto speakers assimilate both "kz" to [ɡz] and "nk" to [ŋk] when speaking fluently.

Voicing assimilation of affricates and fricatives before nasals, as in "taĉmento" "a detachment" and the suffix "-ismo" "ism", is both more noticeable and easier for most speakers to avoid, so [ˈizmo] for "-ismo" is less tolerated than [apsoˈluˑte] for "absolute". Compound words such as "okdek" "eighty", "longtempe" "for a long time", and "glavsonoro" "the ringing of a sword" are likewise more likely to retain mixed voicing, though assimilation is not uncommon in rapid speech: [ˈɔɡdɛk, lɔˑŋkˈtɛmpe, ˈglaˑfsoˈnoˑro].
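Regressive voicing assimilation among obstruents can be sketched as a right-to-left pass over the spelling. The code below is an illustration under several assumptions that are not stated in the text: "v" is allowed to undergo assimilation but not to trigger it (so that "kv" and "gv" stay distinct, as the onset list implies), "c" and "ĥ" count as voiceless triggers, nasals and other sonorants never trigger a change (so "-ismo" is left alone), and morpheme boundaries are ignored. The function name `assimilate_voicing` is hypothetical.

```python
# Voiced obstruent letters and their voiceless partners.
DEVOICE = {"b": "p", "d": "t", "g": "k", "v": "f", "z": "s", "ĵ": "ŝ", "ĝ": "ĉ"}
VOICE = {voiceless: voiced for voiced, voiceless in DEVOICE.items()}

# Assumption: "v" undergoes assimilation but does not cause it.
VOICED_TRIGGERS = set(DEVOICE) - {"v"}
# Assumption: "c" and "ĥ" pattern with the other voiceless obstruents.
VOICELESS_TRIGGERS = set(VOICE) | {"c", "ĥ"}


def assimilate_voicing(word: str) -> str:
    """Make each obstruent agree in voicing with an obstruent right after it."""
    chars = list(word.lower())
    # Walk right to left so that a changed consonant can affect the one before it.
    for i in range(len(chars) - 2, -1, -1):
        following = chars[i + 1]
        if following in VOICED_TRIGGERS:
            chars[i] = VOICE.get(chars[i], chars[i])
        elif following in VOICELESS_TRIGGERS:
            chars[i] = DEVOICE.get(chars[i], chars[i])
    return "".join(chars)


for word in ["ekzemple", "absolute", "obtuza", "okdek", "longtempe", "glavsonoro", "kvar"]:
    print(word, "->", assimilate_voicing(word))
```

The output is a respelling rather than a phonetic transcription, so "ekzemple" comes out as "egzemple", matching the "egzismo" pronunciation, while "kvar" and "ismo" are returned unchanged.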

Similarly, mixed sibilant sequences, as in the polymorphemic "disĵeti" "to scatter", tend to assimilate, sometimes completely in rapid speech ([diʃˈʃeˑti]), though, if noticed, this would be considered wrong.

Like the generally ignored regressive devoicing in words such as "absurda", progressive devoicing tends to go unnoticed within plosive-sonorant clusters, as in "plua" [ˈpl̥uˑa] ("additional"; contrasts with "blua" [ˈbluˑa] "blue") and "knabo" [ˈkn̥ɑˑbo] ("boy"; the "kn-" contrasts with "gn-", as in "gnomo" [ˈɡnoˑmo] "gnome"). Partial to full devoicing of the sonorant is probably the norm for most speakers.

Loss of phonemic "ĥ"

The sound "ĥ" [x] was always somewhat marginal in Esperanto, and there has been a strong move to merge it into [k]. [Chris Gledhill. "Regularity and Representation in Spelling: the case of Esperanto." "Journal of the Simplified Spelling Society," 1994-1 pp 17-23.] [R. Bartholdt and A. Christen, H. Res. 415 "A resolution providing for the study of Esperanto as an auxiliary language." "Hearings before the Committee on Education, House of Representatives, 63rd Congress, 2nd Session." 1914 March 17.] Dictionaries generally cross-reference "ĥ" and "k," but the sequence "rĥ" (as in "arĥitekturo" "architecture") was replaced by "rk" ("arkitekturo") so completely by the early 20th century that few dictionaries even list "rĥ" as an option. Other words, such as "ĥemio" "chemistry" and "monaĥo" "monk", still vary but are more commonly found with "k" ("kemio, monako"). In a few cases, such as with words of Russian origin, "ĥ" may instead be replaced by "h." This merger has had only a few complications. "Ĥoro" "chorus" has been given the alternate form "koruso," because both "koro" "heart" and "horo" "hour" were taken. The two words still almost universally seen with "ĥ" are "eĥo" "echo" and "ĉeĥo" "a Czech". Replacement with "k" is blocked because "ek-" (perfective aspect) and "ĉeko" "check" already exist, and replacement with "h" is blocked because Esperanto roots cannot end in "h," although "ekoo" for "eĥo" is occasionally seen.

Proper names and borrowings

A common source of allophonic variation is borrowed words, especially proper names, when non-Esperantized remnants of the source-language orthography remain, or when novel sequences are created in order to avoid duplicating existing roots. For example, it is doubtful that many people fully pronounce the "g" in "Vaŝingtono" "Washington DC" as either [ɡ] or [k], or pronounce the "h" in "Budho" "Buddha". Such situations are unstable, and in many cases dictionaries recognize that certain spellings (and therefore pronunciations) are inadvisable. For example, the physical unit "Watt" was first borrowed as "ŭato", to distinguish it from "vato" "cotton-wool", and this is the only form found in dictionaries in 1930. However, initial "ŭ" violates Esperanto phonotactics, and by 1970 there was an alternate spelling, "vatto". This was also unsatisfactory, however, due to the geminate "t", and by 2000 the effort had been given up, with "vato" now the advised spelling for both "Watt" and "cotton-wool". Some recent dictionaries, such as the "Reta Vortaro", no longer even list initial "ŭ" in their index. Likewise, several dictionaries now list a newer spelling "Vaŝintono" for Washington.

