
Lost and Found: How the Tocharian Languages Reveal Answers and Raise New Questions

This paper will discuss how the Tocharian language family has contributed to historians’ understanding of some of the migrations and changes along the Silk Road. By studying the Tocharian languages, we can infer information about their speakers and about the speakers of nearby communities in the past. The languages important to this paper are the two Tocharian languages that have been discovered, Tocharian A and Tocharian B, more accurately known as Agnean[1] and Kuchean[2].[i] For the sake of consistency in this paper,[3] I will continue to refer to them as Tocharian A and Tocharian B.

Historical linguistics, the study of language change, can tell us how languages have interacted over time, and how the people who spoke those languages interacted amongst themselves and with surrounding peoples and their languages. Historical linguistics is therefore relevant to the study of human history more generally, and “historical linguistic findings have been utilized to solve historical problems of concern to society which extend far beyond linguistics.”[ii] These findings are made through investigations regarding linguistic prehistory, which combines other information from different disciplines such as archaeology, ethnohistory, history, ethnographic analogy, human biology and other sources of information about humans in the past to determine a better and more complete picture of the past, and in general “uses historical linguistics findings for cultural and historical references.”[iii] Since we are concerned with finding out more about what the Tocharian languages can tell us about the homeland of their speakers, one method used in historical linguistics can help us, namely linguistic migration theory.

The basic assumption behind linguistic migration theory is that, as a language splits up, its daughter languages are expected to remain geographically close to one another. Therefore, looking at today’s geographic distributions “we can hypothesize how they got where they are now and where they came from.”[iv] Since the Tocharian languages are dead, we have to shift this model slightly by examining where their documents were found and how they may be connected to other languages. The task is complicated by the fact that, as far as we have been able to determine, Tocharian A and B are the only descendants of Proto-Tocharian. While they are separate languages (about as different as French and Spanish[v]), they are so closely related that they must have been one language (Proto-Tocharian) until around a thousand years before the date of the earliest documents discovered. Tocharian was identified as a branch of the Indo-European (IE) family by Sieg and Siegling over a century ago. However, these languages are distinct from any other IE language,[vi] and Proto-Tocharian forms its own branch on the IE tree, descended directly from Proto-Indo-European (PIE). Given that the two Tocharian languages were discovered in such close proximity, one might assume via linguistic migration theory that their ancestors migrated as speakers of a PIE descendant and established a Proto-Tocharian homeland in the Tarim Basin.

One issue with this reasoning is that linguistic migration theory has a number of inherent weaknesses. Firstly, no other Tocharian languages survive, so beyond the places where the documents were found we cannot rule out that Tocharian languages were spoken elsewhere; documentation of such languages may simply have been lost. Hansen notes that it is “possible that other Indo-European languages more similar to Tocharian A and B were spoken in Central Asia but that no material in these lost languages survives.”[vii] On a related note, while Tocharian A and B are different enough to be considered true languages and not merely dialects, the fact that they were found in such close proximity makes so sharp a divergence between them hard to explain. Mallory and Mair cite an argument made by Werner Winter:

Werner Winter has argued that Tocharian A did not actually develop in the region in which we find it used, but that it was merely the language of the Buddhist mission to the Turks. The Tocharian A manuscripts were prepared under the auspices of the Turks who settled in the eastern part of the Tarim Basin in territories that are occupied by the Tocharian B speakers. This would suggest that Tocharian A, when it was at home, was situated between the Tocharian B speakers in the north Tarim and the Turks to their north and east. This […] would imply that the Tocharian languages were once far more widespread than their surviving manuscripts reveal.[viii]

If Tocharian languages were once much more widespread, it stands to reason that there could have been more Tocharian languages without surviving manuscripts. If that is the case, then the ones that we do have are but a small sample, and that leads to further problems when we consider the migratory patterns.

This leads to the second major problem with linguistic migration theory: Tocharian speakers could have had a different homeland, which they were either driven out of or chose to leave for some other reason, such as a lack of resources, and then settled together in a new location.

Although the location in which Tocharian was spoken would suggest a close relationship to the Indo-Iranian branch of Indo-European, the way the language treats its palatals places it in the centum branch of Indo-European, which includes more western languages like Greek, Celtic and Italic, even though it was spoken geographically closer to languages from the satem branch, from the Indo-Iranian family.[ix] The centum and satem branches are distinguished from one another by the sound changes they underwent; that is to say, the changes undergone by the centum languages are distinct from those undergone by the satem languages. One example is this treatment of the Indo-European palatals.

Furthermore, the features of Tocharian are distinct enough from those of other Indo-European languages that it constitutes its own branch of Indo-European, on a level with Indo-Iranian, Germanic, Anatolian, Greek, Italic, Celtic, Balto-Slavic, Armenian, and Albanian.[x] When compared to these other branches, Proto-Tocharian has more in common with the Germanic, Greek, Italic, and Celtic languages.[xi] That said, many similarities between the languages can be attributed to natural language change rather than recent common ancestry, and moreover the Tocharian branch “shares virtually no diagnostic innovations with any other branch of [PIE].”[xii] Geographically, the two Tocharian languages are very closely tied together. According to Mallory and Mair,

Tocharian B texts [are found] across the entire northern rim of the Tarim Basin (about 1,100 km or 700 miles) while Tocharian A seems to have been confined to the area between Qarashahar and Bezaklik, a linear distance of only 300 km (200 miles). In the case of Tocharian B, regional differences have suggested the existence of three dialects: the first (western) in Kucha and the area surrounding it, the central dialect of the Qarashahar region, and finally, the eastern dialect of the Turpan oasis (in the east).[xiii]

That Tocharian B had dialects while Tocharian A did not is consistent with the idea that Tocharian A was already a dead language at the time from which the sources date. Hansen notes that it was “by the sixth, seventh and eighth centuries” that Tocharian A[4] had died out and survived only as a written language used by Buddhists inside monasteries. Dialectal differences in Tocharian B make sense considering that through these centuries the language was “used by monastic officials for accounts, kings for royal orders, historians for their chronicles, travelers for graffiti, and devotees to label their offerings to monasteries. In addition, storytellers used Kuchean[5] to tell Buddhist narratives.”[xiv]

Further implications of this are that the Tocharian branch developed in geographic isolation from the other surviving language branches. This isolated development indicates that the peoples who spoke languages of the Tocharian branch had migrated to their ultimate location as their language diverged from PIE.

While Proto-Tocharian developed in isolation, the later Tocharian languages borrowed lexical items and other linguistic features from the Indo-Iranian languages, and this borrowing continued for as long as the later Tocharian languages survived.[xv] This indicates that the Tocharian speakers had trade and close interaction with Indo-Iranian speakers for significant portions of the languages’ development. The nature of the interactions is not always clear, especially because the earliest loanwords appear to be borrowed from Old Iranian,[xvi] which indicates interaction at an early point in their development but does not make clear from which specific Indo-Iranian language the borrowings came. Some later loanwords, while still appearing in the early development of the Tocharian languages, are similar to Ossetic, which is descended from Scythian. Thus “the Tocharians could have been affiliated with one or another of the steppe confederations dominated by Iranian tribes at a relatively earlier period of their prehistory.”[xvii] The nature of this affiliation is not clear, and so while we know that they had contact, the exact circumstances are unknown, as we also do not have a set time or location. At that stage, Tocharian still seems to have been spoken by peoples who were distributed over a larger area.

The most clearly identified loanwords are from Bactrian, which indicates a relationship with the Kushāna kingdom, although the nature of that relationship is also unclear. One theory is that the Tocharians may have been Kushāna subjects;[xviii] however, they would then have been much further to the southwest than where their documents were found. The later loans (from Sogdian, Khotanese and Sanskrit) are said to be an influence “after the Tocharians reached their historical home.”[xix] “Historical home” here refers to the home that they had in the Tarim Basin. We know little of the ancient migrations and cannot establish them from linguistic evidence alone, especially since we have no evidence of other Tocharian languages: any non-circumstantial evidence of their existence has been lost to time.

One question that arises when we turn to the Tocharian languages and discoveries is the question of whether the languages spoken are connected to the people we call Tocharians, namely the mummies that were found in Xinjiang.

The route of the Tocharians from west to east across the Steppe can be argued for by the presence of apparently Tocharian elements in Finno-Ugric languages, whose speakers in the second millennium B.C. dwelt in the south of the Eurasian forest zone. Later contacts of Tocharian with East Iranian languages that spread in the Steppe and Central Asia have been traced.[xx]

While there were long migrations at the time, “the supposed migrations of the Tocharians from Europe to Xinjiang through alien territory is hard to motivate and strains credulity.”[xxi] When the documents from which Tocharian was first identified were discovered, those who identified them assumed that the Tocharians must have made a long migration from Europe. However, the assumption that Proto-Tocharian speakers travelled through territories where Indo-Iranian and other satem languages were spoken is flawed: it does not fit with the distinction between branches, since they would have travelled through the territory of Baltic, Slavic, and Iranian peoples, all of whose languages belong to the satem branch of Indo-European. The alternative theory, which is more plausible due to its simplicity,[6] is that the palatalization that the satem languages underwent was “an innovation in the more central dialects of Indo-European that did not spread to the peripheral dialects.”[xxii] Thus, the Tocharians would have moved before that innovation, which suggests that they arrived from the north around 2000 BCE, placing them in a plausible position to be identified with the mummies found at Xinjiang.[xxiii]

Evidence for stages of development of the Tocharian languages beyond Proto-Tocharian does not go back as far as 2000 BCE. As Mallory and Mair note:

we are not dealing with two dialects that closely resemble each other, but something more on the order of two different languages. This is an important point because it suggests that these languages had been separated from one another for a considerable period, estimated by linguists as possibly somewhere between 500 years and a millennium, before our earliest texts. This would indicate a lower date for the break-up of Proto-Tocharian […] at a very roughly 500 BC.[xxiv]

This late break-up date suggests that Proto-Tocharian, or a descendant of it that later separated into Tocharian A and Tocharian B, may have been the language of the Xinjiang mummies, whose earliest remains date back as far as 2000 BCE and whose culture is said to have survived in the region to about the third century CE.[xxv] Since Proto-Tocharian developed into daughter languages during the period in which the peoples these mummies belonged to existed, they could even have spoken a completely different daughter of the Tocharian language tree that has since been lost to us; in fact, the mummies were not found with any linguistic documentation that would identify them as speakers of Tocharian languages.

We must also be cautious about equating a language with a culture or ethnicity because linguistic and cultural change are not synonymous. While language can be and often is a symbol of identity, it is not the only possible symbol.[7]

The languages we do have access to also exist in different contexts:

While both languages were employed in the translation of Buddhist documents, only Tocharian B occurs in secular and administrative contexts as well. This has led to the well-supported supposition that [what] we have of Tocharian A is the remains of a liturgical language, probably a dead liturgical language at that, while Tocharian B was the only living language in the period from which we draw our texts.[xxvi]

Given the indication that Tocharian A was already a dead language at the time we are able to observe it, we could assume that Tocharian had by then developed enough that certain branches of the tree were starting to die out. That is, the language had spread far enough that the people who spoke it were learning other languages spoken in the region, and its use as anything other than a liturgical language had ceased. This is conjecture, but it is a possibility.

The home in the Tarim Basin was likely reached before the common era. Evidence for this is that Middle Chinese apparently borrowed the word *mjit for honey, most likely from Tocharian B. The first instance of its use was in the prose of Wang Chong, who wrote in the 1st century CE. The Tocharian-Chinese connection must have been established for a few generations at this point, since “the diffusion of such a word under pre-modern conditions is not likely to have been very rapid.”[xxvii] Thus there was contact; however, we cannot rely wholly on linguistic evidence, as language is ever-shifting, and so the connection cannot be clearly defined.

Looking at these linguistic influences, we can determine that there was significant contact between the Tocharian speakers and the surrounding peoples, presumably through trade, since borrowing requires constant and consistent contact. For a language to retain borrowed terminology, its speakers must be in sustained contact with speakers of the source language. Why a word is borrowed depends on whether there is a need for it, the prestige gained by using it, and in some cases factors such as availability and choice. If words are being used interchangeably, some may be adopted permanently, whether for utility or preference. For any of this to occur, there had to have been prolonged contact over generations.

While we can never come to a perfect solution regarding the fate of the Tocharian speakers, and the mummies at Xinjiang continue to be a mystery to us, these languages and the cultures associated with them live on in our imaginations, and their study provides us with at least a small amount of knowledge about the world that used to be theirs.

Selected Bibliography

Bonfante, Giuliano. “The Relative Position of the Indo-European Languages,” The Journal of Indo-European Studies 15 (1987): 77-80.

Campbell, Lyle. Historical Linguistics: An Introduction. 3rd ed. Cambridge, Massachusetts: The MIT Press, 2013.

Hansen, Valerie. The Silk Road: A New History. Oxford: Oxford University Press, 2012.

Kuzmina, E. E. The Prehistory of the Silk Road. Edited by Victor H. Mair. Philadelphia, Pennsylvania: University of Pennsylvania Press, 2008.

Mallory, J. P., and D. Q. Adams. The Oxford Introduction to Proto-Indo-European and the Proto-Indo-European World. Oxford: Oxford University Press, 2006. Accessed October 19, 2016. ProQuest ebrary.

Mallory, J. P., and Victor H. Mair. The Tarim Mummies: Ancient China and the Mystery of the Earliest Peoples from the West. London: Thames & Hudson, 2000.

Pulleyblank, Edwin G. “Why Tocharians?” The Journal of Indo-European Studies 23 (1995): 415-428.

Ringe, Donald. “Tocharians in Xinjiang: The Linguistic Evidence.” The Journal of Indo-European Studies 23 (1995): 439-442.

Winter, Werner. “Tocharian and Proto-Indo-European.” Lingua Posnaniensis 25 (1982): 1-11.


[1] “The specific sites of Tocharian A documents are confined to the east where they are found at the religious sanctuaries of Shorchuq near Qarashahar, the ancient kingdom of Agni, and hence Tocharian A is also known as Agnean or East Tocharian.” (Mallory and Mair 274)
[2] Tocharian B documents “have been recovered from Maralbeshi, in various religious establishments near Kucha, i.e. Duldur Aqur, Qumtura, Subeshi and Qizili. It is for this reason that Tocharian B is also known as Kuchean or West Tocharian, but the latter is something of a misnomer since Tocharian B documents are also known across the east, generally in the same sites as Tocharian A, i.e. Qarashahar, Turpan, Khocho, Tuyuq, Sangim and exclusively at Murtuq.” (Mallory and Mair 274-275)
[3] With the exception of Hansen and Mallory/Mair, all of my sources have named them as being Tocharian A and Tocharian B.
[4] There referred to as Agnean.
[5] Tocharian B
[6] When considering historical linguistics, it is important to employ Occam’s razor.
[7] i.e. “shared cultural tradition (heritage), kinship or perceived genealogy, religion, territory, national origin, even ideology, values and social class.” (Campbell 439)


[i] Hansen 73
[ii] Campbell 1
[iii] Ibid. 405
[iv] Ibid. 424
[v] Hansen 72
[vi] Ringe 439
[vii] Hansen 73
[viii] Mallory and Mair 277
[ix] Pulleyblank 416
[x] Campbell 176-177
[xi] Hansen 72
[xii] Ringe 440
[xiii] Mallory and Mair 276
[xiv] Hansen 76
[xv] Ringe 440

[xvi] Ibid. 441
[xvii] Ibid.
[xviii] Ibid.
[xix] Ibid.
[xx] Kuzmina 89
[xxi] Pulleyblank 416
[xxii] Ibid.
[xxiii] Ibid.
[xxiv] Mallory and Mair 276
[xxv] Ibid. 7
[xxvi] Ibid. 277
[xxvii] Ringe 442

