Problems in phylogenetic studies on Sino-Tibetan origin: why Sichuan is still its homeland

Dec 18, 2023

In the debate over where the homeland of the proto-Sino-Tibetan (ST) language lies, three hypotheses exist: the middle Yellow River basin in China's Central Plain, western Sichuan/the eastern Tibetan Plateau, and northeastern India. I have been a staunch advocate of the western Sichuan hypothesis for various reasons, and I have not seen any compelling evidence against it. The best-known scholar advocating the Sichuan hypothesis is van Driem. I have written a few posts on this site that use the historical comparative method to trace cognates for the flora and fauna of the proto-ST environment, especially different species of bears. I have also written an ambitious general introduction to why genetic and historical studies point to Sichuan/Basuria as the cradle of early East Asian civilizations.

Chinese scholars have generally espoused the Yellow River origin hypothesis. I have cautioned that, at least for some of these scholars, this is nationalist propaganda implicitly encouraged by the P.R. China government. This strategic propaganda effort is now being forcefully pushed onto university-level archaeology and history education and research, following Xi Jinping's speeches on "forging the communal sense of the Chinese nation" and "sticking to the four confidences". The new directive spirit of the Chinese government now explicitly requires historical researchers to adhere to a narrative that presents the core of Chinese civilization as having developed independently in the Central Plain along the Yellow River, while absorbing bits and pieces of surrounding cultures in China. Naturally, whether it fits or not, linguists based in Chinese institutions must tailor their findings to match the Yellow River origin hypothesis for basically any culture-level phenomenon. Millet farming must have originated on the Yellow River, ST languages must have developed first on the Yellow River, bronze metallurgy must have developed there independently, and even Homo sapiens cannot have come out of Africa before China.

I do not suggest that all studies concluding in favor of the Yellow River hypothesis are propaganda; some clearly are not. The suspicious tendency appears only when a study's findings do not fully support the hypothesis, yet the authors proceed to that conclusion without hesitation. I am going to show the difference between two studies on the proto-ST homeland that reach the same conclusion, affirming the Yellow River hypothesis, but differ in tone and rigor. One was published in Scientific Reports in 2020 and the other as a Letter in Nature in 2019. Both studies have a lead author named Zhang, but the authors of the first (Zhang et al 2020) were based in the UK, while those of the second (Zhang et al 2019) were based in China.

To begin with, the phylogenetic method, whether it uses Bayesian statistical inference or classical frequentist model testing, is simply a tool for testing priors or hypotheses. Scholars first propose a family tree for the many members of a supposed family. Then, based on the presence or absence of shared traits (e.g. morphology, cognates), similar members are grouped together and claimed to share a common ancestor, represented by a node preceding them. The member most distant from all the rest is assumed to be only remotely related, or to have split off earliest. Among ST languages, this most distant member is often Chinese, which shares the fewest traits with the other ST languages. The immediate ancestor of Old Chinese is therefore proto-ST itself.
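The grouping logic just described can be shown with a toy, distance-based example. This is a minimal sketch and not the Bayesian method used in either study. It assumes four invented languages, A to D, and an invented cognate presence/absence matrix. Dissimilarity is measured as the number of mismatched slots, and the language with the largest average distance to the rest is flagged as the outlier.

```python
# Hypothetical presence (1) / absence (0) matrix of cognates: each column is one
# meaning slot. Languages A-D and all values are invented for illustration.
COGNATES = {
    "A": [1, 1, 0, 1, 0, 1],
    "B": [1, 1, 0, 1, 1, 1],
    "C": [1, 1, 1, 1, 1, 1],
    "D": [0, 0, 1, 0, 0, 1],
}


def hamming(x, y):
    """Count the meaning slots in which two languages differ."""
    return sum(a != b for a, b in zip(x, y))


def mean_distance(name, table):
    """Average Hamming distance from one language to every other language."""
    others = [k for k in table if k != name]
    return sum(hamming(table[name], table[o]) for o in others) / len(others)


def outlier(table):
    """Return the language sharing the fewest traits with the rest.

    A tree built from these distances would typically place this language on
    the first branch off the root. Nothing here measures real time.
    """
    return max(table, key=lambda k: mean_distance(k, table))


if __name__ == "__main__":
    for name in COGNATES:
        print(name, round(mean_distance(name, COGNATES), 2))
    print("outlier:", outlier(COGNATES))
```

With these invented data, A, B and C each average 2.33 mismatches and D averages 4.33, so D is reported as the outlier. The result only ranks languages by dissimilarity. On its own, it says nothing about which language is older.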

But the phylogenetic method was developed to apply the logic of biological evolution to empirical data. When birds share the fewest traits with a group of mammals, the phylogenetic tree implies that birds diverged before these mammals diversified, and that they share a common ancestor. It by no means implies that birds predate the ancestor of mammals. It only means that birds are closer to the bird-mammal common ancestor in branching order, not in real time. More importantly, biological species and their fossils are the actual embodiment of the anatomical traits used to identify commonalities and divergences. This is not the case for linguistic evolution. Birds cannot suddenly lose their wings without prolonged evolution and adaptation, but a language can change abruptly. A language that keeps the same label may undergo fundamental internal changes that almost completely alter its original morphological, typological, and semantic features. Old Chinese and Middle Chinese differ so greatly that natural evolution and contact cannot explain the differences. The gap is large enough that Schuessler and Benedict both suggested Old Chinese was spoken only by a minority of Zhou conquerors. In their view, it was imposed in name on the Kra-Dai, Trans-Eurasian, and Austronesian peoples who lived in the Central Plain during the 1st millennium BCE. Middle Chinese was a product of creolization that resembles Hmong-Mien and Kra-Dai more than other ST languages. Therefore, a biological species that shares the fewest traits with other species can justifiably be placed as a distant split-off in a phylogenetic tree, but a distant language cannot necessarily be placed the same way. The distinctness of Old Chinese from Tibeto-Burman languages may suggest that it was the first to split off (even so, this does not imply which is older). It may also mean, however, that Old Chinese is the strangest ST language because of creolization. That creolization would have arisen from contact with the possibly Tungusic-speaking Shang people and the myriad Kra-Dai and Hmong-Mien speakers living in the Central Plain.

After all, even if the recorded writing and pronunciation of Old Chinese faithfully reflect how the Zhou people spoke, phylogeny still does nothing to establish the direction of language evolution. Both phylogenetic studies affirm the Yellow River hypothesis only by pointing to a potential match between two dates: their estimated time of divergence within proto-ST and the time of millet farming. Both dates are highly uncertain, with wide confidence intervals. However, as many scholars have explained, and as I argued in an earlier post, the adoption of a technology does not equal the replacement of a population (the Zhang et al 2020 paper also concedes this point). The assumption of demic diffusion has recently come under critical scrutiny. Millet cultivation was transmitted discontinuously from modern Manchuria (the West Liao River) to the Yellow River, and only then spread to Sichuan and the Tibetan Plateau over a period of two to three thousand years (Hosner et al 2016). Given this, being an earlier culture to cultivate millet says nothing about whether that culture's population migrated to and replaced later adopters. The lack of shared etyma for millet in ST languages refutes the assumption that millet was very important for all ST peoples. Similarly, the pig is another hallmark species used to argue that Yellow River agriculturists initiated a demic diffusion of proto-ST. Unfortunately, just as with millet, Sagart (1999) found in The Roots of Old Chinese that "No outside comparisons have been found for a root *hlaj? 'pig'. It is possible that we are dealing with a Chinese innovation".

Finally, Jade Guedes's (2018) review of archaeological fieldwork in the eastern Tibetan Plateau and western Sichuan showed that millet farmers did not fully expand into this region. Instead, they passed millet and pottery technology on to the local foragers and hunters. Liu Chi-chun et al (2022) also questioned whether millet farmers could plausibly have penetrated the Himalayas within such a short period. The more realistic scenario, then, is that local foragers traded, fought, and intermarried with millet farmers, confining the latter to footholds in the low-elevation river valleys of the Plateau. Local foragers kept a versatile economic advantage over the millet farmers by combining multiple subsistence strategies, including hunting, subsistence farming, and herding. Thus, Neolithic sites in the eastern Plateau such as Karuo received Yangshao pottery and millet only to a limited extent and for a limited time. Millet farmers intermingled with the advantaged local foragers instead of replacing them, unlike what the Indo-Europeans did to prehistoric Europeans and the Indus River civilization. This peaceful coexistence scenario likely reflects the fact that the indigenous inhabitants of the eastern Tibetan Plateau were already cultural and genetic relatives of the Yangshao millet farmers before the latter arrived. Genetic studies of Y-haplogroup O-M134 attest to this possibility.

In short, the recent obsession with Bayesian phylogenetic modeling overlooks the fundamental problem of applying phylogenetic models to linguistic data. When we turn to the two studies mentioned above, some oddities emerge that leave us baffled as to how the Yellow River hypothesis can be affirmed in the face of such weak evidence.

The Zhang et al (2020) study adopted a more careful tone in linking the phylogenetic tree to the proto-ST homeland. Their main conclusion is that proto-Chinese split off from proto-ST around 8000 BP, much earlier than in any other study. This period overlaps with millet cultivation in northern China but predates the Yangshao culture by two to three millennia. That contradicts their own claim that millet farming marked the initial bifurcation within proto-ST. On the contrary, the Bayesian phylogenetic result that Chinese diverged earliest would support my hypothesis: Old Chinese speakers were the earliest to wander off from western Sichuan. They then intermingled with non-ST populations in eastern China, including proto-Tungusic millet cultivators distributed from the West Liao River to Shandong (Robbeets et al 2021). Millet was later borrowed back, slowly and gradually, by the proto-ST groups that remained in western Sichuan and the eastern Tibetan Plateau, without population replacement.

Bayesian models yield posterior probability distributions for the parameters they estimate. Like any statistical estimate, these come with an uncertainty range (a credible interval, the Bayesian counterpart of a confidence interval) that allows for randomness. An estimate of 8000 BP with a 95% interval ranging from 4079 to 11112 BP is simply too wide to be meaningfully matched and triangulated with any historical or archaeological evidence. If this estimate is taken at face value, any loose archaeological evidence of millet farming within that enormous span of 4 to 11 millennia BP would substantiate their assumption that millet farming coincided with the split of Chinese from proto-ST.
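The arithmetic behind this objection is simple. The minimal sketch below uses only the figures quoted in this article for the two studies. It computes the width of each 95% interval and checks whether a given date falls inside it. The 9000 BP date in the usage lines is an arbitrary hypothetical value, not a real archaeological date.

```python
# 95% interval bounds and point estimates for the proto-ST root age, in years BP,
# as quoted in this article: (lower bound, point estimate, upper bound).
ESTIMATES = {
    "Zhang et al 2020": (4079, 8000, 11112),
    "Zhang et al 2019": (4100, 6000, 7800),
}


def width(estimate):
    """Span of the 95% interval in years."""
    low, _, high = estimate
    return high - low


def falls_inside(date_bp, estimate):
    """True if an event dated to `date_bp` years BP lies within the interval."""
    low, _, high = estimate
    return low <= date_bp <= high


if __name__ == "__main__":
    for study, est in ESTIMATES.items():
        print(f"{study}: {est[1]} BP, interval spans {width(est)} years")
    # A hypothetical event date, chosen only to show how loose the match is.
    print(falls_inside(9000, ESTIMATES["Zhang et al 2020"]))
    print(falls_inside(9000, ESTIMATES["Zhang et al 2019"]))
```

The 2020 interval spans 7033 years and the 2019 interval spans 3700 years. Almost any dated millet site from the 5th to the 11th millennium BP would therefore "match" the 2020 estimate.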

Another problem is that the phylogeny in this study, as elsewhere, was built only on cognates; no morphological traits were considered. Randy LaPolla warned against using cognates alone to establish the Stammbaum of ST languages in the edited book Areal Diffusion and Genetic Inheritance. He cautioned that ST is notorious for lexical contagion and borrowing with the neighboring Hmong-Mien, Austroasiatic, Kra-Dai, and Indo-Aryan languages. Laurent Sagart (1999) expressed a similar view: because of areal proximity, many perceived cognates are actually loanwords between Chinese and Tibeto-Burman, or other languages. Therefore, phylogenetic studies based only on cognates, without also examining how morphological traits diverge among ST languages, are likely to be inherently biased by areality. This leads to the fundamental issue with phylogenetic studies of language: phylogenies cannot account for areal proximity. Old Chinese may appear as the earliest divergence from proto-ST not because of its own evolutionary path and mutations. It may appear that way simply because Old Chinese was the farthest branch, reaching deep into areas of non-ST languages!
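The areal objection can be made concrete with a toy example. Assume, purely for illustration, four invented languages, W to Z, that retain nearly the same inherited cognates. Suppose one of them (here Z) replaces three inherited words with loans from non-ST neighbors. A count of shared cognates then singles it out as the most isolated member, even though its branching order has not changed. The data, the labels, and the coding of a loan as 0 are all hypothetical.

```python
# Invented data: four languages (W, X, Y, Z) over six meaning slots, where
# 1 = inherited Sino-Tibetan cognate retained, 0 = cognate not retained.
INHERITED = {
    "W": [1, 1, 1, 1, 1, 1],
    "X": [1, 1, 1, 1, 1, 0],
    "Y": [1, 1, 1, 1, 0, 1],
    "Z": [1, 1, 1, 0, 1, 1],
}


def with_loans(table, language, slots):
    """Return a copy of `table` in which `language` has replaced the inherited
    word in each of `slots` with a loan from a non-ST neighbor (coded 0)."""
    result = {name: list(row) for name, row in table.items()}
    for i in slots:
        result[language][i] = 0
    return result


def shared(x, y):
    """Number of slots in which both languages retain the inherited cognate."""
    return sum(1 for a, b in zip(x, y) if a == 1 and b == 1)


def total_shared(name, table):
    """Cognates shared by one language, summed over all the others."""
    return sum(shared(table[name], table[o]) for o in table if o != name)


def most_isolated(table):
    """Language sharing the fewest cognates with the rest of the table."""
    return min(table, key=lambda k: total_shared(k, table))


if __name__ == "__main__":
    contact = with_loans(INHERITED, "Z", [0, 1, 2])
    print({k: total_shared(k, INHERITED) for k in INHERITED})
    print({k: total_shared(k, contact) for k in contact})
    print("most isolated after borrowing:", most_isolated(contact))
```

Before borrowing, X, Y and Z tie with 13 shared cognates each. After Z takes three loans, its total drops to 4 while the others stay at 10 or above, so the count isolates Z for reasons of contact alone.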

For the above reasons, the authors themselves were hesitant: "This calls for a more cautious interpretation of the inferred root age, and a more nuanced understanding of the first Sino-Tibetan speakers than 'out-migrating farmers'. Farming in East Asia may have spread gradually through the mixing of farmers and hunter-gatherers". This nuanced interpretation actually aligns with Guedes (2018). We have reason to ask a pointed question. If millet farmers arrived extremely slowly over a long period of time, how could proto-ST have originated among them and still given rise to the large number of ST languages spread across southwestern China, the Himalayas, and Burma, all without replacing the local populations? Again, the more realistic scenario is that proto-ST speakers already lived in the eastern Himalayas/western Sichuan before Yangshao millet farmers arrived in the 6th millennium BP.

The other study, Zhang et al (2019), suffers from the same problem inherent in phylogenetic studies. The authors built a phylogenetic tree with Chinese as the outlier, the first branch to split off from proto-ST. They estimated the onset of proto-ST at 6000 BP, with a 95% interval of 4100–7800 BP. Because this onset coincided with the time of millet cultivation, the authors claimed that the Yellow River origin hypothesis was correct. Millet cultivation in the West Liao River region has now been shown to date much earlier than 6000 BP. Even setting that aside, the logical link between food production and the formation of a language is very arbitrary. As discussed above, the expansion of millet- and pig-based agriculture was far more gradual and discontinuous than previously believed.

The authors also produced a probability density map of the possible ST homeland, and the result, surprisingly, points to western Sichuan/the eastern Himalayas. They claimed that two of the map's conditions were not met, so their own map could not be taken seriously. But their explanations of why these two conditions were violated were very ad hoc. First, the language extinction that occurred in eastern China did not primarily affect ST languages; it affected ancient Hmong-Mien and Austroasiatic languages. Second, the premise that all ST groups migrated in the same direction is itself an assumption that follows from the Yellow River origin hypothesis. In fact, under the Sichuan origin hypothesis, some ST groups migrated north and east to become the Chinese and Tangut. Others migrated south through river valleys into Burma and northeastern India. Still others migrated west to form the Bodic group and then moved farther south across the Himalayas to form various ST groups in Nepal and northeastern India. Ironically, the map made by these authors, who espouse the Yellow River hypothesis, points to western Sichuan as the homeland of proto-ST.

Written by Yusuf Basurian