The Dark Arts: How We Know What We Know

If you’ve been following us at the HLC, and especially our Fun Etymologies every Tuesday, you will have noticed that we often reference old languages: the Old English of Beowulf[1], the Latin of Cicero and Seneca, the Ancient Greek of Homer, and in the future (spoiler alert!), even the Classical Chinese of Confucius, the Babylonian of Hammurabi, or the Egyptian of Ramses. These languages all have extensive written records, which allows us to know them pretty much as if they were still spoken today, with maybe a few little doubts here and there for the older ones[2].

Egyptians might have had a bit TOO great of a passion for writing, if you catch my drift

But occasionally, you’ve seen us reference much, much older languages: one in particular stands out, and it’s called Proto-Indo-European (often shortened to PIE). If you’ve read our post on language families, you’re probably wearily familiar with it by now. However, here’s the problem: the language is thought to be at least 6,000 years old, and by some estimates closer to 10,000! And writing was invented “just” 5,000 years ago, nowhere near where PIE was spoken. So, you may be asking, how the heck do we know what that language looked like, or if it even existed at all? And what do all those asterisks (as in *ekwom or *wlna) I see on the Fun Etymologies each week mean? Well, buckle up, dear readers, because the HLC will finally reveal it all: the dark magic that makes Historical Linguistics work. It’s time to take a look at…

The Comparative Method of Linguistic Reconstruction

“Linguistic history is basically the darkest of the dark arts, the only means to conjure up the ghosts of vanished centuries.”

-Cola Minis, 1952

If we historical linguists had to go only by written records, we would be wading in shallow waters indeed: the oldest known written language, Sumerian, is only just about 5,000 years old.

The oldest joke we know of is in Sumerian. It’s a fart joke. Humanity never changes.

Wait, “only just”?? Well, consider that modern humans are at least 300,000 years old, and that some theories put the origins of language closer to a million years ago. You could fit the whole of history from the Sumerians to us 200 times in that and still have time to spare!

So, while writing is usually thought of as one of the oldest things we have, it is actually a pretty recent invention in the grand scheme of things. For centuries, it was simply taken for granted that language had appeared out of nowhere a few millennia in the past, usually as a gift from some god or other: in Chinese mythology, the invention of language was attributed to an ancient god-king named Fuxi (approximately pronounced “foo-shee”), while in Europe it was pretty much considered obvious that ancient Hebrew was the first language of humankind, and that the proliferation of languages in the world was explained by the biblical story of the Tower of Babel.

Imagine your surprise when the guy who was supposed to pass you the trowel suddenly started speaking Vietnamese

This (and pretty much everything else) changed during the 18th century, with the dawn of the Age of Enlightenment. During this age of bold exploration (and less savoury things done to the people found in the newly “discovered” regions), scholars started to notice something curious: wholly different languages presented interesting similarities with one another and, crucially, could be grouped together based on these similarities. If all the different languages of Earth had truly been created out of nothing on the same day, you would not expect to see such patterns at all.

In what is widely considered to be the founding document of historical linguistics, Sir William Jones, an English scholar living in India in 1786, writes:

“The Sanskrit language, whatever be its antiquity, is of a wonderful structure; more perfect than the Greek, more copious than the Latin, and more exquisitely refined than either, yet bearing to both of them a stronger affinity, both in the roots of the verbs and in the forms of the grammar, than could possibly have been produced by accident; so strong indeed, that no philologer could examine them all three, without believing them to have sprung from some common source, which, perhaps, no longer exists […]”

That source is, of course, PIE. But, again, how can we guess what that language sounded like? People at the time were too busy herding sheep and domesticating horses to worry about paltry stuff like writing.

Enter Jacob Grimm[3] and his Danish colleague Rasmus Rask. They noticed that the similarities between their native German and Danish languages, and other close languages (what we call the Germanic family today), were not only evident, but predictable: if you know how a certain word sounds in one language, you can predict with a reasonable degree of accuracy how its equivalent (or cognate) sounds in another. But their truly revolutionary discovery was that if you carefully compared these changes, you could make an educated guess as to what the sounds and grammar of their common ancestor language were. That’s because the changes that happen to a language over time are mostly regular and predictable. Think how lucky that is! If sounds in a language changed on a random basis, we would have no way of even guessing what any language before Sumerian looked like!

More like HANDSOME and Gretel, amirite?

This was the birth of the comparative method of linguistic reconstruction (simply known as “the comparative method” to friends), the heart of historical linguistics and probably the linguistic equivalent of Newton’s laws of motion or Darwin’s theory of evolution when it comes to world-changing power.

Here, in brief, is how it works:

How the magic happens

So, do we just look at a couple of different languages and guess what their ancestor looked like? Well, it’s a bit more complicated than that. A lot more, in fact.

Not to rain on everyone’s parade before we even begin, but the comparative method is a long, difficult and extremely tedious process, which involves comparing thousands upon thousands of items and keeping reams of notes that would make the Burj Khalifa look like a molehill if stacked on top of each other.

The Burj Khalifa, for reference

What you need to do to reconstruct your very own proto-language is this:

  1. Take a sample of languages you’re reasonably sure are related, the larger the better. The more languages you have in your sample, the more accurate your reconstruction will be, since you might find out features which only a few languages (or even only one!) have retained, but which have disappeared in the others.
  2. Find out which sounds correspond to which in each language. If you do this with a Romance language and a Germanic one, you’ll find that Germanic “f” sounds pretty reliably correspond to Romance “p” sounds (for instance, in the cognate pair padre and father). When you find a correspondence, it usually means that there is an ancestral sound underlying it.
  3. Reconstruct the ancestral sound. This is the trickiest part: there are a few rules of thumb which we linguists follow to get an accurate reconstruction. For example, if most languages in a sample have one sound rather than another, it’s more probable that that is the ancestral sound. Another criterion is that certain sound changes happen more frequently than others cross-linguistically (across many languages), and are therefore more probable. For example, /p/ becoming /f/ is far more likely than /f/ becoming /p/, for reasons I won’t get into here. That means that in our padre/father pair above, it’s more likely that “p” is the ancestral sound (and it is! The PIE root is *ph2tér[4]). Finally, between two proposed ancestral sounds, the one whose evolution requires the fewest steps is usually the more likely one.
  4. Check that your result is plausible. Is it in accordance with what is generally known about the phonetics and phonology of the language family you’re studying? Does it present some very bizarre or unlikely sounds or phonotactics? Be sure to account for all instances of borrowing, coincidences and scary German-named stuff like Sprachbunds[5]. If you’ve done all that, congratulations! You have an educated guess of what some proto-language might have sounded like! Now submit it to a few journals and see it taken down by three different people, together with your self-esteem.[6]

But how do we know this process works? What if we’re just inventing a language which just so happens to look similar to all the languages we have in our sample, but which has nothing to do with what any hypothetical ancestor language of theirs would have looked like?

Well, the first linguists asked these very same questions, and did a simple experiment, which you can do at home yourself[7]: they took many of the modern Romance languages, pooled them together, and tried the method on them. The result was a very good approximation of Vulgar Latin.

So the method works! Well, it works up to a certain point. See, while the comparative method is powerful, it has its limits. Notice how in the paragraph above I specified that it yielded a very good approximation of Vulgar Latin. You see, sometimes some features of a language get lost in all of its descendants, and there’s no way for us linguists to know they even existed! One example of this is the final consonant sounds in Classical Latin (for example, the -us and -um endings, as in “lupus” and “curriculum”), which were lost in all the modern Romance languages, and are therefore very difficult to reconstruct[8]. What this means is that the further back in time you go, the less precise your guess becomes, until you’re at a level of guesswork so high it’s effectively indistinguishable from pulling random sounds out of a bag (i.e. utterly useless). That’s why, to our eternal disappointment, we can’t use the comparative method to go back indefinitely in the history of language[9].

When you use the comparative method, you must always keep in mind that what you end up with is not 100% mathematical truth, but just an approximation, sometimes a very crude one. That’s what all the asterisks are for: in historical linguistics, an asterisk before a word basically means that the word is reconstructed, and that it should therefore be taken with a pinch of salt[10].
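If you like seeing things spelled out, here is a toy sketch in Python of steps 2 and 3 of the recipe above. It is very much a simplification and not how real reconstruction is done: it only looks at the first consonant of three well-known cognate sets, it assumes the words have already been lined up by hand, and the table of “common” sound changes and the cost numbers are made up for illustration. All the names in it (COGNATE_INITIALS, COMMON_CHANGES, correspondence_sets, reconstruct) are hypothetical.

```python
from collections import Counter

# Toy data: the first consonant of a few well-known cognates, in the order
# Latin, Greek, Sanskrit, English, Gothic. Real work compares whole words
# across thousands of items.
COGNATE_INITIALS = {
    "father": ("p", "p", "p", "f", "f"),  # pater, pater, pitar, father, fadar
    "foot":   ("p", "p", "p", "f", "f"),  # pes, pous, pad-, foot, fotus
    "three":  ("t", "t", "t", "θ", "θ"),  # tres, treis, trayas, three, þreis
}

# Illustrative assumption: (older sound, newer sound) pairs treated as
# cross-linguistically common changes. This is not a real database.
COMMON_CHANGES = {("p", "f"), ("t", "θ")}


def correspondence_sets(data):
    """Step 2: count how often each pattern of sounds recurs in the sample."""
    return Counter(data.values())


def reconstruct(pattern):
    """Step 3: pick the ancestral sound that needs the cheapest set of changes."""
    counts = Counter(pattern)

    def cost(candidate):
        total = 0
        for sound in pattern:
            if sound == candidate:
                continue  # this language kept the ancestral sound
            # A common change is cheap, an unusual one is expensive.
            # The weights 1 and 3 are arbitrary.
            total += 1 if (candidate, sound) in COMMON_CHANGES else 3
        return total

    # Lowest cost wins; ties go to the more widespread sound (majority rule).
    best = min(counts, key=lambda c: (cost(c), -counts[c]))
    return "*" + best  # the asterisk marks a reconstructed form


if __name__ == "__main__":
    for pattern, seen in correspondence_sets(COGNATE_INITIALS).items():
        print(pattern, "seen", seen, "time(s) ->", reconstruct(pattern))
```

With this data the sketch settles on *p for the father and foot sets and *t for the three set. Note that it counts English and Gothic as two separate votes for “f”, even though both are Germanic and inherited the change from a single common ancestor; a real reconstruction would take the family tree into account, which is exactly the kind of thing step 4 is there to catch.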

The End

And so, now you know how we historical linguists work our spells of time travel and find out what the languages of Bronze Age people sounded like. It’s tedious work, and very frustrating, but the results are well worth the suffering and the toxic-level intake of caffeine necessary to carry it out. The beauty of all this is that it doesn’t only work with sounds: it has been applied to morphology too, and in recent years we’ve finally been getting the knack of how to apply it to syntax as well! Isn’t that exciting?

It certainly is for us.

Stay tuned for next week, when we’ll dive into the law that started it all: Grimm’s law!

  1. P.S. Remember that Fun Etymology we did on the word “bear”? Yeah, “Beowulf” is another of those non-god-angering Germanic taboo names for bear! It literally means “bee-wolf”.
  2. Or even some big ones: we know very little about how Egyptian vowels were pronounced and where to put them in words, for example.
  3. Yes, the same guy who wrote the fairy tale books, together with his brother.
  4. I won’t explain the “h2” thing, because that opens a whole other can of worms we haven’t time to dive into here.
  5. We’ll talk about these in a future post.
  6. This doesn’t always happen. Usually.
  7. And it doesn’t involve any explosives or dangerous substances, only long, sleepless nights and the potential for soul-crushing boredom. Hooray!
  8. I don’t say “impossible”, because in some cases a sound lost in all descendant languages can be reconstructed thanks to its influence on neighbouring sounds, or (as in the case of Latin) by comparing with different branches of the family. But this is, like, super advanced über-linguistics.
  9. Which would instantly solve a lot of problems, believe me.
  10. Historical linguistics is an exception here. In most other fields of linguistics, the asterisk means “whatever follows is grammatically impossible”.
