The Dark Arts: How We Know What We Know

If you’ve been following us at the HLC, and especially our Fun Etymologies every Tuesday, you will have noticed that we often reference old languages: the Old English of Beowulf[1], the Latin of Cicero and Seneca, the Ancient Greek of Homer, and in the future (spoiler alert!), even the Classical Chinese of Confucius, the Babylonian of Hammurabi, or the Egyptian of Ramses. These languages all have extensive written records, which allows us to know them pretty much as if they were still spoken today, with maybe a few little doubts here and there for the older ones[2].

Egyptians might have had a bit TOO great of a passion for writing, if you catch my drift

But occasionally, you’ve seen us reference much, much older languages: one in particular stands out, and it’s called Proto-Indo-European (often shortened to PIE). If you’ve read our post on language families, you’re probably wearily familiar with it by now. However, here’s the problem: the language is 10,000 years old! And writing was invented “just” 5,000 years ago, nowhere near where PIE was spoken. So, you may be asking, how the heck do we know what that language looked like, or if it even existed at all? And what do all those asterisks (as in *ekwom or *wlna) I see on the Fun Etymologies each week mean? Well, buckle up, dear readers, because the HLC will finally reveal it all: the dark magic that makes Historical Linguistics work. It’s time to take a look at…

The Comparative Method of Linguistic Reconstruction

“Linguistic history is basically the darkest of the dark arts, the only means to conjure up the ghosts of vanished centuries.”

-Cola Minis, 1952

If we historical linguists had to go only by written records, we would be wading in shallow waters indeed: the oldest known written language, Sumerian, is only just about 5,000 years old.

The oldest joke we know of is in Sumerian. It’s a fart joke. Humanity never changes.

Wait, “only just”?? Well, consider that modern humans have been around for at least 300,000 years, and that some theories put the origins of language closer to a million years ago. You could fit the whole of history from the Sumerians to us some 200 times into that span!

So, while writing is usually thought of as one of the oldest things we have, it is actually a pretty recent invention in the grand scheme of things. For centuries, it was simply taken for granted that language had appeared out of nowhere a few millennia in the past, usually as a gift from some god or other: in Chinese mythology, the invention of language was attributed to an ancient god-king named Fuxi (approximately pronounced “foo-shee”), while in Europe it was pretty much considered obvious that ancient Hebrew was the first language of humankind, and that the proliferation of languages in the world was explained by the biblical story of the Tower of Babel.

Imagine your surprise when the guy who was supposed to pass you the trowel suddenly started speaking Vietnamese

This (and pretty much everything else) changed during the 18th century, with the dawn of the Age of Enlightenment. During this age of bold exploration (and less savoury things done to the people found in the newly “discovered” regions), scholars started to notice something curious: wholly different languages presented interesting similarities with one another and, crucially, could be grouped together based on these similarities. If all the different languages of Earth had truly been created out of nothing on the same day, you would not expect to see such patterns at all.

In what is widely considered to be the founding document of historical linguistics, Sir William Jones, an English scholar living in India in 1786, writes:

“The Sanskrit language, whatever be its antiquity, is of a wonderful structure; more perfect than the Greek, more copious than the Latin, and more exquisitely refined than either, yet bearing to both of them a stronger affinity, both in the roots of the verbs and in the forms of the grammar, than could possibly have been produced by accident; so strong indeed, that no philologer could examine them all three, without believing them to have sprung from some common source, which, perhaps, no longer exists […]”

That source is, of course, PIE. But, again, how can we guess what that language sounded like? People at the time were too busy herding sheep and domesticating horses to worry about paltry stuff like writing.

Enter Jacob Grimm[3] and his Danish colleague Rasmus Rask. They noticed that the similarities between their native German and Danish languages, and other close languages (what we call the Germanic family today), were not only evident, but predictable: if you know how a certain word sounds in one language, you can predict with a reasonable degree of accuracy how its equivalent (or cognate) sounds in another. But their truly revolutionary discovery was that if you carefully compared these changes, you could make an educated guess as to what the sounds and grammar of their common ancestor language were. That’s because the changes that happen to a language over time are mostly regular and predictable. Think how lucky that is! If sounds in a language changed on a random basis, we would have no way of even guessing what any language before Sumerian looked like!

More like HANDSOME and Gretel, amirite?

This was the birth of the comparative method of linguistic reconstruction (simply known as “the comparative method” to friends), the heart of historical linguistics and probably the linguistic equivalent of Newton’s laws of motion or Darwin’s theory of evolution when it comes to world-changing power.

Here, in brief, is how it works:

How the magic happens

So, do we just look at a couple of different languages and guess what their ancestor looked like? Well, it’s a bit more complicated than that. A lot more, in fact.

Not to rain on everyone’s parade before we even begin, but the comparative method is a long, difficult and extremely tedious process, which involves comparing thousands upon thousands of items and keeping reams of notes that would make the Burj Khalifa look like a molehill if stacked on top of each other.

The Burj Khalifa, for reference

What you need to do to reconstruct your very own proto-language is this:

  1. Take a sample of languages you’re reasonably sure are related, the larger the better. The more languages you have in your sample, the more accurate your reconstruction will be, since you might find out features which only a few languages (or even only one!) have retained, but which have disappeared in the others.
  2. Find out which sounds correspond to which in each language. If you do this with a Romance language and a Germanic one, you’ll find that Germanic “f” sounds pretty reliably correspond to Romance “p” sounds (as in the cognate pair padre and father). When you find a correspondence, it usually means that there is an ancestral sound underlying it.
  3. Reconstruct the ancestral sound. This is the trickiest part: there are a few rules which we linguists follow to get an accurate reconstruction. For example, if most languages in a sample have one sound rather than another, it’s more probable that that is the ancestral sound. Another criterion is that certain sound changes happen more frequently than others cross-linguistically (across many languages), and are therefore more probable. For example, /p/ becoming /f/ is far more likely than /f/ becoming /p/, for reasons I won’t get into here. That means that in our padre/father pair above, it’s more likely that “p” is the ancestral sound (and it is! The PIE root is *ph2tér[4]). Finally, between two proposed ancestral sounds, the one whose evolution requires the fewest steps is usually the more likely one.
  4. Check that your result is plausible. Is it in accordance with what is generally known about the phonetics and phonology of the language family you’re studying? Does it present some very bizarre or unlikely sounds or phonotactics? Be sure to account for all instances of borrowing, coincidences and scary German-named stuff like Sprachbunds[5]. If you’ve done all that, congratulations! You have an educated guess of what some proto-language might have sounded like! Now submit it to a few journals and see it taken down by three different people, together with your self-esteem.[6]

But how do we know this process works? What if we’re just inventing a language which just so happens to look similar to all the languages we have in our sample, but which has nothing to do with what any hypothetical ancestor language of theirs would have looked like?

Well, the first linguists asked these very same questions, and did a simple experiment, which you can do at home yourself[7]: they took many of the modern Romance languages, pooled them together, and tried the method on them. The result was a very good approximation of Vulgar Latin.
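If you’d like a taste of that home experiment, here is a very rough Python sketch of steps 2 and 3 of the recipe, tried on the Romance words for “moon”. It leans on some loud assumptions: spelling stands in for actual sounds, the words were lined up by hand, and only the “majority wins” criterion is used (a real reconstruction would also weigh how natural each sound change is and how many steps it needs). The names `GAP`, `MOON`, `correspondence_sets` and `reconstruct` are made up for this illustration.

```python
from collections import Counter

GAP = "-"  # placeholder for a sound that a language has lost

# Hand-aligned spellings of "moon" in five Romance languages.
# ASSUMPTION: spelling is used as a stand-in for pronunciation, and the
# alignment was done by hand; real work uses phonetic transcriptions.
MOON = {
    "Spanish":    ["l", "u", "n", "a"],  # luna
    "Italian":    ["l", "u", "n", "a"],  # luna
    "Portuguese": ["l", "u", GAP, "a"],  # lua
    "French":     ["l", "u", "n", "e"],  # lune
    "Romanian":   ["l", "u", "n", "ă"],  # lună
}


def correspondence_sets(cognates):
    """Step 2: line the languages up, one tuple of sounds per position."""
    return list(zip(*cognates.values()))


def reconstruct(cognates):
    """Step 3 (majority criterion only): keep the commonest sound per position."""
    proto = []
    for column in correspondence_sets(cognates):
        sound, _count = Counter(column).most_common(1)[0]
        if sound != GAP:  # a majority of gaps means "nothing to reconstruct here"
            proto.append(sound)
    # The leading asterisk marks the form as reconstructed, not attested.
    return "*" + "".join(proto)


print(reconstruct(MOON))
```

On this tiny sample the sketch prints *luna, which lines up nicely with Latin luna. With messier data, a bare majority vote can easily pick the wrong sound, which is exactly why the other criteria in step 3 exist.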

Well, it works up to a certain point. See, while the comparative method is powerful, it has its limits. Notice how in the paragraph above I specified that it yielded a very good approximation of Vulgar Latin. You see, sometimes some features of a language get lost in all of its descendants, and there’s no way for us linguists to know they even existed! One example of this is the final consonant sounds in Classical Latin (for example, the -us and -um endings, as in “lupus” and “curriculum”), which were lost in all the modern Romance languages, and are therefore very difficult to reconstruct[8]. What this means is that the further back in time you go the less precise your guess becomes, until you’re at a level of guesswork so high it’s effectively indistinguishable from pulling random sounds out of a bag (i.e. utterly useless). That’s why, to our eternal disappointment, we can’t use the comparative method to go back indefinitely in the history of language[9].

When you use the comparative method, you must always keep in mind that what you end up with is not 100% mathematical truth, but just an approximation, sometimes a very crude one. That’s what all the asterisks are for: in historical linguistics, an asterisk before a word basically means that the word is reconstructed, and that it should therefore be taken with a pinch of salt[10].

The End

And so, now you know how we historical linguists work our spells of time travel and find out what the languages of Bronze Age people sounded like. It’s tedious work, and very frustrating, but the results are well worth the suffering and the toxic-level intake of caffeine necessary to carry it out. The beauty of all this is that it doesn’t only work with sounds: it has been applied to morphology as well, and in recent years we’ve finally been getting the knack of how to apply it to syntax too! Isn’t that exciting?

It certainly is for us.

Stay tuned for next week, when we’ll dive into the law that started it all: Grimm’s law!

  1. P.S. Remember that Fun Etymology we did on the word “bear”? Yeah, “Beowulf” is another of those non-god-angering Germanic taboo names for bear! It literally means “bee-wolf”.
  2. Or even some big ones: we know very little about how Egyptian vowels were pronounced and where to put them in words, for example.
  3. Yes, the same guy who wrote the fairy tale books, together with his brother.
  4. I won’t explain the “h2” thing, because that opens a whole other can of worms we haven’t time to dive into here.
  5. We’ll talk about these in a future post.
  6. This doesn’t always happen. Usually.
  7. And it doesn’t involve any explosives or dangerous substances, only long, sleepless nights and the potential for soul-crushing boredom. Hooray!
  8. I don’t say “impossible”, because in some cases a sound lost in all descendant languages can be reconstructed thanks to its influence on neighbouring sounds, or (as in the case of Latin) by comparing with different branches of the family. But this is, like, super advanced über-linguistics.
  9. Which would instantly solve a lot of problems, believe me.
  10. Historical linguistics is an exception here. In most other fields of linguistics, the asterisk means “whatever follows is grammatically impossible”.

The Sapir-Whorf Hypothesis

 

“the Sapir-Whorf hypothesis is the theory that the language you speak determines how you think”

 

So says the fictional linguist Louise Banks (ably played by Amy Adams) in the sci-fi flick ‘Arrival’ (2016). The movie’s plot relies rather heavily on the Sapir-Whorf hypothesis, also known as the principle of linguistic relativity; so heavily, in fact, that the entire plot would be undone without it.

But what is the Sapir-Whorf hypothesis, really? Before digging into why ‘Arrival’ may have gotten it a bit… well, off, a word of caution: If you haven’t seen the movie (and intend to do so), go ahead and do that before reading the rest of this post because there will be SPOILERS!!!


Now that you have been duly warned, let’s get going.

The Sapir-Whorf hypothesis is, in a way, what Louise Banks describes: it is in part a hypothesis claiming that language determines the way you think. This idea is called linguistic determinism and is actually only one half of the Sapir-Whorf hypothesis.

Commonly known as the “strong” version of Sapir-Whorf, linguistic determinism holds that language limits and determines cognitive categories, thereby limiting our worldview to that which can be described in the words of whatever language we speak. Our worldview, and our way of thinking, is thus determined by our language.

That sounds pretty technical, so let’s use the example provided by ‘Arrival’:

The movie’s plot revolves around aliens coming to earth, speaking a language that is completely unknown to mankind. To try to figure out what they want, the movie linguist is called in. She manages to figure out their language pretty quickly (of course), realising that they think of time in a non-linear way.

This is quite a concept for a human to grasp since our idea of time is very linear. In western societies, we commonly think of time as a timeline going from left to right, as below.


 

Let’s say that we are currently at point C of our timeline. We can probably all agree that, as humans, we cannot go back in time to point A, right? However, in ‘Arrival’, we are given the impression that the reason we can’t do that is that our language doesn’t let us think about time in a non-linear way. That is, because our language doesn’t allow us, we can’t go back in time. Sounds a bit wonky, doesn’t it?

Well, you might be somewhat unsurprised to hear that this “strong” version has been discredited in linguistics for quite some time now and, for most modern-day linguists, it is a bit silly. Yet, we can’t claim that language doesn’t influence our way of thinking, can we?

Consider the many bi/multilinguals who have stated that they feel kinda like a different person when speaking their second language. If you’ve never met one, we bilinguals at the HLC can vouch for that.

Why would they feel that way, if language doesn’t affect our way of thinking? Well, of course, language does affect our way of thinking; it just doesn’t determine it. This is the ‘weak’ version of Sapir-Whorf, also known as linguistic relativism.

The weak version may be somewhat more palatable to you (and us): it holds that language influences our way of thinking but does not determine it. Think about it: if someone were to point out a rainbow to you and you had no word for the color red, you would still be able to perceive that that color was different from the others.

If someone were to discover a brand-new color (somewhat mind-boggling, I know, but just consider that), you would be able to explain that this is a color for which you have no word but you would still be able to see it just fine.

That might be the clearest distinction between linguistic determinism and linguistic relativism: the former would claim that you wouldn’t be able to perceive the color, while the latter would say that you’ll see it just fine; you just don’t have a word for it.

So, while ‘Arrival’ was (at least in my opinion) a pleasant waste of time, when it comes to the linguistics of it, I’d just like to say:


(Oh, and on a side note, the name of the hypothesis, Sapir-Whorf, is actually quite misleading, since Sapir and Whorf never made a collaborative effort to formalise the hypothesis.)

Tune in for more linguistic stuff next week when the marvellous Rebekah will dive into the phonology of consonants (trust me, you have a treat coming)!