The Dark Arts: How We Know What We Know

If you’ve been following us at the HLC, and especially our Fun Etymologies every Tuesday, you will have noticed that we often reference old languages: the Old English of Beowulf[1], the Latin of Cicero and Seneca, the Ancient Greek of Homer, and in the future (spoiler alert!), even the Classical Chinese of Confucius, the Babylonian of Hammurabi, or the Egyptian of Ramses. These languages all have extensive written records, which allows us to know them pretty much as if they were still spoken today, with maybe a few little doubts here and there for the older ones[2].

Egyptians might have had a bit TOO great of a passion for writing, if you catch my drift

But occasionally, you’ve seen us reference much, much older languages: one in particular stands out, and it’s called Proto-Indo-European (often shortened to PIE). If you’ve read our post on language families, you’re probably wearily familiar with it by now. However, here’s the problem: the language is at least 6,000 years old! And writing was invented “just” 5,000 years ago, nowhere near where PIE was spoken. So, you may be asking, how the heck do we know what that language looked like, or if it even existed at all? And what do all those asterisks (as in *ekwom or *wlna) on the Fun Etymologies each week mean? Well, buckle up, dear readers, because the HLC will finally reveal it all: the dark magic that makes Historical Linguistics work. It’s time to take a look at…

The Comparative Method of Linguistic Reconstruction

“Linguistic history is basically the darkest of the dark arts, the only means to conjure up the ghosts of vanished centuries.”

-Cola Minis, 1952

If we historical linguists had to go only by written records, we would be wading in shallow waters indeed: the oldest known written language, Sumerian, is only just about 5,000 years old.

The oldest joke we know of is in Sumerian. It’s a fart joke. Humanity never changes.

Wait, “only just”?? Well, consider that modern humans are at least 300,000 years old, and that some theories put the origins of language closer to a million years ago. You could fit the whole of history from the Sumerians to us 200 times in that and still have time to spare!

So, while writing is usually thought of as one of the oldest things we have, it is actually a pretty recent invention in the grand scheme of things. For centuries, it was taken for granted that language had simply appeared out of nowhere a few millennia earlier, usually as a gift from some god or other: in Chinese mythology, the invention of language was attributed to an ancient god-king named Fuxi (approximately pronounced “foo-shee”), while in Europe it was pretty much considered obvious that ancient Hebrew was the first language of humankind, and that the proliferation of languages in the world was explained by the biblical story of the Tower of Babel.

Imagine your surprise when the guy who was supposed to pass you the trowel suddenly started speaking Vietnamese

This (and pretty much everything else) changed during the 18th century, with the dawn of the Age of Enlightenment. During this age of bold exploration (and less savoury things done to the people found in the newly “discovered” regions), scholars started to notice something curious: wholly different languages presented interesting similarities with one another and, crucially, could be grouped together based on these similarities. If all the different languages of Earth had truly been created out of nothing on the same day, you would not expect to see such patterns at all.

In what is widely considered to be the founding document of historical linguistics, Sir William Jones, an English scholar living in India in 1786, writes:

“The Sanskrit language, whatever be its antiquity, is of a wonderful structure; more perfect than the Greek, more copious than the Latin, and more exquisitely refined than either, yet bearing to both of them a stronger affinity, both in the roots of the verbs and in the forms of the grammar, than could possibly have been produced by accident; so strong indeed, that no philologer could examine them all three, without believing them to have sprung from some common source, which, perhaps, no longer exists […]”

That source is, of course, PIE. But, again, how can we guess what that language sounded like? People at the time were too busy herding sheep and domesticating horses to worry about paltry stuff like writing.

Enter Jacob Grimm[3] and his Danish colleague Rasmus Rask. They noticed that the similarities between their native German and Danish languages, and other close languages (what we call the Germanic family today), were not only evident, but predictable: if you know how a certain word sounds in one language, you can predict with a reasonable degree of accuracy how its equivalent (or cognate) sounds in another. But their truly revolutionary discovery was that if you carefully compared these changes, you could make an educated guess as to what the sounds and grammar of their common ancestor language were. That’s because the changes that happen to a language over time are mostly regular and predictable. Think how lucky that is! If sounds in a language changed on a random basis, we would have no way of even guessing what any language before Sumerian looked like!

More like HANDSOME and Gretel, amirite?

This was the birth of the comparative method of linguistic reconstruction (simply known as “the comparative method” to friends), the heart of historical linguistics and probably the linguistic equivalent of Newton’s laws of motion or Darwin’s theory of evolution when it comes to world-changing power.

Here, in brief, is how it works:

How the magic happens

So, do we just look at a couple of different languages and guess what their ancestor looked like? Well, it’s a bit more complicated than that. A lot more, in fact.

Not to rain on everyone’s parade before we even begin, but the comparative method is a long, difficult and extremely tedious process, which involves comparing thousands upon thousands of items and keeping reams of notes which, if stacked on top of each other, would make the Burj Khalifa look like a molehill.

The Burj Khalifa, for reference

What you need to do to reconstruct your very own proto-language is this:

  1. Take a sample of languages you’re reasonably sure are related, the larger the better. The more languages you have in your sample, the more accurate your reconstruction is likely to be, since you might find features which only a few languages (or even only one!) have retained, but which have disappeared in the others.
  2. Find out which sounds correspond to which in each language. If you do this with a Romance language and a Germanic one, for example, you’ll find that Germanic “f” sounds pretty reliably correspond to Romance “p” sounds (as in the cognate pair padre and father). When you find a regular correspondence, it usually means that there is an ancestral sound underlying it.
  3. Reconstruct the ancestral sound. This is the trickiest part, and there are a few rules which we linguists follow to get an accurate reconstruction. For example, if most languages in a sample have one sound rather than another, that sound is more likely to be the ancestral one. Another criterion is that certain sound changes happen more frequently than others cross-linguistically (that is, across many languages), and are therefore more probable. For example, /p/ becoming /f/ is far more likely than /f/ becoming /p/, for reasons I won’t get into here. That means that in our padre/father pair above, it’s more likely that “p” is the ancestral sound (and it is! The PIE root is *ph2tér[4]). Finally, between two proposed ancestral sounds, the one whose evolution requires the fewest steps is usually the more likely one.
  4. Check that your result is plausible. Is it consistent with what is generally known about the phonetics and phonology of the language family you’re studying? Does it contain some very bizarre or unlikely sounds or phonotactics? Be sure to account for all instances of borrowing, coincidences and scary German-named stuff like Sprachbunds[5]. If you’ve done all that, congratulations! You have an educated guess at what some proto-language might have sounded like! Now submit it to a few journals and watch it get taken down by three different people, together with your self-esteem.[6]

But how do we know this process works? What if we’re just inventing a language which happens to look similar to all the languages in our sample, but which has nothing to do with what their hypothetical common ancestor actually looked like?

Well, the first linguists asked these very same questions, and did a simple experiment, which you can do at home yourself[7]: they took many of the modern Romance languages, pooled them together, and tried the method on them. The result was a very good approximation of Vulgar Latin.

It only works up to a certain point, though. See, while the comparative method is powerful, it has its limits. Notice how in the paragraph above I specified that it yielded a very good approximation of Vulgar Latin. You see, sometimes some features of a language get lost in all of its descendants, and there’s often no way for us linguists to know they even existed! One example of this is the final consonants of Classical Latin (for example, in the -us and -um endings, as in “lupus” and “curriculum”), which were lost in all the modern Romance languages and are therefore very difficult to reconstruct[8]. What this means is that the further back in time you go, the less precise your guess becomes, until you’re at a level of guesswork so high it’s effectively indistinguishable from pulling random sounds out of a bag (i.e. utterly useless). That’s why, to our eternal disappointment, we can’t use the comparative method to go back indefinitely in the history of language[9].

When you use the comparative method, you must always keep in mind that what you end up with is not 100% mathematical truth, but just an approximation, sometimes a very crude one. That’s what all the asterisks are for: in historical linguistics, an asterisk before a word basically means that the word is reconstructed, and that it should therefore be taken with a pinch of salt[10].
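For readers who like to see the logic spelled out, here is a toy Python sketch of step 3. It is a deliberate oversimplification, not how reconstruction is really done. It assumes the correspondence sets from step 2 have already been found, and it reduces each language's sound to a single symbol. It also relies on a tiny, hand-made table of "likely" changes (`LIKELY_CHANGES`, a hypothetical name, as is `reconstruct`), in which only p > f comes from the discussion above. Trying the directionality check before the majority vote is our own simplifying choice. The output carries an asterisk, just like a real reconstruction.

```python
from collections import Counter

# Hypothetical table of "natural" directions of change, as (ancestor, descendant).
# Only p > f comes from the post; k > h is added purely for illustration.
LIKELY_CHANGES = {("p", "f"), ("k", "h")}


def reconstruct(reflexes: dict[str, str]) -> str:
    """Guess the ancestral sound behind one correspondence set.

    `reflexes` maps a language name to the sound it shows at this position.
    The guess is returned with a leading '*' because it is a reconstruction.
    """
    sounds = set(reflexes.values())

    # Directionality: prefer a candidate that could plausibly have
    # changed into every other sound attested in the set.
    for candidate in sorted(sounds):
        others = sounds - {candidate}
        if others and all((candidate, s) in LIKELY_CHANGES for s in others):
            return "*" + candidate

    # Otherwise fall back on the majority: the sound most languages show.
    most_common_sound, _ = Counter(reflexes.values()).most_common(1)[0]
    return "*" + most_common_sound


# padre/father: "f" is the majority here, but directionality picks p.
print(reconstruct({"Italian": "p", "Spanish": "p",
                   "English": "f", "German": "f", "Swedish": "f"}))  # *p

# No known direction between m and n in the toy table, so majority decides.
print(reconstruct({"Language A": "m", "Language B": "m", "Language C": "n"}))  # *m
```

Notice that `reconstruct` can only ever return a sound that survives in at least one of the languages it is given. That is the Latin final-consonant problem in miniature: a sound lost in every descendant never shows up in the input, so no amount of counting in this toy will bring it back.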

The End

And so, now you know how we historical linguists work our spells of time travel and find out what the languages of Bronze Age people sounded like. It’s tedious work, and very frustrating, but the results are well worth the suffering and the toxic-level intake of caffeine necessary to carry it out. The beauty of all this is that it doesn’t only work with sounds: it has been applied to morphology too, and in recent years we’ve finally been getting the knack of applying it to syntax as well! Isn’t that exciting?

It certainly is for us.

Stay tuned for next week, when we’ll dive into the law that started it all: Grimm’s law!

  1. P.S. Remember that Fun Etymology we did on the word “bear”? Yeah, “Beowulf” is another of those non-god-angering Germanic taboo names for bear! It literally means “bee-wolf”.
  2. Or even some big ones: we know very little about how Egyptian vowels were pronounced and where to put them in words, for example.
  3. Yes, the same guy who wrote the fairy tale books, together with his brother.
  4. I won’t explain the “h2” thing, because that opens a whole other can of worms we haven’t time to dive into here.
  5. We’ll talk about these in a future post.
  6. This doesn’t always happen. Usually.
  7. And it doesn’t involve any explosives or dangerous substances, only long, sleepless nights and the potential for soul-crushing boredom. Hooray!
  8. I don’t say “impossible”, because in some cases a sound lost in all descendant languages can be reconstructed thanks to its influence on neighbouring sounds, or (as in the case of Latin) by comparing with different branches of the family. But this is, like, super advanced über-linguistics.
  9. Which would instantly solve a lot of problems, believe me.
  10. Historical linguistics is an exception here. In most other fields of linguistics, the asterisk means “whatever follows is grammatically impossible”.

That’s just bad English!

Hi there!

If you’ve read my mini-series about Scots (here are parts 1 and 2), you are probably more aware of this particular language, its history and its complicated present-day status than before. With these facts in mind, wouldn’t you find it counter-intuitive to think of Scots as “Bad English”? In this post, I want to explore, in a rather bohemian way, the problematic idea of Bad English. That is, I want to challenge the often constraining idea of what is correct and what is deviant; once again, we will see that this has very much to do with politics and power[1].

We have seen that Scots clearly has a distinct history and development, and that it once was a fully-functioning language used for all purposes – it was, arguably, an autonomous variety. However, during the anglicisation of Scots (read more about it here), English became a prestigious variety associated with power and status, and thus became the target language to which many adapted Scots. This led to a shift in the general perception of Scots’ autonomy, and today many are more likely to perceive Scots as a dialect of English – that is, perceive Scots as heteronomous to English. This means that instead of viewing Scots features, such as the ones presented in my last post, as proper language features, many would see them as (at best) quirky features or (at worst) bastardisations of English[2].

As an example of how changeable heteronomy can be: back in the days when the south of (present-day) Sweden belonged to Denmark, the Scanian dialect was considered a dialect of Danish. When Scania (Skåne) became part of Sweden, it took less than 100 years for this dialect to become referred to as a dialect of Swedish in documents from the time. It’s quite unlikely that Scanian changed much in itself during that time. Rather, what had changed was which language had power over it. That is, which language it was perceived as targeting.

When we really get into it, determining what is Bad English gets more and more blurry, just like what I demonstrated for the distinction between language and dialect way back. There are several dialectal features which are technically “ungrammatical” but used so categorically in some dialects that calling them Bad English just doesn’t sit right. One such example is the use of was instead of were in, for example, Yorkshire: “You was there when it happened”. What we can establish is that Bad English is usually whatever deviates from (the current version of) Standard English, and this brings us to how such a standard is defined – more on this in a future post.

Scots is, unsurprisingly, not the only variety affected by the idea of Bad English. As Sabina recently taught us, a creole is the result of a pidgin (i.e. a mix of two or more languages to ease communication between speakers) gaining native speakers[3]. This means that a child can grow up with a creole as their first language. Furthermore, creoles, just like any other language, have their own distinct grammatical rules and vocabularies. Despite this, many will describe, for example, Jamaican Creole as “broken English” – I’m sure this is not unfamiliar to anyone reading. This can again be explained by power and prestige: English, being the language of the colonisers, was the prestigious target, just like it became for Scots during the anglicisation, and so these creoles have a hard time shaking off the image of being heteronomous to English even long after the nations where they are spoken have gained independence.

In the United States, there is a lect which linguists call African-American Vernacular English (AAVE), sometimes called Ebonics. As the name suggests, it is mainly spoken by African-Americans, and most of us would be able to recognise it from various American media. This variety is another which is often misunderstood as Bad English, when in fact it shares many similarities with creoles: during the slave trade era, many of the slaves arriving in America would have had different first languages, and likely developed a pidgin to communicate both amongst themselves and with their masters. From there, we can hypothesise that an early version of AAVE would have developed as a creole largely based on English vocabulary. In fact, AAVE shares grammatical features with other English-based creoles, such as using be instead of are (as in “these bitches be crazy”, to use an offensively stereotypical expression). If AAVE speakers had not been living in an English-speaking nation, maybe their variety would have continued to develop as an independent creole like those in, for example, the Caribbean nations?

Besides, what is considered standard in a language often changes over time. A feature which is often used to represent “dumb” speech is double negation: “I didn’t do nothing!”. The prescriptivist smartass would smirk at such expressions and say that two negations cancel each other out, and using double negation is widely considered Bad English[4]. However, did you know that double negation was for a long time the standard way of expressing negation in English? It was actually used by the upper classes until it reached commoner speech, and thus became less prestigious[5]. This is another example of how language change also affects our perception of what is right and proper – and as Sabina showed us a while ago, language changes will often be met with scepticism and prescriptivist backlash.

What the examples I’ve presented show us is that less prestigious varieties are not necessarily in the wrong just because they deviate from a standard that they don’t necessarily “belong to” anyway. It can also be argued that, in many cases, classing a variety as a “bad” version of the language in power is just another way of maintaining superiority over the people who speak that variety. The perception of heteronomy can be a stumbling block even for linguists studying particular varieties; this may be one reason why Scots grammar is still relatively under-researched. When we shake off these very deep-rooted ideas, we may find interesting patterns and developments in varieties which can tell us even more about our history, and language development at large. Hopefully, this post will have created some more language bohemians out there, and more tolerance for Bad English.


1 While this post focuses on English, this can be applied to many prestigious languages and in particular those involved in colonisation or invasions (e.g. French, Dutch, Spanish, Arabic, etc.)

2 Within Scots itself there are also ideas of what is “good” and what is “bad”: urban Glaswegian speech is an example of what some would call “bad Scots”. Prestige is a factor here too – it is not surprising that it’s the speech of the lower classes that receives the “bad” stamp.

3 Not all creoles are English-based, of course. Here is a list of some of the better-known creoles and where they derive from.

4 There are other languages which do fine with double negation as their standard, without causing any meaning issues – most of you may be familiar with French ne…pas.

5 Credit goes to Sabina for providing this example!

So you’re a linguist…

“…how many languages do you speak?”

Every linguist on the planet knows and dreads this question, known simply as The Question™. The fact that it’s the first question most people ask when hearing of a linguist’s occupation certainly doesn’t help.

Right now you’re probably thinking “Give me a break, Riccardo. It’s quite a natural question to ask when you learn someone works with languages, isn’t it?”

Well, yes. Yes, it is a very natural question. The problem is that it springs from a very common misunderstanding of a linguist’s job, and, to make things worse, it’s one of the most difficult questions for a linguist to answer.

Let me explain in a bit more detail what I mean.

Dammit Jim, I’m a linguist, not a linguist!

One of the reasons The Question™ is so popular amongst laypeople is semantic ambiguity. To our eternal annoyance as academic linguists, the word “linguist” has two different meanings in the English language. The meaning we use on this blog, and the one most people who call themselves “linguists” intend, is “a person engaged in the academic study of human language”. As you’ve probably gathered if you read our blog, this doesn’t necessarily involve the study of any particular language: while there are many linguists who specialise in one language only, many (perhaps even most) specialise in whole branches or families of languages, and some specialise in particular fields of linguistics, like phonetics or semantics, and work with multiple completely unrelated languages.

Crucially, the job of an academic linguist doesn’t involve learning any of the languages we study, a point which I’ll talk about in more detail in the next section.

Unfortunately, this first meaning of the word “linguist” is not the one the public knows best. Not by a long shot.

The second meaning of “linguist” comes from military jargon, and it’s the one most familiar to laypeople due to its being spread far and wide by films, TV series, books and other popular entertainment media. In the military, a “linguist” is the person tasked with learning the language of the locals during a foreign campaign, with the goal of helping their fellow soldiers interact with them. In short, they’re what in any other field would be called an interpreter. Why the military had to go and rain on our lovely linguistic parade by stealing our name instead of using the proper name for what they do is a mystery, but they’re probably snickering about it as we speak. Regrettably, due to the greater popularity of films and stories set in a military/combative milieu, as opposed to the far superior and more engaging world of academia, with its nail-biting, edge-of-your-seat deadlines and paper-writing all-nighters, the second meaning of the word “linguist” has been cemented in the popular imagination as the primary one, and the rest is history.

It certainly doesn’t help that Hollywood likes to portray its “linguists” as knowing every single language they come into contact with, which has gone a long way towards making The Question™ as popular as it is.

Knowledge is relative, and numbers even more so

If our problem with The Question™ were only a matter of misunderstanding of our job description, it would be no big deal. We’d just list out all the languages we speak and then explain what a linguist actually is to whoever is asking. Problem is, while for a wuggle (non-linguist) listing the languages they know is an easy task, for a linguist it’s absurdly difficult. If you’ve ever exposed a linguist to The Question™, you’ve probably already seen the symptoms of ALLA (Acute Language Listing Anxiety): panicking, profuse sweating, stammering, making of excuses, epistemological asides (“Well, it depends on what you mean by know…”), and existential dread about the possibility of The Followup™ (“So you speak X? Say something in X!”).

What is the reason for this affliction? Well, it all comes down to what I said in the previous section: a linguist might very well study a language, but they are by no means expected to speak it. This gives rise to the apparent paradox of a linguist knowing the grammar of some language extremely well, while not being able to have anything more than the most basic of conversations in it, if even that. Some linguists manage to muscle through the pragmatics of The Question™ and only list the languages they speak fluently (which is what most people are asking, really), but many get stumped by it, because what a linguist means by “knowing a language” is very different from what a wuggle means.

For example, by a linguist’s conception of “knowing”, I could be said to “know” a couple dozen languages. But before you go all wide-eyed with awe at my intellectual might, know that of those couple dozen I can be said to really speak only five or six. And of those five or six, I’m only really fluent in two, with a decent degree of fluency in a third. To make matters worse, even the meaning of “speaking” is vague for a linguist: does “speaking” a language mean I can hold my own in basic conversation, or does it mean I can read a newspaper? Or a novel? Or a treatise on quantum physics?

You see, from a linguistic point of view, “speaking” a language isn’t a yes-or-no matter: fluency is a spectrum. I can order stuff in a restaurant in German and read some basic texts, but I would never be able to read a novel in it. Do I speak German? I’ve translated an entire comic from Finnish to English for fun with the help of a dictionary, but I wouldn’t be able to talk to a Finnish person in Finnish to save my life. Do I speak Finnish? As you can see, it’s extremely difficult for a linguist to accurately gauge what “speaking” or “knowing” a language actually entails, which is why it takes them an impressively long time to come up with a list, to the puzzlement of wuggles who could list the languages they speak in a heartbeat.

Conversation tactics for wuggles

So, what should you ask a linguist upon meeting them? Well, the safest question is probably a simple “what do you do?”

We linguists, like most academics, love explaining our jobs, and we’d be very happy to have the opportunity to geek out about what we study with an interested person.

Be sure to know when to stop us, though, unless you want to be regaled with a half-hour lecture on the pragmatics of Mixtecan questions.

You’ve been warned.