Do you do ‘do’, or don’t you?

I’m sure you haven’t missed that Sabina recently started a series about the early Germanic languages on this blog. The series will continue in a couple of weeks (you can read the latest post here), but as a short recap: when we talk about the modern Germanic languages, these include English (and Scots), Dutch (and Flemish), German, Icelandic, Faroese, and the mainland Scandinavian languages (Swedish, Norwegian, and Danish). These languages, of course, also have a plethora of dialectal variation under their belts[1]. Today, I’m gonna tell you about one particular grammatical feature that we find in only a couple of Germanic languages. You see, when it comes to the grammar of the modern Germanic languages, they’re all relatively similar, but one quirky trait sets the ones spoken in the British Isles apart from the rest: do-support.

Before we begin, I want to clarify my terminology: Do-support is a feature of syntax, which means that it’s to do with word order and agreement. The syntax concerns itself with what is grammatical in a descriptive way, not what we prefer in a prescriptive way[2]. So, when I say something is (un-)grammatical in this post, I mean that it is (dis-)allowed in the syntax.

So what is do-support?

Take a simple sentence like ‘I like cheese’. If a speaker of a Germanic language other than English (or Scots) were to turn that sentence into a question, it would look something like ‘Like you cheese?’, and in most Germanic varieties a (clearly deranged) person who is not fond of cheese would answer this with ‘No, I like not cheese’. In their frustration, the person who asked may shout ‘Eat not cheese then!’ at the deranged person.

But those sentences look weird in English, both the question and the negative sentence. The weirdness doesn’t only arise from the meaning of these sentences (who doesn’t like cheese?); they are, in fact, ungrammatical!

English, and most Scots dialects, require do-support in such sentences:

  • Do you like cheese?
  • No, I do not (or, don’t) like cheese.
  • Don’t eat cheese then!

The above examples of do-support, interrogative (the question), negative declarative (the negated sentence), and negative imperative (the command) are unique to English and Scots, but there are other environments where do is used, and where we also may find it in other Germanic languages, such as:

  • Tag-questions: ‘You like cheese, don’t you/do you?’
  • Ellipsis: ‘I ate cheese yesterday, and Theo did (so) today’
  • Emphasis: ‘I do like cheese!’
  • Main verb use: ‘I did/am doing a school project on do-support’

In all the examples above except for the emphasis and main verb usage, do is essentially meaningless; it doesn’t add any meaningful (semantic) information to the sentence. Therefore, we usually call it a “dummy” auxiliary, or simply dummy do.
(Auxiliary is the name for those little verbs, like do, is, and have, which come before other verbs in a sentence, such as in ‘she is eating cheese’ and ‘I have eaten cheese’.)
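
For the programmers among you, the contrast between the two patterns can be sketched in a few lines of Python. This is only a toy illustration, not a real grammar: the function names are made up for this post, and the sketch assumes a present-tense verb with a subject like ‘I’, ‘you’, ‘we’ or ‘they’ (so no ‘does’ or ‘did’).

```python
def without_do_support(subject, verb, obj):
    """Toy pattern for most Germanic languages: the main verb itself
    is fronted (question) or directly negated."""
    return {
        "question": f"{verb.capitalize()} {subject} {obj}?",
        "negation": f"{subject.capitalize()} {verb} not {obj}.",
        "imperative": f"{verb.capitalize()} not {obj}!",
    }


def with_do_support(subject, verb, obj):
    """Toy pattern for English and Scots: dummy 'do' is fronted or negated,
    and the main verb stays where it is."""
    return {
        "question": f"Do {subject} {verb} {obj}?",
        "negation": f"{subject.capitalize()} do not {verb} {obj}.",
        "imperative": f"Do not {verb} {obj}!",
    }


for kind, sentence in without_do_support("you", "like", "cheese").items():
    print(kind, "->", sentence)
for kind, sentence in with_do_support("you", "like", "cheese").items():
    print(kind, "->", sentence)
```

Notice that in the second function the main verb never moves: all the grammatical work is done by do, which is exactly why we call it a dummy.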

English and Scots didn’t always have do-support, and sentences like ‘I like not cheese’ used to be completely grammatical. We start to see do-support appearing in English around the 15th century, and in the 16th century for Scots. As is the case with language change, do-support didn’t become the mandatory construction overnight; in both languages we see a period, lasting for centuries, in which sentences with and without do-support are used variably before do-support eventually wins out (in the 18th-19th century).

Interestingly, in this period of change we also see do-support in non-negated sentences which aren’t intended to be emphatic, looking like: ‘I do like cheese’. These constructions never fully catch on though, and the rise and fall of this affirmative declarative do has been called a “failed change”.

It’s ok, affirmative declarative do, you’ve still contributed greatly to do-support research!

Why did we start using do-support, though?

Well, we aren’t exactly sure yet, but there are theories. Many scholars believe that this is a so-called language-internal development, meaning that this feature developed in English without influence from another language. This is based on the fact that do used to be a causative verb in English (like cause, and make in ‘I made Theo eat cheese’), which became used so frequently that it started to lose its causative meaning and finally became a dummy auxiliary. This process, where a word gradually loses its meaning and gains a purely grammatical function, is called grammaticalisation.

There have also been suggestions that it was contact with Welsh that introduced do-support into English, since Welsh had a similar structure. This account is often met with scepticism, one reason being that we see very little influence from any Celtic language, Welsh included, on English and Scots grammar in general. However, new evidence is regularly brought forward to argue for this account, and the origin of do-support is by no means a closed chapter in historical linguistics research.

What we do know is that do-support came about in the same time period when English started to use auxiliaries more overall – you may have noticed that, in English, we’re more likely to say ‘I am running to the shop’ than ‘I run to the shop’, the latter being more common for other Germanic languages. So, we can at least fairly safely say that the rise of do-support was part of a greater change of an increased use of auxiliaries overall.

The humble dummy do has baffled historical linguists for generations, and this particular HLC writer has been trying to understand do-support in English and Scots for the past few years, and will most likely continue to do so for a good while longer. Wish me luck!

Footnotes

[1] I’ve written about the complex matter of language vs. dialect before, here.

[2] In our very first post on this blog, Riccardo wrote about descriptivism and prescriptivism. Read it here for a recap!

The Dark Arts: How We Know What We Know

If you’ve been following us at the HLC, and especially our Fun Etymologies every Tuesday, you will have noticed that we often reference old languages: the Old English of Beowulf[1], the Latin of Cicero and Seneca, the Ancient Greek of Homer, and in the future (spoiler alert!), even the Classical Chinese of Confucius, the Babylonian of Hammurabi, or the Egyptian of Ramses. These languages all have extensive written records, which allows us to know them pretty much as if they were still spoken today, with maybe a few little doubts here and there for the older ones[2].

Egyptians might have had a bit TOO great of a passion for writing, if you catch my drift

But occasionally, you’ve seen us reference much, much older languages: one in particular stands out, and it’s called Proto-Indo-European (often shortened to PIE). If you’ve read our post on language families, you’re probably wearily familiar with it by now. However, here’s the problem: the language is 10,000 years old! And writing was invented “just” 5,000 years ago, nowhere near where PIE was spoken. So, you may be asking, how the heck do we know what that language looked like, or if it even existed at all? And what do all those asterisks (as in *ekwom or *wlna) I see on the Fun Etymologies each week mean? Well, buckle up, dear readers, because the HLC will finally reveal it all: the dark magic that makes Historical Linguistics work. It’s time to take a look at…

The Comparative Method of Linguistic Reconstruction

“Linguistic history is basically the darkest of the dark arts, the only means to conjure up the ghosts of vanished centuries.”

-Cola Minis, 1952

If we historical linguists had to go only by written records, we would be wading in shallow waters indeed: the oldest known written language, Sumerian, is only just about 5,000 years old.

The oldest joke we know of is in Sumerian. It’s a fart joke. Humanity never changes.

Wait, “only just”?? Well, consider that modern humans are at least 300,000 years old, and that some theories put the origins of language closer to a million years ago. You could fit the whole of history from the Sumerians to us 200 times in that and still have time to spare!

So, while writing is usually thought of as one of the oldest things we have, it is actually a pretty recent invention in the grand scheme of things. For centuries, it was just taken for granted that language just appeared out of nowhere a few millennia in the past, usually as a gift from some god or other: in Chinese mythology, the invention of language was attributed to an ancient god-king named Fuxi (approximately pronounced “foo-shee”), while in Europe it was pretty much considered obvious that ancient Hebrew was the first language of humankind, and that the proliferation of languages in the world was explained by the biblical story of the Tower of Babel.

Imagine your surprise when the guy who was supposed to pass you the trowel suddenly started speaking Vietnamese

This (and pretty much everything else) changed during the 18th century, with the dawn of the Age of Enlightenment. During this age of bold exploration (and less savoury things done to the people found in the newly “discovered” regions), scholars started to notice something curious: wholly different languages presented interesting similarities with one another and, crucially, could be grouped together based on these similarities. If all the different languages of Earth had truly been created out of nothing on the same day, you would not expect to see such patterns at all.

In what is widely considered to be the founding document of historical linguistics, Sir William Jones, an English scholar living in India in 1786, writes:

“The Sanskrit language, whatever be its antiquity, is of a wonderful structure; more perfect than the Greek, more copious than the Latin, and more exquisitely refined than either, yet bearing to both of them a stronger affinity, both in the roots of the verbs and in the forms of the grammar, than could possibly have been produced by accident; so strong indeed, that no philologer could examine them all three, without believing them to have sprung from some common source, which, perhaps, no longer exists […]”

That source is, of course, PIE. But, again, how can we guess what that language sounded like? People at the time were too busy herding sheep and domesticating horses to worry about paltry stuff like writing.

Enter Jacob Grimm[3] and his Danish colleague Rasmus Rask. They noticed that the similarities between their native German and Danish languages, and other close languages (what we call the Germanic family today), were not only evident, but predictable: if you know how a certain word sounds in one language, you can predict with a reasonable degree of accuracy how its equivalent (or cognate) sounds in another. But their truly revolutionary discovery was that if you carefully compared these changes, you could make an educated guess as to what the sounds and grammar of their common ancestor language were. That’s because the changes that happen to a language over time are mostly regular and predictable. Think how lucky that is! If sounds in a language changed on a random basis, we would have no way of even guessing what any language before Sumerian looked like!

More like HANDSOME and Gretel, amirite?

This was the birth of the comparative method of linguistic reconstruction (simply known as “the comparative method” to friends), the heart of historical linguistics and probably the linguistic equivalent of Newton’s laws of motion or Darwin’s theory of evolution when it comes to world-changing power.

Here, in brief, is how it works:

How the magic happens

So, do we just look at a couple of different languages and guess what their ancestor looked like? Well, it’s a bit more complicated than that. A lot more, in fact.

Not to rain on everyone’s parade before we even begin, but the comparative method is a long, difficult and extremely tedious process, which involves comparing thousands upon thousands of items and keeping reams of notes that would make the Burj Khalifa look like a molehill if stacked on top of each other.

The Burj Khalifa, for reference

What you need to do to reconstruct your very own proto-language is this:

  1. Take a sample of languages you’re reasonably sure are related, the larger the better. The more languages you have in your sample, the more accurate your reconstruction will be, since you might discover features which only a few languages (or even only one!) have retained, but which have disappeared in the others.
  2. Find out which sounds correspond to which in each language. If you do this with a Romance language and a Germanic one, you’ll find that Germanic “f” sounds pretty reliably correspond to Romance “p” sounds (for instance, in the cognate pair padre and father). When you find a correspondence, it usually means that there is an ancestral sound underlying it.
  3. Reconstruct the ancestral sound. This is the trickiest part: there are a few rules which we linguists follow to get an accurate reconstruction. For example, if most languages in a sample have one sound rather than another, it’s more probable that that is the ancestral sound. Another criterion is that certain sound changes usually happen more frequently than others cross-linguistically (across many languages), and are therefore more probable. For example, /p/ becoming /f/ is far more likely than /f/ becoming /p/, for reasons I won’t get into here. That means that in our padre/father pair above, it’s more likely that “p” is the ancestral sound (and it is! The PIE root is *ph2tér[4]). Finally, between two proposed ancestral sounds, the one whose evolution requires the fewest steps is usually the more likely one.
  4. Check that your result is plausible. Is it in accordance with what is generally known about the phonetics and phonology of the language family you’re studying? Does it present some very bizarre or unlikely sounds or phonotactics? Be sure to account for all instances of borrowing, coincidences and scary German-named stuff like Sprachbunds[5]. If you’ve done all that, congratulations! You have an educated guess of what some proto-language might have sounded like! Now submit it to a few journals and see it taken down by three different people, together with your self-esteem.[6]

But how do we know this process works? What if we’re just inventing a language which just so happens to look similar to all the languages we have in our sample, but which has nothing to do with what any hypothetical ancestor language of theirs would have looked like?

Well, the first linguists asked these very same questions, and did a simple experiment, which you can do at home yourself[7]: they took many of the modern Romance languages, pooled them together, and tried the method on them. The result was a very good approximation of Vulgar Latin.

Well, it works up to a certain point. See, while the comparative method is powerful, it has its limits. Notice how in the paragraph above I specified that it yielded a very good approximation of Vulgar Latin. You see, sometimes some features of a language get lost in all of its descendants, and there’s no way for us linguists to know they even existed! One example of this is the final consonant sounds in Classical Latin (for example, the -us and -um endings, as in “lupus” and “curriculum”), which were lost in all the modern Romance languages, and are therefore very difficult to reconstruct[8]. What this means is that the further back in time you go the less precise your guess becomes, until you’re at a level of guesswork so high it’s effectively indistinguishable from pulling random sounds out of a bag (i.e. utterly useless). That’s why, to our eternal disappointment, we can’t use the comparative method to go back indefinitely in the history of language[9].

When you use the comparative method, you must always keep in mind that what you end up with is not 100% mathematical truth, but just an approximation, sometimes a very crude one. That’s what all the asterisks are for: in historical linguistics, an asterisk before a word basically means that the word is reconstructed, and that it should therefore be taken with a pinch of salt[10].
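
If you like to see things in code, here is a toy Python sketch of steps 2 and 3 of the recipe above. Please take it with an even bigger pinch of salt than an asterisk: it only looks at the first letter of each word (spelling, not actual phonetics), the function names and the LIKELY_CHANGES table are invented for this illustration, and a real reconstruction involves thousands of carefully aligned items rather than eight word pairs. The cognate pairs themselves are genuine.

```python
from collections import Counter

# Genuine Spanish/English cognate pairs, compared by spelling only (a simplification).
COGNATE_PAIRS = [
    ("padre", "father"), ("pie", "foot"), ("pez", "fish"),
    ("tres", "three"), ("tú", "thou"),
    ("dos", "two"), ("diez", "ten"), ("diente", "tooth"),
]

# Sound changes assumed to be more common in this direction: (older, newer).
# Only the p > f preference comes from the post; the table is a hypothetical helper.
LIKELY_CHANGES = {("p", "f")}


def initial_sound(word):
    """Return the word-initial consonant, treating 'th' as a single sound."""
    return "th" if word.startswith("th") else word[0]


def correspondences(pairs):
    """Step 2: count how often each pair of initial sounds lines up."""
    return Counter((initial_sound(a), initial_sound(b)) for a, b in pairs)


def reconstruct(reflexes):
    """Step 3: pick the most common sound; on a tie, prefer the sound that
    is the likelier starting point according to LIKELY_CHANGES."""
    ranked = Counter(reflexes.values()).most_common()
    best, best_count = ranked[0]
    for other, count in ranked[1:]:
        if count == best_count and (other, best) in LIKELY_CHANGES:
            best = other
    return "*" + best  # the asterisk marks a reconstructed form


for (romance, germanic), n in correspondences(COGNATE_PAIRS).items():
    print(f"Spanish {romance} ~ English {germanic}: {n} pairs")

# First sound of 'father': Latin pater, Greek patḗr, Sanskrit pitár-,
# English father, German Vater (spelled with v, pronounced f).
father = {"Latin": "p", "Greek": "p", "Sanskrit": "p", "English": "f", "German": "f"}
print(reconstruct(father))  # *p
```

One thing the sketch makes painfully obvious is why step 1 matters: simple majority voting is only as good as your sample, and counting several closely related languages separately can skew the result.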

The End

And so, now you know how we historical linguists work our spells of time travel and find out what the languages of Bronze Age people sounded like. It’s tedious work, and very frustrating, but the results are well worth the suffering and the toxic-level intake of caffeine necessary to carry it out. The beauty of all this is that it doesn’t only work with sounds: it has been applied to morphology as well, and in recent years we’ve finally been getting the knack of how to apply it to syntax too! Isn’t that exciting?

It certainly is for us.

Stay tuned for next week, when we’ll dive into the law that started it all: Grimm’s law!

  1. P.S. Remember that Fun Etymology we did on the word “bear”? Yeah, “Beowulf” is another of those non-god-angering Germanic taboo names for bear! It literally means “bee-wolf”.
  2. Or even some big ones: we know very little about how Egyptian vowels were pronounced and where to put them in words, for example.
  3. Yes, the same guy who wrote the fairy tale books, together with his brother.
  4. I won’t explain the “h2” thing, because that opens a whole other can of worms we haven’t time to dive into here.
  5. We’ll talk about these in a future post.
  6. This doesn’t always happen. Usually.
  7. And it doesn’t involve any explosives or dangerous substances, only long, sleepless nights and the potential for soul-crushing boredom. Hooray!
  8. I don’t say “impossible”, because in some cases a sound lost in all descendant languages can be reconstructed thanks to its influence on neighbouring sounds, or (as in the case of Latin) by comparing with different branches of the family. But this is, like, super advanced über-linguistics.
  9. Which would instantly solve a lot of problems, believe me.
  10. Historical linguistics is an exception here. In most other fields of linguistics, the asterisk means “whatever follows is grammatically impossible”.