Thursday, January 04, 2018

Taleb unintentionally proves Lebanese comes from Arabic

So Taleb has jumped back on his hobbyhorse with yet another post on Lebanese not being Arabic; see my previous posts Why "Levantine" is Arabic, not Aramaic: Part 1, Part 2, Part 3, Zombie hypotheses and the Zeitgeist, On finding the sources of shared items. The funniest thing about this one is that he's been helpful enough to provide a wordlist (for his dialect, I presume) that - despite a number of typos, almost all of which increase the apparent similarity between Levantine and non-Arabic Semitic languages - should be enough all by itself to prove to anyone in doubt that Lebanese is clearly descended primarily from Arabic, with very little Aramaic influence and even less from Canaanite/Phoenician. Unfortunately, he wasn't as helpful on the grammar, not bothering to include equivalents from other Semitic languages for the pronouns and verbal conjugations...
But I don't have all day to spend beating this dead horse, and doing etymology properly takes time. So let's just have a quick look at the first page of his wordlist (well, probably the second one - the real first one seems to be missing), and leave the other pages as an exercise for the reader.

Out of these 39 words, 18 seem to be unambiguously Arabic in origin - either they share specific sound changes with Arabic to the exclusion of the rest of Semitic, or they use a root not used in the appropriate meaning elsewhere in Semitic. Only two look like being Aramaic rather than Arabic in origin (and the evidence in both cases is fairly weak): "hand" and the patently non-basic vocabulary word "image". (Taleb would add a third, zalame "man", but this word has an at least equally plausible Arabic etymology, making it ambiguous at best.) The remaining 19 words are ambiguous, and could in principle derive from more than one Semitic language - but even there, the situation is not symmetrical; all 19 could derive from Arabic, whereas no more than 11 of them could derive from Aramaic. The unambiguous cases give the following ratio: 18 Arabic : 2 Aramaic : 0 everything else. On that basis, we should therefore expect 90% of the ones ambiguous between Arabic and Aramaic (ie all but one) to derive from Arabic, not from Aramaic, and all of the ones ambiguous between Arabic and another Semitic language but not Aramaic to derive from Arabic. For details, see the following table:

| # | Word | Possible source(s) | Notes |
|---|---|---|---|
| 1 | goat | Arabic | does not share Canaanite+Aramaic+Ugaritic *nC > CC; does not share Akkadian *ʕa > e |
| 2 | god | Arabic / Aramaic | shows innovative gemination of the l, attested only in Arabic and some dialects of Syriac |
| 3 | good | innovative | the Arabic etymology is obvious, but the root is pan-Semitic so we may generously assume that it could in principle have derived from some other branch |
| 4 | grass | Arabic | does not share Aramaic and Phoenician *ś > s ; does share Arabic *ś > š |
| 5 | grind | Arabic / Canaanite | does not share Akkadian *aħa > ê ; does not share Aramaic CaCVC > CCVC |
| 6 | hair | Arabic / Ugaritic | does not share Aramaic and Phoenician *ś > s ; does share Arabic *ś > š ; does not share Akkadian loss of *ʕ |
| 7 | hand | Aramaic | although a change of *yad > *īd is natural enough that it could easily have happened independently in Arabic... |
| 8 | hare | Arabic / Canaanite / Aramaic / Akkadian | no distinctive innovations |
| 9 | he-goat | Arabic / Canaanite / Aramaic | no distinctive innovations |
| 10 | head | Arabic / Ugaritic | does not share Canaanite *aʔ > *ā > ō nor Aramaic *aʔ > ī nor Akkadian *aʔ > ē ; the form rās (with loss of the glottal stop) is well-attested in early Arabic dialects |
| 11 | hear | Arabic | does not share Aramaic and Phoenician *s > š (I'm going with Huehnergard's reconstruction of proto-Semitic sibilants here). Note that the correct Syriac form is šmaʕ, not sma3 ; likewise the Hebrew |
| 12 | heart | Arabic | The initial glottal stop (still pronounced q in, for example, Alawite dialects) can only be explained from the Arabic form, which is a lexical innovation replacing original *libb |
| 13 | honey | Arabic | 3asal is clearly Arabic, and – as I've pointed out before – dabs is attested in Classical Arabic as well as in Hebrew and Aramaic |
| 14 | horn | Arabic / Canaanite / Aramaic / Akkadian / Ugaritic | no distinctive innovations |
| 15 | horse | Arabic | Syriac ḥsan 'strong' has s, not ṣ, but even if it were cognate, the Classical Arabic and Levantine form still share a semantic shift unattested in Aramaic |
| 16 | house | Arabic / Canaanite / Aramaic / Ugaritic | Akkadian can be ruled out, since it shows a shift *ay > ī which never happened in Levantine. |
| 17 | hundred | Arabic / Canaanite / Aramaic / Akkadian / Ugaritic | The only innovation here, ʔ > y, is not shared with any of the ancient languages in question |
| 18 | hunger | Arabic | Even assuming jūʕ has cognates elsewhere in Semitic, the change g > j is specific to Arabic |
| 19 | hunt | Arabic / Canaanite / Aramaic / Akkadian / Ugaritic | The only innovation here, use of the D-stem, is not shared with any of the ancient languages |
| 20 | image | Aramaic | Since when is 'image' basic vocabulary? But yes, assuming we can trust the transcription, it shares the aw with Aramaic |
| 21 | inside | Arabic / Aramaic | Mixed signal here: the meaning looks like Aramaic, but the sound shift g > j is Arabic not Aramaic. In reality, the word *jaww must originally have meant 'inside' in Arabic too; it lost this meaning in Classical Arabic, but kept it in many of the dialects |
| 22 | iron | Arabic | |
| 23 | kidney | Arabic / Canaanite / Aramaic / Akkadian / Ugaritic | The only innovation here, *y > w, is not shared with any of the ancient languages (but _is_ shared with many other modern Arabic dialects...) |
| 24 | kill | Arabic / Canaanite | Does not share Aramaic CaCVC > CCVC |
| 25 | king | Arabic / Canaanite / Aramaic / Ugaritic | Since when is 'king' basic vocabulary? |
| 26 | knee | Arabic | Shares a unique innovation with Arabic – the metathesis brk > rkb |
| 27 | know | Arabic | |
| 28 | laugh | Arabic | Shares a unique innovation with Arabic – the sound shift *ɬ' > ḍ (which came relatively late in Arabic – later than Sibawayh, even – and never happened in any other Semitic language). I can't speak for Amioun, but in general Levantine has ḍaḥak; if Amioun does have ḍaḥaq, the fact that it didn't become *ḍaḥaʔ suggests that the *k > q happened there only after the regular shift *q > ʔ, and hence has nothing to do with the Canaanite or Ugaritic forms. |
| 29 | leg | innovative | The alleged Ugaritic form is nonsense – Ugaritic had no j sound, and the dictionary of Del Olmo Lete and Sanmartin reveals no appropriate Ugaritic form. It is true that the Levantine form seems to be shared with Ethiopic and some Yemeni dialects, but not with any ancient language of the Fertile Crescent. |
| 30 | lion | Arabic | A very problematic choice as 'basic vocabulary'. |
| 31 | live | Arabic / Canaanite / Aramaic | Except that the Levantine form is clearly 'alive', not 'live', making the whole comparison problematic.... |
| 32 | love | Arabic | The Arabic is of course mistranscribed - in his terms, it should be 2a7abba, whereas the Hebrew and Aramaic forms really do have a h. |
| 33 | make | Arabic | |
| 34 | man | innovative | 'zalame' is etymologically problematic – both Arabic and Aramaic etymologies have been proposed. 'rejjel' is of course from Arabic. dakar is 'male', not 'man'. |
| 35 | many | Arabic | |
| 36 | meat | Arabic | This shares a specific semantic shift with Arabic to the exclusion of the rest of Semitic : « staple food » > « meat » |
| 37 | milk | Arabic / Ugaritic | The root is common to several Semitic languages, but the use of the passive pattern fa3īl in this word is unique to Arabic |
| 38 | month | Arabic | Pretty sure the normal Levantine form is shahr, not sha7r, not that it makes any difference to the etymology – and for sure Syriac 'moon' below is sahrā, not šahrā. |
| 39 | moon | Arabic | |
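
For anyone who wants to check the arithmetic behind the figures quoted before the table, here is a minimal sketch. The counts are the ones given in the post; the variable names are purely illustrative, and applying the unambiguous ratio to the ambiguous words is a rough heuristic, not a formal statistical model.

```python
# Counts quoted in the post for this 39-word page of the wordlist.
unambiguous_arabic = 18
unambiguous_aramaic = 2
ambiguous_total = 19          # all of these could derive from Arabic
ambiguous_with_aramaic = 11   # at most this many could derive from Aramaic

# Sanity check: the three groups cover the whole page.
assert unambiguous_arabic + unambiguous_aramaic + ambiguous_total == 39

# Share of Arabic among the words whose origin can actually be decided.
arabic_share = unambiguous_arabic / (unambiguous_arabic + unambiguous_aramaic)

# Heuristic (an assumption): the Arabic/Aramaic-ambiguous words split in the same ratio.
expected_arabic = arabic_share * ambiguous_with_aramaic
expected_aramaic = ambiguous_with_aramaic - expected_arabic

print(f"Arabic share of unambiguous words: {arabic_share:.0%}")
print(f"Expected Arabic among the ambiguous {ambiguous_with_aramaic}: {expected_arabic:.1f}")
print(f"Expected Aramaic among the ambiguous {ambiguous_with_aramaic}: {expected_aramaic:.1f}")
```

Rounded to whole words, that is the "all but one" mentioned above.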

Wednesday, January 25, 2017

Tigre between ejectives and pharyngealization

There is some debate over the original pronunciation of the "emphatic" consonants (Arabic ط ض ظ ص ق) in Semitic and more generally in Afroasiatic: were they ejective as in Amharic, or pharyngealized/uvular as in Arabic? For a number of reasons, such as that in proto-Semitic they did not show a voicing contrast, the general opinion is that they were glottalized. Yet pharyngealized consonants show up not just in Arabic and neo-Aramaic but even in Berber, which would on the face of it suggest that the feature predates proto-Semitic. Either we have to suppose independent parallel development, or we must assume that Berber ejectives turned into pharyngealized consonants under the influence of Arabic. The latter seems more probable, but only if we can show that it is indeed plausible for a language to make such a change as a result of widespread bilingualism in Arabic.

It turns out that Tigre, the main language of northern Eritrea, offers a concrete example of just that. The inland plateau dialect of the Mansa`, commonly considered as standard, is described by Raz (1983) as having four ejectives k' (usually [ʔ]), t', s', and č̣, and no pharyngealized or uvular consonants. You can hear an example of standard Tigre here, which seems consistent with his description. The coastal Hirgigo dialect spoken around Massawa, however, as heard in these Learn Tigre YouTube videos, shows a rather different situation. ḳ is simply [q] (as in "elbow", "neck", "thigh"), ṭ is [tˤ] (as in "goat"), ṣ is [sˤ] (as in "white", "black", "back"); only for č̣ can you occasionally hear a slightly ejective realization [tʃ] ~ [tʃ'] (as in "fingers" or "fingernails"). The result is a good deal easier for an Arabic speaker to pronounce! This should not be too surprising: the port of Massawa has had extensive contact with Arabic speakers for many centuries. In fact, it's said to be the place where some of the first Muslims, seeking refuge from the persecution they were suffering in Mecca, landed on their way to the Abyssinian court. Such a diversity of emphatic consonant realizations within a single language confirms in turn that it is plausible for the habit of pharyngealizing emphatic consonants to be transferred from a language to its neighbors.

Saturday, January 21, 2017

Semitic languages in two Arabic novels

I've been reading two novels in Arabic lately. Frankenstein in Baghdad, by Ahmad Saadawi, reimagines Baghdad's descent into chaos in the mid-2000s, blending gritty realism with semi-allegorical horror. Samraweet, by Hajji Jaber, is an altogether gentler but still cutting narrative of the Eritrean diaspora, interleaving scenes from the narrator's life in Jeddah with ones from his first visit to Asmara as he gradually realizes the difficulty of being part of either place. Both turned out to share a feature I hadn't been expecting to find: dialogue in other Semitic languages.

In Frankenstein in Baghdad, one of the main characters is an elderly Assyrian woman, Elishawa "Umm Daniel". All her relatives have long since moved abroad, and keep begging her to come live with them where it's safe, so there are few occasions for her to speak anything but Arabic. However, the Assyrians of northern Iraq traditionally speak a variety of Neo-Aramaic, and when she meets her grandson from Melbourne, they have the following fairly elementary conversation (pp. 276-277), which I hope I've transcribed correctly:

"داخي إيوَت؟" (Dāx īwat?) "How are you?"
"سباي إيْوَن باسيما" (Spāy īwan basīmā) "I am fine, thanks."

The author of the book seems to be from southern Iraq, so I found it remarkable that he took the trouble to get some Neo-Aramaic dialogue - especially since the copula is appropriately put in the feminine form both times (in Assyrian Neo-Aramaic, even the 1st person singular copula agrees in gender). Probably he felt it would enhance her symbolic status as a reminder of what Iraq once was. Unfortunately, while Aramaic has been spoken in Iraq for almost three millennia, its prospects there are dim: after all these years of war and frequent persecution, most speakers live in Western cities, and unless they're exceptionally good at remaining a distinct, cohesive immigrant group, their descendants seem more likely to speak English or Swedish than Aramaic.

In Eritrea, unlike Iraq, most people have as their first language a Semitic language other than Arabic: Tigre in the north, Tigrinya in the south. So it was less surprising to find an Asmaran waiter on the first page saying "سنّي ما سيام" (?Senni mā syām), which I assume from context means something like "Good afternoon!" However, the occasional glimpses provided into Eritrean sociolinguistics were more eye-opening. The narrator and most of his friends are from a Tigre-speaking background and know how to speak it, but Tigre per se seems to play little part in their linguistic identity. They grew up not only speaking Arabic in the street, but feeling that Arabic is an Eritrean national language, and resenting the government's treatment of it as less central than Tigrinya. When an Eritrean in Jeddah speaks Tigre with him, the narrator assumes it's because he only arrived recently until he finds out, to his surprise, that this person simply "enjoys speaking it, even in Jeddah" (p. 76). It would be interesting to see how this compares to the attitudes of Tigre speakers living in Eritrea: between the prestige of Arabic and the status of Tigrinya, what are the long-term prospects for Tigre?

Monday, December 28, 2015

Raisins from Carthage to Siwa

Most Berber varieties have borrowed the word for "raisin" from Arabic, eg Kabyle azbib, or use a compound "dried grapes", eg Shilha aḍil aqurar. However, in Tunisia, Libya, and Egypt the situation is rather different, as this Facebook post illustrates:

| Location | Word for "raisin" | In Arabic script |
|---|---|---|
| Djerba | izummucen | |
| Jadu | iz/ẓemmuken | ايزموكن |
| Nalut | ijemmusen | ايجموسن |
| Zuwara | ijemmucen | ايجموشن |
| Yefren, El-Qalaa | ijummucen, ijemmac | ايجمّوشن, ايجوموشن, اجماش |
| Siwa | ijeṃṃusen | إجموسن |

The variation in the consonants is not completely regular, but note that there is a regular correspondence between k in Jadu and š in Yefren and Siwa (from palatal *ḱ), and that sibilant harmony is a fairly productive process in Berber.

As far as I know, this word's etymology has not previously been investigated, so I was happy to discover it this morning quite by chance. It happens to be attested in an ostracon from about 2000 years ago (give or take), found at the site of Al Qusbat, on the Libyan coast east of Tripoli:

This line in Neo-Punic - that is, the later Phoenician dialect spoken in North Africa - starts ldn`ṭ' `sr kkr' ṣmq, rendered by Jongeling and Kerr (Late Punic Epigraphy, 2005:24) as "for Donatus, 10 talents of dried fruits". As usual for Phoenician, the interpretation relies mainly on its much better documented close relative Hebrew: in this case, the relevant comparison is to the ṣimmuq-îm צִמֻּקִים֙ "raisins", attested 4 times in the Hebrew Bible. In Hebrew, the root of this word, ṣmq, means "to dry up"; in Arabic, the same root yields the rare forms ṣāmiq "thirsty", ṣamaqah "milk that has gone off". The direction of borrowing is therefore clear: from Phoenician into eastern Berber.

Now most of the attestations of this form are in a region where intense Punic influence is completely unsurprising: the coast of Tripolitania and southern Tunisia. However, any Classicist will remind us that Phoenician rule stopped at the Arae Philaenorum: eastern Libya was in Greek hands, and Phoenician never had any significant presence there. What, then, is this word doing in Siwa? The answer is simple, as I discuss in the introduction to my book Berber and Arabic in Siwa (Egypt): modern Siwi seems to derive mainly from a Berber variety spoken much further west, which reached Siwa only during the Middle Ages. There very probably was a Berber language spoken in Siwa before that, but if so, it has left very few traces.

Sunday, September 14, 2014

On finding the sources of shared items, OR: The irrelevance of anteriority

Similarities between different languages are data. It's easy to come up with any of several wildly different measures of such similarities, typically by applying edit distances to wordlists (as in the ASJP*) or texts, but the result should not be mistaken for an analysis - it's just a measurement, a compression of the data. It doesn't tell you anything about the causes of these similarities on its own. Historical linguistics is not the measurement of similarities, but the effort to find the hypothesis about past events that best explains them. Your H0, of course, is always "coincidence". Once you've rejected that, you're left with the trickier task of disentangling contact from common ancestry - trickier because, quite often, they partially overlap.
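
To make concrete what such a measurement looks like, here is a minimal sketch of a length-normalized edit distance averaged over a wordlist. It is only an illustration of the general idea: the function names are my own, and it does not reproduce the ASJP's actual transcription scheme or scoring procedure.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-symbol insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if symbols match)
            ))
        previous = current
    return previous[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance divided by the longer word's length (0.0 = identical)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def mean_distance(wordlist_a, wordlist_b) -> float:
    """Average normalized distance over same-meaning pairs of two wordlists."""
    pairs = list(zip(wordlist_a, wordlist_b))
    return sum(normalized_distance(x, y) for x, y in pairs) / len(pairs)


# Toy wordlists, compared by spelling only.
english = ["football", "garage", "stupid"]
french = ["football", "garage", "stupide"]
print(mean_distance(english, french))
```

The score for this toy pair of lists comes out very low, and that is all it tells you: nothing in the number says whether the similarity is due to chance, contact, or common ancestry.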

To understand linguistic causation in the past, an essential starting point is to look at it in the present. Suppose that you are a native speaker of English:

  1. If you say "football" or "garage" to your child while speaking English, it's because you grew up speaking English, and you know that this is what other English speakers say. The fact that French speakers happen to call it "football" too, if you're even aware of it, has nothing to do with your choice of words.
  2. If you say "football" or "garage" to your child while speaking French, it's because you later studied French, and you know that this is what French speakers say. The fact that it's also what English speakers say no doubt made it easier to memorise, but if French speakers had named them something else, you would be doing the same.

We thus see that, for shared words, inheritance from either of two radically different languages can yield precisely the same outcome. The fact that English and French share these words in the first place is obviously due to contact (in each direction). The fact that your child is growing up with them, however, is because you're faithfully passing on the existing norms of one or the other language, not because you're combining them. In historical linguistic jargon, the use of the word "football" is at this point being inherited, not borrowed. Thus, if an English-monolingual Cajun says "stupid", it's not because he's managed to hold on to his ancestors' French word "stupide", it's because that happens to be the English word for it.

So, if we have a word in language A, and find the same word in two potential source languages B and C, we can't determine which it came from by looking at which language was spoken in the area earlier, or which was spoken by the speakers' ancestors. We can only determine which it came from by determining which language (if either) was transmitted as a whole, and the evidence for that can only come from forms that aren't shared between B and C. I leave the application of this to Levantine ʕāmmiyya as an exercise for the reader.
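
The logic of the previous paragraph can be sketched as a simple filter: throw away every form that both candidate sources share, and count only what is left. The data structure and the labels "B" and "C" are hypothetical, chosen just for illustration.

```python
from collections import Counter


def diagnostic_tally(items):
    """Count only forms attributable to exactly one candidate source.

    `items` maps each form of language A to the set of candidate source
    languages in which a matching form is found.
    """
    tally = Counter()
    for form, sources in items.items():
        if len(sources) == 1:
            (source,) = sources  # unpack the single candidate
            tally[source] += 1
        # Forms found in both B and C (or in neither) tell us nothing here.
    return tally


# Hypothetical forms of language A and where each one has a match.
forms = {
    "w1": {"B", "C"},
    "w2": {"B"},
    "w3": {"B"},
    "w4": {"C"},
    "w5": {"B", "C"},
}
print(diagnostic_tally(forms))
```

Note that neither the order in which B and C arrived in the area nor the ancestry of the speakers appears anywhere in the input.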

* It's beating a dead horse at this point, but: this Automated Similarity Judgement Program? It, too, finds that Levantine is way closer to Standard Arabic than to Aramaic, just like any historical linguist could have told you from the start.

Saturday, September 13, 2014

Zombie hypotheses and the Zeitgeist

Everything I've been saying for the past 3 posts is basic textbook stuff, reflecting a stable consensus among Semitic historical linguists over, oh, the past two centuries or so. Why, then, is this zombie hypothesis that Levantine Arabic comes from Aramaic still popular in parts of the Levant? That's no great mystery: it comes from a more general movement to emphasise Levantine (and especially Lebanese) culture's continuity with the pre-Islamic Levant, and downplay the influence of Arabs. (Similar efforts have been made in North Africa, notably by Abdou Elimam). As far as I can tell, the unstated reasoning goes something like this:
  1. Levantines are descended from the Aramaic-speaking natives of the land, not from Arab immigrants.
  2. Levantines' language contains a lot that sounds like Aramaic.
  3. Therefore, Levantine is a continuation of Aramaic, not of Arabic.

Step 3, of course, does not follow from Steps 1 and 2. Step 1 is irrelevant to the whole question; the language of your ancestors is very often not the ancestor of your language (ask any Irishman, or any Egyptian). Step 2 is necessary but insufficient for getting to Step 3, since the statement is just as true of Classical Arabic - or of Akkadian, or Ethiopic - as it is of Levantine; we've already seen that deciding linguistic ancestry requires a more sophisticated toolkit.

Nevertheless, this impulse to emphasise continuity and downplay movement deserves more attention. In the Arabic-speaking world, the conspicuous problems with the existing political and economic order, and the humiliating contrasts between the ideals of pan-Arabism and the reality of closed borders and unchallenged occupations, provide an obvious local motivation to downplay Arab identity, and language is so central to pan-Arab identity that it could hardly be left unchallenged. But the impulse is not unique to the region; in some respects, it faithfully reflects wider intellectual trends of the late 20th/early 21st century.

During this era, immediately following some of the largest migrations and invasions in human history, many archeologists and historians have come to feel more and more uncomfortable with the very idea of either. Changes in material culture previously seen as the result of migration were re-explained as diffusion or independent innovation, and reports of barbarian invasions were reinterpreted or dismissed. In some ways, this has been a useful corrective to a previous era's overemphasis on migration; it has arguably made linguists more conscious of the familiar fact that language shift does not necessarily imply invasion, much less population replacement. In others, its influence has been rather less helpful. Linguists reached the late 20th century with a well-tested toolkit for studying the origins of basic vocabulary and morphology, its predictions spectacularly confirmed by such discoveries as laryngeals in Hittite and labiovelars in Mycenaean Greek. Applying this to most Old World languages, and many American or Australian ones, yields a story of discontinuity (be it through language shift or population replacement) that would be familiar to any 19th-century philologist, but that grates somewhat on postmodern ears. Of course, the same toolkit often allows us to detect substrata - elements left over from the population's previous language after they shifted to another one - but that's not enough to satisfy everybody.

A few linguists have responded by trying to change the rules of the game, insisting that the origins of a language should be determined not by vocabulary and morphology, as is normally done, but by purely structural features. This is an important component of Wexler's generally rejected claims that Yiddish is non-Germanic (and that Modern Hebrew is non-Semitic), and is the very essence of Lefebvre's somewhat more popular claims that Haitian Creole is just relexified Fongbe (and almost anything else with "relexification" in the title). This approach runs into severe problems almost instantly - establishing the history of syntactic or semantic patterns is far more difficult than establishing the history of vocabulary or morphology, simply because the former are far less arbitrary and are chosen from a far smaller set of possibilities. To make matters worse, we also find major discontinuities in such patterns in cases where both the population and the vocabulary were relatively stable, such as the transition from Old English to Modern English. Johanna Nichols' efforts point towards the possibility of getting around this by identifying highly time-stable typological features, but the results, at their best, are not nearly fine-grained enough to support narratives of continuity in any specific location. "Continuitarians" in the Arab world apparently haven't gotten around to adopting this approach yet, except occasionally in Morocco, where academic linguistics is unusually advanced for the region; they surely will, however, when they realise that it could be extended to cases like Egypt, rather than being limited to the Fertile Crescent.

For much of the world, especially Europe, a complete lack of ancient written documentation makes another response available: simply argue that the language currently spoken there must have been spoken far earlier than previously assumed, and hence got there not through invasion but through some more peaceful process. This yields the various Paleolithic Continuity Hypotheses. The main problem with this for linguists is that it forces us to postulate a much lower rate of linguistic change for the past than is observed for languages with a long written history, or even for unwritten languages that happen to have been recorded at long intervals; as a result, these hypotheses have remained fairly unpopular. For the Middle East, however, the point is moot: writing has a longer history there than anywhere else on the planet, and that history reveals regular episodes of language extinction, language shift, invasion, migration, exile, and everything else that we're supposed to be de-emphasising.

So if you really want to emphasise your languages' continuity with your ancestors', these are two more promising ways to do it. But I would suggest that there's no reason to bother. If your current identity isn't working out for you, and you don't think you can reform it, why not work on creating a genuinely new one, rather than perpetuating the obsession with heritage by digging around in history for an even older one? It worked out pretty well for America, after all.

Thursday, September 11, 2014

Why "Levantine" is Arabic, not Aramaic: Part 3

We've seen that historical linguists decide which languages share a more recent common ancestor on the basis of shared innovations (or their absence). But if you're paying attention, you may have noticed a potential problem here: innovations can be shared for at least three reasons:
  • Common ancestry - the reason why, for example, Proto-Indo-European intervocalic *s has changed to r both in Spanish and in French.
  • Contact - for example, the change of r (the rolled r you get in Spanish) to R (the uvular r you get in French) started in French, but spread to other European languages such as German, probably due to the prestige of French among the upper classes (actually there's some debate about the direction of spread - see eg this paper by Kostakis - but either way it spread through contact)
  • Chance - for example, θ (th) has changed to t both in Jamaican English and in Levantine, but not because they share any common history or close ties.

So, when it comes to shared innovations, what can we do to distinguish the "confounding factors" of chance and contact from common ancestry? There are two obvious general approaches. The most securely reliable is to establish relative chronology: if change A was applied to the outputs of change B, then obviously change B is the older. Unfortunately, many pairs of changes are commutative - the relative order makes no difference to the output. That often forces us to resort to the more probabilistic criterion of number of changes: if language A shares a lot of common innovations with language B to the exclusion of C, and only a couple with language C to the exclusion of B, then it's more parsimonious to group A with B and find some other explanation for those shared with C. For better results, we can weight the innovations according to the chances of them occurring independently: for example, a change of ð > d is rather common worldwide, whereas a change of ɬʼ > ʕ is rather unusual.
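
Relative chronology is easy to see in a toy simulation. The sketch below uses invented forms and three simplified changes (loosely modelled on θ > t, ð > d, and t > θ after a vowel); it is an illustration of how ordering can matter, not a reconstruction of any real language's history.

```python
import re

VOWELS = "aeiou"


def theta_to_t(word: str) -> str:
    """Toy change: θ > t everywhere."""
    return word.replace("θ", "t")


def eth_to_d(word: str) -> str:
    """Toy change: ð > d everywhere."""
    return word.replace("ð", "d")


def spirantize_t(word: str) -> str:
    """Toy change: t > θ immediately after a vowel."""
    return re.sub(f"(?<=[{VOWELS}])t", "θ", word)


def apply_in_order(word: str, changes) -> str:
    """Apply sound changes one after another, oldest first."""
    for change in changes:
        word = change(word)
    return word


# Order matters: these two changes interact.
print(apply_in_order("ata", [theta_to_t, spirantize_t]))  # merger first
print(apply_in_order("ata", [spirantize_t, theta_to_t]))  # spirantization first

# Order is irrelevant: these two changes are commutative.
print(apply_in_order("θaða", [theta_to_t, eth_to_d]))
print(apply_in_order("θaða", [eth_to_d, theta_to_t]))
```

In the first pair, only the order "merger, then spirantization" leaves any θ in the output, so a language that has both changes and still has θ after vowels must have gone through them in that order. In the second pair both orders give the same result, so the outputs alone cannot tell us which change came first.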

Levantine Arabic provides a useful case study: as NNT correctly pointed out, it shares a couple of innovative sound changes with Aramaic, in particular θ (th) > t, ð (dh) > d. (The hamza-y correspondence is a different issue - there's massive variation within Classical Arabic on where and whether hamza is realised, as can be seen from the different Qur'an reading traditions, and the consonantal orthography of Classical Arabic obviously reflects a dialect in which, like the majority of present-day dialects but unlike Modern Standard, hamza was hardly ever pronounced). Yet we have seen that Levantine Arabic does not share most of Aramaic's defining innovations, and does share important innovations of Arabic, such as the reflexes of proto-Semitic *g, *θʼ, *ɬʼ, and (depending on reconstruction) , the replacement of "say" (originally 'amar-) with qāl-, the metathesis of ʕam- "with" to maʕ-, or almost every detail of the extremely intricate broken plural system. How can this be explained?

If the explanation is common ancestry, then we should find the changes θ > t, ð > d only in Levantine words that are not Arabic innovations. In fact, however, we find them in words such as itnēn "two", in which the i- is an Arabic innovation - cp. Arabic iθnayni (acc/gen), Aramaic trēn, proto-Semitic *θn-ay-n(a). This hypothesis would also fail to account for the rest of the observations; if Levantine shares a more recent common ancestry with Aramaic than with Arabic, and is spoken exclusively in an area once dominated by Aramaic, then why on earth did it pick up so many innovations from Arabic while remaining immune to practically all the innovations Aramaic went through except these two? Both the criteria given above therefore point away from common ancestry as an explanation.

This suggests that we should consider contact. At first sight, you might think the answer is simple: Aramaic speakers couldn't pronounce interdentals, so they left them out of their Aramaic-accented Arabic. But that hypothesis would be absurd. By the late pre-Islamic era, all known varieties of Aramaic did in fact have the sounds θ and ð, due to a later development of t > θ, d > ð after vowels (except when doubled). We find these sounds alive and well in the only surviving Levantine Aramaic dialect, that of Maaloula: eg xoθla "wall", ḳrīθa "village", eḥða "one (f.)". Why, then, would Aramaic speakers change these sounds to t, d in Arabic?

How about the opposite contact situation: Arabic speakers living on the fringes of the Aramaic-speaking world copied the shift θ > t, ð > d from their neighbours, while those living further inland stuck with the traditional pronunciation? That is more plausible, but still a bit problematic. The development of t > θ, d > ð had already happened by 250 BC in Aramaic, so the shift would have to have been borrowed before that; but Arabic-speaking groups which used Aramaic as their high language, such as the Nabataeans of Petra, are only well-attested later than that.

A third, more subtle contact explanation seems preferable. Aramaic speakers would certainly have taken advantage of the many similarities between Aramaic and Arabic to reduce the burden on their memories. But, whereas θ and ð are extremely common in Aramaic, in Arabic they are quite rare: in the Qur'ān, t is ten times commoner than θ, and while ð is about as common as d overall, practically all of its occurrences are limited to demonstratives. A good rule of thumb for the Aramaic learner of Arabic to apply would therefore be "replace Aramaic θ, ð with t, d except in demonstratives"; 9 times out of 10, the result would be correct Arabic, and the 10th time it would still be comprehensible. In such an environment, where Aramaic-speaking learners of Arabic outnumbered native speakers, it's not hard to imagine the distinction disappearing. If so, the loss of interdentals in Levantine would indeed reflect Aramaic influence - as a result of Aramaic speakers' effort to avoid Aramaic forms!
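
The "9 times out of 10" figure follows directly from the frequency ratio. Here is a back-of-the-envelope sketch, under the simplifying assumption (mine, for illustration) that a learner faced with an Aramaic θ or t always guesses Arabic t, and that the Arabic target has t versus θ in the 10 : 1 proportion cited above for the Qur'ān.

```python
def rule_accuracy(t_per_theta: float) -> float:
    """Share of guesses that are right if the learner always picks t,
    given that t is `t_per_theta` times as frequent as θ in the target."""
    return t_per_theta / (t_per_theta + 1)


print(f"{rule_accuracy(10):.0%}")
```

With a ratio of 10 : 1 the rule is right in 10 cases out of 11, i.e. roughly nine times out of ten; the same calculation for ð would need separate counts for demonstratives and non-demonstratives, which are not given here.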

Tuesday, September 09, 2014

Why "Levantine" is Arabic, not Aramaic: Part 2

Last time, I promised to look at the "ratio of content ⊂ Arabic & ⊄ Aramaic". To do that, we need two things: data on the frequency of different words and morphemes, and etymologies for each word and morpheme. If this were English, I could offer you a 450-million-word online digital corpus for the former, and the OED for the latter. For Levantine Arabic the pickings are a bit scantier. There are indeed several digital corpora of Levantine Arabic, but none of them are publicly available, and none have published any frequency data that I can find offhand; and for etymologies, you have to consult, by hand, as many dictionaries (of several languages) as it takes.

So for present purposes, I will use a much smaller substitute, which can hardly be accused of any partiality to Standard Arabic: namely, a selection from Said Akl's Roomyo w Julyeet (CORRECTION: introduced by Said Akl), which I was lucky enough to run into at an Oxfam a few years ago. I picked a well-known section of the play whose language seemed relatively simple, with little or no visible Standard Arabic influence - the lines starting from "Romeo, Romeo, wherefore art thou Romeo?" (p. 62), including Romeo's reply and Juliet's reply to him (finishing on the second line of p. 63) - and counted morpheme frequencies (retaining his eccentric orthography). The 26 morphemes that occurred more than once account for about two-thirds of the selection, so looking at their etymologies gives us the maximum of information for the minimum of effort - and here they are. Only those that are unambiguously Arabic or unambiguously Aramaic are relevant to our purpose; the rest may be dismissed as "confounding factors":

  1. w(e) و "and" (11 occurrences): "Confounding". Shared by Arabic and Aramaic in effectively identical form.
  2. b(e)- / m- بـ٬ مـ [marker of the indicative imperfect] (10 occurrences): Innovative. This form is found as such neither in Classical Arabic nor in Aramaic, and its etymology poses some difficulties; if you know of any convincing work on this, let me know in the comments.
  3. -aq ـك "you m. sg. oblique" (9 occurrences): Arabic. Both Aramaic and Arabic have cognates of this, but in Aramaic the consonant has changed to kh, whereas Levantine - like Arabic - has kept the original k.
  4. ¢esm اسم "name" (6 occurrences): Arabic. Both Aramaic and Arabic have cognates of this, but in Aramaic the consonant is sh, whereas in Levantine - as in Arabic - it's s. (There is controversy over which value is original.)
  5. la "no, not, neither... nor" (5 occurrences): "Confounding". The form is shared identically by Arabic and Aramaic; the usage is actually closer to Arabic (where it negates verbs only in the imperfect and the negative imperative) than to Aramaic (where it negates verbs in all tenses), but we'll score it as shared.
  6. -u / -h / -vowel length (depending on context) ـه "him, his" (5 occurrences): Arabic. Aramaic -eh could explain the h form and the vowel length form, but the -u can be satisfactorily derived only from Arabic -hu.
  7. quun كون "be" (4 occurrences): "Confounding". In reality this is much more likely to be Arabic, since the normal Aramaic root for "be" is hwy, but kwn is attested in this sense in Aramaic too.
  8. men من "from" (4 occurrences): "Confounding". Shared by Arabic and Aramaic in effectively identical form.
  9. ḍall ضل "remain" (4 occurrences): Arabic. There is no Aramaic source for emphatic D.
  10. (e)l الـ
    • "the" (4 occurrences): Arabic. (Aramaic originally used suffixed -aa, which later lost its definite sense.)
    • [relative marker] (3 occurrences): Innovative, but based on extending the functions of the Arabic definite article, and probably on shortening a form similar to Classical Arabic alladhii, which it resembles rather more than the Aramaic relative marker dh-.
  11. ¢ent انت "you (m. sg.)" (3 occurrences): Arabic. In Aramaic, the n disappeared, assimilated to the following t.
  12. ma ما "not" (3 occurrences): Arabic. In Aramaic, maa is never used as a negator.
  13. law لو "if" (3 occurrences): Arabic. (Aramaic does not generally use this, but where traces of a cognate are found, as in some frozen combinations, it takes the form luu, not law.)
  14. cu شو "what?" (3 occurrences): Original, from Arabic. Found as such neither in Arabic nor in Aramaic, but its generally accepted etymology is Arabic, from a contraction of أي شي هو "what thing is it?".
  15. sammi "name (v.)" (3 occurrences): Arabic, for the same reason as esm above.
  16. e- / Ø- أـ [first person singular subject marker] (3 occurrences): "Confounding". Shared by Arabic and Aramaic in effectively identical form.
  17. t- تـ [second person masculine singular subject marker] (2 occurrences): "Confounding". Shared by Arabic and Aramaic in effectively identical form.
  18. -ni ـني "me" (2 occurrences): "Confounding". Shared by Arabic and Aramaic in effectively identical form.
  19. -a ـا "her" (2 occurrences): "Confounding". At first sight the loss of the h makes it appear closer to Aramaic than to Classical Arabic - but the h was also lost in -u "him", which cannot be explained as Aramaic.
  20. -t ـت [feminine singular construct state marker] (2 occurrences): "Confounding". The form is compatible with Arabic or Aramaic origins (Aramaic had th, but we would expect that to be turned back into t, since Levantine has no interdentals). The function straightforwardly existed in Aramaic; in Classical Arabic, it did not, but the pre-pausal pronunciation of -at- as -ah provides an obvious source for it to develop from, and indeed it exists in practically all modern dialects (including those of the Arabian peninsula). If you're feeling really generous, though, you might ignore the latter fact and award this one to Aramaic.
  21. ¢ana أنا "I" (2 occurrences): "Confounding". Shared by Arabic and Aramaic in effectively identical form.
  22. hu هو "he" (2 occurrences): "Confounding". At first sight the Aramaic form huu is closer than Classical Arabic huwa, but loss of final vowels is regular in Levantine Arabic, so you would expect huwa to become hu anyway.
  23. ya يا "oh" (2 occurrences): "Confounding". Shared by Arabic and Aramaic in effectively identical form.
  24. ¢aw أو "or" (2 occurrences): "Confounding". Shared by Arabic and Aramaic in effectively identical form.
  25. xebb حب "love" (2 occurrences): "Confounding". Shared by Arabic and Aramaic in effectively identical form.
  26. jez¢ جزء "part" (2 occurrences): Arabic. I haven't noticed an Aramaic cognate, but even if there is one, the palatalisation of the j (from original g) marks it as Arabic.

So, out of these 26 items - which together account for 107 out of the 161 morphemes in this selection - 10 are unambiguously Arabic (accounting for 46 morphemes), and none are unambiguously Aramaic. 15 items (accounting for 51 of the morphemes) could equally well be Arabic or Aramaic, and as such are irrelevant to determining which one predominates within Lebanese Arabic. (If you decide to be really generous to Aramaic, you might shift -a, hu, and -t to the Aramaic column, accounting for a grand total of 6 morphemes versus Arabic's 46.) The remaining single item, the imperfect prefix b-, is a later innovation whose history is unclear; even if someone found an Aramaic etymology for it and added it to all the unlikely cases mentioned, the ratio of "content ⊂ Arabic & ⊄ Aramaic" to "content ⊂ Aramaic & ⊄ Arabic" for this list would still be about 3:1. On a less generous and more plausible calculation, it's infinite (46:0). Either way, by this criterion, too, Levantine is Arabic, not Aramaic.
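For anyone who wants to check the arithmetic, here is a minimal sketch of the two calculations, using only the morpheme counts quoted above. The variable and function names are purely illustrative, and the "generous" scenario is the hypothetical one described in the text, not a claim about the actual etymologies.

```python
# Morpheme counts quoted in the paragraph above.
ARABIC_ONLY = 46      # tokens of the unambiguously Arabic items
ARAMAIC_ONLY = 0      # no unambiguously Aramaic items were found
GENEROUS_SHIFT = 6    # -a, hu and -t, if conceded to Aramaic
B_PREFIX = 10         # imperfect b-/m-, if an Aramaic etymology were ever found


def ratio(arabic, aramaic):
    """Arabic-only tokens per Aramaic-only token (infinite if there are none)."""
    return float("inf") if aramaic == 0 else arabic / aramaic


# Plausible scoring: nothing is unambiguously Aramaic.
STRICT = ratio(ARABIC_ONLY, ARAMAIC_ONLY)

# Maximally generous scoring: every doubtful item goes to Aramaic.
GENEROUS = ratio(ARABIC_ONLY, ARAMAIC_ONLY + GENEROUS_SHIFT + B_PREFIX)

print(STRICT, round(GENEROUS, 2))
```

Even the generous scenario only brings the Aramaic column up to 16 tokens against 46.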

If you pick a long enough text, of course, you will eventually find an Aramaic loan or two. There are quite a few Aramaic loans in Levantine Arabic, depending on the dialect, and they must really stand out to a Levantine speaker studying Aramaic. But even in the most heavily Aramaic-influenced dialects, they occur far less frequently than unambiguously Arabic forms. While historical linguists' usual definition of language origin does not rely on any explicit frequency criteria, in all the cases I've seen, the most frequent source of vocabulary by token count for a sufficiently large text turns out to be what historical linguists would consider as that language's parent. In Levantine Arabic the effect is even stronger, since not only is the basic vocabulary of Arabic origin, so is most of the learned vocabulary.

Now, after all those calculations, I'm sure you're eager to read the lovers' dialogue, so here it is:

جلييت: يا روميو! يا روميو! ليش انت روميو؟
نكور بيك٬ ورفود اسمك٬
أو٬ إذا ما بدك٬ حلوف إنك بتحبني
وأنا ببطل كون من عايلت كابيولت.

روميو: بضل عم بسمعا
أو بحكي معا؟

جلييت: إسمك بس عدوي.
انت، بتضل انت زاتك٬ ولو ما كنت منتغيو.
و شو المنتغيو؟ لا هو إيد ولا إجر
ولا دراع ولا وج ولا أي جزء
من جسم الإنسان؟ آه، كون اسم تاني!
و شو فيه الاسم؟ ال منسميه ورد
لو شو ما سمينا بتضل ريحتو حلوة،
و هيك روميو، لو ما تسمى روميو
كان بيضل محتفظ بهالكمال المحبوب
ال بيملكو بدون عيب. يا روميو، تجرد من اسمك،
ومقابل اسمك ال هو مش جزء منك،
خدني أنا كلي!

And in the original orthography:

Sunday, September 07, 2014

Why "Levantine" is Arabic, not Aramaic: Part 1

Following in a long tradition of people imagining that knowing a few languages or a bit of mathematics implies they already know linguistics better than any self-styled specialist, the quasi-celebrity author Nassim Nicholas Taleb recently decided to claim that "Levantine is modernized Aramaic". (Let's not comment further on the attached table, whose attempt at Standard Arabic is painfully bad, and which omits the whole Aramaic column except for the title. Also, let's not confuse it with the separate question of how distant Levantine is from Standard Arabic.) The ensuing Twitter "debate", while of little value in itself, nicely illustrates a number of common misconceptions, some of them worth responding to in a less cramped medium. I'll start with the most explicitly political one, since it's bound to colour responses to any purely academic argument:

You just call it Arabic because Arabic is used for "high" functions in the region; If we were diglossic Levantine/Aramaic instead of Levantine/Arabic you would say the same.

Less than 90 km from NNT's hometown is a village where they do in fact still speak Aramaic, while of course still being diglossic in Arabic: Maaloula, in Syria. Despite heavy Arabic influence, this village's language has never once been mistaken for Arabic; its own people call it siryêni, and European Semitists recognised it as Aramaic as soon as even simple wordlists became available. If you happen to be Levantine, try listening to some of it (eg here) - how much of that do you understand? The same is true of other relict Semitic languages within the Arabic-speaking world, such as Mehri or Jibbali or Soqotri or Neo-Mandaic. I have more than one book in which Soqotri or Jibbali speakers attempt to prove that their languages are really Arabic, for much the same reasons that NNT wants his language not to be Arabic - but, notwithstanding the speakers' desires, Semitists had no trouble proving that these languages were not descended from Arabic. Conversely, the "high" languages of Malta have always been English and Italian, yet, despite Maltese nationalists' best efforts to show that Maltese was really Punic, European Semitists had no difficulty in identifying it as descended from Arabic. So, no, Semitic historical linguists do not base their decisions on what kind of diglossia happens to be around, nor were all those 19th-century German Orientalists secret agents sent back in time by the Baath Party. To the contrary, almost all Semitists I've known would be far more excited to discover that some undocumented variety was a new Semitic language than to find out that it was "just" another dialect of Arabic.

"Proving" Levantine comes from Arabic rather than Aramaic like "proving" Spanish comes from Italian, not latin.

How do linguists know that Spanish is descended directly from Latin, not from Italian? Simple: we look for cases in which Italian has made a change - innovated - and Spanish hasn't. Such cases are easy to find: for example, in Italian original *fl has become fi (thus fiore "flower") and original long *e in open syllables has become i (thus di "of"), whereas in Spanish original *fl remains fl, and *e remains e (thus flor, de). If Spanish were descended from Italian, then these changes would all have had to happen and then reverse themselves in Spain, which is very unlikely. We can know which form was original not just because in this case we have copious ancient data, but also by using comparative-historical reconstruction. The full toolkit would take too long to explain here (my favourite textbook is Lyle Campbell's Historical Linguistics), but basically, we:

  • establish sets of sounds corresponding systematically to one another;
  • figure out whether these correspondence sets systematically occur only in certain environments, and, if so, see whether there are any other correspondence sets occurring only in non-overlapping environments that they can be unified with.
This procedure allows us to prove that the ancestor language must have distinguished at least as many phonemes as members of the resulting set of correspondence sets, and - combined with a large body of knowledge about likely and unlikely sound changes - gives us a good chance of determining what the actual sounds of those phonemes were. This technique was, of course, developed mainly for reconstructing unattested languages, but way back in the 1950s, Robert Hall decided to test it by applying it to Romance. The result was, as you might hope, Vulgar Latin.
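To make the first step concrete, here is a toy sketch that collects correspondence sets from the two Italian/Spanish pairs mentioned above. The segment-by-segment alignment is done by hand and is my own assumption, as are all the names; real reconstruction works from hundreds of cognates, not two.

```python
from collections import defaultdict

# Hand-aligned toy data: position k in the Italian list corresponds to
# position k in the Spanish list; "-" marks a segment with no counterpart.
ALIGNED = {
    "flower": (["f", "i", "o", "r", "e"], ["f", "l", "o", "r", "-"]),
    "of": (["d", "i"], ["d", "e"]),
}


def correspondence_sets(aligned):
    """Map each (Italian, Spanish) segment pair to the contexts it occurs in."""
    sets = defaultdict(list)
    for gloss, (italian, spanish) in aligned.items():
        for k, pair in enumerate(zip(italian, spanish)):
            preceding = italian[k - 1] if k > 0 else "#"  # "#" = word boundary
            sets[pair].append((gloss, preceding))
    return dict(sets)


SETS = correspondence_sets(ALIGNED)
for pair, contexts in SETS.items():
    print(pair, contexts)
```

In this tiny sample, Italian i shows up in two different sets (i : l after f, i : e elsewhere), which is exactly the kind of environment-conditioned pattern the second step is meant to sort out.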

Now, let us apply this to Levantine, Arabic, and Aramaic. Reconstructing the common ancestor of Aramaic and Arabic (see eg here or even just here) shows that Aramaic features a number of innovations not shared with Arabic; conveniently, many of these are mergers. In particular, in Aramaic *`, *ʁ (gh), and *ɬʼ (lh) all merge to ` (ayin); *x (kh) and *ħ merge to ħ (heth); initial *w and *y merge to *y. In Arabic, all of these distinctions are maintained. Now, the nice thing about mergers is that they can't be reversed; once two formerly distinct word classes feature the same phoneme, there's no way for the ordinary speaker to recover the distinction. A monolingual Aramaic speaker has no way of telling that the ` in 'ar`ā "earth" (< *'arɬʼ- + -ā) used to be pronounced differently from the ` in ṭar`ā "door", or in `aynā "eye". In Levantine, all of these distinctions are normally maintained, just as they are in Arabic; أرض has none of the consonants of عين. QED. (In fact, historical linguists have also succeeded in identifying some Aramaic loans into Levantine Arabic by finding the small minority of words in which these distinctions were lost.)

Indeed, you don't even need to look at phonology to figure this out; the grammar provides plenty of clues. In Aramaic, for example, almost every noun ends in -ā, except in a few specific contexts. This is an innovation specific to Aramaic, accomplished by gluing a former demonstrative on to the end of the noun, and preserved in every modern spoken Aramaic variety. In Arabic, it never happened - nor, obviously, in Levantine.
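The irreversibility of mergers can be illustrated with a small sketch: treat each language's sound changes as a mapping from proto-phonemes to reflexes, and ask whether the mapping can be run backwards. The ASCII labels (following the parenthetical glosses above, with "dad" standing for Arabic ض) and the function names are my own shorthand, and only the mergers listed in the paragraph are included.

```python
from collections import defaultdict

# Reflexes of five proto-phonemes, as described in the paragraph above.
ARAMAIC = {"*ayin": "ayin", "*gh": "ayin", "*lh": "ayin",
           "*kh": "heth", "*heth": "heth"}
ARABIC = {"*ayin": "ayin", "*gh": "gh", "*lh": "dad",
          "*kh": "kh", "*heth": "heth"}


def possible_sources(reflexes):
    """For each modern sound, list every proto-phoneme it could come from."""
    back = defaultdict(set)
    for proto, reflex in reflexes.items():
        back[reflex].add(proto)
    return dict(back)


def is_reversible(reflexes):
    """True only if every modern sound has exactly one possible source."""
    return all(len(src) == 1 for src in possible_sources(reflexes).values())


print(is_reversible(ARAMAIC), is_reversible(ARABIC))
print(possible_sources(ARAMAIC)["ayin"])
```

A language that keeps all five sounds apart, as Levantine does, cannot have passed through the Aramaic column of this table, since nothing in the Aramaic output says which ayin to turn back into gh or dad.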

Of course, NNT shows no signs of even being aware of the relevance of regular sound correspondences, mergers, or any of the other elements in a historical linguist's toolkit, much less of accepting them as definitive criteria for language classification. At one point, however, he vaguely expresses the criterion he thinks should be definitive:

To prove that Levantine derives from Arabic WITHOUT Aramaic route you need to finds ratio of content ⊂ Arabic & ⊄ Aramaic. Not done.

Now that we've seen a little bit of how linguists determine what comes from Arabic and what comes from Aramaic, we're ready to look at the results of this criterion in the next post. You should be able to guess the answer already...

Monday, August 18, 2014

A South Arabian loan into Libyan Berber?

From Morocco to Oman, there is a long tradition of imagining that the Berbers of North Africa and the Mehris of South Arabia speak the same language. This is by no means confined to pan-Arab nationalists - Siwis have told me more than once that some friend of a friend had met non-Arabic-speaking Yemenis and understood their language, and I'm told many Mehris have the same belief. I've previously discussed some possible reasons for this belief, as well as the more obviously propagandistic claim that Berber descends from Arabic; both are false.

Nevertheless, it is true that significant numbers of Yemenis participated in the Arab migrations to North Africa during the Islamic era, and it's not inherently implausible that some should have brought their languages with them. In fact, I just came across what looks very much like a South Arabian loan into the northwestern Libyan Berber variety of Zuwara (At Willul).

In Zuwara, the usual word for "father" is baba, as in many other Berber varieties, but in a few collocations such as əg tíddart n ḥíbi-s "in her father's house", a different term ḥibi is substituted (Mitchell 2009:303, 341). This word is unlikely to be proto-Berber, since proto-Berber did not have a phoneme /ḥ/ and since it is quite unusual within Berber. And as far as I know, it is not used anywhere in Arabic (although Libyan dialects are not that well documented). One could try to link it to ḥabīb-ī "my beloved", but that would be phonetically irregular and semantically unlikely, since this term is normally used in the context of romantic love or of a child by their parents.

However, the normal word for "father" in Mehri is ḥīb - ḥayb-ī "my father", ḥīb-as "his father" (Watson 2012:149). In fact, Mehri adds this ḥ- prefix to a number of kinship terms: ḥāmē "mother", ḥabrē "son", ḥabrīt "daughter" (ibid), as well as a number of other common nouns. Its function is to mark definiteness (ibid:64). But no such definite article has ever existed in Arabic or in Berber, so the only possible explanations for the similarity of Zuwara ḥibi are pure coincidence or borrowing from Mehri into Berber (perhaps via an Arabic dialect?). It will be interesting to see if other cases turn up.

And as long as I'm talking about Libyan Berber, I really ought to mention Marijn van Putten's new book A Grammar of Awjila Berber (see his announcement at Oriental Berber). This careful analysis of all the unfortunately limited data available on the very unusual Berber variety of Awjila, in the far east of Libya, is an important resource for Berber historical linguistics. I hope that things settle down in Libya soon enough to make a fuller description possible, but for the moment, this work appears unlikely to be superseded.

Saturday, August 09, 2014

Some minority languages of the Mosul Plain

For most of the past decade, while first the rest of Iraq and then Syria (150,000 dead, 2.5 million refugees) have burned, Northern Iraq has seemed like a relative oasis of calm. That has changed rather suddenly: with ISIS' religious persecution, and now American airstrikes, Northern Iraq and its minorities are suddenly prominent in the headlines. The headlines throw into sharp relief the region's status as perhaps the most religiously diverse place in the Middle East - but what they may not show is that this region is also a small-scale "residual zone" preserving rather more linguistic diversity than is typical for such a small area in the modern Fertile Crescent (not just Arabic and Kurdish!)

The most endangered language of the region is certainly Northeastern Neo-Aramaic (NENA), or Sûreth (ܣܘܪܝܬ). Once, Aramaic was the lingua franca of the Middle East, spoken in various dialects from Gaza to Basra, and written as far afield as China and India. By the early 20th century, it was restricted to a few hundred far-flung mountain villages; the largest dialect group, NENA, was centered on the Christian (Assyrian and Chaldean) villages of the Mosul Plain, such as Tel Kef (Telkepe) and Qaraqosh, and across the border in Iran and Turkey; a detailed map is available at Cambridge's NENA Database. Today, those who have stayed behind in ever harder conditions are substantially outnumbered by their diaspora in cities such as Detroit or Sydney, whose children increasingly just speak English - and, as of the past couple of days, media accounts suggest that fleeing refugees have left the Mosul Plain villages practically empty. Their exodus is rather reminiscent of what happened about a century ago: during the Armenian/Assyrian Genocide, the NENA-speaking Assyrians of Hakkari fled from Turkey never to return, taking refuge in Iraq and finally in Syria. It remains to be seen whether this exile will be as lasting as the previous one. If you're wondering how the language sounds, the NENA Database site has a number of recordings, some transcribed, such as The Story of the Cobbler; others can be heard at Semitisches Tonarchiv.

While Kurds prefer to consider Kurdish as one language, the two main Kurdish varieties of northern Iraq - Sorani and Kurmanji - are strikingly different from one another, and are usually considered as separate languages by academics. The smaller Gurani language (see DOBES), spoken in northwestern Iraq and also commonly labelled Kurdish, doesn't even belong to the same branch of Iranian as Sorani and Kurmanji. Many of its speakers belong to loosely Shia-affiliated minority religions, such as the Ahl-i Haqq and the Shabak, considered by ISIS as beyond the pale.

The other minority group unfortunate enough to have been pitched into the headlines, Yezidis, do not have a language of their own; they speak Kurmanji Kurdish. However, the Yezidis are associated with a unique writing system. In the early 20th century, manuscripts summarising Yezidi beliefs written in a unique alphabet (such as the Meshefa Resh "Black Scripture") came into the possession of Western researchers, and the alphabet in question duly found its way into compendia such as Diringer (1968). Later research, though, suggests that both these manuscripts and the alphabet they were written in were created for Western consumption, likely by a non-Yezidi bookseller, rather than representing a Yezidi tradition (Kreyenbroek and Rashow 2005, EI).

The region's Turkmen, many of whom have also apparently been persecuted by ISIS for their Shiism, speak a Turkic variety close to Turkish and Azeri. From what little information I've seen, it seems unlikely to qualify as a separate language, but does not seem to have attracted much research.

The Arabic dialects of northern Iraq - the so-called qeltu dialects, for their unique pronunciation of the word "I said" - are also quite interesting in their own right; the spoken Arabic dialect of Abbasid Baghdad seems likely to have belonged to this group. However, that is another story for another day...

Sunday, July 29, 2012

Arabic /ē/ gets colloquial: the case of al-Kisā'ī

My description of Khalaf's reading in the previous post applies to all readings transmitted from Ḥamzah ibn Ḥabīb al-Taymī of Kūfa: Khalaf, Khallād, Idrīs al-Ḥaddād, and Isħāq al-Warrāq. There is one other set of readings with final -ē, however: those transmitted from ʕAlī ibn Ḥamza al-Kisā'ī of Kūfa, through his students Abū al-Ḥārith and al-Dūrī. Here's an example (sūrat al-Shams, al-Dūrī's reading):

This set shows two interesting differences for the words examined before:

  • Verbs with medial /ē/ in the Ḥamzah tradition simply have [ā]; contrast Ḥamzah's xēba خاب "he lost" with al-Kisā'ī's xāba. In other words, medially *aya and *awa both become ā, just like in the standard Classical pronunciation.
  • Verbs with final /ā/ in the Ḥamzah tradition have /ē/, just like the ones with /ē/; contrast Ḥamzah's talāhā تلاها "it followed it" with al-Kisā'ī's talēhā. In other words, final *aya and *awa both become ē, whereas original *ā remains ā.
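The two points above, together with the Ḥamzah pattern from the previous post, can be summarised as a small lookup table. This is only a sketch of the outcomes as described in these two posts; the tradition labels and function names are my own, and "standard" stands for the mainstream Classical pronunciation.

```python
# Outcome of original *aya / *awa by position, per reading tradition.
REFLEXES = {
    "standard": {("medial", "aya"): "ā", ("medial", "awa"): "ā",
                 ("final", "aya"): "ā", ("final", "awa"): "ā"},
    "hamzah": {("medial", "aya"): "ē", ("medial", "awa"): "ā",
               ("final", "aya"): "ē", ("final", "awa"): "ā"},
    "kisai": {("medial", "aya"): "ā", ("medial", "awa"): "ā",
              ("final", "aya"): "ē", ("final", "awa"): "ē"},
}


def long_vowel(tradition, position, proto):
    """Long vowel that an original *aya or *awa yields in a given tradition."""
    return REFLEXES[tradition][(position, proto)]


def keeps_y_w_apart(tradition, position):
    """Does the tradition still distinguish final-y/-w (or medial) classes?"""
    table = REFLEXES[tradition]
    return table[(position, "aya")] != table[(position, "awa")]


for name in REFLEXES:
    print(name, keeps_y_w_apart(name, "medial"), keeps_y_w_apart(name, "final"))
```

Laid out this way, only the Ḥamzah readings keep the y and w classes apart; al-Kisā'ī merges them in both positions, just with a different vowel at the end of the word.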

The latter development is phonetically quite counterintuitive - why would *awa become ē, when it didn't even contain any front vowel? But it makes more sense when you look at it on a morphological rather than phonological level. Arabic has a huge number of final-y verbs, and a much smaller number of final-w verbs. In the rather common 3rd-person perfect forms, they are indistinguishable. This makes it tempting to simplify the system by reducing the differences between the two classes, and in fact practically all modern Arabic dialects have taken this to its logical conclusion and simply conjugate all final-w verbs as if they were final-y: thus Algerian Arabic, for instance, has dʕa دعا, dʕit دعيت, yədʕi يدعي instead of daʕā دعا, daʕawtu دعوت, yadʕū يدعو. What al-Kisā'ī is doing looks like an early step along that road.

You may notice that another characteristic of this reading is also distinctly reminiscent of certain modern colloquials, in particular those of Syria: prepausal feminine -ah ة is pronounced -ih.

Tuesday, July 24, 2012

/ē/, Arabic's fourth long vowel

(Warning: This post assumes some knowledge of Arabic, although you should be able to follow the argument even without it.)

Everyone knows that Classical Arabic has three short vowels (a, i, u) and three long (ā, ī, ū). But is this true of all varieties of Classical Arabic? Listen to this recitation of sūrat al-'Aʕlā:

There are several distinct Qur'ān recitation traditions, thought to reflect (in part) early dialectal variation in pronunciation. The best known are Ḥafṣ (Asia and Egypt) and Warsh (mainly North and West Africa); the recitation above is in one of the more obscure ones, Khalaf (ʕan Ḥamzah). In it, you will notice that words like šē'a “he willed” شاء, tansē “you forget” تنسى, appear with ē where more common pronunciations of Classical Arabic would use ā. But not all cases of ā are pronounced ē: contrast for instance ġuθā'-an “chaff” غثاء, lā “not” لا. Let's try to figure out what's going on here.

Start with the verbs ending in ā. Verbs which end in ā in the 3rd person masculine singular (“he did”), such as hadā “he guided” هدى, ṣallā “he prayed” صلى, daʕā “he invited” دعا, sajā “it covered with darkness” سجا, divide into two classes in other forms, one ending in y, the other in w: haday-ta “you guided” هديت, ṣallay-ta “you prayed” صليت vs. daʕaw-ta “you invited” دعوت, sajaw-ta “you covered in darkness” سجوت. You have just heard that the former set become hadē, ṣallē. For the latter, we will have to examine different sūras: in the 10th verse of sūrat al-Qamar (1:20) we hear daʕā, and in the 2nd of al-Ḍuħā we hear sajā.

Now ordinary three-letter verbs have the same stem throughout: katab-a “he wrote” كتب vs. katab-ta “you wrote” كتبت. What if the same used to be true of these verbs: *haday-a “he guided” vs. *daʕaw-a “he invited”? (The asterisk means that these are just hypothetical forms.) As it happens, that idea is confirmed if you look at one of Arabic's closest relatives. In Ge'ez, the Semitic classical language of Ethiopia, the cognate verbs are pronounced precisely as reconstructed: with aya (eg ṣallaya “he prayed”) and awa (eg ṣalawa “he roasted”, Arabic ṣalā صلا, ṣalaw-ta صلوت). So if we assume those forms were original, then we can easily see what's going on: in ordinary Classical Arabic both original *aya and *awa end up as ā at the end of a word, but in the Khalaf reading they remain distinct: *aya becomes ē, but *awa becomes ā.

A similar division can be made among verbs with medial ā. Verbs with medial ā in the 3rd person masculine singular, such as zāda “he increased” زاد, ħāqa “he surrounded” حاق, kāna “he was” كان, qāla “he said” قال, divide similarly into two classes in their verbal nouns, one in y, the other in w: zayd زيد, ħayq حيق vs. kawn كون, qawl قول. So we might expect a similar original difference: *zayada, *ħayaqa vs. *kawana, *qawala. Sure enough, the pronunciation is as expected. Listen to sūrat al-Baqarah, verses 10 and 11 (about 2:00) and sūrat al-'Anʕām, verse 10 (about 3:00): zēda, ħēqa vs. kāna, qāla. A near-minimal pair is provided by sā'a “he was bad” ساء (sūrat al-Munāfiqūn, v. 2, about 0:50) vs. šē'a “he willed” شاء (already heard in sūrat al-'Aʕlā.)

So – depending on how abstract you are willing to make your representations – this variety of classical Arabic seems to have four long vowel phonemes rather than three. It is also unambiguously more conservative in this respect than the mainstream pronunciation reflected both in the Ḥafṣ reading and in educated standard Arabic, which underscores the philological value of such reading traditions.

(Note: The Qur'ānic Arabic Corpus was useful in preparing this post.)

Friday, November 25, 2011

South Arabian languages on YouTube

In eastern Yemen and western Oman, there are spoken several South Arabian languages - Semitic, but more distantly related to Arabic than Arabic is to Aramaic or Hebrew. The largest of these is Mehri. If you speak Arabic and want to learn how to form questions in Mehri (or just want to hear what this language sounds like), there's a recording on YouTube for you: اللغة المهرية - محب اللغة المهرية وليد التميمي. For its rather smaller relative Jibbali, there's some poetry. Someone has even attempted to put up recordings of all the major dialects of Yemen (mainly Arabic.)

A longstanding rumour claims that these languages are mutually comprehensible with Berber. As some listeners will be able to see, this is not correct.

Wednesday, March 02, 2011

From hatred to singing in two easy steps

In Kabyle, the word for "sing" is šnu. No other Berber language is known to have a similar word for sing (see Nait-Zerrad, s.v. CN), and both the verbal noun and its plural are formed on an Arabic pattern (ššna, pl. ššnawi); so one is almost forced to look to Arabic for its origins. But ask the average Arabic-speaker in modern-day Algeria, and they'll tell you they've never heard any such word.

In Classical Arabic, there is a fairly rare verb šani'a شنئ, meaning "to hate", probably best-known from the third verse of Surat al-Kawthar: 'inna šāni'aka huwa l-'abtar "For he who hateth thee, he will be cut off (from Future Hope)". (Cognate words are found elsewhere in Semitic, for example Hebrew śānē', Syriac snā "hate".) This has barely survived in spoken Arabic, but (according to de Prémare) the causative šənnā is still used in Tangier (Morocco), meaning "to taunt someone by showing him something he wants that you won't give him."

Phonetically, šani'a is a perfect match for šnu (the glottal stop/hamza becomes y in colloquials, and Arabic final-y verbs normally end up in Kabyle as final-u, for reasons I won't go into) - but semantically, surely this is absurd?

So I would have thought, until, idly browsing through a glossary of the rather conservative Bedouin Arabic dialect of the Nefzaoua area in southern Tunisia (Boris 1951), I found the following entry:
شنى šnệ... inacc. yẹ́šni...; noms d'act. šänyân et šạ́ni: 1) "critiquer en vers, faire la satire"... 2) "détester".

شنى šnē... impf. yašnī...; verbal nouns šanyān and šany: 1) to criticise in verse, to satirise... 2) to hate
"Hate" to "criticise in verse" is a credible change, and so is "criticise in verse" to "sing". Suddenly, a connection that looked impossible becomes almost obvious.

In this case, as in many others, Kabyle has preserved an Arabic word that almost every Arabic dialect in North Africa has lost - but to make sense of the connection you have to look at a wide range of Arabic dialects, not just checking Classical Arabic and stopping there. The converse also applies: when looking into Berber loans into an Arabic dialect, it's not enough to look just at the Berber spoken next door. People move around, and words that were familiar in one generation may be forgotten in the next one.

Of course, if the Nefzaoua data weren't available, there's no way you could accept a comparison like this - and, if several thousand years had passed since the word was borrowed, instead of less than 1500, that intermediate step probably would not have survived. In other words, semantic change can rather easily erase connections beyond any reasonable hope of retrieval. This is one of the main difficulties in long-range historical linguistics - the further back you go, the more cases like this.

Thursday, June 24, 2010

Why they thought the Berbers came from Yemen

A long-standing tradition in North Africa, convincingly rejected by Ibn Khaldūn but perpetuated by poets and curricula alike, claims that some major Berber tribes descend from Yemeni Arabs through semi-mythical pre-Islamic kings and their wholly mythical vast conquests. This idea has little to support it, and probably became popular because it allowed these tribes to claim prestigious connections in the context of a high culture dominated by Arab ideas; but why should the connection be specifically Yemeni, rather than, say, North Arabian or perhaps Persian? Linguistics suggests a possible answer.

In southern Arabia live several groups, most famously the Mehri tribe, whose languages, though Semitic, are only distantly related to Arabic, and quite incomprehensible to other Arabs. (You can hear recordings of Mehri at SemArch.) I recently borrowed a copy of the newly published Mehri Language of Oman, by Aaron Rubin; looking through it, I could see several points where Mehri resembles Berber but not Arabic that a traveller might seize on, notably:
  • -s ـس "her", -sən ـسن "their (f.)"; compare Siwi -nn-əs ـنّس "his/her", -n-sən ـنسن "their (m/f)". A 3rd person in -s was found in proto-Semitic, as shown by Akkadian, but was replaced in Arabic.
  • əl ال "not" (preverbal first element of negative); compare Tumzabt ul أُل. Again, this is found in Akkadian and hence must be proto-Semitic.
  • -ət ـت feminine singular; compare Siwi -ət ـت (feminine singular in Arabic borrowings). Again, the connection is real, but dates back to proto-Semitic rather than indicating any special relationship between the two.
  • -tən ـتن feminine plural; compare Berber -tən ـتن (plural of some masculine nouns)
  • a- أَ used as a definite article for some nouns; compare Berber a- أَ (masculine singular noun prefix). A striking case is Mehri a-məsge:d أَمسجيد vs. Siwi a-məzdəg أمزدج "the mosque". However, in Mehri this indicates definiteness, and does not depend on gender, so the resemblance is probably a coincidence.
  • tə-...-əm تـ...ـم second person plural imperfective, eg təkə́tbəm تكتبم "you (pl.) write"; compare Berber t-...-m تـ...ـم. The t- is cognate; not sure about the history of the -m offhand.
  • 'ār آر "except, but"; compare Tuareg ar.
  • ā آ "oh" (vocative); compare pan-Berber a أ. (This is actually found in Classical Arabic as well, أ, but is not widely used.)
None of these similarities in fact imply any close relationship between Berber and Mehri, of course; some are coincidental, while others can be traced back to proto-Semitic, and hence constitute evidence connecting Berber with Semitic, not specifically with Mehri. However, a medieval traveller between Yemen and North Africa would not have known that, and could easily have observed similarities like these and leapt to the seemingly plausible conclusion that Berber was connected to the language of these Yemeni tribes, who, like many Berbers, seemed to live just like Arabs yet speak totally differently.

Wednesday, August 20, 2008

Triliterals in strange places

In a grammar I was looking at lately, I came across the following sentences:

"Nouns may be verbalized, or verbs nominalized, simply by bringing the stem into a suitable rhythmic form... Most of the rhythmic patterns call for a tri-consonantal stem. If a stem is di-consonantal in its primary form, a consonant (usually the glottal stop) is added to give it the proper structure... Often in the course of forming derivatives, stems that are too long are forced into one or the other of the regular patterns. They are cut down by the loss of quantity or of vowels or consonants as may be necessary."

Was this a Semitic language, or perhaps some less well known Afro-Asiatic cousin? No: this was Sierra Miwok, the pre-conquest language spoken by the Native Americans of central California inland from the Bay. (See map.) The "rhythmic patterns" only involve changes in quantity and CV>VC metathesis, not insertion of specific vowels as in Semitic, but the parallel is striking. Here are a few examples:

leppa- "to finish", with a CVCVCC pattern imposed, becomes lepa''- (gaining a glottal stop).
ṯolookošu- "three", with a CVCCV pattern imposed, becomes ṯolko- (losing the š).

Compare Arabic:
'ab- "father", with a 'aCCaaC plural template imposed, becomes 'aabaa'- "fathers", gaining a glottal stop (historically a semivowel, but never mind that)
`ankabuut- "spider", with a CaCaaCiC plural template imposed, becomes `anaakib- "spiders", losing the t.
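The two Sierra Miwok examples above can be sketched in a few lines of Python. This is only a rough illustration, not Freeland's analysis: the function names, the five-vowel inventory, and the decision to treat any doubled letter as mere length are all assumptions made here, and the sketch ignores the CV>VC metathesis mentioned earlier. It keeps the stem's consonants and vowels in order, drops whatever the template has no room for, and pads with a glottal stop when the consonants run out.

```python
import unicodedata

VOWELS = set("aeiou")   # assumed vowel inventory, for illustration only
GLOTTAL_STOP = "'"      # written ' in the examples above


def skeleton(stem):
    """Split a stem into its consonants and vowel qualities, ignoring length."""
    consonants, vowels = [], []
    previous = None
    for ch in unicodedata.normalize("NFC", stem.rstrip("-")):
        if ch == previous:
            continue  # long vowel or geminate consonant: count it once
        (vowels if ch in VOWELS else consonants).append(ch)
        previous = ch
    return consonants, vowels


def impose_template(stem, template):
    """Force a stem into a pattern of C and V slots (hypothetical helper)."""
    consonants, vowels = skeleton(stem)
    next_consonant, next_vowel = iter(consonants), iter(vowels)
    out = []
    for slot in template:
        if slot == "C":
            # Too few consonants: add a glottal stop, as the grammar describes.
            out.append(next(next_consonant, GLOTTAL_STOP))
        elif slot == "V":
            vowel = next(next_vowel, None)
            if vowel is None:
                # Not modelled in this sketch.
                raise ValueError("stem has too few vowels for this template")
            out.append(vowel)
        else:
            raise ValueError(f"unknown slot {slot!r}")
    return "".join(out)


print(impose_template("leppa-", "CVCVCC"))
print(impose_template("\u1e6folooko\u0161u-", "CVCCV"))
```

The Arabic plurals would need more than this, since their templates also specify the vowels (CaCaaCiC) rather than reusing the stem's own.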

Freeland, L. S. 1951. Language of the Sierra Miwok. Baltimore: Waverly Press.

Wednesday, July 04, 2007

Chenanith b'Libya - in the 11th century AD?

Anyone interested in North African languages who doesn't speak Dutch should immediately check out Bulbul's posting on Latino-Punic. The Phoenicians brought their language with them to North Africa when they founded Carthage and other cities. Carthage was destroyed, of course, but many other cities continued to speak Phoenician for longer; however, like Arabic in more recent times, it changed a lot under Berber influence, and this later dialect is usually called Punic. This language was spoken by St. Augustine, who quotes a number of Phoenician words, such as salus (< shalu:sh < shalo:sh < shala:sh < thala:th) "three", in his works. In eastern Libya, as it happens, Punic continued to be written even after the Phoenician alphabet was forgotten; this body of inscriptions, using the Latin alphabet to write Punic, is called (logically enough) Latino-Punic, and a comprehensive database of such inscriptions is available from Leiden. Recently, as Bulbul points out, a thesis was submitted at Leiden on Latino-Punic and its Linguistic Environment; I would love to read it.

The twist in this tale is that Phoenician may have survived into the 11th century AD! Al-Bakri (whom I've mentioned before) enigmatically says of the inhabitants of Sirt in Libya that:
لهم كلام يراطنون به ليس بعربي ولا عجمي ولا بربري ولا قبطي ولا يعرفه غيرهم
They have a speech in which they jabber, which is neither Arabic nor Ajami (by which he probably means Latin but might mean Persian) nor Berber nor Coptic, and which no one but them knows.
The location (in eastern Tripolitania) is about right for it to be Punic, and if it were Greek you would expect him to know, considering he cites (more or less correctly) the Greek etymology of طرابلس (Tripoli) on the next page. So was Punic still spoken in the 11th century? Your guess is as good as mine, but it looks plausible.

Sunday, June 17, 2007

Ugaritic inscription

Last weekend I got a chance to indulge my longstanding passion for ancient Semitic languages at the Louvre. The Ugaritic collection was, as you might expect, especially good; I took many photographs, including this particularly clear one here, a ceremonial axe from the 13th or 12th century BC. Since the Ugaritic alphabet only contained some 30 letters, it's easy enough to read the inscription (turn it 90 degrees counterclockwise), although no word dividers are present:

xrṣn rb khnm

which the museum caption translates as "la hache du Grand Prêtre" ("the axe of the High Priest"). xrṣn is presumably "axe"; I can't find it in my small dictionary, but it looks like it might be related to xurāṣ "gold" (itself cognate not only to Hebrew ḥarūṣ, but also to Greek chrysos, a Semitic loanword). rabb- means "great one", identical to Arabic ربّ "lord" and cognate to Hebrew rav "great one; rabbi". kāhin- is "priest", identical to the Arabic كاهن "soothsayer" and cognate to Hebrew kohen "priest" - yes, the same word from which the surname "Cohen" comes. -īm is the oblique plural, identical to Hebrew -īm (which however is no longer inflected for case) and cognate to Arabic -īn. Once you start looking, it's so easy to spot the connections between Semitic languages; no wonder people a thousand years ago noticed.

Friday, March 02, 2007

Destroying Harsusi

I just came across some incredibly unenlightened reporting from Al Watan on one of the more endangered South Arabian languages (not, pace the article, a dialect of Arabic - in fact, it's less closely related to Arabic than Syriac or Hebrew are):

وتحدثنا المعلمة شيخة بنت راشد الهنائي إحدى المشرفات على الفصل التمهيدي ومعلمة مادة التربية الإسلامية بالمدرسة قائلة : الفصول التمهيدية التي سعت إدارة التربية والتعليم بالمنطقة بتنفيذه في مدارسها وللعام الثاني على التوالي يأتي بالعديد من الأهداف والتي تتمحور في الأساس لتشمل فئة من الأطفال الذين يتوقع التحاقهم بالصف الأول الأساسي في العام الدراسي القادم حيث تأتي في مقدمة هذه الأهداف تعويد الطالب على الجو المدرسي من خلال طابور الصباح والانخراط مع الطلبة في المدرسة والفصل الدراسي وتأقلمهم مع المعلمة داخل القاعة الدراسية وغرس التعاون والجو الاجتماعي في نفس الطالب قبل دخوله المدرسة وإكساب الطلبة العديد من المهارات في القراءة والكتابة والعمليات الحسابية وكذلك العمل على القضاء على اللهجة السائدة والطاغية على أهالي هيماء وهي اللهجة الحرسوسية من خلال الحروف والكلمات العربية الصحيحة لأنه في الحقيقة تواجه إدارة المدرسة عند التحاق الطلبة في الصف الأول مشكلة فتجد المعلمة الصعوبة في تفهم هؤلاء الطلبة من خلال هذه اللهجة الحرسوسية
"The teacher of Islamic Upbringing at the school, Sheikha bint Rashid al-Hana'i, [s]aid: 'The preschools that the Ministry of Education in the area has undertaken to implement in its schools for the second year running will bring about a variety of goals [...] the children will gain many skills in reading, writing, and arithmetic, and we will work on destroying the dialect which is prevalent and rife among the inhabitants of Hayma, the Harsusi dialect, through correct Arabic letters and words, because it truly presents the school administration with a problem when the students enter first grade, because the teacher finds it difficult to understand these students in this Harsusi dialect.'" (Al Watan, 15 Apr 2005)
I wonder if her echo of the language policies that half-destroyed Welsh or Native American languages is conscious. Somebody get over there and make some recordings of Harsusi before people like this manage to implement these goals!