Showing posts with label Aramaic. Show all posts

Thursday, January 04, 2018

Taleb unintentionally proves Lebanese comes from Arabic

So Taleb has jumped back on his hobbyhorse with yet another post on Lebanese not being Arabic; see my previous posts Why "Levantine" is Arabic, not Aramaic: Part 1, Part 2, Part 3, Zombie hypotheses and the Zeitgeist, On finding the sources of shared items. The funniest thing about this one is that he's been helpful enough to provide a wordlist (for his dialect, I presume) that - despite a number of typos, almost all of which increase the apparent similarity between Levantine and non-Arabic Semitic languages - should be enough all by itself to prove to anyone in doubt that Lebanese is clearly descended primarily from Arabic, with very little Aramaic influence and even less from Canaanite/Phoenician. Unfortunately, he wasn't as helpful on the grammar, not bothering to include equivalents from other Semitic languages for the pronouns and verbal conjugations...
But I don't have all day to spend beating this dead horse, and doing etymology properly takes time. So let's just have a quick look at the first page of his wordlist (well, probably the second one - the real first one seems to be missing), and leave the other pages as an exercise for the reader.

Out of these 39 words, 18 seem to be unambiguously Arabic in origin - either they share specific sound changes with Arabic to the exclusion of the rest of Semitic, or they use a root not used in the appropriate meaning elsewhere in Semitic. Only two look like being Aramaic rather than Arabic in origin (and the evidence in both cases is fairly weak): "hand" and the patently non-basic vocabulary word "image". (Taleb would add a third, zalame "man", but this word has an at least equally plausible Arabic etymology, making it ambiguous at best.) The remaining 19 words are ambiguous, and could in principle derive from more than one Semitic language - but even there, the situation is not symmetrical; all 19 could derive from Arabic, whereas no more than 11 of them could derive from Aramaic. The unambiguous cases give the following ratio: 18 Arabic : 2 Aramaic : 0 everything else. On that basis, we should expect 90% of the 11 that could be either Arabic or Aramaic (i.e. all but one) to derive from Arabic, not from Aramaic, and all of the ones ambiguous between Arabic and another Semitic language but not Aramaic to derive from Arabic. For details, see the following table:

# | Word | Possible source(s) | Notes
1 | goat | Arabic | does not share Canaanite+Aramaic+Ugaritic *nC > CC; does not share Akkadian *ʕa > e
2 | god | Arabic / Aramaic | shows innovative gemination of the l, attested only in Arabic and some dialects of Syriac
3 | good | innovative | the Arabic etymology is obvious, but the root is pan-Semitic so we may generously assume that it could in principle have derived from some other branch
4 | grass | Arabic | does not share Aramaic and Phoenician *ś > s; does share Arabic *ś > š
5 | grind | Arabic / Canaanite | does not share Akkadian *aħa > ê; does not share Aramaic CaCVC > CCVC
6 | hair | Arabic / Ugaritic | does not share Aramaic and Phoenician *ś > s; does share Arabic *ś > š; does not share Akkadian loss of *ʕ
7 | hand | Aramaic | although a change of *yad > *īd is natural enough that it could easily have happened independently in Arabic...
8 | hare | Arabic / Canaanite / Aramaic / Akkadian | no distinctive innovations
9 | he-goat | Arabic / Canaanite / Aramaic | no distinctive innovations
10 | head | Arabic / Ugaritic | does not share Canaanite *aʔ > *ā > ō nor Aramaic *aʔ > ī nor Akkadian *aʔ > ē; the form rās (with loss of the glottal stop) is well-attested in early Arabic dialects
11 | hear | Arabic | does not share Aramaic and Phoenician *s > š (I'm going with Huehnergard's reconstruction of proto-Semitic sibilants here). Note that the correct Syriac form is šmaʕ, not sma3; likewise the Hebrew
12 | heart | Arabic | The initial glottal stop (still pronounced q in, for example, Alawite dialects) can only be explained from the Arabic form, which is a lexical innovation replacing original *libb
13 | honey | Arabic | 3asal is clearly Arabic, and – as I've pointed out before – dabs is attested in Classical Arabic as well as in Hebrew and Aramaic
14 | horn | Arabic / Canaanite / Aramaic / Akkadian / Ugaritic | no distinctive innovations
15 | horse | Arabic | Syriac ḥsan 'strong' has s, not ṣ, but even if it were cognate, the Classical Arabic and Levantine form still share a semantic shift unattested in Aramaic
16 | house | Arabic / Canaanite / Aramaic / Ugaritic | Akkadian can be ruled out, since it shows a shift *ay > ī which never happened in Levantine.
17 | hundred | Arabic / Canaanite / Aramaic / Akkadian / Ugaritic | The only innovation here, ʔ > y, is not shared with any of the ancient languages in question
18 | hunger | Arabic | Even assuming jūʕ has cognates elsewhere in Semitic, the change g > j is specific to Arabic
19 | hunt | Arabic / Canaanite / Aramaic / Akkadian / Ugaritic | The only innovation here, use of the D-stem, is not shared with any of the ancient languages
20 | image | Aramaic | Since when is 'image' basic vocabulary? But yes, assuming we can trust the transcription, it shares the aw with Aramaic
21 | inside | Arabic / Aramaic | Mixed signal here: the meaning looks like Aramaic, but the sound shift g > j is Arabic not Aramaic. In reality, the word *jaww must originally have meant 'inside' in Arabic too; it lost this meaning in Classical Arabic, but kept it in many of the dialects
22 | iron | Arabic |
23 | kidney | Arabic / Canaanite / Aramaic / Akkadian / Ugaritic | The only innovation here, *y > w, is not shared with any of the ancient languages (but _is_ shared with many other modern Arabic dialects...)
24 | kill | Arabic / Canaanite | Does not share Aramaic CaCVC > CCVC
25 | king | Arabic / Canaanite / Aramaic / Ugaritic | Since when is 'king' basic vocabulary?
26 | knee | Arabic | Shares a unique innovation with Arabic – the metathesis brk > rkb
27 | know | Arabic |
28 | laugh | Arabic | Shares a unique innovation with Arabic – the sound shift *ɬ' > ḍ (which came relatively late in Arabic – later than Sibawayh, even – and never happened in any other Semitic language). I can't speak for Amioun, but in general Levantine has ḍaḥak; if Amioun does have ḍaḥaq, the fact that it didn't become *ḍaḥaʔ suggests that the *k > q happened there only after the regular shift *q > ʔ, and hence has nothing to do with the Canaanite or Ugaritic forms.
29 | leg | innovative | The alleged Ugaritic form is nonsense – Ugaritic had no j sound, and the dictionary of Del Olmo Lete and Sanmartin reveals no appropriate Ugaritic form. It is true that the Levantine form seems to be shared with Ethiopic and some Yemeni dialects, but not with any ancient language of the Fertile Crescent.
30 | lion | Arabic | A very problematic choice as 'basic vocabulary'.
31 | live | Arabic / Canaanite / Aramaic | Except that the Levantine form is clearly 'alive', not 'live', making the whole comparison problematic....
32 | love | Arabic | The Arabic is of course mistranscribed - in his terms, it should be 2a7abba, whereas the Hebrew and Aramaic forms really do have a h.
33 | make | Arabic |
34 | man | innovative | 'zalame' is etymologically problematic – both Arabic and Aramaic etymologies have been proposed. 'rejjel' is of course from Arabic. dakar is 'male', not 'man'.
35 | many | Arabic |
36 | meat | Arabic | This shares a specific semantic shift with Arabic to the exclusion of the rest of Semitic: « staple food » > « meat »
37 | milk | Arabic / Ugaritic | The root is common to several Semitic languages, but the use of the passive pattern fa3īl in this word is unique to Arabic
38 | month | Arabic | Pretty sure the normal Levantine form is shahr, not sha7r, not that it makes any difference to the etymology – and for sure Syriac 'moon' below is sahrā, not šahrā.
39 | moon | Arabic |
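
The tally can be rechecked mechanically. What follows is a minimal Python sketch, not part of Taleb's list or of any published analysis: it simply re-keys the source column of the table above by gloss (the names `TABLE` and `parse` are invented here), treats the three rows marked "innovative" as ambiguous, in line with the count of 19 given earlier, and then applies the 18 : 2 ratio to the words for which Aramaic remains a candidate.

```python
from fractions import Fraction

# Source column of the table above, re-keyed by gloss.
# This encoding is an illustrative assumption, not the author's data format.
TABLE = """
goat: Arabic
god: Arabic / Aramaic
good: innovative
grass: Arabic
grind: Arabic / Canaanite
hair: Arabic / Ugaritic
hand: Aramaic
hare: Arabic / Canaanite / Aramaic / Akkadian
he-goat: Arabic / Canaanite / Aramaic
head: Arabic / Ugaritic
hear: Arabic
heart: Arabic
honey: Arabic
horn: Arabic / Canaanite / Aramaic / Akkadian / Ugaritic
horse: Arabic
house: Arabic / Canaanite / Aramaic / Ugaritic
hundred: Arabic / Canaanite / Aramaic / Akkadian / Ugaritic
hunger: Arabic
hunt: Arabic / Canaanite / Aramaic / Akkadian / Ugaritic
image: Aramaic
inside: Arabic / Aramaic
iron: Arabic
kidney: Arabic / Canaanite / Aramaic / Akkadian / Ugaritic
kill: Arabic / Canaanite
king: Arabic / Canaanite / Aramaic / Ugaritic
knee: Arabic
know: Arabic
laugh: Arabic
leg: innovative
lion: Arabic
live: Arabic / Canaanite / Aramaic
love: Arabic
make: Arabic
man: innovative
many: Arabic
meat: Arabic
milk: Arabic / Ugaritic
month: Arabic
moon: Arabic
"""


def parse(table):
    """Map each gloss to its list of candidate source languages."""
    rows = {}
    for line in table.strip().splitlines():
        gloss, label = line.split(": ", 1)
        rows[gloss] = [source.strip() for source in label.split("/")]
    return rows


rows = parse(TABLE)
only_arabic = [g for g, s in rows.items() if s == ["Arabic"]]
only_aramaic = [g for g, s in rows.items() if s == ["Aramaic"]]
# Everything else, including the rows marked "innovative", counts as ambiguous.
ambiguous = [g for g in rows if g not in only_arabic and g not in only_aramaic]
# Ambiguous words for which Aramaic is one of the listed candidates.
maybe_aramaic = [g for g in ambiguous if "Aramaic" in rows[g]]

# Share of Arabic among the unambiguous cases, applied to the ambiguous ones.
arabic_share = Fraction(len(only_arabic), len(only_arabic) + len(only_aramaic))
expected_arabic = arabic_share * len(maybe_aramaic)

print(len(only_arabic), len(only_aramaic), len(ambiguous))
print(len(maybe_aramaic), float(arabic_share), float(expected_arabic))
```

Using `Fraction` keeps the 90% exact, so the expected number of Arabic-derived words among the Aramaic-compatible ones comes out as 99/10, which is the "all but one" of the text.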

Saturday, January 21, 2017

Semitic languages in two Arabic novels

I've been reading two novels in Arabic lately. Frankenstein in Baghdad, by Ahmad Saadawi, reimagines Baghdad's descent into chaos in the mid-2000s, blending gritty realism with semi-allegorical horror. Samraweet, by Hajji Jaber, is an altogether gentler but still cutting narrative of the Eritrean diaspora, interleaving scenes from the narrator's life in Jeddah with ones from his first visit to Asmara as he gradually realizes the difficulty of being part of either place. Both turned out to share a feature I hadn't been expecting to find: dialogue in other Semitic languages.

In Frankenstein in Baghdad, one of the main characters is an elderly Assyrian woman, Elishawa "Umm Daniel". All her relatives have long since moved abroad, and keep begging her to come live with them where it's safe, so there are few occasions for her to speak anything but Arabic. However, the Assyrians of northern Iraq traditionally speak a variety of Neo-Aramaic, and when she meets her grandson from Melbourne, they have the following fairly elementary conversation (pp. 276-277), which I hope I've transcribed correctly:

"داخي إيوَت؟" (Dāx īwat?) "How are you?"
"سباي إيْوَن باسيما" (Spāy īwan basīmā) "I am fine, thanks."

The author of the book seems to be from southern Iraq, so I found it remarkable that he took the trouble to get some Neo-Aramaic dialogue - especially since the copula is appropriately put in the feminine form both times (in Assyrian Neo-Aramaic, even the 1st person singular copula agrees in gender). Probably he felt it would enhance her symbolic status as a reminder of what Iraq once was. Unfortunately, while Aramaic has been spoken in Iraq for almost three millennia, its prospects there are dim: after all these years of war and frequent persecution, most speakers live in Western cities, and unless they're exceptionally good at remaining a distinct, cohesive immigrant group, their descendants seem more likely to speak English or Swedish than Aramaic.

In Eritrea, unlike Iraq, most people have as their first language a Semitic language other than Arabic: Tigre in the north, Tigrinya in the south. So it was less surprising to find an Asmaran waiter on the first page saying "سنّي ما سيام" (?Senni mā syām), which I assume from context means something like "Good afternoon!" However, the occasional glimpses provided into Eritrean sociolinguistics were more eye-opening. The narrator and most of his friends are from a Tigre-speaking background and know how to speak it, but Tigre per se seems to play little part in their linguistic identity. They grew up not only speaking Arabic in the street, but feeling that Arabic is an Eritrean national language, and resenting the government's treatment of it as less central than Tigrinya. When an Eritrean in Jeddah speaks Tigre with him, the narrator assumes it's because he only arrived recently until he finds out, to his surprise, that this person simply "enjoys speaking it, even in Jeddah" (p. 76). It would be interesting to see how this compares to the attitudes of Tigre speakers living in Eritrea: between the prestige of Arabic and the status of Tigrinya, what are the long-term prospects for Tigre?

Saturday, April 23, 2016

Arabic substrate etymologies as urban legends

In Arabic as in English, social networks have a constantly flowing undercurrent of poorly sourced, manipulative stories being shared and reshared by people who vaguely think they sound right. Over the past, say, five years, I've noticed the emergence of a linguistically interesting new subgenre within this miasma of lies and half-truths: etymological tables purporting to prove the massive contribution of Berber, or Syriac, or (more rarely) Coptic, or perhaps some other pre-Arab substrate to the local Arabic dialect. These tables, in my experience, never cite an academic source, and rarely cite anything at all; closer examination generally reveals a farrago of correct etymologies and bad guesses. For example (from the preceding links):
  • Tunisian məlɣiɣa ملغيغة "fontanelle" really is from Berber tamelɣiɣt, a word widely attested in Berber and with no obvious Classical Arabic counterpart...
  • but Tunisian gdər قدر "pot" is of course from the Classical Arabic qidr قِدْرٌ, which ought to be familiar even to elementary school students; the Berber cognates cited are borrowings from Arabic.
  • Tunisian bəkkuš بكّوش "dumb, mute" is slightly less obvious, but again from Arabic: it's an irregular expressive formation from 'abkam أَبْكَمُ, substituting the dialectally rather productive suffix -uš. The suffix might be from Berber, but the root is not.
  • Syrian (and Algerian) dālye دالية "grape-vine" may well be from Aramaic; the word is attested in Syriac with the right meaning (dālī-ṯ-ā "vine-branch, vine"), and belongs to a semantic field where Aramaic borrowings are to be expected from a very early period. Within Arabic, this word was already noted as a regional synonym of karmah in the 10th century by the Palestinian geographer al-Maqdisi.
  • However, Syrian mnīħ منيح "good" has nothing to do with Aramaic; it's a local version of widespread dialectal Arabic malīħ مليح, with nasality assimilation. This adjective exists both in Classical Arabic (malīħ) and in Syriac (malīħ-ā) with the meaning of "salty"; in an era where salt was more expensive than now, this naturally tended to imply "tasty". There is no reason to assume either language borrowed this word from the other, since the root is proto-Semitic and the template is productive in both languages. However, only in dialectal Arabic did it go on to develop the sense of "good", which it now has in a wide variety of dialects, including those of North Africa.
  • More problematic is Syrian wāwā, a baby-talk word for "pain" used (as far as I can see) neither in Syriac nor in Classical Arabic. Syriac does have wāy "woe!", but so does Coptic - and, if it comes to it, English "waaah!" is closer than either. Onomatopoeia is a better explanation than borrowing or inheritance in this case.

The optimistic take on this is that it shows that there's a real public demand in the Arabic-speaking world for information on etymology and on substrate influence. The pessimistic take is that people just want "information" confirming what they want to believe - in this case, that they're not really that Arab after all. (The converse case also exists, of course - recall Othmane Saadi - but I haven't seen as much of it circulating on social media, though that may just reflect my own bubble.) The reality is probably somewhere in the middle.

Sunday, September 14, 2014

On finding the sources of shared items, OR: The irrelevance of anteriority

Similarities between different languages are data. It's easy to come up with any of several wildly different measures of such similarities, typically by applying edit distances to wordlists (as in the ASJP*) or texts, but the result should not be mistaken for an analysis - it's just a measurement, a compression of the data. It doesn't tell you anything about the causes of these similarities on its own. Historical linguistics is not the measurement of similarities, but the effort to find the hypothesis about past events that best explains them. Your H0, of course, is always "coincidence". Once you've rejected that, you're left with the trickier task of disentangling contact from common ancestry - trickier because, quite often, they partially overlap.
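
To make concrete how little such a measurement involves, here is a minimal Python sketch of a length-normalised edit distance averaged over a wordlist. It is a simplification rather than the ASJP's actual procedure; it compares ordinary spellings instead of phonetic transcriptions; the function names are invented for illustration; and the three word pairs are simply the English/French examples used further down in this post.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance divided by the length of the longer word (0 = identical)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def mean_distance(pairs) -> float:
    """Average normalised distance over a list of (word, word) pairs."""
    return sum(normalized_distance(a, b) for a, b in pairs) / len(pairs)


# English / French spellings taken from the examples later in this post.
pairs = [("football", "football"), ("garage", "garage"), ("stupid", "stupide")]
print(round(mean_distance(pairs), 4))
```

The resulting figure is small, and that is all it tells us: nothing in it distinguishes borrowing from inheritance, or either from chance.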

To understand linguistic causation in the past, an essential starting point is to look at it in the present. Suppose that you are a native speaker of English:

  1. If you say "football" or "garage" to your child while speaking English, it's because you grew up speaking English, and you know that this is what other English speakers say. The fact that French speakers happen to call it "football" too, if you're even aware of it, has nothing to do with your choice of words.
  2. If you say "football" or "garage" to your child while speaking French, it's because you later studied French, and you know that this is what French speakers say. The fact that it's also what English speakers say no doubt made it easier to memorise, but if French speakers had named them something else, you would be doing the same.

We thus see that, for shared words, inheritance from either of two radically different languages can yield precisely the same outcome. The fact that English and French share these words in the first place is obviously due to contact (in each direction). The fact that your child is growing up with them, however, is because you're faithfully passing on the existing norms of one or the other language, not because you're combining them. In historical linguistic jargon, the use of the word "football" is at this point being inherited, not borrowed. Thus, if an English-monolingual Cajun says "stupid", it's not because he's managed to hold on to his ancestors' French word "stupide", it's because that happens to be the English word for it.

So, if we have a word in language A, and find the same word in two potential source languages B and C, we can't determine which it came from by looking at which language was spoken in the area earlier, or which was spoken by the speakers' ancestors. We can only determine which it came from by determining which language (if either) was transmitted as a whole, and the evidence for that can only come from forms that aren't shared between B and C. I leave the application of this to Levantine ʕāmmiyya as an exercise for the reader.

* It's beating a dead horse at this point, but: this Automated Similarity Judgement Program? It, too, finds that Levantine is way closer to Standard Arabic than to Aramaic, just like any historical linguist could have told you from the start.

Saturday, September 13, 2014

Zombie hypotheses and the Zeitgeist

Everything I've been saying for the past 3 posts is basic textbook stuff, reflecting a stable consensus among Semitic historical linguists over, oh, the past two centuries or so. Why, then, is this zombie hypothesis that Levantine Arabic comes from Aramaic still popular in parts of the Levant? That's no great mystery: it comes from a more general movement to emphasise Levantine (and especially Lebanese) culture's continuity with the pre-Islamic Levant, and downplay the influence of Arabs. (Similar efforts have been made in North Africa, notably by Abdou Elimam.) As far as I can tell, the unstated reasoning goes something like this:
  1. Levantines are descended from the Aramaic-speaking natives of the land, not from Arab immigrants.
  2. Levantines' language contains a lot that sounds like Aramaic.
  3. Therefore, Levantine is a continuation of Aramaic, not of Arabic.

Step 3, of course, does not follow from Steps 1 and 2. Step 1 is irrelevant to the whole question; the language of your ancestors is very often not the ancestor of your language (ask any Irishman, or any Egyptian). Step 2 is necessary but insufficient for getting to Step 3, since the statement is just as true of Classical Arabic - or of Akkadian, or Ethiopic - as it is of Levantine; we've already seen that deciding linguistic ancestry requires a more sophisticated toolkit.

Nevertheless, this impulse to emphasise continuity and downplay movement deserves more attention. In the Arabic-speaking world, the conspicuous problems with the existing political and economic order, and the humiliating contrasts between the ideals of pan-Arabism and the reality of closed borders and unchallenged occupations, provide an obvious local motivation to downplay Arab identity, and language is so central to pan-Arab identity that it could hardly be left unchallenged. But the impulse is not unique to the region; in some respects, it faithfully reflects wider intellectual trends of the late 20th/early 21st century.

During this era, immediately following some of the largest migrations and invasions in human history, many archeologists and historians have come to feel more and more uncomfortable with the very idea of either. Changes in material culture previously seen as the result of migration were re-explained as diffusion or independent innovation, and reports of barbarian invasions were reinterpreted or dismissed. In some ways, this has been a useful corrective to a previous era's overemphasis on migration; it has arguably made linguists more conscious of the familiar fact that language shift does not necessarily imply invasion, much less population replacement. In others, its influence has been rather less helpful. Linguists reached the late 20th century with a well-tested toolkit for studying the origins of basic vocabulary and morphology, its predictions spectacularly confirmed by such discoveries as laryngeals in Hittite and labiovelars in Mycenaean Greek. Applying this to most Old World languages, and many American or Australian ones, yields a story of discontinuity (be it through language shift or population replacement) that would be familiar to any 19th-century philologist, but that grates somewhat on postmodern ears. Of course, the same toolkit often allows us to detect substrata - elements left over from the population's previous language after they shifted to another one - but that's not enough to satisfy everybody.

A few linguists have responded by trying to change the rules of the game, insisting that the origins of a language should be determined not by vocabulary and morphology, as is normally done, but by purely structural features. This is an important component of Wexler's generally rejected claims that Yiddish is non-Germanic (and that Modern Hebrew is non-Semitic), and is the very essence of Lefebvre's somewhat more popular claims that Haitian Creole is just relexified Fongbe (and of almost anything else with "relexification" in the title). This approach runs into severe problems almost instantly - establishing the history of syntactic or semantic patterns is far more difficult than establishing the history of vocabulary or morphology, simply because the former are far less arbitrary and are chosen from a far smaller set of possibilities. To make matters worse, we also find major discontinuities in such patterns in cases where both the population and the vocabulary were relatively stable, such as the transition from Old English to Modern English. Johanna Nichols' efforts point towards the possibility of getting around this by identifying highly time-stable typological features, but the results, at their best, are not nearly fine-grained enough to support narratives of continuity in any specific location. "Continuitarians" in the Arab world apparently haven't gotten around to adopting this approach yet, except occasionally in Morocco, where academic linguistics is unusually advanced for the region; they surely will, however, when they realise that it could be extended to cases like Egypt, rather than being limited to the Fertile Crescent.

For much of the world, especially Europe, a complete lack of ancient written documentation makes another response available: simply argue that the language currently spoken there must have been spoken far earlier than previously assumed, and hence got there not through invasion but through some more peaceful process. This yields the various Paleolithic Continuity Hypotheses. The main problem with this for linguists is that it forces us to postulate a much lower rate of linguistic change for the past than is observed for languages with a long written history, or even for unwritten languages that happen to have been recorded at long intervals; as a result, these hypotheses have remained fairly unpopular. For the Middle East, however, the point is moot: writing has a longer history there than anywhere else on the planet, and that history reveals regular episodes of language extinction, language shift, invasion, migration, exile, and everything else that we're supposed to be de-emphasising.

So if you really want to emphasise your language's continuity with your ancestors', these are two more promising ways to do it. But I would suggest that there's no reason to bother. If your current identity isn't working out for you, and you don't think you can reform it, why not work on creating a genuinely new one, rather than perpetuating the obsession with heritage by digging around in history for an even older one? It worked out pretty well for America, after all.

Thursday, September 11, 2014

Why "Levantine" is Arabic, not Aramaic: Part 3

We've seen that historical linguists decide which languages share a more recent common ancestor on the basis of shared innovations (or their absence). But if you're paying attention, you may have noticed a potential problem here: innovations can be shared for at least three reasons:
  • Common ancestry - the reason why, for example, Proto-Indo-European intervocalic *s has changed to r both in Spanish and in French.
  • Contact - for example, the change of r (the rolled r you get in Spanish) to R (the uvular r you get in French) started in French, but spread to other European languages such as German, probably due to the prestige of French among the upper classes (actually there's some debate about the direction of spread - see eg this paper by Kostakis - but either way it spread through contact)
  • Chance - for example, θ (th) has changed to t both in Jamaican English and in Levantine, but not because they share any common history or close ties.

So, when it comes to shared innovations, what can we do to distinguish the "confounding factors" of chance and contact from common ancestry? There are two obvious general approaches. The most securely reliable is to establish relative chronology: if change A was applied to the outputs of change B, then obviously change B is the older. Unfortunately, many pairs of changes are commutative - the relative order makes no difference to the output. That often forces us to resort to the more probabilistic criterion of number of changes: if language A shares a lot of common innovations with language B to the exclusion of C, and only a couple with language C to the exclusion of B, then it's more parsimonious to group A with B and find some other explanation for those shared with C. For better results, we can weight the innovations according to the chances of them occurring independently: for example, a change of ð > d is rather common worldwide, whereas a change of ɬʼ > ʕ is rather unusual.
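
One way to make the weighting idea concrete is to score each shared innovation by its surprisal, so that a change unlikely to arise independently counts for more than a commonplace one. The Python sketch below does this under loud assumptions: the probabilities are invented purely for illustration, not measured from any typological database, and the languages A, B and C are the abstract ones of the paragraph above.

```python
import math

# HYPOTHETICAL chances of each change arising independently in a language.
# The numbers are made up; only their ordering follows the text above.
P_INDEPENDENT = {
    "ð > d": 0.5,    # described above as rather common worldwide
    "ɬʼ > ʕ": 0.01,  # described above as rather unusual
}


def weight(p_independent: float) -> float:
    """Surprisal in bits: the rarer the change, the more weight it carries."""
    return -math.log2(p_independent)


def support(shared_changes, probabilities=P_INDEPENDENT) -> float:
    """Total weighted evidence from a list of shared innovations."""
    return sum(weight(probabilities[change]) for change in shared_changes)


# Toy scenario: A shares the unusual change with B and the common one with C.
shared_with_b = ["ɬʼ > ʕ"]
shared_with_c = ["ð > d"]

support_b = support(shared_with_b)
support_c = support(shared_with_c)
print(f"A+B: {support_b:.2f} bits, A+C: {support_c:.2f} bits")
```

With equal raw counts (one shared change each), the weighted score still favours grouping A with B, because the change shared with C is the kind that turns up independently all the time. Summing surprisals assumes the changes are independent of one another, which real sound changes often are not.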

Levantine Arabic provides a useful case study: as NNT correctly pointed out, it shares a couple of innovative sound changes with Aramaic, in particular θ (th) > t, ð (dh) > d. (The hamza-y correspondence is a different issue - there's massive variation within Classical Arabic on where and whether hamza is realised, as can be seen from the different Qur'an reading traditions, and the consonantal orthography of Classical Arabic obviously reflects a dialect in which, like the majority of present-day dialects but unlike Modern Standard, hamza was hardly ever pronounced). Yet we have seen that Levantine Arabic does not share most of Aramaic's defining innovations, and does share important innovations of Arabic, such as the reflexes of proto-Semitic *g, *θʼ, *ɬʼ, and (depending on reconstruction) , the replacement of "say" (originally 'amar-) with qāl-, the metathesis of ʕam- "with" to maʕ-, or almost every detail of the extremely intricate broken plural system. How can this be explained?

If the explanation is common ancestry, then we should find the changes θ > t, ð > d only in Levantine words that are not Arabic innovations. In fact, however, we find them in words such as itnēn "two", in which the i- is an Arabic innovation - cp. Arabic iθnayni (acc/gen), Aramaic trēn, proto-Semitic *θn-ay-n(a). This hypothesis would also fail to account for the rest of the observations; if Levantine shares a more recent common ancestry with Aramaic than with Arabic, and is spoken exclusively in an area once dominated by Aramaic, then why on earth did it pick up so many innovations from Arabic while remaining immune to practically all the innovations Aramaic went through except these two? Both the criteria given above therefore point away from common ancestry as an explanation.

This suggests that we should consider contact. At first sight, you might think the answer is simple: Aramaic speakers couldn't pronounce interdentals, so they left them out of their Aramaic-accented Arabic. But that hypothesis would be absurd. By the late pre-Islamic era, all known varieties of Aramaic did in fact have the sounds θ and ð, due to a later development of t > θ, d > ð after vowels (except when doubled). We find these sounds alive and well in the only surviving Levantine Aramaic dialect, that of Maaloula: eg xoθla "wall", ḳrīθa "village", eḥða "one (f.)". Why, then, would Aramaic speakers change these sounds to t, d in Arabic?

How about the opposite contact situation: Arabic speakers living on the fringes of the Aramaic-speaking world copied the shift θ > t, ð > d from their neighbours, while those living further inland stuck with the traditional pronunciation? That is more plausible, but still a bit problematic. The development of t > θ, d > ð had already happened by 250 BC in Aramaic, so the shift would have to have been borrowed before that; but Arabic-speaking groups which used Aramaic as their high language, such as the Nabataeans of Petra, are only well-attested later than that.

A third, more subtle contact explanation seems preferable. Aramaic speakers would certainly have taken advantage of the many similarities between Aramaic and Arabic to reduce the burden on their memories. But, whereas θ and ð are extremely common in Aramaic, in Arabic they are quite rare: in the Qur'ān, t is ten times commoner than θ, and while ð is about as common as d overall, practically all of its occurrences are limited to demonstratives. A good rule of thumb for the Aramaic learner of Arabic to apply would therefore be "replace Aramaic θ, ð with t, d except in demonstratives"; 9 times out of 10, the result would be correct Arabic, and the 10th time it would still be comprehensible. In such an environment, where Aramaic-speaking learners of Arabic outnumbered native speakers, it's not hard to imagine the distinction disappearing. If so, the loss of interdentals in Levantine would indeed reflect Aramaic influence - as a result of Aramaic speakers' effort to avoid Aramaic forms!

Tuesday, September 09, 2014

Why "Levantine" is Arabic, not Aramaic: Part 2

Last time, I promised to look at the "ratio of content ⊂ Arabic & ⊄ Aramaic". To do that, we need two things: data on the frequency of different words and morphemes, and etymologies for each word and morpheme. If this were English, I could offer you a 450-million-word online digital corpus for the former, and the OED for the latter. For Levantine Arabic the pickings are a bit scantier. There are indeed several digital corpora of Levantine Arabic, but none of them are publicly available, and none have published any frequency data that I can find offhand; and for etymologies, you have to consult, by hand, as many dictionaries (of several languages) as it takes.

So for present purposes, I will use a much smaller substitute, which can hardly be accused of any partiality to Standard Arabic: namely, a selection from Said Akl's Roomyo w Julyeet (CORRECTION: introduced by Said Akl), which I was lucky enough to run into at an Oxfam a few years ago. I picked a well-known section of the play whose language seemed relatively simple, with little or no visible Standard Arabic influence - the lines starting from "Romeo, Romeo, wherefore art thou Romeo?" (p. 62), including Romeo's reply and Juliet's reply to him (finishing on the second line of p. 63) - and counted morpheme frequencies (retaining his eccentric orthography). The 26 morphemes that occurred more than once account for about two-thirds of the selection, so looking at their etymologies gives us the maximum of information for the minimum of effort - and here they are. Only those that are unambiguously Arabic or unambiguously Aramaic are relevant to our purpose; the rest may be dismissed as "confounding factors":

  1. w(e) و "and" (11 occurrences): "Confounding". Shared by Arabic and Aramaic in effectively identical form.
  2. b(e)- / m- بـ٬ مـ [marker of the indicative imperfect] (10 occurrences): Innovative. This form is found as such neither in Classical Arabic nor in Aramaic, and its etymology poses some difficulties; if you know of any convincing work on this, let me know in the comments.
  3. -aq ـك "you m. sg. oblique" (9 occurrences): Arabic. Both Aramaic and Arabic have cognates of this, but in Aramaic the consonant has changed to kh, whereas Levantine - like Arabic - has kept the original k.
  4. ¢esm اسم "name" (6 occurrences): Arabic. Both Aramaic and Arabic have cognates of this, but in Aramaic the consonant is sh, whereas in Levantine - as in Arabic - it's s. (There is controversy over which value is original.)
  5. la "no, not, neither... nor" (5 occurrences): "Confounding". The form is shared identically by Arabic and Aramaic; the usage is actually closer to Arabic (where it negates verbs only in the imperfect and the negative imperative) than to Aramaic (where it negates verbs in all tenses), but we'll score it as shared.
  6. -u / -h / -vowel length (depending on context) ـه "him, his" (5 occurrences): Arabic. Aramaic -eh could explain the h form and the vowel length form, but the -u can be satisfactorily derived only from Arabic -hu.
  7. quun كون "be" (4 occurrences): "Confounding". In reality this is much more likely to be Arabic, since the normal Aramaic root for "be" is hwy, but kwn is attested in this sense in Aramaic too.
  8. men من "from" (4 occurrences): "Confounding". Shared by Arabic and Aramaic in effectively identical form.
  9. ḍall ضل "remain" (4 occurrences): Arabic. There is no Aramaic source for emphatic D.
  10. (e)l الـ
    • "the" (4 occurrences): Arabic. (Aramaic originally used suffixed -aa, which later lost its definite sense.)
    • [relative marker] (3 occurrences): Innovative, but based on extending the functions of the Arabic definite article, and probably on shortening a form similar to Classical Arabic alladhii, which it resembles rather more than the Aramaic relative marker dh-.
  11. ¢ent انت "you (m. sg.)" (3 occurrences): Arabic. In Aramaic, the n disappeared, assimilated to the following t.
  12. ma ما "not" (3 occurrences): Arabic. In Aramaic, maa is never used as a negator.
  13. law لو "if" (3 occurrences): Arabic. (Aramaic does not generally use this, but where traces of a cognate are found, as in some frozen combinations, it takes the form luu, not law.)
  14. cu شو "what?" (3 occurrences): Innovative, from Arabic. Found as such neither in Classical Arabic nor in Aramaic, but its generally accepted etymology is Arabic, from a contraction of أي شي هو "what thing is it?".
  15. sammi "name (v.)" (3 occurrences): Arabic, for the same reason as ¢esm above.
  16. e- / Ø- أـ [first person singular subject marker] (3 occurrences): "Confounding". Shared by Arabic and Aramaic in effectively identical form.
  17. t- تـ [second person masculine singular subject marker] (2 occurrences): "Confounding". Shared by Arabic and Aramaic in effectively identical form.
  18. -ni ـني "me" (2 occurrences): "Confounding". Shared by Arabic and Aramaic in effectively identical form.
  19. -a ـا "her" (2 occurrences): "Confounding". At first sight the loss of the h makes it appear closer to Aramaic than to Classical Arabic - but the h was also lost in -u "him", which cannot be explained as Aramaic.
  20. -t ـت [feminine singular construct state marker] (2 occurrences): "Confounding". The form is compatible with Arabic or Aramaic origins (Aramaic had th, but we would expect that to be turned back into t, since Levantine has no interdentals). The function straightforwardly existed in Aramaic; in Classical Arabic, it did not, but the pre-pausal pronunciation of -at- as -ah provides an obvious source for it to develop from, and indeed it exists in practically all modern dialects (including those of the Arabian peninsula). If you're feeling really generous, though, you might ignore the latter fact and award this one to Aramaic.
  21. ¢ana أنا "I" (2 occurrences): "Confounding". Shared by Arabic and Aramaic in effectively identical form.
  22. hu هو "he" (2 occurrences): "Confounding". At first sight the Aramaic form huu is closer than Classical Arabic huwa, but loss of final vowels is regular in Levantine Arabic, so you would expect huwa to become hu anyway.
  23. ya يا "oh" (2 occurrences): "Confounding". Shared by Arabic and Aramaic in effectively identical form.
  24. ¢aw أو "or" (2 occurrences): "Confounding". Shared by Arabic and Aramaic in effectively identical form.
  25. xebb حب "love" (2 occurrences): "Confounding". Shared by Arabic and Aramaic in effectively identical form.
  26. jez¢ جزء "part" (2 occurrences): Arabic. I haven't noticed an Aramaic cognate, but even if there is one, the palatalisation of the j (from original g) marks it as Arabic.

So, out of these 26 items - which together account for 107 out of the 161 morphemes in this selection - 10 are unambiguously Arabic (accounting for 46 morphemes), and none are unambiguously Aramaic. 15 items (accounting for 51 of the morphemes) could equally well be Arabic or Aramaic, and as such are irrelevant to determining which one predominates within Lebanese Arabic. (If you decide to be really generous to Aramaic, you might shift -a, hu, and -t to the Aramaic column, accounting for a grand total of 6 morphemes versus Arabic's 46.) The remaining single item, the imperfect prefix b-, is a later innovation whose history is unclear; even if someone found an Aramaic etymology for it and added it to all the unlikely cases mentioned, the ratio of "content ⊂ Arabic & ⊄ Aramaic" to "content ⊂ Aramaic & ⊄ Arabic" for this list would still be about 3:1. On a less generous and more plausible calculation, it's infinite (46:0). Either way, by this criterion, too, Levantine is Arabic, not Aramaic.
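For anyone who wants to check the arithmetic, here is a minimal sketch in Python. It uses only the totals quoted in the paragraph above rather than recounting the list; the variable and function names are my own, chosen for illustration.

```python
# Token totals quoted in the paragraph above (not recounted from the list).
TOTAL_TOKENS = 107     # tokens covered by the 26 repeated items
ARABIC_ONLY = 46       # tokens of unambiguously Arabic items
B_PREFIX = 10          # tokens of the innovative b(e)- / m- prefix
GENEROUS_ARAMAIC = 6   # -a, hu and -t, if all three are awarded to Aramaic


def ratio(arabic, aramaic):
    """Arabic-to-Aramaic token ratio; infinite when no token is Aramaic."""
    return float("inf") if aramaic == 0 else arabic / aramaic


# Tokens left over once the Arabic items and the b- prefix are set aside.
ambiguous = TOTAL_TOKENS - ARABIC_ONLY - B_PREFIX

strict = ratio(ARABIC_ONLY, 0)
generous = ratio(ARABIC_ONLY, GENEROUS_ARAMAIC + B_PREFIX)

print(ambiguous)  # 51
print(strict)     # inf
print(generous)   # 2.875
```

The generous figure of 2.875 is the "about 3:1" above: 46 Arabic tokens against 6 + 10 tokens handed to Aramaic.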

If you pick a long enough text, of course, you will eventually find an Aramaic loan or two. There are quite a few Aramaic loans in Levantine Arabic, depending on the dialect, and they must really stand out to a Levantine speaker studying Aramaic. But even in the most heavily Aramaic-influenced dialects, they occur far less frequently than unambiguously Arabic forms. While historical linguists' usual definition of language origin does not rely on any explicit frequency criteria, in all the cases I've seen, the most frequent source of vocabulary by token count for a sufficiently large text turns out to be what historical linguists would consider as that language's parent. In Levantine Arabic the effect is even stronger, since not only is the basic vocabulary of Arabic origin, so is most of the learned vocabulary.

Now, after all those calculations, I'm sure you're eager to read the lovers' dialogue, so here it is:

جلييت: يا روميو! يا روميو! ليش انت روميو؟
نكور بيك٬ ورفود اسمك٬
أو٬ إذا ما بدك٬ حلوف إنك بتحبني
وأنا ببطل كون من عايلت كابيولت.

روميو: بضل عم بسمعا
أو بحكي معا؟

جلييت: إسمك بس عدوي.
انت، بتضل انت زاتك٬ ولو ما كنت منتغيو.
و شو المنتغيو؟ لا هو إيد ولا إجر
ولا دراع ولا وج ولا أي جزء
من جسم الإنسان؟ آه، كون اسم تاني!
و شو فيه الاسم؟ ال منسميه ورد
لو شو ما سمينا بتضل ريحتو حلوة،
و هيك روميو، لو ما تسمى روميو
كان بيضل محتفظ بهالكمال المحبوب
ال بيملكو بدون عيب. يا روميو، تجرد من اسمك،
ومقابل اسمك ال هو مش جزء منك،
خدني أنا كلي!

And in the original orthography:

Sunday, September 07, 2014

Why "Levantine" is Arabic, not Aramaic: Part 1

Following in a long tradition of people imagining that knowing a few languages or a bit of mathematics implies they already know linguistics better than any self-styled specialist, the quasi-celebrity author Nassim Nicholas Taleb recently decided to claim that "Levantine is modernized Aramaic". (Let's not comment further on the attached table, whose attempt at Standard Arabic is painfully bad, and which omits the whole Aramaic column except for the title. Also, let's not confuse it with the separate question of how distant Levantine is from Standard Arabic.) The ensuing Twitter "debate", while of little value in itself, nicely illustrates a number of common misconceptions, some of them worth responding to in a less cramped medium. I'll start with the most explicitly political one, since it's bound to colour responses to any purely academic argument:

You just call it Arabic because Arabic is used for "high" functions in the region; If we were diglossic Levantine/Aramaic instead of Levantine/Arabic you would say the same.

Less than 90 km from NNT's hometown is a village where they do in fact still speak Aramaic, while of course still being diglossic in Arabic: Maaloula, in Syria. Despite heavy Arabic influence, this village's language has never once been mistaken for Arabic; its own people call it siryêni, and European Semitists recognised it as Aramaic as soon as even simple wordlists became available. If you happen to be Levantine, try listening to some of it (eg here) - how much of that do you understand? The same is true of other relict Semitic languages within the Arabic-speaking world, such as Mehri or Jibbali or Soqotri or Neo-Mandaic. I have more than one book in which Soqotri or Jibbali speakers attempt to prove that their languages are really Arabic, for much the same reasons that NNT wants his language not to be Arabic - but, notwithstanding the speakers' desires, Semitists had no trouble proving that these languages were not descended from Arabic. Conversely, the "high" languages of Malta have always been English and Italian, yet, despite Maltese nationalists' best efforts to show that Maltese was really Punic, European Semitists had no difficulty in identifying it as descended from Arabic. So, no, Semitic historical linguists do not base their decisions on what kind of diglossia happens to be around, nor were all those 19th-century German Orientalists secret agents sent back in time by the Baath Party. To the contrary, almost all Semitists I've known would be far more excited to discover that some undocumented variety was a new Semitic language than to find out that it was "just" another dialect of Arabic.

"Proving" Levantine comes from Arabic rather than Aramaic like "proving" Spanish comes from Italian, not latin.

How do linguists know that Spanish is descended directly from Latin, not from Italian? Simple: we look for cases in which Italian has made a change - innovated - and Spanish hasn't. Such cases are easy to find: for example, in Italian original *fl has become fi (thus fiore "flower") and original long *e in open syllables has become i (thus di "of"), whereas in Spanish original *fl remains fl and *e remains e (thus flor, de). If Spanish were descended from Italian, then these changes would all have had to happen and then reverse themselves in Spain, which is very unlikely. We can know which form was original not just because in this case we have copious ancient data, but also by using comparative-historical reconstruction. The full toolkit would take too long to explain here (my favourite textbook is Lyle Campbell's Historical Linguistics), but basically, we:

  • establish sets of sounds corresponding systematically to one another;
  • figure out whether these correspondence sets systematically occur only in certain environments, and, if so, see whether there are any other correspondence sets occurring only in non-overlapping environments that they can be unified with.
This procedure allows us to prove that the ancestor language must have distinguished at least as many phonemes as members of the resulting set of correspondence sets, and - combined with a large body of knowledge about likely and unlikely sound changes - gives us a good chance of determining what the actual sounds of those phonemes were. This technique was, of course, developed mainly for reconstructing unattested languages, but way back in the 1950s, Robert Hall decided to test it by applying it to Romance. The result was, as you might hope, Vulgar Latin.
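To make the first step a little more concrete, here is a toy sketch in Python using only the two Italian/Spanish pairs mentioned above. The segment-by-segment alignment is supplied by hand and is my own assumption, as are all the names in the code. A real reconstruction needs hundreds of cognates and a good deal of judgment, so treat this as an illustration of the bookkeeping, not of the method as a whole.

```python
from collections import defaultdict

# Hand-aligned cognate pairs (Italian, Spanish) from the examples above.
# "" marks a segment with no counterpart; the alignment is an assumption.
ALIGNED = {
    ("fiore", "flor"): [("f", "f"), ("i", "l"), ("o", "o"), ("r", "r"), ("e", "")],
    ("di", "de"): [("d", "d"), ("i", "e")],
}


def correspondence_sets(aligned):
    """Map each (Italian, Spanish) segment pair to the places it occurs."""
    sets = defaultdict(list)
    for words, pairs in aligned.items():
        for pos, pair in enumerate(pairs):
            # Record the preceding pair as a crude "environment";
            # "#" stands for the word boundary.
            prev = pairs[pos - 1] if pos > 0 else ("#", "#")
            sets[pair].append({"words": words, "after": prev})
    return dict(sets)


sets = correspondence_sets(ALIGNED)

# Italian "i" takes part in two different correspondence sets.
italian_i = sorted(pair for pair in sets if pair[0] == "i")
print(italian_i)  # [('i', 'e'), ('i', 'l')]
```

With more data, the recorded environments are what you would inspect for the second step: deciding whether two sets occur in non-overlapping contexts and can be unified, or whether they must go back to distinct sounds in the ancestor.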

Now, let us apply this to Levantine, Arabic, and Aramaic. Reconstructing the common ancestor of Aramaic and Arabic (see eg here or even just here) shows that Aramaic features a number of innovations not shared with Arabic; conveniently, many of these are mergers. In particular, in Aramaic *`, *ʁ (gh), and *ɬʼ (lh) all merge to ` (ayin); *x (kh) and *ħ merge to ħ (heth); initial *w and *y merge to y. In Arabic, all of these distinctions are maintained. Now, the nice thing about mergers is that they can't be reversed; once two formerly distinct word classes feature the same phoneme, there's no way for the ordinary speaker to recover the distinction. A monolingual Aramaic speaker has no way of telling that the ` in 'ar`ā "earth" (< *'arɬʼ- + -ā) used to be pronounced differently from the ` in ṭar`ā "door", or in `aynā "eye". In Levantine, all of these distinctions are normally maintained, just as they are in Arabic; أرض has none of the consonants of عين. QED. (In fact, historical linguists have also succeeded in identifying some Aramaic loans into Levantine Arabic by finding the small minority of words in which these distinctions were lost.)

You don't even need to look at phonology to figure this out, though; the grammar provides plenty of clues. In Aramaic, for example, almost every noun ends in -ā, except in a few specific contexts. This is an innovation specific to Aramaic, accomplished by gluing a former demonstrative on to the end of the noun, and preserved in every modern spoken Aramaic variety. In Arabic, it never happened - nor, obviously, in Levantine.
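The irreversibility of mergers is easy to see if you write the sound changes down as lookup tables. The sketch below, in Python, covers only the five consonants discussed above; the labels follow this post's informal notation, "D" is my own stand-in for the Arabic emphatic ض, and the function names are invented for illustration.

```python
# Reflexes of five proto-consonants, as described in the paragraph above.
# "D" is this sketch's label for the Arabic emphatic (ض).
ARAMAIC = {"*`": "`", "*gh": "`", "*lh": "`", "*kh": "ħ", "*ħ": "ħ"}
ARABIC = {"*`": "`", "*gh": "gh", "*lh": "D", "*kh": "kh", "*ħ": "ħ"}


def possible_sources(reflexes):
    """Invert a proto -> reflex table: reflex -> set of proto-consonants."""
    inverse = {}
    for proto, reflex in reflexes.items():
        inverse.setdefault(reflex, set()).add(proto)
    return inverse


def is_reversible(reflexes):
    """True only if every reflex points back to exactly one proto-consonant."""
    return all(len(src) == 1 for src in possible_sources(reflexes).values())


print(is_reversible(ARABIC))   # True
print(is_reversible(ARAMAIC))  # False
print(sorted(possible_sources(ARAMAIC)["`"]))  # ['*`', '*gh', '*lh']
```

Given only an Aramaic `, there are three candidate sources and nothing in the language itself to choose between them. A variety that keeps all five consonants apart, as Levantine normally does, therefore cannot have got them from Aramaic by regular inheritance.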

Of course, NNT shows no signs of even being aware of the relevance of regular sound correspondences, mergers, or any of the other elements in a historical linguist's toolkit, much less of accepting them as definitive criteria for language classification. At one point, however, he vaguely expresses the criterion he thinks should be definitive:

To prove that Levantine derives from Arabic WITHOUT Aramaic route you need to finds ratio of content ⊂ Arabic & ⊄ Aramaic. Not done.

Now that we've seen a little bit of how linguists determine what comes from Arabic and what comes from Aramaic, we're ready to look at the results of this criterion in the next post. You should be able to guess the answer already...