Thursday, December 26, 2013

Does Arabic have the most words? Don't believe the hype.

For some time, I've been hearing rumours (from Arabs, of course) that Arabic has the largest number of words of any language. Recently I found one vector for this rumour: Comparison of the Number of Words in Languages of the World, a poster put together by Azzam Aldakhil which has the merit of at least giving the sources for its figures, namely Muʕjam ʕAjā'ib al-Lughah by Shawqī Ḥamādah, 2000. (In a follow-up comment he gives the page numbers, 83-84.) This poster claims that "Arabic has 25 times as many words as English".

Unfortunately for this claim, if you go to the book cited, what you actually find is a calculation of the number of possible roots in Arabic, without regard to whether or not the root actually has a meaning. Such a count includes huge numbers of unused roots such as بزح bzḥ or قذب qḏb, while at the same time lumping together all words derived from the same root; كتاب book, كاتب writer, and مكتب office are three words, but only one root. The result of such a calculation might tell us something about the potential for expanding Arabic, but absolutely nothing about the state of the Arabic language. And since in practice both Arabic and the languages it is being compared to on that poster allow arbitrarily long words without real roots, if only in loanwords, it doesn't even tell us much about its potential.
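To get a feel for how quickly a purely combinatorial count balloons, here is a rough Python sketch. The parameters are illustrative assumptions of mine, not Ḥamādah's actual method: 28 root consonants (one per letter of the Arabic alphabet), roots two to five consonants long, and any consonant allowed in any slot, repeats included. The names `CONSONANTS`, `ATTESTED_ROOTS` and `possible_roots` are made up for the example, and the figure of roughly 10,000 attested roots is the Sakhr-based estimate given below. Real Arabic phonotactics rules out many of these combinations (roots almost never start with two identical consonants, for instance), so this overshoots even as a count of possible roots.

```python
# Illustrative assumptions only, not taken from Ḥamādah's book:
CONSONANTS = 28          # treat every letter of the Arabic alphabet as a root consonant
ATTESTED_ROOTS = 10_000  # rough number of roots with actual meanings (Sakhr-based estimate)


def possible_roots(n_consonants: int, lengths: range) -> int:
    """Count every ordered sequence of consonants whose length is in `lengths`,
    allowing repeats, whether or not any real word is built on it."""
    return sum(n_consonants ** k for k in lengths)


triliteral = possible_roots(CONSONANTS, range(3, 4))   # three-consonant roots only
up_to_five = possible_roots(CONSONANTS, range(2, 6))   # two- to five-consonant roots

print(triliteral)                          # 21952
print(up_to_five)                          # 17847760
print(round(up_to_five / ATTESTED_ROOTS))  # 1785
```

Even the triliteral figure alone is about twice the number of roots actually in use, and the full count exceeds it more than a thousandfold, all before a single root has been turned into an actual word like كتاب or مكتب.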

Both the number of Classical Arabic roots with actual meanings and the number of words can be estimated from the classic dictionaries: according to Sakhr's statistics, there seem to be around 10,000 roots, and up to 200,000 distinct words. Roots don't play such a major role in the lexicography of most non-Semitic languages, so it's difficult to compare the number of roots cross-linguistically. But in terms of words, that would put Arabic slightly behind English (250,000 in the OED, although the poster cites 600,000) and slightly ahead of French (over 100,000 excluding proper nouns, according to the Académie Française).

However, such comparisons can hardly fail to be misleading. For one thing, English is much more hospitable towards dialectal and colloquial usages than Arabic is – the OED is full of words marked as Scottish or Northern or slang or whatnot, the equivalents of which would never be accepted by an Arabic dictionary. For another thing, the whole enterprise of counting words across languages runs into apparently insuperable problems, especially when it comes to compounds, which Arabic dictionaries do not normally treat as words. If you include compounds, then compound-friendly languages like German or Turkish or Inuktitut are automatically going to beat all the rest – and all the available statistics that I've seen for, say, English happen to include compounds.

So the best answer is that we don't really know, and that word count, even if we could measure it better, is not a very good measure of a language's expressive power anyway. Some missing words make a genuine difference, as I've discussed here before. But is English really missing out by not having distinct words for male camels (جمل) vs. female camels (ناقة)? Is Arabic really missing out by not having a special word for cornpone, or for scones?

45 comments:

PhoeniX said...

These 'vocabulary sizes' are interesting. Disregarding any form of counting, in English I'm often confronted with a much broader range of choice in certain basic vocabulary than I would ever have in my native language, Dutch.

That is mostly because a lot of the basic vocabulary in English comes in pairs, with one word of Germanic origin and one of Romance origin.

Pairs like "to eat/to consume" are quite rare in Dutch (although technically, consumeren might be a word in Dutch... but well... I can't think of a better example right now).

Another pair would be "drink/beverage".

That kind of 'richness' in options, where the only semantic distinction, if any, lies in how sophisticated the word sounds, is almost absent in Dutch. It really does sometimes give me the impression that the English vocabulary is much larger than the Dutch one.

Impressionistic measurements probably make a lot more sense for evaluating this kind of thing than numerical ones. Haha.

Lameen Souag الأمين سواق said...

Of course Arabic doesn't have an equivalent of English's quasi-systematic Germanic/Romance doublets (as I discussed a while back). On the other hand, it does have quite a lot of other doublets or near-doublets which I take to result from dialect mixing at various periods, encouraged by the demands of rhetorical style – eg "be able" yaqdiru vs. yastaṭīʕu, or all the famous synonyms for "lion" – 'asad, sabʕ, ghaḍanfar... and these have sometimes been supplemented by pairs regionally borrowed from different languages, eg bandūra vs. ṭamāṭim for "tomato", or quraydis vs. jambarī vs. rubyān for "shrimp". Impressionistically I would also say Arabic tends to make fine distinctions of meaning more often than many languages, but that would be difficult to demonstrate. In practice, of course, there's a very good reason why learning the Arabic lexicon should be harder than learning that of most other languages: so much of it is much more than doubled by diglossia!

Anonymous said...

Dear Lameen,
I have sent you a message on Facebook; please check your inbox (the "Other" folder for messages from people you don't know).

Piotr Gąsiorowski said...

In my native Polish (and in many other languages, e.g. Italian and Spanish) it's easy to inflate the noun inventory by the liberal use of expressive derivatives (diminutives, augmentatives, pejoratives). Standard dictionaries normally ignore them unless they have a really distinct meaning or are very frequent. Thus, from żaba 'frog' we get żabka, żabcia, żabusia, żabeńka, żabeczka, żabunia, żabuńka, żabiątko, żabula, żabulka, żabuleńka (dim., often used as terms of endearment), żabsko, żabisko (aug.), etc.

David Marjanović said...

English doesn't have a special word for the opposite of "loud" (German leise – "quiet" has to cover it, which must result in unintended implications sometimes. French even extends a word with an even wider range of meanings (doucement) that include "slow down!".

German, in turn, lacks a word for "turd". In cases of emergency "sausage" or its diminutives are stretched to cover it, disgusting everyone. :-)

David Marjanović said...

Oops. Forgot to close the parenthesis after leise.

qqq said...

Regardless of whether the number of words is a good measure of a language's expressive capability, I'd say one of the most amazing features of Arabic is its flexibility in sentence construction, as well as its ability to convey a very distinct meaning and to focus a sentence on a single, unique meaning. This is hardly achievable in English. What's more, I only came to like English more after I read an English translation of the Qur'an, through which I saw how Arabic constructions gave a much more eloquent tone to traditional English. Finally, I have also seen how difficult it is to translate the Qur'an into English, and how translators suffer trying to carry the whole meaning, with all its possibilities and shades, over into English. Arabic is a very clean, logical and expressive language compared to English.

Anonymous said...

That is correct. I have seen so many translators suffer.

Anthonie said...

The Greek language is officially the richest language, with 5,000,000 words and 70,000,000 word types, as even stated in the Guinness Book of Records. A distant second is the English language, with 490,000 words, of which 54,000 are of Greek origin.


1). Guinness book of Records: The Greek language is the richest language in the world
http://eurotalk.com/blog/2013/02/08/so-did-you-know-you-can-speak-greek/

2). The Greek language was the first world language and was spoken throughout the entire ancient world.

3). On top of that, the entire medical, mathematical, scientific and astronomical world uses Greek words, which is not the case for ANY other language.

4). The Greek language has been in use for at least 3,600 years and is the longest continuously used living language in the world.
"Greek has been spoken in the Balkan Peninsula since around the late 3rd millennium BC. The earliest written evidence is a Linear B clay tablet found in Messenia which dates to between 1450 and 1350 BC,[12] making Greek the world's oldest recorded living language." http://en.wikipedia.org/wiki/Greek_language

5). Homer composed some of the most complex poetry in the entire ancient world in 850 BC, 2,850 years ago.

6). The Greek language is the basis of all European languages, and 54,000 words in English are of Greek origin. The total number of words of Greek origin in all European languages (German, French, Spanish, English, etc.) is 500,000.

Not only is there no other language in the world that is the root of, and has influenced, 60% of all world languages (East European, French, Latin, German, English, Spanish, Celtic, Russian, Asian, etc.), but Greek is also the oldest living language still in use. And with the entire scientific, medical and astronomical world using the Greek language, the richest language in the world can only be Greek, which is why it is officially recognized as such.


-Greek language: 5,000,000 words, and approximately 70,000,000 words including derivatives, medical terms and scientific expressions.
http://hellasfrappe.blogspot.nl/2011/03/over-70000000-words-in-greek-language.html

Lameen Souag الأمين سواق said...

Aaand thank you for demonstrating that it's not just Arabs who feel the need to bolster their self-worth with made-up statistics about numbers of words. For a more serious examination of the question for Greek, see Nick Nicholas:

Lerna I

Lerna II

Anonymous said...

Well said

Unknown said...

Yeah, English is a poor language compared to Arabic. A small example is the way you refer to relatives:
Aunt (used for the sisters of both parents)
But in Arabic it's more specific:
خالة (khāla), which refers to the mother's sister
and
عمة (ʕamma), which refers to the father's sister.
And the list goes on and on.
So you could say that Arabic is more specific and English is more general.

Lameen Souag الأمين سواق said...

Kinship terms are certainly more specific in Arabic than in English; in that domain, English is relatively impoverished. But picking a single semantic domain tells us nothing about how the two languages' vocabularies compare overall.

Ignat831 said...

Please don't compare recently colloquialisms which are not included in major Arabic dictionaries such as Lisan AlArab and use that to show how Arabic borrows from other languages. One commenters playing on the ignorance of the reader has claimed that Arabic words have no real root. In fact, any word which done not have an etymology us clearly a non-Arabic word. Each Arabic word can not only be traced to its root, but to the circumstances under which it was created be thousands of years back. Arabic, which has a known estimated number of between 90 to 500,000,000 words. As I am a teacher of English linguistics as well as Arabic and Hebrew, which is in fact a diminutive form of Canaanite Arabic, both acquired languages, I can at test to the stark inferiority of all Indo-European languages to Semitic languages, and to Arabic in particular which has a grammatical feature known as "i'raab". I'raab in the Arabic language gives it clarity and order to such the extent that it is nearly impossible to have a misinterpretation which is unique in the Muslim Holy book, the Qur'an. Many Western Islamophobic apologetics attempt to diminish the overwhelming superiority of a language from which most of their axiomatic expression have been borrowed as well as the very concept of poetry as was borne out of the Islamic inspired European Renaissance. The attempt to present Indo-European languages as superior to Arabic is as futile as comparing Greek mathematics with its stick numbers to algorithms of Arabic, both terms of which Gabriel roots in Arabic.

Lameen Souag الأمين سواق said...

The comment of "Ignat831" is thoroughly confused. Arabic certainly does not have 500 million words, as discussed in detail above, nor is i3raab a particularly special feature - it's just the morphological marking of case and mood, both of which are similarly morphologically marked in many other languages, such as Latin or Sanskrit or Japanese. And, needless to say, poetry long predates both Islam and the Renaissance.

Anonymous said...

This comment has been removed by a blog administrator.

Lameen Souag الأمين سواق said...

Unknown: "You should've probably site that economist article "the biggest vocabulary""

You mean the one specifically linked in my post under "apparently insuperable problems"?

Anyway, Godwin's Law violation + swearing at me = deletion. If you have a point to make, try reposting and making it in a more venue-appropriate fashion - this is not Reddit.

Chris said...

The question seems to have moved from word count to expressiveness, where one could argue that English wins, since far more people will understand it. Not that I would assert such a specious argument.

qahwagi said...

You pointed out that, according to Sakhr's statistics, Arabic has "up to 200,000 distinct words". However, according to an archived version of the page you linked to (which no longer works) (https://web.archive.org/web/*/http://lexicons.sakhr.com/default.aspx), the figure shown under عدد الكلمات (number of words) is 2,000,000 for the dictionary الغني (al-Ghanī) and 3,948,160 for the dictionary تاج العروس (Tāj al-ʕArūs). Where do you get the figure 200,000? Thank you for your attention.

qahwagi said...

P.S. The link I posted to Sakhr's archived page did not show up correctly. It should be https://web.archive.org/web/20140116232116/http://lexicons.sakhr.com/default.aspx

Lameen Souag الأمين سواق said...

Thanks for the archive link. I understand عدد المشتقات (number of derivatives) as referring to the number of words defined in the dictionary, and عدد الكلمات (number of words) as referring to the total number of words in the dictionary, including those in the definitions and examples. عدد المشتقات is 195,000, so I rounded up slightly.

qahwagi said...

Thanks for the reply. I should have looked at that chart more carefully; I totally missed the عدد المشتقات (number of derivatives) column.

Anonymous said...

John
Arabic has over 12 million words, more than all European languages put together.

qahwagi said...

@Anonymous John ---- and what is the source of your claim?

Anonymous said...

Bro, you got this wrong.

Arabic does have the largest number of words.
It has 12,302,912,

without counting roots or something like that.

If I were to count the roots of the words, it would be

from 90,000,000 to 500,000,000 words.

We have like 1000 names for lion alone.

Anonymous said...

Languages such as Greek have similar flexibility in sentence construction.
As for the difficulty of translation, that holds true when translating between any two languages from different language families. Try translating a complex English novel into Arabic, and you will likewise find that Arabic doesn't have the words to express the original idea as eloquently and accurately. But you can derive the same meaning.

Anonymous said...

Prove that it has 12,302,912 words. Where is that list?

flashdrive said...

Referring to the comments on Greek being the oldest continuous language, are we discounting Chinese, which as a written language is over 6,000 years old?

RobertP said...

I think many comments are made without knowledge of other languages and of the richness of the linguistic constructs that cultures the world over have produced. Just because a language does not have, for example, a dual does not mean it is imprecise. There are just other, potentially less elegant, ways of saying the same thing. This also holds true for family relationships, which a previous comment noted as a particular impoverishment of English. One simply says 'my father's sister' if it really does matter, or 'he is my maternal grandmother's nephew'.
I find it alarming that Ignat831, a self-professed "teacher of English linguistics" with a disappointing number of linguistic errors in his English comment, tries to impress the reader with inaccurate details meant to show the superiority of Arabic over English. An actual subject-matter expert is expected to give dispassionate, balanced, matter-of-fact accounts and, even when being polemical, need never resort to inaccurate statements.
The "mine is bigger than yours" attitude or any cultural exceptionalism for that matter seems rather misplaced.

Anonymous said...

Thank goodness for you, RobertP, and your levelheadedness in this amazingly puerile thread. Even if I do think I learned a couple of things along the way, your entry seemed the very first metathought to appear in an overly long time. I wince at the seemingly unwitting overlay of a domination framework on cultural modes of being - a sorry sight. I would much rather learn about the hidden riches of Arabic in a warm embrace of beauty and sharing amongst ourselves. Each has his and (prolly mostly male thread here, I betcha!) her take on the human condition, and you need feel neither pride nor shame for what each language has flourished into. Bemoan if you wish the extinction of hundreds of languages in our lifetimes. Open curiosity I revel in.

Unknown said...

No disrespect to anyone, but until you actually learn it, all your readings are wrong. Yes, Arabic, like Aramaic, Assyrian and Syriac, is one of the strongest languages, and yes, they contain more words. If you have just started to learn Arabic, or have studied it for two years, you haven't even scratched the surface of how strong, deep and heavy a language it is. I can give even native Arabic speakers a sentence and ask them to translate it into English or Latin, and it would take at least a paragraph to translate. It is not divine, but it is so beautiful that it explains itself.

Unknown said...

If having 1000 names for lion makes it very rich then Sanskrit has 4000 words for elephant only.

patricianboy44 said...

Ignat831 should stick to Arabic. He absolutely butchers English. He should be “apologetic” for his ridiculous and pretentious commentary.

Unknown said...

Arabic is larger, yes, but English is the language of technology; it's so easy and simple.
Just imagine a computer app in Arabic, lol, it would go crazy.

Anonymous said...

Terrible argument.. China is advancing technology at a fast pace and they have a crazy language and writing system!!! Ghana only uses English in school, and I dare you to name me a single Ghanaian invention, with all respect to Ghana!

Abdulmalik Yakubu said...

Well, I did try to read the Arabic translation of Harry Potter and it was grossly lacking in expression. The reason? Arabic and English are two languages that belong to very different families.

Translating from one language to another (even from English to Pidgin English spoken in Nigeria) presents many difficulties.

Unknown said...

Look man I need proof of wht ur saying ur these word counts are wrong I need proof u think I'm dumb I'll take what ever u say srry man but I can believe this none sense if u have no proof look bro I'm arab.and my grand father was one of the best Arabic teachers in my country so Ik that wht ur saying isn't true

Anonymous said...

And the Arabic word for cousin? OH THAT'S RIGHT, THERE IS NONE

Anonymous said...

Oh yeah? Finnish has 10000 words for snow

Anonymous said...

Well your writing skills sure make you look dumb

Anonymous said...

It is divine. Like all things.

Anonymous said...

As a born American who also speaks Arabic... comparing English, or any Frankish culture/language, to Arabic is like comparing McDonald's to a Michelin-star restaurant.

Anonymous said...

Arabic friends think I'm doolally to refer to my aunty's husband as my uncle.

Anonymous said...

Which cousin? Mother's side, father's side, mom's sister's son or daughter... lol
In Arabic, cousin terms go back to the uncle or aunt, which, as described above, are specific, so you know exactly which cousin you are talking about.
Arabic does have more words than English and French, I know that for sure. When translating Arabic into English, sometimes I have to compromise or explain what I am trying to translate.
In Arabic you can take a root word and make so many words from it. If I take the word elevate, maybe I can come up with 4 or 5 words in English derived from it, like elevator, elevation, etc., but in Arabic you can make a lot more words from each root. And of course these words would have actual meanings, unlike what is described above.