CAPHI: CAMeL Arabic Phonetic Inventory

1. Mission Statement

CAPHI is designed to provide a system for transcribing all the sounds found in all dialects of Arabic, including Modern Standard Arabic (MSA), in a simple and objective way while retaining enough complexity to distinguish meaningful differences between dialects.


2. Goals

  1. Coverage - cover all known dialects
  2. Simplicity - ignore minor differences in sounds that only linguists are trained to notice
  3. Extensibility - CAPHI can be used to describe dialectal differences that have not been previously documented or studied
  4. Representation - The writing system is precise enough to facilitate the above goals without being unnecessarily complex
  5. Intuitive form - The CAPHI alphabet was designed to have an intuitive relationship between the letters and the sounds they stand for, making it easy to learn and use

3. Explanation

The CAMeL Arabic Phonetic Inventory (CAPHI) is a system for transcribing the production of Arabic utterances in any dialect, from Modern Standard Arabic (MSA) to the regional colloquial varieties. CAPHI represents every significant sound in all Arabic dialects with a unique letter, so it can represent different pronunciations of words that would otherwise be spelled the same way according to MSA, CODA, Arabizi, or other Arabic spelling standards. Furthermore, the one-to-one relationship between the letters of CAPHI and the sounds they represent means that there will never be uncertainty regarding how to spell a given utterance. The sounds represented in CAPHI consist of all the sounds that can be used to distinguish meaning in any dialect, as well as other sounds that are confusable with sounds that distinguish meaning. A sound distinguishes meaning if exchanging it for another sound in some context can change the meaning of the word in question. For example:

  1. Sounds that distinguish meaning: /q/, commonly represented as ق, and /k/, commonly represented as ك, are two sounds used to distinguish meaning because exchanging one for the other can change the meaning of a word, e.g. /q a l b/, قلب, “heart”, and /k a l b/, كلب, “dog”. If a sound distinguishes meaning in any single dialect, we include it as a sound in CAPHI, even if it is not used by any other dialect to distinguish meaning.
  2. Sounds that do not distinguish meaning but are included in CAPHI: any sound in one dialect which does not distinguish meaning but is confusable with a meaning-distinguishing sound of another dialect is included in CAPHI. Iraqi Arabic has an emphatic p-sound, /p./, as well as a regular p-sound, /p/. Only the latter is used to distinguish meaning, but by listening closely, one can identify certain environments that cause the speaker to pronounce /p./ instead of /p/. Because many other dialects do not have a /p/, but only a b-sound, /b/, the /p./ is often confused with /b/, which is the most similar sound in many dialects. Thus, /p./ is included in CAPHI, as it is useful in describing the dialectal differences between Iraqi and other dialects.
  3. Sounds that do not distinguish meaning and are not included in CAPHI: a sound will not be included in CAPHI if it does not distinguish meaning and is not confusable with any other sound that does. All known Arabic dialects have emphatic consonants, which affect the pronunciation of surrounding vowels. Thus the emphatic ص, /s./, in صار, /s. aa r/, “he became”, causes the word to be pronounced something like /s. aa. r/, with a long emphatic /a/. On the other hand, in the word سار, /s aa r/, “he marched”, the pronunciation of the vowel is unaffected. While the emphasis of the consonant distinguishes meaning in these two words, the resulting emphasis on the /a/ sound in صار is never used to distinguish meaning. It is just the result of the position of the vocal tract being affected by the preceding consonant. Thus, /aa./ does not exist in the CAPHI alphabet, and neither do any other emphatic vowels, because they would never be confused with any meaning-distinguishing sound except the very sound from which they were derived. That is, /aa./ would only ever be confused with /aa/, /uu./ with /uu/, and so on, so we do not ask users to make these distinctions, which can be very difficult to detect. However, in some words an emphatic consonant will cause some of its non-adjacent neighbors, even consonants, to become emphatic (التفخيم بالمجاورة). For example, the words صوت and سوط are pronounced the same in MSA: /s. a w t./. For such words, all emphasized consonants will be indicated regardless of the spelling of the word. For example, the Levantine word فستان is pronounced /f u s. t. aa n/, despite being spelled without any emphatic consonants.
  4. Foreign and Borrowed words: occasionally foreign words are borrowed but retain some foreign pronunciation features. For such words, we will force the CAPHI representation to be limited to the CAPHI phonological symbol set. A special symbol (~) will be added to mark that this is an approximation.

    • Examples:
      1. blanc (French 'white', IPA /blɑ̃/) ==> /~ b l o n/
      2. train (French 'train', IPA /trɛ̃/) ==> /~ t r a n/
      3. musée (French 'museum', IPA /myze/) ==> /~ m y u z e/
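
The approximation marker described above can be detected mechanically. The following is a minimal sketch, assuming that "~" always appears as the first symbol after the opening slash, as in the three examples; the function name `split_approximation` is hypothetical and not part of CAPHI.

```python
from __future__ import annotations


def split_approximation(caphi: str) -> tuple[bool, list[str]]:
    """Return (is_approximation, symbols) for one CAPHI transcription.

    Assumes the "~" marker, when present, is the first symbol after
    the opening slash, as in the examples of the guideline.
    """
    body = caphi.strip()
    if len(body) < 2 or not (body.startswith("/") and body.endswith("/")):
        raise ValueError(f"not delimited by slashes: {caphi!r}")
    symbols = body[1:-1].split()
    is_approx = bool(symbols) and symbols[0] == "~"
    if is_approx:
        symbols = symbols[1:]  # drop the marker, keep the sounds
    return is_approx, symbols


print(split_approximation("/~ b l o n/"))  # (True, ['b', 'l', 'o', 'n'])
print(split_approximation("/q a l b/"))    # (False, ['q', 'a', 'l', 'b'])
```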

4. Guidelines for writing in CAPHI

4.1 General Instructions:

  • Memorize the CAPHI Phonetic Inventory (see bottom of this page) and refer back to it as often as necessary.
  • Separate all sounds and boundary markers with a space. Begin and end CAPHI representations with a “/” (no need to put a space before or after these slashes).
  • Word boundaries can be represented with a “#” when necessary. Be sure to describe the entire phrase as you would pronounce it naturally without thinking of orthographic boundaries, and only place the # marker afterwards.
  • Say the word or utterance out loud and try to imagine a natural situation in which you would say it.
  • Pay attention to how an utterance is said and not to how it is written, keeping in mind that pronunciation can change depending on the context of the utterance.
Arabic (Levantine) | English | CAPHI
باكْتِب اِسْمِك يا بِلادِي | I write your name o’ my country | /b a k t i b # 2 i s m i k # y a # b l aa d i #
عالشَمس الِي ما بِتغِيب | on the sun that never sets | 3 a sh sh a m s # i l # m aa # b i t gh ii b/
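
The formatting rules above (slashes at both ends, spaces between symbols, "#" for word boundaries) are regular enough to parse automatically. The sketch below is one possible reading of those rules, not an official CAPHI tool; the names `parse_caphi` and `format_caphi` are hypothetical.

```python
def parse_caphi(transcription):
    """Split a CAPHI transcription into words, each a list of symbols."""
    text = transcription.strip()
    if len(text) < 2 or not (text.startswith("/") and text.endswith("/")):
        raise ValueError("a CAPHI transcription begins and ends with '/'")
    words = [[]]
    # str.split() with no argument also treats line breaks as separators,
    # so a transcription wrapped over two lines is handled.
    for token in text[1:-1].split():
        if token == "#":
            words.append([])        # word boundary: start a new word
        else:
            words[-1].append(token)
    return [word for word in words if word]


def format_caphi(words):
    """Inverse of parse_caphi: join symbols with spaces and words with '#'."""
    return "/" + " # ".join(" ".join(word) for word in words) + "/"


verse = ("/b a k t i b # 2 i s m i k # y a # b l aa d i #\n"
         "3 a sh sh a m s # i l # m aa # b i t gh ii b/")
words = parse_caphi(verse)
print(len(words))   # 8
print(words[3])     # ['b', 'l', 'aa', 'd', 'i']
```

Keeping each word as a list of symbols, rather than a string, preserves multi-character symbols such as "sh", "gh", and "aa" as single units.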

4.2 Special Considerations:

  • Long vowels: the two letters of a long vowel are not separated by a space: /aa/, /ii/, /ee/, /uu/, and /oo/. Keep in mind that these are representations of vowel length and are not two separate vowels.
  • Geminations: if you are not sure whether a consonant at the end of a word is doubled or not, such as in the Egyptian word سِمّ, “poison”, you can reassure yourself by comparing it to a similar word such as سِمسِم, “sesame”, and noticing how the length of the /m/ sound decreases.
Doubling | Arabic (Egyptian) | English | CAPHI
Often doubled | سِمّ | poison | /s i m m/
Not doubled | سِمسِم | sesame | /s i m s i m/
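
The two length conventions above differ in form: a long vowel is a single symbol written without a space (/aa/), while a doubled consonant is the same symbol written twice with a space (/m m/). The sketch below illustrates that difference. It assumes the short vowels are written a, i, u, e, o, which is inferred from the long-vowel list and not taken from the inventory itself; `length_features` is a hypothetical helper.

```python
SHORT_VOWELS = {"a", "i", "u", "e", "o"}      # assumed short-vowel symbols
LONG_VOWELS = {"aa", "ii", "ee", "uu", "oo"}  # long vowels listed in the text


def length_features(transcription):
    """Return (kind, symbol) pairs for long vowels and doubled consonants."""
    symbols = transcription.strip("/").split()
    features = []
    for i, sym in enumerate(symbols):
        if sym in LONG_VOWELS:
            features.append(("long vowel", sym))
        elif (i > 0 and sym == symbols[i - 1]
              and sym != "#" and sym not in SHORT_VOWELS):
            # Same consonant symbol twice in a row: a geminate.
            features.append(("doubled consonant", sym))
    return features


print(length_features("/s i m m/"))      # [('doubled consonant', 'm')]
print(length_features("/s i m s i m/"))  # []
print(length_features("/sh aa f/"))      # [('long vowel', 'aa')]
```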
  • Doubling of affricates (like /dj/ or /tsh/): these sounds can be represented phonologically as doubles, i.e., /dj dj/ (Erwin 2013).
  • Sun and moon letters: the /l/ of the definite article ال is usually pronounced when followed by a moon letter, as in MSA الكتاب, “the book”, /2 a l k i t aa b/, but is not pronounced when followed by a sun letter, as in الشمس, “the sun”, /2 a sh sh a m s/, where the following consonant is doubled instead. Note that dialects may differ in how they pronounce the article. Egyptians, for example, are known to say both /i k k i t aa b/ and /i l k i t aa b/, neither of which is necessarily wrong. The goal is to capture how words are pronounced, not how any one person believes they should be pronounced.
  • Vowel length and sentence syllabic rhythm: pay attention to vowel length as it can vary depending on the context, such as when a word is attached to affixes. Listen carefully to how vowels and the rhythm of an utterance are modified by other parts of the sentence. Make sure to represent the vowel length in the particular context in which it is presented.
Arabic (Egyptian) | English | CAPHI
شَاف | He saw | /sh aa f/
ما شافش | He did not see | /m a # sh a f sh/
  • Word-initial vowels and the glottal stop /2/: utterances beginning with a vowel have a physiological proclivity to begin with the /2/ associated with the hamza. This /2/ might disappear when the word falls in the middle of an utterance. Write it only if you hear it.
  • Multiple pronunciations: CAPHI allows you to represent the phrase in the way it is actually uttered. The following examples show a couple of different ways in which a person may pronounce a phrase. Either one can be correct, depending on how the speaker produces it.
Dialect | Arabic | English | CAPHI
EGY | يا اهبل | O’ stupid | /y a # 2 a h b a l/
EGY | يا اهبل | O’ stupid | /y a # h b a l/
LEV | عِنْدِي اِمْتِحان | I have a test | /3 i n d i # 2 i m t i 7 aa n/
LEV | عِنْدِي اِمْتِحان | I have a test | /3 i n d i # m t i 7 aa n/
  • Vowel clusters: if you think you hear two vowels one after another, make sure that there is no glide consonant /w/ or /y/, or a glottal stop separating them, because most if not all Arabic dialects do not have adjacent vowels as a phonological rule.
  • In the middle of a word: to transcribe the MSA word ثور (bull), we prefer /th a w r/ and not /th a u r/. Similarly, to represent how ثور (bull) is pronounced in the Egyptian dialect, we prefer /t. oo r/ over /t. o u r/.
Dialect | Arabic | English | Correct CAPHI | Incorrect CAPHI
MSA | ثور | bull | /th a w r/ | /th a u r/
EGY | ثور | bull | /t. oo r/ | /t. o u r/
  • Long vowel vs short vowel-glide: for long vowels that fall in the middle of the word, such as in the Egyptian ثور (bull) example, avoid writing /t. o w r/, as /t. oo r/ is preferred. This distinction is difficult to make and should not produce any minimal pairs, so we make this simplification to avoid confusion in transcription. This applies to all such long vowel constructions: we encourage you to write /oo/ and /uu/, not /o w/ or /u w/, as well as /ee/ and /ii/, not /e y/ or /i y/, as shown in the following examples:
Dialect | Arabic | English | Correct CAPHI | Incorrect CAPHI
MSA | بيت | house | /b a y t/ | -
EGY | بيت | house | /b ee t/ | /b e y t/
MSA | ثور | bull | /th a w r/ | -
EGY | ثور | bull | /t. oo r/ | /t. o w r/
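
The two preferences above (no adjacent vowels, and a long vowel instead of a matching short vowel plus glide in the middle of a word) can be checked automatically. This is a rough sketch under stated assumptions: the vowel symbols are taken to be a, i, u, e, o and their doubled long forms, "in the middle of a word" is read as "followed by a consonant", and a doubled glide such as /i y y a/ is left alone. The name `lint_word` is hypothetical, and the checker only produces hints for a human transcriber to review.

```python
SHORT_VOWELS = {"a", "i", "u", "e", "o"}         # assumed short-vowel symbols
LONG_VOWELS = {v + v for v in SHORT_VOWELS}      # aa, ii, uu, ee, oo
VOWELS = SHORT_VOWELS | LONG_VOWELS
# Short vowel + glide pairs that the guideline prefers to write as long vowels.
PREFERRED_LONG = {("o", "w"): "oo", ("u", "w"): "uu",
                  ("e", "y"): "ee", ("i", "y"): "ii"}


def lint_word(symbols):
    """Return warnings for one word given as a list of CAPHI symbols."""
    warnings = []
    for i in range(len(symbols) - 1):
        cur, nxt = symbols[i], symbols[i + 1]
        after = symbols[i + 2] if i + 2 < len(symbols) else None
        if cur in VOWELS and nxt in VOWELS:
            warnings.append(
                f"adjacent vowels '{cur} {nxt}': check for a glide or /2/")
        elif ((cur, nxt) in PREFERRED_LONG and after is not None
              and after not in VOWELS and after != nxt):
            # Glide followed by a different consonant: likely a long vowel.
            warnings.append(
                f"'{cur} {nxt}' before a consonant: "
                f"prefer '{PREFERRED_LONG[(cur, nxt)]}'")
    return warnings


print(lint_word("t. o w r".split()))   # ["'o w' before a consonant: prefer 'oo'"]
print(lint_word("th a w r".split()))   # []
print(lint_word("th a u r".split()))   # ["adjacent vowels 'a u': check for a glide or /2/"]
```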
  • Vowels in the middle of the word and the glottal stop: sometimes in rapid speech, the glottal stop between two identical vowels is not fully pronounced. Consider the MSA word سَأل, “asked”, which in Egyptian may be pronounced with a less distinct /2/ sound. If you pay close attention, however, the glottal stop is still there, just enunciated less. Thus /s a 2 a l/ is a better transcription than /s a a l/ or /s aa l/, the latter of which would be the spelling of an entirely different word, سَال, “flowed”. You can hear the difference between these Egyptian pronunciations if you pay close attention.
Dialect | Arabic | English | Correct CAPHI | Incorrect CAPHI
EGY | سَأل | asked | /s a 2 a l/ | /s a a l/
  • Vowels at the end of a word: keep in mind that not all words that are spelled with a word-final long vowel are necessarily pronounced as such. Most if not all dialects will have short vowel endings for words such as ذَكِي, “smart”, صَبِي, “boy”, etc. That said, في, “there is”, /f ii/, is pronounced with a long vowel in many dialects, while في, “in”, is often shortened when uttered in context. Also, word-final short vowels can sometimes be elongated when a suffix is added. For example, the Egyptian word عَمَلِي (practical m.) is usually pronounced /3 a m a l i/, while عَمَلِيَّة (practical f.) is usually pronounced /3 a m a l i y y a/. Again, listen to how it is said and not how it is written.
Dialect | Arabic | English | CAPHI
EGY | عَمَلِي | Practical (m.) | /3 a m a l i/
EGY | عَمَلِيَّة | Practical (f.) | /3 a m a l i y y a/
EGY | فِي عَرَبِيَّة | There is a car | /f ii # 3 a r a b i y y a/
EGY | فِي الكَبارَيه | In the cabaret | /f # i l k a b a r ee/
LEV | انا ما فِيِّ احِبَّك اكثَر | It’s not in me to love you more | /a n a # m aa # f i y y i # 7 i b b a k # a k t a r/
  • Word-final geminated glides: MSA has word-final glide consonants that can be geminated, or doubled, whereas this may not be the case for these words’ cognates in other dialects. For example, in MSA ذَكِيّ, “smart”, is usually pronounced /dh a k i y y/ and not /dh a k i/, as it would be pronounced in many dialects; عَلِيّ, “Ali”, in MSA is usually pronounced /3 a l i y y/ and not /3 a l i/.
Status | Dialect | Arabic | English | CAPHI
Correct | MSA | عَمَلِيّ | Practical (m.) | /3 a m a l i y y/
Incorrect | MSA | ذَكِيّ | smart | /dh a k ii/
Correct | LEV | ذَكِيّ | smart | /z a k i/
Correct | MSA | ذَكِيّ | smart | /dh a k i y y/
Incorrect | MSA | عَلِيّ | Ali | /3 a l ii/
Correct | MSA | عَلِيّ | Ali | /3 a l i y y/
  • Distinguishing between final /h/ and post-vocalic aspirations: many dialectal words are spelled with a word final ـه or ـة which are not actually pronounced, or may be pronounced with a slight puff of air called an aspiration that does not constitute a full /h/ sound, such as the following words: Egyptian Arabic ده “this” /d a/, and proper name سميرة “Sameera”, /s a m ii r a/. To distinguish between an aspiration and a fully pronounced /h/, consider the following test to reset your sense about the difference in pronunciation between a true /h/ and a post-vocalic aspiration. Compare the pronunciation of نبّى /n a b b a/ “to make a prophet” and نبّه /n a b b a h/ “to alert”.

5. CAPHI Phonetic Inventory