Language Tag: AR
Language Description: Arabic
Version: 1.0
Effective Date: 20 July 2007
Registry: Saudi Network Information Center
Contact: Abdulaziz Al-Zoman
Address: SaudiNIC, General directorate of Internet services, CITC, P.O. Box 75606, Riyadh 11588, Saudi Arabia
Telephone: +966-1-263-9392 Fax: +966-1-263-9393

Relevant Policy Document URL:

This document describes the IDN (Internationalized Domain Names) Language Table used by SaudiNIC, the .sa TLD registry, for the registration of Arabic-language .sa domains. The table is based on the recommendation of the Arabic Domain Name Pilot Project (

# Characters from Unicode Arabic Table (0600–06FF):
U+0621 Arabic Letter HAMZA
U+0622 Arabic Letter ALEF with MADDA above
U+0623 Arabic Letter ALEF with HAMZA above
U+0624 Arabic Letter WAW with HAMZA above
U+0625 Arabic Letter ALEF with HAMZA below
U+0626 Arabic Letter YEH with HAMZA above
U+0627 Arabic Letter ALEF
U+0628 Arabic Letter BEH
U+0629 Arabic Letter TEH MARBUTA
U+062A Arabic Letter TEH
U+062B Arabic Letter THEH
U+062C Arabic Letter JEEM
U+062D Arabic Letter HAH
U+062E Arabic Letter KHAH
U+062F Arabic Letter DAL
U+0630 Arabic Letter THAL
U+0631 Arabic Letter REH
U+0632 Arabic Letter ZAIN
U+0633 Arabic Letter SEEN
U+0634 Arabic Letter SHEEN
U+0635 Arabic Letter SAD
U+0636 Arabic Letter DAD
U+0637 Arabic Letter TAH
U+0638 Arabic Letter ZAH
U+0639 Arabic Letter AIN
U+063A Arabic Letter GHAIN
U+0641 Arabic Letter FEH
U+0642 Arabic Letter QAF
U+0643 Arabic Letter KAF
U+0644 Arabic Letter LAM
U+0645 Arabic Letter MEEM
U+0646 Arabic Letter NOON
U+0647 Arabic Letter HEH
U+0648 Arabic Letter WAW
U+0649 Arabic Letter ALEF MAKSURA
U+064A Arabic Letter YEH
U+0660 Arabic-Indic Digit Zero
U+0661 Arabic-Indic Digit One
U+0662 Arabic-Indic Digit Two
U+0663 Arabic-Indic Digit Three
U+0664 Arabic-Indic Digit Four
U+0665 Arabic-Indic Digit Five
U+0666 Arabic-Indic Digit Six
U+0667 Arabic-Indic Digit Seven
U+0668 Arabic-Indic Digit Eight
U+0669 Arabic-Indic Digit Nine
# Characters from Unicode Basic Latin Table (0000–007F):
U+002D Hyphen-Minus
U+002E Full Stop (Dot)
U+0030 Digit Zero
U+0031 Digit One
U+0032 Digit Two
U+0033 Digit Three
U+0034 Digit Four
U+0035 Digit Five
U+0036 Digit Six
U+0037 Digit Seven
U+0038 Digit Eight
U+0039 Digit Nine
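
The repertoire above can be expressed as a set of code points and used to check a candidate name. The following is a minimal sketch, not the registry's implementation; the names ALLOWED and disallowed_code_points are hypothetical and chosen for illustration only.

```python
# Code points permitted by the table above.
ARABIC_LETTERS = set(range(0x0621, 0x063A + 1)) | set(range(0x0641, 0x064A + 1))
ARABIC_INDIC_DIGITS = set(range(0x0660, 0x0669 + 1))
BASIC_LATIN = {0x002D, 0x002E} | set(range(0x0030, 0x0039 + 1))
ALLOWED = ARABIC_LETTERS | ARABIC_INDIC_DIGITS | BASIC_LATIN


def disallowed_code_points(name: str) -> list:
    """Return the code points of `name` that are not in the table, as U+XXXX strings."""
    return [f"U+{ord(ch):04X}" for ch in name if ord(ch) not in ALLOWED]


# MEEM, WAW, QAF, AIN are all in the table, so nothing is reported.
print(disallowed_code_points("\u0645\u0648\u0642\u0639"))
# U+0640 (TATWEEL) is not in the table and is reported.
print(disallowed_code_points("\u0645\u0640\u0648"))
```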

Some Linguistic Issues

  1. Tashkeel (Diacritics) and Shadda

    These are small signs usually placed above or below an Arabic letter to indicate its correct 
pronunciation, which can change the meaning of a word. Tashkeel is not a letter in itself but a 
means of pronouncing a letter correctly. It is not widely used except where words that share the 
same letters but differ in pronunciation, and hence in meaning, could otherwise be 
mispronounced.

    Therefore, Tashkeel and Shadda should not be supported in IDN. They may be supported in 
the user interface only, and are stripped off at the preparation of internationalized strings (stringprep) phase.
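
    Stripping these marks before registration could be sketched as follows. The document does not 
list the exact code points to remove, so this sketch assumes the range U+064B (FATHATAN) to 
U+0652 (SUKUN), which includes U+0651 (SHADDA). The function name strip_tashkeel is hypothetical.

```python
# Assumption: Tashkeel covers U+064B..U+0652, which includes SHADDA (U+0651).
TASHKEEL = {chr(cp) for cp in range(0x064B, 0x0652 + 1)}


def strip_tashkeel(text: str) -> str:
    """Remove the assumed diacritic code points and keep everything else unchanged."""
    return "".join(ch for ch in text if ch not in TASHKEEL)


# MEEM+DAMMA, HAH+FATHA, MEEM+SHADDA+FATHA, DAL  ->  MEEM, HAH, MEEM, DAL
vocalized = "\u0645\u064F\u062D\u064E\u0645\u0651\u064E\u062F"
print(strip_tashkeel(vocalized) == "\u0645\u062D\u0645\u062F")
```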

  2. Kasheeda or Tatweel (Horizontal Character Size Extension)

    Kasheeda is not a letter. It is a horizontal line (like a dash) used to lengthen the connecting line between
letters. It is sometimes used to enhance the display of Arabic words on screens or printouts.

    Hence, Kasheeda (Tatweel) should not be used in IDN.
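
    A check for Kasheeda could look like the sketch below. Tatweel is U+0640 in Unicode. The document 
says only that it should not be used, so rejecting the label (rather than silently removing the 
character) is an assumption made here, and reject_tatweel is a hypothetical name.

```python
TATWEEL = "\u0640"  # ARABIC TATWEEL (Kasheeda)


def reject_tatweel(label: str) -> str:
    """Return `label` unchanged, or raise ValueError if it contains Tatweel."""
    if TATWEEL in label:
        raise ValueError("label contains U+0640 ARABIC TATWEEL")
    return label


reject_tatweel("\u0645\u0648\u0642\u0639")  # accepted: no Tatweel present
```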

  3. Character folding

    Character folding is the process where multiple letters (that may have some similarity with respect 
to their shapes) are folded into one shape. This includes:

    In the Arabic language, character folding is not acceptable because it changes the meaning
of words and violates even the simplest spelling rules.

    Therefore, character folding should not be allowed.

  4. Numbers

    In the Arab world, two sets of numerical digits are in use:

    1. From U+0030 (Digit Zero) to U+0039 (Digit Nine)

      Mostly used in the western part of the Arab world (al-maghrib al-arabi).

    2. From U+0660 (Arabic-Indic Digit Zero) to U+0669 (Arabic-Indic Digit Nine)

      Mostly used in the eastern part of the Arab world (al-mashriq al-arabi).

    Hence, both sets should be supported in the user interface, and both are folded to one set 
(Set 1, U+0030 to U+0039) at the preparation of internationalized strings (e.g., "stringprep") phase.
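
    The digit folding described above can be sketched as a simple one-to-one mapping from 
U+0660..U+0669 to U+0030..U+0039. This is an illustration only; fold_digits is a hypothetical 
name and the sketch does not reproduce any particular stringprep profile.

```python
# Map each Arabic-Indic digit (U+0660..U+0669) to its counterpart in U+0030..U+0039.
ARABIC_INDIC_TO_ASCII = {0x0660 + i: 0x0030 + i for i in range(10)}


def fold_digits(text: str) -> str:
    """Fold Arabic-Indic digits to U+0030..U+0039; other characters are unchanged."""
    return text.translate(ARABIC_INDIC_TO_ASCII)


print(fold_digits("\u0662\u0660\u0660\u0667"))  # 2007
```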

  5. Connecting Multiple Words

    In the Arabic language, words are separated by spaces. Connecting words without spaces is 
usually not acceptable. Therefore, a single space would be the best word separator in an Arabic 
domain name with multiple words.

    Since it is technically not feasible to use a space as a word separator, multiple words are 
separated by the hyphen character "-" instead.
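
    Converting a multi-word name typed with spaces into its hyphen-separated form could be sketched 
as below. Collapsing runs of whitespace into a single hyphen is an assumption of this sketch, and 
join_words is a hypothetical name.

```python
def join_words(name: str) -> str:
    """Join whitespace-separated words with U+002D (Hyphen-Minus)."""
    # str.split() with no argument splits on any run of whitespace
    # and drops leading and trailing whitespace.
    return "-".join(name.split())


# Two words separated by a space become one hyphen-separated label.
two_words = "\u0645\u0648\u0642\u0639 \u0639\u0631\u0628\u064A"
print(join_words(two_words) == "\u0645\u0648\u0642\u0639-\u0639\u0631\u0628\u064A")
```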