IDN Table for the Arabic Language
TLD: .sap
Language Tag: ar
Version: 1.0
Effective Date: 2012-03-30
Registry: SAP AG
The table is based on the set of characters allowed in the .sa TLD.
Apart from the general rules defined in IDNA2008, Knipp Medien und Kommunikation GmbH implements some further rules. These rules (given in Section 1) specifically prohibit certain combinations of characters within a domain label. Labels containing such combinations are unavailable for registration.
For every registered Arabic language IDN, a canonical string will be computed and stored in the SRS database. The canonical string is produced by mapping every character in the IDN U-label to the corresponding canonical character as specified in the character table below. Once an Arabic language IDN is registered, other domain names that map to the same canonical string will be unavailable for registration.
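The canonical mapping described above can be sketched as follows. This is a minimal illustration, not the registry's implementation: the function names canonical_string and is_blocked are hypothetical, and the sketch relies on the observation that in the character table of Section 2 only the Arabic-Indic digit rows map to a different canonical code point.

```python
# Hypothetical sketch of the canonical-string computation.
# Assumption: per the character table, only U+0660..U+0669 have a canonical
# code point that differs from the code point itself (they map to U+0030..U+0039).
ARABIC_INDIC_ZERO = 0x0660
ASCII_ZERO = 0x0030


def canonical_string(u_label):
    """Map every character of a U-label to its canonical character."""
    out = []
    for ch in u_label:
        cp = ord(ch)
        if ARABIC_INDIC_ZERO <= cp <= ARABIC_INDIC_ZERO + 9:
            cp = ASCII_ZERO + (cp - ARABIC_INDIC_ZERO)
        out.append(chr(cp))
    return "".join(out)


def is_blocked(candidate, registered_canonicals):
    """True if the candidate maps to the canonical string of a registered name."""
    return canonical_string(candidate) in registered_canonicals
```

Under these assumptions, a label written with Arabic-Indic digits and the same label written with ASCII digits yield the same canonical string, so registering one makes the other unavailable.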
The table presented here is in compliance with the ICANN Guidelines for the Implementation of Internationalized Domain Names Version 3.0 and is intended for publication in the IANA Repository of TLD IDN Practices, for the information of prospective holders of domains in .sap and for the users of resources within those domains.
1. Rules Describing Invalid Labels
Each of the following rules has a pattern that matches a character from the table in Section 2. Since the characters cannot be considered on a one-by-one basis, an optional look-behind pattern and an optional look-ahead pattern are added to each rule. A rule applies to a domain label if the given pattern matches a character within the label and the characters before and after it match the look-behind and look-ahead patterns, respectively. The terms 'before' and 'after' refer to the on-the-wire order of the characters.
The following character properties will be used in the patterns.
* {1} denotes the ASCII digits Zero - Nine (i.e., U+0030..U+0039)
* {2} denotes the Arabic digits Zero - Nine (i.e., U+0660..U+0669)
* character classes can be combined, e.g. {12} denotes characters that are either ASCII digits or Arabic digits
Apart from these character properties, the pattern syntax consists of the following elements, which are based on regular expression syntax.
* . denotes any character
* ^ denotes the beginning of the label
* an asterisk (*) denotes any number of occurrences, including zero
* U+XXXX denotes the Unicode character with the hexadecimal code point XXXX
If any of the following rules matches, the respective domain label is rejected and may not be registered. Note that some special cases of the rules below are already covered by IDNA2008. Excluding those special cases would only have made the rules more complicated and less intuitive, so they were kept.
Comment: a label may not start with a digit
Look-behind: ^
Pattern: {12}
Look-ahead:
Action: reject
Comment: consecutive hyphens are not allowed in a label
Look-behind: U+002D
Pattern: U+002D
Look-ahead:
Action: reject
Comment: no mixing of the two digit sets is allowed
Look-behind: {2} .*
Pattern: {1}
Look-ahead:
Action: reject
Comment: no mixing of the two digit sets is allowed
Look-behind: {1} .*
Pattern: {2}
Look-ahead:
Action: reject
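The four rules above can be approximated with ordinary regular expressions. The following is a sketch under stated assumptions, not the registry's implementation: the names REJECT_RULES and rejection_reason are hypothetical, the label is assumed to be a Python string in on-the-wire (logical) order, and because Python's re module does not support variable-width look-behind, the two digit-mixing rules are written as plain patterns of the form "first digit class, anything, second digit class".

```python
import re

ASCII_DIGITS = "0-9"               # class {1}: U+0030..U+0039
ARABIC_DIGITS = "\u0660-\u0669"    # class {2}: U+0660..U+0669

# (comment, compiled pattern) pairs, one per rule, in the order listed above.
REJECT_RULES = [
    ("a label may not start with a digit",
     re.compile(f"^[{ASCII_DIGITS}{ARABIC_DIGITS}]")),
    ("consecutive hyphens are not allowed in a label",
     re.compile("--")),
    ("no mixing of the two digit sets is allowed",
     re.compile(f"[{ARABIC_DIGITS}].*[{ASCII_DIGITS}]")),
    ("no mixing of the two digit sets is allowed",
     re.compile(f"[{ASCII_DIGITS}].*[{ARABIC_DIGITS}]")),
]


def rejection_reason(label):
    """Return the comment of the first matching rule, or None if no rule matches."""
    for comment, pattern in REJECT_RULES:
        if pattern.search(label):
            return comment
    return None
```

A label for which rejection_reason returns None has only passed these four rules; it would still have to satisfy IDNA2008 and consist solely of characters from the table of allowed characters.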
2. Table of Allowed Characters
The table below lists the characters allowed in the Unicode representation of IDNs associated with the Arabic language. Columns are delimited by semicolons. The "#" symbol denotes the start of a comment that continues to the end of the line.
# Valid code point ; canonical code point ; variant
U+002D;U+002D # HYPHEN-MINUS
U+0030;U+0030;U+0660 # DIGIT ZERO
U+0031;U+0031;U+0661 # DIGIT ONE
U+0032;U+0032;U+0662 # DIGIT TWO
U+0033;U+0033;U+0663 # DIGIT THREE
U+0034;U+0034;U+0664 # DIGIT FOUR
U+0035;U+0035;U+0665 # DIGIT FIVE
U+0036;U+0036;U+0666 # DIGIT SIX
U+0037;U+0037;U+0667 # DIGIT SEVEN
U+0038;U+0038;U+0668 # DIGIT EIGHT
U+0039;U+0039;U+0669 # DIGIT NINE
U+0621;U+0621 # ARABIC LETTER HAMZA
U+0622;U+0622 # ARABIC LETTER ALEF WITH MADDA ABOVE
U+0623;U+0623 # ARABIC LETTER ALEF WITH HAMZA ABOVE
U+0624;U+0624 # ARABIC LETTER WAW WITH HAMZA ABOVE
U+0625;U+0625 # ARABIC LETTER ALEF WITH HAMZA BELOW
U+0626;U+0626 # ARABIC LETTER YEH WITH HAMZA ABOVE
U+0627;U+0627 # ARABIC LETTER ALEF
U+0628;U+0628 # ARABIC LETTER BEH
U+0629;U+0629 # ARABIC LETTER TEH MARBUTA
U+062A;U+062A # ARABIC LETTER TEH
U+062B;U+062B # ARABIC LETTER THEH
U+062C;U+062C # ARABIC LETTER JEEM
U+062D;U+062D # ARABIC LETTER HAH
U+062E;U+062E # ARABIC LETTER KHAH
U+062F;U+062F # ARABIC LETTER DAL
U+0630;U+0630 # ARABIC LETTER THAL
U+0631;U+0631 # ARABIC LETTER REH
U+0632;U+0632 # ARABIC LETTER ZAIN
U+0633;U+0633 # ARABIC LETTER SEEN
U+0634;U+0634 # ARABIC LETTER SHEEN
U+0635;U+0635 # ARABIC LETTER SAD
U+0636;U+0636 # ARABIC LETTER DAD
U+0637;U+0637 # ARABIC LETTER TAH
U+0638;U+0638 # ARABIC LETTER ZAH
U+0639;U+0639 # ARABIC LETTER AIN
U+063A;U+063A # ARABIC LETTER GHAIN
U+0641;U+0641 # ARABIC LETTER FEH
U+0642;U+0642 # ARABIC LETTER QAF
U+0643;U+0643 # ARABIC LETTER KAF
U+0644;U+0644 # ARABIC LETTER LAM
U+0645;U+0645 # ARABIC LETTER MEEM
U+0646;U+0646 # ARABIC LETTER NOON
U+0647;U+0647 # ARABIC LETTER HEH
U+0648;U+0648 # ARABIC LETTER WAW
U+0649;U+0649 # ARABIC LETTER ALEF MAKSURA
U+064A;U+064A # ARABIC LETTER YEH
U+0660;U+0030;U+0030 # ARABIC-INDIC DIGIT ZERO
U+0661;U+0031;U+0031 # ARABIC-INDIC DIGIT ONE
U+0662;U+0032;U+0032 # ARABIC-INDIC DIGIT TWO
U+0663;U+0033;U+0033 # ARABIC-INDIC DIGIT THREE
U+0664;U+0034;U+0034 # ARABIC-INDIC DIGIT FOUR
U+0665;U+0035;U+0035 # ARABIC-INDIC DIGIT FIVE
U+0666;U+0036;U+0036 # ARABIC-INDIC DIGIT SIX
U+0667;U+0037;U+0037 # ARABIC-INDIC DIGIT SEVEN
U+0668;U+0038;U+0038 # ARABIC-INDIC DIGIT EIGHT
U+0669;U+0039;U+0039 # ARABIC-INDIC DIGIT NINE
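For readers who want to process the table programmatically, one possible way to read a single line of it is sketched below. The function name parse_table_line is hypothetical, and the sketch assumes only what is stated above: semicolon-delimited columns, "#" starting a comment, and code points written as U+XXXX.

```python
def parse_table_line(line):
    """Parse one table line into (valid, canonical, variants) as integers.

    Returns None for lines that contain only a comment or whitespace.
    """
    data = line.split("#", 1)[0].strip()   # drop the trailing comment
    if not data:
        return None
    code_points = []
    for field in data.split(";"):
        field = field.strip()
        if not field.upper().startswith("U+"):
            raise ValueError(f"unexpected field: {field!r}")
        code_points.append(int(field[2:], 16))
    if len(code_points) < 2:
        raise ValueError("expected at least valid and canonical code points")
    return code_points[0], code_points[1], code_points[2:]
```

For a digit row the third element holds the single variant code point; for rows without a variant column it is an empty list.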