# IDN Table for the Arabic Script
#
#TLD: .sport
#
#Script Tag: Arab
#
#Script Description: Arabic
#
#Version: 1.3
#
#Effective Date: 2015-10-21
#
#Registry: Global Association of International Sports Federations (GAISF)
#
#https://mic.sport
#
#dnsmaster@corenic.org
#
#Apart from the general rules defined in IDNA2008 CORE Internet Council of Registrars implements some further rules. These will be described in the following sections.
#
#For every registered Arabic script IDN, an index (or canonical) string will be computed and stored in the SRS database. The index string
#is produced by applying the rules from the section below to the IDN U-label. If no rule applies or if all applying rules map their respective character to itself, then the index
#string will be the same as the registered string. Once an Arabic script IDN is registered, other domain names that map to the same index string will be unavailable for
#registration. Additionally, there are some rules that specifically prohibit a certain combination of Arabic characters within a domain label. Such labels will also be
#unavailable for registration.
#
#The table presented here is in compliance with the ICANN Guidelines for the Implementation of
#Internationalized Domain Names Version 3.0 and is intended for publication in the
#IANA Repository of TLD IDN Practices, for the information of prospective holders of domains
#in .sport and for the users of resources within those domains.
#
#
#1.1 Table of Allowed Characters and their Joining Property
#
#The table below lists the characters allowed in the Unicode representation of IDNs associated with the Arabic script. The second column shows the character property used in the
#rules below. Columns are delimited by semicolons. The "#" symbol denotes start of a comment that continues to the end of line.
#
# Valid code point ; character property
U+002D; U # HYPHEN-MINUS
U+0030; U # DIGIT ZERO
U+0031; U # DIGIT ONE
U+0032; U # DIGIT TWO
U+0033; U # DIGIT THREE
U+0034; U # DIGIT FOUR
U+0035; U # DIGIT FIVE
U+0036; U # DIGIT SIX
U+0037; U # DIGIT SEVEN
U+0038; U # DIGIT EIGHT
U+0039; U # DIGIT NINE
U+0621; U # ARABIC LETTER HAMZA
U+0622; R # ARABIC LETTER ALEF WITH MADDA ABOVE
U+0623; R # ARABIC LETTER ALEF WITH HAMZA ABOVE
U+0624; R # ARABIC LETTER WAW WITH HAMZA ABOVE
U+0625; R # ARABIC LETTER ALEF WITH HAMZA BELOW
U+0626; D # ARABIC LETTER YEH WITH HAMZA ABOVE
U+0627; R # ARABIC LETTER ALEF
U+0628; D # ARABIC LETTER BEH
U+0629; R # ARABIC LETTER TEH MARBUTA
U+062A; D # ARABIC LETTER TEH
U+062B; D # ARABIC LETTER THEH
U+062C; D # ARABIC LETTER JEEM
U+062D; D # ARABIC LETTER HAH
U+062E; D # ARABIC LETTER KHAH
U+062F; R # ARABIC LETTER DAL
U+0630; R # ARABIC LETTER THAL
U+0631; R # ARABIC LETTER REH
U+0632; R # ARABIC LETTER ZAIN
U+0633; D # ARABIC LETTER SEEN
U+0634; D # ARABIC LETTER SHEEN
U+0635; D # ARABIC LETTER SAD
U+0636; D # ARABIC LETTER DAD
U+0637; D # ARABIC LETTER TAH
U+0638; D # ARABIC LETTER ZAH
U+0639; D # ARABIC LETTER AIN
U+063A; D # ARABIC LETTER GHAIN
U+0641; D # ARABIC LETTER FEH
U+0642; D # ARABIC LETTER QAF
U+0643; D # ARABIC LETTER KAF
U+0644; D # ARABIC LETTER LAM
U+0645; D # ARABIC LETTER MEEM
U+0646; D # ARABIC LETTER NOON
U+0647; D # ARABIC LETTER HEH
U+0648; R # ARABIC LETTER WAW
U+0649; D # ARABIC LETTER ALEF MAKSURA
U+064A; D # ARABIC LETTER YEH
U+0660; U # ARABIC-INDIC DIGIT ZERO
U+0661; U # ARABIC-INDIC DIGIT ONE
U+0662; U # ARABIC-INDIC DIGIT TWO
U+0663; U # ARABIC-INDIC DIGIT THREE
U+0664; U # ARABIC-INDIC DIGIT FOUR
U+0665; U # ARABIC-INDIC DIGIT FIVE
U+0666; U # ARABIC-INDIC DIGIT SIX
U+0667; U # ARABIC-INDIC DIGIT SEVEN
U+0668; U # ARABIC-INDIC DIGIT EIGHT
U+0669; U # ARABIC-INDIC DIGIT NINE
U+0679; D # ARABIC LETTER TTEH
U+067E; D # ARABIC LETTER PEH
U+0686; D # ARABIC LETTER TCHEH
U+0688; R # ARABIC LETTER DDAL
U+0691; R # ARABIC LETTER RREH
U+0695; R # ARABIC LETTER REH WITH SMALL V BELOW
U+0698; R # ARABIC LETTER JEH
U+069C; D # ARABIC LETTER SEEN WITH THREE DOTS BELOW AND THREE DOTS ABOVE
U+069E; D # ARABIC LETTER SAD WITH THREE DOTS ABOVE
U+06A0; D # ARABIC LETTER AIN WITH THREE DOTS ABOVE
U+06A2; D # ARABIC LETTER FEH WITH DOT MOVED BELOW
U+06A4; D # ARABIC LETTER VEH
U+06A5; D # ARABIC LETTER FEH WITH THREE DOTS BELOW
U+06A7; D # ARABIC LETTER QAF WITH DOT ABOVE
U+06A8; D # ARABIC LETTER QAF WITH THREE DOTS ABOVE
U+06A9; D # ARABIC LETTER KEHEH
U+06AE; D # ARABIC LETTER KAF WITH THREE DOTS BELOW
U+06AF; D # ARABIC LETTER GAF
U+06B4; D # ARABIC LETTER GAF WITH THREE DOTS ABOVE
U+06B5; D # ARABIC LETTER LAM WITH SMALL V
U+06BA; D # ARABIC LETTER NOON GHUNNA
U+06BD; D # ARABIC LETTER NOON WITH THREE DOTS ABOVE
U+06BE; D # ARABIC LETTER HEH DOACHASHMEE
U+06C1; D # ARABIC LETTER HEH GOAL
U+06C6; R # ARABIC LETTER OE
U+06C7; R # ARABIC LETTER U
U+06CA; R # ARABIC LETTER WAW WITH TWO DOTS ABOVE
U+06CC; D # ARABIC LETTER FARSI YEH
U+06CE; D # ARABIC LETTER YEH WITH SMALL V
U+06CF; R # ARABIC LETTER WAW WITH DOT ABOVE
U+06D2; R # ARABIC LETTER YEH BARREE
U+06F0; U # EXTENDED ARABIC-INDIC DIGIT ZERO
U+06F1; U # EXTENDED ARABIC-INDIC DIGIT ONE
U+06F2; U # EXTENDED ARABIC-INDIC DIGIT TWO
U+06F3; U # EXTENDED ARABIC-INDIC DIGIT THREE
U+06F4; U # EXTENDED ARABIC-INDIC DIGIT FOUR
U+06F5; U # EXTENDED ARABIC-INDIC DIGIT FIVE
U+06F6; U # EXTENDED ARABIC-INDIC DIGIT SIX
U+06F7; U # EXTENDED ARABIC-INDIC DIGIT SEVEN
U+06F8; U # EXTENDED ARABIC-INDIC DIGIT EIGHT
U+06F9; U # EXTENDED ARABIC-INDIC DIGIT NINE
U+0762; D # ARABIC LETTER KEHEH WITH DOT ABOVE
U+200C; U # ZERO WIDTH NON-JOINER
#
#
#1.2 Pattern Syntax
#
#The following rules will have a pattern describing a character from the above table to be matched. Since the Arabic characters cannot be considered on a one-by-one basis, a
#look-behind and look-ahead pattern is (optionally) added to each rule.
#A rule applies to a domain label if the given pattern matches a character within the label and the
#characters before and after match the patterns given in look-behind and look-ahead, respectively. The terms 'before' and 'after' are considered to be in relation to the
#on-the-wire order of the characters.
#
#The following character properties will be used in the patterns.
#* {L} denotes characters that are left-joining
#* {R} denotes characters that are right-joining
#* {D} denotes characters that are dual-joining (i.e. both left- and right-joining)
#* {U} denotes characters that are unable to join
#* {T} denotes combining characters
#* {1} denotes the ASCII digits Zero - Nine (i.e., 0030..0039)
#* {2} denotes the Arabic digits Zero - Nine (i.e., 0660..0669)
#* {3} denotes the extended Arabic digits Zero - Nine (i.e., 06F0..06F9)
#* {} denotes the empty word (no character)
#* character classes can be combined, e.g. {LD} denotes characters that are either left-joining or dual-joining
#
#Apart from the character properties necessary for the Arabic characters, the used pattern syntax is defined by the following syntax elements which are based on the regular
#expression syntax.
#
#* . denotes any character
#* ^ denotes the beginning of the label
#* $ denotes the end of the label
#* | denotes an alternative
#* (...) groups a subpattern
#* an asterisk (*) denotes any number of occurrences, including zero
#* ZWNJ denotes the zero width non-joiner character (U+200C)
#* U+XXXX denotes the Unicode character with the hexadecimal code point XXXX
#
#
#1.3 Rules Describing Invalid Labels
#
#If any of the following rules is matched, the respective domain label is rejected and may not be registered. Note that some special cases of the rules below are already included
#in IDNA2008. Excluding those special cases would only have made the rules more complicated and less intuitive so they were kept.
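#
#As a rough illustration of how the pattern syntax from section 1.2 might be evaluated, the Python sketch below translates the first rule of section 1.3
#(pattern U+0637 | U+0638 | U+06BE, look-ahead ZWNJ {T}* {RD}) into a regular expression. The names JOINING, char_class and FONT_RULE are hypothetical,
#the joining table is only an excerpt of section 1.1, and {T}* is omitted because the table lists no combining characters. It is a sketch under these
#assumptions, not the registry's implementation.

```python
import re

# Excerpt of the joining-property table from section 1.1 (NOT the full table).
JOINING = {
    "\u0627": "R",  # ARABIC LETTER ALEF
    "\u0628": "D",  # ARABIC LETTER BEH
    "\u0646": "D",  # ARABIC LETTER NOON
    "\u0648": "R",  # ARABIC LETTER WAW
    "\u0661": "U",  # ARABIC-INDIC DIGIT ONE
}

ZWNJ = "\u200c"


def char_class(props: str) -> str:
    """Expand a combined class such as {RD} into a regex character class."""
    members = [ch for ch, prop in JOINING.items() if prop in props]
    if not members:
        return "(?!)"  # an empty class can never match
    return "[" + "".join(re.escape(ch) for ch in members) + "]"


# Pattern: U+0637 | U+0638 | U+06BE, look-ahead: ZWNJ {RD} ({T}* omitted).
FONT_RULE = re.compile(
    "[\u0637\u0638\u06be](?=" + ZWNJ + char_class("RD") + ")"
)


def violates_font_rule(label: str) -> bool:
    """True if the sketched rule matches anywhere in the label."""
    return FONT_RULE.search(label) is not None
```

#With this excerpt, violates_font_rule() flags TAH + ZWNJ + BEH but not TAH + BEH, because the look-ahead requires the ZWNJ.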
#
#Comment: prevent confusion that may arise in conjunction with certain fonts
#Look-behind:
#Pattern: U+0637 | U+0638 | U+06BE
#Look-ahead: ZWNJ {T}* {RD}
#Action: reject
#
#Comment: a label may not start with a digit
#Look-behind: ^
#Pattern: {123}
#Look-ahead:
#Action: reject
#
#Comment: consecutive hyphens are not allowed in a label
#Look-behind: U+002D
#Pattern: U+002D
#Look-ahead:
#Action: reject
#
#Comment: no mixing of the three digit sets is allowed (part 1)
#Look-behind: {23} .*
#Pattern: {1}
#Look-ahead:
#Action: reject
#
#Comment: no mixing of the three digit sets is allowed (part 2)
#Look-behind: {13} .*
#Pattern: {2}
#Look-ahead:
#Action: reject
#
#Comment: no mixing of the three digit sets is allowed (part 3)
#Look-behind: {12} .*
#Pattern: {3}
#Look-ahead:
#Action: reject
#
#
#1.4 Rules for Variant and Index Generation
#
#If any of the following rules matches, the character from the pattern will have an index character. This index character is used to determine the index string for the whole
#label (simply by replacing each character by its index character). Furthermore, labels with at least one matching rule will have label variants. The variants are determined by
#replacing the character by any one of the allowed variants.
#
#Note that for every variant, the previous rules to determine invalid labels also have to be checked to make sure a variant is a valid domain label.
#
#
#1.4.1 ZWNJ IDNA2003 workaround
#
#The rule is due to the fact that several browsers still do not support IDNA2008 in respect of the ZWNJ character. By implementing the older version IDNA2003 they simply remove
#any ZWNJ occurrence. The following rule removes every ZWNJ from a domain name and thus makes two domains that only differ in the appearance of ZWNJ characters variants of each
#other.
#
#Comment: ZWNJ IDNA2003 workaround
#Look-behind:
#Pattern: ZWNJ
#Look-ahead:
#Index: {}
#Variants: {}, ZWNJ
#
#
#1.4.2 YEH Group
#
#Comment: YEH Group (part 1)
#Look-behind:
#Pattern: U+064A | U+06CC
#Look-ahead: {T}* {RD}
#Index: U+064A
#Variants: U+064A, U+06CC
#
#Comment: YEH Group (part 2)
#Look-behind:
#Pattern: U+0649 | U+06CC
#Look-ahead: {T}* {U} | $
#Index: U+0649
#Variants: U+0649, U+06CC
#
#
#1.4.3 HEH Group
#
#Comment: HEH Group
#Look-behind:
#Pattern: U+0647 | U+06BE | U+06C1
#Look-ahead:
#Index: U+0647
#Variants: U+0647, U+06BE, U+06C1
#
#
#1.4.4 NOON Group
#
#Comment: NOON Group
#Look-behind:
#Pattern: U+0646 | U+06BA
#Look-ahead: {T}* {RD}
#Index: U+0646
#Variants: U+0646, U+06BA
#
#
#1.4.5 KEH Group
#
#Comment: KEH Group
#Look-behind:
#Pattern: U+0643 | U+06A9
#Look-ahead: {T}* {RD}
#Index: U+0643
#Variants: U+0643, U+06A9
#
#
#1.4.6 FEH Group
#
#Comment: FEH Group
#Look-behind:
#Pattern: U+0641 | U+06A7
#Look-ahead: {T}* {RD}
#Index: U+0641
#Variants: U+0641, U+06A7
#
#
#1.4.7 PEH Group
#
#Comment: PEH Group
#Look-behind:
#Pattern: U+067E | U+06BD
#Look-ahead: {T}* {RD}
#Index: U+067E
#Variants: U+067E, U+06BD
#
#
#1.4.8 "6A0" / VEH Group
#
#Comment: 6A0 Group (part 1)
#Look-behind: {LD} {T}*
#Pattern: U+06A0 | U+06A4 | U+06A8
#Look-ahead: {T}* {RD}
#Index: U+06A0
#Variants: U+06A0, U+06A4, U+06A8
#
#Comment: 6A0 Group (part 2)
#Look-behind: ^ | {RU} {T}*
#Pattern: U+06A4 | U+06A8
#Look-ahead: {T}* {RD}
#Index: U+06A4
#Variants: U+06A4, U+06A8
#
#
#1.4.9 Digits
#
#Comment: Digit Zero
#Look-behind:
#Pattern: U+0030 | U+0660 | U+06F0
#Look-ahead:
#Index: U+0030
#Variants: U+0030, U+0660, U+06F0
#
#Comment: Digit One
#Look-behind:
#Pattern: U+0031 | U+0661 | U+06F1
#Look-ahead:
#Index: U+0031
#Variants: U+0031, U+0661, U+06F1
#
#Comment: Digit Two
#Look-behind:
#Pattern: U+0032 | U+0662 | U+06F2
#Look-ahead:
#Index: U+0032
#Variants: U+0032, U+0662, U+06F2
#
#Comment: Digit Three
#Look-behind:
#Pattern: U+0033 | U+0663 | U+06F3
#Look-ahead:
#Index: U+0033
#Variants: U+0033, U+0663, U+06F3
#
#Comment: Digit Four
#Look-behind:
#Pattern: U+0034 | U+0664 | U+06F4
#Look-ahead:
#Index: U+0034
#Variants: U+0034, U+0664, U+06F4
#
#Comment: Digit Five
#Look-behind:
#Pattern: U+0035 | U+0665 | U+06F5
#Look-ahead:
#Index: U+0035
#Variants: U+0035, U+0665, U+06F5
#
#Comment: Digit Six
#Look-behind:
#Pattern: U+0036 | U+0666 | U+06F6
#Look-ahead:
#Index: U+0036
#Variants: U+0036, U+0666, U+06F6
#
#Comment: Digit Seven
#Look-behind:
#Pattern: U+0037 | U+0667 | U+06F7
#Look-ahead:
#Index: U+0037
#Variants: U+0037, U+0667, U+06F7
#
#Comment: Digit Eight
#Look-behind:
#Pattern: U+0038 | U+0668 | U+06F8
#Look-ahead:
#Index: U+0038
#Variants: U+0038, U+0668, U+06F8
#
#Comment: Digit Nine
#Look-behind:
#Pattern: U+0039 | U+0669 | U+06F9
#Look-ahead:
#Index: U+0039
#Variants: U+0039, U+0669, U+06F9
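#
#
#The context-free parts of the rules above can be sketched in Python as follows. This is a minimal illustration, not the registry's implementation: it covers
#only the digit, HEH and ZWNJ index rules and the three reject rules on leading digits, consecutive hyphens and mixed digit sets. All rules that depend on
#joining context (the YEH, NOON, KEH, FEH, PEH and 6A0 groups, and the font-confusion reject rule) are left out. The names INDEX, is_rejected and
#index_string are hypothetical.

```python
ZWNJ = "\u200c"

ASCII_DIGITS = {chr(0x0030 + i) for i in range(10)}     # class {1}
ARABIC_DIGITS = {chr(0x0660 + i) for i in range(10)}    # class {2}
EXTENDED_DIGITS = {chr(0x06F0 + i) for i in range(10)}  # class {3}
ALL_DIGITS = ASCII_DIGITS | ARABIC_DIGITS | EXTENDED_DIGITS

# Index characters for the rules that have no look-behind or look-ahead.
INDEX = {ZWNJ: ""}  # 1.4.1: ZWNJ maps to the empty word {}
for i in range(10):  # 1.4.9: every digit maps to its ASCII digit
    INDEX[chr(0x0660 + i)] = chr(0x0030 + i)
    INDEX[chr(0x06F0 + i)] = chr(0x0030 + i)
for heh in ("\u06be", "\u06c1"):  # 1.4.3: HEH group maps to U+0647
    INDEX[heh] = "\u0647"


def is_rejected(label: str) -> bool:
    """Apply three of the reject rules from section 1.3 (partial check)."""
    if label and label[0] in ALL_DIGITS:  # a label may not start with a digit
        return True
    if "--" in label:  # consecutive hyphens are not allowed
        return True
    # Parts 1-3 of the digit rule together forbid digits from two different
    # sets anywhere in the same label, so counting the sets in use suffices.
    sets_used = sum(
        1
        for digits in (ASCII_DIGITS, ARABIC_DIGITS, EXTENDED_DIGITS)
        if any(ch in digits for ch in label)
    )
    return sets_used > 1


def index_string(label: str) -> str:
    """Replace each character by its index character (context-free rules only)."""
    return "".join(INDEX.get(ch, ch) for ch in label)
```

#Under this sketch, two labels that differ only in the digit set used, in the HEH form, or in the presence of ZWNJ yield the same index string and would
#therefore block each other.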