New collations are added to this registry through a process of expert review. Proposals for new collations must be formatted using the template defined in [RFC4790] and sent to [mailto:firstname.lastname@example.org]. Each proposal is then passed to the designated expert for review.
[RFC4790] creates an abstraction framework so that application protocols can precisely identify a comparison function and so that the repertoire of comparison functions can be extended in the future. This document defines an IANA-maintained registry of collations for comparing, searching and sorting international strings. The registered collations are listed below:
|i;ascii-numeric||The "i;ascii-numeric" collation is a simple collation intended for use with arbitrarily sized unsigned decimal integer numbers stored as octet strings. US-ASCII digits (0x30 to 0x39) represent digits of the numbers. Before converting from string to integer, the input string is truncated at the first non-digit character. All input is valid; strings which do not start with a digit represent positive infinity.||[RFC4790]|
|i;ascii-casemap||The "i;ascii-casemap" collation is a simple collation which operates on octet strings and treats US-ASCII letters case-insensitively. It provides equality, substring and ordering operations. All input is valid. Note that letters outside ASCII are not treated case-insensitively.||[RFC4790]|
|i;octet||The "i;octet" collation is a simple and fast collation intended for use on binary octet strings rather than on character data. Protocols that want to make this collation available have to do so by explicitly allowing it. If not explicitly allowed, it MUST NOT be used. It never returns an "undefined" result. It provides equality, substring and ordering operations.||[RFC4790]|
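The ordering behaviour of the three collations above can be illustrated with a minimal Python sketch. It is not a reference implementation: the function names (`ascii_numeric_key`, `ascii_casemap_key`, `octet_key`, `compare`) are hypothetical, the use of sort keys is a design choice made for this illustration, and mapping letters to upper case (rather than lower case) is an assumption that does not affect the result.

```python
def ascii_numeric_key(s: bytes):
    """Sort key for an "i;ascii-numeric"-style comparison (illustrative)."""
    digits = bytearray()
    for octet in s:
        if 0x30 <= octet <= 0x39:   # US-ASCII digits '0'..'9'
            digits.append(octet)
        else:
            break                   # truncate at the first non-digit
    if not digits:
        # Strings that do not start with a digit represent positive infinity.
        return (1, 0)
    return (0, int(digits.decode("ascii")))


def ascii_casemap_key(s: bytes) -> bytes:
    """Sort key for an "i;ascii-casemap"-style comparison (illustrative).

    bytes.upper() only maps the US-ASCII letters a-z, so octets outside
    ASCII are left untouched.
    """
    return s.upper()


def octet_key(s: bytes) -> bytes:
    """Sort key for an "i;octet"-style comparison: the octets themselves."""
    return s


def compare(key, a: bytes, b: bytes) -> int:
    """Return -1, 0 or 1 for less, equal or greater under the given key."""
    ka, kb = key(a), key(b)
    return (ka > kb) - (ka < kb)
```

For example, `compare(ascii_numeric_key, b"4294967298", b"04294967298b")` treats both inputs as the same number, while `compare(octet_key, b"a", b"A")` distinguishes the two letters because their octet values differ.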
The "i;unicode-casemap" collation is a simple collation which is case-insensitive in its treatment of characters. It provides equality, substring, and ordering operations. The validity test operation returns "valid" for any input.
This collation allows strings in arbitrary (and mixed) character sets, as long as the character set for each string is identified and it is possible to convert the string to Unicode. Strings which have an unidentified character set and/or cannot be converted to Unicode are not rejected, but are treated as binary.
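The fallback rule described above (convert to Unicode when the character set is known, otherwise treat the string as binary) can be sketched as follows. This is only an approximation under stated assumptions: the function name `unicode_casemap_key` is hypothetical, and Python's `str.casefold()` is used as a stand-in for the collation's own case-mapping rules, which are not reproduced here.

```python
from typing import Optional


def unicode_casemap_key(data: bytes, charset: Optional[str]) -> bytes:
    """Illustrative comparison key for a case-insensitive Unicode collation.

    Strings with an unidentified charset, or that cannot be converted to
    Unicode, are not rejected; their octets are returned unchanged.
    """
    if charset is None:
        return data                       # unidentified charset: binary
    try:
        text = data.decode(charset)
    except (LookupError, UnicodeDecodeError):
        return data                       # unknown or failing charset: binary
    # Assumption: casefold() stands in for the collation's case mapping.
    return text.casefold().encode("utf-8")
```

Two strings can then be compared by comparing their keys octet by octet, so inputs supplied in different character sets (for instance Latin-1 and UTF-8) become comparable once both are converted.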