Internet Assigned Numbers Authority

Collation Registry for Internet Application Protocols

Last Updated
Registration Procedure(s)
Expert Review
[RFC4790] creates an abstraction framework so that application protocols
can precisely identify a comparison function and the repertoire of comparison
functions can be extended in the future. This document defines an IANA-maintained 
registry of collations for comparing, searching and sorting international strings.
Collation: i;ascii-numeric
Description: The "i;ascii-numeric" collation is a simple collation intended for use with arbitrarily sized unsigned decimal integer numbers stored as octet strings. US-ASCII digits (0x30 to 0x39) represent digits of the numbers. Before converting from string to integer, the input string is truncated at the first non-digit character. All input is valid; strings that do not start with a digit represent positive infinity.
Reference: [RFC4790]

Collation: i;ascii-casemap
Description: The "i;ascii-casemap" collation is a simple collation that operates on octet strings and treats US-ASCII letters case-insensitively. It provides equality, substring and ordering operations. All input is valid. Note that letters outside ASCII are not treated case-insensitively.
Reference: [RFC4790]

Collation: i;octet
Description: The "i;octet" collation is a simple and fast collation intended for use on binary octet strings rather than on character data. Protocols that want to make this collation available have to do so by explicitly allowing it. If not explicitly allowed, it MUST NOT be used. It never returns an "undefined" result. It provides equality, substring and ordering operations.
Reference: [RFC4790]
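
The three collations above can be illustrated with a minimal Python sketch. It is not a reference implementation. The function names (octet_key, ascii_casemap_key, ascii_numeric_key, order) are hypothetical, and two choices are assumptions of this sketch rather than statements from the registry text: case-insensitivity is obtained by upper-casing ASCII letters before an octet comparison, and two strings that both represent positive infinity are treated as equal.

```python
import math


def octet_key(s: bytes) -> bytes:
    """i;octet: octets are compared as they are, with no mapping."""
    return s


def ascii_casemap_key(s: bytes) -> bytes:
    """i;ascii-casemap: fold US-ASCII letters only.

    bytes.upper() changes only the ASCII letters a-z, so octets
    outside ASCII stay case-sensitive, as the description requires.
    """
    return s.upper()


def ascii_numeric_key(s: bytes):
    """i;ascii-numeric: truncate at the first non-digit octet.

    Returns a (length, digits) pair so that arbitrarily long numbers
    can be compared without converting them to an integer.
    """
    n = 0
    while n < len(s) and 0x30 <= s[n] <= 0x39:
        n += 1
    if n == 0:
        # No leading digit: the string represents positive infinity.
        return (math.inf, b"")
    digits = s[:n].lstrip(b"0")  # leading zeros do not change the value
    return (len(digits), digits)


def order(key, a: bytes, b: bytes) -> str:
    """Ordering operation: returns "less", "equal" or "greater"."""
    ka, kb = key(a), key(b)
    if ka < kb:
        return "less"
    if ka > kb:
        return "greater"
    return "equal"


def contains(key, haystack: bytes, needle: bytes) -> bool:
    """Substring operation for the two octet-based collations."""
    return key(needle) in key(haystack)
```

Under these assumptions, order(ascii_numeric_key, b"9", b"10") gives "less", while order(octet_key, b"9", b"10") gives "greater", because i;octet compares the octets 0x39 and 0x31 rather than the numbers they spell.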

Collation: i;unicode-casemap
Description: The "i;unicode-casemap" collation is a simple collation that is case-insensitive in its treatment of characters. It provides equality, substring, and ordering operations. The validity test operation returns "valid" for any input. This collation allows strings in arbitrary (and mixed) character sets, as long as the character set for each string is identified and it is possible to convert the string to Unicode. Strings that have an unidentified character set and/or cannot be converted to Unicode are not rejected, but are treated as binary.
Reference: [RFC5051]
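
The fallback rule described above (convert to Unicode when the character set is identified and conversion succeeds, otherwise treat the string as binary) can be sketched in Python as follows. This is an illustration only. The function name unicode_casemap_key is hypothetical, and the case-insensitive step uses Python's str.casefold() followed by NFKD normalization as a stand-in; it is an approximation and is not claimed to match the mapping defined by the registered collation.

```python
import unicodedata
from typing import Optional


def unicode_casemap_key(data: bytes, charset: Optional[str]) -> bytes:
    """Return a comparison key for a string in an arbitrary charset.

    Keys are octet strings, so strings from different character sets
    can be compared with each other once they have been converted.
    """
    if charset is None:
        # Unidentified character set: not rejected, treated as binary.
        return data
    try:
        text = data.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown charset name or undecodable octets: treated as binary.
        return data
    # Approximation (assumption of this sketch): casefold plus NFKD
    # stands in for the collation's own case-insensitive mapping.
    folded = unicodedata.normalize("NFKD", text.casefold())
    return folded.encode("utf-8")
```

With this sketch, the Latin-1 string "Éclair" and the UTF-8 string "éclair" produce the same key, whereas octets that are not valid in their declared character set are returned unchanged and compare as binary.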