Namespace Registration for Self-Addressing Identifiers (SAID)

Namespace Identifier: said

Version: 1

Date: 2026-03-16

Registrant: Sam Smith, Wenjing Chu, and Carly Huitema on behalf of ToIP (Trust over IP), a project of the Linux Foundation Decentralized Trust
https://trustoverip.org/
Contact email: (Wenjing Chu) trustoverip&lfdecentralizedtrust.org

Purpose:

Self-Addressing Identifiers (SAIDs) are self-referential, content-addressable identifiers based on a cryptographic digest encoded in Composable Event Streaming Representation (CESR) [1]. This proposal defines a method to unambiguously identify digital assets that contain their SAIDs via their SAID identifiers in the urn:said namespace. Such urn:said identifiers can ease the adoption of SAIDs anywhere URNs are accepted and can also improve interoperability.

These URNs are typically non-resolvable, serving as unique identifiers. However, optional mechanisms could be introduced in the future to address such requirements. Another future improvement may introduce an optional SAID location/format indicator to facilitate structure-free verification.

Syntax:

A `said` URN shall consist of two mandatory components in the following order, separated by `:` characters.

- URN identifier (`urn:said`): REQUIRED
- SAID, in string representation as per Composable Event Streaming Representation (CESR) [1], Section 11.6, and represented with the Base64URLSafe alphabet of RFC 4648 [3] (with the exception of the `=` pad character, which is not used in CESR): REQUIRED

The ABNF of a `said` URN is as follows:

```ABNF
said-urn = "urn:said:" said
said = cesr-code cesr-digest-value
cesr-code =
cesr-digest-value =
; Base64URLSafe characters (RFC 4648, excluding padding)
base64urlsafe = ALPHA / DIGIT / "-" / "_"
; The complete SAID primitive MUST conform to CESR code table [2], CESR
; spec Section 11.4.2. The following currently defined digest codes, for
; example, produce SAIDs of 44 or 88 characters total.
;
; 256-bit SAIDs: 44 characters total (1 char code + 43 Base64URLSafe)
; one-char-code = "E" / "F" / "G" / "H" / "I"
; said-256 = one-char-code 43base64urlsafe
; 512-bit SAIDs: 88 characters total (2 char code + 86 Base64URLSafe)
; two-char-code = "0D" / "0E" / "0F" / "0G"
; said-512 = two-char-code 86base64urlsafe
```

Here are two examples:

- `urn:said:E8wYuBjhslETYaLZcxMkWrhVbMcA8RS1pKYl7nJ77ntA` (44 characters, Blake3-256)
- `urn:said:0FCNcm3MGi3efpdqsmmzGU2tnEPpAndgeCQErutCuu82VfaZqc1BbxL0a2-fOrGilCK2XuHcMqtILo2nc7M2mUuw` (88 characters, SHA3-512)

The first example is 44 characters long in text representation, where `E` indicates a Blake3-256 digest and the remaining 43 characters encode the 256-bit digest. The second example has a two-character code `0F`, followed by 86 characters encoding a 512-bit SHA3-512 digest. Additional digest functions are listed in the CESR Spec code table, Section 11.4.2 [2].

Assignment:

The SAID strings conforming to this scheme are algorithmically generated, based on the derivation of the SAID as per CESR Spec [1].

CESR (Composable Event Streaming Representation) is a dual text-binary encoding format providing lossless round-trip conversion between text (Base64URLSafe) and binary domains. CESR primitives are self-framing: each primitive includes a prepended type code identifying both the cryptographic algorithm and value length, enabling stream parsing without external delimiters.

A Self-Addressing Identifier (SAID) is derived by: (1) designating a location within the data for the SAID, (2) inserting a placeholder of the appropriate length at that location, (3) computing a cryptographic digest over the entire byte sequence, and (4) replacing the placeholder with the CESR-encoded digest. The placeholder length equals the final SAID length: 33 bytes in the binary domain or 44 characters in the text domain for a 256-bit digest. Verification reverses this process.
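The four derivation steps and their reversal can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the CESR reference implementation: the function names (`encode_digest`, `saidify`, `verify`) are hypothetical, only 256-bit digests with one-character codes in the text domain are handled, and Blake2b-256 (code `F`) is used because it ships with Python's standard library, whereas Blake3 would need a third-party package.

```python
import base64
import hashlib

SAID_LEN = 44      # text-domain length of a 256-bit SAID
BLAKE2B_256 = "F"  # one-character code for Blake2b-256 in the CESR code table [2]


def encode_digest(raw: bytes, code: str = BLAKE2B_256) -> str:
    """Encode a 32-byte digest as a 44-character CESR text primitive."""
    if len(raw) != 32 or len(code) != 1:
        raise ValueError("sketch handles 32-byte digests and one-character codes only")
    # One zero lead byte pads 32 bytes to 33 (a multiple of 3), which yields
    # 44 Base64 characters; the leading 'A' it produces is replaced by the code.
    b64 = base64.urlsafe_b64encode(b"\x00" + raw).decode("ascii")
    return code + b64[1:]


def saidify(data: bytes, offset: int) -> tuple[bytes, str]:
    """Steps (2)-(4): dummy the slot at `offset`, digest, embed the SAID."""
    head, tail = data[:offset], data[offset + SAID_LEN:]
    dummied = head + b"#" * SAID_LEN + tail
    said = encode_digest(hashlib.blake2b(dummied, digest_size=32).digest())
    return head + said.encode("ascii") + tail, said


def verify(data: bytes, offset: int) -> bool:
    """Re-dummy the slot, recompute the SAID, and compare with the embedded one."""
    embedded = data[offset:offset + SAID_LEN]
    _, recomputed = saidify(data, offset)
    return embedded == recomputed.encode("ascii")


# Step (1): the SAID slot is designated as the 44 characters starting at offset 16.
record = b"field_0_01234567" + b"#" * SAID_LEN + b"field_2_98765432"
saidified, said = saidify(record, offset=16)
print("urn:said:" + said)
print(verify(saidified, 16))
```

Because the digest here is Blake2b-256, the printed SAID starts with `F` and differs from any Blake3-256 value for the same input. Passing the 32-byte digest from the binary data example (4) to `encode_digest` yields the text-domain string shown in that example.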
The CESR type code (e.g., `E` for Blake3-256) makes the digest algorithm self-describing, providing cryptographic agility.

The digital assets identified by an `urn:said` identifier must themselves contain the SAID (i.e., they are self-referential). The SAID derivation procedure defined by CESR Spec [1] (Section 11.6) requires a consistent serialization scheme for the digital assets in order to ensure correct representation and verification. In other words, a SAID identifies the serialized digital asset in which it is also contained.

The serialization scheme used for SAID derivation must be known to verifiers. This is typically established by the application context or protocol in which the SAID appears. The serialization scheme information is not encoded in the SAID. Note that applications can choose their own serialization schemes, but for reproducibility and verification the chosen scheme must preserve the size and order of data fields in the structure. For interoperability over a network, standardized serialization methods such as JSON, CBOR, MessagePack, and CESR can be used. Note that this requirement may be loosened in the future by introducing an optional SAID location/format indicator within the string to facilitate verification without prior knowledge of the serialization scheme.

To illustrate, here are some examples of how the derivation works (from CESR [1], Section 11.6):

(1) CESR format example

Suppose the initial value of the fixed field serialization is the 76-character string as follows:

    field_0_01234567field_1_ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789field_2_98765432

where:

- field0 is the 16-character string "field_0_01234567"
- field1 is the 44-character placeholder string "field_1_ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
- field2 is the 16-character string "field_2_98765432"

The first step in generating the SAID for this serialization is to replace the placeholder contents of field1 with a dummy string of '#' characters of length 44.
This produces a dummied 76-character string as follows:

    field_0_01234567############################################field_2_98765432

Let's say we choose to use Blake3-256. The digest is then computed on the above string and encoded in CESR format. Thus the SAID of the 76-character string is as follows:

    ENI2bDYghiu1KYYkFrPofH8tJ5tNiNt8WrTIc4s_5IIH

where the first letter 'E' signifies the use of the Blake3-256 digest as specified in the code table [2] (Section 11.4.2 of the CESR specification, where all codes for supported digest functions are defined) and the rest of the string is the Blake3-256 digest of the string "field_0_01234567############################################field_2_98765432".

Replacing the 44 dummy characters with the SAID of the same length produces the final "SAID-ified" string as follows:

    field_0_01234567ENI2bDYghiu1KYYkFrPofH8tJ5tNiNt8WrTIc4s_5IIHfield_2_98765432

To verify the embedded SAID with respect to its encompassing serialization above, just reverse the generation steps. In other words, replace the SAID in the string with dummy characters of the same length, compute the Blake3 digest as the SAID of this dummied version, and then compare the SAIDs.
The resulting URN is: `urn:said:ENI2bDYghiu1KYYkFrPofH8tJ5tNiNt8WrTIc4s_5IIH`

(2) JSON serialized data example

Suppose the initial value of a Python dict data structure is as follows:

    {
      "said": "",
      "first": "Sue",
      "last": "Smith",
      "role": "Founder"
    }

If we choose the 44-character CESR Blake3-256 digest for SAID derivation and use JSON serialization, the first step of the derivation procedure is to insert `#` characters in place of the future SAID string:

    {
      "said": "############################################",
      "first": "Sue",
      "last": "Smith",
      "role": "Founder"
    }

For consistent JSON serialization, we remove all extra white space:

    {"said":"############################################","first":"Sue","last":"Smith","role":"Founder"}

Applying the Blake3-256 digest algorithm to this representation, we obtain the CESR-encoded SAID string (in text format): `EJymtAC4piy_HkHWRs4JSRv0sb53MZJr8BQ4SMixXIVJ`

Now, replacing the `#` placeholder with the SAID string, the data asset with the SAID becomes:

    {"said":"EJymtAC4piy_HkHWRs4JSRv0sb53MZJr8BQ4SMixXIVJ","first":"Sue","last":"Smith","role":"Founder"}

The URN for the above data representation is: `urn:said:EJymtAC4piy_HkHWRs4JSRv0sb53MZJr8BQ4SMixXIVJ`

Now, the Python data structure may be updated to:

    {
      "said": "EJymtAC4piy_HkHWRs4JSRv0sb53MZJr8BQ4SMixXIVJ",
      "first": "Sue",
      "last": "Smith",
      "role": "Founder"
    }

The verification of the SAID (and therefore of the `urn:said` identifier) reverses the generation process. Note that the data fields do not have to be text; they can be binary, but for verification to work they must be fixed in their size and consistent in their order.
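As a side illustration of the JSON procedure in example (2), the sketch below shows one way the whitespace-free, order-preserving serialization might be produced in Python. The helper names (`compact`, `saidify_json`, `verify_json`) are hypothetical, and Blake2b-256 (code `F`) is swapped in for Blake3-256 because only the former is in the standard library, so the SAID it prints is not the `E...` value of example (2).

```python
import base64
import hashlib
import json


def compact(obj: dict) -> bytes:
    """Serialize without extra whitespace, keeping the dict's field order."""
    return json.dumps(obj, separators=(",", ":")).encode("utf-8")


def saidify_json(obj: dict, label: str = "said") -> dict:
    """Return a copy of `obj` whose `label` field holds its Blake2b-256 SAID."""
    if label not in obj:
        raise KeyError(f"no field {label!r} designated for the SAID")
    out = dict(obj)          # copy; assigning an existing key keeps its position
    out[label] = "#" * 44    # placeholder of the final SAID length
    raw = hashlib.blake2b(compact(out), digest_size=32).digest()
    # CESR text encoding for a 32-byte digest: one zero lead byte, Base64URLSafe,
    # then the leading character is replaced by the digest code.
    out[label] = "F" + base64.urlsafe_b64encode(b"\x00" + raw).decode("ascii")[1:]
    return out


def verify_json(obj: dict, label: str = "said") -> bool:
    """Recompute the SAID over the dummied serialization and compare."""
    return saidify_json(obj, label)[label] == obj[label]


data = {"said": "", "first": "Sue", "last": "Smith", "role": "Founder"}
saidified = saidify_json(data)
print("urn:said:" + saidified["said"])
print(verify_json(saidified))
```

The sketch relies on Python dicts preserving insertion order and on `json.dumps` with `separators=(",", ":")` emitting no optional whitespace. A verifier working in another language would need to reproduce exactly the same bytes, which is why the serialization scheme has to be agreed on in advance.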
The resulting URN for example (2) is: `urn:said:EJymtAC4piy_HkHWRs4JSRv0sb53MZJr8BQ4SMixXIVJ`

(3) JSON Schema $id example

Applying the same procedure as the above example, we produce this self-referential JSON Schema:

    {
      "$id": "EGU_SHY-8ywNBJOqPKHr4sXV9tOtOwpYzYOM63_zUCDW",
      "$schema": "http://json-schema.org/draft-07/schema#",
      "type": "object",
      "properties": {
        "full_name": {
          "type": "string"
        }
      }
    }

The resulting URN is: `urn:said:EGU_SHY-8ywNBJOqPKHr4sXV9tOtOwpYzYOM63_zUCDW`

(4) Binary data example

Assume a fixed-format binary record with the following layout:

    Offset  Size  Field
    0       2     Record type (big-endian)
    2       33    SAID slot (256-bit, binary domain)
    35      4     Payload

Step 1: Initial data with placeholder

The application defines a record with type 0x0001 and payload 0xDEADBEEF. Insert 33 placeholder bytes (0x23 = #) at the SAID slot:

    Hex: 0001 23232323232323232323232323232323 23232323232323232323232323232323 23 DEADBEEF
    Total: 39 bytes

Step 2: Compute digest

Apply Blake2b-256 over the 39-byte sequence, yielding the 32-byte digest:

    116a3b1b50225060df5c2bf4154a2539b5ae9345ddc68f0c4fd59f5bd64757b2

Step 3: Encode as CESR

Text domain (44 characters):

    FBFqOxtQIlBg31wr9BVKJTm1rpNF3caPDE_Vn1vWR1ey

Note that we choose Blake2b-256 in this example, whose one-character code is `F` (as defined in the CESR code table [2]).

Binary domain (33 bytes):

    14116a3b1b50225060df5c2bf4154a2539b5ae9345ddc68f0c4fd59f5bd64757b2

Step 4: Replace placeholder with SAID

Final binary record (39 bytes):

    Hex: 0001 14116a3b1b50225060df5c2bf4154a25 39b5ae9345ddc68f0c4fd59f5bd64757 b2 DEADBEEF

The resulting URN is: `urn:said:FBFqOxtQIlBg31wr9BVKJTm1rpNF3caPDE_Vn1vWR1ey`

Note that the `said` identifier is a Base64URLSafe character string for `urn:said` identifiers. The SAID contained in the binary data structure is in binary.

Security and Privacy:

`urn:said` identifiers are designed to identify digital assets. Do not assume they are random or hard to guess.
In fact, if the digital content itself is known, they can be deterministically derived for a given digest algorithm. Such identifiers, therefore, should not be naively used for security capabilities (e.g., identifiers whose mere possession grants privileged access).

The SAID must be encoded with one of the digest algorithms provided in the CESR code tables, as defined in Section 11.4.2 of the CESR specification [2] (which may be viewed as the security considerations for SAID). The normative requirement there is that cryptographic primitives entered in the table must maintain 128 bits of cryptographic strength. This strength protects against attempts to alter the binding between a `urn:said` identifier and its self-referenced content. Additional digest algorithms may be added to the code table in the future, e.g., for approved NIST post-quantum resistant cryptographic operations.

Adding the `urn:said` identifier to the self-referenced digital asset does not modify its privacy considerations.

Interoperability:

A SAID string MUST be CESR-encoded, which self-identifies the digest algorithm used to generate it. This greatly enhances interoperability and future adaptability. In addition, the use of CESR encoding allows lossless transformation between binary and text domains. This property is useful for digital assets that are best represented in binary or in serialization schemes such as CBOR.

The serialization method used MUST be known out of band or by context to the verifiers. A future revision may extend the urn:said string to specify an optional SAID location and format indicator in order to facilitate verification and further improve interoperability.

Adopters MAY consider narrowing the selection of digest functions to reduce complexity and improve interoperability, at some cost in flexibility.

Resolution:

These URNs are non-resolvable and serve as globally unique identifiers.
A future version of this registration may define optional information that makes it easier to resolve SAID URNs.

Documentation:

- [1] Smith, S., & Feairheller, P. (2026). Composable Event Streaming Representation (CESR). Zenodo. https://doi.org/10.5281/zenodo.18879946 See also: https://trustoverip.github.io/kswg-cesr-specification
- [2] Section 11.4.2 of reference [1], "Master code table for genus/version -_AAACAA (KERI/ACDC protocol stack Version 2.00)". See also: https://trustoverip.github.io/kswg-cesr-specification/#keriacdc-protocol-genus-version-table
- [3] RFC 4648: The Base16, Base32, and Base64 Data Encodings, https://datatracker.ietf.org/doc/html/rfc4648

Additional Information: NONE

Revision Information: N/A