update links
adraffy committed Sep 14, 2024
1 parent c1a0df1 commit c62d51d
Showing 1 changed file with 35 additions and 31 deletions.
66 changes: 35 additions & 31 deletions docs/ensip/15.mdx
@@ -8,19 +8,19 @@ export const meta = {
ensip: {
status: 'draft',
created: '2023-04-03',
updated: '2023-09-18',
updated: '2024-09-14',
}
};

# ENSIP-15: ENS Name Normalization Standard

## Abstract

This ENSIP standardizes Ethereum Name Service (ENS) name normalization process outlined in [ENSIP-1 § Name Syntax](./ensip-1-ens.md#name-syntax).
This ENSIP standardizes Ethereum Name Service (ENS) name normalization process outlined in [ENSIP-1 § Name Syntax](./1#name-syntax).

## Motivation

* Since [ENSIP-1](./ensip-1-ens.md) (originally [EIP-137](https://eips.ethereum.org/EIPS/eip-137)) was finalized in 2016, Unicode has [evolved](https://unicode.org/history/publicationdates.html) from version 8.0.0 to 15.0.0 and incorporated many new characters, including complex emoji sequences.
* Since [ENSIP-1](./1) (originally [EIP-137](https://eips.ethereum.org/EIPS/eip-137)) was finalized in 2016, Unicode has [evolved](https://unicode.org/history/publicationdates.html) from version 8.0.0 to 15.0.0 and incorporated many new characters, including complex emoji sequences.
* ENSIP-1 does not state the version of Unicode.
* ENSIP-1 implies but does not state an explicit flavor of IDNA processing.
* [UTS-46](https://unicode.org/reports/tr46/) is insufficient to normalize emoji sequences. Correct emoji processing is only possible with [UTS-51](https://www.unicode.org/reports/tr51/).
@@ -34,10 +34,10 @@ This ENSIP standardizes Ethereum Name Service (ENS) name normalization process o

## Specification

* Unicode version `15.1.0`
* Unicode version `16.0.0`
* Normalization is a living specification and should use the latest stable version of Unicode.
* [`spec.json`](./ensip-15/spec.json) contains all [necessary data](#description-of-specjson) for normalization.
* [`nf.json`](./ensip-15/nf.json) contains all [necessary data](#description-of-nfjson) for [Unicode Normalization Forms](https://unicode.org/reports/tr15/) NFC and NFD.
* [`spec.json`](https://github.com/adraffy/ens-normalize.js/blob/main/derive/output/spec.json) contains all [necessary data](#description-of-specjson) for normalization.
* [`nf.json`](https://github.com/adraffy/ens-normalize.js/blob/main/derive/output/nf.json) contains all [necessary data](#description-of-nfjson) for [Unicode Normalization Forms](https://unicode.org/reports/tr15/) NFC and NFD.

### Definitions

@@ -67,18 +67,18 @@ This ENSIP standardizes Ethereum Name Service (ENS) name normalization process o
* All **Emoji Sequence** have explicit emoji-presentation.
* The convention of ignoring presentation is difficult to change because:
* Presentation characters (`FE0F` and `FE0E`) are **Ignored**
* [ENSIP-1](./ensip-1-ens.md) did not treat emoji differently from text
* [ENSIP-1](./1) did not treat emoji differently from text
* Registration hashes are immutable
* [Beautification](#annex-beautification) can be used to restore emoji-presentation in normalized names.

### Algorithm

* Normalization is the process of canonicalizing a name before for [hashing](./ensip-1-ens.md#namehash-algorithm).
* Normalization is the process of canonicalizing a name before for [hashing](./1#namehash-algorithm).
* It is idempotent: applying normalization multiple times produces the same result.
* For user convenience, leading and trailing whitespace should be trimmed before normalization, as all whitespace codepoints are disallowed. Inner characters should remain unmodified.
* No string transformations (like case-folding) should be applied.

1. [Split](#split) the name into [labels](./ensip-1-ens.md#name-syntax).
1. [Split](#split) the name into [labels](./1#name-syntax).
1. [Normalize](#normalize) each label.
1. [Join](#join) the labels together into a name again.
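
The split, normalize, join pipeline above can be sketched in Python. This is only an illustration under stated assumptions: the separator is taken to be `.` (U+002E), and `toy_normalize_label` is a hypothetical stand-in that handles ASCII letters and digits only. It does not implement the label normalization that ENSIP-15 defines via `spec.json`.

```python
from typing import Callable, List

STOP = "."  # assumption: labels are separated by U+002E FULL STOP


def split(name: str) -> List[str]:
    """Partition a name into labels."""
    return name.split(STOP)


def join(labels: List[str]) -> str:
    """Reassemble labels into a name."""
    return STOP.join(labels)


def toy_normalize_label(label: str) -> str:
    """Hypothetical placeholder, NOT ENSIP-15 label normalization.

    Maps ASCII A-Z to a-z, accepts a-z and 0-9, rejects everything else.
    """
    out = []
    for ch in label:
        if "A" <= ch <= "Z":
            out.append(ch.lower())
        elif "a" <= ch <= "z" or "0" <= ch <= "9":
            out.append(ch)
        else:
            raise ValueError(f"disallowed character: {ch!r}")
    return "".join(out)


def normalize(
    name: str,
    normalize_label: Callable[[str], str] = toy_normalize_label,
) -> str:
    """Split the name, normalize each label, then join the labels again."""
    return join([normalize_label(label) for label in split(name)])
```

A caller that accepts user input might trim it first, for example `normalize(user_input.strip())`, since whitespace inside the name is rejected rather than removed. Calling `normalize` on its own output returns the same string, which matches the idempotence property stated above.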

@@ -103,7 +103,7 @@ Examples:

### Tokenize

Convert a label into a list of `Text` and `Emoji` tokens, each with a payload of codepoints. The complete list of character types and [emoji sequences](./ensip-15/emoji.md#valid-emoji-sequences) can be found in [`spec.json`](#description-of-specjson).
Convert a label into a list of `Text` and `Emoji` tokens, each with a payload of codepoints. The complete list of character types and [emoji sequences](#appendix-additional-resources) can be found in [`spec.json`](#description-of-specjson).

1. Allocate an empty codepoint buffer.
1. Find the longest **Emoji Sequence** that matches the remaining input.
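
The longest-match step can be sketched as follows. The table below is a toy assumption with two entries; a real implementation would load its sequences from the `"emoji"` field of `spec.json`. The sketch also does not handle the presentation characters (`FE0F` and `FE0E`), and the function name is hypothetical.

```python
from typing import List, Optional, Tuple

# Toy table (assumption). Real sequences come from the "emoji" field of spec.json.
EMOJI_SEQUENCES = {
    (0x1F468,),                  # man (included only to show longest-match behavior)
    (0x1F468, 0x200D, 0x1F4BB),  # man technologist
}
MAX_LEN = max(len(seq) for seq in EMOJI_SEQUENCES)


def longest_emoji_match(cps: List[int], start: int) -> Optional[Tuple[int, ...]]:
    """Return the longest table entry that matches cps at `start`, or None."""
    longest = min(MAX_LEN, len(cps) - start)
    # Try the longest candidate first so a shorter prefix never wins.
    for length in range(longest, 0, -1):
        candidate = tuple(cps[start:start + length])
        if candidate in EMOJI_SEQUENCES:
            return candidate
    return None
```

Trying candidates from longest to shortest keeps `1F468 200D 1F4BB` from being consumed as `1F468` followed by leftover codepoints.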
@@ -258,7 +258,7 @@ A label composed of confusable characters isn't necessarily confusable.

## Description of `spec.json`

* **Groups** (`"groups"`) — [groups](./ensip-15/groups.md) of characters that can constitute a label
* **Groups** (`"groups"`) — [groups](#appendix-additional-resources) of characters that can constitute a label
* `"name"` — ASCII name of the group (or abbreviation if **Restricted**)
* Examples: *Latin*, *Japanese*, *Egyp*
* **Restricted** (`"restricted"`) — **`true`** if [Excluded](https://www.unicode.org/reports/tr31#Table_Candidate_Characters_for_Exclusion_from_Identifiers) or [Limited-Use](https://www.unicode.org/reports/tr31/#Table_Limited_Use_Scripts) script
@@ -272,7 +272,7 @@ A label composed of confusable characters isn't necessarily confusable.
* Example: `à̀̀` → `E0 300 300`
* Currently, every group that is **CM Whitelist** has zero compound sequences.
* **CM Whitelisted** is effectively **`true`** if `[]` otherwise **`false`**
* **Ignored** (`"ignored"`) — [characters](./ensip-15/ignored.csv) that are ignored during normalization
* **Ignored** (`"ignored"`) — [characters](#appendix-additional-resources) that are ignored during normalization
* Example: `34F (�) COMBINING GRAPHEME JOINER`
* **Mapped** (`"mapped"`) — characters that are mapped to a sequence of **valid** characters
* Example: `41 (A) LATIN CAPITAL LETTER A` → `[61 (a) LATIN SMALL LETTER A]`
@@ -282,15 +282,15 @@ A label composed of confusable characters isn't necessarily confusable.
* Example: `34 (4) DIGIT FOUR`
* **Confused** (`"confused"`) — subset of confusable characters that confuse
* Example: `13CE (Ꮞ) CHEROKEE LETTER SE`
* **Fenced** (`"fenced"`) — [characters](./ensip-15/fenced.csv) that cannot be first, last, or contiguous
* **Fenced** (`"fenced"`) — [characters](#appendix-additional-resources) that cannot be first, last, or contiguous
* Example: `2044 (⁄) FRACTION SLASH`
* **Emoji Sequence(s)** (`"emoji"`) — valid [emoji sequences](./ensip-15/emoji.md#valid-emoji-sequences)
* **Emoji Sequence(s)** (`"emoji"`) — valid [emoji sequences](#appendix-additional-resources)
* Example: `👨‍💻 [1F468 200D 1F4BB] man technologist`
* **Combining Marks / CM** (`"cm"`) — [characters](./ensip-15/cm.csv) that are [Combining Marks](https://unicode.org/faq/char_combmark.html)
* **Non-spacing Marks / NSM** (`"nsm"`) — valid [subset](./ensip-15/nsm.csv) of **CM** with general category (`"Mn"` or `"Me"`)
* **Combining Marks / CM** (`"cm"`) — [characters](#appendix-additional-resources) that are [Combining Marks](https://unicode.org/faq/char_combmark.html)
* **Non-spacing Marks / NSM** (`"nsm"`) — valid [subset](#appendix-additional-resources) of **CM** with general category (`"Mn"` or `"Me"`)
* **Maximum NSM** (`"nsm_max"`) — maximum sequence length of unique **NSM**
* **Should Escape** (`"escape"`) — [characters](./ensip-15/escape.csv) that shouldn't be printed
* **NFC Check** (`"nfc_check"`) — valid [subset](./ensip-15/nfc_check.csv) of characters that [may require NFC](https://unicode.org/reports/tr15/#NFC_QC_Optimization)
* **Should Escape** (`"escape"`) — [characters](#appendix-additional-resources) that shouldn't be printed
* **NFC Check** (`"nfc_check"`) — valid [subset](#appendix-additional-resources) of characters that [may require NFC](https://unicode.org/reports/tr15/#NFC_QC_Optimization)
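
As an illustration of the **Fenced** rule listed above (cannot be first, last, or contiguous), here is a minimal sketch. The set below is a toy assumption holding only the example character `2044`; the full set would come from the `"fenced"` field of `spec.json`. The function name and error messages are hypothetical.

```python
from typing import List

# Toy set (assumption): only 2044 FRACTION SLASH, the example given above.
FENCED = {0x2044}


def check_fenced(cps: List[int]) -> None:
    """Raise ValueError if a fenced character is first, last, or adjacent to another."""
    if not cps:
        return
    if cps[0] in FENCED:
        raise ValueError("leading fenced character")
    if cps[-1] in FENCED:
        raise ValueError("trailing fenced character")
    # Compare each codepoint with its successor to detect contiguous fenced characters.
    for a, b in zip(cps, cps[1:]):
        if a in FENCED and b in FENCED:
            raise ValueError("adjacent fenced characters")
```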

## Description of `nf.json`

@@ -343,7 +343,7 @@ A label composed of confusable characters isn't necessarily confusable.
* `3002 (。) IDEOGRAPHIC FULL STOP`
* `FF0E (.) FULLWIDTH FULL STOP`
* `FF61 (。) HALFWIDTH IDEOGRAPHIC FULL STOP`
* [Many characters](./ensip-15/disallowed.csv) are **disallowed** for various reasons:
* [Many characters](#appendix-additional-resources) are **disallowed** for various reasons:
* Nearly all punctuation are **disallowed**.
* Example: `589 (։) ARMENIAN FULL STOP`
* All parentheses and brackets are **disallowed**.
@@ -379,7 +379,7 @@ A label composed of confusable characters isn't necessarily confusable.
* `2E3A (⸺) TWO-EM DASH` → `"--"`
* `2E3B (⸻) THREE-EM DASH` → `"---"`
* Characters are assigned to **Groups** according to [Unicode Script_Extensions](https://www.unicode.org/reports/tr24/#Script_Extensions_Def).
* **Groups** may contain [multiple scripts](./ensip-15/groups.md):
* **Groups** may contain [multiple scripts](#appendix-additional-resources):
* Only *Latin*, *Greek*, *Cyrillic*, *Han*, *Japanese*, and *Korean* have access to *Common* characters.
* *Latin*, *Greek*, *Cyrillic*, *Han*, *Japanese*, *Korean*, and *Bopomofo* only permit specific **Combining Mark** sequences.
* *Han*, *Japanese*, and *Korean* have access to `a-z`.
@@ -390,9 +390,9 @@ A label composed of confusable characters isn't necessarily confusable.
* Ethereum symbol (`39E (Ξ) GREEK CAPITAL LETTER XI`) is case-folded and *Common*.
* Emoji:
* All emoji are [fully-qualified](https://www.unicode.org/reports/tr51/#def_fully_qualified_emoji).
* Digits (`0-9`) are [not emoji](./ensip-15/emoji.md#demoted-unchanged).
* Emoji [mapped to non-emoji by IDNA](./ensip-15/emoji.md#demoted-mapped) cannot be used as emoji.
* Emoji [disallowed by IDNA](./ensip-15/emoji.md#disabled-emoji-characters) with default text-presentation are **disabled**:
* Digits (`0-9`) are [not emoji](#appendix-additional-resources).
* Emoji [mapped to non-emoji by IDNA](#appendix-additional-resources) cannot be used as emoji.
* Emoji [disallowed by IDNA](#appendix-additional-resources) with default text-presentation are **disabled**:
* `203C (‼️) double exclamation mark`
* `2049 (⁉️) exclamation question mark `
* Remaining emoji characters are marked as **disallowed** (for text processing).
Expand All @@ -418,7 +418,7 @@ A label composed of confusable characters isn't necessarily confusable.

* 99% of names are still valid.
* Preserves as much [Unicode IDNA](https://unicode.org/reports/tr46/) and [WHATWG URL](https://url.spec.whatwg.org/#idna) compatibility as possible.
* Only [valid emoji sequences](./ensip-15/emoji.md#valid-emoji-sequences) are permitted.
* Only [valid emoji sequences](#appendix-additional-resources) are permitted.

## Security Considerations

@@ -454,7 +454,7 @@ Copyright and related rights waived via [CC0](https://creativecommons.org/public
## Appendix: Reference Specifications

* [EIP-137: Ethereum Domain Name Service](https://eips.ethereum.org/EIPS/eip-137)
* [ENSIP-1: ENS](./ensip-1-ens.md)
* [ENSIP-1: ENS](./1)
* [UAX-15: Normalization Forms](https://unicode.org/reports/tr15/)
* [UAX-24: Script Property](https://www.unicode.org/reports/tr24/)
* [UAX-29: Text Segmentation](https://unicode.org/reports/tr29/)
@@ -471,15 +471,19 @@ Copyright and related rights waived via [CC0](https://creativecommons.org/public

## Appendix: Additional Resources

* [Supported Groups](./ensip-15/groups.md)
* [Supported Emoji](./ensip-15/emoji.md)
* [Additional Disallowed Characters](./ensip-15/disallowed.csv)
* [**Ignored** Characters](./ensip-15/ignored.csv)
* [**Should Escape** Characters ](./ensip-15/ignored.csv)
* [Supported Groups](https://github.com/adraffy/ens-normalize.js/blob/main/tools/ensip/groups.md)
* [Supported Emoji](https://github.com/adraffy/ens-normalize.js/blob/main/tools/ensip/emoji.md)
* [Additional Disallowed Characters](https://github.com/adraffy/ens-normalize.js/blob/main/tools/ensip/disallowed.csv)
* [Ignored Characters](https://github.com/adraffy/ens-normalize.js/blob/main/tools/ensip/ignored.csv)
* [Should Escape Characters ](https://github.com/adraffy/ens-normalize.js/blob/main/tools/ensip/escape.csv)
* [Combining Marks](https://github.com/adraffy/ens-normalize.js/blob/main/tools/ensip/cm.csv)
* [Non-spacing Marks](https://github.com/adraffy/ens-normalize.js/blob/main/tools/ensip/nsm.csv)
* [Fenced Characters](https://github.com/adraffy/ens-normalize.js/blob/main/tools/ensip/fenced.csv)
* [NFC Quick Check](https://github.com/adraffy/ens-normalize.js/blob/main/tools/ensip/nfc_check.csv)

## Appendix: Validation Tests

A list of [validation tests](./ensip-15/tests.json) are provided with the following interpretation:
A list of [validation tests](https://github.com/adraffy/ens-normalize.js/blob/main/validate/tests.json) are provided with the following interpretation:

* Already Normalized: `{name: "a"}` → `normalize("a")` is `"a"`
* Need Normalization: `{name: "A", norm: "a"}` → `normalize("A")` is `"a"`
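
The two interpretations shown above can be checked with a small helper. This sketch covers only those two entry shapes; any other shapes in `tests.json` are not handled here. `check` and `expected_norm` are hypothetical names, and the `normalize` argument is whatever implementation is under test.

```python
from typing import Callable, Dict


def expected_norm(test: Dict[str, str]) -> str:
    """An entry without "norm" is already normalized, so the name is the expected output."""
    return test.get("norm", test["name"])


def check(test: Dict[str, str], normalize: Callable[[str], str]) -> bool:
    """Return True if normalize(name) equals the expected normalized form."""
    return normalize(test["name"]) == expected_norm(test)
```

For example, with `str.lower` as a stand-in normalizer, `check({"name": "A", "norm": "a"}, str.lower)` passes, while `check({"name": "A"}, str.lower)` fails because the entry claims `"A"` is already normalized.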
