Page MenuHomePhabricator

[lib] Define a function that detects when an ENS name is typed
ClosedPublic

Authored by rohan on Nov 14 2023, 9:08 AM.
Tags
None
Referenced Files
Unknown Object (File)
Sun, Jan 5, 11:31 AM
Unknown Object (File)
Thu, Jan 2, 11:59 PM
Unknown Object (File)
Tue, Dec 31, 3:02 PM
Unknown Object (File)
Tue, Dec 31, 3:02 PM
Unknown Object (File)
Tue, Dec 31, 3:02 PM
Unknown Object (File)
Tue, Dec 31, 3:02 PM
Unknown Object (File)
Tue, Dec 31, 3:02 PM
Unknown Object (File)
Tue, Dec 31, 3:02 PM
Subscribers

Details

Summary

We perform two checks here to ensure the validity of an ENS name that is typed in the search experiences.

The first one is a general check to make sure the name adheres to the requirements listed in the Ethereum spec. Calling toAscii will throw an error if there is an invalid character, and at that point we can decide it's an invalid ENS name and not do anything special.

The second check is more specific and is a requirement we're enforcing to limit the number of name-to-address lookups that need to be done. We want to ensure that the TLD is .eth and the SLD has at least three characters, as that seems to be a general standard enforced by the ENS team for now. To do this, a regex pattern is easy enough to check on the input text. If there is a match, this means we have a valid ENS name typed out in the search, and only at this point do we want to do a forward lookup.

Addresses ENG-5411 and ENG-5413

Test Plan

Wrote unit tests and all pass. Tried the valid/invalid names in the ENS domain app. Also played around with the regex in this playground to make sure it matched only valid ens names.

Diff Detail

Repository
rCOMM Comm
Lint
No Lint Coverage
Unit
No Test Coverage

Event Timeline

rohan held this revision as a draft.
rohan published this revision for review.Nov 15 2023, 7:32 AM
rohan edited the summary of this revision. (Show Details)
ashoat requested changes to this revision.Nov 15 2023, 8:04 AM

Have you checked the spec?

For Javascript implementations, a library is available that normalises and checks names.

Should we consider just using that? It doesn't have a ton of downloads, but maybe we can find another one that's better.

eth-ens-namehash (a package we already use) lists a dependency on idna-uts46-hx, which appears to be a better-maintained fork with many more weekly downloads.

This revision now requires changes to proceed.Nov 15 2023, 8:04 AM

Have you checked the spec?

For Javascript implementations, a library is available that normalises and checks names.

Should we consider just using that? It doesn't have a ton of downloads, but maybe we can find another one that's better.

eth-ens-namehash (a package we already use) lists a dependency on idna-uts46-hx, which appears to be a better-maintained fork with many more weekly downloads.

Sure! A maintained package could be better here. I'll look into it and I'll update this diff

Should we consider just using that? It doesn't have a ton of downloads, but maybe we can find another one that's better.

Ok, I've done some research and here is what I found:

idna-uts46-hx
The API exposes two methods to use: toAscii and toUnicode. Both of these use an unexposed method validateLabel to process the labels:

validate.png (1×1 px, 316 KB)

According to the spec, it says we should provide transitional=false and useSTD3AsciiRules=true as options. So what we can do is in a try/catch, call toAscii with the input text, and class it valid if an error is not thrown. Two problems I've noticed with this though are:

  • that it doesn't perform a length verification on the labels. So something like a.eth does not throw an error.
  • it doesn't care about what it ends with. So a.e2 also doesn't throw an error

Wonder if we should do a combined approach that tailors this towards what we're specifically trying to do (executing an address lookup when a name ending in .eth is typed)? I.e. use the toAscii first to handle the heavy lifting on checking the specific rules that adhere to the spec, and if that passes, use regex/vanilla JS checks to make sure the name ends in .eth and has at least three characters?

Both of these use an unexposed method validateLabel to process the labels:

What do you mean by "unexposed"? Are we able to access it?

So something like a.eth does not throw an error.

Huh, is a.eth an invalid ENS name? Didn't realize!

Wonder if we should do a combined approach

That sounds reasonable... you could do a first pass using your own (very general) logic, and then a second pass to verify the name is actually valid using idna-uts46-hx. Mostly want to make sure that we are correctly matching ENS names according to the spec, so I figure using something like idna-uts46-hx makes sense.

Both of these use an unexposed method validateLabel to process the labels:

What do you mean by "unexposed"? Are we able to access it?

The package doesn't export it and the API docs don't mention it, so I'm thinking we cannot access it. It seems more trouble than its worth to try to, and we should probably stick with toAscii.

So something like a.eth does not throw an error.

Huh, is a.eth an invalid ENS name? Didn't realize!

I thought it was since it's < 3 characters. This is what https://app.ens.domains/ shows me (as well as when I try to type anything less than 3 characters before .eth):

Screenshot 2023-11-16 at 4.00.32 PM.png (738×1 px, 187 KB)

Wonder if we should do a combined approach

That sounds reasonable... you could do a first pass using your own (very general) logic, and then a second pass to verify the name is actually valid using idna-uts46-hx. Mostly want to make sure that we are correctly matching ENS names according to the spec, so I figure using something like idna-uts46-hx makes sense.

Ok that makes sense.

The spec does not mention a requirement for a name to end in .eth, and the ENS docs say 'not all ENS names end with .eth', so maybe we should skip enforcing that check before executing a lookup?

Also similarly, the spec doesn't mention any length requirements for the part before the .eth part, but https://app.ens.domains/ calls anything less than three characters 'too short'... so maybe the only additional validation we should do is check that the length of the part before .eth is >= 3 chars?

As it is, if we only use the toAscii from idna-uts46-hx, I'm just worried about the number of name to address lookup calls being made since even regular strings like hello or a obviously won't have an error thrown. So any extra validation we can do might be useful to limiting how often we have to perform a lookup

The package doesn't export it and the API docs don't mention it, so I'm thinking we cannot access it.

If we can access it by importing a specific file from the package, I wouldn't be opposed to it. We do that for other packages, and this package seems pretty stable. That said, not sure how much value this function would give us.

I thought it was since it's < 3 characters.

Got it, thanks for explaining. A quick Google confirms what you're saying.

The spec does not mention a requirement for a name to end in .eth, and the ENS docs say 'not all ENS names end with .eth', so maybe we should skip enforcing that check before executing a lookup?

I think we can enforce the .eth suffix. It's hypothetically possible for ENS names not to end with .eth, but I think the ENS team hasn't implemented that yet.

Also similarly, the spec doesn't mention any length requirements for the part before the .eth part, but https://app.ens.domains/ calls anything less than three characters 'too short'... so maybe the only additional validation we should do is check that the length of the part before .eth is >= 3 chars?

Not sure about this... maybe the spec allows for it, but the team doesn't allow it?

As it is, if we only use the toAscii from idna-uts46-hx, I'm just worried about the number of name to address lookup calls being made since even regular strings like hello or a obviously won't have an error thrown. So any extra validation we can do might be useful to limiting how often we have to perform a lookup

Makes sense. I don't think we should query on any random string. TLD of .eth with an SLD that has a min three chars makes sense to me

Use a combined approach (Ethereum spec guidelines and our specific guidelines) to determine a valid ENS name

lib/package.json
45 ↗(On Diff #33371)

Used this version since it's the same one that's in the yarn.lock currently from being a listed dependency. I tried v5.1.2 but I got an error when cleaninstalling that it required a higher version of node

idna-uts46-hx@5.1.2: The engine "node" is incompatible with this module. Expected version ">=20.6.1". Got "18.17.1"

So I just chose the version that is currently in our yarn.lock from being a dependency of eth-ens-namehash

eth-ens-namehash@^2.0.8:
  version "2.0.8"
  resolved "https://registry.yarnpkg.com/eth-ens-namehash/-/eth-ens-namehash-2.0.8.tgz#229ac46eca86d52e0c991e7cb2aef83ff0f68bcf"
  integrity sha512-VWEI1+KJfz4Km//dadyvBBoBeSQ0MHTXPvr8UIXiLW6IanxvAV+DmlZAijZwAyggqGUfwQBeHf7tc9wzc1piSw==
  dependencies:
    idna-uts46-hx "^2.3.1"
    js-sha3 "^0.5.7"
rohan retitled this revision from [lib] Define valid RegEx for detecting an ENS name is typed to [lib] Define a function that detects when an ENS name is typed.Nov 17 2023, 6:33 AM
rohan edited the summary of this revision. (Show Details)
ashoat requested changes to this revision.Nov 17 2023, 12:27 PM

Worried that the RegExp is invalidating some legitimate ENS names. Please try to read the spec and figure out what counts as a legitimate ENS name. Would be good to introduce some test cases to cover those

lib/shared/markdown.js
350 ↗(On Diff #33371)

I'm worried that this RegEx is too specific, and will invalidate some legitimate ENS names. In particular, I'm worried about the character set here. Doesn't the ENS spec allow for eg. Unicode chars? I think there's some process for normalizing those Unicode names, but that doesn't mean we should consider them invalid

lib/utils/ens-helpers.js
2 ↗(On Diff #33371)

Please maintain a blank newline after the // @flow pragma. In general, please try to maintain style conventions we have in the codebase. Ideally this is not something that needs to ever be discussed during diff review

82 ↗(On Diff #33371)

Typo: recommended

This revision now requires changes to proceed.Nov 17 2023, 12:27 PM
  1. Updated the RegEx to match any character except newline to match all unicode characters (like ö) but also to match emojis (🎃)
  2. Changed the order of isValidENSName to match the regex first. This is to prevent converting an invalid parameter like ö.eth toAscii, which would then meet the length requirements of the regex statement.
  3. Updated the unit tests, used this post to identify the valid one-char emojis that can be registered as ENS names. Cross checked this manually on https://app.ens.domains/
lib/shared/markdown.js
356 ↗(On Diff #33431)

This is really general as opposed to being super specific on characters because it seems like generally matching emojis with regex is a little complicated:

This covers the general case of making sure the TLD is .eth and the SLD is a valid set of 3 characters or more, while also matching emojis that may look like 1 character but are composed of 3 or more characters.

The more complicated check of valid characters is covered by the uts46.toAscii check anyways to make sure everything adheres to the spec

lib/shared/markdown.js
356 ↗(On Diff #33431)

I also see @ginsu brought in an emoji-regex package that defines a valid emoji regex, so that could possibly be used in combination with something like \p{L} that matches unicode characters to make this regex a little more specific if that's preferred

This looks good. I was surprised to see that 💂‍♂️.eth is a legitimate name despite the 3-char limit, but https://ens.domains seems content to register it, so I guess it counts.

This revision is now accepted and ready to land.Nov 20 2023, 8:22 AM