Page MenuHomePhabricator

[lib] Introduce RadixTree
ClosedPublic

Authored by ashoat on Oct 27 2023, 11:30 AM.
Tags
None
Referenced Files
Unknown Object (File)
Fri, Nov 8, 10:04 PM
Unknown Object (File)
Fri, Nov 8, 10:04 PM
Unknown Object (File)
Fri, Nov 8, 10:04 PM
Unknown Object (File)
Fri, Nov 8, 10:04 PM
Unknown Object (File)
Fri, Nov 8, 10:01 PM
Unknown Object (File)
Thu, Nov 7, 6:04 AM
Unknown Object (File)
Tue, Nov 5, 2:13 AM
Unknown Object (File)
Oct 5 2024, 8:05 AM
Subscribers

Details

Summary

We currently use a rather naive approach for prefix search, where we simply add each substring to a HashMap.

This diff introduces a new data structure optimized for prefix search: the radix tree.

After the improvements in the parent diff, this improve the perf of useChatMentionSearchIndex by 35%. Measuring from before the parent diff, the incremental improvement is about 19%. Combined with the parent diff, in total we're improving perf by 64%.

Linear tasks: ENG-5137 and ENG-5480

Depends on D9625

Test Plan
  1. Included unit tests
  2. I tested the chat mentions experience
  3. I did some perf testing:

In combination with the following diff, I used this patch to test performance before and after this change. I made sure I had at least three samples of each scenario. Will also link my messy Gist of results, but it's not really interpretable by anyone other than me.

Here's the relevant portion:

BEFORE

 LOG  useChatMentionSearchIndex took 1801ms
 LOG  useChatMentionSearchIndex took 1748ms
 LOG  useChatMentionSearchIndex took 1730ms
 LOG  useChatMentionSearchIndex took 1831ms

 AVERAGE 1777.5ms

JUST DEDUP (parent diff)

 LOG  useChatMentionSearchIndex took 1027ms
 LOG  useChatMentionSearchIndex took 949ms
 LOG  useChatMentionSearchIndex took 957ms

 AVERAGE 977.7ms

DEDUP + RADIX TREE

 LOG  useChatMentionSearchIndex took 643ms
 LOG  useChatMentionSearchIndex took 629ms
 LOG  useChatMentionSearchIndex took 651ms
 LOG  useChatMentionSearchIndex took 609ms

 AVERAGE 633ms

JUST RADIX TREE

 LOG  useChatMentionSearchIndex took 1394ms
 LOG  useChatMentionSearchIndex took 1468ms
 LOG  useChatMentionSearchIndex took 1511ms
 LOG  useChatMentionSearchIndex took 1492ms
 LOG  useChatMentionSearchIndex took 1397ms

 AVERAGE 1452.4ms

Diff Detail

Repository
rCOMM Comm
Lint
No Lint Coverage
Unit
No Test Coverage

Event Timeline

Worth mentioning that I spent like 2-3 hours combing through pretty much every single relevant NPM package before implementing this myself. Keywords I searched for (in various combinations) included "radix tree", "trie", "prefix search", and "patricia".

I found a couple packages that weren't ancient, but mostly they didn't support storing a set of results for a given keywords... just storing the keyword itself. I could've made that work with an additional hashmap, but I didn't really want to bother with needing a second data structure.

There was one package I found that could store a set of results, but it had strange behavior when I tested it.

lib/shared/radix-tree.js
5–14 ↗(On Diff #32475)

These are purposefully not read-only. These objects never get exposed outside of this class (unless somebody accesses root directly), and it improves perf to modify them in-place rather than having to construct new ones every time

Measuring from before the parent diff, the incremental improvement is about 19%.

I guess there should be also a significant improvement in memory usage

lib/shared/radix-tree.js
41 ↗(On Diff #32475)

We can avoid this iteration by replacing an array with a map from the first character to a node (for children prop. values should probably stay as is). This should improve the performance and simplify the code.

49 ↗(On Diff #32475)

Doesn't matter at all, but it looks safer to first test the length and then the first char

137 ↗(On Diff #32475)

It doesn't seem necessary to use the stack in this case, as it will always have 0 or 1 element because at most one child of a node can be considered as a possible match. In each while iteration, we're removing one item from a stack and adding up to one item - we can replace the stack with a single value.

145 ↗(On Diff #32475)

This can match at most once per while iteration because every child from node.children starts with a different char.

This revision is now accepted and ready to land.Oct 30 2023, 4:04 AM
  1. @tomek’s nit about ordering about conditions
  2. @tomek’s feedback about not needing a stack for exact match
  3. @tomek’s feedback to use a map from first char for children (instead of an array) to improve perf
  4. To maintain compatibility with old approach, I updated the code to "dedup" identical values using Sets
  5. To reduce unnecessary memory usage, I got rid of leaf: false

I tested performance again and it's about the same as it was in the last revision of the diff

This revision was automatically updated to reflect the committed changes.
lib/shared/radix-tree.js
140 ↗(On Diff #32513)

Creating a new set here is wasteful... I was doing it this way before I changed RadixTreeLeafNode to store a set instead of an array, but I forgot to change it back afterwards

lib/shared/radix-tree.js
140 ↗(On Diff #32513)

Solved in D9647