Page MenuHomePhabricator

[keyserver] Clean up fetchMessageInfos using ROW_NUMBER window function
ClosedPublic

Authored by ashoat on Jul 28 2022, 11:20 PM.
Tags
None
Referenced Files
Unknown Object (File)
Sat, Dec 28, 2:07 AM
Unknown Object (File)
Fri, Dec 27, 9:51 AM
Unknown Object (File)
Thu, Dec 26, 6:15 PM
Unknown Object (File)
Sun, Dec 15, 7:27 AM
Unknown Object (File)
Sun, Dec 15, 7:27 AM
Unknown Object (File)
Sun, Dec 15, 7:26 AM
Unknown Object (File)
Sun, Dec 15, 7:22 AM
Unknown Object (File)
Dec 9 2024, 5:09 AM

Details

Summary

From reviewing some StackOverflow answers, I learned that MariaDB 10.8 supports ROW_NUMBER, which wasn't previously supported in MySQL 5.7. We can simplify this query while maintaining the same performance.

Depends on D4674

Test Plan

I first tested this stack locally in my environment to make sure the queries worked. Then I patched it onto my production keyserver and confirmed performance was improved

Diff Detail

Repository
rCOMM Comm
Lint
No Lint Coverage
Unit
No Test Coverage

Event Timeline

Looks great!

keyserver/src/fetchers/message-fetchers.js
281

Do we still need to support the old engine? This query contains code specific to MariaDB LIMIT 18446744073709551615 so it is not specific to mysql5.7.

331

What if two different messages have the same time? We can handle that in two ways: we can either add another order by column, e.g. id, or we can use RANK instead of ROW_NUMBER. The difference between these two is that when ROW_NUMBER will see an equal result, it will assign non deterministically different values, but RANK will assign the same value, but then the next value is increased accordingly. So if we order by a field f the following will happen

fRANKROW_NUMBER
111
222
223
344

So overall, in our use case, it is safer to use RANK. At the same time, we should use ids to sort when the same timestamp is assigned.

This revision is now accepted and ready to land.Jul 29 2022, 4:33 AM
keyserver/src/fetchers/message-fetchers.js
281

Do we still need to support the old engine?

Yes until all dev environments have been migrated.

This query contains code specific to MariaDB LIMIT 18446744073709551615 so it is not specific to mysql5.7.

That's valid, I could remove that patch. That said, I did see a StackOverflow comment yesterday implying it was generally good practice to apply this LIMIT. Given MariaDB's relationship with MySQL I'm willing to bet MySQL 8 may have the same requirement.

Either way, I expect this code will be deleted soon.

331
  1. It seems unlikely that two messages have the same time, but it's possible
  2. Seems like it's better to just add id to the ordering than to use RANK, since it maintains the property that we return no more than numberPerThread results
  3. If we want to modify the ORDER BY in that way, it might be best to do it for all message-fetchers.js code in a separate diff, rather than just for fetchMessageInfos
keyserver/src/fetchers/message-fetchers.js
281

ENG-1394 tracks the deprecation. Hoping to have it complete by Monday, August 8th

331

Created ENG-1488 to track this