Page MenuHomePhabricator

[keyserver] Add migration - process messages in our db for search
ClosedPublic

Authored by inka on Mar 21 2023, 3:12 AM.
Tags
None
Referenced Files
F3761738: D7117.diff
Fri, Jan 10, 6:05 PM
Unknown Object (File)
Sun, Jan 5, 8:08 PM
Unknown Object (File)
Sun, Jan 5, 4:44 PM
Unknown Object (File)
Wed, Jan 1, 11:12 PM
Unknown Object (File)
Wed, Jan 1, 11:12 PM
Unknown Object (File)
Wed, Jan 1, 11:12 PM
Unknown Object (File)
Wed, Jan 1, 11:12 PM
Unknown Object (File)
Wed, Jan 1, 11:12 PM

Details

Summary

issue:https://linear.app/comm/issue/ENG-3316/implement-processing-messages-that-are-already-in-the-db-for-search
We need to process the messages that are already in our db for search.
pageSize is 1001 because I used the idea from https://mariadb.com/kb/en/pagination-optimization/ that is to fetch one more row than we intend to process. This allows us to know at each query whether it is the last query or not, and saves us from running a query that would return nothing, in case the number of elements is divisible by the page size we actually process.
This is also why if (messages.length === pageSize) I process a table that is one element shorter.
I use the id not the timestamp, because we don't have an index on the timestamp.

This should be landed after D7077

Test Plan

I checked that db_version in metadata table is 19, and run the keyserver. After each test I removed part, or all of the rows in the search table and changed db_version in metadata table to 19. I checked that the processed messages appear in the search table. I checked that no messages of type 0 or 20 are skipped (I artificially added some messages of type 20 to my messages table)

Diff Detail

Repository
rCOMM Comm
Branch
inka/testing_db
Lint
No Lint Coverage
Unit
No Test Coverage

Event Timeline

keyserver/src/database/search_utils.js
68–77 ↗(On Diff #23916)

I used an else if here, because technically there is nothing stopping someone from using this function somewhere else and passing in rows rows that are not type 0 or 20. But I didn't want to have a big if for that at the beginning of the loop, because that looked unclean to me. So I opted for the else if

inka requested review of this revision.Mar 21 2023, 3:28 AM
inka edited the test plan for this revision. (Show Details)

pageSize is 1001 because I used the idea from https://mariadb.com/kb/en/pagination-optimization/ that is to fetch one more row than we intend to process.

Nice trick!

keyserver/src/database/search_utils.js
89–92 ↗(On Diff #23920)

Could the message types 0 and 20 be extracted as constants (similarly to pageSize), or at least add a comment?

68–77 ↗(On Diff #23916)

totally agree on this - your way looks clean

This revision is now accepted and ready to land.Mar 21 2023, 4:56 AM
inka retitled this revision from [keyserver] Add 20th migration - process messages in our db for search to [keyserver] Add migration - process messages in our db for search.Mar 22 2023, 1:48 AM
ashoat requested changes to this revision.Mar 22 2023, 2:01 PM
ashoat added inline comments.
keyserver/src/database/search_utils.js
85 ↗(On Diff #23920)

You are breaking variable naming conventions. We never name variables like this in the JS codebase... it always looks like lastID. Please try to maintain consistency!

85–86 ↗(On Diff #23920)

let is somewhat of an antipattern, and do-while is an unusual construction. Can you revise this code to use a simple while (true) { look with a conditional break inside of it? I think that approach will have less "indirection" and will consequenly be more readable

90–91 ↗(On Diff #23920)

Lots of issues with this query:

  1. You have lines longer than 80 chars. Please always check this before submitting a diff going forward, as it leads to unnecessary review churn!
  2. You should always use constants from JS. Please look at how other queries are constructed
  3. Your query format does not look anything like how we typically format SQL queries
  4. You have trailing spaces

This should be:

SELECT id, type, content, target_message
FROM messages
WHERE (type = {messageTypes.TEXT} OR type = {messageTypes.EDIT_MESSAGE})
  AND id > ${lastID}
ORDER BY id
LIMIT ${pageSize}
101 ↗(On Diff #23920)

Once again, inconsistent variable naming conventions

This revision now requires changes to proceed.Mar 22 2023, 2:01 PM
keyserver/src/database/search_utils.js
85–86 ↗(On Diff #23920)

We have an eslint rule no-constant-condition that disallows constant expressions in conditions. We do not ignore this rule anywhere in the code.
It is turned on by

"extends": [
    "eslint:recommended",

in our .eslintrc.json

And we use a do-while in one other place in the code here

Should I change it anyway?

keyserver/src/database/search_utils.js
85–86 ↗(On Diff #23920)

I just put up D7173 to address this. Can you rebase on top of that and try again?

ashoat requested changes to this revision.Mar 24 2023, 12:14 PM

Back to you

This revision now requires changes to proceed.Mar 24 2023, 12:14 PM
inka added inline comments.
keyserver/src/database/search_utils.js
85–86 ↗(On Diff #23920)

I cannot rebase until @kuba rebases his stack, because I need to be rebased on a patch of his diffs. I will ask him to do that.

Change do-while to a while(true), make messages constant

keyserver/src/database/search-utils.js
85 ↗(On Diff #24262)

I don't think I can avoid this let in a reasonable way

Accepting with a question. Don't think we need to throw the last result away

keyserver/src/database/search-utils.js
97–98 ↗(On Diff #24262)

I understand your strategy of checking if the LIMIT is returned, and assuming we are down if fewer messages are returned

But I'm confused why you aren't processing the 1001th message, and instead fetching it again. Shouldn't we process all of the messages, and set lastID to messages[messages.length - 1].id?

This revision is now accepted and ready to land.Mar 28 2023, 5:45 AM
keyserver/src/database/search-utils.js
97–98 ↗(On Diff #24262)

s/down/done

keyserver/src/database/search-utils.js
97–98 ↗(On Diff #24262)

I don't understand what you mean by "s/down/done"

Now that I think about this, I think this trick is not very useful in this case. The assumption was it lets us know whether there is a next page to fetch and saves us a query that would return nothing. But actually we have to run the same number of queries, we just send more data in total, since we repeat some.

Example: if we had 3003 entries to fetch. In the scenario where we process all of 1001, we would run a 4th query that would return nothing. But in the scenario where we process 1000 out of 1001, we would still run the 4th query, it would just return something. That doesn't really help us.

I will fix this and just fetch 1000

keyserver/src/database/search-utils.js
97–98 ↗(On Diff #24262)

That makes sense. We do something similar in message-fetchers.js (assume that if the result count is the same as the LIMIT, then there is more to fetch), but the difference there is that we have another variable we pass to the client that indicates if we think the result count is "exhaustive" or not. If we weren't passing that variable to the client, then there would be no point

s/down/done corrects a typo of "down" with "done". It's using SED syntax for find-replace

keyserver/src/database/search-utils.js
97–98 ↗(On Diff #24262)

(Actually you can ignore my comment, I'm not sure it's relevant)