issues: https://linear.app/comm/issue/ENG-3314/implement-a-function-for-processing-messages-for-search-and-storing,
https://linear.app/comm/issue/ENG-3315/implement-adding-processed-messages-to-db-when-new-messages-are
tokenizeAndStem splits the text into words, removes stopwords, and stems the remaining words. It returns an array of stemmed tokens. The second parameter is a boolean indicating whether to keep the stopwords. Source
code: https://github.com/NaturalNode/natural/blob/master/lib/natural/stemmers/stemmer.js
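
To make the three steps concrete, here is a minimal self-contained sketch of the behaviour described above. It is not natural's implementation: `tokenizeAndStemSketch`, `naiveStem`, and the tiny `STOPWORDS` list are hypothetical stand-ins, and the suffix stripping is far cruder than the stemmers the library ships (for example its Porter stemmer). Only the shape is meant to match: text in, array of stems out, with a boolean second argument that keeps stopwords.

```javascript
// Illustrative stand-in for tokenizeAndStem; NOT the natural library's code.
// The stopword list and suffix rules are deliberately tiny and hypothetical.
const STOPWORDS = new Set(['a', 'an', 'and', 'the', 'is', 'are', 'to', 'of', 'in']);

// Naive suffix stripping; real stemmers handle many more cases.
function naiveStem(word) {
  for (const suffix of ['ing', 'ed', 's']) {
    if (word.length > suffix.length + 2 && word.endsWith(suffix)) {
      return word.slice(0, -suffix.length);
    }
  }
  return word;
}

// keepStops mirrors the boolean second parameter described above.
function tokenizeAndStemSketch(text, keepStops = false) {
  // 1. Split into lowercase word tokens.
  const tokens = text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
  // 2. Drop stopwords unless the caller asked to keep them.
  const kept = keepStops ? tokens : tokens.filter(t => !STOPWORDS.has(t));
  // 3. Stem whatever remains.
  return kept.map(naiveStem);
}

console.log(tokenizeAndStemSketch('The user is searching the messages'));
// → [ 'user', 'search', 'message' ]
```

Passing `true` as the second argument keeps `the` and `is` in the result, which is the only difference between the two modes in this sketch.
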
Details
Tested that when a new message is created, the corresponding entry appears in the search table. Tested that when a message is edited (by passing an artificial edit message to the processMessagesForSearch
function), the corresponding entry is updated in the search table.
Diff Detail
- Repository
- rCOMM Comm
- Branch
- inka/testing_db
- Lint
- No Lint Coverage
- Unit
- No Test Coverage
Event Timeline
keyserver/src/database/search_utils.js | ||
---|---|---|
36–38 ↗ | (On Diff #23751) |
Looks good, but a couple notes in message-creator.js
keyserver/src/creators/message-creator.js | ||
---|---|---|
186 | We should only pass newMessageInfos in here. The messages that are in returnMessageInfos but not newMessageInfos are messages that have already been created. Those messages should have already been indexed. This brings to mind a question: is the indexing process idempotent? Meaning, if I index the same message twice, will it be the same as if I indexed that message once? | |
207 | I don't think we should block the return on this. Search indexing is usually implemented as a "post-processing step"... the user creating the message shouldn't need to wait on the search indexing to complete for the endpoint to return. Instead, I think we should include this in postMessageSendPromise. Can you move the call to processMessagesForSearch into postMessageSend? You can use the messageInfos parameter (stripLocalIDs should have no effect on indexing I think) | |
keyserver/src/database/search_utils.js | ||
1 | Can you name this file search-utils.js to match the naming convention in the codebase? |
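
The two message-creator.js notes (idempotent indexing, and not blocking the return on it) can be sketched together. This is a rough illustration under stated assumptions, not the keyserver code: `searchTable` is an in-memory Map standing in for the real search table, and `processText`, `indexMessagesSketch`, and `createMessagesSketch` are hypothetical names. Only `postMessageSendPromise` and `newMessageInfos` are borrowed from the comments above.

```javascript
// Hypothetical in-memory stand-in for the search table, keyed by message ID.
const searchTable = new Map();

// Hypothetical processing step; the real one would tokenize and stem the text.
function processText(text) {
  return text.toLowerCase().split(/\s+/).filter(Boolean).join(' ');
}

// Upsert keyed by message ID: indexing the same message twice leaves
// exactly one row with the same content, so the operation is idempotent.
async function indexMessagesSketch(messageInfos) {
  // Simulates the asynchronous database round trip.
  await Promise.resolve();
  for (const { id, text } of messageInfos) {
    searchTable.set(id, processText(text));
  }
}

// Returns without waiting for indexing; the work is exposed as a
// post-send promise that the caller may await or ignore.
function createMessagesSketch(newMessageInfos) {
  const postMessageSendPromise = indexMessagesSketch(newMessageInfos).catch(
    err => console.error('search indexing failed', err),
  );
  return { messageInfos: newMessageInfos, postMessageSendPromise };
}
```

With a real database, the same property would typically come from an upsert keyed on the message ID rather than a plain insert. Whether the diff's query behaves that way is exactly the question raised in the comment on line 186.
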