Page MenuHomePhabricator

[keyserver] Create a function to query for all uploads from a provided threadID
ClosedPublic

Authored by rohan on Jan 31 2023, 11:04 AM.
Tags
None
Referenced Files
Unknown Object (File)
Wed, Nov 13, 5:38 AM
Unknown Object (File)
Wed, Nov 13, 5:38 AM
Unknown Object (File)
Wed, Nov 13, 5:38 AM
Unknown Object (File)
Fri, Nov 8, 5:19 PM
Unknown Object (File)
Fri, Nov 8, 12:30 AM
Unknown Object (File)
Fri, Nov 8, 12:30 AM
Unknown Object (File)
Fri, Nov 8, 12:30 AM
Unknown Object (File)
Fri, Nov 8, 12:30 AM
Subscribers

Details

Summary

We create a function in upload-fetchers to query the database for uploads given a specific threadID. The query uses a UNION to join two select statements, one for the photos and one for the videos. We get the necessary information for both media, and for the thumbnail data for videos, the photos are defaulted to NULL.

https://linear.app/comm/issue/ENG-2873/create-a-function-to-query-for-all-uploads-from-a-provided-thread

Depends on D6482

Test Plan

I tested the membership permissions in two ways:

  1. By simulating a user who should have access to the media, navigating to the thread and confirming that the shared media is still present in the gallery.
  1. More importantly, to try to simulate a user who shouldn't have access to the shared media, I ran the query in TablePlus and filled in the ${viewer.id} parameter with a user who definitely should not have access to the thread media. I confirmed that the query returned 0 results, as opposed to, prior to the permissions checking, the query always returned all of the media.

Screenshot 2023-02-26 at 10.16.43 AM.png (596×1 px, 60 KB)

Diff Detail

Repository
rCOMM Comm
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
ashoat requested changes to this revision.Jan 31 2023, 12:14 PM
ashoat added inline comments.
keyserver/src/fetchers/upload-fetchers.js
126 ↗(On Diff #21699)

filename NOT LIKE 'thumb%' is not an okay condition here. You should expect that your reviewers won't be okay with this, just like you should expect your reviewers won't be okay with things like parentThreadID === '1'. Try to think more critically about the changes you're putting up, and address things like this BEFORE you put diffs up, so your reviewers don't have to spend time annotating your diffs / pointing them out

(In case it's not obvious, this check is not okay because you're going to catch files that are not thumbnails and just coincidentally have similar looking filenames)

This revision now requires changes to proceed.Jan 31 2023, 12:14 PM
rohan edited the test plan for this revision. (Show Details)

Add optional param limit for future diffs - this will be important for when we only want to display 6 media before a user clicks to
see more (so we don't fetch all media unnecessarily)

rohan planned changes to this revision.Feb 4 2023, 8:28 PM

Still need to fix the query to change filename NOT LIKE 'thumb%'

Change limit from being an optional param (will be used up the stack) since even the full screen media gallery will need to set an initial limit on how many media to fetch initially

rohan planned changes to this revision.Feb 6 2023, 1:48 PM

There's a better way of doing this since mediaFromRow is a little out of date given that we don't actually return the thumbnail information in the objects.

According to @atul, in the messages table we hold an object that contains the relevant information for media - if it's a photo, we have the type and uploadID, and if it's a video, we have both of those plus the thumbnailUploadID (link he provided https://blob.sh/a2377b.png). So in reality, we can get the uploads for the specific thread and construct the MediaMessageServerDBContent type for each upload by checking for messages where uploads.container = messages.id, and destructuring the object.

With that, we can call fetchMediaFromMediaMessageContent and get a proper Media array in return. This gives us the benefit of using the Multimedia component up the stack to 1. re-use internal components and 2. not have to load entire videos but instead just display the thumbnails before a user clicks on the media item.

Left lots of comments (for now), they can be removed before landing if needed. Changed the way I retrieve the media from the server. Now,
we don't filter out thumbnails since they'll get merged in with the video object, but instead keep count of how many thumbnails we had to
retrieve to update the offset accordingly.

rohan added inline comments.
keyserver/src/fetchers/upload-fetchers.js
119 ↗(On Diff #22694)

Screenshot 2023-02-17 at 6.34.43 PM.png (112×560 px, 30 KB)

Type any is going to be changed immediately in the next diff D6531 since the thread types are introduced there. By the time the builds pass here that diff should be updated

Change the ordering of some queries to be more efficient (so we can be more selective about what we fetch from the DB as opposed to fetching all media)

keyserver/src/fetchers/upload-fetchers.js
199 ↗(On Diff #22703)

To clarify this, this is an event where a container is holding multiple media. So if a user uploads 5 media at once, they're all assigned the same container. When fetching, if it so happens that only 3 of those media are able to be rendered before we hit our limit, we'll be hitting the same container in the next fetch.

Due to the nature of how the media are stored in messages, they're all grouped by uploads.container = messages.id, so we will get the 3 from last render and the 2 new ones. For example:

[
 {"type":"video",
  "uploadID":"107022",
   "thumbnailUploadID":"107023"
 },
 {"type":"photo",
  "uploadID":"107025"
 },
 {"type":"photo",
  "uploadID":"107024"
 },
 {"type":"photo",
  "uploadID":"107021"
 },
 {"type":"photo",
  "uploadID":"107020"
 }
]'

So video (107022), photo (107025) and photo (107024) will be rendered the first time, and when we need to fetch more data, we'll get the same 5 media and we'd filter out the IDs we already have and just render photo (107021) and photo (107020).

ashoat requested changes to this revision.Feb 20 2023, 2:37 PM
In D6485#200792, @rohan wrote:

There's a better way of doing this since mediaFromRow is a little out of date given that we don't actually return the thumbnail information in the objects.

What do you mean by "out of date"? Wondering if something is broken on prod, since we use mediaFromRow everywhere

keyserver/src/fetchers/upload-fetchers.js
119 ↗(On Diff #22703)

Get rid of any

125–131 ↗(On Diff #22703)

It seems like you're relying on creation time here, which could introduce a race condition if two people upload two videos at the same time.

Can we just make it so the query returns what we want: one row for each video, each thumbnails as a separate column?

I suspect this can be done with a LEFT JOIN, but I don't recall what the schema is exactly. Can you share a dump of several rows of the uploads table, including photos, videos, and thumbnails?

This revision now requires changes to proceed.Feb 20 2023, 2:37 PM
In D6485#200792, @rohan wrote:

There's a better way of doing this since mediaFromRow is a little out of date given that we don't actually return the thumbnail information in the objects.

What do you mean by "out of date"? Wondering if something is broken on prod, since we use mediaFromRow everywhere

We use mediaFromRow for IMAGES and constructMediaFromMediaMessageContentsAndUploadRows for MULTIMEDIA.

What do you mean by "out of date"? Wondering if something is broken on prod, since we use mediaFromRow everywhere

Basically what Atul said above, constructMediaFromMediaMessageContentsAndUploadRows helps construct Media objects for both photos and videos

keyserver/src/fetchers/upload-fetchers.js
119 ↗(On Diff #22703)

I needed a type for request here to satisfy flow in my editor, and ThreadFetchMediaRequest is introduced in the next diff and it's fixed there

Would you prefer me to combine the two diffs into one?

125–131 ↗(On Diff #22703)

In terms of the creation_time here, I thought that ordering by creation_time is ideal so we can render the media from newest --> oldest

Also sure, here are some rows with all three, here are 12 records. The thumbnails will be with type: "photo" and their filename will begin with "thumb".

keyserver/src/fetchers/upload-fetchers.js
119 ↗(On Diff #22703)

The types should always come first. I don't need you need to squash the diffs... you can either move the types you need over here, or potentially reorder the diffs if that's possible

125–131 ↗(On Diff #22703)

Nice!! Now can you LEFT JOIN in the container?

You'll first need to first out what table each of the containers is in, so that you can craft the query. To do that, start by LEFT JOIN-ing the ids table. You won't look at ids in your final query, but right now you just want to see the table_name for each ID.

Once you know the tables you need to LEFT JOIN, craft a query with a LEFT JOIN for each individual table. Eg. if you seem that some containers (maybe all?) are messages, then join them in and select the content column. Now share the results of that query here and we can figure out how to proceed.

125–131 ↗(On Diff #22703)
  • First need to figure out
keyserver/src/fetchers/upload-fetchers.js
125–131 ↗(On Diff #22703)

I know that all containers = some message id, so I can left join on upload.container = messages.id.

Here's my query with LEFT JOIN. I know that all uploads.container corresponds with a messages.id.

The results show that each result with a container has it's content field duplicated across each of the results that have the same container (since my query just gets the entire messages.content field for that specific container)

keyserver/src/fetchers/upload-fetchers.js
125–131 ↗(On Diff #22703)

Sorry, here is the corrected query without the GROUP BY

keyserver/src/fetchers/upload-fetchers.js
125–131 ↗(On Diff #22703)

Awesome!! So it looks like we have all the data we need here, but we'll need to do a JSON_SEARCH / JSON_EXTRACT on the content column to get it.

Maybe we want our ultimate query to be something like this:

  1. Query for all messages of multimedia / images type for this thread
  2. Left join in uploads table where container matches the message ID
  3. Group by discrete media item
    • This will be a little hard since a message can have multiple media, and a media can have multiple uploads (video + thumbnail)
    • I think the approach should be to group by the uploadID of the object in the content array that corresponds to the row. So if the row is a photo / video, we group by its id. But if the row is a video thumbnail, we group by the uploadID that is in the same object as its thumbnailUploadID

This might be a hard SQL query to craft... feel free to grab some time with me tomorrow (maybe 4pm ET?) and we can work on it together.

keyserver/src/fetchers/upload-fetchers.js
125–131 ↗(On Diff #22703)

Here's the query that works from our 1:1:

SELECT content.photo AS uploadID,
  u.secret AS uploadSecret,
  u.type AS uploadType, u.extra AS uploadExtra,
  u.container,
  NULL AS thumbnailID,
  NULL AS thumbnailUploadSecret,
  NULL AS thumbnailUploadType,
  NULL AS thumbnailUploadExtra
FROM messages m
LEFT JOIN JSON_TABLE(
  m.content,
  "$[*]" COLUMNS(photo INT PATH "$")
) content ON 1
LEFT JOIN uploads u ON u.id = content.photo
WHERE m.thread = ${request.threadID} AND m.type = 14
UNION SELECT content.media AS uploadID,
  uv.secret AS uploadSecret,
  uv.type AS uploadType, uv.extra AS uploadExtra,
  uv.container,
  content.thumbnail AS thumbnailID,
  ut.secret AS thumbnailUploadSecret,
  ut.type AS thumbnailUploadType,
  ut.extra AS thumbnailUploadExtra
FROM messages m
LEFT JOIN JSON_TABLE(
  m.content,
  "$[*]" COLUMNS(
    media INT PATH "$.uploadID",
    thumbnail INT PATH "$.thumbnailUploadID"
  )
) content ON 1
LEFT JOIN uploads uv ON uv.id = content.media
LEFT JOIN uploads ut ON ut.id = content.thumbnail
WHERE m.thread = ${request.threadID} AND m.type = 15;

(It's missing LIMIT and ORDER BY, though.)

keyserver/src/fetchers/upload-fetchers.js
125–131 ↗(On Diff #22703)

Oh and you should use messageTypes constants instead of 14 and 15

Update query following 1:1 with Ashoat and move type declarations into this diff. I no longer need to use constructMediaFromMediaMessageContentsAndUploadRows since it'll be cleaner constructing the Media objects in the same function, as opposed to constructing an additional array of MediaMessageServerDBContent. Also, I'm pretty sure the uploadMap is going to have trouble finding thumbnails since this query puts videos and thumbnails in one row together. So it just seems cleaner to handle it within fetchMediaForThread

thumbnailUploadType and thumbnailUploadExtra are not used so I'm removing them from the query

Screenshot 2023-02-24 at 5.23.21 PM.png (560×2 px, 568 KB)

Please only land if you can confirm that the ORDER BY and LIMIT apply to the whole query, not just the second half

keyserver/src/fetchers/upload-fetchers.js
158–159 ↗(On Diff #23081)

Do the ORDER BY and LIMIT apply to the whole query or just the UNION SELECT part?

This revision is now accepted and ready to land.Feb 24 2023, 2:57 PM
ashoat requested changes to this revision.Feb 25 2023, 6:24 PM

Permission checks query seems good. Can you amend the Test Plan to mention how you tested them?

On the ORDER BY / LIMIT question from earlier, I think you had mentioned over chat that you would have to wrap the SELECT / UNION SELECT and apply them to the whole thing. Is that still your belief? I noticed this StackOverflow implies the opposite, but I haven't done any testing to confirm.

keyserver/src/fetchers/upload-fetchers.js
143 ↗(On Diff #23098)

Please make sure all lines are no longer than 80 chars

This revision now requires changes to proceed.Feb 25 2023, 6:24 PM

Make sure lines are < 80 characters

On the ORDER BY / LIMIT question from earlier, I think you had mentioned over chat that you would have to wrap the SELECT / UNION SELECT and apply them to the whole thing. Is that still your belief? I noticed this StackOverflow implies the opposite, but I haven't done any testing to confirm.

Yeah, I looked into it and two things stand out to me to demonstrate that my thoughts are still the case:

  1. Tested locally - the first thing I did was use the parenthesis in my query and found that the media got out of sync because it was applying the LIMIT and ORDER BY logic on each individual SELECT, so it some photos would be appearing before the videos because the ORDER BY was not considering both photos and videos. Meanwhile, with the current query, I've manually confirmed the videos/photos are in order, and also it seems to make sense for what we need. The two SELECT statements will be essentially 'unioned' into one resulting set, and only after that will we be using ORDER BY and then LIMIT.
  1. The question in the Stack Overflow post is exactly what we have (no parenthesis), and their 'problem' is what we're looking for. They said the problem is that the LIMIT 0,10 applies to the whole result set, and that's exactly what we want so the photos and videos are not sorted individually. Meanwhile, the answer suggests to use parenthesis so they could apply LIMIT to each individual resultant set before it gets combined into one

Awesome!! I thought you had earlier indicated that a parentheses was necessary, but either way it sounds like now you’re in agreement with that StackOverflow post about the ORDER BY / LIMIT.

This revision is now accepted and ready to land.Feb 26 2023, 10:11 AM