Page MenuHomePhabricator

[services][backup] AddAttachments 2/4 - holders string utils
ClosedPublic

Authored by bartek on Jan 9 2023, 12:13 PM.
Tags
None
Referenced Files
Unknown Object (File)
Sat, Dec 14, 2:27 AM
Unknown Object (File)
Thu, Nov 28, 11:32 AM
Unknown Object (File)
Thu, Nov 28, 11:31 AM
Unknown Object (File)
Thu, Nov 28, 11:31 AM
Unknown Object (File)
Thu, Nov 28, 11:31 AM
Unknown Object (File)
Thu, Nov 28, 11:31 AM
Unknown Object (File)
Thu, Nov 28, 4:07 AM
Unknown Object (File)
Nov 5 2024, 4:48 PM
Subscribers

Details

Summary

Attachment holders are provided and then stored in database as a semicolon-separated string. When manipulating them, it's best to operate on HashSets (as they should have no duplicates) and then merge to string again. To avoid unnecessary string allocations, the HashSet contains string slices (&str) instead of actual string copies.

Depends on D6204

Test Plan

Added unit tests for these methods. cargo test passes.

Diff Detail

Repository
rCOMM Comm
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

bartek held this revision as a draft.
bartek published this revision for review.Jan 9 2023, 12:35 PM
tomek requested changes to this revision.Jan 11 2023, 5:45 AM
tomek added inline comments.
services/backup/src/service/handlers/add_attachments.rs
22 ↗(On Diff #20737)

I don't think it is a good idea to return a value as a parameter. It would be more maintainable if this function returned a new string and the the caller could assign it.

24–31 ↗(On Diff #20737)

Can we simplify it by using join? (maybe we need to collect before joining)

This revision now requires changes to proceed.Jan 11 2023, 5:45 AM
services/backup/src/service/handlers/add_attachments.rs
22 ↗(On Diff #20737)

Generally I agree, but in this particular case mutating the param is a perfect fit, because in my opinion

add_new_attachments(&mut backup.attachment_holders, new_holders);

is cleaner than doing

backup.attachment_holders = add_new_attachments(backup.attachment_holders, new_holders);

Possibly I could rename this function to merge_attachments or sth like that

24–31 ↗(On Diff #20737)

This was my first thought, but HashSet doesn't have join(), so I'd need to collect it into a Vec first

current_holders.iter().collect::<Vec<_>>().join(ATTACHMENT_HOLDER_SEPARATOR).

My solution is done on purpose, as creating a single string and mutating it each time is more efficient.
But maybe in this case efficiency isn't as important and we can do a tradeoff for readability.

Btw. I noticed I made a small mistake in reserve()

services/backup/src/service/handlers/add_attachments.rs
22 ↗(On Diff #20737)

I think the fact that Rust makes you annotate at the callsite makes this much less of an anti-pattern than eg. in C++

  • Replaced my low-level solution with join() as suggested
  • Kept add_new_attachments() as mutating - I consider this a cleaner solution
services/backup/src/service/handlers/add_attachments.rs
17 ↗(On Diff #20884)

Feel like marshalling of the string into a HashSet then an array can be done once we first receive the object, then we can serialize it back into a str before sending it over the wire.

Having a function go from String -> HashSet -> Vec -> String is a smell, in my opinion. Especially if this function may be called many times, it may become a performance concern.

We can do this as future work, as you have a pretty large stack going already.

services/backup/src/service/handlers/add_attachments.rs
17 ↗(On Diff #20884)

Yeah, discussed a bit in https://phab.comm.dev/D6205?id=20737#inline-41744

Generally, performance is one of the reasons why I store HashSet<&str> instead of HashSet<String>. Fortunately, this function is called once per request and these strings are not that huge.
Also, my first version avoided the -> Vec step by implementing join() manually (HashSet doesn't implement Join), but finally gave up.

tomek added inline comments.
services/backup/src/service/handlers/add_attachments.rs
17 ↗(On Diff #20884)

I agree with @jon: it would be better if we stored a set somewhere, then calling this function would result in adding some entries to it, and then, at the end, we would transform it back to a string. But on the other hand, if this function is called only once per request, we wouldn't get any performance benefits. So it seems like keeping the current implementation is a better idea.

22 ↗(On Diff #20737)

Ok, makes sense. The current name add_new_attachments sounds ok to me.

24–31 ↗(On Diff #20737)

I think it is safer and more readable to use built-in function. If we want to use our own logic, we would need to extract it to a function and then write some tests for it. If we are concerned with the performance, we should measure and compare it, but I don't think join would be significantly slower than fold.

This revision is now accepted and ready to land.Jan 16 2023, 5:28 AM