Page MenuHomePhabricator

[client-backup] implement C++ functions for encoding/decoding string in `utf-8`
ClosedPublic

Authored by kamil on Aug 21 2023, 5:03 AM.
Tags
None
Referenced Files
F3386659: D8888.id30136.diff
Fri, Nov 29, 5:35 AM
F3386166: D8888.diff
Fri, Nov 29, 3:29 AM
Unknown Object (File)
Mon, Nov 25, 5:31 PM
Unknown Object (File)
Mon, Nov 25, 2:32 PM
Unknown Object (File)
Sat, Nov 23, 1:47 AM
Unknown Object (File)
Oct 27 2024, 4:25 PM
Unknown Object (File)
Oct 13 2024, 6:35 PM
Unknown Object (File)
Sep 28 2024, 4:41 AM
Subscribers

Details

Summary

ENG-4503.

To encrypt/decrypt UserData we need to have it in bytes (Uint8Array in JS or uint8_t[] in C++). From JS level it's not that easy because standard TextEncoder from JS is not available in react-native, so we're using JSI capabilities to convert this.

Encoding can be done always - any UTF-8 string can be represented as ArrayBuffer.

Decoding can be done only if ArrayBuffer has valid code points, to make this check we use this fact: link.

Depends on D8887

Test Plan
const str = '!ABC+** /==🤬';
const bytes = commUtilsModule.encodeStringToUTF8ArrayBuffer(str);
const str2 = commUtilsModule.decodeUTF8ArrayBufferToString(
  new Uint8Array(bytes).buffer,
);
console.log(str === str2);

try {
  console.log(
    commUtilsModule.decodeUTF8ArrayBufferToString(
      new Uint8Array([72, 101, 108, 108, 111, 0x80]).buffer,
    ),
  );
} catch (e) {
  // should fail
  console.log(e);
}
try {
  console.log(
    commUtilsModule.decodeUTF8ArrayBufferToString(
      new Uint8Array([72, 101, 108, 0x80, 111, 111]).buffer,
    ),
  );
} catch (e) {
  // should fail
  console.log(e);
}
try {
  console.log(
    commUtilsModule.decodeUTF8ArrayBufferToString(
      new Uint8Array([0x80, 101, 108, 108, 111]).buffer,
    ),
  );
} catch (e) {
  // should fail
  console.log(e);
}

try {
  console.log(
    commUtilsModule.decodeUTF8ArrayBufferToString(
      new Uint8Array([72, 101, 108, 108, 111]).buffer,
    ),
  );
} catch (e) {
  // should work
  console.log(e);
}

Diff Detail

Repository
rCOMM Comm
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

kamil held this revision as a draft.
kamil published this revision for review.Aug 21 2023, 6:40 AM
native/cpp/CommonCpp/NativeModules/CommUtilsModule.cpp
153 ↗(On Diff #30136)

How do we ensure in this function that those are correct UTF-8 bytes and can be correctly stringified? If there is no way to ensure it we should aim to detect it and react by throwing or resolving to error message.

kamil edited the test plan for this revision. (Show Details)
kamil added a reviewer: bartek.
kamil added inline comments.
native/cpp/CommonCpp/NativeModules/CommUtilsModule.cpp
153 ↗(On Diff #30136)

fair point - improved

marcin added inline comments.
native/cpp/CommonCpp/NativeModules/CommUtilsModule.cpp
166 ↗(On Diff #30466)

I appreciate the effort you took to implement this validation!

If this is really the only way to check if the initial byte sequence was UTF-8 correct then it is a weakness of JSI. I expected that some source error is thrown or the method returns null or undefined.

If you think that this method is going to be called on large chunks of data and iterating over those data twice might hurt performance then I am willing to accept if you decide to remove this check.

Up to you.

This revision is now accepted and ready to land.Aug 29 2023, 2:23 AM
bartek added inline comments.
native/cpp/CommonCpp/NativeModules/CommUtilsModule.cpp
166 ↗(On Diff #30466)

If the JSI solution has poor performance, we can implement some validations ourselves directly for C++ std::string. Googled some examples (#1,#2)

michal added inline comments.
native/cpp/CommonCpp/NativeModules/CommUtilsModule.cpp
160–161 ↗(On Diff #30466)

My c++ is very rusty (haha) but could we skip allocation + N copy with something like this or some other construction?


Also there might be an option to just use a "view" onto the jsi::Object data instead of allocating anything. But then we would have to make sure we don't deallocate data so it's probably not worth it.

Thanks for the suggestions, I will work on performance as a follow-up: ENG-4825.