Depends on D4505
Fix the function that returns the length of a UTF-8 string. It now returns the number of characters in a string that contains multibyte characters, rather than the number of bytes the string occupies.
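As a rough illustration of the difference between the two lengths, here is a minimal sketch of one common way to count UTF-8 characters (code points): count every byte that is not a continuation byte of the form 10xxxxxx. The helper name `utf8_length` is hypothetical and is not necessarily how the function in this diff is implemented. The sketch also assumes the input is already valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper for illustration only; not the function from this diff.
// Counts UTF-8 code points by skipping continuation bytes (bit pattern 10xxxxxx).
// Assumption: the input is valid UTF-8. No validation is performed.
std::size_t utf8_length(const std::string &str) {
  std::size_t count = 0;
  for (unsigned char c : str) {
    // Every code point has exactly one byte that is NOT a continuation byte.
    if ((c & 0xC0) != 0x80) {
      ++count;
    }
  }
  return count;
}
```

For the string "zasiąść" (bytes 7A 61 73 69 C4 85 C5 9B C4 87), `size()` reports 10 bytes, while this helper would report 7 code points.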
Differential D4544
[services] Backup - Add function to get utf8 string length
Authored by jakub on Jul 15 2022, 3:01 AM.
Event Timeline

This looks much cleaner than before! Can you either explain how this works, or link a resource that does?

My questions have not been answered, but this diff has appeared in my queue. If you update a diff but aren't yet ready for review, please hit "Plan Changes" to keep the diff off your reviewers' queues.

This function seems to be redundant, because the standard C++ string methods "length" and "size" return the exact size of the string in bytes.

Example code, using the UTF8-CPP library for comparison and for conversion from string to hex values:

  #include "utf8/utf8.h"

  #include <iostream>
  #include <string>

  using namespace std;

  // Print the byte count and the UTF-8 code point count of a string.
  void show(string str) {
    cout << "calculate for [" << str << "]" << endl;
    cout << "normal size: " << str.size() << endl;
    cout << "utf-8 size: " << utf8::distance(str.begin(), str.end()) << endl;
  }

  // Render each byte of the input as two uppercase hex digits plus a space.
  std::string string_to_hex(const std::string& input) {
    static const char hex_digits[] = "0123456789ABCDEF";

    std::string output;
    output.reserve(input.length() * 3);
    for (unsigned char c : input) {
      output.push_back(hex_digits[c >> 4]);
      output.push_back(hex_digits[c & 15]);
      output.push_back(' ');
    }
    return output;
  }

  int main() {
    string s = "zasiąść";
    show(s);
    // the same output as from https://mothereff.in/utf-8
    cout << string_to_hex(s) << endl;
  }

  /**
   * output:
   calculate for [zasiąść]
   normal size: 10
   utf-8 size: 7
   7A 61 73 69 C4 85 C5 9B C4 87
   */

  /*
   * bytes from https://mothereff.in/utf-8
   \x7A\x61\x73\x69\xC4\x85\xC5\x9B\xC4\x87
   */

The results were exactly the same.