How to convert a Unicode code point to a QString?

Occasionally, we need to convert a number that represents the code point of a character to a QString or QChar. You might be tempted to write the following:

QString s(0x738b);

Here you want to construct a QString from the Chinese character whose code point is 0x738b. However, if you look up QString in the Qt reference documentation, you will not find a QString constructor that takes a number as its parameter. According to this post, the above line actually calls QString::QString(QChar ch), i.e., the constant 0x738b is first used to initialize a QChar object, which is then used to initialize the QString object. QChar does have a constructor QChar::QChar(ushort code) that constructs a QChar from a code point no greater than 0xffff. Note that QChar has another constructor, QChar::QChar(uint code), that seems to accept code points greater than 0xffff, but it actually uses only the lower 2 bytes of the code. A QChar stores a single 2-byte UTF-16 code unit, and QString also stores its characters in UTF-16.
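
To see why keeping only the lower 2 bytes is a problem, here is a minimal standard C++ sketch that mimics the truncation described above. It does not call Qt, and lowSixteenBits is a hypothetical helper name used only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of Qt): keep only the low 16 bits of a
// code point, which is the effect described above for values above 0xffff.
inline std::uint16_t lowSixteenBits(std::uint32_t codePoint)
{
    assert(codePoint <= 0x10FFFF);  // largest valid Unicode code point
    return static_cast<std::uint16_t>(codePoint & 0xFFFF);
}
```

A code point such as 0x738b passes through unchanged, while 0x1F64B comes out as 0xF64B, which is a different character altogether.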

So how do you convert a code point greater than 0xffff to a QChar or QString? You cannot convert it to a single QChar, because a QChar has only 2 bytes and cannot hold the code point. But you can convert it to a QString. Since QString uses UTF-16 encoding, it will use 4 bytes (2 QChars) to store the character. There are several ways to do this.

First, use the static function QString QString::fromUcs4(const uint *unicode, int size = -1).

uint ch=0x1F64B;
QString func=QString::fromUcs4(&ch, 1);

Here, we pass an array of code points to fromUcs4, which converts the first size (in this case 1) 4-byte UCS-4 values to a QString.

Second, you can use the QString::QString(const QChar *unicode, int size = -1) constructor.

QChar utf16[2] = { 0xD83D, 0xDE4B };
QString str1 = QString(utf16, 2);

Don’t assume this method only works for code points below 0xffff. A character whose code point is greater than 0xffff is represented by 2 QChars (here the high surrogate 0xD83D and the low surrogate 0xDE4B). The QString is constructed from this surrogate pair and contains only one character, although its length in QChars is 2. This method requires you to know the UTF-16 encoding of the character.
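
If you only have the code point, the surrogate pair can be computed with a little arithmetic. The following is a minimal sketch in standard C++, not Qt code; toSurrogatePair is a hypothetical helper name, and it assumes the input lies between 0x10000 and 0x10FFFF.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical helper (not part of Qt): split a code point above 0xffff
// into its UTF-16 high and low surrogates.
inline std::pair<std::uint16_t, std::uint16_t> toSurrogatePair(std::uint32_t codePoint)
{
    assert(codePoint > 0xFFFF && codePoint <= 0x10FFFF);
    const std::uint32_t v = codePoint - 0x10000;  // 20-bit offset
    // The top 10 bits go into the high surrogate, the bottom 10 into the low one.
    const std::uint16_t high = static_cast<std::uint16_t>(0xD800 + (v >> 10));
    const std::uint16_t low = static_cast<std::uint16_t>(0xDC00 + (v & 0x3FF));
    return {high, low};
}
```

For 0x1F64B this yields 0xD83D and 0xDE4B, the two values used in the array above. Qt also provides the static functions QChar::highSurrogate and QChar::lowSurrogate for the same purpose, so in real Qt code you would probably prefer those.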

Third, use the static function QString::fromUtf8(const char *str, int size = -1).

char utf8[4] = { (char)0xF0, (char)0x9F, (char)0x99, (char)0x8B };
QString str2 = QString::fromUtf8(utf8, 4);

Here, you pass a char array that stores the UTF-8 encoding of code point 0x1F64B to QString::fromUtf8, which converts it to the UTF-16 encoding used by QString. To use this method, you must know the UTF-8 bytes of the character.
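
If the UTF-8 bytes are not known in advance, they can be derived from the code point. Below is a minimal sketch in standard C++ that follows the usual UTF-8 bit layout; encodeUtf8 is a hypothetical helper, not a Qt function, and it assumes a valid code point that is not a surrogate.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of Qt): encode one code point as UTF-8.
inline std::string encodeUtf8(std::uint32_t cp)
{
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::string out;
    if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes: 11110xxx followed by three 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For 0x1F64B the result is the four bytes F0 9F 99 8B shown above; the returned string's data() and size() could then be handed to QString::fromUtf8.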

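The methods above assume that the code point, its UTF-16 code units, or its UTF-8 bytes are stored in variables. If the values are known when you write the code, you can also put them in string literals in the source file. Note that QString::fromWCharArray interprets its input as UTF-16 when wchar_t is 2 bytes (as on Windows) and as UCS-4 when wchar_t is 4 bytes, so writing the surrogate pair in a wide literal as below assumes a 2-byte wchar_t: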

QString str1 = QString::fromWCharArray(L"\xD83D\xDE4B");
QString str2 = QString::fromUtf8("\xF0\x9F\x99\x8B");

 
