In a PHP file, you can define a string using " " or ' '. You specify the characters that make up the string between the quotes, for example 'abc' or "abc".
A \ written between ' ' or " " is special, in that it does not necessarily denote the character \ in the string. Consider a string written with " ". In most cases a \ denotes the character \ itself, but if you write a character such as t, r, n, or another \ immediately after it, the two characters in the PHP file (the \ and its immediate follower) denote only one character in memory. That character is often a non-printable control character such as a tab, carriage return, or newline, although \\ denotes a single printable \. This is why \ is called the escape character: it escapes the character that follows it.
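To see this in practice, here is a minimal sketch that measures a few double-quoted escapes with strlen and bin2hex. The variable names are only for illustration.

```php
<?php
// Each escape below is two characters in the source file but one byte in memory.
$tab = "\t";
$newline = "\n";
$backslash = "\\";

echo strlen($tab), strlen($newline), strlen($backslash), PHP_EOL; // prints 111
echo bin2hex($tab . $newline . $backslash), PHP_EOL;              // prints 090a5c

// A backslash before a character with no special meaning stays a literal backslash.
echo strlen("\q"), PHP_EOL; // prints 2
```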
However, \ escapes different sets of characters in ' ' strings and " " strings. In ' ' strings, \ can escape only ' and \ itself (the pairs \' and \\ denote ' and \, respectively). In " " strings, \ can escape many more characters, and it can escape not only its immediate follower but a run of several characters after it. For example, \115 denotes M (115 is the octal code of M) and \61 denotes 1 (61 is the octal code of 1), while \x4d denotes M (4d is the hexadecimal code of M).
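The difference between the two quoting styles can be checked with a short sketch like this one (the sample strings are made up for illustration):

```php
<?php
// In single quotes only \' and \\ are escapes; everything else is literal.
$single = 'It\'s a \\ and a \n';
echo $single, PHP_EOL;       // prints It's a \ and a \n
echo strlen('\n'), PHP_EOL;  // prints 2 (a backslash and the letter n)
echo strlen("\n"), PHP_EOL;  // prints 1 (a line feed)

// In double quotes, octal and hex escapes consume several characters.
$octalM = "\115";
$octalOne = "\61";
$hexM = "\x4d";
echo $octalM, $octalOne, $hexM, PHP_EOL; // prints M1M
var_dump($octalM === $hexM);             // prints bool(true)
```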
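A Unicode escape written as \u738b is not one of these sequences, but the string "\u738b" can be converted to its UTF-8 encoding, for example by decoding it as a JSON string with json_decode('"\u738b"'), so that the Chinese character 王 is printed out. You can also use: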
echo mb_convert_encoding('&#29579;', 'UTF-8', 'HTML-ENTITIES');
where the string &#29579; (an HTML entity that denotes the Chinese character 王) is converted to the UTF-8 encoding of that character. Or you can use:
echo mb_convert_encoding("\x73\x8b", 'UTF-8', 'UTF-16BE');
where the string \x73\x8b (the UTF-16BE encoding of the Chinese character) is converted to the UTF-8 encoding of that character and printed out.
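The following sketch puts several routes to the same character side by side and compares the resulting bytes. It rests on a few assumptions: the json and mbstring extensions are loaded, PHP is version 7.0 or later (needed for the "\u{...}" escape), and html_entity_decode is swapped in for the HTML-ENTITIES conversion, which, as far as I know, is deprecated in mb_convert_encoding as of PHP 8.2.

```php
<?php
// Every expression below should yield the UTF-8 bytes e7 8e 8b for U+738B.

// JSON-style escape: json_decode understands \uXXXX inside a JSON string.
$fromJson = json_decode('"\u738b"');

// Numeric HTML entity: 29579 is the decimal form of 0x738B.
$fromEntity = html_entity_decode('&#29579;', ENT_QUOTES | ENT_HTML5, 'UTF-8');

// UTF-16BE bytes 0x73 0x8B converted to UTF-8.
$fromUtf16 = mb_convert_encoding("\x73\x8b", 'UTF-8', 'UTF-16BE');

// Unicode code point escape, available in double quotes since PHP 7.0.
$fromEscape = "\u{738b}";

echo bin2hex($fromJson), PHP_EOL; // prints e78e8b
var_dump($fromJson === $fromEntity && $fromEntity === $fromUtf16 && $fromUtf16 === $fromEscape);
// prints bool(true)
```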
Note that strlen returns the number of bytes, not the number of characters, in the string. So strlen("汉") == 3 and strlen("a") == 1. It is therefore impossible to get the nth character of a UTF-8 string with []. Given $str = "a汉1";, $str[1] is not "汉", because $str[1] accesses the second byte, not the second character, of the string. Iterating over a UTF-8 encoded string also becomes more complex. The following classic piece of code won't work.
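A byte-wise loop of the usual kind shows the problem. This is a sketch that assumes the source file is saved as UTF-8, and the $bytes array is only there to make the result visible.

```php
<?php
$str = "a汉1"; // assumes this file is saved as UTF-8

echo strlen($str), PHP_EOL;     // prints 5: 1 + 3 + 1 bytes
echo bin2hex($str[1]), PHP_EOL; // prints e6: only the first byte of 汉

// Byte-wise loop: the three bytes of 汉 are visited separately.
$bytes = [];
for ($i = 0; $i < strlen($str); $i++) {
    $bytes[] = bin2hex($str[$i]);
}
echo implode(' ', $bytes), PHP_EOL; // prints 61 e6 b1 89 31
```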
You can use the following code:
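One way to do this, sketched under the assumption that the mbstring extension is loaded, is to ask mb_strlen for the character count and pull out one character at a time with mb_substr. The names $length and $chars are illustrative.

```php
<?php
$str = "a汉1"; // assumes this file is saved as UTF-8

// Count characters, not bytes.
$length = mb_strlen($str, 'UTF-8');

// Extract one character per iteration.
$chars = [];
for ($i = 0; $i < $length; $i++) {
    $chars[] = mb_substr($str, $i, 1, 'UTF-8');
}

echo $length, PHP_EOL;              // prints 3
echo implode('|', $chars), PHP_EOL; // prints a|汉|1
```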
Or you can use a faster way:
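One candidate, offered as an assumption rather than as the definitive fast path, is to split the string once with preg_split and the u modifier. This avoids a separate mb_substr call for every index, each of which has to locate the character by scanning the string.

```php
<?php
$str = "a汉1"; // assumes this file is saved as UTF-8

// The empty pattern with the u modifier matches between UTF-8 characters;
// PREG_SPLIT_NO_EMPTY drops the empty pieces at both ends.
$chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);

echo count($chars), PHP_EOL;        // prints 3
echo implode('|', $chars), PHP_EOL; // prints a|汉|1
```

On PHP 7.4 or later, mb_str_split($str, 1, 'UTF-8') should give the same array.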
You can use echo bin2hex($str) to print the bytes of the string $str in hexadecimal, which is helpful when debugging encoding problems.
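For instance, a small wrapper that puts a space between bytes makes it easy to tell encodings apart. Here hexDump is a hypothetical helper name, not a built-in function, and the second call assumes the mbstring extension is loaded.

```php
<?php
// hexDump is a hypothetical helper, not part of PHP itself.
function hexDump(string $s): string
{
    // bin2hex gives two hex digits per byte; regroup them with spaces.
    return implode(' ', str_split(bin2hex($s), 2));
}

$utf8 = "汉"; // assumes this file is saved as UTF-8
$utf16 = mb_convert_encoding($utf8, 'UTF-16BE', 'UTF-8');

echo hexDump($utf8), PHP_EOL;  // prints e6 b1 89
echo hexDump($utf16), PHP_EOL; // prints 6c 49
```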