PHP strings

In a PHP file, you can define a string using double quotes (" ") or single quotes (' '). You write the characters that make up the string between the quotes, for example:

$str='abc';
$str="def";
$str='a\bc';

A \ written between ' ' or " " is special, in that it does not necessarily denote the character \ in the string. Consider a string written with " ". In most cases a \ denotes the character \ itself, but if it is immediately followed by certain characters such as t, r, n, or another \, the two characters in the PHP file (the \ and its immediate follower) denote only one character in memory, and that character is often not printable (\t is a tab, \r a carriage return, \n a line feed). This is why \ is called the escape character: \ escapes the character that follows it.

But \ escapes different sets of characters in ' ' strings and " " strings. In single-quoted strings, \ can only escape ' and \ itself (\' denotes ' and \\ denotes \). In double-quoted strings, \ can escape many more characters, and an escape can cover several following characters, not just the immediate follower. For example, \115 denotes M (115 is the octal code of M), \61 denotes 1 (61 is the octal code of 1), and \x4d denotes M (4d is the hexadecimal code of M).
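To make the difference between the two quoting styles concrete, here is a small sketch that writes the same source text in single and in double quotes and compares the results. The variable names are only illustrative, and the values in the comments follow from the escape rules described above.

```php
<?php
// Single quotes: only \' and \\ are escapes, everything else is literal.
$single = 'a\tb';       // 4 characters: a, \, t, b
$quote  = 'It\'s';      // \' gives a plain apostrophe
$slash  = 'C:\\dir';    // \\ gives one backslash

// Double quotes: \t, \n, \r, octal and hexadecimal escapes are interpreted.
$double = "a\tb";       // 3 characters: a, TAB, b
$octal  = "\115";       // octal 115 is M
$digit  = "\61";        // octal 61 is the digit 1
$hex    = "\x4d";       // hexadecimal 4d is M

echo strlen($single), "\n";       // 4
echo strlen($double), "\n";       // 3
echo bin2hex($double), "\n";      // 610962
echo $octal, $hex, $digit, "\n";  // MM1
```

Note that a backslash followed by a character that is not a recognised escape, such as the \b in 'a\bc', stays in the string as two separate characters.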

We now know that a character can be written not only by typing the character itself but also by typing, after the escape \, the number that represents it. This is where some confusion creeps in. You have probably seen something like \uxxxx, such as \u738b. Is it another kind of escape? You may have been told that \u738b is the Unicode value of a Chinese character. Unfortunately, PHP does not treat \uxxxx as an escape. If you type \u738b in a string, it stands for a string of 6 characters. Here the JavaScript escape has been confused with the PHP escape. JavaScript strings do use \uxxxx to denote a character (whether it is an English character or another kind of character such as a Chinese one), but PHP has no such escape. (Since PHP 7.0, double-quoted strings accept the braced form \u{738b}, but still not the bare \u738b.)

This does not mean you cannot use numbers to denote Chinese characters in a PHP string. To do so, you need to know the encoding of the character, then type the encoded bytes one by one using \ddd (octal digits) or \xdd (hexadecimal digits). For example, the character whose Unicode value is \u738b has the UTF-8 encoding E7 8E 8B, so in your PHP file echo "\xE7\x8E\x8B"; prints the Chinese character (make sure the browser reads the page as UTF-8, for example by choosing Unicode in its View/Character Encoding menu). A simpler method is to use the json_decode function:

echo json_decode('"'.'\u738b'.'"');

Here the JSON string "\u738b" is decoded into its UTF-8 encoded form, and the Chinese character is printed. You can also use:

echo mb_convert_encoding('&#x738b;', 'UTF-8', 'HTML-ENTITIES');

where the string &#x738b; (an HTML entity that denotes the Chinese character) is converted to the UTF-8 encoding of that character. Note that the HTML-ENTITIES pseudo-encoding is deprecated as of PHP 8.2, where html_entity_decode is the usual replacement. Or you can use:

echo mb_convert_encoding("\x73\x8b", 'UTF-8', 'UTF-16BE');

where the string \x73\x8b (the UTF-16BE encoding of the Chinese character) is converted to the UTF-8 encoding of that character and printed.
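As a quick cross-check, the sketch below builds the same character in several ways and confirms that every result is the same three UTF-8 bytes. It makes some assumptions that are not in the text above: html_entity_decode is swapped in for the HTML-ENTITIES conversion, the braced \u{738b} form needs PHP 7.0 or later, and the mbstring variant is only added when that extension is loaded. The array keys are just labels.

```php
<?php
// Each entry should be the UTF-8 encoding of U+738B, i.e. the bytes E7 8E 8B.
$variants = [
    'raw bytes'   => "\xE7\x8E\x8B",              // bytes typed one by one
    'json_decode' => json_decode('"\u738b"'),     // JSON understands \uXXXX
    'html entity' => html_entity_decode('&#x738b;', ENT_QUOTES, 'UTF-8'),
    'braced \u'   => "\u{738b}",                  // PHP 7.0+ only
];

// mbstring is an optional extension, so only use it when it is available.
if (function_exists('mb_convert_encoding')) {
    $variants['utf-16be'] = mb_convert_encoding("\x73\x8b", 'UTF-8', 'UTF-16BE');
}

foreach ($variants as $label => $s) {
    // bin2hex shows the actual bytes, independent of how the terminal renders them.
    echo $label, ': ', bin2hex($s), "\n";
}
```

Every line of output should end in e78e8b. If one does not, that variant produced a different byte sequence.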

Note that strlen returns the number of bytes, not the number of characters, in the string. So strlen("汉")==3 and strlen("a")==1 when the file is saved as UTF-8. For the same reason you cannot get the nth character of a UTF-8 string with []. With $str="a汉1";, $str[1] is not "汉", because $str[1] accesses the second byte of the string, not the second character. Iterating over a UTF-8 encoded string also becomes more complex. The following classic piece of code does not work character by character, because each $str[$i] is a single byte:

for($i=0;$i<strlen($str);$i++)
    echo $str[$i];

You can use the following code:

// mb_strlen counts characters; mb_substr extracts one character at a time
for($i=0;$i<mb_strlen($str,'UTF-8');$i++)
    echo mb_substr($str,$i,1,'UTF-8');

Or you can use a faster way:

$chars=preg_split('//u',$str,-1,PREG_SPLIT_NO_EMPTY);
for($i=0;$i<count($chars);$i++)
    echo $chars[$i];
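The difference between the byte view and the character view can be seen side by side in the following sketch. It assumes the PHP file itself is saved as UTF-8, and the variable names are only illustrative.

```php
<?php
// Assumes this source file is saved as UTF-8.
$str = "a汉1";

// Byte view: strlen and [] work on bytes, so the Chinese character
// shows up as three separate one-byte strings.
$bytes = [];
for ($i = 0; $i < strlen($str); $i++) {
    $bytes[] = bin2hex($str[$i]);
}

// Character view: the /u modifier makes PCRE treat the subject as UTF-8,
// so splitting on the empty pattern yields one array entry per character.
$chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);

echo implode(' ', $bytes), "\n";  // 61 e6 b1 89 31
echo count($chars), "\n";         // 3
echo $chars[1], "\n";             // 汉
```

On PHP 7.4 or later with mbstring loaded, mb_str_split($str, 1, 'UTF-8') should give the same array as the preg_split call.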

You can use echo bin2hex($str) to print the bytes of the string in hexadecimal, which is helpful when debugging encoding problems.
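For example, a small helper along these lines makes the output easier to read by separating the bytes. The name hexDump is hypothetical and not a PHP built-in. It only combines bin2hex with str_split.

```php
<?php
// Hypothetical helper: prints the bytes of a string as space-separated hex pairs.
function hexDump(string $s): string
{
    // bin2hex gives two hex digits per byte; str_split groups them again.
    return implode(' ', str_split(bin2hex($s), 2));
}

echo hexDump("a\tb"), "\n";          // 61 09 62
echo hexDump("\xE7\x8E\x8B"), "\n";  // e7 8e 8b (UTF-8)
echo hexDump("\x73\x8B"), "\n";      // 73 8b (UTF-16BE)
```

Seeing e7 8e 8b rather than 73 8b tells you at a glance whether a string holds the UTF-8 or the UTF-16BE form of the character.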

 
