A confusing PHP regular expression

I needed a regular expression to match the Unicode escape codes of Chinese characters. The codes have the form \uxxxx, where each x is a hexadecimal digit. To match them, I first tried the following regular expression:
"#\u[0-9a-f]{4}#"
Unfortunately, I got the following error:
Compilation failed: PCRE does not support \L, \l, \N{name}, \U, or \u at offset 1
Changing the regular expression to the following produces the same error:
"#\\u[0-9a-f]{4}#"
In fact, both PHP string literals pass the same string, containing a single \, to PHP's regular expression module (refer to this post). The error occurs because the regular expression module always treats \ as an escape character, and \u is one of the escape sequences that PCRE does not support, so compilation fails. This differs from PHP's string parser, where \ is left as an ordinary character when it cannot escape the character that follows it. To solve the problem, the regular expression module must receive \\, which tells it that you mean an ordinary \ character rather than an escape. So you need to type four \ in your PHP file:
"#\\\\u[0-9a-f]{4}#"
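
The two layers of escaping can be checked with a short script. This is only a minimal sketch: the sample text and variable names are made up for illustration, and single-quoted strings are used so that PHP itself only interprets \\ and \'. The failing call is silenced with @ because PHP reports the compilation failure as a warning and returns false.

```php
<?php
// Made-up sample input containing two literal \uXXXX codes.
$text = 'name: \u4e2d\u6587';

// Both literals produce the same string, with ONE backslash before the u.
$sameString = ('#\u[0-9a-f]{4}#' === '#\\u[0-9a-f]{4}#');
var_dump($sameString); // bool(true)

// PCRE receives \u, which it does not support, so compilation fails.
$bad = @preg_match('#\\u[0-9a-f]{4}#', $text);
var_dump($bad); // bool(false)

// Four backslashes in the PHP source become two in the string,
// which PCRE reads as one literal backslash.
$pattern = '#\\\\u[0-9a-f]{4}#';
$count = preg_match_all($pattern, $text, $matches);
var_dump($count); // int(2)
print_r($matches[0]); // the two matched codes: \u4e2d and \u6587
```

Note that [0-9a-f] only matches lowercase hex digits; adding the i modifier after the closing # would also accept uppercase ones.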
