php iconv

PHP's iconv function converts a string from one character encoding (charset) to another.

$text="abc";
$text = iconv('utf-8', 'us-ascii', $text);

In the above example, the source file is saved as utf-8, so $text holds the bytes of the utf-8 encoding of “abc”. A PHP string is just a sequence of bytes and carries no encoding information, so you must tell iconv the source encoding in the first parameter. The second parameter is the encoding of the converted string. Knowing both, iconv converts the source bytes into the byte sequence that represents the same text in the target encoding.
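To make the byte-level conversion visible, here is a minimal sketch. It assumes the iconv extension is enabled and uses ISO-8859-1 as the target, an encoding picked only for illustration because “é” has different bytes in the two encodings. The string is written with byte escapes so the result does not depend on how the file is saved.

```php
<?php
// "é" as utf-8 bytes (0xC3 0xA9), written with escapes so the
// encoding of this source file does not matter.
$utf8 = "\xC3\xA9";

// First parameter: source encoding. Second parameter: target encoding.
$latin1 = iconv('UTF-8', 'ISO-8859-1', $utf8);

echo bin2hex($utf8), "\n";   // c3a9
echo bin2hex($latin1), "\n"; // e9
```

The text is the same in both variables; only the bytes that represent it differ.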

In most cases, the first parameter of iconv is “utf-8” because most source code editors save files as utf-8. But if your source code is saved in another encoding, you need to tell iconv the correct source character set.
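The sketch below illustrates why the first parameter has to match the real encoding of the bytes. The input values are hypothetical: “é” once as a single ISO-8859-1 byte and once as utf-8 bytes that are wrongly declared as ISO-8859-1.

```php
<?php
// "é" as it would be stored in a file saved as ISO-8859-1 (one byte, 0xE9).
$latin1 = "\xE9";

// Correct: the declared source charset matches the bytes.
$good = iconv('ISO-8859-1', 'UTF-8', $latin1);
echo bin2hex($good), "\n"; // c3a9

// Wrong: these bytes are already utf-8, but we claim they are ISO-8859-1.
// Each byte is treated as its own character, which yields mojibake ("Ã©").
$bad = iconv('ISO-8859-1', 'UTF-8', "\xC3\xA9");
echo bin2hex($bad), "\n";  // c383c2a9
```

iconv cannot detect the mistake in the second call, because every byte is a valid ISO-8859-1 character.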

In the example above, it seems that no conversion is actually done, because the source byte sequence is exactly the same as the converted byte sequence. The utf-8 encoding of “abc” is exactly the same as the ascii encoding of “abc”, isn’t it? So why bother to use iconv?

In some cases, iconv can be used to sanitize the source string. For example, a blog platform such as WordPress has to sanitize a post title to generate a slug (post_name) for the post, and iconv is one way to do that. A slug may only contain a limited set of ascii characters. See the following example:

$posttitle="ab€c";
$slug = iconv('utf-8', 'us-ascii', $posttitle);

Here, the original text contains a euro sign, which is not expected to occur in a slug. We can use iconv to convert the original string to a pure ascii string. But since € has no counterpart in the us-ascii charset, the conversion will fail: iconv returns false and raises the following error:

Notice: iconv(): Detected an illegal character in input string in ….
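Because iconv signals this failure through its return value as well as the notice, the result can be checked in code. This is a minimal sketch, assuming an iconv implementation that rejects unconvertible characters (some implementations substitute a placeholder character instead of failing).

```php
<?php
// "ab€c" with the euro sign written as its utf-8 bytes (0xE2 0x82 0xAC).
$posttitle = "ab\xE2\x82\xACc";

// The @ operator suppresses the notice so the failure can be handled here.
$slug = @iconv('UTF-8', 'US-ASCII', $posttitle);

if ($slug === false) {
    // Typical case: € cannot be represented in us-ascii.
    echo "conversion failed\n";
} else {
    // Reached only if the implementation substituted something for €.
    echo $slug, "\n";
}
```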

You have two options here:

  • ignore the €:
    $posttitle="ab€c";
    $slug= iconv('utf-8', 'us-ascii//IGNORE', $posttitle);

    The resulting string will be “abc”.

  • use similar characters in the ascii charset to replace the €:
    $posttitle="ab€c";
    $slug= iconv('utf-8', 'us-ascii//TRANSLIT', $posttitle);

    The converted string will be “abEURc”, i.e., iconv replaces the € with three ascii chars (EUR). Do not expect too much from iconv, though. It is not smart enough to find a similar character in the target charset for every character in the source charset, and how //TRANSLIT behaves depends on the system's iconv implementation. If it does not know how to transliterate a character, it may report the error “Detected an illegal character in input string in ….”, as in the following example:

    $posttitle="ab€汉c";
    $slug= iconv('utf-8', 'us-ascii//TRANSLIT', $posttitle);
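The two options can be combined into a fallback chain. The helper below is a hypothetical sketch (make_slug is an illustrative name, not WordPress's implementation): it tries //TRANSLIT first, falls back to //IGNORE, and finally strips non-ascii bytes by hand if both conversions fail.

```php
<?php
// Hypothetical helper for illustration only.
function make_slug(string $title): string
{
    // Option 2 first: replace characters with ascii look-alikes.
    $ascii = @iconv('UTF-8', 'US-ASCII//TRANSLIT', $title);
    if ($ascii === false) {
        // Option 1: drop characters that have no ascii counterpart.
        $ascii = @iconv('UTF-8', 'US-ASCII//IGNORE', $title);
    }
    if ($ascii === false) {
        // Last resort: remove every non-ascii byte manually.
        $ascii = preg_replace('/[^\x00-\x7F]/', '', $title);
    }
    // Lowercase, then collapse runs of other characters into one "-".
    $slug = preg_replace('/[^a-z0-9]+/', '-', strtolower($ascii));
    return trim($slug, '-');
}

echo make_slug("Hello World"), "\n"; // hello-world
```

For titles such as “ab€汉c” the exact slug depends on the system's iconv implementation, but the result contains only lowercase letters, digits, and hyphens.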

     

     
