How to uncompress gzip-encoded content in PHP?

If you use PHP curl to scrape a web page, the retrieved content may be compressed, i.e., the Content-Encoding HTTP response header may have the value “gzip”. How do you uncompress the gzip content, then? You have two options:

1 Use the PHP gzdecode function. The function takes a gzip-compressed string as its parameter and returns the uncompressed string, or false if the data cannot be decoded.
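As a minimal sketch of option 1, the snippet below round-trips a string through gzencode and gzdecode. The compressed string stands in for a response body that curl_exec() returned without decoding; the variable names are only illustrative.

```php
<?php
// Simulate a gzip-encoded response body. In practice this would be the
// string returned by curl_exec() when the server sent Content-Encoding: gzip.
$original = 'Hello, gzip!';
$compressed = gzencode($original);

// gzdecode() returns the uncompressed string, or false if the data is not
// valid gzip.
$decoded = gzdecode($compressed);
if ($decoded === false) {
    throw new RuntimeException('Body is not valid gzip data');
}

echo $decoded, PHP_EOL; // prints "Hello, gzip!"
```

Checking for false matters because a body that was never compressed, or was compressed with a different format, will not decode.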

2 Set the curl option CURLOPT_ENCODING:

curl_setopt($curl, CURLOPT_ENCODING, "");
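In context, a complete request using this option might look like the sketch below. The function name fetch_decoded is hypothetical and the error handling is deliberately minimal; only the curl_* calls and options come from PHP's curl extension.

```php
<?php
// Hypothetical helper: fetch a URL and let curl decode the response body.
function fetch_decoded(string $url): string
{
    $curl = curl_init($url);

    // Return the body as a string instead of printing it.
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);

    // An empty string asks curl to advertise every encoding it supports
    // and to decode the response automatically.
    curl_setopt($curl, CURLOPT_ENCODING, '');

    $body = curl_exec($curl);
    if ($body === false) {
        throw new RuntimeException('curl request failed: ' . curl_error($curl));
    }

    return $body;
}
```

Calling fetch_decoded with the URL of the page you want to scrape should then give you the uncompressed body, provided the server used an encoding your curl build supports.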

Setting CURLOPT_ENCODING to an empty string makes curl add an “Accept-Encoding” HTTP header to the request that lists every compression format your curl build supports. Typically that is at least deflate and gzip; some builds also support others, such as br. The best part is that curl detects the Content-Encoding of the response and decodes the compressed content automatically. For example, if the returned Content-Encoding header is gzip, curl runs its gzip decoder on the returned content, so you do not need to uncompress it manually. You can also set CURLOPT_ENCODING to a specific list of formats, and curl will send that string in the Accept-Encoding header as is. But if the server returns a format that your curl build does not support, curl cannot decode the content and you need to handle it yourself. For example:

curl_setopt($curl, CURLOPT_ENCODING, "deflate,gzip,br");

curl will send “Accept-Encoding: deflate,gzip,br” in the request. If the server chooses to return br-encoded content (i.e., Content-Encoding: br) and your curl build was compiled without br support, curl cannot decode it: depending on the libcurl version, you either get the content back still compressed or the transfer fails with an error.
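If you do end up with a body that is still compressed, one way to handle it is to decode it yourself based on the Content-Encoding value. The sketch below is only an illustration: decode_body is a hypothetical helper, reading the header value from the response is left out, and the br branch assumes the PECL brotli extension (which provides brotli_uncompress) is installed.

```php
<?php
// Hypothetical helper: decode a response body according to the value of
// its Content-Encoding header.
function decode_body(string $body, string $contentEncoding): string
{
    switch (strtolower(trim($contentEncoding))) {
        case '':
        case 'identity':
            // Nothing to decode.
            return $body;
        case 'gzip':
        case 'x-gzip':
            $decoded = gzdecode($body);
            break;
        case 'deflate':
            // zlib_decode() is documented to handle raw, gzip, and zlib data.
            $decoded = zlib_decode($body);
            break;
        case 'br':
            // Assumption: the PECL brotli extension is installed.
            if (!function_exists('brotli_uncompress')) {
                throw new RuntimeException('No brotli decoder available');
            }
            $decoded = brotli_uncompress($body);
            break;
        default:
            throw new RuntimeException("Unsupported Content-Encoding: $contentEncoding");
    }

    if ($decoded === false) {
        throw new RuntimeException("Could not decode $contentEncoding content");
    }

    return $decoded;
}

echo decode_body(gzencode('hello'), 'gzip'), PHP_EOL; // prints "hello"
```

Throwing on an unknown encoding is a design choice here: it is usually safer than silently passing compressed bytes on to an HTML parser.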
