I ran into a strange problem with PHP cURL. I was using it to fetch a remote web page as follows:
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, "https://myprogrammingnotes.com");
curl_setopt($curl, CURLOPT_HEADER, 1);
curl_setopt($curl, CURLOPT_FRESH_CONNECT, 1);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, 1);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($curl, CURLOPT_HTTPHEADER, $headers);
curl_setopt($curl, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_0);
$results = curl_exec($curl);
After retrieving the web page, I echoed the HTTP headers and the content. Strangely, after loading for a long time, the browser reported the following error:
The page isn’t redirecting properly
Firefox has detected that the server is redirecting the request for this address in a way that will never complete.
This problem can sometimes be caused by disabling or refusing to accept cookies.
It turned out that the remote server responded with a 301 redirect instead of 200 OK. Even stranger, the 301 response had a Location header containing the same URL as the original request. So the browser kept loading the same URL. After many attempts, it gave up and reported the error message above.
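A redirect of this kind can be spotted in the script before anything is echoed to the browser. The following is a minimal sketch, not code from my original script: isSelfRedirect is a hypothetical helper, and it assumes the raw header block is available as a string (for example, the header part of $results when CURLOPT_HEADER is on).

```php
<?php
// Hypothetical helper (an assumption, not part of the original script):
// returns true when a raw response header block is a 3xx redirect whose
// Location header points back at the URL that was requested.
function isSelfRedirect(string $requestUrl, string $rawHeaders): bool
{
    // Status line, e.g. "HTTP/1.1 301 Moved Permanently".
    if (!preg_match('#^HTTP/\S+\s+(\d{3})#m', $rawHeaders, $status)) {
        return false;
    }
    $code = (int) $status[1];
    if ($code < 300 || $code >= 400) {
        return false;
    }
    // Header names are case-insensitive, hence the "i" flag.
    if (!preg_match('/^Location:\s*(\S+)/mi', $rawHeaders, $location)) {
        return false;
    }
    // Ignore a trailing slash so that "https://example.com" and
    // "https://example.com/" are treated as the same URL.
    return rtrim($location[1], '/') === rtrim($requestUrl, '/');
}

// Sample header block modeled on the response described above.
$sample = "HTTP/1.1 301 Moved Permanently\r\n"
        . "Location: https://myprogrammingnotes.com/\r\n\r\n";
var_dump(isSelfRedirect('https://myprogrammingnotes.com', $sample)); // bool(true)
```

The comparison is deliberately simple (exact match apart from a trailing slash); a stricter check would normalize the scheme, host case, and default port as well.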
Since the command-line curl could fetch https://myprogrammingnotes.com without a problem, I thought something must be wrong with the HTTP headers I was sending through PHP cURL. I opened https://myprogrammingnotes.com in the browser, captured the HTTPS packets with Wireshark, and recorded all the HTTP headers the browser sent. Then I added these headers (Accept, Accept-Language, Accept-Encoding, etc.) in PHP using curl_setopt($curl, CURLOPT_HTTPHEADER, $headers). Unfortunately, the problem persisted.
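For reference, CURLOPT_HTTPHEADER expects a plain array of "Name: value" strings. Below is a rough sketch of how such a $headers array could be built. buildHeaderLines is a hypothetical helper of my own, and the Accept-Language value is only an illustrative assumption; the Accept and Accept-Encoding values are the ones that appear in the request log later in this post.

```php
<?php
// Hypothetical helper: turn a name => value map into the list of
// "Name: value" strings that CURLOPT_HTTPHEADER expects.
function buildHeaderLines(array $headers): array
{
    $lines = [];
    foreach ($headers as $name => $value) {
        $lines[] = $name . ': ' . $value;
    }
    return $lines;
}

$headers = buildHeaderLines([
    'Accept'          => 'text/html,application/xhtml+xml,application/xml,*/*',
    'Accept-Language' => 'en-US,en;q=0.5', // illustrative value, an assumption
    'Accept-Encoding' => 'gzip,deflate,br',
]);

// The array would then be passed to cURL like this:
// curl_setopt($curl, CURLOPT_HTTPHEADER, $headers);
print_r($headers);
```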
The best way to find the root cause of the endless redirect is to capture the traffic between the client and the server, and compare it with a successful HTTP transaction from the command-line curl or with the packets captured by Wireshark on my desktop. I know tcpdump can capture packets on Linux, but it cannot decrypt the SSL traffic that HTTPS uses. Packet sniffers such as ssldump, tshark, and mitmproxy are said to be able to decode HTTPS packets on CentOS. But even if they can handle HTTPS traffic, I doubt they are easy to use: I have configured Wireshark to decode HTTPS before, and it is not trivial. You need to set the environment variable SSLKEYLOGFILE so that the session keys are recorded in a file, and then point Wireshark at that key file. I do not even know how to make a PHP script save its key file.
Then I wondered whether PHP cURL has its own logging mechanism. It turns out it does. We can set CURLOPT_VERBOSE to make cURL generate verbose log information and CURLOPT_STDERR to save that log to a file (reference).
$curl = curl_init();
$f = tmpfile();
curl_setopt($curl, CURLOPT_VERBOSE, 1);
curl_setopt($curl, CURLOPT_STDERR, $f);
curl_setopt($curl, CURLOPT_URL, "https://myprogrammingnotes.com");
curl_setopt($curl, CURLOPT_HEADER, 1);
curl_setopt($curl, CURLOPT_FRESH_CONNECT, 1);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, 1);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($curl, CURLOPT_HTTPHEADER, $headers);
curl_setopt($curl, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_0);
$results = curl_exec($curl);
fseek($f, 0);
echo fread($f, 32*1024); # output up to 32 KB cURL verbose log
fclose($f);
The logged output of cURL looks like this:
About to connect() to myprogrammingnotes.com port 443 (#1)
* Trying 18.104.22.168...
* Connected to myprogrammingnotes.com (22.214.171.124) port 443 (#1)
* warning: ignoring value of ssl.verifyhost
* skipping SSL peer certificate verification
* SSL connection using TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
* Server certificate:
* subject: CN=drd546445.cloudflaressl.com,OU=PositiveSSL Multi-Domain,OU=Domain Control Validated
* start date: Aug 07 00:00:00 2018 GMT
* expire date: Feb 13 23:59:59 2019 GMT
* common name: drd546445.cloudflaressl.com
* issuer: CN=COMODO ECC Domain Validation Secure Server CA 2,O=COMODO CA Limited,L=Salford,ST=Greater Manchester,C=GB
> GET / HTTP/1.0
Accept-Encoding: gzip,deflate,br
Cookie:
User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:49.0) Gecko/20112122 Firefox/49.0
Host: myprogrammingnotes.com:443
Accept: text/html,application/xhtml+xml,application/xml,*/*
Referer: https://myprogrammingnotes.com.com/
< HTTP/1.1 301 Moved Permanently
< Date: Tue, 11 Dec 2018 16:47:31 GMT
< Content-Type: text/html; charset=UTF-8
< Connection: close
< Expires: Thu, 19 Nov 1981 16:52:00 GMT
< Cache-Control: no-store, no-cache, must-revalidate
< Pragma: no-cache
< Vary: Accept-Encoding, Cookie,Accept-Encoding
< X-Pingback: https://myprogrammingnotes.com/xmlrpc.php
< Location: https://myprogrammingnotes.com/
< Accept-Ranges: bytes
< X-Turbo-Charged-By: LiteSpeed
< Expect-CT: max-age=606700, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
< Server: cloudflare
< CF-RAY: 4876001154476-SEA
<
* Closing connection 1
The log clearly shows the SSL handshake, the HTTP request, and the response. Comparing the HTTP request with that of the command-line curl (you can get its verbose output with curl -v https://myprogrammingnotes.com), I noticed the difference: PHP cURL sent GET / HTTP/1.0, while the command-line curl sent GET / HTTP/1.1. That is the point. The remote server supports HTTP 1.1 but not HTTP 1.0, yet we told PHP cURL to use HTTP 1.0 through the CURLOPT_HTTP_VERSION option. The solution is simple: just comment out the line curl_setopt($curl, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_0) and everything works.
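The comparison I did by eye can also be done in code. Here is a small sketch, assuming the verbose log has already been read into a string: summarizeVerboseLog is a hypothetical helper that relies only on the ">" and "<" prefixes cURL puts on the request line and the response status line, as seen in the log above.

```php
<?php
// Hypothetical helper: pull the request line and the response status out of
// a cURL verbose log, using the ">" (sent) and "<" (received) prefixes.
function summarizeVerboseLog(string $log): array
{
    $summary = ['request' => null, 'status' => null];
    // Request line, e.g. "> GET / HTTP/1.0".
    if (preg_match('#^> ([A-Z]+ \S+ HTTP/[\d.]+)#m', $log, $m)) {
        $summary['request'] = $m[1];
    }
    // Status line, e.g. "< HTTP/1.1 301 Moved Permanently" (code only).
    if (preg_match('#^< (HTTP/[\d.]+ \d{3})#m', $log, $m)) {
        $summary['status'] = $m[1];
    }
    return $summary;
}

// Shortened sample modeled on the log shown earlier.
$log = "> GET / HTTP/1.0\r\n"
     . "Host: myprogrammingnotes.com:443\r\n\r\n"
     . "< HTTP/1.1 301 Moved Permanently\r\n"
     . "< Location: https://myprogrammingnotes.com/\r\n";
print_r(summarizeVerboseLog($log));
// request => GET / HTTP/1.0
// status  => HTTP/1.1 301
```

Running the same helper on the output of the command-line curl -v makes the HTTP/1.0 versus HTTP/1.1 difference easy to see side by side.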