javascript - Getting correct encoding from php cURL - Stack Overflow

(see update at bottom of post)

Using the Chrome network logger, I notice a given XHR request:

Request Headers

GET ... HTTP/1.1
Host: ...
Connection: keep-alive
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36
Origin: ...
Authorization: Jra45648WwbbQ
Accept: */*
Referer: ...
Accept-Encoding: gzip, deflate, sdch, br
Accept-Language: en-US,en;q=0.8

Response Headers

HTTP/1.1 200 OK
Access-Control-Allow-Credentials: true
Access-Control-Allow-Headers: Authorization, Origin, Content-Type, Accept, Referer, User-Agent, deportes
Access-Control-Allow-Methods: GET, POST, PUT, DELETE, OPTIONS
Access-Control-Allow-Origin: ...
Access-Control-Expose-Headers: Authorization, x-request-id, x-mlbam-reply-after
Content-Type: application/octet-stream
Date: Sun, 16 Apr 2017 ... GMT
Server: nginx/1.11.3
Vary: Accept
X-Request-ID: ...
Content-Length: 16
Connection: keep-alive

The response content is @ EqV¡^MSÁ9

Perfect. This is the correct output.

Now, I need to recreate this exact exchange within PHP using cURL. So I duplicate the request using the same headers.

    $ch = curl_init();
    curl_setopt_array($ch, array(
        CURLOPT_URL => $url,
        CURLOPT_HTTPHEADER => $headers,
        CURLOPT_ENCODING => 'gzip',
        CURLOPT_RETURNTRANSFER => true
    ));

However, the output here is @ EqV–¡^MSƒÁ’9, which is clearly different.

I need to get it in the original format (@ EqV¡^MSÁ9), because the PHP output will eventually be served to a JavaScript script, and charCodeAt returns different values for these two outputs. I'm not sure how to approach this problem.
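
To illustrate the charCodeAt difference, here is a minimal sketch. It assumes the three extra characters in the PHP output ("–", "ƒ", "’") are the bytes 0x96, 0x83 and 0x92 displayed through Windows-1252; that is an inference from the characters shown, not something confirmed by the response itself. The helper names and the partial mapping table are hypothetical.

```javascript
// Byte values inferred from the three extra characters in the PHP output
// ("–", "ƒ", "’"); the other bytes of the 16-byte body are not known here.
const sampleBytes = Uint8Array.from([0x96, 0x83, 0x92]);

// ISO-8859-1: every byte becomes the code point with the same number,
// so 0x80-0x9F turn into invisible C1 control characters.
function decodeLatin1(bytes) {
  return Array.from(bytes, (b) => String.fromCharCode(b)).join('');
}

// Windows-1252 differs only in 0x80-0x9F; this partial table covers just
// the three sample bytes (hypothetical helper, not a full decoder).
const CP1252_SAMPLE = { 0x96: 0x2013, 0x83: 0x0192, 0x92: 0x2019 };
function decodeCp1252Sample(bytes) {
  return Array.from(bytes, (b) => String.fromCharCode(CP1252_SAMPLE[b] ?? b)).join('');
}

// Collect charCodeAt for every position in a string.
function charCodes(str) {
  return Array.from({ length: str.length }, (_, i) => str.charCodeAt(i));
}

console.log(charCodes(decodeLatin1(sampleBytes)));       // [ 150, 131, 146 ]
console.log(charCodes(decodeCp1252Sample(sampleBytes))); // [ 8211, 402, 8217 ]
```

If that assumption holds, the bytes coming back from cURL may be identical to what Chrome received, and only the character set used to turn them into text differs.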

As you can see, after the XHR request, the response preview in Chrome is correct:

If I change the encoding type of my PHP page's output to Western (ISO-8859-15), I get @ EqV¡^MSÁ9.

And if I paste that output into Notepad++, I get something very, very similar to what I want, but still slightly different (in this case, different by one single character). So maybe this is very close to the encoding I need?

How can I find the encoding I need? What is the default encoding of Chrome, since it seems to handle the response just fine?

UPDATE: I tested with a new value, òÝD¶0v¢ÔL·ßÎO Ó, and using mb_convert_encoding($r, 'utf-8', 'ISO-8859-15') gave me the correct result. So why does converting that particular response (@ EqV¡^MSÁ9) give me a value that is one character short?

asked Apr 16, 2017 at 16:46 by X33, edited Apr 19, 2017 at 14:09

Can you share the URL to which you are submitting the request? – Sahil Gulati Commented Apr 16, 2017 at 16:47
Do you know what encoding is used for the response? – TurtleTread Commented Apr 16, 2017 at 17:46
@TurtleTread I updated the post with the response headers, but I don't think they really provide any info other than maybe the Content-Type. I'm not aware of any encoding, because as you can see in my second picture, the Chrome preview of the response looks fine. This would be the data as it is directly served, because it's just the response preview. – X33 Commented Apr 16, 2017 at 17:51
What's on the response tab? – TurtleTread Commented Apr 16, 2017 at 17:53
@TurtleTread The response and preview tabs are identical – X33 Commented Apr 16, 2017 at 17:54

2 Answers

Chrome's default encoding is UTF-8. If you set the option to UTF-8 with curl_setopt($ch, CURLOPT_ENCODING, 'UTF-8'); your text should be as expected.
Detecting the encoding with mb_detect_encoding is painful, since it can run into many issues, but in this case it can help if you specify the expected order of detection, like so:

    mb_detect_encoding($val, 'UTF-8,ISO-8859-15');

In my personal experience, it is worthless unless you specify the target encodings, and in the right order. For example, you need to list UTF-8 before ISO-8859-1 in your encoding_list, or it will return ISO-8859-1 in most cases.
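
The ordering advice can be illustrated from the JavaScript side. The sketch below is not how mb_detect_encoding works internally; it is a hypothetical two-step detector (the function names are made up for illustration) showing why a strict encoding such as UTF-8 has to be tested before a permissive single-byte one. It assumes a runtime where TextDecoder supports the fatal option, as browsers and standard Node.js builds do.

```javascript
// Strict UTF-8 check: a fatal TextDecoder throws on any malformed sequence.
function isValidUtf8(bytes) {
  try {
    new TextDecoder('utf-8', { fatal: true }).decode(bytes);
    return true;
  } catch (err) {
    return false;
  }
}

// Hypothetical detector that mirrors the "UTF-8 first" ordering.
// ISO-8859-1 assigns a character to every byte value, so it can only
// serve as the fallback: it never rejects anything.
function guessEncoding(bytes) {
  return isValidUtf8(bytes) ? 'UTF-8' : 'ISO-8859-1';
}

console.log(guessEncoding(Uint8Array.from([0xc3, 0xa9])));       // UTF-8 (the bytes of "é")
console.log(guessEncoding(Uint8Array.from([0x96, 0x83, 0x92]))); // ISO-8859-1
```

If the permissive encoding were tried first, it would accept every input and the UTF-8 check would never run.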

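UPDATE:
The documentation says that CURLOPT_ENCODING => '' sends an Accept-Encoding header listing every encoding cURL supports (these are content encodings such as gzip and deflate, not character sets), so you can try that. But as I said, since you are dealing with a known encoding, which is UTF-8, please try: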

    $ch = curl_init();
    curl_setopt_array($ch, array(
        CURLOPT_URL => $url,
        CURLOPT_HTTPHEADER => $headers,
        CURLOPT_ENCODING => 'UTF-8',
        CURLOPT_RETURNTRANSFER => true
    ));

You can attempt to detect the encoding of the octet stream and then convert it to a known charset.

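    $result = curl_exec($ch);
    curl_close($ch);
    // Report the encoding mbstring guesses for the raw body.
    echo mb_detect_encoding($result);
    // Argument order is (string, to_encoding, from_encoding).
    $resultUTF8 = mb_convert_encoding($result, 'UTF-8', 'ISO-8859-15');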
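
Since the response is served as application/octet-stream, another option may be to skip charset conversion and hand the raw bytes to the script. The sketch below assumes the PHP page echoes base64_encode($result), which the question does not do; the payload is what base64_encode would produce for three assumed sample bytes (0x96, 0x83, 0x92), not the real response. The function name base64ToBytes is hypothetical.

```javascript
// Hypothetical payload: what PHP's base64_encode() would produce for the
// three sample bytes 0x96 0x83 0x92 (assumed values, not the real body).
const payload = 'loOS';

// atob() yields a "binary string": one character per byte, code 0-255.
function base64ToBytes(b64) {
  const binary = atob(b64);
  return Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
}

const decoded = base64ToBytes(payload);
console.log(Array.from(decoded)); // [ 150, 131, 146 ]
```

Because base64 is plain ASCII, the page's character set no longer affects the values the script sees.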
