I'm having trouble understanding character encoding in node.js. I'm transmitting data and for some reason the encoding causes certain characters to be replaced with other ones. What I'm doing is base 64 encoding at the client side and decoding it in node.js.
To simplify, I narrowed it down to this piece of code which fails:
new Buffer("1w==", 'base64').toString('utf8');
The 1w== is the base 64 encoding of the × character. Now, when passing this string with the 'base64' argument to a buffer and then doing .toString('utf8') I expected to get the same character back, but I didn't. Instead I got � (character code 65533).
Is the encoding utf8 wrong? If so, what should I use instead? If not, how can I decode a base 64 string in node.js?
2 Answers
No, your assumption is wrong. The base64-encoded string contains only one encoded byte, and every Unicode code point above U+007F needs at least two bytes when encoded in UTF-8.
I'm still not good at decoding base64 in my head, but try ISO-8859-1 instead (Node.js calls this encoding 'latin1').
The point is, base64 decoding transforms a character string into a byte string. You assumed that it decodes to a character string, but this is wrong. You still need to decode the byte string into a character string, and in your case the correct encoding for that step is ISO-8859-1.
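To make the two steps concrete, here is a minimal sketch, assuming a Node.js version that provides Buffer.from and the 'latin1' encoding name. The variable names are only for illustration. It decodes the base64 text to bytes first and then interprets that one byte under each encoding:

```javascript
// Step 1: base64 text -> raw bytes. No character encoding is involved yet.
const bytes = Buffer.from('1w==', 'base64');

// The payload is a single byte, 0xD7.
const hex = bytes.toString('hex');

// Step 2a: read as ISO-8859-1 ('latin1' in Node.js), byte 0xD7 maps to U+00D7, the multiplication sign.
const asLatin1 = bytes.toString('latin1');

// Step 2b: read as UTF-8, a lone 0xD7 is an incomplete sequence,
// so Node.js substitutes the replacement character U+FFFD (65533).
const asUtf8 = bytes.toString('utf8');

console.log(hex, asLatin1, asUtf8.charCodeAt(0));
```

The byte is the same in both cases; only the encoding used to turn it into a string differs.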
echo -n x | base64
gives
eA==
The given code would give the expected answer if the encoding were correct. The problem is likely on the encoding side. (1w== decodes to the single byte 0xD7, which in UTF-8 is the lead byte of a two-byte sequence and is therefore not a valid character on its own.)
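If the fix is made on the encoding side, the sender would have to base64-encode the UTF-8 bytes of the text rather than one byte per character. The following is a sketch under that assumption; the thread does not show the client code, so both helper names are hypothetical and the sender is simulated in Node.js:

```javascript
// Hypothetical helper: encode text as UTF-8 bytes, then as base64.
function encodeUtf8Base64(text) {
  return Buffer.from(text, 'utf8').toString('base64');
}

// Hypothetical helper: decode base64 to bytes, then read the bytes as UTF-8.
function decodeUtf8Base64(b64) {
  return Buffer.from(b64, 'base64').toString('utf8');
}

// U+00D7 is the multiplication sign; its UTF-8 form is the two bytes 0xC3 0x97.
const encoded = encodeUtf8Base64('\u00D7');
const decoded = decodeUtf8Base64(encoded);

console.log(encoded, decoded === '\u00D7');
```

With this pairing the string sent over the wire is w5c= instead of 1w==, and .toString('utf8') on the receiving side returns the original character.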