If I receive a UTF-8 string via a socket (or, for that matter, from any external source), I would like to get it as a properly parsed string object. The following code shows what I mean:
var str='21\r\nJust a demo string \xC3\xA4\xC3\xA8-should not be anymore parsed';
// Find CRLF
var i=str.indexOf('\r\n');
// Parse size up until CRLF
var x=parseInt(str.slice(0, i), 10);
// Read size bytes
var s=str.substr(i+2, x);
console.log(s);
This code should print:
Just a demo string äè
but because the UTF-8 data is not decoded first, the output stops after the first non-ASCII character:
Just a demo string ä
Would anyone have an idea how to convert this properly?
asked Jul 17, 2014 at 17:24 by user3847784
- You may want to use Punycode; here is a library too: github.com/bestiejs/punycode.js – howderek Commented Jul 17, 2014 at 17:29
- This might help: stackoverflow.com/questions/17057407/… – Diodeus - James MacFarlane Commented Jul 17, 2014 at 17:29
- @howderek Thanks, but how would a punycode library help in this case? – user3847784 Commented Jul 17, 2014 at 17:30
- nvm, I thought you were doing this over HTTP. Use this string instead: '21\r\nJust a demo string \xE4\xE8\xC3\xA8-should not be anymore parsed'. You simply used the wrong escapes. – howderek Commented Jul 17, 2014 at 17:36
2 Answers
It seems you could use decodeURIComponent(escape(str)):
var badstr='21\r\nJust a demo string \xC3\xA4\xC3\xA8-should not be anymore parsed';
var str=decodeURIComponent(escape(badstr));
// Find CRLF
var i=str.indexOf('\r\n');
// Parse size up until CRLF
var x=parseInt(str.slice(0, i), 10);
// Read size characters (the string is already decoded at this point)
var s=str.substr(i+2, x);
console.log(s);
BTW, this kind of issue occurs when UTF-8 bytes are interpreted using a different encoding (for example Latin-1). You should check for that as well.
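Since escape() is deprecated, the same conversion can also be sketched without it. The snippet below is only an illustration under two assumptions: the input is a "binary string" in which every character holds one byte (code units 0 to 255), and a global TextDecoder is available (modern browsers and recent Node.js versions). The helper names decodeBinaryString and readSizedPrefix are made up for this example.

```javascript
// Hypothetical helper: reinterpret a one-char-per-byte string as UTF-8 text.
function decodeBinaryString(binary) {
  // Each character is assumed to carry exactly one byte value (0-255).
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  return new TextDecoder('utf-8').decode(bytes);
}

// Hypothetical helper: parse "<size>\r\n<payload>" where size counts characters.
function readSizedPrefix(binary) {
  const text = decodeBinaryString(binary);
  // Find CRLF
  const i = text.indexOf('\r\n');
  // Parse size up until CRLF
  const size = parseInt(text.slice(0, i), 10);
  // Read size characters after the CRLF
  return text.slice(i + 2, i + 2 + size);
}

const raw = '21\r\nJust a demo string \xC3\xA4\xC3\xA8-should not be anymore parsed';
console.log(readSizedPrefix(raw));
```

As in the answer above, the size prefix is applied after decoding, so it counts characters rather than bytes.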
You should use utf8.js, which is available on npm:
var utf8 = require('utf8');
var encoded = '21\r\nJust a demo string \xC3\xA4\xC3\xA8-foo bar baz';
var decoded = utf8.decode(encoded);
console.log(decoded);
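If the data really comes from a Node.js socket, it normally arrives as a Buffer, and it may be simpler to slice the bytes first and decode afterwards. The following is a rough sketch, not a definitive implementation. It assumes Node.js, and it assumes the size prefix counts bytes rather than characters, which is why the demo frame uses 23 instead of the 21 from the question (the two accented letters take two bytes each). The function name readSizedPayload is hypothetical.

```javascript
// Hypothetical helper: parse "<size>\r\n<payload>" from a Buffer,
// where size is assumed to be the payload length in bytes.
function readSizedPayload(buf) {
  // Find CRLF
  const i = buf.indexOf('\r\n');
  if (i === -1) return null; // header not complete yet

  // Parse size up until CRLF (the header itself is plain ASCII)
  const size = parseInt(buf.toString('ascii', 0, i), 10);
  const start = i + 2;
  if (buf.length < start + size) return null; // payload not complete yet

  // Decode exactly `size` bytes as UTF-8
  return buf.toString('utf8', start, start + size);
}

// Demo frame: 19 ASCII bytes + 2 two-byte characters = 23 payload bytes.
const frame = Buffer.from(
  '23\r\nJust a demo string \u00E4\u00E8-should not be anymore parsed',
  'utf8'
);
console.log(readSizedPayload(frame));
```

Slicing on byte offsets before decoding avoids cutting a multi-byte character in half, as long as the sender's size really is a byte count.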