javascript - Getting substring without messing up UTF-8 string - Stack Overflow


Updated: 2025-03-19


I have a UTF-8 encoded string that comes from an ajax response, and I want to get the substring of that string up to the first comma. For the string "Привет, мир" it would be "Привет".

Will this work and not run into "multibyte-ness" issues?

var i = text.indexOf(',');
if (i != -1) text = text.substr(0, i);

Or is it better to use split?


asked May 24, 2013 at 15:29 by galymzhan

Your code is 100% correct. This is one of the features of both UTF-16 and UTF-8: even though a string index counts code units and not code points, indexOf and similar searches will never match a comma (or any other ASCII character, i.e. below 128) that is not actually a comma character. More information on utf8everywhere. – Pavel Radzivilovsky, commented Jun 2, 2013 at 18:03


2 Answers


JavaScript treats strings by characters, not by bytes.
As such, yes, that's fine from an encoding/string handling standpoint.
You may treat strings in JavaScript as not having any particular encoding, but simply as a sequence of characters.

> "漢字".substr(1)
  "字"
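
Applied to the string from the question, a minimal sketch might look like the following. The helper name beforeFirstComma is made up for illustration, and slice is used in place of substr (which is now considered a legacy method); the split variant is included only for comparison.

```javascript
// Hypothetical helper: returns the part of `text` before the first comma,
// or the whole string when it contains no comma.
function beforeFirstComma(text) {
  const i = text.indexOf(',');
  return i === -1 ? text : text.slice(0, i);
}

const greeting = 'Привет, мир';
console.log(beforeFirstComma(greeting));        // "Привет"
console.log(greeting.split(',')[0]);            // "Привет" as well
console.log(beforeFirstComma('no comma here')); // "no comma here"
```

Both variants give the same result here; indexOf plus slice simply avoids building an array of every comma-separated piece.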

Note that treating strings "by character" is only a simplification, though. As pointed out in the comments, JavaScript treats strings as sequences of 16-bit code units (UTF-16), not as sequences of characters. This lets you treat strings "by character" for the majority of common characters, but the abstraction breaks down for characters that need more than 2 bytes in UTF-16 (they are stored as surrogate pairs) and for characters composed of more than one code point.
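
A small sketch of where the "by character" view breaks down, assuming an engine with ES2015 support (for Array.from on strings). The sample strings are chosen only for illustration.

```javascript
// U+1F600 lies outside the Basic Multilingual Plane, so it takes two
// UTF-16 code units (a surrogate pair).
const s = '😀,x';
console.log(s.length);                    // 4, not 3
console.log(s.indexOf(','));              // 2
console.log(s.slice(0, s.indexOf(',')));  // "😀" (cutting at the comma is still safe)

// Cutting at an arbitrary index can split the pair and leave a lone surrogate.
const broken = s.slice(0, 1);
console.log(broken === '😀');             // false

// Iterating by code point keeps surrogate pairs together...
const codePoints = Array.from(s);
console.log(codePoints.length);           // 3

// ...but a letter followed by a combining mark is still two code points.
console.log(Array.from('e\u0301').length); // 2
```

Searching for an ASCII character and cutting at the index it returns is unaffected by any of this, because a comma can never be half of a surrogate pair.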

UTF-8 uses only byte values of 128 and above to encode characters other than ASCII, so an ASCII comma is never part of a multibyte sequence.
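
To see this at the byte level, here is a rough sketch that encodes the string from the question to UTF-8. It assumes an environment where TextEncoder and TextDecoder are available as globals (modern browsers and current Node.js); the variable names are illustrative only.

```javascript
// Encode the sample string to its UTF-8 bytes.
const bytes = new TextEncoder().encode('Привет, мир');

// Each Cyrillic letter becomes two bytes, both with values of 128 or more.
// The comma is the single byte 0x2C and the space is 0x20.
const commaPositions = [];
bytes.forEach((b, i) => {
  if (b === 0x2c) commaPositions.push(i);
});

console.log(bytes.length);     // 20
console.log(commaPositions);   // [ 12 ]

// Cutting the byte sequence at the comma lands on a character boundary,
// so the prefix decodes cleanly.
const prefix = new TextDecoder().decode(bytes.subarray(0, commaPositions[0]));
console.log(prefix);           // "Привет"
```

The byte 0x2C shows up exactly once, at the position of the real comma, which is why a byte-wise search for it is safe in UTF-8 as well.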
