My application relied on this function to test whether a string is Korean:
const isKoreanWord = (input) => {
  const match = input.match(/[\u3131-\uD79D]/g);
  return match ? match.length === input.length : false;
}
isKoreanWord('만두'); // true
isKoreanWord('mandu'); // false
That worked until I added Chinese support; now the function returns the wrong result:
isKoreanWord('幹嘛'); // true
I believe this happens because Korean and Chinese characters are intermingled in the same Unicode range.
How should I correct this function so that it returns true only if the input contains nothing but Korean characters?
- 1 By "Korean characters" you mean hangul? 'Cause Chinese characters are also used in Korea. Asking to distinguish "Chinese Chinese characters" from "Korean Chinese characters" is like asking to distinguish English from French. – deceze ♦ Commented Oct 25, 2018 at 12:33
- @deceze Yes I meant hangul. How to distinguish between hangul and hanja. – vdegenne Commented Oct 25, 2018 at 12:34
- @deceze Also, I don't think your comparison holds: English and French derive from Latin, so yes, it is extremely hard to compare the two languages, while Korean uses Chinese as its base language and Chinese, well... uses Chinese as its own historical base language. – vdegenne Commented Oct 25, 2018 at 12:40
- 1 I'm talking purely about the writing system used. If you just look at the range of letters, English is indistinguishable from French. In the same way, seeing just a few Chinese characters it's virtually impossible to tell whether it's a Chinese word or a word used in the context of Korean. – deceze ♦ Commented Oct 25, 2018 at 12:43
- 2 "Korean characters" means hangul, there's no exception. – wonsuc Commented Mar 26, 2019 at 6:59
3 Answers
Here are the Unicode ranges you need for Hangul (taken from its Wikipedia page):
U+AC00–U+D7AF
U+1100–U+11FF
U+3130–U+318F
U+A960–U+A97F
U+D7B0–U+D7FF
So the regex in your .match call should look like this:
const match = input.match(/[\uac00-\ud7af]|[\u1100-\u11ff]|[\u3130-\u318f]|[\ua960-\ua97f]|[\ud7b0-\ud7ff]/g);
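As a rough illustration of why the question's single range misfires, the sketch below compares it with the Hangul blocks listed above. The helper names (inOriginalRange, inHangulBlocks) are made up for this example. The two characters from the question belong to the CJK Unified Ideographs block (U+4E00–U+9FFF), which lies between U+3131 and U+D79D.

```javascript
// Code points of the two Chinese characters from the question.
const codePoints = [...'幹嘛'].map((ch) => ch.codePointAt(0));

// The question's single range, U+3131 to U+D79D (hypothetical helper name).
const inOriginalRange = (cp) => cp >= 0x3131 && cp <= 0xd79d;

// The Hangul blocks listed above, as [start, end] pairs.
const hangulBlocks = [
  [0x1100, 0x11ff],
  [0x3130, 0x318f],
  [0xa960, 0xa97f],
  [0xac00, 0xd7af],
  [0xd7b0, 0xd7ff],
];

// True when the code point falls inside any listed Hangul block
// (hypothetical helper name).
const inHangulBlocks = (cp) =>
  hangulBlocks.some(([start, end]) => cp >= start && cp <= end);

console.log(codePoints.every(inOriginalRange)); // true
console.log(codePoints.some(inHangulBlocks)); // false
```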
A shorter version that matches Korean characters:
const regexKorean = /[\u1100-\u11FF\u3130-\u318F\uA960-\uA97F\uAC00-\uD7AF\uD7B0-\uD7FF]/g
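One possible way to plug this character class into the function from the question is sketched below. It assumes that "Korean" means Hangul only, and it anchors the pattern instead of counting matches. The constant name HANGUL_ONLY is invented for this example.

```javascript
// The Hangul blocks from above in one character class. The ^ and $ anchors
// plus the + quantifier require every character of the input to match.
// No g flag, so repeated test() calls do not carry state between them.
const HANGUL_ONLY =
  /^[\u1100-\u11FF\u3130-\u318F\uA960-\uA97F\uAC00-\uD7AF\uD7B0-\uD7FF]+$/;

// Returns true only for a non-empty string made up entirely of Hangul.
const isKoreanWord = (input) => HANGUL_ONLY.test(input);

console.log(isKoreanWord('만두')); // true
console.log(isKoreanWord('mandu')); // false
console.log(isKoreanWord('幹嘛')); // false
console.log(isKoreanWord('만두 mandu')); // false
```

Like the original function, this returns false for an empty string. Spaces and punctuation also make it return false, so a phrase with several words would need to be split first.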
In modern browsers, you can use Unicode property escapes directly:
const RE = /\p{sc=Hangul}/u
console.log(RE.test('만두')) // true
console.log(RE.test('mandu')) // false
console.log(RE.test('幹嘛')) // false
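Note that an unanchored pattern like the one above only checks whether the input contains at least one Hangul character. To answer the original question (only Hangul), a sketch along these lines may be closer. The names CONTAINS_HANGUL, ONLY_HANGUL and isHangulOnly are invented for this example.

```javascript
// Matches if at least one Hangul character appears anywhere in the input.
const CONTAINS_HANGUL = /\p{Script=Hangul}/u;

// Matches only if the whole input is one or more Hangul characters.
const ONLY_HANGUL = /^\p{Script=Hangul}+$/u;

// Hypothetical helper: true for a non-empty, Hangul-only string.
const isHangulOnly = (input) => ONLY_HANGUL.test(input);

console.log(CONTAINS_HANGUL.test('만두 mandu')); // true
console.log(isHangulOnly('만두 mandu')); // false
console.log(isHangulOnly('만두')); // true
console.log(isHangulOnly('幹嘛')); // false
```

The Script=Hangul property is defined by Unicode rather than by block, so it may cover a few characters outside the ranges listed earlier (halfwidth Hangul letters, for instance). Whether that matters depends on the application.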