My application was relying on this function to test whether a string is Korean:

const isKoreanWord = (input) => {
  const match = input.match(/[\u3131-\uD79D]/g);
  return match ? match.length === input.length : false;
}

isKoreanWord('만두'); // true
isKoreanWord('mandu'); // false

until I started adding Chinese support, and now the function gives inconsistent results:

isKoreanWord('幹嘛'); // true

I believe this happens because Korean and Chinese characters are intermingled in the same Unicode range.
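That suspicion can be checked directly. The following is a minimal sketch, not part of the original code: the helper names `inOriginalRange` and `isCjkIdeograph` are hypothetical, and it assumes the CJK Unified Ideographs block spans U+4E00 to U+9FFF, which sits inside the U+3131 to U+D79D range used by the regex.

```javascript
// The range used by the original regex: U+3131 through U+D79D.
const inOriginalRange = (ch) => {
  const cp = ch.codePointAt(0);
  return cp >= 0x3131 && cp <= 0xD79D;
};

// CJK Unified Ideographs block (assumed here as U+4E00 through U+9FFF).
const isCjkIdeograph = (ch) => {
  const cp = ch.codePointAt(0);
  return cp >= 0x4E00 && cp <= 0x9FFF;
};

// Print, for each character, whether it is in each range.
for (const ch of '幹嘛만두') {
  console.log(ch, inOriginalRange(ch), isCjkIdeograph(ch));
}
```

Both 幹 and 嘛 pass the original range check while also being ideographs, whereas 만 and 두 pass the range check without being ideographs, so the single range cannot tell the two scripts apart.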

How should I correct this function so that it returns true only if the input contains nothing but Korean characters?

asked Oct 25, 2018 at 12:30 by vdegenne
  • By "Korean characters" you mean hangul? 'Cause Chinese characters are also used in Korea. Asking to distinguish "Chinese Chinese characters" from "Korean Chinese characters" is like asking to distinguish English from French. – deceze Commented Oct 25, 2018 at 12:33
  • @deceze Yes, I meant hangul: how to distinguish between hangul and hanja. – vdegenne Commented Oct 25, 2018 at 12:34
  • @deceze Also, I don't think your comparison holds: English and French both derive from Latin, so yes, it is extremely hard to tell the two languages apart, whereas Korean uses Chinese as its base language and Chinese, well... uses Chinese as its own historical base language. – vdegenne Commented Oct 25, 2018 at 12:40
  • I'm talking purely about the writing system used. If you just look at the range of letters, English is indistinguishable from French. In the same way, seeing just a few Chinese characters, it's virtually impossible to tell whether it's a Chinese word or a word used in the context of Korean. – deceze Commented Oct 25, 2018 at 12:43
  • "Korean characters" means hangul, there's no exception. – wonsuc Commented Mar 26, 2019 at 6:59

3 Answers

Here are the Unicode ranges you need for Hangul (taken from the Hangul Wikipedia page):

U+AC00–U+D7AF
U+1100–U+11FF
U+3130–U+318F
U+A960–U+A97F
U+D7B0–U+D7FF

So your .match call should look like this:

const match = input.match(/[\uac00-\ud7af]|[\u1100-\u11ff]|[\u3130-\u318f]|[\ua960-\ua97f]|[\ud7b0-\ud7ff]/g);

A shorter version that matches Korean characters:

const regexKorean = /[\u1100-\u11FF\u3130-\u318F\uA960-\uA97F\uAC00-\uD7AF\uD7B0-\uD7FF]/g
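Putting those ranges back into the function from the question could look like the sketch below. It is one possible shape, not the only one: the constant name `HANGUL_ONLY` is made up for illustration, and it swaps the original match-and-compare-lengths approach for a single anchored pattern, which assumes an empty string should still count as not Korean.

```javascript
// Hangul blocks from the list above, combined into one character class.
// ^ and $ with + require every character of a non-empty string to match.
const HANGUL_ONLY = /^[\u1100-\u11FF\u3130-\u318F\uA960-\uA97F\uAC00-\uD7AF\uD7B0-\uD7FF]+$/;

// Returns true only when the whole input consists of Hangul characters.
const isKoreanWord = (input) => HANGUL_ONLY.test(input);

console.log(isKoreanWord('만두'));       // true
console.log(isKoreanWord('mandu'));      // false
console.log(isKoreanWord('幹嘛'));       // false
console.log(isKoreanWord('만두 mandu')); // false
```

The pattern deliberately omits the g flag: a global regex keeps state in lastIndex between .test calls, which can make repeated calls on the same pattern return alternating results.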

In modern browsers, you can use Unicode property escapes directly:

const RE = /\p{sc=Hangul}/u

console.log(RE.test('만두')) // true
console.log(RE.test('mandu')) // false
console.log(RE.test('幹嘛')) // false
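Note that an unanchored pattern like the one above reports true as soon as the input contains any Hangul character. To require that the input contain only Hangul, as the question asks, the pattern can be anchored. A minimal sketch, assuming a runtime that supports Unicode property escapes (ES2018); the names `HANGUL_SCRIPT_ONLY` and `isHangulOnly` are hypothetical:

```javascript
// ^ and $ with + require every character to have the Hangul script property.
// The u flag is needed for \p{...} to be recognized.
const HANGUL_SCRIPT_ONLY = /^\p{Script=Hangul}+$/u;

// Returns true only for non-empty strings made up entirely of Hangul.
const isHangulOnly = (input) => HANGUL_SCRIPT_ONLY.test(input);

console.log(isHangulOnly('만두'));      // true
console.log(isHangulOnly('mandu'));     // false
console.log(isHangulOnly('幹嘛'));      // false
console.log(isHangulOnly('만두mandu')); // false
```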
