'❌'[0] === '❌' // true
'✔️'[0] === '✔️' // false
'✔️'[0] === '✔'  // true

I suspect it's Unicode-related, but I would like to understand precisely what is happening and how I can correctly compare such characters. Why is '✔️' treated differently than '❌'?

I encountered it in this simple character count:

'✔️❌✔️❌'.split('').filter(e => e === '❌').length // 2
'✔️❌✔️❌'.split('').filter(e => e === '✔️').length // 0

Asked Oct 12, 2021 at 23:51 by Wilhelm Olejnik
  • 2 I think the check-mark is a two-character sequence, while the "X" is not. – Pointy Commented Oct 12, 2021 at 23:55
  • 1 You encountered a surrogate pair: stackoverflow./questions/31986614/what-is-a-surrogate-pair – Adelin Commented Oct 13, 2021 at 0:01
  • 1 thorough explication here: Emojis in Javascript from this question: How to convert one emoji character to Unicode codepoint number in JavaScript? – pilchard Commented Oct 13, 2021 at 0:12
  • 2 It’s not surrogate pairs; this is a grapheme cluster made out of the U+2714 HEAVY CHECK MARK and the U+FE0F VARIATION SELECTOR-16. – Sebastian Simon Commented Oct 13, 2021 at 0:15
3 Answers

Because ✔️ takes two UTF-16 code units: "✔️".length === 2

"✔️"[0] === "✔", and "✔️"[1] is an invisible selector that asks for the colored emoji presentation, I think.

And "❌".length === 1, so it takes only one code unit.

It's similar to the way emojis with different skin tones work: a base character followed by a modifier.

As to how to compare, I think that "✔️".codePointAt(0) (not to be confused with charCodeAt()) might help. See https://thekevinscott./emojis-in-javascript/:

codePointAt and fromCodePoint are new methods introduced in ES2015 that can handle unicode characters whose UTF-16 encoding is greater than 16 bits, which includes emojis. Use these instead of charCodeAt, which doesn’t handle emoji correctly.
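To see what each string is actually made of, here is a minimal sketch that lists the code points one by one. The helper name `codePoints` is made up for illustration; the strings are written with `\u` escapes so the invisible selector is visible in the source.

```javascript
// List the code points of a string as U+XXXX labels.
// Array.from iterates by code point, so surrogate pairs stay together.
function codePoints(str) {
  return Array.from(str, ch =>
    'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0'));
}

const check = '\u2714\uFE0F'; // ✔️ = HEAVY CHECK MARK + VARIATION SELECTOR-16
const cross = '\u274C';       // ❌ = CROSS MARK

console.log(check.length, codePoints(check)); // 2 [ 'U+2714', 'U+FE0F' ]
console.log(cross.length, codePoints(cross)); // 1 [ 'U+274C' ]
```

Note that neither string contains a surrogate pair here: both U+2714 and U+FE0F fit in a single UTF-16 code unit each, so the check mark is simply two separate code points.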

The second char '✔️'[1] (code point 65039, i.e. U+FE0F) is a Variation Selector:

A Variation Selector specifies that the preceding character should be displayed with emoji presentation. Only required if the preceding character defaults to text presentation.

Often used in Emoji ZWJ Sequences, where one or more characters in the sequence have text and emoji presentation, but otherwise default to text (black and white) display.

Examples: Snowman as text: ☃. Snowman as Emoji: ☃️

Black Heart as text: ❤. Black Heart as Emoji: ❤️ (not so black)

Variation Selector-16 was approved as part of Unicode 3.2 in 2002.

https://unicode-table./en/FE0F/
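If the goal is to treat the text and emoji forms of a symbol as equal, one possible approach (a sketch, not the only way) is to strip the presentation selectors before comparing. The names `stripSelectors` and `sameBaseCharacter` are hypothetical helpers, and the sketch assumes only U+FE0E and U+FE0F need to be ignored.

```javascript
// Matches the two presentation selectors: U+FE0E (text) and U+FE0F (emoji).
const PRESENTATION_SELECTORS = /[\uFE0E\uFE0F]/g;

// Remove presentation selectors so only the base characters remain.
function stripSelectors(str) {
  return str.replace(PRESENTATION_SELECTORS, '');
}

// Compare two strings while ignoring text/emoji presentation.
function sameBaseCharacter(a, b) {
  return stripSelectors(a) === stripSelectors(b);
}

console.log('\u2714\uFE0F' === '\u2714');                  // false
console.log(sameBaseCharacter('\u2714\uFE0F', '\u2714')); // true
console.log(sameBaseCharacter('\u2714\uFE0F', '\u274C')); // false
```

This deliberately loses the distinction between ✔ and ✔️, so it only fits cases where the two should count as the same symbol.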

I believe the '✔️' is made up of 2 components. When you output '✔️'[0] you get '✔', and the black checkmark does not equal the green checkmark.

However, the '❌' is made up of just a single component, so when you output '❌'[0], you get the same thing: '❌'.
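For the counting case in the question, splitting by user-perceived characters instead of code units should give the expected result. Below is a rough sketch using `Intl.Segmenter`, assuming a runtime that supports it (recent browsers and Node.js); `graphemes` and `countGrapheme` are illustrative names, not part of any library.

```javascript
// Split a string into grapheme clusters (user-perceived characters).
function graphemes(text) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });
  return Array.from(segmenter.segment(text), part => part.segment);
}

// Count how many grapheme clusters are exactly equal to `target`.
function countGrapheme(text, target) {
  return graphemes(text).filter(g => g === target).length;
}

const marks = '\u2714\uFE0F\u274C\u2714\uFE0F\u274C'; // ✔️❌✔️❌

console.log(marks.split('').length);               // 6 code units
console.log(graphemes(marks).length);              // 4 grapheme clusters
console.log(countGrapheme(marks, '\u2714\uFE0F')); // 2
console.log(countGrapheme(marks, '\u274C'));       // 2
console.log(countGrapheme(marks, '\u2714'));       // 0, the bare ✔ never appears on its own
```

The last line shows the flip side: once the selector stays attached to its base character, the bare '✔' no longer matches, so the target has to be written in the same form as it appears in the text.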
