path | title |
---|---|
/learnings/javascript_unicode_everything_you_never_wanted_to_know | Learnings: Javascript: Unicode Everything you never wanted to know |
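JavaScript exposes UCS-2-style semantics at the language level, while string contents are generally interpreted as UTF-16. Each string element is a 16-bit code unit, which covers only the first 65,536 code points (U+0000 to U+FFFF, the Basic Multilingual Plane). Characters above that range (for example, most emoji) are stored as a surrogate pair: a high surrogate in the first code unit signals that the next code unit must also be read to get the full character.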
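To make the two-code-unit encoding concrete: a code point above U+FFFF is stored as a surrogate pair by subtracting 0x10000 and splitting the remaining 20 bits into two 10-bit halves. The sketch below does that arithmetic by hand; `toSurrogatePair` and `fromSurrogatePair` are hypothetical helper names, not built-ins, and the result is cross-checked against the built-in `String.fromCodePoint`.

```javascript
// Hypothetical helpers illustrating the UTF-16 surrogate-pair arithmetic.
function toSurrogatePair(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10ffff) {
    throw new RangeError("not an astral code point");
  }
  const offset = codePoint - 0x10000; // 20 bits to split across two units
  const high = 0xd800 + (offset >> 10); // top 10 bits -> high (lead) surrogate
  const low = 0xdc00 + (offset & 0x3ff); // bottom 10 bits -> low (trail) surrogate
  return [high, low];
}

function fromSurrogatePair(high, low) {
  return (high - 0xd800) * 0x400 + (low - 0xdc00) + 0x10000;
}

// U+1F4A9 (PILE OF POO) needs two code units.
const [hi, lo] = toSurrogatePair(0x1f4a9);
console.log(hi.toString(16), lo.toString(16)); // d83d dca9
console.log(fromSurrogatePair(hi, lo).toString(16)); // 1f4a9
console.log(String.fromCharCode(hi, lo) === String.fromCodePoint(0x1f4a9)); // true
```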
So: if you are looking up a character's numeric value, use `codePointAt()` instead of `charCodeAt()`. The former reads both halves of a surrogate pair and returns the full code point, while the latter only ever returns a single 16-bit code unit.
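A quick comparison on the same astral character. Note that `codePointAt()` only returns the full code point when called at the index of the high surrogate; at the next index it returns the lone low surrogate. The `codePoints` helper is a hypothetical name showing one way to walk a string by code point, using the returned value to decide whether to advance one or two code units.

```javascript
const poo = String.fromCodePoint(0x1f4a9); // one symbol, two UTF-16 code units

console.log(poo.charCodeAt(0).toString(16)); // d83d (high surrogate only)
console.log(poo.codePointAt(0).toString(16)); // 1f4a9 (full code point)
console.log(poo.codePointAt(1).toString(16)); // dca9 (lone low surrogate)

// Hypothetical helper: collect code points by hand, advancing two code units
// whenever codePointAt() reports a value above the BMP (> 0xFFFF).
function codePoints(str) {
  const result = [];
  for (let i = 0; i < str.length; ) {
    const cp = str.codePointAt(i);
    result.push(cp);
    i += cp > 0xffff ? 2 : 1;
  }
  return result;
}

console.log(codePoints("a" + poo + "b").map((cp) => cp.toString(16))); // ["61", "1f4a9", "62"]
```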
This also means that if you are trying to get the `.length` of a string containing a high (astral) Unicode character, it may not match what you expect (i.e. one character per symbol). `.length` counts 16-bit code units, so `"💩".length` is 2, because JS "followed the spec and gave you UCS-2 semantics". Source
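A small illustration of the mismatch: `.length` and index-based methods such as `slice()` both count UTF-16 code units, so truncating at a code-unit position can cut a surrogate pair in half. Spreading the string (which, like `Array.from`, uses the code-point-aware string iterator) avoids that; `truncateSymbols` is a hypothetical helper name used only for this sketch.

```javascript
const s = "ab" + String.fromCodePoint(0x1f4a9); // "a", "b", then one astral symbol

console.log(s.length); // 4: two BMP units plus a two-unit surrogate pair
console.log([...s].length); // 3: counted by code point

// slice(0, 3) keeps only the high surrogate, leaving a broken character.
const broken = s.slice(0, 3);
console.log(broken.charCodeAt(2).toString(16)); // d83d

// Hypothetical helper: truncate to n code points instead of n code units.
function truncateSymbols(str, n) {
  return [...str].slice(0, n).join("");
}

console.log(truncateSymbols(s, 3) === s); // true
```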
First, read "JavaScript has a Unicode problem".
To count symbols (code points) rather than code units, you could implement it like so:
```javascript
// Array.from iterates by code point, so a surrogate pair counts as one symbol.
function countSymbols(string) {
  return Array.from(string).length;
}
```
Key words:
- color modifiers
- zero width joiners
- https://eng.getwisdom.io/emoji-modifiers-and-sequence-combinations/
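Where those key words bite: skin-tone (color) modifiers and zero-width-joiner sequences are built from several code points that display as one emoji, so even a code-point count overcounts them. One option, assuming a runtime that ships `Intl.Segmenter` (modern browsers, Node.js 16+), is to count extended grapheme clusters instead. `countGraphemes` is a hypothetical helper name for this sketch.

```javascript
// Thumbs up (U+1F44D) followed by a medium skin tone modifier (U+1F3FD).
const thumbs = String.fromCodePoint(0x1f44d, 0x1f3fd);
// Woman, ZWJ, woman, ZWJ, girl: a family emoji joined by U+200D.
const family = String.fromCodePoint(0x1f469, 0x200d, 0x1f469, 0x200d, 0x1f467);

// Hypothetical helper: count user-perceived characters (grapheme clusters).
function countGraphemes(str) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return [...segmenter.segment(str)].length;
}

// code units, code points, graphemes
console.log(thumbs.length, Array.from(thumbs).length, countGraphemes(thumbs)); // 4 2 1
console.log(family.length, Array.from(family).length, countGraphemes(family)); // 8 5 1
```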