path	title
/learnings/javascript_unicode_everything_you_never_wanted_to_know	Learnings: Javascript: Unicode Everything you never wanted to know

<<Learning_Javascript_Unicode>>

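JavaScript exposes strings at the language level as sequences of 16-bit code units: you get UCS-2 semantics, although the engine most likely uses UTF-16 internally. This means a single code unit runs out of room after the first 65,536 code points (U+FFFF), so higher-order characters are stored as a pair of code units (a surrogate pair), and you have to look at the second code unit to get the full character.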

So: if you are looking up a character's number (its code point), use codePointAt() instead of charCodeAt(), as the former natively handles these higher-order characters.
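A minimal sketch of the difference, using U+1F4A9 as an arbitrary example of a character above U+FFFF (the variable names are illustrative only):

```javascript
// U+1F4A9 lies above U+FFFF, so it is stored as two 16-bit code units.
const pile = "\u{1F4A9}";

// charCodeAt() returns one UTF-16 code unit at a time.
const highSurrogate = pile.charCodeAt(0); // 0xD83D
const lowSurrogate = pile.charCodeAt(1);  // 0xDCA9

// codePointAt() combines the surrogate pair into the full code point.
const codePoint = pile.codePointAt(0);    // 0x1F4A9

console.log(
  highSurrogate.toString(16),
  lowSurrogate.toString(16),
  codePoint.toString(16)
); // prints "d83d dca9 1f4a9"
```

Note that codePointAt() still takes a code unit index, so calling it at index 1 here would return the lone low surrogate.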

This also means that if you are trying to get the .length of a string containing a high-order Unicode character, the result might not match what you expected (i.e. one character), because JS "followed the spec and gave you UCS-2 semantics". Source
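A quick way to see the mismatch for yourself (the sample strings are arbitrary):

```javascript
const ascii = "a";
const astral = "\u{1F4A9}"; // one character on screen, above U+FFFF

// .length counts 16-bit code units, not characters.
const asciiLength = ascii.length;   // 1
const astralLength = astral.length; // 2

// The string iterator walks code points, so spreading yields one entry here.
const astralCodePoints = [...astral].length; // 1

console.log(asciiLength, astralLength, astralCodePoints); // prints "1 2 1"
```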

Getting real length of a string in Node

First, read "JavaScript has a Unicode problem".

Could implement it like so:

function countSymbols(string) {
	// Array.from() iterates by code point, so a surrogate pair counts once.
	return Array.from(string).length;
}

More fun: Emojis and modifiers and other wildness

Key words:

color modifiers
zero width joiners
https://eng.getwisdom.io/emoji-modifiers-and-sequence-combinations/
https://thekevinscott.com/emojis-in-javascript/
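To see why modifiers and zero width joiners make things wilder, here is a rough sketch comparing three ways of counting. It assumes a runtime that ships Intl.Segmenter (recent Node versions do); countGraphemes is a hypothetical helper name, not something from the linked articles, and the exact grapheme result depends on the Unicode data bundled with the runtime.

```javascript
// Counts user-perceived characters (grapheme clusters).
// Assumes Intl.Segmenter is available in this runtime.
function countGraphemes(string) {
  const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
  return Array.from(segmenter.segment(string)).length;
}

// Thumbs up (U+1F44D) followed by a skin tone modifier (U+1F3FD).
const thumbsUp = "\u{1F44D}\u{1F3FD}";

// Man, woman, girl joined by zero width joiners (U+200D).
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}";

for (const emoji of [thumbsUp, family]) {
  console.log({
    codeUnits: emoji.length,              // UTF-16 code units
    codePoints: Array.from(emoji).length, // what countSymbols-style counting sees
    graphemes: countGraphemes(emoji),     // what a reader perceives
  });
}
```

Counting code points gives 2 for the modified thumbs up and 5 for the family sequence, even though each renders as a single emoji, so code point counting alone is not enough once modifiers and joiners are involved.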
