Minor Changes
-
e2c9e1d: Optimize perf again 🔥
It can be still getting faster, why not?
Through seriously thoughtful micro-optimizations, it has achieved performance improvements of up to ~30% (404ns -> 310ns) in the previously used simple emoji joiner test.
Now it use more realistic benchmark with various types of input text. In most cases,
unicode-segmenter
is 7~15 times faster than other competing libraries.For example, here a Tweet-like text ChatGPT generated:
🚀 새로운 유니코드 분할기 라이브러리 \'unicode-segmenter\'를 소개합니다! 🔍 각종 언어의 문자를 정확하게 구분해주는 강력한 도구입니다. Check it out! 👉 [https://github.com/cometkim/unicode-segmenter] #Unicode #Programming 🌐
And the result then:
cpu: Apple M1 Pro runtime: node v21.7.1 (arm64-darwin) time (avg) (min … max) p75 p99 p999 --------------------------------------------------------- ----------------------------- unicode-segmenter 7'850 ns/iter (7'753 ns … 8'122 ns) 7'877 ns 8'079 ns 8'122 ns Intl.Segmenter 60'581 ns/iter (57'916 ns … 405 µs) 59'167 ns 66'458 ns 358 µs graphemer 66'303 ns/iter (64'708 ns … 287 µs) 65'500 ns 73'459 ns 206 µs grapheme-splitter 146 µs/iter (143 µs … 466 µs) 145 µs 157 µs 397 µs summary unicode-segmenter 7.72x faster than Intl.Segmenter 8.45x faster than graphemer 18.6x faster than grapheme-splitter
-
ab6787b: Make the Intl adapter's type definitions compatible with the original
-
f974448: - Rename
searchGrapheme
tosearchGraphemeCategory
, and deprecated old one.- Rename
Segmenter
definitions from grapheme module toGraphemeCategory
. - Remove
SearchResult<T>
, andGraphemeSearchResult
defnitions which are identical toCategorizedUnicodeRange<T>
. - Improve JSDoc comments to be more informative.
- Rename
-
dc62381: Add
takeCodePoint
util to avoid extraString.codePointAt()
Patch Changes
-
3ea5a2d: Optimized initial parsing time via compacting tables into JSON
See https://v8.dev/blog/cost-of-javascript-2019#json
and https://youtu.be/ff4fgQxPaO0 -
16d2028: - Fix
Intl.Segmenter
adapter type definitions to be 100% compatible with tslib- Implemented
Intl.Segmenter.prototype.resolvedOptions
.
But since the locale matcher is environment-specific,
the adapter returns input locale as-is, or fallback toen
.
- Implemented