Description
Yes, fq does support Unicode to some extent, but Unicode is tricky. I'd like to convert between an array of bytes and an array of Unicode code points via UTF-8, UTF-16 (UTF-16LE and UTF-16BE), or UTF-32 (UTF-32LE and UTF-32BE), optionally with a byte order mark (BOM). Malformed UTF sequences should be reported as errors and/or replaced with the Unicode replacement character U+FFFD. On top of that, it should be possible to transform arrays of code points into arrays of characters via Unicode normalization. Only the latter is what a human reader actually sees.
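To make the requested bytes to code points semantics concrete, here is a minimal sketch using Python's standard `codecs` module. It is not fq's API: the function names `bytes_to_code_points` and `code_points_to_bytes` and the `BOMS` table are hypothetical. The sketch assumes the encoding is always given explicitly (no BOM sniffing), that a leading BOM matching that encoding is stripped on decode, and that the caller chooses between reporting errors (`errors="strict"`) and substituting U+FFFD (`errors="replace"`).

```python
import codecs

# Hypothetical table: byte order mark for each explicitly named encoding.
BOMS = {
    "utf-8": codecs.BOM_UTF8,
    "utf-16-le": codecs.BOM_UTF16_LE,
    "utf-16-be": codecs.BOM_UTF16_BE,
    "utf-32-le": codecs.BOM_UTF32_LE,
    "utf-32-be": codecs.BOM_UTF32_BE,
}


def bytes_to_code_points(data: bytes, encoding: str, errors: str = "strict") -> list[int]:
    """Decode bytes into a list of code points.

    errors="strict" raises UnicodeDecodeError on malformed input;
    errors="replace" substitutes U+FFFD for malformed sequences.
    A leading BOM for the given encoding is stripped (an assumption).
    """
    bom = BOMS[encoding]
    if data.startswith(bom):
        data = data[len(bom):]
    return [ord(ch) for ch in data.decode(encoding, errors=errors)]


def code_points_to_bytes(cps: list[int], encoding: str, bom: bool = False) -> bytes:
    """Encode code points, optionally prefixed with the encoding's BOM.

    Lone surrogates (U+D800..U+DFFF) are not Unicode scalar values and
    raise UnicodeEncodeError with the default "strict" error handler.
    """
    body = "".join(chr(cp) for cp in cps).encode(encoding)
    return BOMS[encoding] + body if bom else body
```

For example, `bytes_to_code_points(b"\xef\xbb\xbf\x65\xcc\x81", "utf-8")` yields `[0x65, 0x301]`, and `bytes_to_code_points(b"\x65\xff", "utf-8", errors="replace")` yields `[0x65, 0xFFFD]` instead of raising. Requiring an explicit encoding keeps the sketch simple; automatic detection from a BOM would be a separate step.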
An example:
0x| 65 cc 81  | bytes (UTF-8)
U+| 0065 0301 | code points
  | e ◌́       | characters
  | é         | normalized characters (NFC)
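The example can be reproduced with Python's standard `codecs` decoding and `unicodedata.normalize`. This is only an illustration of the requested pipeline, not how fq does or would do it; `normalize_code_points`, `fmt_bytes` and `fmt_code_points` are hypothetical helper names, the latter two rendering rows in the same style as the table.

```python
import unicodedata


def normalize_code_points(cps: list[int], form: str = "NFC") -> list[int]:
    """Apply a Unicode normalization form (NFC, NFD, NFKC or NFKD) to code points."""
    text = "".join(chr(cp) for cp in cps)
    return [ord(ch) for ch in unicodedata.normalize(form, text)]


def fmt_bytes(data: bytes) -> str:
    """Render bytes like the "0x|" row of the example."""
    return "0x| " + " ".join(f"{b:02x}" for b in data)


def fmt_code_points(cps: list[int]) -> str:
    """Render code points like the "U+|" rows of the example."""
    return "U+| " + " ".join(f"{cp:04X}" for cp in cps)


raw = bytes([0x65, 0xCC, 0x81])  # "e" followed by COMBINING ACUTE ACCENT, in UTF-8
cps = [ord(ch) for ch in raw.decode("utf-8")]
nfc = normalize_code_points(cps, "NFC")

print(fmt_bytes(raw))        # 0x| 65 cc 81
print(fmt_code_points(cps))  # U+| 0065 0301
print(fmt_code_points(nfc))  # U+| 00E9
```

NFC composes U+0065 U+0301 into the single precomposed code point U+00E9, and NFD reverses it. Not every user-perceived character has a precomposed form, though, so splitting text into "characters" in general is grapheme cluster segmentation, a separate step from normalization that this sketch does not cover.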