[feature] support Unicode Transformation Format (UTF) #1154

Yes, fq already supports Unicode to some extent, but Unicode is tricky. I'd like to convert between an array of `bytes` and an array of Unicode code points via UTF-8, UTF-16 (UTF-16LE and UTF-16BE), or UTF-32 (UTF-32LE and UTF-32BE), optionally with a byte order mark (BOM). Malformed UTF sequences should be reported as errors and/or replaced with the Unicode replacement character `U+FFFD`. On top of that, it should be possible to transform arrays of code points into arrays of characters via Unicode normalization. Only the characters are visible to a human reader; the bytes and code points behind them are not.
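To make the requested behaviour concrete, here is a minimal sketch in Python (not fq's query language, and not a proposal for fq's actual API). The helper names `decode_to_code_points` and `encode_from_code_points` and the `replace` flag are hypothetical; only the standard `codecs` module and `bytes.decode` / `str.encode` are real.

```python
import codecs

# BOMs are checked longest-first, because the UTF-32LE BOM (ff fe 00 00)
# begins with the UTF-16LE BOM (ff fe).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_to_code_points(data, encoding="utf-8", replace=False):
    """Hypothetical helper: bytes -> list of code points (ints).

    A leading BOM, if present, overrides `encoding` and is stripped.
    Malformed input raises UnicodeDecodeError, or is replaced with
    U+FFFD when replace=True.
    """
    for bom, bom_encoding in _BOMS:
        if data.startswith(bom):
            data, encoding = data[len(bom):], bom_encoding
            break
    text = data.decode(encoding, errors="replace" if replace else "strict")
    return [ord(ch) for ch in text]


def encode_from_code_points(code_points, encoding="utf-8"):
    """Hypothetical helper: list of code points -> bytes (no BOM written)."""
    return "".join(chr(cp) for cp in code_points).encode(encoding)
```

With this sketch, `decode_to_code_points(b"\x65\xff", replace=True)` gives `[0x65, 0xFFFD]`, while the same call without `replace=True` raises `UnicodeDecodeError`. One known ambiguity of BOM sniffing: UTF-16LE data that starts with a BOM followed by `U+0000` is indistinguishable from a UTF-32LE BOM, so an explicit encoding argument should probably be able to disable the sniffing.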

An example:

~~~
0x| 65    cc 81 | bytes (UTF-8)
U+| 0065  0301  | code points
  | e     ◌́     | characters
  | é           | normalized characters (NFC)
~~~
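For reference, the table above can be reproduced with a few lines of Python using the standard `unicodedata` module. This is only an illustration of the expected result, not a suggestion for how fq should implement it; the helper name `as_code_points` is made up for this sketch.

```python
import unicodedata


def as_code_points(text):
    """Hypothetical helper: format each code point of `text` as U+XXXX."""
    return [f"U+{ord(ch):04X}" for ch in text]


data = bytes([0x65, 0xCC, 0x81])             # bytes (UTF-8)
text = data.decode("utf-8")                  # two code points: e + combining acute
nfc = unicodedata.normalize("NFC", text)     # composed into a single code point
nfd = unicodedata.normalize("NFD", nfc)      # decomposed again

print(as_code_points(text))  # ['U+0065', 'U+0301']
print(as_code_points(nfc))   # ['U+00E9']
print(as_code_points(nfd))   # ['U+0065', 'U+0301']
```

Note that NFC only composes sequences that have a precomposed form. Grouping arbitrary code points into user-perceived characters would additionally need grapheme cluster segmentation, which is outside what `unicodedata` provides.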
 

