@m4rch3n1ng m4rch3n1ng commented Oct 27, 2025

currently, there is no way to get the numeric value of a character from icuexportdata, so this exports the values into an nv.toml file. this is a value that icu4x would like to be able to provide (unicode-org/icu4x#3014).

for the new nv.toml export, i initially added a new type of property, a [[value_property]]: similar to an [[enum_property]], but without the values key for the enum variants and without a name field on each of the range maps. that turned out to be what bmg.toml already does with a regular [[enum_property]], so, like bmg.toml, this now exports an [[enum_property]] without the values key and without the name field in each of the ranges.

i was a little unsure which value to export, as there were two options: exporting it as a double, or exporting the raw numeric type value (via GET_NUMERIC_TYPE_VALUE(u_getMainProperties(c))). i have decided on the second, both for being smaller (an int32_t vs a double) and for being more accurate (floating point numbers cannot accurately represent some fractions, and the highest number that unicode provides is higher than the max safe integer of a double). it is also more flexible, potentially allowing languages with native support for fractions to actually consume them as fractions. this does put the burden of reinterpreting the value on the consumer side, but i think that is a fine tradeoff.
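
as a rough illustration of that consumer-side reinterpretation, here is a minimal c++ sketch of a decoder for the raw value. the range boundaries and formulas are assumed from the UPROPS_NTV_* layout in icu's internal uprops.h and should be verified against the icu version that produced the data. ExactNumericValue, decodeNumericTypeValue and the kNtv* constants are hypothetical names used only for this example; they are not part of icu or of this pr.

```cpp
#include <cstdint>
#include <optional>

// Exact value, read as: numerator * 10^exp10 / denominator.
// Hypothetical type for this sketch; not an ICU type.
struct ExactNumericValue {
    int64_t numerator;
    int64_t denominator;
    int32_t exp10;
};

// Range boundaries, assumed from ICU's internal UPROPS_NTV_* constants.
// Verify these against the uprops.h of the ICU version in use.
constexpr int32_t kNtvNone = 0;
constexpr int32_t kNtvDecimalStart = 1;        // decimal digits 0..9
constexpr int32_t kNtvDigitStart = 11;         // other digits 0..9
constexpr int32_t kNtvNumericStart = 21;       // small integers
constexpr int32_t kNtvFractionStart = 0xb0;    // simple fractions
constexpr int32_t kNtvLargeStart = 0x1e0;      // mantissa * 10^exponent
constexpr int32_t kNtvBase60Start = 0x300;     // digit * 60^exponent
constexpr int32_t kNtvFraction20Start = 0x324; // n / (20 << k)
constexpr int32_t kNtvFraction32Start = 0x34c; // n / (32 << k)
constexpr int32_t kNtvReservedStart = 0x35c;   // no value assigned

// Decodes a raw numeric type value into an exact representation.
// Returns std::nullopt for "no numeric value" and for ranges that
// this sketch does not know about.
inline std::optional<ExactNumericValue> decodeNumericTypeValue(int32_t ntv) {
    if (ntv <= kNtvNone || ntv >= kNtvReservedStart) {
        return std::nullopt;
    }
    if (ntv < kNtvDigitStart) {
        return ExactNumericValue{ntv - kNtvDecimalStart, 1, 0};
    }
    if (ntv < kNtvNumericStart) {
        return ExactNumericValue{ntv - kNtvDigitStart, 1, 0};
    }
    if (ntv < kNtvFractionStart) {
        return ExactNumericValue{ntv - kNtvNumericStart, 1, 0};
    }
    if (ntv < kNtvLargeStart) {
        // Numerator in the high bits, denominator in the low 4 bits.
        return ExactNumericValue{(ntv >> 4) - 12, (ntv & 0xf) + 1, 0};
    }
    if (ntv < kNtvBase60Start) {
        // One significant decimal digit times a power of ten.
        return ExactNumericValue{(ntv >> 5) - 14, 1, (ntv & 0x1f) + 2};
    }
    if (ntv < kNtvFraction20Start) {
        // One digit times a power of sixty; small enough for int64_t.
        int64_t value = (ntv >> 2) - 0xbf;
        for (int32_t e = (ntv & 3) + 1; e > 0; --e) {
            value *= 60;
        }
        return ExactNumericValue{value, 1, 0};
    }
    if (ntv < kNtvFraction32Start) {
        int32_t frac20 = ntv - kNtvFraction20Start;
        return ExactNumericValue{2 * (frac20 & 3) + 1,
                                 int64_t{20} << (frac20 >> 2), 0};
    }
    int32_t frac32 = ntv - kNtvFraction32Start;
    return ExactNumericValue{2 * (frac32 & 3) + 1,
                             int64_t{32} << (frac32 >> 2), 0};
}
```

under these assumptions, a raw value of 0xD1 decodes to the fraction 1/2, and a raw value of 0x1e0 decodes to 1 * 10^2. keeping the power of ten separate means the large values never have to pass through a double or overflow a 64-bit integer.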

i have also made an icu4x branch where i provide this new property: https://github.com/m4rch3n1ng/icu4x/tree/numeric-value. you can also see what the new nv.toml file looks like there.

Checklist

  • Required: Issue filed: ICU-22284
  • Required: The PR title must be prefixed with a JIRA Issue number. Example: "ICU-1234 Fix xyz"
  • Required: Each commit message must be prefixed with a JIRA Issue number. Example: "ICU-1234 Fix xyz"
  • Issue accepted (done by Technical Committee after discussion)
  • Tests included, if applicable
  • API docs and/or User Guide docs changed or added, if applicable

@jira-pull-request-webhook

Notice: the branch changed across the force-push!

  • icu4c/source/tools/icuexportdata/icuexportdata.cpp is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@m4rch3n1ng
Author

i noticed (a little late) that what i was doing here previously was essentially just what bmg already does, but using a new [[value_property]], while bmg.toml is just a "normal" [[enum_property]], so i switched to doing that too.

@markusicu
Member

@sffc @robertbastian @hsivonen does this look like what you would want for ICU4X?
