Define fallback behaviour in {Calendars,Collations,HourCycles,NumberingSystems,CharacterDirection,WeekInfo}OfLocale #76

Open
anba opened this issue Aug 21, 2023 · 8 comments

Comments

@anba
Contributor

anba commented Aug 21, 2023

This issue is similar to #47.

TimeZonesOfLocale is currently defined to return an empty Array object when no time zones are in use for a specific region. This matches the behaviour of ICU4C. (See also #47)

The other abstract operations ({Calendars,Collations,HourCycles,NumberingSystems,CharacterDirection,WeekInfo}OfLocale) don't define their fallback behaviour. To match the ICU4C behaviour, the description should be updated to say that the locale "und-001" is used as the fallback when the input locale is unsupported. (At least I think that's the default fallback behaviour in ICU4C.)

For example in CalendarsOfLocale, change step 4 from:

Let list be a List of 1 or more unique canonical calendar identifiers, which must be lower case String values conforming to the type sequence from UTS 35 Unicode Locale Identifier, section 3.2, sorted in descending preference of those in common use for date and time formatting in locale.

To something like:

Let list be a List of 1 or more unique canonical calendar identifiers, which must be lower case String values conforming to the type sequence from UTS 35 Unicode Locale Identifier, section 3.2, sorted in descending preference of those in common use for date and time formatting in locale. When no calendar information for locale is available, use the calendar identifiers in common use for the locale "und-001".

For example, returning ["gregory"] for new Intl.Locale("tlh").calendars doesn't match the actual calendar used in tlh, but instead reflects that the fallback locale und-001 is used.
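A minimal sketch of the proposed wording, assuming a tiny hand-written table in place of real CLDR data. `CALENDAR_DATA` and `calendarsOfLocale` are hypothetical names used only for illustration; they are not part of the proposal or of ICU4C.

```javascript
// Hypothetical stand-in for CLDR calendar preference data; not real locale data.
const CALENDAR_DATA = new Map([
  ["und-001", ["gregory"]],
  ["th-TH", ["buddhist", "gregory"]],
]);

// Sketch of CalendarsOfLocale with the proposed fallback sentence applied.
function calendarsOfLocale(tag) {
  // When no calendar information is available for the tag, use "und-001".
  const list = CALENDAR_DATA.get(tag) ?? CALENDAR_DATA.get("und-001");
  return [...list]; // copy so callers cannot mutate the table
}

console.log(calendarsOfLocale("tlh")); // falls back to the und-001 entry
```

The caller receives the same kind of array in both cases, so nothing in the result reveals whether the fallback was taken.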

There isn't a way for a user to detect when the fallback locale is used instead of the input locale, but that's a pre-existing issue in this proposal.

@anba
Contributor Author

anba commented Aug 21, 2023

Or maybe even just define explicit fallback values instead of defaulting to und-001. That way it's clearer what to do in a (theoretical) implementation which doesn't even have locale data for und-001.

| Operation | Default value |
| --- | --- |
| CalendarsOfLocale | "gregory" |
| CollationsOfLocale | ??? |
| HourCyclesOfLocale | "h23" |
| NumberingSystemsOfLocale | "latn" |
| CharacterDirectionOfLocale | "ltr" |
| WeekInfoOfLocale | { [[FirstDay]]: 1, [[Weekend]]: « 6, 7 », [[MinimalDays]]: 1 } |
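As a rough sketch of how explicit defaults like these could be applied, assuming a hypothetical `lookup` callback that returns `undefined` when an implementation has no data at all. The property names and the choice to wrap list-valued defaults in arrays are assumptions made for this example; collations is left unset because no default has been settled on.

```javascript
// Defaults copied from the table above. Property names are illustrative only.
const DEFAULTS = Object.freeze({
  calendars: ["gregory"],
  collations: undefined, // "???" in the table: no agreed default
  hourCycles: ["h23"],
  numberingSystems: ["latn"],
  direction: "ltr",
  weekInfo: { firstDay: 1, weekend: [6, 7], minimalDays: 1 },
});

// `lookup` is a hypothetical function: (operation, tag) => data or undefined.
function withDefault(operation, tag, lookup) {
  const value = lookup(operation, tag);
  return value !== undefined ? value : DEFAULTS[operation];
}

const noData = () => undefined; // simulates an implementation without locale data
console.log(withDefault("hourCycles", "tlh", noData));
console.log(withDefault("weekInfo", "tlh", noData));
```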

collations seems to be a problem, because ICU4C returns the collations for the default locale when the input locale is unrecognised:

andre@andre-dev:~$ LANG=de opt/v8/d8 -e "print(new Intl.Locale('tlh').collations)"
emoji,eor,phonebk
andre@andre-dev:~$ LANG=fr opt/v8/d8 -e "print(new Intl.Locale('tlh').collations)"
emoji,eor

(When de is the default locale, there's an additional phonebk collation in the returned array.)

This seems a bit problematic to me.
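To try this outside of d8, something like the following should work in any engine that ships the Intl.Locale info accessors. It checks for both the `getCollations()` method and the older `collations` getter, since which one exists depends on the engine version; `collationsOf` is just a helper name for this sketch, and the printed result will vary with the default locale.

```javascript
// Returns the collations reported for a tag, or undefined when the engine
// exposes neither the method form nor the getter form.
function collationsOf(tag) {
  const locale = new Intl.Locale(tag);
  if (typeof locale.getCollations === "function") {
    return locale.getCollations();
  }
  return locale.collations;
}

// Print the default locale next to the result to see whether they correlate.
const defaultLocale = new Intl.DateTimeFormat().resolvedOptions().locale;
console.log(defaultLocale, collationsOf("tlh"));
```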

@FrankYFTang
Collaborator

FrankYFTang commented Sep 13, 2023

I do not think we need to define the fallback behavior. The fallback is just a detail of how locale data is decoded. Conceptually, every locale has a value; the fact that the resource data in ICU does not explicitly define it is just a memory optimization strategy that avoids duplicating information when it is the same as the fallback value. The fallback is just an implementation detail that CLDR defines based on that data model. Conceptually, each locale has every value, either explicitly encoded in the resources or implicitly obtained from the fallback.

@FrankYFTang
Collaborator

@sffc @zbraniecki @ben-allen @gibson042 - any opinion?

@sffc
Contributor

sffc commented Sep 13, 2023

TimeZone is exceptional because it impacts the rendered value of the data. The others are all preference-related, so it makes sense to use a CLDR fallback algorithm (which may do something such as looking up a value from likely subtags).

> When no calendar information for locale is available, use the calendar identifiers in common use for the locale "und-001".

This situation, "when no calendar information for locale is available", is still fuzzy. The calendar is defined in CLDR by region, but it may be reasonable for th, not just th-TH, to infer the Buddhist calendar.
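One way to picture the likely-subtags route, as a sketch only: `Intl.Locale.prototype.maximize()` adds likely subtags, and the resulting region can key a region-based table. `CALENDARS_BY_REGION` is a hypothetical two-entry stand-in for the CLDR data, and `calendarsFor` is a made-up helper name.

```javascript
// Hypothetical region-keyed table; real data would come from CLDR.
const CALENDARS_BY_REGION = new Map([
  ["001", ["gregory"]],
  ["TH", ["buddhist", "gregory"]],
]);

function calendarsFor(tag) {
  // maximize() fills in a likely script and region when the engine knows them.
  const region = new Intl.Locale(tag).maximize().region;
  // Regions missing from the table (or an undefined region) use the "001" entry.
  return CALENDARS_BY_REGION.get(region) ?? CALENDARS_BY_REGION.get("001");
}

console.log(calendarsFor("th"));  // region inferred through likely subtags
console.log(calendarsFor("xyz")); // no region can be inferred
```

With this shape, th and th-TH resolve to the same table entry, while a tag with no inferable region ends up at the world default.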

@anba
Contributor Author

anba commented Sep 22, 2023

The input may not be a recognised locale, for example new Intl.Locale("xyz") (there isn't an ISO 639 code for xyz), or no CLDR data may be available for it, for example new Intl.Locale("egy-Egyh") (Egyptian in Hieratic script). Most implementations will likely return "ltr" for new Intl.Locale("egy-Egyh").getTextInfo().direction, even though the correct answer should be "rtl".

So for bogus inputs or if no corresponding locale data is available, spec steps like:

> [...] sorted in descending preference of those in common use for formatting numeric values in locale.

can't be applied, because neither the preference nor the common use can be determined.


Maybe there was a misunderstanding when I mentioned "fallback"? I didn't mean "fallback" to refer to some locale inheritance scheme, but instead what values should be used as the fallback when no data is available at all.
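The egy-Egyh case mentioned above can be checked with a short script. This is a sketch that assumes the engine exposes either `getTextInfo()` or the older `textInfo` getter; `directionOf` is a helper name invented for this example, and no particular output is implied since it depends on the engine's locale data.

```javascript
// Returns the reported character direction, or undefined when the engine
// exposes neither the method form nor the getter form.
function directionOf(tag) {
  const locale = new Intl.Locale(tag);
  const info = typeof locale.getTextInfo === "function"
    ? locale.getTextInfo()
    : locale.textInfo;
  return info ? info.direction : undefined;
}

// "ar" has data in CLDR; the other two are the no-data cases from this comment.
for (const tag of ["ar", "egy-Egyh", "xyz"]) {
  console.log(tag, directionOf(tag));
}
```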

@sffc
Contributor

sffc commented Dec 21, 2024

This was referenced Dec 29, 2024