diff --git a/meetings/notes-2024-12-19.md b/meetings/notes-2024-12-19.md new file mode 100644 index 00000000..f099538a --- /dev/null +++ b/meetings/notes-2024-12-19.md @@ -0,0 +1,345 @@ +# 2024-12-19 ECMA-402 Meeting + +## Logistics + +### Attendees + +- Shane Carr - Google i18n (SFC), Co-Moderator +- Eemeli Aro - Mozilla (EAO) +- Frank Yung-Fong Tang - Google i18n, V8 (FYT) +- Henri Sivonen - Mozilla (HJS) +- Jesse Alama - Igalia (JMN) +- J S Choi - Invited Expert (JSC) +- Louis-Aimé de Fouquières - Invited Expert (LAF) +- Richard Gibson - OpenJS Foundation (RGN) +- Ujjwal Sharma - Igalia (USA), Co-Moderator +- Yusuke Suzuki - Apple (YSZ) +- Zibi Braniecki - Mozilla (ZB) + +### Standing items + +- [Discussion Board](https://github.com/tc39/ecma402/projects/2) +- [Status Wiki](https://github.com/tc39/ecma402/wiki/Proposal-and-PR-Progress-Tracking) -- please update! +- [Abbreviations](https://github.com/tc39/notes/blob/master/delegates.txt) +- [MDN Tracking](https://github.com/tc39/ecma402-mdn) +- [Meeting Calendar](https://calendar.google.com/calendar/embed?src=unicode.org_nubvqveeeol570uuu7kri513vc%40group.calendar.google.com) +- [Matrix](https://matrix.to/#/#tc39-ecma402:matrix.org) + +## Status Updates + +### Updates from the Editors + +USA: Two PRs for the Stage 4 proposals. + +SFC: It’s time for our yearly update for the list of TG2 delegates. Sent out an email to attendees from 2+ meetings from this year. Please reach out. + +### Updates from the MessageFormat Working Group + +EAO: The MF2 spec is officially at "final candidate" status, in that it has been approved by CLDR TC. We've been working on improving docs. See the readme here: + +https://github.com/unicode-org/message-format-wg/tree/main/spec + +EAO: We're effectively at a stage where it would be useful and interesting to get feedback from TC39-TG2 members, if you could find time to look at the spec and comment on it, to make sure that our needs as TC39-TG2 are accounted for. Without any more objections, this should land in CLDR 47 as final, at which point the work of the MF WG is done. + +SFC: If someone from this group finds a problem, is it sufficient to just open an issue, or is the expectation that the reporter drives the change? + +EAO: Making changes is very difficult atm, but if we have some issues then we should communicate them ASAP. + +SFC: I filed some fairly significant issues. + +EAO: We’ve given the CLDR-TC the go-ahead. Issues should be clarified if they should be blocking the 2.0 release. The next WG call will be on 6th Jan. Discussing these issues with them would be important. + +SFC: The interaction is difficult, we can file an issue but it’s not clear how we communicate/advocate for issues. There is no clear flow for feedback. The concerns aren’t new. Some are up to a year old. They weren’t adequately addressed. This is my feedback for the WG. + +EAO: Speaking just for myself, in part here the issue is that from the POV of the WG, your issues and feedback were treated as someone from within the WG rather than an external reviewer. Perhaps clearer communication about the issues would help. + +EAO: Could you highlight the issues in the next meeting and the repo? + +SFC: I could clarify my issues. + +EAO: That would help. + +### Updates from Implementers + +https://github.com/tc39/ecma402/wiki/Proposal-and-PR-Progress-Tracking + +FYT: Lots of things are shipping in January in V8 132 + +YSZ: We’re working on fixing issues uncovered from test262 but nothing major recently. + +SFC: Would be good to keep the table updated with MDN and JSC status. + +FYT: bug filed for https://github.com/tc39/test262/issues/4363 for PR 877 + +### Updates from the W3C i18n Group + +SFC: There has been a brief discussion about a proposal in W3C about a translation API for the web. + +https://github.com/webmachinelearning/writing-assistance-apis/pull/17 + +There’re some questions about handling locale subtype fallbacks. This is just something that’s going on, and the idea is to align with 402 as much as possible. Has a potential concern re:dynamic data loading so I hope we can come to a conclusion regarding that. + +### Prep for TG1 + +EAO: Would like to bring Stable Formatting to this meeting next time so that I can bring it to TG1 in Feb. + +## Proposals and Discussion Topics + +https://github.com/tc39/ecma402/projects/2 + +### Normative: Add 8 new numbering systems for Unicode 16 #929 + +https://github.com/tc39/ecma402/pull/929 + +YSZ: Last meeting we talked about a change to numbering systems, I discussed it with some folks internally and we’re on-board. + +FYT: How about we discuss it as a separate topic today. + +SFC: Is there anything else to discuss about it or can we bring it to TG1? + +YSZ: From Apple side we have consensus that we're not blocking this. + +SFC: Believe that Mozilla is also on-board. Do EAO or HJS have an idea? + +FYT: If we merge it then it’ll be part of the next release in July. That should give us enough time to implement it. + +FYT: Can we add it to Feb’s agenda already so it can be brought to TG1. + +#### Conclusion + +Bring this to the next TG1 meeting. FYT to add it to the agenda. + +### Intl Locale Info API for Stage 4 + +https://github.com/tc39/ecma402/pull/942 + +https://github.com/tc39/proposal-intl-locale-info + +FYT: Still needs some work, some minor editorial change it will be rebased on top of. I personally believe it’s an editorial change. If you look at Intl.LocaleInfo, there are still some issues listed. I closed some editorial issues but we should discuss the last four. + +#### Define fallback behaviour + +https://github.com/tc39/proposal-intl-locale-info/issues/76 + +HJS: We have an implementer who filed these issues found during implementation so it’s not appropriate to address it like mentioned. We should work on the specification to improve it. + +FYT: The fallback is not a requirement. + +HJS: If we want interop, we need to specify it. Need to minimize implementation-defined behavior. + +FYT: The spec doesn’t depend on CLDR, so we can’t specify fallbacks with that assumption. + +HJS: In reality, all impls use CLDR. We have a limited number of fields each of which can have a well-defined default value. + +FYT: The issue with the spec is that every locale has every single value so you can’t specify a fallback value. + +EAO: I think the issue is not about well-defined locales but about how the API works for structurally valid language tags. So what happens when you create a locale "tlh", Klingon, and what should happen, when we don't expect the engine to have data? + +SFC: For example in Intl.NumberFormat the spec currently says that if you don’t have data for the locale you fallback to the host locale. You should ideally fallback to the root locale. It’s important to note that so I agree with Anba. You could have multiple options that could lead to lack of interop. There’s a certain interop concern here. + +FYT: The problem is that this is not observable behavior. We can’t test for a fallback. + +EAO: Shouldn’t the test be made against other locales that aren’t defined? + +FYT: I don’t get it. + +SFC: This doesn’t need to be a normative requirement but we should specify this expectation. We can’t test it and we can’t normatively require it. I thought initially this should be ILD. There’s a lot of it in 402 but I don’t understand why this shouldn’t be. + +HJS: Even if in principle implementations are allowed to have non-CLDR locale data sources there’s still value in specifying this. After all this was filed by an implementer. + +#### minimalDays #86 + +https://github.com/tc39/proposal-intl-locale-info/issues/86 + +FYT: Is there any empirical evidence that they could be different? + +SFC: Peter Edberg had an example of a calendar with different week numbering schemes but I haven’t had any answers yet. + +FYT: If they were different what would be the impact on this proposal? + +SFC: firstDayOfWeek naming basically or we could go ahead and then call the second one something different later. Perhaps need to document it better. + +FYT: What is others’ view on this? + +HJS: In HTML for inputtype=week back when Opera researched it the locales where you enter week number are locales that use ISO weeks and since we haven’t had non-ISO week locales that use this, so minimalDays=4. If we did find a locale with different values, would they even care? If you don’t match the week number rule and week display rules it’s complex. In theory there are locales that don’t use ISO weeks. How many of them use week numbers? + +SFC: Different systems of week numbering, meteorology for instance. Unclear if they’re good from an i18n perspective. If I can see evidence I’d be persuaded but I can’t see week numbers popular outside ISO weeks. Without sufficient evidence, should we even have minimalDays? It’s a bit late feedback but given that we’re already making changes, is minimalDays useful? firstDayOfWeek is a bit similar, but it has its own use case (calendar display). + +EAO: Is there any currently observable evidence for locales with different values for these two to differentiate them? I think firstDayOfWeek is sufficient. + +FYT: I think minimalDays was there because Mozilla originally had it. The prior art had it. This particular issue wasn’t very well-articulated. There are some concerns raised but what are they really? It’s a bit unclear. I can’t act on it. + +LAF: The objective of minimalDays is how to compute week numbers within a calendar. + +FYT: There’s another reason. Imagine you have calendar layout with all months in a page. You can decide where to draw the box for instance. + +SFC: I think minimal days defined for the reason Luis mentioned which is to number weeks. + +LAF: You may define the week number with the minimal days of the first week but you can also define the first week as the week that has the fourth day of the week for instance. + +EAO: I misremembered that Intl.DateTimeFormat supported week numbering output, but it seems like not. If in Temporal or elsewhere we add week numbering, that value is going to depend on the locale. But if there's no intent to add week formatting, we don't really need that information in Intl.Locale. + +HJS: Looking at how calendars are laid out, I don't think min days of week affects where you draw the boxes. Usually the overlapping week appears in both month boxes. So, firstDayOfWeek affects what the box looks like, but minimalDays does not. I think so far it hasn't been shown that the domain modeling should distinguish between the two types of first day of week. + +LAF: Just wondering whether the idea of week numbering even stand in calendars without ISO weeks. + +SFC: ATM, we don’t support week formatting and I’d much rather see a cohesive proposal for it that encompasses everything. The only motivation I’ve heard is that Mozilla had the option already in the prior art. Henri brought a great point with other fields we added and we can’t articulate good use cases for it. + +FYT: Java has it, PHP has it, .NET has it. All the other APIs have it. + +https://docs.oracle.com/javase/8/docs/api/java/time/temporal/WeekFields.html#getMinimalDaysInFirstWeek-- + +https://www.php.net/manual/en/intlcalendar.getminimaldaysinfirstweek.php + +https://learn.microsoft.com/en-us/dotnet/api/java.util.calendar.minimaldaysinfirstweek?view=net-android-34.0 + + +EAO: Is Temporal going to add something like .weekOfYear()? If you have nothing like that then you can’t get the week of a date so this seems completely unnecessary. + +RGN: https://tc39.es/proposal-temporal/#sec-get-temporal.plaindate.prototype.weekofyear + +EAO: This is the ISO week? It would be misleading to have an API that indicates that the start is different from ISO. + +USA: It sounds like maybe this isn't motivated and we should file an issue to discuss further. + +SFC: I heard a number of arguments to remove it, there’s two arguments to keep it. The biggest one is because it’s a Stage 3 proposal and it’s inappropriate for us to make changes at this point. Another good argument is the prior art. + +FYT: Temporal only gives weekOfYear to be ISO year if the calendar is 8601. Otherwise, the calendar can behave differently. + +https://tc39.es/proposal-temporal/#sec-temporal-calendar-date-records + +https://tc39.es/proposal-temporal/#sec-temporal-calendarisotodate + +SFC: The behavior is calendar dependent but not locale dependent. + +HJS: I think the stage argument is not particularly persuasive. We should avoid things sailing through that are regrettable. + +USA: I think the stage expectation is most strong when adding things. If this is removal based on user feedback, it can be not that controversial. + +SFC: There are open questions regarding minimalDays and there are a few other open issues. There was one issue that was closed that we couldn't discuss. Currently the spec has this behavior where it returns “ltr” when it’s unknown but CLDR returns null. Should we do null instead? We can ask Frank to investigate these and get back later to discuss them here. + +SFC: We should reopen #90 and discuss it. You made a PR that was good but we should discuss the issue again. + +HJS: We have an internal object for querying related properties but minDays is ignored. It’s dead code in the internal API. It’s currently only for the firefox frontend. Unsure if it can be called by extensions. + +EAO: My recollection is that it’s not widely used, only accessible by “privileged extensions”. + +#### Conclusion + +TG2 feels that these issues should be fixed. FYT to come back next month with specific proposals to fix them. + +### Should "und" behave like "root" or undefined? #885 + +https://github.com/tc39/ecma402/issues/885 + +SFC: We have the “und” locale which means a specific thing when interacting with ICU. All browsers currently use the host locale instead because “und” is unavailable and the fallback is the host locale. This isn’t specifically bad. With the stable formatting proposal we could proposal another locale which isn’t “und”. In Collator and Segmenter we could have different behavior in “und” instead of host behavior. Separate distinction where the locale for segmenter and collator is a property not of the reader but of the text. If we don’t have a locale that’s specified it’s wrong to fallback to host. + +USA: I think what's important is we have to balance the user intent as well as the ergonomics/expressivity. The user can use `undefined` when they want the host locale. That is consistent across all of Intl. I think there us usefulness of "und" for stable formatting and to have a proper unbiased undefined value. If somebody is explicitly setting the locale to "und", we should allow them to use that, like ISO for calendar. + +EAO: I'm reminded that, for stable formatting, we were considering "zxx" because "und" had pre-existing meaning in the CLDR context. If that argument applied to stable formatting… if we say that "und" is okay to define separately from CLDR, then it could also be a candidate for stable formatting. + +HJS: I think I’m saying something similar to Ujjwal but differently. How many people invoke the Collator with “und” with the intent of falling back to the host locale vs using the root locale? You can write code like this and test it to be the correct because you tested in a locale which matches root collation. It’s a hazard for developers. The notion that und means the same thing as JS’s undefined doesn’t seem well founded. Atleast for Collator and Segmenter und should mean root. No opinions for formatters. + +ZBI: I'm a bit concerned that we're conflating 2 problems. There's a separation of concern of how the concepts stack to a composable end result. Around "und", we have 4 concepts: a concept of host locale, last-resort locale, "I don't know" my locale, and "I didn't specify" my locale. For example, "und-RU" means, "I didn't specify" my language, even though I specified by region. That's not the same as "I don't know" or "please use host". I would also argue that when the customer specifies "und" for the collator, they are saying, "I don't know" the locale. We can decide that if the user tells us that, we use root; that is implementation logic. But I can see a better outcome that if you don't specify, then we do a more complicated resolution. We can tailor the heuristics of data better. But certainly I think we should distinguish "host locale" from "last resort locale". We are not doing a good job of helping customers distinguish between those two. + +USA: I agree, though I think I don't see value having "und" since undefined already carries that meaning. The most relevant way I personally see "und" is this idea that we have with “Intl-lite” use cases where we either don’t have locale data or don’t need to use a locale for the use caseI. + +HJS: I disagree with ZBI's taxonomy. A fifth case is, I explicitly want to treat the input text without tailoring. I want the most internationally appropriate behavior. In collation, that's the root collation. In DateTime, that's perhaps the ISO calendar. I think there is very much value in talking about the root. I think we shouldn't pretend that nobody wants the root; I think there are cases. If someone writes a locale that flows through variables, I can see "und" coming in. But if someone writes the literal "und" in an Intl.Collator constructor, I think they meant, "use the root collation". They may have tested it in en-US and thought it worked. + +SFC: It basically means the “i don’t know” option in most cases it’s the same as asking the implementation to do the best thing and we could infer using subtags. The best thing we can do with no info for collation is using the world collation but maybe for the formatting APIs this best option is using the host locale. We have to consider the case when the locale is coming from someplace else in the application so undefined not being a valid value in certain languages, “und” could be a cross-language way to express that. + +ZBI: I'm not advocating for not exposing to customers the ability to ask for a root locale. I think there's a use case for that. I think it's a different use case than, do whatever you can. My argument is, I think the best thing we can do for an undefined locale… if we say root, that is incidental. In scenario A, I can say, use the root. In scenario B, I can say, detect as best as possible. I can imagine scenario B being more intense. So I see "undefined" to mean "do the best you can". But I don't think we should be trying to conflate 2, 3, and 4 in the taxonomy. I think we should consider them separately. + +HJS: On likely subtags. A "und" without other tags are the case that happens to work; it fills in US English. CLDR has special segmentation tailorings for US English. About "try your best", I think it's not particularly realistic to do language inference. root or host locale are realistic fallbacks. Collation is not only a function of the text, but it's also not only a function of the host locale. You have some sort of context, and the context defines what the collation is. So in summary, I think we should be designing for what collators are like today. + +ZBI: (notes from chat) + +concepts: + +1. undefined language (und, und-RU, und-Cyrl, und-150 etc.) +2. host locale +3. last fallback locale (root in CLDR, en-US in many western softwares) +4. "non-locale locale" - stable locale + +heuristics: + +1) undefined -> (try host) -> lastFallback -> stable +2) undefined -> stable + +I prefer the former heuristics + + +### Section 9.1 AvailableLocales shouldn't require base language if script is present #947 + +https://github.com/tc39/ecma402/issues/947 + +SFC: The spec currently says, it’s required if you support zh-Hant then you should have zh[-Hans]. If you’re developing in Montenegro you should have both cyrillic and latin variants for instance. + +RGN: What you just described is totally reasonable to me and in such an environment zh would give identical results as zh-hant. + +SFC: My understanding is that it’d be zh-hans. If we load more data we’d have more tailoring like region overrides but not script overrides. If I’m creating a translation API if I input zh I should get hans as the default even if I don’t have it on my system, like it should error out if it doesn’t have it rather than defaulting to hant. + +EAO: I'm persuaded by RGN's point. I don't think we should define anywhere what "zh" means. So if you only support zh-Hant, it's totally reasonable to normalize zh to zh-Hant. + +### Document icu_locid's relationship to backwards compatibility syntax #3989 + +https://github.com/unicode-org/icu4x/issues/3989 + +ZBI: We're trying to reconcile ICU4X, Test262, and ECMA-402 behavior involving locale IDs. We found some discrepancies. I think they're converging on BCP-47 locale ID vs Unicode locale ID. There are 3 main differences. (1) Handling of underscores, which is a difference between BCP47 syntax. A simple solution is that engines can do a simple string can for underscores. Alternatively is that we remodel ICU4X to reject underscores, or we add different modalities. (2) Handling of alias extensions. This makes Locale operations in ECMA-402 slower. (3) A scenario around 5-8-character languages. These are allowed in ECMA-402. For example, in Test262, there is a string "posix", and it should be parsed into a valid locale. Right now, some engines accept it, some engines reject it. ICU4X benefits from having a smaller stack for language because it is a string of 2 or 3 characters. If we extend this to support the longer language subtags, we hurt everyone. What I would like to propose to this group is that we consider it a bug that ECMA-402 allows for 5-8-character languages and start rejecting them. The alternative is, if we want to accept 5-8-character languages, we should consider that there are multiple types of locales, and ICU4X could add multiple abstractions. + +SFC: I think we should fix 5-8 languages in 402. I think we should try to support the strict subset of behavior. + +EAO: In MF2, we support the "lang tag" rule from BCP-47. Maybe look there for what is the strict subset. + +SFC: Cool, yeah, in an ideal world, ICU4X, MF2, and ECMA-402 are all the same thing. + +EAO: To clarify, MF2 says, this is what you must support, but you're allowed to support more. + +ZBI: `new Intl.Locale("posix")` is a valid ECMA-402 locale with 5 character language tag. unfortunately, seems like SM, JSC and V8 all support `posix` as a language tag :(, and new `Intl.Locale("foobar")`, and `{new Intl.DateTimeFormat("posix")}` + +SFC: Is it conformant to canonicalize those things? + +HJS: We should investigate if this language exists elsewhere and see whether it would break anything to map it to "und-posix". + +ZBI: Let's not over-index on "posix". What do you do about "foobar"? + +HJS: My question is, is "posix" something that exists in the wild. + +ZBI: My expectation is that it's more likely a thing that exists in Test262. + +HJS: Seems worthwhile to remove those. + +SFC: Is someone volunteering to drive this alignment? + +EAO: I think this should involve a lot of research. We don't want to break the web. + +HJS: Might be more useful to have telemetry in a browser. + +ZBI: BCP-47 uses dialect as an example of a 5-letter subtag. + +``` + Suppose that a registered language subtag 'dialect' represents a + language not yet available in any part of ISO 639. The later + addition of a corresponding language code in ISO 639 SHOULD result + in the addition of a 'Preferred-Value' for 'dialect'. +``` + +See https://www.rfc-editor.org/rfc/bcp/bcp47.txt + +And https://learn.microsoft.com/en-us/globalization/locale/standard-locale-names + +``` +2 letter code from ISO 639-1 (2002) OR 3 letter code from ISO 639-2 (1998), ISO 639-3 (2007), or ISO 639-5 (2008) OR 5-8 letters and registered through the BCP 47 process. +``` + +uuugh... https://developer.apple.com/documentation/foundation/locale/identifiertype/bcp47 + +so Apple just accepts that we have two types of Locales + +even three - https://developer.apple.com/documentation/foundation/locale/identifiertype + +SFC: I'm hoping someone else will champion this, but I'll do it if nobody else. Expect to see more in 2025. + +#### Conclusion + +TG2 is open to the idea of dropping support for the 5-8 languages, either exception or canonicalization, but not if it breaks the web.