Internationalization (i18n) / multilingualism for some text fields ? #86

Open
LaurentAjdnik opened this issue Nov 29, 2024 · 13 comments · May be fixed by #115
Labels
enhancement New feature or request

Comments

@LaurentAjdnik
Contributor

Hello.

All text fields included in the spec are provided on a single-language basis.

As is often the case, English is the most widely used language, both in the spec and in examples found elsewhere.

It is usually not a problem when dealing with APIs, since these are seen only by developers or machines.

However, with MCP, I understand some texts (prompts, descriptions...) might be read / selected / typed directly by end users.

Are there any plans to provide an internationalization (i18n) / multilingualism mechanism in the spec?

Or shall we rely on the LLM to handle the (possibly not totally accurate) translations?

Thanks!

@LaurentAjdnik LaurentAjdnik added the enhancement New feature or request label Nov 29, 2024
@jspahrsummers
Member

Good flag! We indeed overlooked this when designing the spec.

The main things intended to be human-readable are names (e.g., of resources and prompts) and descriptions, so I think we'd want to focus i18n support on those.

Would you be interested in drafting a proposal PR?

@jspahrsummers jspahrsummers added this to the after-2024-11-05 milestone Dec 2, 2024
@LaurentAjdnik
Contributor Author

> Would you be interested in drafting a proposal PR?

I'd love to.

However, this would be a somewhat ambitious PR, with quite a few design choices to be made beforehand.

Do you really think this is a must-have feature? If so, I'll gladly give it some work.

I'll follow up with some more targeted questions.

@LaurentAjdnik
Contributor Author

LaurentAjdnik commented Dec 3, 2024

What would be the most appropriate standard to encode language codes? ISO 639-1? ISO 639-2? IETF BCP 47? Any other?

@LaurentAjdnik
Contributor Author

Should we provide backward compatibility? If so, any advice on how to achieve this here?

@LaurentAjdnik
Contributor Author

LaurentAjdnik commented Dec 3, 2024

In terms of workflow, where should language selection occur?

  • Server-side: The client adds an optional language code parameter to its requests and the server returns only the appropriate language, if available, or its default language.

    • This might not make sense if notifications sent by the server must also be i18n-compliant
    • More complex servers
    • Reduced load on network
  • Client-side: The server returns all available languages and the client selects its preferred one.

    • More complex clients
    • Heavier load on network

These options would have pretty different impacts on the spec.
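
To make the trade-off concrete, here is a minimal Python sketch of the two payload shapes. Everything in it is an assumption made for illustration: the `DESCRIPTIONS` table, the `language` and `defaultLanguage` field names, and the helper functions. None of them come from the spec.

```python
from typing import Any, Dict, Optional

# Hypothetical translations of a single description, keyed by language tag.
DESCRIPTIONS: Dict[str, str] = {
    "en": "Search the knowledge base",
    "fr": "Rechercher dans la base de connaissances",
}
DEFAULT_LANGUAGE = "en"


def server_side(requested: Optional[str] = None) -> Dict[str, Any]:
    """Server-side selection: the server returns one language only."""
    language = requested if requested in DESCRIPTIONS else DEFAULT_LANGUAGE
    return {"description": DESCRIPTIONS[language], "language": language}


def client_side_payload() -> Dict[str, Any]:
    """Client-side selection: the server ships every translation it has."""
    return {"description": dict(DESCRIPTIONS), "defaultLanguage": DEFAULT_LANGUAGE}


def client_pick(payload: Dict[str, Any], preferred: str) -> str:
    """The client chooses from the full map, falling back to the default."""
    texts = payload["description"]
    return texts.get(preferred, texts[payload["defaultLanguage"]])
```

In the first shape the selection logic lives in `server_side` and the response stays small. In the second, the response grows with the number of translations and every client needs its own `client_pick`.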

@LaurentAjdnik
Contributor Author

> The main things intended to be human-readable are names (e.g., of resources and prompts) and descriptions [...]

Names are used as identifiers. I fear we might run into problems if we make them multilingual, for prompt names or tool names in particular.

Especially if they end up being hardcoded in some clients when calling "well-known" servers, or used for inter-server communication someday, if this ever occurs.

On the other hand, descriptions can be translated harmlessly.

@jspahrsummers
Member

Sorry for the delay in getting back to you here!

> > Would you be interested in drafting a proposal PR?
>
> I'd love to.
>
> However, this would be a somewhat ambitious PR, with quite a few design choices to be made beforehand.

Makes sense. We can hold off on the PR while we figure out these key questions, just to avoid unnecessary work that then requires changes.

> Do you really think this is a must-have feature? If so, I'll gladly give it some work.

I don't know if it's must have, but I think it's pretty silly in 2024 to not consider internationalization. Really glad you brought it up!

Regarding the specific questions, I'd be curious for your initial opinions. Always happy to discuss and suggest alternatives, if needed, but you've had a lot of good thoughts so far and I want to make sure to hear you first.

@LaurentAjdnik
Contributor Author

> Regarding the specific questions, I'd be curious for your initial opinions. Always happy to discuss and suggest alternatives, if needed, but you've had a lot of good thoughts so far and I want to make sure to hear you first.

OK, my suggestions will follow. I'll address each topic in a specific "thread" inside this issue, to keep things separated.

@LaurentAjdnik
Contributor Author

> What would be the most appropriate standard to encode language codes? ISO 639-1? ISO 639-2? IETF BCP 47? Any other?

Let's go for the wiiiiiiidely used IETF BCP 47. For instance:

  • Basic language codes: en for English, fr for French, es for Spanish, zh for "Generic Chinese" (usually defaulted to Mandarin), yue for Cantonese...
  • Language + Region Codes: en-US for American English, en-GB for British English, fr-CA for Canadian French...

The component that selects the language (either client-side or server-side, see other topic) SHOULD provide a fallback mechanism to a close variant if the preferred language is not available. For instance: en=>en-US or en-US=>en or en-GB=>en-US.

If no close variant is available, or if i18n is not handled by the component, the "default" language MUST be returned (text + corresponding language code).
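
The fallback rules above could look something like the following Python sketch. It is only an illustration under stated assumptions: `resolve_language` and its parameters are hypothetical names, the right-to-left truncation step is modelled on the "lookup" scheme of RFC 4647, and the final "sibling variant" step (en => en-US, en-GB => en-US) is an extra heuristic suggested in this thread, not part of any standard.

```python
from typing import Optional, Sequence


def resolve_language(
    preferred: Optional[str], available: Sequence[str], default: str
) -> str:
    """Return the best available tag for `preferred`, or `default`."""
    if not preferred:
        return default
    # BCP 47 tags are case-insensitive, so compare in lowercase.
    by_lower = {tag.lower(): tag for tag in available}
    wanted = preferred.lower()
    # 1. Exact match.
    if wanted in by_lower:
        return by_lower[wanted]
    # 2. Drop subtags from the right: en-US => en.
    parts = wanted.split("-")
    while len(parts) > 1:
        parts.pop()
        candidate = "-".join(parts)
        if candidate in by_lower:
            return by_lower[candidate]
    # 3. Heuristic: any variant sharing the primary language subtag,
    #    e.g. en => en-US or en-GB => en-US.
    primary = parts[0]
    for tag in available:
        if tag.lower().split("-")[0] == primary:
            return tag
    # 4. Nothing close enough: fall back to the default language.
    return default
```

For example, with `available=["fr", "en-US"]` and `default="fr"`, a request for `en-GB` resolves to `en-US`, while a request for `de` resolves to `fr`.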

@LaurentAjdnik
Contributor Author

> Names are used as identifiers. I fear we might end up having problems by making them multilingual [...]

The more I think about it, the more I feel it would be a hassle to translate names for tools and prompts:

  • Servers would have to check for variants of names (much higher complexity)
  • The spec indeed states they are "unique identifiers"

On the other hand, names for resources could be i18n-compliant, since the identifier would be the uri in that case. The spec indeed states they are "human-readable names".

That being said, we have another problem that is not spec-related: in Claude Desktop, the "Allow" popup says "Run tool_name from server" without showing the description, which is a flaw in itself, and an even bigger one if only the description is translated.

@LaurentAjdnik
Contributor Author

LaurentAjdnik commented Dec 7, 2024

> In terms of workflow, where should language selection occur?
> [...]
> These options would have pretty different impacts on the spec.

Let's avoid unnecessary network traffic.

For requests, clients should add an optional preferredLanguage parameter.

For responses, servers should add a (mandatory?) language field.

Language selection would be handled by the server.

The server would fall back to its default language if:

  • It does not support i18n
  • No preferred language was specified
  • The preferred language is not available
  • No equivalence was found (see above)
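
As a rough illustration of this flow, here is a small Python sketch of a server handler. The `preferredLanguage` parameter and the `language` response field are the ones proposed above; everything else (`list_prompts`, the `PROMPTS` catalogue, `SUPPORTED_LANGUAGES`, the `i18n_supported` flag) is hypothetical. The equivalence check is deliberately reduced to "exact tag, then primary subtag". Note that the prompt `name` is never translated, only its description.

```python
from typing import Any, Dict

# Hypothetical catalogue: prompt name -> {language tag -> description}.
PROMPTS: Dict[str, Dict[str, str]] = {
    "summarize": {
        "en": "Summarize a document",
        "fr": "Résumer un document",
    },
}
SUPPORTED_LANGUAGES = ("en", "fr")
DEFAULT_LANGUAGE = "en"


def list_prompts(params: Dict[str, Any], *, i18n_supported: bool = True) -> Dict[str, Any]:
    """Answer a listing request in the preferred language when possible."""
    preferred = params.get("preferredLanguage")
    language = DEFAULT_LANGUAGE
    if i18n_supported and preferred:
        # Try the exact tag first, then its primary subtag (fr-CA => fr).
        for candidate in (preferred, preferred.split("-")[0]):
            if candidate in SUPPORTED_LANGUAGES:
                language = candidate
                break
    prompts = [
        {"name": name, "description": texts[language]}
        for name, texts in PROMPTS.items()
    ]
    # The response always states which language was actually used.
    return {"prompts": prompts, "language": language}
```

A call such as `list_prompts({"preferredLanguage": "fr-CA"})` would answer in `fr`, while a missing parameter, an unsupported tag such as `de`, or `i18n_supported=False` would all answer in the default `en`.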

@LaurentAjdnik
Contributor Author

> Should we provide backward compatibility?

Not sure yet, but I feel we might figure out a smooth solution.

If not, this would probably join other major and impactful changes in a future version of the spec.

@jspahrsummers
Member

jspahrsummers commented Dec 9, 2024

Thanks for all the thoughts! Broadly, this sounds great. I'd suggest the following:

  1. preferredLanguage becomes part of the client's advertised capabilities, under a new locale subobject (i.e., capabilities.locale.preferredLanguage)
  2. language is then part of the server's response in a similar object, iff supported (i.e., capabilities.locale.language)

Done this way, it should actually be 100% backwards compatible, since this doesn't involve changes to the shapes of any other responses—and we can add it to any protocol version.
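
A minimal Python sketch of how that negotiation might behave, assuming the `capabilities.locale.preferredLanguage` and `capabilities.locale.language` paths suggested above. The function names, the `supported` and `default` parameters, and the use of `supported=None` to model a server without i18n are all hypothetical, and the dictionaries only mimic the capabilities objects exchanged during initialization.

```python
from typing import Any, Dict, Optional, Sequence


def client_capabilities(preferred_language: str) -> Dict[str, Any]:
    """Capabilities a client would advertise under this proposal."""
    return {"locale": {"preferredLanguage": preferred_language}}


def server_capabilities(
    client_caps: Dict[str, Any],
    supported: Optional[Sequence[str]],
    default: str,
) -> Dict[str, Any]:
    """Capabilities a server would answer with.

    `supported=None` models a server that knows nothing about i18n:
    it ignores the unknown `locale` entry and adds none of its own.
    """
    caps: Dict[str, Any] = {"prompts": {}}
    if supported is None:
        return caps
    preferred = client_caps.get("locale", {}).get("preferredLanguage")
    language = preferred if preferred in supported else default
    caps["locale"] = {"language": language}
    return caps


def negotiated_language(server_caps: Dict[str, Any]) -> Optional[str]:
    """Language the server committed to, or None if it said nothing."""
    return server_caps.get("locale", {}).get("language")
```

Because both sides read `locale` with a tolerant lookup, an older client or server that omits the object keeps working, which is the backward-compatibility property described above.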

What do you think? If this sounds good to you, let's get started on the spec changes. 🙏

@LaurentAjdnik LaurentAjdnik linked a pull request Dec 13, 2024 that will close this issue
9 tasks

2 participants