Hi,
We have a use case for being able to track terms across multiple languages.
Looking at some of the providers, localization of the terms is usually handled through one of the following means:
- A URL parameter (in the query string or in the domain) carrying a localization code (ISO language or country code)
- A localization based on the language header sent by the browser (`Accept-Language`)
- A localization set through cookies (either plain ISO codes or more sophisticated cookie encodings)
- An IP-based solution, where localized content is served based on the user's inferred geolocation (e.g. MaxMind GeoIP)
While the last two methods (cookies and IP-based) are quite advanced and not easy to tackle, the first two seem easy to support with a small extension of the current declaration specification. We propose the following:
- Introduce the list of languages/regions to scrape at the top level of the declaration (and/or as a per-service override): `languages: [fr-BE, en-IE, etc.]` (short language codes would be inferred by the engine from the region codes).
- Introduce a boolean `multilingual` parameter in the declaration specification (defaults to `false`). When set to `true`, the engine would scrape the page in all languages defined in the configuration, making multiple successive requests and updating the language header each time.
- Introduce an extra `<LANG:{short,long}>` placeholder (`<LANG:short>` or `<LANG:long>`) in the `fetch` specification of the declaration. It would be replaced at runtime by the ISO code currently being scraped: either the short language form (e.g. "fr") or the language/country combination (e.g. "fr-FR"). For example, `fetch: "http://example.com/?myLocale=<LANG:short>"` would be replaced by the engine with `fetch: "http://example.com/?myLocale=fr"` (for each of the locales defined in the terms).
- At a later point in time, the placeholder could be customised via the declaration of the target(s) to support more sophisticated schemes, e.g. to use a custom header name or cookies:

  ```yaml
  lang:
    method: headers
    headers_name:
      - locale # default
      - region
      - lang
    ...
  ```
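The first three proposals can be sketched as follows. This is a minimal, language-agnostic illustration in Python, not the engine's implementation: the names `expand_fetch`, `short_code` and `FetchRequest`, the `header_name` parameter (standing in for the proposed `headers_name` customisation) and the choice of `Accept-Language` as the default header are all assumptions. With `multilingual` off, a declaration yields a single request as today; with it on, each locale produces one request, with the placeholder substituted in the URL and the locale sent in the language header.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FetchRequest:
    """One request the engine would issue (hypothetical structure)."""
    url: str
    headers: dict[str, str] = field(default_factory=dict)
    locale: str | None = None


def short_code(locale: str) -> str:
    """Infer the short language code from a region code, e.g. "fr-BE" -> "fr"."""
    return locale.split("-", 1)[0].lower()


def expand_fetch(
    fetch: str,
    languages: list[str],
    multilingual: bool = False,
    header_name: str = "Accept-Language",  # assumed default header name
) -> list[FetchRequest]:
    """Turn one `fetch` entry into the list of requests to perform."""
    if not multilingual:
        # Current behaviour: a single request, URL used verbatim.
        return [FetchRequest(url=fetch)]
    requests = []
    for locale in languages:
        # Substitute both placeholder forms for this locale.
        url = fetch.replace("<LANG:short>", short_code(locale))
        url = url.replace("<LANG:long>", locale)
        # Also send the locale in the language header.
        requests.append(FetchRequest(url=url, headers={header_name: locale}, locale=locale))
    return requests


if __name__ == "__main__":
    for request in expand_fetch(
        "http://example.com/?myLocale=<LANG:short>",
        ["fr-BE", "en-IE"],
        multilingual=True,
    ):
        print(request.url, request.headers)
    # http://example.com/?myLocale=fr {'Accept-Language': 'fr-BE'}
    # http://example.com/?myLocale=en {'Accept-Language': 'en-IE'}
```

Keeping the non-multilingual path identical to today's single request means existing declarations would not change behaviour.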
The `languages` definition could be converted from a plain list to a dict structure to support mapping between ISO codes and provider-specific encodings.
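One possible shape for the dict form, sketched under the assumption that the list form maps each ISO code to itself and the dict form maps each ISO code to the provider's own encoding. `normalize_languages` and `provider_code` are hypothetical helpers, and the provider values (`fr_BE`, `english`) are made-up examples.

```python
from __future__ import annotations


def normalize_languages(spec: list[str] | dict[str, str]) -> dict[str, str]:
    """Map each ISO code to the value the provider expects.

    List form, e.g. [fr-BE, en-IE]: the ISO code is used as-is.
    Dict form, e.g. {fr-BE: fr_BE}: the provider-specific encoding is used.
    """
    if isinstance(spec, dict):
        return {str(iso): str(encoded) for iso, encoded in spec.items()}
    return {str(iso): str(iso) for iso in spec}


def provider_code(spec: list[str] | dict[str, str], iso: str) -> str:
    """Look up the provider-specific code for one declared ISO code."""
    mapping = normalize_languages(spec)
    if iso not in mapping:
        raise KeyError(f"language {iso!r} is not declared")
    return mapping[iso]


if __name__ == "__main__":
    print(provider_code(["fr-BE", "en-IE"], "fr-BE"))                       # fr-BE
    print(provider_code({"fr-BE": "fr_BE", "en-IE": "english"}, "en-IE"))  # english
```

Accepting both forms would let existing list-based declarations keep working while only providers with unusual encodings need the dict form.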
The storage part could easily be extended by introducing an additional folder level for the language (to be decided: either at the top level or nested in the provider folder).
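The two storage options can be compared with a small path-building helper. It assumes, for illustration only, that a monolingual terms file currently lives at `<service>/<terms type>.<ext>`; `snapshot_path` and its `layout` values are hypothetical.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def snapshot_path(
    service: str,
    terms_type: str,
    locale: str | None = None,
    layout: str = "nested",
    extension: str = "md",
) -> PurePosixPath:
    """Build the storage path of one terms file.

    layout="top":    <locale>/<service>/<terms type>.<ext>
    layout="nested": <service>/<locale>/<terms type>.<ext>
    """
    filename = f"{terms_type}.{extension}"
    if locale is None:
        # Non-multilingual terms keep the existing (assumed) layout.
        return PurePosixPath(service, filename)
    if layout == "top":
        return PurePosixPath(locale, service, filename)
    if layout == "nested":
        return PurePosixPath(service, locale, filename)
    raise ValueError(f"unknown layout: {layout!r}")


if __name__ == "__main__":
    print(snapshot_path("Example", "Terms of Service"))                        # Example/Terms of Service.md
    print(snapshot_path("Example", "Terms of Service", "fr-BE"))               # Example/fr-BE/Terms of Service.md
    print(snapshot_path("Example", "Terms of Service", "fr-BE", layout="top"))  # fr-BE/Example/Terms of Service.md
```

The nested layout keeps every language of a service under one folder, while the top-level layout makes it easier to browse one language across all services.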
The main point of uncertainty is how to properly handle the storage migration, since a default/fallback language is needed to move any existing recorded terms. One idea would be to specify a language for the existing terms (e.g. `existing_term_language`) when declaring them as multilingual, so that the existing terms (and their history) are moved to the relevant new subfolder.
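A sketch of the `existing_term_language` idea, assuming the nested layout and a plain folder of files. `migrate_to_language_folder` is hypothetical; in a Git-backed archive the moves would more likely be done with `git mv`, so that each file's history can still be followed with `git log --follow`.

```python
from __future__ import annotations

import shutil
import tempfile
from pathlib import Path


def migrate_to_language_folder(service_dir: Path, existing_term_language: str) -> list[Path]:
    """Move a service's existing (monolingual) terms files into <service>/<language>/.

    Subfolders, such as language folders created earlier, are left untouched.
    """
    target_dir = service_dir / existing_term_language
    target_dir.mkdir(exist_ok=True)
    moved = []
    # sorted() materialises the listing before anything is moved.
    for entry in sorted(service_dir.iterdir()):
        if entry.is_file():
            destination = target_dir / entry.name
            shutil.move(str(entry), str(destination))
            moved.append(destination)
    return moved


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        service = Path(tmp) / "Example"
        service.mkdir()
        (service / "Terms of Service.md").write_text("existing version")
        for path in migrate_to_language_folder(service, "en-IE"):
            print(path.relative_to(tmp))  # Example/en-IE/Terms of Service.md
```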
Thanks!
cc @EUbaldiEC