cljs-i18n

Install

Clojars Project

Overview

Google Closure's goog.i18n is a wrapper for the Unicode CLDR data that provides localisation logic for:

  • Number formatting and parsing
  • Datetime/Timezone formatting and parsing
  • Collation
  • Currency formatting
  • Text message translation
  • Plurals
  • Bidirectional text

Which is amazing for several reasons:

  • We get it "free" in clojure
  • Very few l10n libs out there support both bidirectional formatting and parsing
  • CLDR is really the only way to get high quality coverage for every language/locale out there
    • Maintained by Unicode
    • ISO standard templating language for format/parse logic (many libs hand-roll their own and then fail to account for certain nuances)
    • Supports literally thousands of locale/country combinations
    • Is updated as cultures/locales evolve over time
    • Available as open source JSON files

Unfortunately, the interface to goog.i18n is about as far from idiomatic clojure as one could possibly get. Most of the behaviour is determined by fiddling with mutable properties of both global and local objects that are deeply complected with each other. E.g. the "current locale" is set globally but configuration like "formatter pattern" is set as a property on a local formatter object.

The documentation for Google Closure's i18n code is almost nonexistent. It is necessary to read the code comments and review tests directly to understand how to use it.

Natively goog.i18n does not expose the ability to work with more than one locale at a time. Internally it has several mostly undocumented global properties such as goog.i18n.NumberFormatSymbols and goog.i18n.DateTimeSymbols that must be set manually on each locale change.

Presumably Google know what they are doing, so all this mutation and the limits placed on dynamic locale support are there for performance reasons. Based on that assumption (I haven't done extensive benchmarking/profiling) I've memoized and implemented automated tests for as much as I can. As localisation of a string for a given locale/pattern is totally referentially transparent, the default is to cache aggressively using native cljs memoize.

Of course, the aggressive memoization could lead to memory issues, depending on what you are doing in your application. It's great if you have a few strings that are being re-used across the UI, potentially very bad if you have a lot of unique strings to process.

Each of the core fns in the public API has an unmemoized version prefixed by - so that parse is cached while -parse is not.
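As a rough illustration of the cached/uncached naming convention, here is a minimal sketch using plain memoize. The names -shout, shout and call-count are hypothetical and not part of this library; the point is only that the memoized wrapper keys on the full argument list, option k/v pairs included, which is why many unique inputs mean many cache entries.

```clojure
;; Hypothetical stand-in for an expensive formatter; NOT part of cljs-i18n.
(def call-count (atom 0))

(defn -shout
  "Uncached version: does the work on every call."
  [s & {:keys [suffix] :or {suffix "!"}}]
  (swap! call-count inc)
  (str s suffix))

;; Cached version: results are keyed on the complete argument list.
(def shout (memoize -shout))

(shout "hi" :suffix "?")  ; computed, result stored in the cache
(shout "hi" :suffix "?")  ; same args, served from the cache
(-shout "hi" :suffix "?") ; bypasses the cache, computed again
```

After these three calls the underlying function has run twice. Every distinct combination of string and options adds another entry that is never evicted.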

The end result of this library is the ability to do something like this:

(parse "1.000.000.000,00" :locale "gl") ; 1000000000 in Galician
(parse "1,000,000,000.00" :locale "en") ; 1000000000 in English
(parse "1,00,00,00,000.00" :locale "en-IN") ; 1000000000 in Indian English

(format 1000000000 :locale "gl") ; "1.000.000.000" in Galician
(format 1000000000 :locale "en") ; "1,000,000,000" in English
(format 1000000000 :locale "en-IN") ; "1,00,00,00,000" in Indian English

Additionally this lib provides several things goog.i18n is missing that we need in order to work with an end-user's locale in the browser:

  • Extracting the user's preferred locale based on OS/browser/config settings
  • Normalizing locale formats (e.g. en_US to en-US)
  • Extracting a supported locale from Accept-Language HTTP headers

Comparison with other libraries

Here's a comparison of various i18n libraries available in JavaScript, including Google Closure:

https://github.com/rxaviers/javascript-globalization

Supported locales

Everything from goog.i18n.NumberFormatSymbols as at 2018-03-23.

From the Google docs:

  • File generated from CLDR ver. 32
  • To reduce the file size (which may cause issues in some JS developing environments), this file will only contain locales that are frequently used by web applications. This is defined as proto/closure_locales_data.txt and will change (most likely addition) over time. Rest of the data can be found in another file named "numberformatsymbolsext.js", which will be generated at the same time together with this file.

I don't have a script/build process to track what Google Closure supports under the primary namespace automatically. Also note that there are literally hundreds of additional locales available in goog.i18n.NumberFormatSymbolsExt.

Adding new locales is a simple matter of adding the relevant k/v pair to i18n.data/locales. If a locale you're looking for is missing please feel free to put a pull request up for inclusion.

Accepted locale code formats

Ideally pass in locales to :locale params as ISO-style strings.

i.e. <lowercase language code>-<uppercase country code>.

So Chinese (zh) as used in Hong Kong (HK) becomes zh-HK.

Passing in a valid locale string ensures maximum speed and compatibility.

In the wild, locales are also often represented:

  • with _ rather than -, e.g. en_US
  • with inconsistent casing, e.g. EN-US
  • as a sequence of options, e.g. ["en-US" "en"]
  • as an Accept-Language header, e.g. "fr-CH, fr;q=0.9, en;q=0.8, de;q=0.7, *;q=0.5"

Any public API function that accepts a locale as an argument will normalize to a supported locale as per i18n.locale/supported-locale. The basic logic is to select the first supported locale in a sequence/accept string by fixing the casing and delimiter if possible as per i18n.locale/fix-locale.

If a locale is not supported (see above) then:

  • The next supported locale in the sequence will be used
  • If no locale in a sequence is supported, the default locale will be used
  • If the locale is singular then the default locale will be used instead
  • If the locale is corrupt and cannot be parsed at all a warning will be logged via taoensso.timbre/warn and the default locale used instead

The default locale (at the time of writing) is set by Google as "en".

i18n.locale - Working with locales

i18n.locale/valid-locale?

Takes a string and returns true if it looks like a locale we might support.

Only singular locale strings are valid, no sequences or headers will validate.

Typically to be valid a string must:

  • Have a lowercase langcode
  • Have no country code, or an uppercase country code
  • Be delimited by -

But there are exceptions if the locale string is a key in i18n.data/locales.

Notably "sr-Latn" and "es-419" are considered valid locales.

Keyword locales like :default that are found in i18n.data/locales are NOT considered valid.

i18n.locale/fix-locale

Takes an invalid locale string and attempts to hammer it into an ISO locale.

Can fix:

  • Invalid case
  • Delimited by _

Is aware of valid-locale? edge cases like "sr-Latn" (see above).
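The kind of repair described above can be sketched in a few lines of plain Clojure. This is an illustration only, not the library's implementation: fix-locale-sketch and known-exceptions are hypothetical names, and the exception set is a stand-in for the special keys in i18n.data/locales.

```clojure
(require '[clojure.string :as string])

;; Hypothetical stand-in for the exceptional keys in i18n.data/locales.
(def known-exceptions #{"sr-Latn" "es-419"})

(defn fix-locale-sketch
  "Normalizes the delimiter to - and the casing to lang-COUNTRY.
   Known exceptions are matched case-insensitively and returned as-is."
  [s]
  (let [dashed    (string/replace (string/trim s) "_" "-")
        exception (some #(when (= (string/lower-case %)
                                  (string/lower-case dashed))
                           %)
                        known-exceptions)
        [lang country] (string/split dashed #"-" 2)]
    (cond
      exception exception
      country   (str (string/lower-case lang) "-" (string/upper-case country))
      :else     (string/lower-case lang))))

(fix-locale-sketch "en_US")   ; "en-US"
(fix-locale-sketch "EN-us")   ; "en-US"
(fix-locale-sketch "SR_LATN") ; "sr-Latn"
```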

i18n.locale/supported-locale

Takes a string or seq and returns the best match from supported locales.

Supports both Accept-Language header and singular locale strings (see below).

Sequences of locales are supported.

Nested structures are NOT supported.

Tries pretty hard to find a match for each candidate locale:

  • nil falls back to default locale
  • Sequences and accept headers are processed in order
  • Munged locales are fixed as per fix-locale (see above)
  • First match against supported locales is returned
  • If there are no matches, country codes are stripped for a second pass
  • If there are no matches after two passes, falls back to default locale
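The two-pass matching in the list above could look roughly like the following. It is a sketch under stated assumptions rather than the library's code: supported, default-locale, strip-country and best-match are hypothetical names, the supported set is a tiny made-up sample, and the candidates are assumed to be already normalized.

```clojure
(require '[clojure.string :as string])

;; Hypothetical sample; the real supported list lives in i18n.data/locales.
(def supported #{"en" "en-IN" "gl" "zh-HK"})
(def default-locale "en")

(defn strip-country
  "\"en-AU\" -> \"en\"; a bare language code is returned unchanged."
  [locale]
  (first (string/split locale #"-")))

(defn best-match
  "First pass: exact matches, in order. Second pass: country codes stripped.
   Falls back to default-locale when nothing matches or candidates is nil."
  [candidates]
  (or (some supported candidates)
      (some supported (map strip-country candidates))
      default-locale))

(best-match ["fr-CH" "en-IN"]) ; "en-IN", exact match on the second candidate
(best-match ["en-AU"])         ; "en", matched after stripping the country
(best-match nil)               ; "en", the default
```

Note that an exact match later in the sequence wins over a stripped match earlier in the sequence, because the second pass only runs when the first pass finds nothing.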

i18n.locale/accept-language->locales

Takes an Accept-Language header string and converts to a seq of locales.

Parses the string according to the rules documented at:

https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept-Language

The output sequence is ordered by "quality" scores from the header string.
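To make the quality ordering concrete, here is a minimal, self-contained sketch of parsing such a header. It is not the library's implementation: parse-entry and accept-language->locales-sketch are hypothetical names, dropping the * wildcard is an assumption of this sketch, and parse-double assumes Clojure 1.11+ or a recent ClojureScript.

```clojure
(require '[clojure.string :as string])

(defn parse-entry
  "\"fr;q=0.9\" -> [\"fr\" 0.9]; a missing or unreadable q defaults to 1.0."
  [entry]
  (let [[tag & params] (map string/trim (string/split entry #";"))
        q (some #(second (re-matches #"q=([0-9.]+)" %)) params)]
    [tag (or (some-> q parse-double) 1.0)]))

(defn accept-language->locales-sketch
  "Splits the header on commas and orders the tags by descending q.
   The sort is stable, so equal-q entries keep their header order."
  [header]
  (->> (string/split header #",")
       (map parse-entry)
       (remove (fn [[tag _]] (= "*" tag))) ; assumption: wildcard is ignored
       (sort-by second >)
       (mapv first)))

(accept-language->locales-sketch
 "fr-CH, fr;q=0.9, en;q=0.8, de;q=0.7, *;q=0.5")
;; ["fr-CH" "fr" "en" "de"]
```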

This header contains the most detailed, reliable and relevant user configuration available in the browser, provided you can actually get at it.

As far as I know, there is no way to get at an Accept-Language header with raw JavaScript. Based on my testing and research both the XMLHttpRequest() and fetch() APIs won't make this header available for inspection.

The only way to get at this header is to read a request from the server and return the header string in the server response.

If you don't have a server, this snippet will return Accept-Language strings from any request to a free tier endpoint from Webtask IO:

/**
* @param context {WebtaskContext}
*/
module.exports = function(context, cb) {
  cb(null, context.headers['accept-language'] || context.headers['Accept-Language']);
};

Make sure to save any fetched headers in local/session storage to avoid spamming round trips to the server for redundant locale information.

i18n.locale/system-locales

Attempts to detect the user's preferred locale from the browser or OS.

This is entirely different from and less reliable than Accept-Language discussed above, but has the advantage of being available in the browser without a server round-trip.

Runs through the various options documented at:

https://zzz.buzz/2016/01/13/detect-browser-language-in-javascript/

Normalizes the return values to a sequence of locales or nil.

i18n.number - Number format and parse

Both formatting and parsing of numbers are supported in i18n.number.

The public API consists of 2 fns:

  • i18n.number/format
  • i18n.number/parse

Both take a number/string to be formatted/parsed as the first arg and optional k/v pairs for other options.

Uncached versions i18n.number/-format and i18n.number/-parse are available if memory pressure is a concern.

See i18n.number tests for examples of both.

Shared options

:locale

Any supported locale code (see above).

:pattern

A CLDR number formatting pattern or one of the preconfigured formats as per i18n.number/formats. Currently supported formats: :decimal, :scientific, :percent, :currency, :compact-short, :compact-long.

Default :pattern is :decimal.

Official documentation for CLDR patterns:

http://cldr.unicode.org/translation/number-patterns

Example patterns:

https://github.com/google/closure-library/blob/master/closure/goog/i18n/numberformatsymbols.js

:ascii?

Boolean to use only ascii digits in the output.

Default is false.

Parsing is also influenced by the ascii digit configuration.

(format 1000000 :locale "fa") ; "۱٬۰۰۰٬۰۰۰"
(format 1000000 :locale "fa" :ascii? true) ; "1,000,000"

Parsing

You should not typically need to set a pattern manually for parse, :locale alone should be sufficient.

(parse "1,000,000") ; 1000000
(parse "10,00,000") ; 1000000
(parse "1,000,000" :locale "gl") ; 1

Formatting

Format has a few extra options not available to parse.

:min-fraction-digits

Integer minimum number of digits to allow for fractions.

Default is 0, max is :max-fraction-digits (see below).

Fills out missing digits with trailing zeros.

(format 1) ; "1"
(format 1 :min-fraction-digits 1) ; "1.0"

:max-fraction-digits

Integer maximum number of digits to allow for fractions.

Default is 3, max is 308.

Does NOT fill out missing digits with trailing zeros.

Applies rounding to truncated values.

(format (/ 10 3)) ; "3.333"
(format (/ 10 3) :max-fraction-digits 1) ; "3.3"
(format (/ 10 3) :max-fraction-digits 2) ; "3.33"
(format (/ 10 3) :max-fraction-digits 3) ; "3.333"
(format 1 :max-fraction-digits 3) ; "1"
(format 1.5678) ; "1.568"

:significant-digits

Integer number of significant digits for the formatted number.

Default is 0, max is :max-fraction-digits.

CANNOT be combined with :min-fraction-digits.

Applies rounding to truncated values.

(format (/ 10 3) :significant-digits 3) ; "3.33"
(format (/ 1 3) :significant-digits 3) ; "0.333"
(format 1.2 :significant-digits 3) ; "1.2"
(format (/ 2 3) :significant-digits 3) ; "0.667"

:trailing-zeros?

Boolean to show trailing zeros if :significant-digits is positive.

Has no effect on :min-fraction-digits or :max-fraction-digits.

(format 1.2 :significant-digits 3 :trailing-zeros? true) ; "1.20"

:nil-string

String to return for nil.

Default is "".

(format nil) ; ""
(format nil :nil-string "-") ; "-"

:nan-string

String to return for ##NaN.

Default is a localised version of "NaN".

(format ##NaN) ; "NaN"
(format ##NaN :locale "fa") ; "ناعدد"

(format ##NaN :nan-string "-") ; "-"
(format ##NaN :nan-string "-" :locale "fa") ; "-"

i18n.datetime - Datetime format, parse and timezones

Both formatting and parsing of datetimes are supported in i18n.datetime.

The public API consists of 3 fns:

  • i18n.datetime/format
  • i18n.datetime/parse
  • i18n.datetime/timezone

Format and parse take a date/string to be formatted/parsed as the first arg and optional k/v pairs for other options. They intentionally mirror the signatures of the i18n.number format and parse fns.

Uncached versions i18n.datetime/-format and i18n.datetime/-parse are available if memory pressure is a concern.

See i18n.datetime tests for examples.

Format/parse shared options

:locale

Any supported locale code (see above).

:pattern

A CLDR datetime formatting pattern or one of the preconfigured formats as per i18n.datetime/formats and i18n.datetime/pattern->common-pattern.

These wrap and normalise two separate sets of patterns in goog:

  • goog.i18n.DateTimeFormat.Format
  • goog.i18n.DateTimePatterns

Currently supported formats:

:full-date, :long-date, :medium-date, :short-date, :full-time, :long-time, :medium-time, :short-time, :full-datetime, :long-datetime, :medium-datetime, :short-datetime, :year-full, :year-full-with-era, :year-month-abbr, :year-month-full, :month-day-abbr, :month-day-full, :month-day-short, :month-day-medium, :month-day-year-medium, :weekday-month-day-medium, :weekday-month-day-year-medium, :day-abbr.

Default :pattern is :long-date.

Official documentation for CLDR patterns:

http://cldr.unicode.org/translation/date-time-patterns

Example patterns:

https://github.com/google/closure-library/blob/master/closure/goog/i18n/datetimepatterns.js

:ascii?

Boolean to use only ascii digits in the output.

Default is false.

Parsing is also influenced by the ascii digit configuration.

(format
 (js/Date. 106000000000)
 :locale "ar"
 :tz 0
 :pattern :weekday-month-day-year-medium
 :ascii? false) ; "الجمعة، ١١ مايو، ١٩٧٣"
(format
 (js/Date. 106000000000)
 :locale "ar"
 :tz 0
 :pattern :weekday-month-day-year-medium
 :ascii? true) ; "الجمعة، 11 مايو، 1973"

Formatting

Takes a js/Date or a compatible Date object, e.g. goog.date.DateTime and returns a formatted string.

(format (js/Date. 106000000000) :locale "en-US" :tz 0) ; "May 11, 1973"
(format (js/Date. 106000000000) :locale "en-AU" :tz 0) ; "11 May 1973"

(format (js/Date. 106000000000) :locale "en-US" :pattern :short-time :tz 0) ; "8:26 PM"
(format (js/Date. 106000000000) :locale "en-AU" :pattern :short-time :tz 0) ; "8:26 pm"

(format (js/Date. 106000000000) :locale "en-US" :pattern :full-datetime :tz 0) ; "Friday, May 11, 1973 at 8:26:40 PM UTC"
(format (js/Date. 106000000000) :locale "en-AU" :pattern :full-datetime :tz 0) ; "Friday, 11 May 1973 at 8:26:40 pm UTC"

i18n.datetime/format takes an optional :tz parameter. The value of :tz will be passed to i18n.datetime/timezone before use in the formatter. The timezone handling in goog.i18n.TimeZone works like js/Date methods, in that it is an offset, i.e. negative minutes (see below).

The default :tz is :local, i.e. (.getTimezoneOffset (js/Date.)).

(format (js/Date. 106000000000) :locale "en-AU" :pattern :full-datetime :tz 0) ; "Friday, 11 May 1973 at 8:26:40 pm UTC"
(format (js/Date. 106000000000) :locale "en-AU" :pattern :full-datetime :tz -600) ; "Saturday, 12 May 1973 at 6:26:40 am UTC+10"

Parsing

Takes a date string and attempts to parse to a js/Date instant.

You MUST provide a pattern AND a locale for date parsing to work.

If no :pattern is provided an error will be thrown, but if a :pattern is provided that does not match the structure of the date string, the parser will silently fall back to "now" in the returned date.

(parse "May 12, 1973" :locale "en-US" :pattern :long-date) ; #inst "1973-05-12T08:00:54.428-00:00" - SUCCESS!
(parse "5/11/73" :locale "en-US" :pattern :long-date) ; #inst "2018-03-24T07:00:54.425-00:00" - FAIL!

Timezones

The i18n.datetime/timezone fn is a thin wrapper around Google Closure's own i18n timezone handling.

Numeric values are treated as a simple offset in negative minutes, e.g. UTC+10 hours would be -600 in goog. This is compatible with the timezone offset API provided by native JS.

To get the current offset in the browser, pass :local to i18n.datetime/timezone or call (.getTimezoneOffset (js/Date.)).
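Because the sign is easy to get backwards, a tiny helper can make the conversion explicit. The function below is a hypothetical convenience, not something this library provides; it only restates the rule above, that a zone ahead of UTC gets a negative number of minutes.

```clojure
(defn utc-hours->tz
  "Converts a whole-hour UTC offset to the negative-minutes convention,
   e.g. UTC+10 -> -600 and UTC-5 -> 300. Hypothetical helper."
  [hours]
  (- (* 60 hours)))

(utc-hours->tz 10) ; -600
(utc-hours->tz -5) ; 300
(utc-hours->tz 0)  ; 0
```

The result is the same kind of value that .getTimezoneOffset returns, so it should be usable anywhere a numeric :tz is expected.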

Numeric values do not support daylight savings. For DST support a JS object must be provided outlining all the details of the timezone.

Google strongly recommends serving timeZoneData objects from the server as needed rather than shipping the full set of global timezones to the client. If only a few timezones are required and can be specified ahead of time, they can be set statically in client code.

Documentation for timeZoneData objects:

https://github.com/google/closure-library/blob/master/closure/goog/i18n/timezone.js#L144

Examples of timeZoneData objects:

https://github.com/google/closure-library/blob/master/closure/goog/i18n/timezone_test.js#L29

Collation

TODO - patches welcome!

Currency

TODO - patches welcome!

Text messages

TODO - patches welcome!

Plurals

Finds a proper plural form for a given number.

Both :plurals AND :locale are required.

(def plurals-en {i18n.plural/one "book" i18n.plural/other "books"})
(plural 0 :locale "en" :plurals plurals-en) ; "books"
(plural 1 :locale "en" :plurals plurals-en) ; "book"
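The locale is required because the mapping from a number to a plural category differs between languages. The sketch below illustrates the idea with a deliberately simplified, hypothetical rule table that is only meant for small integers; plural-rules and plural-sketch are not part of this library, and the :one / :other keywords stand in for whatever keys i18n.plural actually defines.

```clojure
;; Simplified, hypothetical rules for small integers only.
;; English uses "one" for exactly 1; French also uses it for 0.
(def plural-rules
  {"en" (fn [n] (if (= n 1) :one :other))
   "fr" (fn [n] (if (contains? #{0 1} n) :one :other))})

(defn plural-sketch
  "Looks up the plural category for n in locale, then the matching form."
  [n locale forms]
  (get forms ((get plural-rules locale) n)))

(plural-sketch 0 "en" {:one "book" :other "books"})   ; "books"
(plural-sketch 0 "fr" {:one "livre" :other "livres"}) ; "livre"
```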

BIDI text

TODO - patches welcome!