Announcing intl-codegen 2
I have been thinking for a long time about what intl-codegen 2 would look like, and some time ago I went about implementing it. Since then I have validated the concepts by migrating the eversports codebase to it. The migration was quite painless, with some mechanical steps.
Since it is a proper version 2, it does have some breaking changes, together with some exciting features, so let's dive in.
# Changes in v2
# Fluent syntax support
One very big item here is support for the fluent syntax. Well, limited support, that is. Fluent has some features that are not yet supported by intl-codegen, but might be in the future. One example is missing support for terms and referencing other messages. Another feature that is missing is support for fluent attributes.
MessageFormat is still supported, and has some nice improvements in this release.
But I consider fluent to be the better format in general, and I will likely drop MessageFormat support at some point when the tooling around fluent matures.
# Proper typing support
Another very big item is support for proper types. Every placeholder that is used in translations needs to be declared beforehand. The easiest way to do so is via doc comments in fluent syntax. There is a proposal to properly add this type of comment to the fluent syntax, but it is not final yet.
```ftl
# $value (monetary)
fluent-monetary = a monetary value: { $value }
```
So far, it supports the types `string`, `number`, `datetime`, `monetary` and `element`. Together with these changes, there was also a split to separate the message template declaration from the translations.
When using MessageFormat, there is an explicit API to declare messages and the placeholder types.
Having type declarations improves the TypeScript side of things, since it gives better code completion and errors.
It also makes it possible to better check the correctness of the translations themselves.
At eversports, we have a small team of translators, who are not engineers, struggle a bit with the MessageFormat syntax, and sometimes translate parts of the syntax itself.
I plan to further improve this, for example by validating the plural rules, since translators who struggle with the syntax actually translate the `one` or `other` selectors.
Some examples of useful errors:
```
test.tsx (15,24): Type 'number' is not assignable to type 'string'.

[wrong-type: template/msgfmt-string-as-plural]: Messageformat `plural` selector is only valid for type "number", but parameter `param` has type `string`.
> 1 | {param} {param,plural,
    | ^^^^^^^^^^^^^
> 2 | one {parameter}
    | ^^^^^^^^^^^^^^^^^
> 3 | other {parameters}
    | ^^^^^^^^^^^^^^^^^
> 4 | }
    | ^^

[missing-other: template/msgfmt]: MessageFormat requires an `other` case to be defined.
> 1 | selector: {param, select, foo {its foo} bar {its bar}}.
    | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
```
All these checks have actually caught some real bugs, both in our typescript code and especially inside of the translation strings themselves.
# Proper language detection
One pain point with intl-codegen 1.x was that it was only able to load the defined list of locales, and one had to build language detection around it.
Version 2 now ships with a small runtime that uses fluent-langneg to do proper language detection, based on either the `Accept-Language` header or the `navigator.languages` property.
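To give a rough idea of what language detection involves, here is a deliberately simplified sketch. It is not the algorithm fluent-langneg implements, and the function names `parseAcceptLanguage` and `pickLocale` are made up for this illustration. It only shows the general shape: order the requested locales by preference, then find the best match among the locales you actually ship.

```typescript
// Simplified stand-in for locale negotiation; NOT fluent-langneg's real algorithm.

// Parse an Accept-Language header into locale tags, ordered by descending q-value.
function parseAcceptLanguage(header: string): string[] {
  return header
    .split(",")
    .map((part, index) => {
      const [tag, ...params] = part.trim().split(";");
      const qParam = params.map((p) => p.trim()).find((p) => p.startsWith("q="));
      const q = qParam ? Number(qParam.slice(2)) : 1;
      return { tag: (tag ?? "").trim(), q: Number.isNaN(q) ? 0 : q, index };
    })
    .filter((entry) => entry.tag !== "" && entry.tag !== "*" && entry.q > 0)
    // Highest q first; entries with equal q keep their header order.
    .sort((a, b) => b.q - a.q || a.index - b.index)
    .map((entry) => entry.tag);
}

// Pick the first available locale that matches a requested one, either exactly
// or by its language subtag (so "de-AT" can fall back to "de").
function pickLocale(requested: string[], available: string[], fallback: string): string {
  for (const tag of requested) {
    const wanted = tag.toLowerCase();
    const exact = available.find((a) => a.toLowerCase() === wanted);
    if (exact) return exact;
    const language = wanted.split("-")[0];
    const byLanguage = available.find((a) => a.toLowerCase().split("-")[0] === language);
    if (byLanguage) return byLanguage;
  }
  return fallback;
}

const requested = parseAcceptLanguage("de-AT,de;q=0.9,en;q=0.8");
const chosen = pickLocale(requested, ["en", "de"], "en");
console.log(requested, chosen);
```

In the browser, `navigator.languages` already is an ordered array, so it could be passed to `pickLocale` directly without the header parsing step.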
# Better formatting and pluralization
Related to this, version 1 also had a severe design limitation, as it hardcoded the formatting based on the translation. So it would use the same formatting for the `de` language, even though the formatting differs quite a bit based on the locale. When formatting monetary values, you have `1.234,56 €` for `de-de`, `€ 1.234,56` for `de-at` and `CHF 1’234.56` for `de-ch`, but it is still the same language.
In version 2, the loaded language and the locale used for the formatter are now decoupled, so you should always get the correct formatting for the locale you requested.
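The idea of decoupling can be sketched with the platform's own `Intl.NumberFormat`. This is only an illustration under assumptions: the `translations` table and the `renderPrice` helper are hypothetical and do not reflect how intl-codegen generates its code. The translation is looked up by language, while the formatter is created for the full requested locale.

```typescript
// Hypothetical illustration: translation chosen by language, formatter by full locale.
const translations: Record<string, string> = {
  de: "Preis: {value}",
  en: "Price: {value}",
};

function renderPrice(locale: string, amount: number, currency: string): string {
  // "de-AT" and "de-CH" both load the "de" translation ...
  const language = locale.split("-")[0] ?? locale;
  const template = translations[language] ?? "Price: {value}";
  // ... but the number formatter still honours the full locale.
  const formatted = new Intl.NumberFormat(locale, { style: "currency", currency }).format(amount);
  // A replacer function avoids "$" in the formatted value being treated specially.
  return template.replace("{value}", () => formatted);
}

for (const locale of ["de-DE", "de-AT", "de-CH"]) {
  const currency = locale === "de-CH" ? "CHF" : "EUR";
  console.log(locale, renderPrice(locale, 1234.56, currency));
}
```

The exact output depends on the locale data shipped with your runtime, so it is worth checking on the platforms you target.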
Apart from formatting, version 2 also has proper support for pluralization. This means you can use the `one` selector instead of an explicit `=0`, or any of the other 6 cases. There is also support for `ordinal` cases. This all depends on platform support for `Intl.PluralRules`, so the developer needs to provide appropriate polyfills if support for old platforms is a priority.
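For readers who have not used `Intl.PluralRules` directly, here is a small sketch of what the platform provides. The `pluralize` helper is hypothetical and only meant to show how a plural category maps to a message variant; it is not intl-codegen's generated code.

```typescript
// Cardinal rules answer "how many?", ordinal rules answer "which position?".
const cardinal = new Intl.PluralRules("en");
const ordinal = new Intl.PluralRules("en", { type: "ordinal" });

console.log(cardinal.select(1)); // "one"
console.log(cardinal.select(0)); // "other"
console.log(ordinal.select(2)); // "two" (as in 2nd)
console.log(ordinal.select(3)); // "few" (as in 3rd)

// Hypothetical helper: pick the variant for the plural category, falling back to `other`.
function pluralize(
  count: number,
  forms: { other: string; [category: string]: string },
  locale: string,
): string {
  const category = new Intl.PluralRules(locale).select(count);
  const template = forms[category] ?? forms.other;
  return template.replace("{count}", () => String(count));
}

const forms = { one: "{count} parameter", other: "{count} parameters" };
console.log(pluralize(1, forms, "en")); // "1 parameter"
console.log(pluralize(5, forms, "en")); // "5 parameters"
```

The fallback to `other` mirrors the requirement that an `other` case always has to be defined.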
# Split out react support
All the react-specific codegen is now output into a separate react file. So it is possible to use intl-codegen without react, at least when you do not declare `element`-type placeholders.
# Some recommendations and best practices
There are some clear dos and don’ts that jump out at you when you are involved with localization tools, but which are not that obvious to other engineers.
# Put everything into translations
I still often see engineers who translate single words, and then build those fragments into sentences in code.
A constructed example would be `_("Hello, ") + name + _(". How are you?")`.
A more common, and less obvious, case is when you are just combining formatted values with some whitespace and punctuation, such as `{date} - {time}`.
The problems you could potentially have here are not that obvious if you are primarily working with Germanic languages. But there are other languages out there which change the order of some placeholders based on grammar. Or right-to-left languages. Some languages may want to use different punctuation symbols. And so on… So the easiest thing to do is to just put everything into a translation string. Your translators will also thank you, because it gives them both more freedom and more context to know what needs to be done.
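As a minimal sketch of the difference, the following uses a hypothetical `messages` table and `translate` helper (neither is part of intl-codegen, and the German strings are only illustrative). Because the whole sentence lives in the translation, each language is free to reorder placeholders or change punctuation without any code changes.

```typescript
// Hypothetical message table; a real project would generate this from translation files.
const messages: Record<string, Record<string, string>> = {
  en: {
    greeting: "Hello, {name}. How are you?",
    "date-and-time": "{date} - {time}",
  },
  de: {
    greeting: "Hallo, {name}. Wie geht es dir?",
    // The translator decides on separator and wording, not the engineer.
    "date-and-time": "{date}, {time} Uhr",
  },
};

// Look up the full sentence and fill in the placeholders.
function translate(locale: string, key: string, params: Record<string, string>): string {
  const template = messages[locale]?.[key] ?? messages["en"]?.[key] ?? key;
  return template.replace(
    /\{(\w+)\}/g,
    (placeholder, name: string) => params[name] ?? placeholder,
  );
}

console.log(translate("en", "greeting", { name: "Anna" }));
console.log(translate("de", "date-and-time", { date: "15.01.2020", time: "14:30" }));
```

Compare this with the concatenation approach, where the word order and the ` - ` separator would be fixed in code for every language.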
# Use formatters
Similar to the case above, I still see engineers who are not using formatters properly. Things like formatting a `datetime` value ahead of time, and putting it into the translation as a `string`.
Or the quite frequent case where engineers are not aware of the builtin `monetary` support, and are creating translation strings such as `{value}{currency}`, which will be wrongly formatted for 2 out of the 3 German locales I highlighted above.
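To make the contrast concrete, here is a small sketch using the platform `Intl` formatters directly. The helper names `formatDate` and `formatMoney` are made up for this example; intl-codegen wraps this kind of formatting for you when a placeholder is declared as `datetime` or `monetary`.

```typescript
const when = new Date(Date.UTC(2020, 0, 15));
const amount = 1234.56;

// Anti-pattern: this is what a "{value}{currency}" translation string amounts to.
// It ignores grouping, decimal separators and symbol placement entirely.
const naiveMoney = `${amount}EUR`; // "1234.56EUR"

// Better: hand over the raw value and let a locale-aware formatter do the work.
function formatDate(locale: string, value: Date): string {
  return new Intl.DateTimeFormat(locale, {
    year: "numeric",
    month: "long",
    day: "numeric",
    timeZone: "UTC",
  }).format(value);
}

function formatMoney(locale: string, value: number, currency: string): string {
  return new Intl.NumberFormat(locale, { style: "currency", currency }).format(value);
}

console.log(naiveMoney);
for (const locale of ["en-US", "de-DE"]) {
  console.log(locale, formatDate(locale, when), formatMoney(locale, amount, "EUR"));
}
```

Passing the raw `Date` and number through to the formatter keeps the decision about ordering, separators and symbols with the locale data instead of with the calling code.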
One problem, related both to formatters and to translation context, is `element`-type placeholders.
I considered experimenting with a feature called DOM Overlays, but decided to postpone it to later. Essentially, DOM Overlays would give a much larger context to translators, and would make it easier to put some placeholders into styled elements, with proper typing support. Maybe :-D
# Establish guidelines
Apart from the two recommendations above, putting everything into the translations and properly using formatters, there is also the question of how to structure translations. How to name them? How to deal with conditional placeholders?
How should you name your translation keys?
intl-codegen is a little bit opinionated already in this regard, mainly because there is a syntactic difference between identifiers in js and translation keys. `a-translation-key` will become `aTranslationKey()`.
In general, I would recommend using dashed translation keys, as in this example. Give the keys descriptive names that give some context. Do not name the key `continue`, but rather `registration-finished-continue-button`, or something. :-D
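The mapping from dashed keys to js identifiers mentioned above could look roughly like the following. This is a guess at the general idea, with a made-up function name `keyToIdentifier`; the real conversion inside intl-codegen may handle more edge cases.

```typescript
// Hypothetical sketch: turn a dashed translation key into a camelCase identifier.
function keyToIdentifier(key: string): string {
  // Drop each run of dashes and upper-case the character that follows it.
  return key.replace(/-+([a-zA-Z0-9])/g, (_match, char: string) => char.toUpperCase());
}

console.log(keyToIdentifier("a-translation-key")); // "aTranslationKey"
console.log(keyToIdentifier("registration-finished-continue-button")); // "registrationFinishedContinueButton"
```

A descriptive key therefore also produces a descriptive function name at the call site.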