Arpad Borsos

Announcing intl-codegen 2


I have been thinking for a long time about what intl-codegen 2 would look like, and some time ago I went about implementing it. Since then I have validated the concepts by migrating the eversports codebase to it. The migration was quite painless, with some mechanical steps.

Since it is a proper version 2, it does have some breaking changes, together with some exciting features, so let's dive in.

Changes in v2

Fluent syntax support

One very big item here is support for the fluent syntax. Well, limited support that is. Fluent has some features that are not yet supported by intl-codegen, but might be in the future. One example is missing support for terms and referencing other messages. Another feature that is missing is support for fluent attributes.

MessageFormat is still supported, and has some nice improvements in this release.

But I consider fluent to be the better format in general, and I will likely drop MessageFormat support at some point when the tooling around fluent matures.

Proper typing support

Another very big item is support for proper types. Every placeholder that is used in translations needs to be declared beforehand. The easiest way to do so is via doc comments in fluent syntax. There is a proposal to properly add this type of comment to the fluent syntax, but it is not final yet.

# $value (monetary)
fluent-monetary = a monetary value: { $value }

So far, it supports the types string, number, datetime, monetary and element. Together with these changes, there was also a split to separate the message template declaration from the translations.
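To make the idea of typed placeholders more concrete, here is a minimal sketch of how a placeholder declaration could be turned into TypeScript parameter types. Everything in it is an assumption made for illustration: the names Monetary, PlaceholderTypes, ParamsOf and fluentMonetary are hypothetical, the shape of a monetary value is a guess, and the code intl-codegen actually generates may look quite different. The element type is left out so the sketch does not depend on react.

```typescript
// Assumed shape of a monetary value; the real library may differ.
interface Monetary {
  value: number;
  currency: string;
}

// Hypothetical mapping from declared placeholder types to TypeScript types.
type PlaceholderTypes = {
  string: string;
  number: number;
  datetime: Date;
  monetary: Monetary;
};

type Declaration = Record<string, keyof PlaceholderTypes>;

// Turns a declaration such as { value: "monetary" } into { value: Monetary }.
type ParamsOf<D extends Declaration> = {
  [K in keyof D]: D[K] extends keyof PlaceholderTypes ? PlaceholderTypes[D[K]] : never;
};

// Mirrors the doc comment "# $value (monetary)" from the fluent example.
const fluentMonetaryDecl = { value: "monetary" } as const;

// Hand-written stand-in for a generated message function.
function fluentMonetary(params: ParamsOf<typeof fluentMonetaryDecl>): string {
  const { value, currency } = params.value;
  const formatted = new Intl.NumberFormat("en-US", { style: "currency", currency }).format(value);
  return `a monetary value: ${formatted}`;
}

console.log(fluentMonetary({ value: { value: 5, currency: "EUR" } }));
// Passing a plain string, as in fluentMonetary({ value: "5 EUR" }),
// would be rejected by the compiler instead of failing at runtime.
```

The point of the sketch is only that a declaration written once can drive both code completion and compile-time errors on the calling side.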

When using MessageFormat, there is an explicit API to declare messages and the placeholder types.

Having type declarations improves the typescript side of things, since it gives better code completion and errors. But it also made it possible to better check the correctness of the translations themselves. At eversports, we have a small team of translators, who are not engineers and do struggle with the MessageFormat syntax a bit and sometimes translate parts of the syntax itself.

I plan to improve this further, for example by validating the plural rules, since translators who struggle with the syntax sometimes translate the one or other selector keywords themselves.

Some examples of useful errors:

test.tsx (15,24): Type 'number' is not assignable to type 'string'.
[wrong-type: template/msgfmt-string-as-plural]: Messageformat `plural` selector is only valid for type "number", but parameter `param` has type `string`.
> 1 | {param} {param,plural,
    |         ^^^^^^^^^^^^^
> 2 |   one {parameter}
    | ^^^^^^^^^^^^^^^^^
> 3 |   other {parameters}
    | ^^^^^^^^^^^^^^^^^
> 4 | }
    | ^^
[missing-other: template/msgfmt]: MessageFormat requires an `other` case to be defined.
> 1 | selector: {param, select, foo {its foo} bar {its bar}}.
    | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

All these checks have actually caught some real bugs, both in our typescript code and especially inside of the translation strings themselves.

Proper language detection

One pain point with intl-codegen 1.x was that it was only able to load the defined list of locales, and one had to build language detection around it. Version 2 now ships with a small runtime that uses fluent-langneg to do proper language detection, based on either the Accept-Language header, or the navigator.languages property.
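To give a rough idea of what language detection involves, here is a deliberately simplified, self-contained sketch. It is not the algorithm fluent-langneg uses and not the runtime shipped with intl-codegen; the helper names parseAcceptLanguage and pickLocale are made up for this example. It parses an Accept-Language header, orders the tags by their q-value, and picks the first available locale that matches either exactly or by language.

```typescript
// Parses an Accept-Language header into tags ordered by preference.
function parseAcceptLanguage(header: string): string[] {
  return header
    .split(",")
    .map((part, index) => {
      const [tag = "", ...params] = part.trim().split(";");
      const qParam = params.map(p => p.trim()).find(p => p.startsWith("q="));
      const q = qParam ? Number(qParam.slice(2)) : 1;
      return { tag: tag.trim().toLowerCase(), q: Number.isNaN(q) ? 0 : q, index };
    })
    // Drop empty entries, wildcards and tags that are explicitly refused (q=0).
    .filter(entry => entry.tag !== "" && entry.tag !== "*" && entry.q > 0)
    // Higher q first; keep the header order for equal weights.
    .sort((a, b) => b.q - a.q || a.index - b.index)
    .map(entry => entry.tag);
}

// Picks the first available locale matching a requested tag,
// first exactly, then by the language part only.
function pickLocale(
  requested: readonly string[],
  available: readonly string[],
  fallback: string,
): string {
  for (const tag of requested) {
    const wanted = tag.toLowerCase();
    const exact = available.find(a => a.toLowerCase() === wanted);
    if (exact) return exact;
    const language = wanted.split("-")[0];
    const sameLanguage = available.find(a => a.toLowerCase().split("-")[0] === language);
    if (sameLanguage) return sameLanguage;
  }
  return fallback;
}

const requested = parseAcceptLanguage("de-AT,de;q=0.8,en;q=0.5");
console.log(pickLocale(requested, ["en", "de"], "en")); // "de"
```

In the browser, the requested list would come from navigator.languages instead of a header. A real negotiation library handles many more cases, such as scripts and likely subtags, which is a good reason not to build this by hand.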

Better formatting and pluralization

Related to this, version 1 also had a severe design limitation, as it hardcoded the formatting based on the translation. So it would use the same formatting for the de language, even though the formatting differs quite a bit based on the locale. When formatting monetary values, you have 1.234,56 € for de-de, € 1.234,56 for de-at and CHF 1’234.56 for de-ch, but it is still the same language.

In version 2, the loaded language and the locale used for the formatter are now decoupled, so you should always get the correct formatting for the locale you requested.
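The decoupling can be illustrated with the standard Intl.NumberFormat API. The following is a minimal sketch under assumptions, not how intl-codegen is implemented: the translations map and the renderTotal helper are hypothetical. The translation is looked up by language only, while the formatter receives the full locale.

```typescript
type Template = (amount: string) => string;

// Hypothetical translations, keyed by language only.
const english: Template = amount => `Total: ${amount}`;
const translations = new Map<string, Template>([
  ["en", english],
  ["de", amount => `Gesamtbetrag: ${amount}`],
]);

function renderTotal(locale: string, value: number, currency: string): string {
  // The message text depends on the language part of the locale...
  const language = locale.split("-")[0] ?? locale;
  const template = translations.get(language) ?? english;
  // ...while number formatting uses the full requested locale.
  const formatter = new Intl.NumberFormat(locale, { style: "currency", currency });
  return template(formatter.format(value));
}

// Same German text for all three, but locale-specific formatting of the amount.
for (const locale of ["de-DE", "de-AT", "de-CH"]) {
  console.log(locale, renderTotal(locale, 1234.56, locale === "de-CH" ? "CHF" : "EUR"));
}
```

The exact output depends on the locale data available in the runtime, which is why the sketch does not hardcode the expected strings.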

Apart from formatting, version 2 also has proper support for pluralization. This means you can use the one selector instead of an explicit =1, or any of the other plural categories (zero, two, few, many, other). There is also support for ordinal cases. This all depends on platform support for Intl.PluralRules, so the developer needs to provide appropriate polyfills if support for old platforms is a priority.
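For readers who have not used Intl.PluralRules directly, here is a small sketch of how a plural selector can be resolved with it. The selectPlural helper and the PluralCases type are hypothetical names for this example and not part of intl-codegen. The helper asks the platform for the plural category of a number and falls back to the mandatory other case.

```typescript
// Every message needs an "other" case; the remaining categories are optional.
type PluralCases = { other: string } &
  Partial<Record<"zero" | "one" | "two" | "few" | "many", string>>;

function selectPlural(
  locale: string,
  count: number,
  cases: PluralCases,
  type: "cardinal" | "ordinal" = "cardinal",
): string {
  // The platform decides which category the number falls into for this locale.
  const category = new Intl.PluralRules(locale, { type }).select(count);
  return cases[category] ?? cases.other;
}

const items: PluralCases = { one: "1 item", other: "many items" };
console.log(selectPlural("en", 1, items)); // "1 item"
console.log(selectPlural("en", 0, items)); // "many items"

// Ordinal rules use the same categories with a different meaning.
const suffix: PluralCases = { one: "st", two: "nd", few: "rd", other: "th" };
console.log(`2${selectPlural("en", 2, suffix, "ordinal")}`); // "2nd"
```

Which categories a language actually uses differs per locale, so a translation for another language may need cases that the English one never hits.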

Split out react support

All the react-specific codegen is now output into a separate react file. So it is possible to use intl-codegen without react, at least when you do not declare element-type placeholders.

Some recommendations and best practices

There are some clear dos and don’ts that jump out at you when you are involved with localization tools, which are not that obvious to other engineers.

Put everything into translations

I still often see engineers that are translating single words, and then building those fragments into sentences in code. A constructed example would be _("Hello, ") + name + _(". How are you?"). A more common, and less obvious, case is when you are just combining formatted values with some whitespace and punctuation such as {date} - {time}.

The problems you could potentially have here are not that obvious if you are primarily working with germanic languages. But there are other languages out there, which change the order of some placeholders based on grammar. Or right-to-left languages. Some languages may want to use different punctuation symbols. And so on… So the easiest thing to do is to just put everything into a translation string. Also your translators will thank you, because it gives them both more freedom and more context to know what needs to be done.
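The difference is easier to see side by side. The following sketch uses a tiny, hypothetical fill helper that substitutes {name}-style placeholders; it stands in for a real message formatter and is not the intl-codegen API. The translations, including the made-up "xx" entry, are only there to show what a translator can change once the whole string belongs to them.

```typescript
// Replaces {name}-style placeholders; unknown placeholders are left untouched.
function fill(template: string, params: Record<string, string>): string {
  return template.replace(/\{(\w+)\}/g, (match, name: string) => params[name] ?? match);
}

const params = { date: "15.01.2020", time: "14:00" };

// Anti-pattern: order, whitespace and punctuation are fixed in code.
const concatenated = params.date + " - " + params.time;

// Better: the whole string is a translation, so each language decides
// about order, connecting words and punctuation.
const appointmentAt: Record<string, string> = {
  en: "{date} at {time}",
  de: "{date} um {time} Uhr",
  xx: "{time}, {date}", // made-up locale that puts the time first
};

console.log(concatenated);
for (const [language, template] of Object.entries(appointmentAt)) {
  console.log(language, fill(template, params));
}
```

None of the three translated variants could be produced by the concatenating code without changing the code itself.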

Use formatters

Similar to the case above, I still see engineers that are not using formatters properly. Things like formatting a datetime value ahead-of-time, and putting it into the translation as a string.

Or the quite frequent case where engineers are not aware of the builtin monetary support, and are creating translation strings such as {value}{currency}, which will be wrongly formatted for 2 out of the 3 German locales I highlighted above.
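As an illustration of the datetime case, here is a short sketch using the standard Intl.DateTimeFormat API. The formatDate helper is a hypothetical name, and the sketch does not show how intl-codegen wires up its formatters. It contrasts a value that was turned into a string ahead of time with a Date that is formatted for the requested locale.

```typescript
const appointment = new Date(Date.UTC(2020, 0, 15));

// Anti-pattern: the value is already a string when it reaches the translation,
// so every language gets the same representation.
const preformatted = appointment.toISOString().slice(0, 10);
console.log(preformatted); // "2020-01-15"

// Better: keep the Date and let a locale-aware formatter produce the text.
function formatDate(locale: string, date: Date): string {
  return new Intl.DateTimeFormat(locale, {
    year: "numeric",
    month: "long",
    day: "numeric",
    timeZone: "UTC", // fixed time zone so the example does not depend on the machine
  }).format(date);
}

console.log(formatDate("en-US", appointment)); // "January 15, 2020"
console.log(formatDate("de-DE", appointment)); // German month name and day-first order
```

The same reasoning applies to monetary values: passing the number and the currency separately lets the formatter decide where the symbol goes.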

One problem, related both to formatters and to translation context, is element-type placeholders. I considered experimenting with a feature called DOM Overlays, but decided to postpone it to later. Essentially, DOM Overlays would give a much larger context to translators, and would make it easier to put some placeholders into styled elements, with proper typing support. Maybe :-D

Establish guidelines

Apart from the two cases above, to put everything into the translations and to properly use formatters, there is also the question of how to structure translations. How to name them? How to deal with conditional placeholders?

How should you name your translation keys? intl-codegen is a little bit opinionated already in this regard, mainly because there is a syntactic difference between identifiers in js and translation keys: a-translation-key will become aTranslationKey(). In general, I would recommend using dashed translation keys, as in this example. Give the keys descriptive names that give some context. Do not name the key continue, but rather something like registration-finished-continue-button. :-D
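The key-to-identifier mapping mentioned above can be sketched in a few lines. This is an approximation for illustration only: keyToIdentifier is a hypothetical helper, and the rules intl-codegen actually applies may differ in edge cases.

```typescript
// Converts a dashed translation key into a camelCase identifier.
function keyToIdentifier(key: string): string {
  return key.replace(/-+([a-z0-9])/gi, (_match, char: string) => char.toUpperCase());
}

console.log(keyToIdentifier("a-translation-key")); // "aTranslationKey"
console.log(keyToIdentifier("registration-finished-continue-button"));
// "registrationFinishedContinueButton"
```

A descriptive key therefore also produces a descriptive function name on the typescript side.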