Virtually every company I have worked for made the mistake of not following internationalization best practices early on. Even if you don’t anticipate making your app or service available in other regions today, it is important to follow a few simple practices to avoid incurring tech debt that is time-consuming and expensive to retire later on. Fundamentally, this is about separation of concerns, a staple of modern software development. Just as you don’t want to hard-wire business logic into your presentation layer, the presentation layer itself should be designed to support more than one language.
Almost every company I have worked for hard-coded user-facing strings in English instead of passing them through a function. The problem is that by the time you get around to launching additional languages, you may have thousands of strings scattered throughout your code base. Cleaning this up later is expensive, tedious and time-consuming (and nobody wants to do the work because it is so boring).
Create a dummy function like the one below and pass every user-facing prompt through it. All it does is receive the text to be displayed along with context (a message describing how it is used), and then return that text. Notice that I included a simple test to verify the context message is present (translators greatly benefit from information about the context in which strings are used, and developers habitually forget to provide it). I also included an optional url parameter (so translators can click through to an in-context view of the prompt, which is hugely helpful for QA). This function is also a good place to insert QA logic so you can set up better automated testing.
def t(text, context, keyname="", url=""):
    # text: the user-facing string; context: a note for translators on how it is used
    if not isinstance(text, str):
        raise TypeError("text must be a string value")
    if not isinstance(context, str):
        raise TypeError("context must be a string value")
    if len(context) < 1:
        raise ValueError("You must provide context for how this string is used")
    #
    # future logic to hook into the translation pipeline goes here
    #
    return text

print(t("Hello World!", "Display a Hello World greeting to the user."))
Just doing this eliminates a huge source of tech debt. You don’t need to worry about the specifics of what your translation pipeline will look like now. You can decide on those details and update this wrapper function to implement that later.
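As a rough sketch of what such a later update might look like, the version below looks up a translation in an in-memory catalog and falls back to the source text when none exists. The `CATALOG` dictionary, the `locale` parameter, the `pseudo` QA locale and the `pseudo_localize` helper are all hypothetical names introduced for illustration; a real pipeline would more likely load translations from files or a translation management system.

```python
# Hypothetical in-memory catalog: {locale: {source text: translation}}.
CATALOG = {
    "es": {"Hello World!": "¡Hola Mundo!"},
}


def pseudo_localize(text):
    """QA aid: mark wrapped strings so text that bypasses t() stands out by lacking the markers."""
    return "[!! " + text + " !!]"


def t(text, context, keyname="", url="", locale="en"):
    if not isinstance(text, str):
        raise TypeError("text must be a string value")
    if not isinstance(context, str) or not context:
        raise ValueError("You must provide context for how this string is used")
    # Hypothetical QA mode: no real translation, just visible markers.
    if locale == "pseudo":
        return pseudo_localize(text)
    # Fall back to the source text when no translation exists yet.
    return CATALOG.get(locale, {}).get(text, text)


print(t("Hello World!", "Greeting shown on the home screen.", locale="es"))
print(t("Hello World!", "Greeting shown on the home screen.", locale="pseudo"))
print(t("Goodbye!", "Shown when the user signs out.", locale="es"))
```

Because every call site already goes through `t`, none of them need to change when the lookup logic is added.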
<aside> 🚖 At Lyft, by the time the company decided to add Spanish and several other languages, it had accrued years of tech debt that took over a year and well over a million dollars of engineering time to retire. Meanwhile, Uber was already operating in over a dozen languages and was killing us in the Spanish-speaking market. This all could have been avoided had the company followed global-ready coding practices early on.
</aside>
In a similar vein, you should build dummy functions to render dates, times, numbers and currency amounts. The formatting of each varies by locale, so you don’t want to bake US English assumptions into your code. Take dates, for example. A date written as July 1, 2023 in the US will typically be formatted as 1 July 2023 in Great Britain. The good news is there are libraries like Intl that handle all of this for virtually every locale.
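from datetime import date

def render_date(d):
    # Placeholder: validates the input and returns the ISO form (YYYY-MM-DD) for now.
    if not isinstance(d, date):
        raise TypeError("d must be a date value")
    return str(d)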
If the programming language you are using has a mature internationalization library, such as Intl for JavaScript, you can just use that. This problem has been solved many times over, so you don’t need to reinvent the wheel. That said, it is a good idea to wrap the library with your own function, so that you can override the default formats with your own rules (for example, to prettify dates and times per your design guidelines in specific cases, then fall back to whatever the i18n library generates).
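One way to picture that wrapper, as a sketch only: a `render_date` that checks a table of house-style overrides first and otherwise defers to a default formatter. Here the default is simply ISO 8601 via `date.isoformat()`, standing in for whatever your i18n library would produce. The `OVERRIDES` table, the `locale` parameter, the helper names and the formats themselves are illustrative assumptions.

```python
from datetime import date

# English month names kept in code so output does not depend on the process locale.
MONTHS_EN = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def _us_long(d):
    return f"{MONTHS_EN[d.month - 1]} {d.day}, {d.year}"


def _gb_long(d):
    return f"{d.day} {MONTHS_EN[d.month - 1]} {d.year}"


# Hypothetical house-style overrides, keyed by locale code.
OVERRIDES = {"en-US": _us_long, "en-GB": _gb_long}


def default_format(d, locale):
    """Stand-in for the i18n library call; it ignores locale and returns ISO 8601."""
    return d.isoformat()


def render_date(d, locale="en-US"):
    if not isinstance(d, date):
        raise TypeError("d must be a date value")
    formatter = OVERRIDES.get(locale)
    if formatter is not None:
        return formatter(d)
    # No override for this locale: fall back to the default formatter.
    return default_format(d, locale)


d = date(2023, 7, 1)
print(render_date(d, "en-US"))  # July 1, 2023
print(render_date(d, "en-GB"))  # 1 July 2023
print(render_date(d, "de-DE"))  # 2023-07-01
```

Callers only ever see `render_date`, so swapping the stand-in for a real library call later does not ripple through the code base.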
Generating sentences on the fly is a big no-no because not all languages have the same subject-verb-object word order that English does. A dynamically generated sentence that makes sense in English will look completely jumbled in a language like German.
# DO NOT DO THIS
msg = "You have " + str(count) + " widgets in your account."
# DO THIS
msg = t("Your account balance: {count}", "Shows the number of widgets in the user's account.").replace("{count}", str(count))
The best practice is to use message templates with interpolated values. This way translators can reorder the sentence to conform to the rules of the target language. If you are merging numeric values into a message, you’ll probably want to use the ICU message format for this because each language handles pluralization differently.
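To illustrate the reordering point, the sketch below keeps one template per language and fills in named placeholders with `str.format`. The `TEMPLATES` dictionary, the `render_delivery` function and the German wording are illustrative assumptions rather than reviewed translations; the point is only that the translator, not the code, decides where each placeholder lands.

```python
# Hypothetical per-language templates. Note the placeholders swap order in German.
TEMPLATES = {
    "en": "Delivered on {date} by {courier}.",
    "de": "Von {courier} am {date} zugestellt.",
}


def render_delivery(locale, date_text, courier):
    # Fall back to the English template if the locale has no translation yet.
    template = TEMPLATES.get(locale, TEMPLATES["en"])
    # Named placeholders let each template position the values independently.
    return template.format(date=date_text, courier=courier)


print(render_delivery("en", "2023-07-01", "Acme Post"))
print(render_delivery("de", "2023-07-01", "Acme Post"))
```

With string concatenation, the order of `date` and `courier` would be fixed in code and the German version could not be expressed without changing the program.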
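Plural handling is another thing you don’t want to deal with in code. English has two plural forms (one widget, two widgets), but other languages have more or fewer: Russian uses different noun forms after 1, 2 and 5, while Japanese uses a single form for every count. The ICU message format deals with this well, so use it for messages with interpolated numeric values.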
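For reference, an ICU plural message looks roughly like `{count, plural, one {You have # widget.} other {You have # widgets.}}`. Python's standard library has no ICU implementation, so the sketch below hand-rolls the idea for two languages: a function picks a plural category, and the message catalog supplies one template per category. The category rules are a simplified, integer-only reading of the CLDR rules for English and Russian, and the function names, the `MESSAGES` table and the Russian wording are illustrative assumptions. In production you would lean on an ICU library rather than code like this.

```python
def plural_category_en(n):
    return "one" if n == 1 else "other"


def plural_category_ru(n):
    # Simplified rules for non-negative integers only.
    if n % 10 == 1 and n % 100 != 11:
        return "one"
    if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
        return "few"
    return "many"


PLURAL_RULES = {"en": plural_category_en, "ru": plural_category_ru}

# Hypothetical catalog: one template per plural category, per language.
MESSAGES = {
    "en": {
        "one": "You have {count} widget.",
        "other": "You have {count} widgets.",
    },
    "ru": {
        "one": "У вас {count} виджет.",
        "few": "У вас {count} виджета.",
        "many": "У вас {count} виджетов.",
    },
}


def widgets_message(locale, count):
    category = PLURAL_RULES[locale](count)
    return MESSAGES[locale][category].format(count=count)


for n in (1, 2, 5, 21):
    print(widgets_message("en", n), "|", widgets_message("ru", n))
```

Note that 21 takes the singular-style form in Russian but the plural in English, which is exactly the kind of rule you do not want scattered through application code.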
You also need to be aware of gender. For example, in Spanish, nouns and adjectives must share the same gender (masculine or feminine). The word for red is rojo (masc) or roja (fem), and which form you use depends on the gender of the noun it is modifying. While a sentence with mismatched gender is still perfectly understandable, it looks bad and lazy to native speakers. The same goes for numeric values, which can also change form to agree with the noun they count (un libro, una casa).
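A small sketch of one way to handle the gender agreement described above: store each noun's grammatical gender alongside its translation, and let that gender select a complete, translator-written template. This mirrors what ICU's `select` argument does. The `ITEMS_ES` and `IN_STOCK_ES` tables and the `in_stock_message_es` function are hypothetical names made up for this example.

```python
# Hypothetical lexicon: item key -> (Spanish noun, grammatical gender).
ITEMS_ES = {
    "car": ("coche", "masc"),
    "bicycle": ("bicicleta", "fem"),
}

# One full template per gender, so the article and adjective always agree.
IN_STOCK_ES = {
    "masc": "El {item} rojo está disponible.",
    "fem": "La {item} roja está disponible.",
}


def in_stock_message_es(item_key):
    name, gender = ITEMS_ES[item_key]
    # The noun's gender picks the template; the code never assembles the grammar.
    return IN_STOCK_ES[gender].format(item=name)


print(in_stock_message_es("car"))
print(in_stock_message_es("bicycle"))
```

The code only chooses between whole sentences, so the agreement rules stay in the translator's hands.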
For this reason, you should take care when designing message templates that use interpolated values. I am not saying you shouldn’t use interpolated values; just be mindful of how the rules vary by language, and try to avoid generating sentences on the fly.