1 Internationalization Overview
Internationalization is the process of designing an application so that it can be adapted to various languages and regions without engineering changes. Sometimes the term internationalization is abbreviated as i18n, because there are 18 letters between the first "i" and the last "n."
An internationalized program has the following characteristics:
- With the addition of localization data, the same executable can run worldwide.
- Textual elements, such as status messages and the GUI component labels, are not hardcoded in the program. Instead they are stored outside the source code and retrieved dynamically.
- Support for new languages does not require recompilation.
- Culturally dependent data, such as dates and currencies, appear in formats that conform to the end user's region and language.
- It can be localized quickly.
The internet demands global software - that is, software that can be developed independently of the countries or languages of its users, and then localized for multiple countries or regions. The Java Platform provides a rich set of APIs for developing global applications. These internationalization APIs are based on the Unicode standard and include the ability to adapt text, numbers, dates, currency, and user-defined objects to any country's conventions.
This guide summarizes the internationalization APIs and features of the Java Platform, Standard Edition. For coding examples and step-by-step instructions, see the Internationalization Trail in the Java Tutorials.
Text Representation
The Java programming language is based on the Unicode character set, and several libraries implement the Unicode standard. Unicode is an international character set standard that supports all of the major scripts of the world, as well as common technical symbols. The original Unicode specification defined characters as fixed-width 16-bit entities, but the Unicode standard has since been changed to allow for characters whose representation requires more than 16 bits. The range of legal code points is now U+0000 to U+10FFFF. An encoding defined by the standard, UTF-16, represents every Unicode code point using one or two 16-bit units.
The primitive data type char in the Java programming language is an unsigned 16-bit integer that can represent a Unicode code point in the range U+0000 to U+FFFF, or the code units of UTF-16. The various types and classes in the Java platform that represent character sequences - char[], implementations of java.lang.CharSequence (such as the String class), and implementations of java.text.CharacterIterator - are UTF-16 sequences. Most Java source code is written in ASCII, a 7-bit character encoding, or ISO-8859-1, an 8-bit character encoding, but is translated into UTF-16 before processing.
The Character class is an object wrapper for the char primitive type. The Character class also contains static methods such as isLowerCase() and isDigit() for determining the properties of a character. These methods have overloads that accept either a char (which allows representation of Unicode code points in the range U+0000 to U+FFFF) or an int (which allows representation of all Unicode code points).
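The difference between the char and int overloads can be illustrated with a short sketch. The class name CodePointDemo and its helper methods are illustrative only; the code point U+10400 (DESERET CAPITAL LETTER LONG I) is chosen simply because it lies outside the 16-bit range.

```java
final class CodePointDemo {
    // U+10400 lies outside the range that a single char can represent.
    static final int DESERET_LONG_I = 0x10400;

    /** Number of UTF-16 code units (char values) in the text. */
    static int codeUnits(String text) {
        return text.length();
    }

    /** Number of Unicode code points in the text. */
    static int codePoints(String text) {
        return text.codePointCount(0, text.length());
    }

    public static void main(String[] args) {
        String s = new String(Character.toChars(DESERET_LONG_I));
        System.out.println(codeUnits(s));   // 2: a surrogate pair
        System.out.println(codePoints(s));  // 1: a single code point
        // The int overload sees the whole code point and reports an uppercase letter.
        System.out.println(Character.isUpperCase(DESERET_LONG_I)); // true
        // The char overload sees only the high surrogate, which is not a letter.
        System.out.println(Character.isUpperCase(s.charAt(0)));    // false
    }
}
```

Code that must handle supplementary characters would therefore usually iterate over code points (for example with String.codePoints()) rather than over char values.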
Locale Identification and Localization
A Locale object is an identifier for a particular combination of language and region. Localization is the process of adapting software for a specific region or language by adding locale-specific components and translating text.
Locales
On the Java platform, a locale is simply an identifier for a particular combination of language and region. It is not a collection of locale-specific attributes. Instead, each locale-sensitive class maintains its own locale-specific information. With this design, there is no difference in how user and system objects maintain their locale-specific resources. Both use the standard localization mechanism.
Java programs are not assigned a single global locale. All locale-sensitive operations may be explicitly given a locale as an argument. This greatly simplifies multilingual programs. While a global locale is not enforced, a default locale is available for programs that do not wish to manage locales explicitly. A default locale also makes it possible to affect the behavior of the entire presentation with a single choice.
Java locales act as requests for certain behavior from another object. For example, a French Canadian locale passed to a Calendar object asks that the Calendar behave correctly for the customs of Quebec. It is up to the object accepting the locale to do the right thing. If the object has not been localized for a particular locale, it will try to find a "close" match with a locale for which it has been localized. Thus if a Calendar object was not localized for French Canada, but was localized for the French language in general, it would use the French localization instead.
Locale Class
A Locale object represents a specific geographical, political, or cultural region. An operation that requires a locale to perform its task is called locale-sensitive and uses the Locale object to tailor information for the user. For example, displaying a number is a locale-sensitive operation - the number should be formatted according to the customs and conventions of the user's native country, region, or culture.
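As a minimal sketch of how a Locale acts as a plain identifier, the following example builds a French Canadian locale from a language tag and inspects its parts. The class name LocaleDemo and the describe helper are illustrative, not part of the platform.

```java
import java.util.Locale;

final class LocaleDemo {
    /** Returns "language/country" for a locale, e.g. "fr/CA". */
    static String describe(Locale locale) {
        return locale.getLanguage() + "/" + locale.getCountry();
    }

    public static void main(String[] args) {
        Locale frenchCanada = Locale.forLanguageTag("fr-CA");
        System.out.println(describe(frenchCanada));                      // fr/CA
        System.out.println(frenchCanada.toLanguageTag());                // fr-CA
        System.out.println(frenchCanada.equals(Locale.CANADA_FRENCH));   // true
        // A human-readable name, rendered here in English.
        System.out.println(frenchCanada.getDisplayName(Locale.ENGLISH));
        // The default locale is used only when no locale is passed explicitly.
        System.out.println(Locale.getDefault());
    }
}
```

The locale object itself carries no formatting rules; it is handed to locale-sensitive classes, which look up their own data for it.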
Supported Locales
On the Java platform, there does not have to be a single set of supported locales, since each class maintains its own localizations. Nevertheless, there is a consistent set of localizations supported by the classes of the Java Platform. Other implementations of the Java Platform may support different locales. Locales that are supported by the JDK are summarized by release. Use the search field on the Technical Resources from Oracle page and search for "Supported Locales" to see what is supported.
Localized Resources
All locale-sensitive classes must be able to access resources customized for the locales they support. To aid in the process of localization, it helps to have these resources grouped together by locale and separated from the locale-neutral parts of the program.
ResourceBundle Class
The class ResourceBundle is an abstract base class representing containers of resources. Programmers create subclasses of ResourceBundle that contain resources for a particular locale. New resources can be added to an instance of ResourceBundle, or new instances of ResourceBundle can be added to a system without affecting the code that uses them. Packaging resources as classes allows developers to take advantage of Java's class loading mechanism to find resources.
Resource bundles contain locale-specific objects. When a program needs a locale-specific resource, such as a String object, the program can load it from the resource bundle that is appropriate for the current user's locale. In this way, the programmer can write code that is largely independent of the user's locale, isolating most, if not all, of the locale-specific information in resource bundles.
This allows Java programmers to write code that can
- be easily localized, or translated, into different languages
- handle multiple locales at once
- be easily modified later to support even more locales
ResourceBundle.Control Class
ResourceBundle.Control is a nested class of ResourceBundle. It defines methods to be called by the ResourceBundle.getBundle factory methods so that the resource bundle loading behavior may be changed. For example, application specific resource bundle formats, such as XML, could be supported by overriding the methods.
ResourceBundle.Control is not supported in named modules. Existing code using Control is expected to work, but for new code in a named module, implement basenameProvider and load the resource bundle from there. See Resource Bundles and Named Modules.
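One of the methods that ResourceBundle.Control exposes is getCandidateLocales, which determines the order in which bundles are searched. The sketch below only queries the default control rather than overriding it, and it assumes the code runs in the unnamed module (on the class path). The class name CandidateLocalesDemo and the base name "Messages" are hypothetical.

```java
import java.util.List;
import java.util.Locale;
import java.util.ResourceBundle;

final class CandidateLocalesDemo {
    /** Lists the locales that the default control would try, most specific first. */
    static List<Locale> candidates(String baseName, Locale locale) {
        ResourceBundle.Control control =
                ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_DEFAULT);
        return control.getCandidateLocales(baseName, locale);
    }

    public static void main(String[] args) {
        // For French Canada the lookup falls back to French, then to the root bundle.
        for (Locale candidate : candidates("Messages", Locale.CANADA_FRENCH)) {
            System.out.println(candidate.equals(Locale.ROOT) ? "(root)" : candidate.toString());
        }
    }
}
```

A subclass of Control could override this method, or newBundle, to change the search order or the bundle format.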
ListResourceBundle Class
ListResourceBundle is an abstract subclass of ResourceBundle that manages resources for a locale in a convenient and easy to use list.
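A minimal sketch of a ListResourceBundle subclass follows. The class names and keys are hypothetical, and the bundles are instantiated directly to keep the example self-contained; in an application they would normally be top-level classes named by convention (for example Messages and Messages_fr) and located through ResourceBundle.getBundle.

```java
import java.util.ListResourceBundle;
import java.util.ResourceBundle;

final class ListBundleDemo {
    /** Hypothetical default (English) resources. */
    static final class Messages extends ListResourceBundle {
        @Override
        protected Object[][] getContents() {
            return new Object[][] {
                {"greeting", "Hello"},
                {"maxRetries", 3},   // values may be any object, not only strings
            };
        }
    }

    /** Hypothetical French resources using the same keys. */
    static final class MessagesFr extends ListResourceBundle {
        @Override
        protected Object[][] getContents() {
            return new Object[][] {
                {"greeting", "Bonjour"},
                {"maxRetries", 3},
            };
        }
    }

    public static void main(String[] args) {
        ResourceBundle english = new Messages();
        ResourceBundle french = new MessagesFr();
        System.out.println(english.getString("greeting"));   // Hello
        System.out.println(french.getString("greeting"));    // Bonjour
        System.out.println(french.getObject("maxRetries"));  // 3
    }
}
```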
PropertyResourceBundle Class
PropertyResourceBundle is a concrete subclass of ResourceBundle that manages resources for a locale using a set of static strings from a properties file.
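The following sketch loads a PropertyResourceBundle from in-memory text so that it runs without any files on disk. The class name, the helper method, and the keys are illustrative; an application would usually keep such text in a file like Messages_fr.properties and let ResourceBundle.getBundle find it.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

final class PropertyBundleDemo {
    /** Builds a bundle from text in properties-file syntax. */
    static ResourceBundle fromProperties(String text) {
        try {
            return new PropertyResourceBundle(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ResourceBundle french = fromProperties("greeting=Bonjour\nfarewell=Au revoir\n");
        System.out.println(french.getString("greeting")); // Bonjour
        System.out.println(french.getString("farewell")); // Au revoir
    }
}
```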
Date and Time Handling
The Date-Time package, java.time, provides a comprehensive model for date and time. Although java.time is based on the International Organization for Standardization (ISO) calendar system, commonly used global calendars are also supported.
See The Date-Time Packages lesson in The Java Tutorials (Java SE 8 and earlier).
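As a brief sketch of locale-aware formatting with java.time, the example below formats one ISO date with two locales and converts it to a non-ISO calendar. The class name and the patterns are illustrative choices.

```java
import java.time.LocalDate;
import java.time.chrono.JapaneseDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

final class JavaTimeDemo {
    /** Formats a date with an explicit pattern and locale. */
    static String format(LocalDate date, String pattern, Locale locale) {
        return DateTimeFormatter.ofPattern(pattern, locale).format(date);
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(1997, 11, 3);
        System.out.println(format(date, "MMMM d, uuuu", Locale.ENGLISH)); // November 3, 1997
        System.out.println(format(date, "d MMMM uuuu", Locale.FRENCH));   // 3 novembre 1997
        // The same day expressed in the Japanese imperial calendar system.
        System.out.println(JapaneseDate.from(date));
    }
}
```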
Text Processing
Text processing involves formatting locale-sensitive information such as currencies, dates, times, and text messages. It also includes manipulating text in a locale-sensitive manner, meaning that string operations, such as searching and sorting, are properly performed regardless of locale.
Formatting
It is in formatting data for output that many cultural conventions are applied. Numbers, dates, times, and messages may all require formatting before they can be displayed. The Java platform provides a set of flexible formatting classes that can handle both the standard locale formats and programmer defined custom formats. These formatting classes are also able to parse formatted strings back into their constituent objects.
Format Class
The class Format is an abstract base class for formatting locale-sensitive information such as dates, times, messages, and numbers. Three main subclasses are provided: DateFormat, NumberFormat, and MessageFormat. These three also provide subclasses of their own.
DateFormat Class
Dates and times are stored internally in a locale-independent way, but should be formatted so that they can be displayed in a locale-sensitive manner. For example, the same date might be formatted as follows:
- November 3, 1997 (English)
- 3 novembre 1997 (French)
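The class DateFormat is an abstract base class for formatting and parsing date and time values. Because it hides each locale's conventions behind a common interface, code that uses it can remain independent of any particular locale. It has a number of static factory methods for getting standard date and time formats for a given locale.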
The DateFormat object uses Calendar and TimeZone objects in order to interpret time values. By default, a DateFormat object for a given locale will use the appropriate Calendar object for that locale and the system's default TimeZone object. The programmer can override these choices if desired.
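The factory methods and the time zone override described above might be used as in this sketch. The class and method names are illustrative, and the exact text produced depends on the locale data shipped with the runtime.

```java
import java.text.DateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

final class DateFormatDemo {
    static final TimeZone UTC = TimeZone.getTimeZone("UTC");

    /** Noon UTC on 3 November 1997, stored as a locale-independent instant. */
    static Date sampleDate() {
        Calendar calendar = new GregorianCalendar(UTC);
        calendar.clear();
        calendar.set(1997, Calendar.NOVEMBER, 3, 12, 0, 0);
        return calendar.getTime();
    }

    /** Formats the date in the locale's standard LONG style. */
    static String longDate(Date date, Locale locale) {
        DateFormat format = DateFormat.getDateInstance(DateFormat.LONG, locale);
        format.setTimeZone(UTC); // override the system default time zone
        return format.format(date);
    }

    public static void main(String[] args) {
        Date date = sampleDate();
        System.out.println(longDate(date, Locale.US));
        System.out.println(longDate(date, Locale.FRANCE));
    }
}
```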
SimpleDateFormat Class
The class SimpleDateFormat is a concrete class for formatting and parsing dates and times in a locale-sensitive manner. It allows for formatting (milliseconds to text), parsing (text to milliseconds), and normalization.
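A small sketch of the three operations follows: text in one locale is parsed to a Date, then formatted in another locale, and an out-of-range day is normalized along the way because parsing is lenient by default. The class name, the translate helper, and the pattern are illustrative.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

final class SimpleDateFormatDemo {
    private static final String PATTERN = "d MMMM yyyy";

    private static SimpleDateFormat formatter(Locale locale) {
        SimpleDateFormat format = new SimpleDateFormat(PATTERN, locale);
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        return format;
    }

    /** Parses text written for one locale and re-formats it for another. */
    static String translate(String text, Locale from, Locale to) {
        try {
            Date date = formatter(from).parse(text); // text to milliseconds
            return formatter(to).format(date);       // milliseconds to text
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable date: " + text, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(translate("3 novembre 1997", Locale.FRENCH, Locale.ENGLISH));  // 3 November 1997
        // Lenient parsing normalizes "34 October" to 3 November.
        System.out.println(translate("34 octobre 1997", Locale.FRENCH, Locale.ENGLISH)); // 3 November 1997
    }
}
```

Calling setLenient(false) on the formatter would reject the second input instead of normalizing it.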
DateFormatSymbols Class
The class DateFormatSymbols is used to encapsulate localizable date-time formatting data, such as the names of the months, the names of the days of the week, time of day, and the time zone data. The DateFormat and SimpleDateFormat classes both use the DateFormatSymbols class to encapsulate this information.
Usually, programmers will not use the DateFormatSymbols directly. Rather, they will implement formatting with the DateFormat class's factory methods.
NumberFormat Class
The class NumberFormat is an abstract base class for formatting and parsing numeric data. It contains a number of static factory methods for getting different kinds of locale-specific number formats.
The NumberFormat class helps programmers to format and parse numbers for any locale. Code using this class can be completely independent of the locale conventions for decimal points, thousands-separators, the particular decimal digits used, or whether the number format is even decimal. The application can also display a number as a normal decimal number, currency, or percentage:
- 1,234.5 (decimal number in U.S. format)
- $1,234.50 (U.S. currency in U.S. format)
- 1.234,50 € (European currency in German format)
- 123.450% (percent in German format)
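The factory methods behind the formats listed above can be sketched as follows. The class and helper names are illustrative. Output for some locales contains non-breaking spaces or differs between locale data versions, so only the U.S. results are annotated.

```java
import java.text.NumberFormat;
import java.util.Locale;

final class NumberFormatDemo {
    static String decimal(double value, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    static String currency(double value, Locale locale) {
        return NumberFormat.getCurrencyInstance(locale).format(value);
    }

    static String percent(double value, Locale locale) {
        return NumberFormat.getPercentInstance(locale).format(value);
    }

    public static void main(String[] args) {
        System.out.println(decimal(1234.5, Locale.US));   // 1,234.5
        System.out.println(currency(1234.5, Locale.US));  // $1,234.50
        System.out.println(percent(1234.5, Locale.US));   // 123,450%
        System.out.println(decimal(1234.5, Locale.GERMANY));
        System.out.println(currency(1234.5, Locale.GERMANY));
        System.out.println(percent(1234.5, Locale.GERMANY));
    }
}
```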
DecimalFormat Class
Numbers are stored internally in a locale-independent way, but should be formatted so that they can be displayed in a locale-sensitive manner. For example, when using "#,###.00" as a pattern, the same number might be formatted as follows:
- 1.234,56 (German)
- 1,234.56 (English)
The class DecimalFormat, which is a concrete subclass of the NumberFormat class, can format decimal numbers. Programmers generally will not instantiate this class directly but will use the factory methods provided.
The DecimalFormat class has the ability to take a pattern string to specify how a number should be formatted. The pattern specifies attributes such as the precision of the number, whether leading zeros should be printed, and what currency symbols are used. The pattern string can be altered if a program needs to create a custom format.
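For a custom pattern, a DecimalFormat can be constructed directly with the symbols of a chosen locale, as in this sketch. The class name and helper are illustrative.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

final class DecimalPatternDemo {
    /** Applies one pattern using the separators of the given locale. */
    static String format(double value, String pattern, Locale locale) {
        DecimalFormatSymbols symbols = DecimalFormatSymbols.getInstance(locale);
        return new DecimalFormat(pattern, symbols).format(value);
    }

    public static void main(String[] args) {
        // The same pattern yields different separators per locale.
        System.out.println(format(1234.56, "#,###.00", Locale.GERMANY)); // 1.234,56
        System.out.println(format(1234.56, "#,###.00", Locale.US));      // 1,234.56
        // '0' forces leading zeros and fixes the precision at one fraction digit.
        System.out.println(format(1234.56, "000000.0", Locale.US));      // 001234.6
    }
}
```

Note that the pattern always uses '.' and ',' as placeholders; the locale's symbols are substituted only in the output.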
DecimalFormatSymbols Class
The class DecimalFormatSymbols represents the set of symbols (such as the decimal separator, the grouping separator, and so on) needed by DecimalFormat to format numbers. DecimalFormat creates for itself an instance of DecimalFormatSymbols from its locale data. A programmer needing to change any of these symbols can get the DecimalFormatSymbols object from the DecimalFormat object, modify it, and then pass it back with setDecimalFormatSymbols(), because getDecimalFormatSymbols() returns a copy.
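A minimal sketch of changing one symbol follows. The class name, the method name, and the choice of an apostrophe as grouping separator are illustrative.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

final class CustomSymbolsDemo {
    /** Formats with U.S. conventions, except that groups are separated by an apostrophe. */
    static String formatWithApostrophes(double value) {
        DecimalFormat format =
                new DecimalFormat("#,##0.###", DecimalFormatSymbols.getInstance(Locale.US));
        DecimalFormatSymbols symbols = format.getDecimalFormatSymbols(); // a copy
        symbols.setGroupingSeparator('\'');
        format.setDecimalFormatSymbols(symbols); // install the modified copy
        return format.format(value);
    }

    public static void main(String[] args) {
        System.out.println(formatWithApostrophes(1234567.5)); // 1'234'567.5
    }
}
```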
ChoiceFormat Class
The class ChoiceFormat is a concrete subclass of the NumberFormat class. The ChoiceFormat class allows the programmer to attach a format to a range of numbers. It is generally used in a MessageFormat object for handling plurals.
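Attaching text to numeric ranges might look like the following sketch. The limits and phrases are illustrative: each limit marks the lower bound of the range that maps to the string at the same index.

```java
import java.text.ChoiceFormat;

final class ChoiceFormatDemo {
    /** Maps a file count to a phrase: [0,1) -> "no files", [1,2) -> "one file", [2,inf) -> "several files". */
    static String describe(double count) {
        double[] limits = {0, 1, 2};
        String[] phrases = {"no files", "one file", "several files"};
        return new ChoiceFormat(limits, phrases).format(count);
    }

    public static void main(String[] args) {
        System.out.println(describe(0)); // no files
        System.out.println(describe(1)); // one file
        System.out.println(describe(5)); // several files
    }
}
```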
MessageFormat Class
Programs often need to build messages from sequences of strings, numbers and other data. For example, the text of a message displaying the number of files on a disk drive will vary:
- The disk C contains 100 files.
- The disk D contains 1 file.
- The disk F contains 0 files.
If a message built from sequences of strings and numbers is hard-coded, it cannot be translated into other languages. For example, note the different positions of the parameters "3" and "G" in the following translations:
- The disk G contains 3 files. (English)
- Il y a 3 fichiers sur le disque G. (French)
The class MessageFormat provides a means to produce concatenated messages in a language-neutral way. The MessageFormat object takes a set of objects, formats them, and then inserts the formatted strings into the pattern at the appropriate places.
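The disk example can be sketched with two patterns whose placeholders appear in different orders. The patterns here are illustrative and would normally be loaded from a resource bundle; the English one embeds a choice format to handle the plural.

```java
import java.text.MessageFormat;
import java.util.Locale;

final class MessageFormatDemo {
    // Argument {0} is the file count, argument {1} is the disk name.
    static final String ENGLISH =
            "The disk {1} contains {0,choice,0#no files|1#one file|1<{0,number,integer} files}.";
    static final String FRENCH =
            "Il y a {0,number,integer} fichiers sur le disque {1}.";

    static String message(String pattern, Locale locale, int count, String disk) {
        return new MessageFormat(pattern, locale).format(new Object[] {count, disk});
    }

    public static void main(String[] args) {
        System.out.println(message(ENGLISH, Locale.US, 100, "C")); // The disk C contains 100 files.
        System.out.println(message(ENGLISH, Locale.US, 1, "D"));   // The disk D contains one file.
        System.out.println(message(ENGLISH, Locale.US, 0, "F"));   // The disk F contains no files.
        System.out.println(message(FRENCH, Locale.FRANCE, 3, "G")); // Il y a 3 fichiers sur le disque G.
    }
}
```

The calling code passes the arguments in a fixed order; only the pattern decides where each one appears in the output.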
ParsePosition Class
The class ParsePosition is used by the Format class and its subclasses to keep track of the current position during parsing. One version of the parseObject() method in the Format class requires a ParsePosition object as an argument.
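A short sketch of position tracking during parsing, using NumberFormat as the Format subclass. The class and helper names are illustrative.

```java
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Locale;

final class ParsePositionDemo {
    /** Parses a number at the given position; returns null if none is found there. */
    static Number parsePrefix(String text, ParsePosition position) {
        return NumberFormat.getNumberInstance(Locale.US).parse(text, position);
    }

    public static void main(String[] args) {
        ParsePosition position = new ParsePosition(0);
        Number value = parsePrefix("1,234 apples", position);
        System.out.println(value);               // 1234
        System.out.println(position.getIndex()); // 5: parsing stopped before the space

        ParsePosition failed = new ParsePosition(0);
        System.out.println(parsePrefix("apples", failed)); // null
        System.out.println(failed.getIndex());             // 0: index is unchanged on failure
    }
}
```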
FieldPosition Class
The FieldPosition class is used by the Format class and its subclasses to identify fields in formatted output. One version of the format() method in the Format class requires a FieldPosition object as an argument.
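As a sketch, a FieldPosition can report where the integer part of a formatted number begins and ends, which is useful for aligning decimal points in a column. The class and method names are illustrative.

```java
import java.text.FieldPosition;
import java.text.NumberFormat;
import java.util.Locale;

final class FieldPositionDemo {
    /** Returns {begin, end} of the integer field within the formatted value. */
    static int[] integerFieldBounds(double value) {
        NumberFormat format = NumberFormat.getNumberInstance(Locale.US);
        FieldPosition field = new FieldPosition(NumberFormat.INTEGER_FIELD);
        format.format(value, new StringBuffer(), field);
        return new int[] {field.getBeginIndex(), field.getEndIndex()};
    }

    public static void main(String[] args) {
        // 1234.5 is formatted as "1,234.5"; the integer field spans "1,234".
        int[] bounds = integerFieldBounds(1234.5);
        System.out.println(bounds[0] + ".." + bounds[1]); // 0..5
    }
}
```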
Locale-Sensitive String Operations
Programs frequently need to manipulate strings. Common operations on strings include searching and sorting. Some tasks, such as collating strings or finding various boundaries in text, are surprisingly difficult to get right and are even more difficult when multiple languages must be considered. The Java Platform provides classes for handling many of these common string manipulations in a locale-sensitive manner.
Collator Class
The Collator class performs locale-sensitive string comparison. Programmers use this class to build searching and alphabetical sorting routines for natural language text. Collator is an abstract base class. Its subclasses implement specific collation strategies. One subclass, RuleBasedCollator, is applicable to a wide set of languages. Other subclasses may be created to handle more specialized needs.
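The difference between locale-sensitive collation and plain code-unit ordering can be sketched as follows. The class name, helper, and word list are illustrative; the accented letter is written as a Unicode escape so that the source file stays ASCII.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

final class CollatorDemo {
    /** Returns a copy of the words sorted by the locale's collation rules. */
    static String[] sorted(Locale locale, String... words) {
        String[] copy = words.clone();
        Arrays.sort(copy, Collator.getInstance(locale));
        return copy;
    }

    public static void main(String[] args) {
        String[] words = {"zebra", "\u00e9clair", "banana", "Apple"};

        String[] naive = words.clone();
        Arrays.sort(naive); // compares UTF-16 code units
        System.out.println(Arrays.toString(naive));
        // [Apple, banana, zebra, éclair]

        System.out.println(Arrays.toString(sorted(Locale.FRENCH, words)));
        // [Apple, banana, éclair, zebra]
    }
}
```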
RuleBasedCollator Class
The RuleBasedCollator class, which is a concrete subclass of the Collator class, provides a simple, data-driven, table collator. Using RuleBasedCollator, a programmer can create a customized table-based collator. For example, a programmer can build a collator that will ignore (or notice) uppercase letters, accents, and Unicode combining characters.
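A customized table can be as small as a single rule string. The sketch below defines a deliberately unusual ordering (c before b before a) to make the effect of the rules visible; the class name, helper, and rules are illustrative.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.Arrays;

final class CustomRulesDemo {
    /** Sorts the words with a collator built from the given rule string. */
    static String[] sortWithRules(String rules, String... words) {
        try {
            RuleBasedCollator collator = new RuleBasedCollator(rules);
            String[] copy = words.clone();
            Arrays.sort(copy, collator);
            return copy;
        } catch (ParseException e) {
            throw new IllegalArgumentException("Invalid collation rules: " + rules, e);
        }
    }

    public static void main(String[] args) {
        // '<' introduces a primary difference: here c sorts before b, and b before a.
        System.out.println(Arrays.toString(sortWithRules("< c < b < a", "a", "b", "c")));
        // [c, b, a]
    }
}
```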
CollationElementIterator Class
The CollationElementIterator class is used as an iterator to walk through each character of an international string. Programmers use the iterator to return the ordering priority of the positioned character. The ordering priority of a character, or key, defines how a character is collated in the given Collator object. The CollationElementIterator class is used by the compare() method of the RuleBasedCollator class.
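The following sketch walks a string and collects the primary ordering priority of each collation element. It builds its own small RuleBasedCollator so that no cast is needed; the class name, helper, and rules are illustrative, and the numeric values of the priorities are implementation details, so only their relative order is meaningful.

```java
import java.text.CollationElementIterator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.List;

final class CollationElementsDemo {
    /** Returns the primary order of each collation element in the text. */
    static List<Integer> primaryOrders(String text) {
        try {
            RuleBasedCollator collator = new RuleBasedCollator("< a < b < c");
            CollationElementIterator elements = collator.getCollationElementIterator(text);
            List<Integer> orders = new ArrayList<>();
            for (int element = elements.next();
                 element != CollationElementIterator.NULLORDER;
                 element = elements.next()) {
                orders.add(CollationElementIterator.primaryOrder(element));
            }
            return orders;
        } catch (ParseException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // One priority per character; 'c' has the highest and 'a' the lowest.
        System.out.println(primaryOrders("cab"));
    }
}
```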