| 1 |
|
|---|
| 2 | # This document contains text in Perl "POD" format.
|
|---|
| 3 | # Use a POD viewer like perldoc or perlman to render it.
|
|---|
| 4 |
|
|---|
| 5 | # This corrects some typos in the previous release.
|
|---|
| 6 |
|
|---|
| 7 | =head1 NAME
|
|---|
| 8 |
|
|---|
| 9 | Locale::Maketext::TPJ13 -- article about software localization
|
|---|
| 10 |
|
|---|
| 11 | =head1 SYNOPSIS
|
|---|
| 12 |
|
|---|
| 13 | # This is an article, not a module.
|
|---|
| 14 |
|
|---|
| 15 | =head1 DESCRIPTION
|
|---|
| 16 |
|
|---|
| 17 | The following article by Sean M. Burke and Jordan Lachler
|
|---|
| 18 | first appeared in I<The Perl
|
|---|
| 19 | Journal> #13 and is copyright 1999 The Perl Journal. It appears
|
|---|
| 20 | courtesy of Jon Orwant and The Perl Journal. This document may be
|
|---|
| 21 | distributed under the same terms as Perl itself.
|
|---|
| 22 |
|
|---|
| 23 | =head1 Localization and Perl: gettext breaks, Maketext fixes
|
|---|
| 24 |
|
|---|
| 25 | by Sean M. Burke and Jordan Lachler
|
|---|
| 26 |
|
|---|
| 27 | This article points out cases where gettext (a common system for
|
|---|
| 28 | localizing software interfaces -- i.e., making them work in the user's
|
|---|
| 29 | language of choice) fails because of basic differences between human
|
|---|
| 30 | languages. This article then describes Maketext, a new system capable
|
|---|
| 31 | of correctly treating these differences.
|
|---|
| 32 |
|
|---|
| 33 | =head2 A Localization Horror Story: It Could Happen To You
|
|---|
| 34 |
|
|---|
| 35 | =over
|
|---|
| 36 |
|
|---|
| 37 | "There are a number of languages spoken by human beings in this
|
|---|
| 38 | world."
|
|---|
| 39 |
|
|---|
| 40 | -- Harald Tveit Alvestrand, in RFC 1766, "Tags for the
|
|---|
| 41 | Identification of Languages"
|
|---|
| 42 |
|
|---|
| 43 | =back
|
|---|
| 44 |
|
|---|
| 45 | Imagine that your task for the day is to localize a piece of software
|
|---|
| 46 | -- and luckily for you, the only output the program emits is two
|
|---|
| 47 | messages, like this:
|
|---|
| 48 |
|
|---|
| 49 | I scanned 12 directories.
|
|---|
| 50 |
|
|---|
| 51 | Your query matched 10 files in 4 directories.
|
|---|
| 52 |
|
|---|
| 53 | So how hard could that be? You look at the code that
|
|---|
| 54 | produces the first item, and it reads:
|
|---|
| 55 |
|
|---|
| 56 | printf("I scanned %g directories.",
|
|---|
| 57 | $directory_count);
|
|---|
| 58 |
|
|---|
| 59 | You think about that, and realize that it doesn't even work right for
|
|---|
| 60 | English, as it can produce this output:
|
|---|
| 61 |
|
|---|
| 62 | I scanned 1 directories.
|
|---|
| 63 |
|
|---|
| 64 | So you rewrite it to read:
|
|---|
| 65 |
|
|---|
| 66 | printf("I scanned %g %s.",
|
|---|
| 67 | $directory_count,
|
|---|
| 68 | $directory_count == 1 ?
|
|---|
| 69 | "directory" : "directories",
|
|---|
| 70 | );
|
|---|
| 71 |
|
|---|
| 72 | ...which does the Right Thing. (In case you don't recall, "%g" is for
|
|---|
| 73 | general-format number interpolation, and "%s" is for string
|
|---|
| 74 | interpolation.)
|
|---|
| 75 |
|
|---|
| 76 | But you still have to localize it for all the languages you're
|
|---|
| 77 | producing this software for, so you pull Locale::gettext off of CPAN
|
|---|
| 78 | so you can access the C<gettext> C functions you've heard are standard
|
|---|
| 79 | for localization tasks.
|
|---|
| 80 |
|
|---|
| 81 | And you write:
|
|---|
| 82 |
|
|---|
| 83 | printf(gettext("I scanned %g %s."),
|
|---|
| 84 | $dir_scan_count,
|
|---|
| 85 | $dir_scan_count == 1 ?
|
|---|
| 86 | gettext("directory") : gettext("directories"),
|
|---|
| 87 | );
|
|---|
| 88 |
|
|---|
| 89 | But you then read in the gettext manual (Drepper, Miller, and Pinard 1995)
|
|---|
| 90 | that this is not a good idea, since how a single word like "directory"
|
|---|
| 91 | or "directories" is translated may depend on context -- and this is
|
|---|
| 92 | true, since in a case language like German or Russian, you may need
|
|---|
| 93 | these words with a different case ending in the first instance (where the
|
|---|
| 94 | word is the object of a verb) than in the second instance, which you haven't even
|
|---|
| 95 | gotten to yet (where the word is the object of a preposition, "in %g
|
|---|
| 96 | directories") -- assuming these keep the same syntax when translated
|
|---|
| 97 | into those languages.
|
|---|
| 98 |
|
|---|
| 99 | So, on the advice of the gettext manual, you rewrite:
|
|---|
| 100 |
|
|---|
| 101 | printf( $dir_scan_count == 1 ?
|
|---|
| 102 | gettext("I scanned %g directory.") :
|
|---|
| 103 | gettext("I scanned %g directories."),
|
|---|
| 104 | $dir_scan_count );
|
|---|
| 105 |
|
|---|
| 106 | So, you email your various translators (the boss decides that the
|
|---|
| 107 | languages du jour are Chinese, Arabic, Russian, and Italian, so you
|
|---|
| 108 | have one translator for each), asking for translations for "I scanned
|
|---|
| 109 | %g directory." and "I scanned %g directories.". When they reply,
|
|---|
| 110 | you'll put that in the lexicons for gettext to use when it localizes
|
|---|
| 111 | your software, so that when the user is running under the "zh"
|
|---|
| 112 | (Chinese) locale, gettext("I scanned %g directory.") will return the
|
|---|
| 113 | appropriate Chinese text, with a "%g" in there where printf can then
|
|---|
| 114 | interpolate $dir_scan_count.
|
|---|
| 115 |
|
|---|
| 116 | Your Chinese translator emails right back -- he says both of these
|
|---|
| 117 | phrases translate to the same thing in Chinese, because, in linguistic
|
|---|
| 118 | jargon, Chinese "doesn't have number as a grammatical category" --
|
|---|
| 119 | whereas English does. That is, English has grammatical rules that
|
|---|
| 120 | refer to "number", i.e., whether something is grammatically singular
|
|---|
| 121 | or plural; and one of these rules is the one that forces nouns to take
|
|---|
| 122 | a plural suffix (generally "s") when in a plural context, as they are when
|
|---|
| 123 | they follow a number other than "one" (including, oddly enough, "zero").
|
|---|
| 124 | Chinese has no such rules, and so has just the one phrase where English
|
|---|
| 125 | has two. But, no problem, you can have this one Chinese phrase appear
|
|---|
| 126 | as the translation for the two English phrases in the "zh" gettext
|
|---|
| 127 | lexicon for your program.
|
|---|
| 128 |
|
|---|
| 129 | Emboldened by this, you dive into the second phrase that your software
|
|---|
| 130 | needs to output: "Your query matched 10 files in 4 directories.". You notice
|
|---|
| 131 | that if you want to treat phrases as indivisible, as the gettext
|
|---|
| 132 | manual wisely advises, you need four cases now, instead of two, to
|
|---|
| 133 | cover the permutations of singular and plural on the two items,
|
|---|
| 134 | $directory_count and $file_count. So you try this:
|
|---|
| 135 |
|
|---|
| 136 | printf( $file_count == 1 ?
|
|---|
| 137 | ( $directory_count == 1 ?
|
|---|
| 138 | gettext("Your query matched %g file in %g directory.") :
|
|---|
| 139 | gettext("Your query matched %g file in %g directories.") ) :
|
|---|
| 140 | ( $directory_count == 1 ?
|
|---|
| 141 | gettext("Your query matched %g files in %g directory.") :
|
|---|
| 142 | gettext("Your query matched %g files in %g directories.") ),
|
|---|
| 143 | $file_count, $directory_count,
|
|---|
| 144 | );
|
|---|
| 145 |
|
|---|
| 146 | (The case of "1 file in 2 [or more] directories" could, I suppose,
|
|---|
| 147 | occur in the case of symlinking or something of the sort.)
|
|---|
| 148 |
|
|---|
| 149 | It occurs to you that this is not the prettiest code you've ever
|
|---|
| 150 | written, but this seems the way to go. You mail off to the
|
|---|
| 151 | translators asking for translations for these four cases. The
|
|---|
| 152 | Chinese guy replies with the one phrase that these all translate to in
|
|---|
| 153 | Chinese, and that phrase has two "%g"s in it, as it should -- but
|
|---|
| 154 | there's a problem. He translates it word-for-word back: "In %g
|
|---|
| 155 | directories contains %g files match your query." The %g
|
|---|
| 156 | slots are in an order reverse to what they are in English. You wonder
|
|---|
| 157 | how you'll get gettext to handle that.
|
|---|
| 158 |
|
|---|
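As an aside on the word-order problem alone: in Perl 5.8 and later, C<sprintf> accepts an explicit argument index (C<%1$g>, C<%2$g>), so a translated template can name its slots in any order. The sketch below assumes a hypothetical helper, C<render_match>, and hand-written templates; the reordered template is an English stand-in for the translator's back-translation, not real Chinese text. It addresses slot order only, not the agreement problems described in the rest of this story.

```perl
use strict;
use warnings;

# Hypothetical helper: fill a translated template with the two counts.
# The caller always passes files first, then directories; the template
# decides where each value appears.
sub render_match {
    my ($template, $file_count, $directory_count) = @_;
    return sprintf($template, $file_count, $directory_count);
}

# Single quotes keep Perl from interpolating "$g" inside the templates.
my $english_order  = 'Your query matched %1$g files in %2$g directories.';
my $reversed_order = 'In %2$g directories, %1$g files match your query.';

print render_match($english_order,  10, 4), "\n";
print render_match($reversed_order, 10, 4), "\n";
```

The calling code stays the same for both templates; only the string returned by the lexicon changes.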
| 159 | But you put it aside for the moment, and optimistically hope that the
|
|---|
| 160 | other translators won't have this problem, and that their languages
|
|---|
| 161 | will be better behaved -- i.e., that they will be just like English.
|
|---|
| 162 |
|
|---|
| 163 | But the Arabic translator is the next to write back. First off, your
|
|---|
| 164 | code for "I scanned %g directory." or "I scanned %g directories."
|
|---|
| 165 | assumes there's only singular or plural. But, to use linguistic
|
|---|
| 166 | jargon again, Arabic has grammatical number, like English (but unlike
|
|---|
| 167 | Chinese), but it's a three-term category: singular, dual, and plural.
|
|---|
| 168 | In other words, the way you say "directory" depends on whether there's
|
|---|
| 169 | one directory, or I<two> of them, or I<more than two> of them. Your
|
|---|
| 170 | test of C<($directory_count == 1)> no longer does the job. And it means
|
|---|
| 171 | that where English's grammatical category of number necessitates
|
|---|
| 172 | only the two permutations of the first sentence based on "directory
|
|---|
| 173 | [singular]" and "directories [plural]", Arabic has three -- and,
|
|---|
| 174 | worse, in the second sentence ("Your query matched %g file in %g
|
|---|
| 175 | directory."), where English has four, Arabic has nine. You sense
|
|---|
| 176 | an unwelcome, exponential trend taking shape.
|
|---|
| 177 |
|
|---|
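The arithmetic behind that trend can be sketched in a few lines. The function names here are hypothetical, and the three-way split follows only the translator's description above (one, two, more than two); treating every other count, including 0, as "plural" is an assumption of this sketch, not a claim about Arabic grammar.

```perl
use strict;
use warnings;

# Hypothetical classifier following the three-term category described
# above. Counts other than 1 and 2 (including 0) fall through to
# 'plural', which is an assumption of this sketch.
sub arabic_number_category {
    my ($count) = @_;
    return 'singular' if $count == 1;
    return 'dual'     if $count == 2;
    return 'plural';
}

# How many separate whole-phrase translations are needed when each of
# $slots numeric slots can fall into any of $categories categories.
sub variant_count {
    my ($categories, $slots) = @_;
    return $categories ** $slots;
}

printf "%d -> %s\n", $_, arabic_number_category($_) for 1, 2, 3;
printf "English, two slots: %d variants\n", variant_count(2, 2);
printf "Arabic, two slots:  %d variants\n", variant_count(3, 2);
```

With two categories and two slots this gives the four English phrases; with three categories it gives the nine Arabic ones.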
| 178 | Your Italian translator emails you back and says that "I scanned 0
|
|---|
| 179 | directories" (a possible English output of your program) is stilted,
|
|---|
| 180 | and if you think that's fine English, that's your problem, but that
|
|---|
| 181 | I<just will not do> in the language of Dante. He insists that where
|
|---|
| 182 | $directory_count is 0, your program should produce the Italian text
|
|---|
| 183 | for "I I<didn't> scan I<any> directories.". And ditto for "I didn't
|
|---|
| 184 | match any files in any directories", although he says the last part
|
|---|
| 185 | about "in any directories" should probably just be left off.
|
|---|
| 186 |
|
|---|
| 187 | You wonder how you'll get gettext to handle this; to accommodate the
|
|---|
| 188 | ways Arabic, Chinese, and Italian deal with numbers in just these few
|
|---|
| 189 | very simple phrases, you need to write code that will ask gettext for
|
|---|
| 190 | different queries depending on whether the numerical values in
|
|---|
| 191 | question are 1, 2, more than 2, or in some cases 0, and you still haven't
|
|---|
| 192 | figured out the problem with the different word order in Chinese.
|
|---|
| 193 |
|
|---|
| 194 | Then your Russian translator calls on the phone, to I<personally> tell
|
|---|
| 195 | you the bad news about how really unpleasant your life is about to
|
|---|
| 196 | become:
|
|---|
| 197 |
|
|---|
| 198 | Russian, like German or Latin, is an inflectional language; that is, nouns
|
|---|
| 199 | and adjectives have to take endings that depend on their case
|
|---|
| 200 | (i.e., nominative, accusative, genitive, etc...) -- which is roughly a matter of
|
|---|
| 201 | what role they have in the syntax of the sentence --
|
|---|
| 202 | as well as on the grammatical gender (i.e., masculine, feminine, neuter)
|
|---|
| 203 | and number (i.e., singular or plural) of the noun, as well as on the
|
|---|
| 204 | declension class of the noun. But unlike with most other inflected languages,
|
|---|
| 205 | putting a number-phrase (like "ten" or "forty-three", or their Arabic
|
|---|
| 206 | numeral equivalents) in front of a noun in Russian can change the case and
|
|---|
| 207 | number that noun is, and therefore the endings you have to put on it.
|
|---|
| 208 |
|
|---|
| 209 | He elaborates: In "I scanned %g directories", you'd I<expect>
|
|---|
| 210 | "directories" to be in the accusative case (since it is the direct
|
|---|
| 211 | object in the sentence) and the plural number,
|
|---|
| 212 | except where $directory_count is 1, then you'd expect the singular, of
|
|---|
| 213 | course. Just like Latin or German. I<But!> Where $directory_count %
|
|---|
| 214 | 10 is 1 ("%" for modulo, remember), assuming $directory_count is an
|
|---|
| 215 | integer, and except where $directory_count % 100 is 11, "directories"
|
|---|
| 216 | is forced to become grammatically singular, which means it gets the
|
|---|
| 217 | ending for the accusative singular... You begin to visualize the code
|
|---|
| 218 | it'd take to test for the problem so far, I<and still work for Chinese
|
|---|
| 219 | and Arabic and Italian>, and how many gettext items that'd take, but
|
|---|
| 220 | he keeps going... But where $directory_count % 10 is 2, 3, or 4
|
|---|
| 221 | (except where $directory_count % 100 is 12, 13, or 14), the word for
|
|---|
| 222 | "directories" is forced to be genitive singular -- which means another
|
|---|
| 223 | ending... The room begins to spin around you, slowly at first... But
|
|---|
| 224 | with I<all other> integer values, since "directory" is an inanimate
|
|---|
| 225 | noun, when preceded by a number and in the nominative or accusative
|
|---|
| 226 | cases (as it is here, just your luck!), it does stay plural, but it is
|
|---|
| 227 | forced into the genitive case -- yet another ending... And
|
|---|
| 228 | you never hear him get to the part about how you're going to run into
|
|---|
| 229 | similar (but maybe subtly different) problems with other Slavic
|
|---|
| 230 | languages like Polish, because the floor comes up to meet you, and you
|
|---|
| 231 | fade into unconsciousness.
|
|---|
| 232 |
|
|---|
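Written down, the rules the Russian translator recites come to a short classification function. This is a minimal sketch covering only what the story states: an inanimate noun in an accusative context, counted by a non-negative integer. The function name and the returned labels are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: which form the counted noun takes, following only
# the modulo rules recited in the story above.
sub russian_counted_form {
    my ($count) = @_;
    my $last_digit = $count % 10;
    my $last_two   = $count % 100;

    # 1, 21, 31, ... but not 11, 111, ...
    return 'accusative singular'
        if $last_digit == 1 && $last_two != 11;

    # 2-4, 22-24, ... but not 12-14, 112-114, ...
    return 'genitive singular'
        if $last_digit >= 2 && $last_digit <= 4
        && !( $last_two >= 12 && $last_two <= 14 );

    # Every other integer value.
    return 'genitive plural';
}

printf "%3d -> %s\n", $_, russian_counted_form($_)
    for 1, 2, 5, 11, 12, 21, 22, 101, 112;
```

The test itself is short; the difficulty in the story is that it has no natural home in code that picks among fixed gettext strings.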
| 233 |
|
|---|
| 234 | The above cautionary tale relates how an attempt at localization can
|
|---|
| 235 | lead from programmer consternation, to program obfuscation, to a need
|
|---|
| 236 | for sedation. But careful evaluation shows that your choice of tools
|
|---|
| 237 | merely needed further consideration.
|
|---|
| 238 |
|
|---|
| 239 | =head2 The Linguistic View
|
|---|
| 240 |
|
|---|
| 241 | =over
|
|---|
| 242 |
|
|---|
| 243 | "It is more complicated than you think."
|
|---|
| 244 |
|
|---|
| 245 | -- The Eighth Networking Truth, from RFC 1925
|
|---|
| 246 |
|
|---|
| 247 | =back
|
|---|
| 248 |
|
|---|
| 249 | The field of Linguistics has expended a great deal of effort over the
|
|---|
| 250 | past century trying to find grammatical patterns which hold across
|
|---|
| 251 | languages; it's been a constant process
|
|---|
| 252 | of people making generalizations that should apply to all languages,
|
|---|
| 253 | only to find out that, all too often, these generalizations fail --
|
|---|
| 254 | sometimes failing for just a few languages, sometimes whole classes of
|
|---|
| 255 | languages, and sometimes nearly every language in the world except
|
|---|
| 256 | English. Broad statistical trends are evident in what the "average
|
|---|
| 257 | language" is like as far as what its rules can look like, must look
|
|---|
| 258 | like, and cannot look like. But the "average language" is just as
|
|---|
| 259 | unreal a concept as the "average person" -- it runs up against the
|
|---|
| 260 | fact that no language (or person) is, in fact, average. The wisdom of past
|
|---|
| 261 | experience leads us to believe that any given language can do whatever
|
|---|
| 262 | it wants, in any order, with appeal to any kind of grammatical
|
|---|
| 263 | categories it wants -- case, number, tense, real or metaphoric
|
|---|
| 264 | characteristics of the things that words refer to, arbitrary or
|
|---|
| 265 | predictable classifications of words based on what endings or prefixes
|
|---|
| 266 | they can take, degree or means of certainty about the truth of
|
|---|
| 267 | statements expressed, and so on, ad infinitum.
|
|---|
| 268 |
|
|---|
| 269 | Mercifully, most localization tasks are a matter of finding ways to
|
|---|
| 270 | translate whole phrases, generally sentences, where the context is
|
|---|
| 271 | relatively set, and where the only variation in content is I<usually>
|
|---|
| 272 | in a number being expressed -- as in the example sentences above.
|
|---|
| 273 | Translating specific, fully-formed sentences is, in practice, fairly
|
|---|
| 274 | foolproof -- which is good, because that's what's in the phrasebooks
|
|---|
| 275 | that so many tourists rely on. Now, a given phrase (whether in a
|
|---|
| 276 | phrasebook or in a gettext lexicon) in one language I<might> have a
|
|---|
| 277 | greater or lesser applicability than that phrase's translation into
|
|---|
| 278 | another language -- for example, strictly speaking, in Arabic, the
|
|---|
| 279 | "your" in "Your query matched..." would take a different form
|
|---|
| 280 | depending on whether the user is male or female; so the Arabic
|
|---|
| 281 | translation "your[feminine] query" is applicable in fewer cases than
|
|---|
| 282 | the corresponding English phrase, which doesn't distinguish the user's
|
|---|
| 283 | gender. (In practice, it's not feasible to have a program know the
|
|---|
| 284 | user's gender, so the masculine "you" in Arabic is usually used, by
|
|---|
| 285 | default.)
|
|---|
| 286 |
|
|---|
| 287 | But in general, such surprises are rare when entire sentences are
|
|---|
| 288 | being translated, especially when the functional context is restricted
|
|---|
| 289 | to that of a computer interacting with a user either to convey a fact
|
|---|
| 290 | or to prompt for a piece of information. So, for purposes of
|
|---|
| 291 | localization, translation by phrase (generally by sentence) is both the
|
|---|
| 292 | simplest and the least problematic.
|
|---|
| 293 |
|
|---|
| 294 | =head2 Breaking gettext
|
|---|
| 295 |
|
|---|
| 296 | =over
|
|---|
| 297 |
|
|---|
| 298 | "It Has To Work."
|
|---|
| 299 |
|
|---|
| 300 | -- First Networking Truth, RFC 1925
|
|---|
| 301 |
|
|---|
| 302 | =back
|
|---|
| 303 |
|
|---|
| 304 | Consider that sentences in a tourist phrasebook are of two types: ones
|
|---|
| 305 | like "How do I get to the marketplace?" that don't have any blanks to
|
|---|
| 306 | fill in, and ones like "How much do these ___ cost?", where there's
|
|---|
| 307 | one or more blanks to fill in (and these are usually linked to a
|
|---|
| 308 | list of words that you can put in that blank: "fish", "potatoes",
|
|---|
| 309 | "tomatoes", etc.) The ones with no blanks are no problem, but the
|
|---|
| 310 | fill-in-the-blank ones may not be really straightforward. If it's a
|
|---|
| 311 | Swahili phrasebook, for example, the authors probably didn't bother to
|
|---|
| 312 | tell you the complicated ways that the verb "cost" changes its
|
|---|
| 313 | inflectional prefix depending on the noun you're putting in the blank.
|
|---|
| 314 | The trader in the marketplace will still understand what you're saying if
|
|---|
| 315 | you say "how much do these potatoes cost?" with the wrong
|
|---|
| 316 | inflectional prefix on "cost". After all, I<you> can't speak proper Swahili,
|
|---|
| 317 | I<you're> just a tourist. But while tourists can be stupid, computers
|
|---|
| 318 | are supposed to be smart; the computer should be able to fill in the
|
|---|
| 319 | blank, and still have the results be grammatical.
|
|---|
| 320 |
|
|---|
| 321 | In other words, a phrasebook entry takes some values as parameters
|
|---|
| 322 | (the things that you fill in the blank or blanks), and provides a value
|
|---|
| 323 | based on these parameters, where the way you get that final value from
|
|---|
| 324 | the given values can, properly speaking, involve an arbitrarily
|
|---|
| 325 | complex series of operations. (In the case of Chinese, it'd be not at
|
|---|
| 326 | all complex, at least in cases like the examples at the beginning of
|
|---|
| 327 | this article; whereas in the case of Russian it'd be a rather complex
|
|---|
| 328 | series of operations. And in some languages, the
|
|---|
| 329 | complexity could be spread around differently: while the act of
|
|---|
| 330 | putting a number-expression in front of a noun phrase might not be
|
|---|
| 331 | complex by itself, it may change how you have to, for example, inflect
|
|---|
| 332 | a verb elsewhere in the sentence. This is what in syntax is called
|
|---|
| 333 | "long-distance dependencies".)
|
|---|
| 334 |
|
|---|
| 335 | This talk of parameters and arbitrary complexity is just another way
|
|---|
| 336 | to say that an entry in a phrasebook is what in a programming language
|
|---|
| 337 | would be called a "function". Just so you don't miss it, this is the
|
|---|
| 338 | crux of this article: I<A phrase is a function; a phrasebook is a
|
|---|
| 339 | bunch of functions.>
|
|---|
| 340 |
|
|---|
| 341 | The reason that using gettext runs into walls (as in the above
|
|---|
| 342 | second-person horror story) is that you're trying to use a string (or
|
|---|
| 343 | worse, a choice among a bunch of strings) to do what you really need a
|
|---|
| 344 | function for -- which is futile. Performing (s)printf interpolation
|
|---|
| 345 | on the strings which you get back from gettext does allow you to do I<some>
|
|---|
| 346 | common things passably well... sometimes... sort of; but, to paraphrase
|
|---|
| 347 | what some people say about C<csh> script programming, "it fools you
|
|---|
| 348 | into thinking you can use it for real things, but you can't, and you
|
|---|
| 349 | don't discover this until you've already spent too much time trying,
|
|---|
| 350 | and by then it's too late."
|
|---|
| 351 |
|
|---|
| 352 | =head2 Replacing gettext
|
|---|
| 353 |
|
|---|
| 354 | So, what needs to replace gettext is a system that supports lexicons
|
|---|
| 355 | of functions instead of lexicons of strings. An entry in a lexicon
|
|---|
| 356 | from such a system should I<not> look like this:
|
|---|
| 357 |
|
|---|
| 358 | "J'ai trouv\xE9 %g fichiers dans %g r\xE9pertoires"
|
|---|
| 359 |
|
|---|
| 360 | [\xE9 is e-acute in Latin-1. Some pod renderers would
|
|---|
| 361 | scream if I used the actual character here. -- SB]
|
|---|
| 362 |
|
|---|
| 363 | but instead like this, bearing in mind that this is just a first stab:
|
|---|
| 364 |
|
|---|
| 365 | sub I_found_X1_files_in_X2_directories {
|
|---|
| 366 | my( $files, $dirs ) = @_[0,1];
|
|---|
| 367 | $files = sprintf("%g %s", $files,
|
|---|
| 368 | $files == 1 ? 'fichier' : 'fichiers');
|
|---|
| 369 | $dirs = sprintf("%g %s", $dirs,
|
|---|
| 370 | $dirs == 1 ? "r\xE9pertoire" : "r\xE9pertoires");
|
|---|
| 371 | return "J'ai trouv\xE9 $files dans $dirs.";
|
|---|
| 372 | }
|
|---|
| 373 |
|
|---|
| 374 | Now, there's no particularly obvious way to store anything but strings
|
|---|
| 375 | in a gettext lexicon; so it looks like we just have to start over and
|
|---|
| 376 | make something better, from scratch. I call my shot at a
|
|---|
| 377 | gettext-replacement system "Maketext", or, in CPAN terms,
|
|---|
| 378 | Locale::Maketext.
|
|---|
| 379 |
|
|---|
| 380 | When designing Maketext, I chose to plan its main features in terms of
|
|---|
| 381 | "buzzword compliance". And here are the buzzwords:
|
|---|
| 382 |
|
|---|
| 383 | =head2 Buzzwords: Abstraction and Encapsulation
|
|---|
| 384 |
|
|---|
| 385 | The complexity of the language you're trying to output a phrase in is
|
|---|
| 386 | entirely abstracted inside (and encapsulated within) the Maketext module
|
|---|
| 387 | for that interface. When you call:
|
|---|
| 388 |
|
|---|
| 389 | print $lang->maketext("You have [quant,_1,piece] of new mail.",
|
|---|
| 390 | scalar(@messages));
|
|---|
| 391 |
|
|---|
| 392 | you don't know (and in fact can't easily find out) whether this will
|
|---|
| 393 | involve lots of figuring, as in Russian (if $lang is a handle to the
|
|---|
| 394 | Russian module), or relatively little, as in Chinese. That kind of
|
|---|
| 395 | abstraction and encapsulation may encourage other pleasant buzzwords
|
|---|
| 396 | like modularization and stratification, depending on what design
|
|---|
| 397 | decisions you make.
|
|---|
| 398 |
|
|---|
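For readers who want to run the call above, here is a minimal sketch against the Locale::Maketext module as it ships with Perl today. The class names C<MyApp::L10N> and C<MyApp::L10N::en> and the helper C<mail_notice> are hypothetical, and the English class relies on C<_AUTO> so that no lexicon entries need to be written; a real project would normally keep each language class in its own file.

```perl
use strict;
use warnings;

# Hypothetical project base class; every language class derives from it.
{
    package MyApp::L10N;
    use parent 'Locale::Maketext';
}

# English language class. _AUTO => 1 makes any key that is missing from
# the lexicon act as its own translation.
{
    package MyApp::L10N::en;
    use parent -norequire, 'MyApp::L10N';
    our %Lexicon = ( _AUTO => 1 );
}

package main;

# The caller passes only a count; how the noun is inflected is decided
# inside whichever language class the handle belongs to.
sub mail_notice {
    my ($lang, $message_count) = @_;
    return $lang->maketext("You have [quant,_1,piece] of new mail.",
                           $message_count);
}

my $lang = MyApp::L10N->get_handle('en')
    or die "No language handle available";

print mail_notice($lang, $_), "\n" for 1, 3;
```

A language with more involved rules would override C<quant> (or the methods it calls) in its own class, and this calling code would not change.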
| 399 | =head2 Buzzword: Isomorphism
|
|---|
| 400 |
|
|---|
| 401 | "Isomorphism" means "having the same structure or form"; in discussions
|
|---|
| 402 | of program design, the word takes on the special, specific meaning that
|
|---|
| 403 | your implementation of a solution to a problem I<has the same
|
|---|
| 404 | structure> as, say, an informal verbal description of the solution, or
|
|---|
| 405 | maybe of the problem itself. Isomorphism is, all things considered,
|
|---|
| 406 | a good thing -- it's what problem-solving (and solution-implementing)
|
|---|
| 407 | should look like.
|
|---|
| 408 |
|
|---|
| 409 | What's wrong with gettext-using code like this...
|
|---|
| 410 |
|
|---|
| 411 | printf( $file_count == 1 ?
|
|---|
| 412 | ( $directory_count == 1 ?
|
|---|
| 413 | "Your query matched %g file in %g directory." :
|
|---|
| 414 | "Your query matched %g file in %g directories." ) :
|
|---|
| 415 | ( $directory_count == 1 ?
|
|---|
| 416 | "Your query matched %g files in %g directory." :
|
|---|
| 417 | "Your query matched %g files in %g directories." ),
|
|---|
| 418 | $file_count, $directory_count,
|
|---|
| 419 | );
|
|---|
| 420 |
|
|---|
| 421 | is first off that it's not well abstracted -- these ways of testing
|
|---|
| 422 | for grammatical number (as in the expressions like C<foo == 1 ?
|
|---|
| 423 | singular_form : plural_form>) should be abstracted to each language
|
|---|
| 424 | module, since how you get grammatical number is language-specific.
|
|---|
| 425 |
|
|---|
| 426 | But second off, it's not isomorphic -- the "solution" (i.e., the
|
|---|
| 427 | phrasebook entries) for Chinese maps from these four English phrases to
|
|---|
| 428 | the one Chinese phrase that fits for all of them. In other words, the
|
|---|
| 429 | informal solution would be "The way to say what you want in Chinese is
|
|---|
| 430 | with the one phrase 'For your question, in Y directories you would
|
|---|
| 431 | find X files'" -- and so the implemented solution should be,
|
|---|
| 432 | isomorphically, just a straightforward way to spit out that one
|
|---|
| 433 | phrase, with numerals properly interpolated. It shouldn't have to map
|
|---|
| 434 | from the complexity of other languages to the simplicity of this one.
|
|---|
| 435 |
|
|---|
| 436 | =head2 Buzzword: Inheritance
|
|---|
| 437 |
|
|---|
| 438 | There's a great deal of reuse possible for sharing of phrases between
|
|---|
| 439 | modules for related dialects, or for sharing of auxiliary functions
|
|---|
| 440 | between related languages. (By "auxiliary functions", I mean
|
|---|
| 441 | functions that don't produce phrase-text, but which, say, return an
|
|---|
| 442 | answer to "does this number require a plural noun after it?". Such
|
|---|
| 443 | auxiliary functions would be used in the internal logic of functions
|
|---|
| 444 | that actually do produce phrase-text.)
|
|---|
| 445 |
|
|---|
| 446 | In the case of sharing phrases, consider that you have an interface
|
|---|
| 447 | already localized for American English (probably by having been
|
|---|
| 448 | written with that as the native locale, but that's incidental).
|
|---|
| 449 | Localizing it for UK English should, in practical terms, be just a
|
|---|
| 450 | matter of running it past a British person with the instructions to
|
|---|
| 451 | indicate what few phrases would benefit from a change in spelling or
|
|---|
| 452 | possibly minor rewording. In that case, you should be able to put in
|
|---|
| 453 | the UK English localization module I<only> those phrases that are
|
|---|
| 454 | UK-specific, and for all the rest, I<inherit> from the American
|
|---|
| 455 | English module. (And I expect this same situation would apply with
|
|---|
| 456 | Brazilian and Continental Portuguese, possibly with some I<very>
|
|---|
| 457 | closely related languages like Czech and Slovak, and possibly with the
|
|---|
| 458 | slightly different "versions" of written Mandarin Chinese, as I hear exist in
|
|---|
| 459 | Taiwan and mainland China.)
|
|---|
| 460 |
|
|---|
| 461 | As to sharing of auxiliary functions, consider the problem of Russian
|
|---|
| 462 | numbers from the beginning of this article; obviously, you'd want to
|
|---|
| 463 | write only once the hairy code that, given a numeric value, would
|
|---|
| 464 | return some specification of which case and number a given quantified
|
|---|
| 465 | noun should use. But suppose that you discover, while localizing an
|
|---|
| 466 | interface for, say, Ukrainian (a Slavic language related to Russian,
|
|---|
| 467 | spoken by several million people, many of whom would be relieved to
|
|---|
| 468 | find that your Web site's or software's interface is available in
|
|---|
| 469 | their language), that the rules in Ukrainian are the same as in Russian
|
|---|
| 470 | for quantification, and probably for many other grammatical functions.
|
|---|
| 471 | While there may well be no phrases in common between Russian and
|
|---|
| 472 | Ukrainian, you could still choose to have the Ukrainian module inherit
|
|---|
| 473 | from the Russian module, just for the sake of inheriting all the
|
|---|
| 474 | various grammatical methods. Or, probably better organizationally,
|
|---|
| 475 | you could move those functions to a module called C<_E_Slavic> or
|
|---|
| 476 | something, which Russian and Ukrainian could inherit useful functions
|
|---|
| 477 | from, but which would (presumably) provide no lexicon.
|
|---|
| 478 |
|
|---|
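Both kinds of sharing described in this section can be sketched with ordinary Perl inheritance. Everything named below (the C<Demo::*> packages, C<lookup_phrase>, C<needs_singular>) is hypothetical and only illustrates the idea; it is not Maketext's actual lookup code. The shared grammatical rule is a deliberately tiny stand-in, and giving Russian and Ukrainian the same rule follows the supposition made above.

```perl
use strict;
use warnings;
use mro;

# Phrase sharing: the UK lexicon lists only what differs from the US one.
{
    package Demo::en_us;
    our %Lexicon = (
        color_prompt => 'Pick a color:',
        greeting     => 'Hello!',
    );
}
{
    package Demo::en_gb;
    our @ISA     = ('Demo::en_us');
    our %Lexicon = ( color_prompt => 'Pick a colour:' );
}

# Auxiliary-function sharing: a grammar-only class with no lexicon.
{
    package Demo::E_Slavic;
    sub needs_singular {
        my ($class, $count) = @_;
        return ( $count % 10 == 1 && $count % 100 != 11 ) ? 1 : 0;
    }
}
{ package Demo::ru; our @ISA = ('Demo::E_Slavic'); }
{ package Demo::uk; our @ISA = ('Demo::E_Slavic'); }

package main;

# Walk the class and its ancestors; the first lexicon with the key wins.
sub lookup_phrase {
    my ($class, $key) = @_;
    no strict 'refs';
    for my $candidate ( @{ mro::get_linear_isa($class) } ) {
        my $lexicon = \%{"${candidate}::Lexicon"};
        return $lexicon->{$key} if exists $lexicon->{$key};
    }
    return;
}

print lookup_phrase('Demo::en_gb', 'color_prompt'), "\n";  # UK override
print lookup_phrase('Demo::en_gb', 'greeting'), "\n";      # inherited
print Demo::uk->needs_singular(21), "\n";                  # inherited method
```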
| 479 | =head2 Buzzword: Concision
|
|---|
| 480 |
|
|---|
| 481 | Okay, concision isn't a buzzword. But it should be, so I decree that
|
|---|
| 482 | as a new buzzword, "concision" means that simple common things should
|
|---|
| 483 | be expressible in very few lines (or maybe even just a few characters)
|
|---|
| 484 | of code -- call it a special case of "making simple things easy and
|
|---|
| 485 | hard things possible", and see also the role it played in the
|
|---|
| 486 | MIDI::Simple language, discussed elsewhere in this issue [TPJ#13].
|
|---|
| 487 |
|
|---|
| 488 | Consider our first stab at an entry in our "phrasebook of functions":
|
|---|
| 489 |
|
|---|
| 490 | sub I_found_X1_files_in_X2_directories {
|
|---|
| 491 | my( $files, $dirs ) = @_[0,1];
|
|---|
| 492 | $files = sprintf("%g %s", $files,
|
|---|
| 493 | $files == 1 ? 'fichier' : 'fichiers');
|
|---|
| 494 | $dirs = sprintf("%g %s", $dirs,
|
|---|
| 495 | $dirs == 1 ? "r\xE9pertoire" : "r\xE9pertoires");
|
|---|
| 496 | return "J'ai trouv\xE9 $files dans $dirs.";
|
|---|
| 497 | }
|
|---|
| 498 |
|
|---|
| 499 | You may sense that a lexicon (to use a non-committal catch-all term for a
|
|---|
| 500 | collection of things you know how to say, regardless of whether they're
|
|---|
| 501 | phrases or words) consisting of functions I<expressed> as above would
|
|---|
| 502 | make for rather long-winded and repetitive code -- even if you wisely
|
|---|
| 503 | rewrote this to have quantification (as we call adding a number
|
|---|
| 504 | expression to a noun phrase) be a function called like:
|
|---|
| 505 |
|
|---|
| 506 | sub I_found_X1_files_in_X2_directories {
|
|---|
| 507 | my( $files, $dirs ) = @_[0,1];
|
|---|
| 508 | $files = quant($files, "fichier");
|
|---|
| 509 | $dirs = quant($dirs, "r\xE9pertoire");
|
|---|
| 510 | return "J'ai trouv\xE9 $files dans $dirs.";
|
|---|
| 511 | }
|
|---|
| 512 |
|
|---|
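The rewritten entry above calls C<quant> without defining it. A minimal sketch of such a helper, assuming the same naive rule as the first stab (a count of exactly 1 is singular, and the plural is formed by appending "s"), could look like this; a real language module would need its own, more careful rule.

```perl
use strict;
use warnings;

# Hypothetical quant(): join a count with a noun in the right number.
# Assumes 1 is singular and that the plural just adds "s".
sub quant {
    my ($count, $singular_noun) = @_;
    my $noun = $count == 1 ? $singular_noun : $singular_noun . 's';
    return sprintf('%g %s', $count, $noun);
}

# The phrase-function from the text, unchanged apart from layout.
sub I_found_X1_files_in_X2_directories {
    my ( $files, $dirs ) = @_[0,1];
    $files = quant($files, "fichier");
    $dirs  = quant($dirs,  "r\xE9pertoire");
    return "J'ai trouv\xE9 $files dans $dirs.";
}

print I_found_X1_files_in_X2_directories(10, 4), "\n";
```

Moving the singular/plural decision into C<quant> means the phrase-function itself only says where the two quantified nouns go.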
| 513 | And you may also sense that you do not want to bother your translators
|
|---|
| 514 | with having to write Perl code -- you'd much rather that they spend
|
|---|
| 515 | their I<very costly time> on just translation. And this is to say
|
|---|
| 516 | nothing of the near impossibility of finding a commercial translator
|
|---|
| 517 | who would know even simple Perl.
|
|---|
| 518 |
|
|---|
| 519 | In a first-hack implementation of Maketext, each language-module's
|
|---|
| 520 | lexicon looked like this:
|
|---|
| 521 |
|
|---|
| 522 | %Lexicon = (
|
|---|
| 523 | "I found %g files in %g directories"
|
|---|
| 524 | => sub {
|
|---|
| 525 | my( $files, $dirs ) = @_[0,1];
|
|---|
| 526 | $files = quant($files, "fichier");
|
|---|
| 527 | $dirs = quant($dirs, "r\xE9pertoire");
|
|---|
| 528 | return "J'ai trouv\xE9 $files dans $dirs.";
|
|---|
| 529 | },
|
|---|
| 530 | ... and so on with other phrase => sub mappings ...
|
|---|
| 531 | );
|
|---|
| 532 |
|
|---|
| 533 | but I immediately went looking for some more concise way to basically
|
|---|
| 534 | denote the same phrase-function -- a way that would also serve to
|
|---|
| 535 | concisely denote I<most> phrase-functions in the lexicon for I<most>
|
|---|
| 536 | languages. After much time and even some actual thought, I decided on
|
|---|
| 537 | this system:
|
|---|
| 538 |
|
|---|
| 539 | * Where a value in a %Lexicon hash is a contentful string instead of
|
|---|
| 540 | an anonymous sub (or, conceivably, a coderef), it would be interpreted
|
|---|
| 541 | as a sort of shorthand expression of what the sub does. When accessed
|
|---|
| 542 | for the first time in a session, it is parsed, turned into Perl code,
|
|---|
| 543 | and then eval'd into an anonymous sub; then that sub replaces the
|
|---|
| 544 | original string in that lexicon. (That way, the work of parsing and
|
|---|
| 545 | evaling the shorthand form for a given phrase is done no more than
|
|---|
| 546 | once per session.)
|
|---|
| 547 |
|
|---|
| 548 | * Calls to C<maketext> (as Maketext's main function is called) happen
|
|---|
| 549 | thru a "language session handle", notionally very much like an IO
|
|---|
| 550 | handle, in that you open one at the start of the session, and use it
|
|---|
| 551 | for "sending signals" to an object in order to have it return the text
|
|---|
| 552 | you want.
|
|---|
| 553 |
|
|---|
| 554 | So, this:
|
|---|
| 555 |
|
|---|
| 556 | $lang->maketext("You have [quant,_1,piece] of new mail.",
|
|---|
| 557 | scalar(@messages));
|
|---|
| 558 |
|
|---|
| 559 | basically means this: look in the lexicon for $lang (which may inherit
|
|---|
| 560 | from any number of other lexicons), and find the function that we
|
|---|
| 561 | happen to associate with the string "You have [quant,_1,piece] of new
|
|---|
| 562 | mail" (which is, and should be, a functioning "shorthand" for this
|
|---|
| 563 | function in the native locale -- English in this case). If you find
|
|---|
| 564 | such a function, call it with $lang as its first parameter (as if it
|
|---|
| 565 | were a method), and then a copy of scalar(@messages) as its second,
|
|---|
| 566 | and then return that value. If that function was found, but was in
|
|---|
| 567 | string shorthand instead of being a fully specified function, parse it
|
|---|
| 568 | and make it into a function before calling it the first time.
|
|---|
| 569 |
|
|---|
| 570 | * The shorthand uses code in brackets to indicate method calls that
|
|---|
| 571 | should be performed. A full explanation is not in order here, but a
|
|---|
| 572 | few examples will suffice:
|
|---|
| 573 |
|
|---|
| 574 | "You have [quant,_1,piece] of new mail."
|
|---|
| 575 |
|
|---|
| 576 | The above code is shorthand for, and will be interpreted as,
|
|---|
| 577 | this:
|
|---|
| 578 |
|
|---|
| 579 | sub {
|
|---|
| 580 | my $handle = $_[0];
|
|---|
| 581 | my(@params) = @_;
|
|---|
| 582 | return join '',
|
|---|
| 583 | "You have ",
|
|---|
| 584 | $handle->quant($params[1], 'piece'),
|
|---|
| 585 | " of new mail.";
|
|---|
| 586 | }
|
|---|
| 587 |
|
|---|
| 588 | where "quant" is the name of a method you're using to quantify the
|
|---|
| 589 | noun "piece" with the number $params[1].
|
|---|
| 590 |
|
|---|
| 591 | A string with no brackety calls, like this:
|
|---|
| 592 |
|
|---|
| 593 | "Your search expression was malformed."
|
|---|
| 594 |
|
|---|
| 595 | is something of a degenerate case, and just gets turned into:
|
|---|
| 596 |
|
|---|
| 597 | sub { return "Your search expression was malformed." }
|
|---|
| 598 |
|
|---|
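The parse-once-then-cache behavior described in the points above can be sketched in a few lines. This is a toy under stated assumptions, not Maketext's actual code: the package name Toy::Handle, its C<compile> function, and its English-only C<quant> are all hypothetical, it builds closures instead of generating and eval'ing Perl source, and it has no way to escape brackets or commas.

```perl
use strict;
use warnings;

package Toy::Handle;    # hypothetical stand-in for a language handle class

sub new { return bless {}, shift }

# Simplified English quantifier: "1 piece", "3 pieces".
sub quant {
    my( $handle, $num, $noun ) = @_;
    return "$num $noun" . ( $num == 1 ? '' : 's' );
}

our %Lexicon = (
    "You have [quant,_1,piece] of new mail." =>
        "You have [quant,_1,piece] of new mail.",
    "Your search expression was malformed." =>
        "Your search expression was malformed.",
);

# Turn a shorthand string into an anonymous sub.
sub compile {
    my( $text ) = @_;
    my @parts;
    # Split into literal text and [method,arg,...] groups.
    for my $chunk ( split /(\[[^\]]*\])/, $text ) {
        next unless length $chunk;
        if ( $chunk =~ /^\[(.*)\]$/ ) {
            my( $method, @args ) = split /,/, $1;
            push @parts, sub {
                my( $handle, @params ) = @_;
                my @real;
                for my $arg ( @args ) {
                    # _N stands for the Nth parameter given to maketext
                    push @real, $arg =~ /^_(\d+)$/ ? $params[ $1 - 1 ] : $arg;
                }
                return $handle->$method( @real );
            };
        }
        else {
            my $literal = $chunk;
            push @parts, sub { return $literal };
        }
    }
    return sub { my @in = @_; return join '', map { $_->( @in ) } @parts };
}

sub maketext {
    my( $handle, $phrase, @params ) = @_;
    my $entry = $Lexicon{$phrase};
    die "No lexicon entry for '$phrase'" unless defined $entry;
    # Compile on first use; the sub then replaces the string in the lexicon.
    $entry = $Lexicon{$phrase} = compile( $entry ) unless ref $entry;
    return $entry->( $handle, @params );
}

package main;

my $lang = Toy::Handle->new;
print $lang->maketext( "You have [quant,_1,piece] of new mail.", 3 ), "\n";
print $lang->maketext( "Your search expression was malformed." ), "\n";
```

After the first call, the lexicon value for that phrase is a code reference, so later calls skip the parsing step entirely.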
| 599 | However, not everything you can write in Perl code can be written in
|
|---|
| 600 | the above shorthand system -- not by a long shot. For example, consider
|
|---|
| 601 | the Italian translator from the beginning of this article, who wanted
|
|---|
| 602 | the Italian for "I didn't find any files" as a special case, instead
|
|---|
| 603 | of "I found 0 files". That couldn't be specified (at least not easily
|
|---|
| 604 | or simply) in our shorthand system, and it would have to be written
|
|---|
| 605 | out in full, like this:
|
|---|
| 606 |
|
|---|
| 607 | sub { # pretend the English strings are in Italian
|
|---|
| 608 | my($handle, $files, $dirs) = @_[0,1,2];
|
|---|
| 609 | return "I didn't find any files" unless $files;
|
|---|
| 610 | return join '',
|
|---|
| 611 | "I found ",
|
|---|
| 612 | $handle->quant($files, 'file'),
|
|---|
| 613 | " in ",
|
|---|
| 614 | $handle->quant($dirs, 'directory'),
|
|---|
| 615 | ".";
|
|---|
| 616 | }
|
|---|
| 617 |
|
|---|
| 618 | Next to a lexicon full of shorthand code, that sort of sticks out like a
|
|---|
| 619 | sore thumb -- but this I<is> a special case, after all; and at least
|
|---|
| 620 | it's possible, if not as concise as usual.
|
|---|
| 621 |
|
|---|
| 622 | As to how you'd implement the Russian example from the beginning of
|
|---|
| 623 | the article, well, There's More Than One Way To Do It, but it could be
|
|---|
| 624 | something like this (using English words for Russian, just so you know
|
|---|
| 625 | what's going on):
|
|---|
| 626 |
|
|---|
| 627 | "I [quant,_1,directory,accusative] scanned."
|
|---|
| 628 |
|
|---|
| 629 | This shifts the burden of complexity off to the quant method. That
|
|---|
| 630 | method's parameters are: the numeric value it's going to use to
|
|---|
| 631 | quantify something; the Russian word it's going to quantify; and the
|
|---|
| 632 | parameter "accusative", which you're using to mean that this
|
|---|
| 633 | sentence's syntax wants a noun in the accusative case there, although
|
|---|
| 634 | that quantification method may have to overrule, for grammatical
|
|---|
| 635 | reasons you may recall from the beginning of this article.
|
|---|
| 636 |
|
|---|
| 637 | Now, the Russian quant method here is responsible not only for
|
|---|
| 638 | implementing the strange logic necessary for figuring out how Russian
|
|---|
| 639 | number-phrases impose case and number on their noun-phrases, but also
|
|---|
| 640 | for inflecting the Russian word for "directory". How that inflection
|
|---|
| 641 | is to be carried out is no small issue, and among the solutions I've
|
|---|
| 642 | seen, some (like variations on a simple lookup in a hash where all
|
|---|
| 643 | possible forms are provided for all necessary words) are
|
|---|
| 644 | straightforward but I<can> become cumbersome when you need to inflect
|
|---|
| 645 | more than a few dozen words; and other solutions (like using
|
|---|
| 646 | algorithms to model the inflections, storing only root forms and
|
|---|
| 647 | irregularities) I<can> involve more overhead than is justifiable for
|
|---|
| 648 | all but the largest lexicons.
|
|---|
| 649 |
|
|---|
| 650 | Mercifully, this design decision becomes crucial only in the hairiest
|
|---|
| 651 | of inflected languages, of which Russian is by no means the I<worst> case
|
|---|
| 652 | scenario, but is worse than most. Most languages have simpler
|
|---|
| 653 | inflection systems; for example, in English or Swahili, there are
|
|---|
| 654 | generally no more than two possible inflected forms for a given noun
|
|---|
| 655 | ("error/errors"; "kosa/makosa"), and the
|
|---|
| 656 | rules for producing these forms are fairly simple -- or at least,
|
|---|
| 657 | simple rules can be formulated that work for most words, and you can
|
|---|
| 658 | then treat the exceptions as just "irregular", at least relative to
|
|---|
| 659 | your ad hoc rules. A simpler inflection system (simpler rules, fewer
|
|---|
| 660 | forms) means that design decisions are less crucial to maintaining
|
|---|
| 661 | sanity, whereas the same decisions could incur
|
|---|
| 662 | overhead-versus-scalability problems in languages like Russian. It
|
|---|
| 663 | may I<also> be likely that code (possibly in Perl, as with
|
|---|
| 664 | Lingua::EN::Inflect, for English nouns) has already
|
|---|
| 665 | been written for the language in question, whether simple or complex.
|
|---|
| 666 |
|
|---|
| 667 | Moreover, a third possibility may even be simpler than anything
|
|---|
| 668 | discussed above: "Just require that all possible (or at least
|
|---|
| 669 | applicable) forms be provided in the call to the given language's quant
|
|---|
| 670 | method, as in:"
|
|---|
| 671 |
|
|---|
| 672 | "I found [quant,_1,file,files]."
|
|---|
| 673 |
|
|---|
| 674 | That way, quant just has to choose which form it needs, without having
|
|---|
| 675 | to look up or generate anything. While possibly not optimal for
|
|---|
| 676 | Russian, this should work well for most other languages, where
|
|---|
| 677 | quantification is not as complicated an operation.
|
|---|
| 678 |
|
|---|
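The C<quant> method in the released Locale::Maketext accepts explicit forms in just this way, so the idea can be tried directly. In the sketch below the package names BogoQuery::Locale and BogoQuery::Locale::en are hypothetical and are kept in one file for brevity; a real project would normally put each language class in its own module file.

```perl
use strict;
use warnings;

# Hypothetical project base class; only the Locale::Maketext parent is real.
package BogoQuery::Locale;
use parent 'Locale::Maketext';

# Hypothetical English language class with a one-entry lexicon.
package BogoQuery::Locale::en;
use parent -norequire, 'BogoQuery::Locale';

our %Lexicon = (
    "I found [quant,_1,file,files]." => "I found [quant,_1,file,files].",
);

package main;

my $lang = BogoQuery::Locale->get_handle('en')
    or die "No language handle available";

print $lang->maketext( "I found [quant,_1,file,files].", 1 ), "\n";   # I found 1 file.
print $lang->maketext( "I found [quant,_1,file,files].", 7 ), "\n";   # I found 7 files.
```

Here the translator supplies both forms in the lexicon entry, and the method only has to pick one based on the number.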
| 679 | =head2 The Devil in the Details
|
|---|
| 680 |
|
|---|
| 681 | There's plenty more to Maketext than described above -- for example,
|
|---|
| 682 | there are the details of how language tags ("en-US", "i-pwn", "fi",
|
|---|
| 683 | etc.) or locale IDs ("en_US") interact with actual module naming
|
|---|
| 684 | ("BogoQuery/Locale/en_us.pm"), and what magic can ensue; there are the
|
|---|
| 685 | details of how to record (and possibly negotiate) what character
|
|---|
| 686 | encoding Maketext will return text in (UTF8? Latin-1? KOI8?). There's
|
|---|
| 687 | the interesting fact that Maketext is for localization, but nowhere
|
|---|
| 688 | actually has a "C<use locale;>" anywhere in it. For the curious,
|
|---|
| 689 | there are the somewhat frightening details of how I actually
|
|---|
| 690 | implement something like data inheritance so that searches across
|
|---|
| 691 | modules' %Lexicon hashes can parallel how Perl implements method
|
|---|
| 692 | inheritance.
|
|---|
| 693 |
|
|---|
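To make the idea of data inheritance concrete, here is one way a lexicon search could follow C<@ISA>. This is only a sketch and is not claimed to be how Maketext does it: the package names and the C<lookup> function are hypothetical, and the search is a plain depth-first walk.

```perl
use strict;
use warnings;

{
    package Toy::Locale;                      # hypothetical base lexicon
    our %Lexicon = ( Hello => 'Hello', Bye => 'Goodbye' );
}
{
    package Toy::Locale::en_gb;               # hypothetical derived lexicon
    our @ISA     = ('Toy::Locale');
    our %Lexicon = ( Bye => 'Cheerio' );
}

# Look for $phrase in $class's %Lexicon, then in each parent class in turn.
sub lookup {
    my( $class, $phrase ) = @_;
    no strict 'refs';                         # symbolic access to package variables
    my $lex = \%{ $class . '::Lexicon' };
    return $lex->{$phrase} if exists $lex->{$phrase};
    for my $parent ( @{ $class . '::ISA' } ) {
        my $found = lookup( $parent, $phrase );
        return $found if defined $found;
    }
    return;
}

print lookup( 'Toy::Locale::en_gb', 'Bye' ),   "\n";   # Cheerio
print lookup( 'Toy::Locale::en_gb', 'Hello' ), "\n";   # Hello
```

An entry in the derived class shadows the same key in its parent, just as an overriding method would.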
| 694 | And, most importantly, there are all the practical details of how to
|
|---|
| 695 | actually go about deriving from Maketext so you can use it for your
|
|---|
| 696 | interfaces, and the various tools and conventions for starting out and
|
|---|
| 697 | maintaining individual language modules.
|
|---|
| 698 |
|
|---|
| 699 | That is all covered in the documentation for Locale::Maketext and the
|
|---|
| 700 | modules that come with it, available in CPAN. After having read this
|
|---|
| 701 | article, which covers the why's of Maketext, the documentation,
|
|---|
| 702 | which covers the how's of it, should be quite straightforward.
|
|---|
| 703 |
|
|---|
| 704 | =head2 The Proof in the Pudding: Localizing Web Sites
|
|---|
| 705 |
|
|---|
| 706 | Maketext and gettext have a notable difference: gettext is in C,
|
|---|
| 707 | accessible thru C library calls, whereas Maketext is in Perl, and
|
|---|
| 708 | really can't work without a Perl interpreter (although I suppose
|
|---|
| 709 | something like it could be written for C). Accidents of history (and
|
|---|
| 710 | not necessarily lucky ones) have made C++ the most common language for
|
|---|
| 711 | the implementation of applications like word processors, Web browsers,
|
|---|
| 712 | and even many in-house applications like custom query systems. Current
|
|---|
| 713 | conditions make it somewhat unlikely that the next one of any of these
|
|---|
| 714 | kinds of applications will be written in Perl, albeit clearly more for
|
|---|
| 715 | reasons of custom and inertia than out of consideration of what is the
|
|---|
| 716 | right tool for the job.
|
|---|
| 717 |
|
|---|
| 718 | However, other accidents of history have made Perl a well-accepted
|
|---|
| 719 | language for design of server-side programs (generally in CGI form)
|
|---|
| 720 | for Web site interfaces. Localization of static pages in Web sites is
|
|---|
| 721 | trivial, feasible either with simple language-negotiation features in
|
|---|
| 722 | servers like Apache, or with some kind of server-side inclusions of
|
|---|
| 723 | language-appropriate text into layout templates. However, I think
|
|---|
| 724 | that the localization of Perl-based search systems (or other kinds of
|
|---|
| 725 | dynamic content) in Web sites, be they public or access-restricted,
|
|---|
| 726 | is where Maketext will see the greatest use.
|
|---|
| 727 |
|
|---|
| 728 | I presume that it would be only the exceptional Web site that gets
|
|---|
| 729 | localized for English I<and> Chinese I<and> Italian I<and> Arabic
|
|---|
| 730 | I<and> Russian, to recall the languages from the beginning of this
|
|---|
| 731 | article -- to say nothing of German, Spanish, French, Japanese,
|
|---|
| 732 | Finnish, and Hindi, to name a few languages that benefit from large
|
|---|
| 733 | numbers of programmers or Web viewers or both.
|
|---|
| 734 |
|
|---|
| 735 | However, the ever-increasing internationalization of the Web (whether
|
|---|
| 736 | measured in terms of amount of content, of numbers of content writers
|
|---|
| 737 | or programmers, or of size of content audiences) makes it increasingly
|
|---|
| 738 | likely that the interface to the average Web-based dynamic content
|
|---|
| 739 | service will be localized for two or maybe three languages. It is my
|
|---|
| 740 | hope that Maketext will make that task as simple as possible, and will
|
|---|
| 741 | remove previous barriers to localization for languages dissimilar to
|
|---|
| 742 | English.
|
|---|
| 743 |
|
|---|
| 744 | __END__
|
|---|
| 745 |
|
|---|
| 746 | Sean M. Burke (sburkeE<64>cpan.org) has a Master's in linguistics
|
|---|
| 747 | from Northwestern University; he specializes in language technology.
|
|---|
| 748 | Jordan Lachler (lachlerE<64>unm.edu) is a PhD student in the Department of
|
|---|
| 749 | Linguistics at the University of New Mexico; he specializes in
|
|---|
| 750 | morphology and pedagogy of North American native languages.
|
|---|
| 751 |
|
|---|
| 752 | =head2 References
|
|---|
| 753 |
|
|---|
| 754 | Alvestrand, Harald Tveit. 1995. I<RFC 1766: Tags for the
|
|---|
| 755 | Identification of Languages.>
|
|---|
| 756 | C<ftp://ftp.isi.edu/in-notes/rfc1766.txt>
|
|---|
| 757 | [Now see RFC 3066.]
|
|---|
| 758 |
|
|---|
| 759 | Callon, Ross, editor. 1996. I<RFC 1925: The Twelve
|
|---|
| 760 | Networking Truths.>
|
|---|
| 761 | C<ftp://ftp.isi.edu/in-notes/rfc1925.txt>
|
|---|
| 762 |
|
|---|
| 763 | Drepper, Ulrich, Peter Miller,
|
|---|
| 764 | and FranE<ccedil>ois Pinard. 1995-2001. GNU
|
|---|
| 765 | C<gettext>. Available in C<ftp://prep.ai.mit.edu/pub/gnu/>, with
|
|---|
| 766 | extensive docs in the distribution tarball. [Since
|
|---|
| 767 | I wrote this article in 1998, I now see that the
|
|---|
| 768 | gettext docs are now trying more to come to terms with
|
|---|
| 769 | plurality. Whether useful conclusions have come from it
|
|---|
| 770 | is another question altogether. -- SMB, May 2001]
|
|---|
| 771 |
|
|---|
| 772 | Forbes, Nevill. 1964. I<Russian Grammar.> Third Edition, revised
|
|---|
| 773 | by J. C. Dumbreck. Oxford University Press.
|
|---|
| 774 |
|
|---|
| 775 | =cut
|
|---|
| 776 |
|
|---|
| 777 | #End
|
|---|
| 778 |
|
|---|