htmlpurifier/docs/proposal-language.txt

We are going to model our I18N/L10N system on MediaWiki's. Theirs is quite
complicated, so we are going to simplify it for our needs.

== Structure ==

First, you have a Language object.  This object contains all the localizable
message strings, as well as other important language-specific settings and
custom behavior (uppercasing, lowercasing, printing dates, formatting
numbers, etc.).

The object is constructed from two sources: subclassed versions of itself
(classes) and Message files (messages).

== General use ==

You load a language object by calling the Language::factory() function.
This function loads the class file for the object (taking into account
fallback languages by using the fallback language's object but overriding
the language key) and returns that object. Nothing else happens.
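A minimal sketch of the factory step, assuming subclasses named
Language_<code> (hyphens mapped to underscores), a hypothetical $fallbacks
table, and a public $code property as the "language key". None of these
names come from MediaWiki; class_exists() stands in for loading the class
file (it would trigger an autoloader if one were registered).

```php
<?php
// Hypothetical sketch of Language::factory(). The Language_<code> naming
// scheme, the $fallbacks table and the $code property are assumptions.
class Language
{
    /** Language key this object answers to (may differ from its class). */
    public string $code = 'en';

    /** Assumed fallback table: code => code whose class should be reused. */
    private static array $fallbacks = array('en-gb' => 'en', 'de-at' => 'de');

    private static function className(string $code): string
    {
        // English is served by the base class itself.
        return $code === 'en' ? 'Language' : 'Language_' . str_replace('-', '_', $code);
    }

    public static function factory(string $code): Language
    {
        // Walk the fallback chain until a class exists. Unknown codes fall
        // back to English, whose class always exists, so the loop ends.
        $try = $code;
        while (!class_exists(self::className($try))) {
            $try = self::$fallbacks[$try] ?? 'en';
        }
        $class = self::className($try);
        $lang = new $class();
        // Reuse the fallback's object, but override the language key.
        $lang->code = $code;
        // Nothing else happens here: messages are loaded lazily later.
        return $lang;
    }
}

// A language that customizes behavior by subclassing.
class Language_de extends Language {}

$deAt = Language::factory('de-at');  // no Language_de_at, so Language_de is used
$fr   = Language::factory('fr');     // no Language_fr, so the base class is used
```

Keeping the requested code on a fallback-class object means later lookups
(messages files, caches) can still be keyed by the language actually asked for.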

When a message (or other localized data) is requested, a lazy-loading
initializer is called.  Now the real work starts.  We first take the scenario
in which the language is not cached.  The system loads the Messages file by:

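    // Runs the file, defining whichever localization variables it sets
    require( $filename );
    // Gathers those variables into one array keyed by variable name
    $cache = compact( self::$mLocalisationKeys );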

...where self::$mLocalisationKeys is an array of the names of the variables
that may be defined in the localization file. This lets you write things like:

    $fallback = false;
    $rtl = false;

...and easily gather them into a single array.
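The lazy initializer and the require/compact() trick can be sketched together
as follows. The LazyMessages class, its key list and the isLoaded() helper are
hypothetical names for illustration. Because require executes the file inside
the method, the variables it assigns become locals that compact() can collect.
Note that since PHP 7.3, compact() complains (a notice in 7.3, a warning in
PHP 8) about names that are not set, so a real loader would need to handle
files that omit some keys.

```php
<?php
// Minimal sketch of lazy loading: the messages file is not read until the
// first message is requested. Class and key names are assumptions.
class LazyMessages
{
    /** Names of variables a messages file may define (assumed list). */
    private static array $localisationKeys = array('fallback', 'rtl', 'messages');

    private string $filename;
    private ?array $cache = null;

    public function __construct(string $filename)
    {
        // Constructing the object does no I/O at all.
        $this->filename = $filename;
    }

    public function isLoaded(): bool
    {
        return $this->cache !== null;
    }

    public function getMessage(string $key): ?string
    {
        // First access triggers the real work; later calls reuse the cache.
        if ($this->cache === null) {
            $this->cache = self::loadFile($this->filename);
        }
        return $this->cache['messages'][$key] ?? null;
    }

    private static function loadFile(string $filename): array
    {
        // The file's assignments ($fallback, $rtl, $messages) land in this scope...
        require $filename;
        // ...and compact() turns them into array('fallback' => ..., ...).
        return compact(self::$localisationKeys);
    }
}

// Usage: write a tiny messages file to a temporary location.
$file = tempnam(sys_get_temp_dir(), 'msg');
file_put_contents($file, '<?php $fallback = false; $rtl = false; $messages = array("welcome" => "Welcome");');

$m = new LazyMessages($file);
$loadedBefore = $m->isLoaded();
$text = $m->getMessage('welcome');
unlink($file);
```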

Then, we load the $fallback language (English, if not set) to fill in the
gaps in the messages.  Certain keys get specialized behavior, as they can be
mergeable maps, lists or alias lists (the semantics of alias lists are still
unclear).
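One possible way to fill the gaps is sketched below. How MediaWiki actually
merges each kind of key is not confirmed here; this sketch assumes that maps
are unioned with the current language's entries winning, that lists are
concatenated and deduplicated, and it leaves alias lists out. The function
name mergeFallback() is hypothetical.

```php
<?php
// Hypothetical sketch: fill gaps in a language's cache from its fallback.
// Which keys count as maps or lists, and how they merge, are assumptions.
function mergeFallback(array $cache, array $fallbackCache,
                       array $mapKeys, array $listKeys): array
{
    foreach ($fallbackCache as $key => $fallbackValue) {
        if (!array_key_exists($key, $cache)) {
            // Key missing entirely: inherit the fallback's value.
            $cache[$key] = $fallbackValue;
        } elseif (in_array($key, $mapKeys, true)) {
            // Map: array union, so entries already present take precedence.
            $cache[$key] = $cache[$key] + $fallbackValue;
        } elseif (in_array($key, $listKeys, true)) {
            // List: concatenate, drop duplicates, renumber.
            $cache[$key] = array_values(array_unique(array_merge($fallbackValue, $cache[$key])));
        }
        // Any other key that is already set keeps its own value.
    }
    return $cache;
}

$en = array('messages' => array('A' => 'Apple', 'B' => 'Banana'), 'rtl' => false);
$de = array('messages' => array('A' => 'Apfel'));
$merged = mergeFallback($de, $en, array('messages'), array());

$lists = mergeFallback(array('l' => array('b', 'c')), array('l' => array('a', 'b')),
                       array(), array('l'));
```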

== Caching ==

MediaWiki has many caching mechanisms built in, which make the code somewhat
more difficult to understand.  Before doing any loading, MediaWiki checks the
following places to see whether the work can be skipped:

1. $mLocalisationCache[$code] -  just a variable where it may have been stashed
2. serialized/$code.ser -  compiled serialized language file
3. Memcached version of file (with expiration checking)
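The lookup order can be sketched as a chain that stops at the first layer
holding usable data. The LocalisationCache class, the directory argument and
the $loader callback are assumptions for illustration; the Memcached layer is
only marked by a comment so the example runs without extensions.

```php
<?php
// Hypothetical sketch of the lookup order above (Memcached omitted).
class LocalisationCache
{
    /** Layer 1: data already stashed during this request. */
    private array $memory = array();

    /** Directory holding precompiled $code.ser files (layer 2). */
    private string $serializedDir;

    public function __construct(string $serializedDir)
    {
        $this->serializedDir = $serializedDir;
    }

    public function get(string $code, callable $loader): array
    {
        // 1. In-memory stash.
        if (isset($this->memory[$code])) {
            return $this->memory[$code];
        }
        // 2. Compiled serialized file, if present and valid.
        $file = $this->serializedDir . '/' . $code . '.ser';
        $data = is_file($file) ? unserialize(file_get_contents($file)) : false;
        // 3. A Memcached lookup would go here; failing that, do the real load.
        if (!is_array($data)) {
            $data = $loader($code);
        }
        return $this->memory[$code] = $data;
    }
}

// Usage: one precompiled language ("fr") and one that must be loaded ("en").
$dir = sys_get_temp_dir() . '/lc_' . uniqid();
mkdir($dir);
file_put_contents($dir . '/fr.ser', serialize(array('code' => 'fr', 'source' => 'ser')));

$loads = 0;
$loader = function (string $code) use (&$loads): array {
    $loads++;
    return array('code' => $code, 'source' => 'loader');
};

$cache = new LocalisationCache($dir);
$fr  = $cache->get('fr', $loader);
$en1 = $cache->get('en', $loader);
$en2 = $cache->get('en', $loader);  // served from memory; loader not called again

unlink($dir . '/fr.ser');
rmdir($dir);
```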

Expiration checking consists of ensuring that the filemtime of every
dependency matches the value recorded with the cached copy.  Similar checking
could be implemented for the serialized versions, which apparently are not
updated until they are manually recompiled.
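A minimal sketch of filemtime-based expiration, under the assumption that a
cache entry stores the modification time of each file it was built from. The
function names buildCacheEntry() and isExpired() are hypothetical; the same
check could guard the serialized files.

```php
<?php
// Hypothetical sketch: an entry records filemtime() for each dependency and
// is stale as soon as any recorded timestamp no longer matches.
function buildCacheEntry(array $data, array $dependencyFiles): array
{
    $deps = array();
    foreach ($dependencyFiles as $file) {
        $deps[$file] = filemtime($file);
    }
    return array('data' => $data, 'deps' => $deps);
}

function isExpired(array $entry): bool
{
    foreach ($entry['deps'] as $file => $mtime) {
        // PHP caches stat() results, so clear them before re-reading.
        clearstatcache(true, $file);
        if (!is_file($file) || filemtime($file) !== $mtime) {
            return true;
        }
    }
    return false;
}

// Usage: a cached entry goes stale once its source file is modified.
$f = tempnam(sys_get_temp_dir(), 'dep');
file_put_contents($f, '<?php $rtl = false;');
$entry = buildCacheEntry(array('rtl' => false), array($f));
$fresh = isExpired($entry);
touch($f, time() + 10);  // simulate an edit to the messages file
$stale = isExpired($entry);
unlink($f);
```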

== Behavior ==

Things that are localizable:

-  Weekdays (and abbreviations)
-  Months (and abbreviations)
-  Bookstores
-  Skin names
-  Date preferences / Custom date format
-  Default date format
-  Default user option overrides
-+ Language names
-  Timezones
-+ Character encoding conversion via iconv
-  Upper/lowercase first character (some languages need casemaps)
-  Upper/lowercase
-  Uppercase words
-  Uppercase word breaks
-  Case folding
-  Strip punctuation for MySQL search
-  Get first character
-+ Alternate encoding
-+ Recoding for edit (and then recode input)
-+ RTL
-+ Direction mark character depending on RTL
-? Arrow depending on RTL
-  Languages where italics cannot be used
-+ Number formatting (commafy, transform digits, transform separators)
-  Truncate (multibyte)
-  Grammar conversions for inflected languages
-  Plural transformations
-  Formatting expiry times
-  Segmenting for diffs (Chinese)
-  Convert to variants of language
-  Language specific user preference options
-  Link trails [[foo]]bar
-+ Language code (RFC 3066)
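Two of the items marked with + are small enough to sketch directly: number
formatting and the direction mark. The function names and parameters below
are hypothetical; in practice the separators and digit map would come from
the language's messages file. number_format() and strtr() are standard PHP,
and U+200E/U+200F are the Unicode LEFT-TO-RIGHT and RIGHT-TO-LEFT MARK.

```php
<?php
// Hypothetical helpers for two language-specific behaviors.

// Group digits ("commafy"), apply the language's separators, and optionally
// transform ASCII digits into native ones (e.g. Eastern Arabic).
function formatNumber(float $n, int $decimals, string $decimalSep,
                      string $groupSep, array $digitMap = array()): string
{
    $s = number_format($n, $decimals, $decimalSep, $groupSep);
    return $digitMap ? strtr($s, $digitMap) : $s;
}

// Pick the invisible direction mark matching the language's writing direction.
function directionMark(bool $rtl): string
{
    return $rtl ? "\u{200F}" : "\u{200E}";  // RLM : LRM
}

$de = formatNumber(1234567.891, 2, ',', '.');
$en = formatNumber(1234.5, 1, '.', ',');
$ar = formatNumber(1234, 0, '.', ',', array(
    '1' => "\u{0661}", '2' => "\u{0662}", '3' => "\u{0663}", '4' => "\u{0664}",
));
```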

Neat functionality:

-  I18N sprintfDate
-  Roman numeral formatting
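Roman numeral formatting is a classic greedy conversion. The sketch below is
a generic implementation (not taken from MediaWiki), and the function name
romanNumeral() is hypothetical. It walks the symbol table from largest to
smallest, including the subtractive pairs such as CM and IV, and restricts
input to 1..3999, the range expressible with standard symbols.

```php
<?php
// Generic greedy Roman numeral formatter (hypothetical helper).
function romanNumeral(int $n): string
{
    if ($n < 1 || $n > 3999) {
        throw new InvalidArgumentException("Out of range: $n");
    }
    // Ordered largest first; subtractive pairs are listed as single symbols.
    $table = array(
        'M' => 1000, 'CM' => 900, 'D' => 500, 'CD' => 400,
        'C' => 100, 'XC' => 90, 'L' => 50, 'XL' => 40,
        'X' => 10, 'IX' => 9, 'V' => 5, 'IV' => 4, 'I' => 1,
    );
    $out = '';
    foreach ($table as $symbol => $value) {
        // Emit each symbol as many times as it fits, then move on.
        while ($n >= $value) {
            $out .= $symbol;
            $n -= $value;
        }
    }
    return $out;
}

$year = romanNumeral(1994);
```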

In the list of localizable things, items marked with + will likely need to be
addressed by HTML Purifier; the item marked with ? possibly as well.