diff --git a/docs/language.txt b/docs/language.txt new file mode 100644 index 00000000..3bfa9943 --- /dev/null +++ b/docs/language.txt @@ -0,0 +1,98 @@ +We are going to model our I18N/L10N off of MediaWiki's system. Their's is +obviously quite complicated, so we're going to simplify it a bit for our needs. + +== Structure == + +First, you have a Language object. This object contains all the localisable +message strings, as well as other important language-specific settings and +custom behavior (uppercasing, lowercasing, printing dates, formatting +numbers, etc.) + +The object is constructed from two sources: subclassed versions of itself +(classes) and Message files (messages). + +== General use == + +You load a language object by calling the Language::factory() function. +This function the class file for the object (taking in account fallback +languages by using the fallback langauge's object but overloading the +language key) and returns that object. Nothing else happens. + +When a message/etc is requested, a lazy load initializor is called. Now the +real work starts. We're first going to take the scenario that the language +is not cached. The system loads the Messages file by: + + require( $filename ); + $cache = compact( self::$mLocalisationKeys ); + +...where self::$mLocalisationKeys is the name of variables that could be used +in the localization file. This lets you use things like: + + $fallback = false; + $rtl = false; + +...and easily siphon them into arrays. + +Then, we load the $fallback language (if not set, English) to fill in the gaps in +the messages. There is specialized behavior for certain keys, as they can be +mergeable maps, lists or alias lists (not sure what the last one is). + +== Caching == + +MediaWiki has lots of caching mechanisms built in, which make the code somewhat +more difficult to understand. Before doing any loading, MediaWiki will check +the following places to see if we can be lazy: + +1. $mLocalisationCache[$code] - just a variable where it may have been stashed +2. serialized/$code.ser - compiled serialized language file +3. Memcached version of file (with expiration checking) + +Expiration checking consists of by ensuring all dependencies have filemtime +that match the ones bundled with the cached copy. Similar checking could be +implemented for serialized versions, as it seems that they are not updated +until manually recompiled. + +== Behavior == + +Things that are localizable: + +- Weekdays (and abbrev) +- Months (and abbrev) +- Bookstores +- Skin names +- Date preferences / Custom date format +- Default date format +- Default user option overrides +-+ Language names +- Timezones +-+ Character encoding conversion via iconv +- UpperLowerCase first (needs casemaps for some) +- UpperLowerCase +- Uppercase words +- Uppercase word breaks +- Case folding +- Strip punctuation for MySQL search +- Get first character +-+ Alternate encoding +-+ Recoding for edit (and then recode input) +-+ RTL +-+ Direction mark character depending on RTL +-? Arrow depending on RTL +- Languages where italics cannot be used +-+ Number formatting (commafy, transform digits, transform separators) +- Truncate (multibyte) +- Grammar conversions for inflected languages +- Plural transformations +- Formatting expiry times +- Segmenting for diffs (Chinese) +- Convert to variants of language +- Language specific user preference options +- Link trails [[foo]]bar +-+ Language code (RFC 3066) + +Neat functionality: + +- I18N sprintfDate +- Roman numeral formatting + +Items marked with a + likely need to be addressed by HTML Purifier