mirror of
https://github.com/ezyang/htmlpurifier.git
synced 2024-12-23 00:41:52 +00:00
99 lines
3.3 KiB
Plaintext

We are going to model our I18N/L10N off of MediaWiki's system. Theirs is
obviously quite complicated, so we're going to simplify it a bit for our needs.

== Structure ==

First, you have a Language object. This object contains all the localisable
message strings, as well as other important language-specific settings and
custom behavior (uppercasing, lowercasing, printing dates, formatting
numbers, etc.).

The object is constructed from two sources: subclassed versions of itself
(classes) and Message files (messages).
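
To make the two sources concrete, here is a minimal PHP sketch. The class
names (`Language`, `LanguageTr`), the `$messages` property and the `message()`
and `ucfirst()` methods are assumptions made for illustration, not MediaWiki's
actual API. The subclass supplies custom behavior (Turkish uppercases a dotted
`i` to `İ`), while the strings themselves would come from a Message file.

```php
<?php
// Hypothetical base class: generic behavior plus strings loaded from a Message file.
class Language
{
    /** @var array<string, string> message key => localised string */
    public $messages = [];

    public function message(string $key): string
    {
        // Fall back to the key itself when no string is available.
        return $this->messages[$key] ?? $key;
    }

    // Generic behavior: PHP's byte-based ucfirst(), correct only for ASCII input.
    public function ucfirst(string $s): string
    {
        return ucfirst($s);
    }
}

// Hypothetical subclass overriding one piece of custom behavior for Turkish.
class LanguageTr extends Language
{
    public function ucfirst(string $s): string
    {
        // Turkish uppercases a dotted "i" to "İ" (U+0130) rather than "I".
        if (substr($s, 0, 1) === 'i') {
            return "\u{0130}" . substr($s, 1);
        }
        return parent::ucfirst($s);
    }
}

// Usage: messages come from data, behavior comes from the class.
$tr = new LanguageTr();
$tr->messages = ['mainpage' => 'Ana Sayfa'];
```

Keeping strings as data and behavior in subclasses means most languages need
only a Message file; a class is written only when some behavior differs.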

== General use ==

You load a language object by calling the Language::factory() function.
This function loads the class file for the object (taking into account
fallback languages by using the fallback language's object but overriding the
language key) and returns that object. Nothing else happens.
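
A minimal sketch of that factory behavior as a plain PHP function. The
`languageFactory()` name, the `Language`/`LanguageFr` classes, the `FALLBACKS`
table and the class-naming scheme are all hypothetical. The loading steps
described in this note read the fallback from `$fallback` in the Messages
file; a hard-coded table stands in for that here.

```php
<?php
class Language
{
    public $code = 'en';
}

// Hypothetical subclass with French-specific behavior.
class LanguageFr extends Language
{
    public $code = 'fr';
}

// Hypothetical fallback table standing in for each language's $fallback setting.
const FALLBACKS = ['fr-ca' => 'fr', 'fr' => 'en'];

// Assumed naming scheme: "fr-ca" => "LanguageFr_ca".
function languageClass(string $code): string
{
    return 'Language' . str_replace('-', '_', ucfirst($code));
}

function languageFactory(string $code): Language
{
    // Walk the fallback chain until some language in it has its own class.
    $candidate = $code;
    while (!class_exists(languageClass($candidate)) && array_key_exists($candidate, FALLBACKS)) {
        $candidate = FALLBACKS[$candidate];
    }
    $class = class_exists(languageClass($candidate)) ? languageClass($candidate) : Language::class;

    $lang = new $class();
    // Override the language key so the object reports the code that was requested.
    $lang->code = $code;
    // Nothing else happens here: messages are loaded lazily on first use.
    return $lang;
}

$frCa = languageFactory('fr-ca'); // a LanguageFr object whose code is "fr-ca"
```

Because the object is returned before any messages are read, constructing a
language is cheap; the real cost is deferred to the lazy initializer.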

When a message (or other setting) is requested, a lazy-loading initializer is
called. Now the real work starts. We're first going to take the scenario where
the language is not cached. The system loads the Messages file by:

    require( $filename );
    $cache = compact( self::$mLocalisationKeys );

...where self::$mLocalisationKeys is the list of variable names that may be
defined in the localisation file. This lets you write things like:

    $fallback = false;
    $rtl = false;

...and easily siphon them into a single array keyed by variable name.
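
The require/compact trick can be tried in isolation. In this sketch
`loadMessagesFile()` and the `LOCALISATION_KEYS` list are hypothetical
stand-ins for the lazy initializer and self::$mLocalisationKeys. Variables
assigned by the required file land in the function's local scope, where
`compact()` picks them up by name.

```php
<?php
// Hypothetical stand-in for self::$mLocalisationKeys.
const LOCALISATION_KEYS = ['fallback', 'rtl', 'messages'];

function loadMessagesFile(string $filename): array
{
    // Variables assigned by the required file become locals of this function...
    require $filename;
    // ...so compact() can gather them into one array keyed by variable name.
    return compact(LOCALISATION_KEYS);
}

// Usage: write a tiny Messages file to a temporary path, then load it.
$file = tempnam(sys_get_temp_dir(), 'msg');
file_put_contents($file, '<?php $fallback = false; $rtl = false; $messages = ["hello" => "Bonjour"];');
$cache = loadMessagesFile($file);
unlink($file);
// $cache === ['fallback' => false, 'rtl' => false, 'messages' => ['hello' => 'Bonjour']]
```

Names the file does not define are skipped by `compact()`, though PHP 7.3 and
later emit a notice or warning for each one.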

Then, we load the $fallback language (if not set, English) to fill in the gaps in
the messages. There is specialized behavior for certain keys, as they can be
mergeable maps, lists or alias lists (not sure what the last one is).
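
One way the gap-filling step could look, as a PHP sketch. `mergeWithFallback()`
and the `MERGEABLE_MAP_KEYS` list are hypothetical, and only the "mergeable map"
case is modeled: the language's own entries win, and missing entries are copied
from the fallback. Lists and alias lists would need rules of their own, which
are not shown here.

```php
<?php
// Hypothetical: keys whose values are maps merged entry by entry instead of replaced.
const MERGEABLE_MAP_KEYS = ['messages'];

function mergeWithFallback(array $lang, array $fallback): array
{
    // Start from the fallback so anything the language leaves out is inherited.
    $result = $fallback;
    foreach ($lang as $key => $value) {
        if (in_array($key, MERGEABLE_MAP_KEYS, true)
            && is_array($value) && is_array($fallback[$key] ?? null)) {
            // Array union: keys already in $value win, the rest come from the fallback.
            $result[$key] = $value + $fallback[$key];
        } else {
            // Ordinary settings from the language simply replace the fallback's.
            $result[$key] = $value;
        }
    }
    return $result;
}

$en = ['rtl' => false, 'messages' => ['hello' => 'Hello', 'bye' => 'Goodbye']];
$fr = ['messages' => ['hello' => 'Bonjour']];
$merged = mergeWithFallback($fr, $en);
// $merged === ['rtl' => false, 'messages' => ['hello' => 'Bonjour', 'bye' => 'Goodbye']]
```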

== Caching ==

MediaWiki has lots of caching mechanisms built in, which make the code somewhat
more difficult to understand. Before doing any loading, MediaWiki will check
the following places to see if we can be lazy:

1. $mLocalisationCache[$code] - just a variable where it may have been stashed
2. serialized/$code.ser - compiled serialized language file
3. Memcached version of file (with expiration checking)
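
The lookup order above amounts to a chain of progressively slower stores. A PHP
sketch, assuming a hypothetical `LocalisationCacheLookup` class: the Memcached
tier is replaced by an injected callable so the example runs without a
Memcached server, and the expiration check on that tier is left out.

```php
<?php
class LocalisationCacheLookup
{
    private $memory = [];   // tier 1: per-request stash, like $mLocalisationCache
    private $serializedDir; // tier 2: directory of compiled <code>.ser files
    private $sharedGet;     // tier 3: callable standing in for a Memcached get

    public function __construct(string $serializedDir, callable $sharedGet)
    {
        $this->serializedDir = $serializedDir;
        $this->sharedGet = $sharedGet;
    }

    /** Returns cached data for $code, or null when every tier misses. */
    public function get(string $code): ?array
    {
        if (isset($this->memory[$code])) {
            return $this->memory[$code];
        }
        $file = $this->serializedDir . '/' . $code . '.ser';
        if (is_file($file)) {
            $data = unserialize(file_get_contents($file));
            if (is_array($data)) {
                return $this->memory[$code] = $data; // stash for later calls
            }
        }
        $data = ($this->sharedGet)($code);
        if (is_array($data)) {
            return $this->memory[$code] = $data;
        }
        return null; // caller falls back to the full, uncached load
    }
}

// Usage: one compiled file on disk, one entry in the fake shared cache.
$dir = sys_get_temp_dir() . '/l10n_' . uniqid();
mkdir($dir);
file_put_contents("$dir/fr.ser", serialize(['rtl' => false]));
$lookup = new LocalisationCacheLookup($dir, function (string $code): ?array {
    return $code === 'he' ? ['rtl' => true] : null;
});
$fr = $lookup->get('fr');      // read from fr.ser
unlink("$dir/fr.ser");
rmdir($dir);
$frAgain = $lookup->get('fr'); // served from the in-memory tier
```

Each hit is copied into the in-memory tier, so later requests in the same run
never touch the disk or the network again.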

Expiration checking consists of ensuring that every dependency's filemtime
matches the one bundled with the cached copy. Similar checking could be
implemented for serialized versions, as it seems that they are not updated
until manually recompiled.
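
A sketch of that filemtime-based check in PHP. The entry layout (`data` plus a
`deps` map from path to modification time) and both function names are
assumptions for illustration; as suggested above, the same check could be
applied to the serialized files.

```php
<?php
// Hypothetical entry layout: payload plus the mtime of every file it was built from.
function buildCacheEntry(array $data, array $dependencies): array
{
    $deps = [];
    foreach ($dependencies as $path) {
        $deps[$path] = filemtime($path);
    }
    return ['data' => $data, 'deps' => $deps];
}

// Fresh only if every dependency still exists with exactly the recorded mtime.
function isCacheEntryFresh(array $entry): bool
{
    foreach ($entry['deps'] as $path => $mtime) {
        clearstatcache(true, $path); // PHP caches stat() results; force a fresh read
        if (!is_file($path) || filemtime($path) !== $mtime) {
            return false;
        }
    }
    return true;
}

// Usage: build an entry, then simulate an edit to the Messages file.
$file = tempnam(sys_get_temp_dir(), 'msg');
file_put_contents($file, '<?php $rtl = false;');
$entry = buildCacheEntry(['rtl' => false], [$file]);
$before = isCacheEntryFresh($entry);
touch($file, $entry['deps'][$file] + 10);
$after = isCacheEntryFresh($entry);
unlink($file);
```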

== Behavior ==

Things that are localizable:

- Weekdays (and abbrev)
- Months (and abbrev)
- Bookstores
- Skin names
- Date preferences / Custom date format
- Default date format
- Default user option overrides
-+ Language names
- Timezones
-+ Character encoding conversion via iconv
- UpperLowerCase first (needs casemaps for some)
- UpperLowerCase
- Uppercase words
- Uppercase word breaks
- Case folding
- Strip punctuation for MySQL search
- Get first character
-+ Alternate encoding
-+ Recoding for edit (and then recode input)
-+ RTL
-+ Direction mark character depending on RTL
-? Arrow depending on RTL
- Languages where italics cannot be used
-+ Number formatting (commafy, transform digits, transform separators)
- Truncate (multibyte)
- Grammar conversions for inflected languages
- Plural transformations
- Formatting expiry times
- Segmenting for diffs (Chinese)
- Convert to variants of language
- Language-specific user preference options
- Link trails [[foo]]bar
-+ Language code (RFC 3066)
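
Two of the items marked + are small enough to sketch in PHP: number formatting
with per-language separators (via the built-in `number_format()`), and picking
a direction mark from the RTL flag (U+200F RIGHT-TO-LEFT MARK or U+200E
LEFT-TO-RIGHT MARK). The function names and the separator table are
hypothetical; digit transformation is not shown, and real separator data would
come from each language's Messages file.

```php
<?php
// Hypothetical per-language [decimal separator, thousands separator] table.
function separatorsFor(string $code): array
{
    $table = [
        'en' => ['.', ','],
        'de' => [',', '.'],
    ];
    return $table[$code] ?? $table['en']; // unknown codes fall back to English
}

// "Commafy" a number using the language's separators.
function formatNumber(float $number, int $decimals, string $code): string
{
    [$decimalSep, $thousandsSep] = separatorsFor($code);
    return number_format($number, $decimals, $decimalSep, $thousandsSep);
}

// Invisible mark matching the text direction.
function directionMark(bool $rtl): string
{
    return $rtl ? "\u{200F}" : "\u{200E}";
}

$de = formatNumber(1234567.891, 2, 'de'); // "1.234.567,89"
```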

Neat functionality:

- I18N sprintfDate
- Roman numeral formatting
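
Roman numeral formatting is simple enough to sketch: a greedy conversion in
PHP, assuming conventional subtractive notation and the range 1 to 3999. The
`romanNumeral()` name is hypothetical and this is not MediaWiki's own
implementation.

```php
<?php
function romanNumeral(int $n): string
{
    if ($n < 1 || $n > 3999) {
        throw new InvalidArgumentException('Only 1..3999 can be written in standard Roman numerals');
    }
    // Largest value first, including the subtractive pairs (CM, CD, XC, ...).
    $table = [
        'M' => 1000, 'CM' => 900, 'D' => 500, 'CD' => 400,
        'C' => 100, 'XC' => 90, 'L' => 50, 'XL' => 40,
        'X' => 10, 'IX' => 9, 'V' => 5, 'IV' => 4, 'I' => 1,
    ];
    $out = '';
    foreach ($table as $symbol => $value) {
        // Greedily take the biggest symbol that still fits.
        while ($n >= $value) {
            $out .= $symbol;
            $n -= $value;
        }
    }
    return $out;
}

$year = romanNumeral(1994); // "MCMXCIV"
```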

Items marked with a + likely need to be addressed by HTML Purifier.