diff --git a/docs/dev-advanced-api.html b/docs/dev-advanced-api.html
index a9d9f745..907e9e08 100644
--- a/docs/dev-advanced-api.html
+++ b/docs/dev-advanced-api.html
@@ -17,9 +17,12 @@
HTML Purifier End-User Documentation

HTML Purifier currently natively supports only a subset of HTML's -allowed elements, attributes, and behavior. This is by design, -but as the user is always right, they'll need some method to overload -these behaviors.

+allowed elements, attributes, and behavior; specifically, this subset +is the set of elements that are safe for untrusted users to use. +However, HTML Purifier is often utilized to ensure standards-compliance +from input that is trusted (making it a sort of Tidy substitute), +and often users need to define new elements or attributes. The +advanced API is oriented specifically for these use-cases.

Our goals are to let the user:

@@ -27,20 +30,15 @@ these behaviors.

Select
Customize
-
Internals
-
@@ -57,7 +55,7 @@ is essential for standards-compliant output.

This identifier is based on the name the W3C has given to the document type and not -the DTD identifier.

+the DTD identifier, although that may be included as an alias.

This parameter is set via the configuration object:

@@ -68,86 +66,16 @@ Transitional, however, we really shouldn't be guessing what the user's doctype is. Fortunately, people who can't be bothered to set this won't be bothered when their pages stop validating.

-

Selecting Mode

- -

Within doctypes, there are various modes of operation. -These indicate variant behaviors that, while not strictly changing the -allowed set of elements and attributes, definitely affect the output. -Currently, we have two modes, which may be used together:

- -
-
Lenient
-
-

Deprecated elements and attributes will be transformed into - standards-compliant alternatives when explicitly disallowed.

-

For example, in the XHTML 1.0 Strict doctype, a center - element would be turned into a div with the CSS property - text-align:center;, but in XHTML 1.0 Transitional - the element would be preserved.

-

This mode is on by default.

-
-
Correctional[items to correct]
-
-

Deprecated elements and attributes will be transformed into - standards-compliant alternatives whenever possible. - It may have various levels of operation.

-

Referring back to the previous example, the center element would - be transformed in both cases. However, elements without a - reasonable standards-compliant alternative will be preserved - in their form.

-

A user may want to correct certain deprecated attributes, but - not others. For example, the bgcolor attribute may be - acceptable, but the center element not; also, possibly, - an HTML Purifier transformation may be buggy, so the user wants - to forgo it. Thus, correctional accepts an array defining which - elements and attributes to cleanup, or no parameter at all, which - means everything gets corrected. This also means that each - correction needs to be given a unique ID that can be referenced - in this manner. (We may also allow globbing, like *.name or a.* - for mass-enabling correction, and subtractive mode, where things - specified stop correction.) This array gets passed into the - constructor of the mode's module.

-

This mode is on by default.

-
-
- -

A possible call to select modes would be:

- -
$config->set('HTML', 'Mode', array('correctional', 'lenient'));
- -

If modes have extra parameters, a hash is necessary:

- -
$config->set('HTML', 'Mode', array(
-    'correctional' => 'center,a.name',
-    'lenient' => true // this one's just boolean
-));
- -

Modes may be specified along with the doctype declaration (we may want -to get a better set of separator characters):

- -
$config->setDoctype('XHTML Transitional 1.0', '+correctional[center,a.name] -lenient');
- -

-With regards to the various levels of operation conjectured in the -Correctional mode, this is prompted by the fact that a user may want to -correct certain problems but not others, for example, fix the center -element but not the u element, both of which are deprecated. -Having an integer level will not work very well for such fine -grained tweaking, but an array of specific settings might.

-

Selecting Elements / Attributes / Modules

-

+

HTML Purifier will, by default, allow as many elements and attributes +as possible. However, a user may decide to roll their own filterset by +selecting modules, elements and attributes to allow for their own +specific use-case.

-

If this cookie cutter approach doesn't appeal to a user, they may -decide to roll their own filterset by selecting modules, elements and -attributes to allow.

- -

This would make use of the same facilities -as a filterset author would use, except that it would go under an -anonymous filterset that would be auto-selected if any of the -relevant module/elements/attribute selection configuration directives were -non-null.

+

The currently undocumented Filterset interface +will offer a way of encapsulating the following declarations, so that +a user can pick a recipe of tags that is thought to be commonly used.

In practice, this is the most commonly demanded feature. Most users are perfectly happy defining a filterset that looks like:

@@ -156,7 +84,8 @@ perfectly happy defining a filterset that looks like:

The directive %HTML.Allowed is a convenience function that may be fully expressed with the legacy interface, and thus is -given its own setter.

+given its own setter, or implemented by intercepting the set() function +call, parsing, and assigning to the finer grained directives accordingly.
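
The interception approach described above could be sketched as follows. This is a minimal illustration, not HTML Purifier's implementation: the function name parseAllowed() is hypothetical, and the 'element[attr|attr]' string syntax and the 'element.attribute' key format are assumptions made for the example.

```php
<?php
// Hypothetical helper: splits an %HTML.Allowed-style string into the two
// finer-grained lists (allowed elements, allowed attributes).
// Assumed syntax: comma-separated elements, each optionally followed by
// a bracketed, pipe-separated attribute list, e.g. 'a[href|title],em,p'.
function parseAllowed(string $allowed): array
{
    $elements = array();
    $attributes = array();
    foreach (explode(',', $allowed) as $chunk) {
        $chunk = trim($chunk);
        if ($chunk === '') {
            continue; // tolerate stray commas
        }
        // Separate "a[href|title]" into "a" and "href|title]".
        $parts = explode('[', $chunk, 2);
        $element = trim($parts[0]);
        $elements[$element] = true;
        if (isset($parts[1])) {
            $attrList = rtrim($parts[1], ']');
            foreach (explode('|', $attrList) as $attr) {
                $attr = trim($attr);
                if ($attr !== '') {
                    // Assumed key format: element.attribute
                    $attributes["$element.$attr"] = true;
                }
            }
        }
    }
    return array($elements, $attributes);
}

list($elements, $attributes) = parseAllowed('a[href|title],em,p');
```

An intercepting set() would call a helper like this and then assign the two resulting lists to the element and attribute directives separately.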

We currently support a separated interface, which also must be preserved:

@@ -170,23 +99,45 @@ $config->setAllowedHTML('Hypertext,Text,Lists');

But it is not expected that this feature will be widely used.

-

The granularity of these modules is too coarse for -the average user (for example, the core module loads everything from -the essential p element to the not-so-safe h1 -element). How do we make this still a viable solution? Possible answers -may be sub-modules or module parameters. This may not even be a problem, -considering that most people won't be selecting modules.

+

Module selection will work slightly differently +from the other AllowedElements and AllowedAttributes directives by +directly modifying the doctype you are operating in. You cannot, +however, add modules: there is a separate interface for that.

Modules are distinguished from regular elements by the case of their first letter. While XML distinguishes between and allows -lower and uppercase letters in element names, most well-known XML -languages use only lower-case +lower and uppercase letters in element names, XHTML uses only lower-case element names for sake of consistency.

-

Considering that, internally speaking, as mandated by -the XHTML 1.1 Modularization specification, we have organized our -elements around modules, considerable gymnastics will be needed to -get this sort of functionality working.

+

Selecting Tidy

+ +

The name of this segment of functionality is inspired by Dave +Raggett's program HTML Tidy, which purported to help clean up HTML. In +HTML Purifier, Tidy functionality involves turning unsupported and +deprecated elements into standards-compliant ones, maintaining +backwards compatibility, and enforcing best practices.

+ +

Tidy is optional; when on, it has several coarse +levels of operation, as well as directives that can be used to fine-tune +the output. The coarse levels, set at %HTML.TidyLevel, are:

+ +
+
Lenient
+
Preserve any non-standards-compliant aspects by transforming + them into standards-compliant equivalents.
+
Correctional
+
Default: Be lenient and enforce good practices.
+
Aggressive
+
Be correctional and transform as many deprecated elements as + possible to CSS forms.
+
+ +

The distinction between correctional and aggressive is fuzzy, +so the user will also have %HTML.TidyAdd and %HTML.TidyRemove, in +which they may list the names of transforms they want and don't want, +using the broad level as a starting point. The naming convention +has not been established yet, but it will be something along the lines +of 'element.attribute', with globs and special cases supported.
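
One way the broad level and the add/remove lists might combine is sketched below. Everything here is an assumption for illustration: the function resolveTidyTransforms() is hypothetical, the transform names are placeholders (the naming convention is not yet established), the levels are treated as cumulative, and globs are not handled.

```php
<?php
// Hypothetical resolver: starts from the transforms implied by a coarse
// level, then applies the add and remove lists on top of it.
function resolveTidyTransforms(string $level, array $add = array(), array $remove = array()): array
{
    // Placeholder transform names, grouped by the level that introduces
    // them. Order matters: each level includes all earlier ones.
    $levels = array(
        'lenient'      => array('center', 'font'),
        'correctional' => array('img.alt'),
        'aggressive'   => array('u', 'menu'),
    );
    if (!isset($levels[$level])) {
        throw new InvalidArgumentException("Unknown Tidy level: $level");
    }
    $active = array();
    foreach ($levels as $name => $transforms) {
        foreach ($transforms as $transform) {
            $active[$transform] = true;
        }
        if ($name === $level) {
            break; // stop once the requested level has been included
        }
    }
    foreach ($add as $transform) {
        $active[$transform] = true;
    }
    foreach ($remove as $transform) {
        unset($active[$transform]);
    }
    $names = array_keys($active);
    sort($names);
    return $names;
}

// Correctional level, plus the 'u' transform, minus the 'center' transform.
$transforms = resolveTidyTransforms('correctional', array('u'), array('center'));
// $transforms is array('font', 'img.alt', 'u')
```

Applying the remove list last means an explicit removal always wins, which seems the less surprising choice when a transform turns out to be buggy.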

Unified selector

@@ -194,7 +145,7 @@ get this sort of functionality working.

is a chore, we may wish to offer a specialized configuration method for selecting a filterset. Possibility:

-
function selectFilter($doctype, $filterset, $mode)
+
function selectFilter($doctype, $filterset, $tidy)

...which is simply a light wrapper over the individual configuration calls. A custom config file format or text format could also be adopted.

@@ -255,7 +206,8 @@ the usual things required are:

This suggests an API like this:

-
function addElement($element, $type, $content_model, $attributes = array());
+
function addElement($element, $type, $contents,
+    $attr_collections = array(), $attributes = array());

Each parameter explained in depth:

@@ -264,11 +216,15 @@ the usual things required are:

Element name, ex. 'label'
$type
Content set to register in, ex. 'Inline' or 'Flow'
-
$content_model
+
$contents
Description of allowed children. This is a merged form of HTMLPurifier_ElementDef's member variables $content_model and $content_model_type, - where the form is Type: Model, ex. 'Optional: Inline'.
+ where the form is Type: Model, ex. 'Optional: Inline'. + There are also a number of predefined templates one may use. +
$attr_collections
+
Array (or string if only one) of attribute collection(s) to + merge into the attributes array.
$attributes
Array of attribute names to attribute definitions, much like the above-described attribute customization.
@@ -276,11 +232,10 @@ the usual things required are:

A possible usage:

-
$def->addElement('font', 'Inline', 'Optional: Inline',
-    array(0 => array('Common'), 'color' => 'Color'));
+
$def->addElement('font', 'Inline', 'Optional: Inline', 'Common',
+    array('color' => 'Color'));
-

We may want to Common attribute collection inclusion to be added -by default.

+

See HTMLPurifier/HTMLModule.php for details.
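
The merged 'Type: Model' form of $contents described above could be taken apart along these lines. This is only a sketch: splitContents() is a hypothetical name, and the predefined templates mentioned above are not handled.

```php
<?php
// Hypothetical helper: splits a merged contents string such as
// 'Optional: Inline' into its type and model halves, which correspond to
// the $content_model_type and $content_model member variables.
function splitContents(string $contents): array
{
    $parts = explode(':', $contents, 2);
    if (count($parts) !== 2) {
        throw new InvalidArgumentException("Expected 'Type: Model', got: $contents");
    }
    $type = trim($parts[0]);
    $model = trim($parts[1]);
    return array($type, $model);
}

list($type, $model) = splitContents('Optional: Inline');
// $type is 'Optional', $model is 'Inline'
```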

$Id$