diff --git a/docs/dev-advanced-api.html b/docs/dev-advanced-api.html index a9d9f745..907e9e08 100644 --- a/docs/dev-advanced-api.html +++ b/docs/dev-advanced-api.html @@ -17,9 +17,12 @@
HTML Purifier currently natively supports only a subset of HTML's -allowed elements, attributes, and behavior. This is by design, -but as the user is always right, they'll need some method to overload -these behaviors.
+allowed elements, attributes, and behavior; specifically, this subset +is the set of elements that are safe for untrusted users to use. +However, HTML Purifier is often utilized to ensure standards-compliance +from input that is trusted (making it a sort of Tidy substitute), +and often users need to define new elements or attributes. The +advanced API is oriented specifically for these use-cases. Our goals are to let the user:
@@ -27,20 +30,15 @@ these behaviors. This identifier is based on the name the W3C has given to the document type and not -the DTD identifier.
+the DTD identifier, although that may be included as an alias. This parameter is set via the configuration object:
@@ -68,86 +66,16 @@ Transitional, however, we really shouldn't be guessing what the user's doctype is. Fortunantely, people who can't be bothered to set this won't be bothered when their pages stop validating. -Within doctypes, there are various modes of operation. -These indicate variant behaviors that, while not strictly changing the -allowed set of elements and attributes, definitely affect the output. -Currently, we have two modes, which may be used together:
- -Deprecated elements and attributes will be transformed into - standards-compliant alternatives when explicitly disallowed.
-For example, in the XHTML 1.0 Strict doctype, a center
- element would be turned into a div
with the CSS property
- text-align:center;
, but in XHTML 1.0 Transitional
- the element would be preserved.
This mode is on by default.
-Deprecated elements and attributes will be transformed into - standards-compliant alternatives whenever possible. - It may have various levels of operation.
-Referring back to the previous example, the center
element would
- be transformed in both cases. However, elements without a
- reasonable standards-compliant alternative will be preserved
- in their form.
A user may want to correct certain deprecated attributes, but
- not others. For example, the bgcolor
attribute may be
- acceptable, but the center
element not; also, possibly,
- an HTML Purifier transformation may be buggy, so the user wants
- to forgo it. Thus, correctional accepts an array defining which
- elements and attributes to cleanup, or no parameter at all, which
- means everything gets corrected. This also means that each
- correction needs to be given a unique ID that can be referenced
- in this manner. (We may also allow globbing, like *.name or a.*
- for mass-enabling correction, and subtractive mode, where things
- specified stop correction.) This array gets passed into the
- constructor of the mode's module.
This mode is on by default.
-A possible call to select modes would be:
- -$config->set('HTML', 'Mode', array('correctional', 'lenient'));- -
If modes have extra parameters, a hash is necessary:
- -$config->set('HTML', 'Mode', array( - 'correctional' => 'center,a.name', - 'lenient' => true // this one's just boolean -));- -
Modes may be specified along with the doctype declaration (we may want -to get a better set of separator characters):
- -$config->setDoctype('XHTML Transitional 1.0', '+correctional[center,a.name] -lenient');- -
-With regards to the various levels of operation conjectured in the
-Correctional mode, this is prompted by the fact that a user may want to
-correct certain problems but not others, for example, fix the center
-element but not the u
element, both of which are deprecated.
-Having an integer level
will not work very well for such fine
-grained tweaking, but an array of specific settings might.
HTML Purifier will, by default, allow as many elements and attributes +as possible. However, a user may decide to roll their own filterset by +selecting modules, elements and attributes to allow for their own +specific use-case.
-If this cookie cutter approach doesn't appeal to a user, they may -decide to roll their own filterset by selecting modules, elements and -attributes to allow.
- -This would make use of the same facilities
-as a filterset author would use, except that it would go under an
-anonymous
filterset that would be auto-selected if any of the
-relevant module/elements/attribute selection configuration directives were
-non-null.
The currently undocumented Filterset interface +will offer a way of encapsulating the following declarations, so that +a user can pick a recipe of tags that is thought to be commonly used.
In practice, this is the most commonly demanded feature. Most users are perfectly happy defining a filterset that looks like:
@@ -156,7 +84,8 @@ perfectly happy defining a filterset that looks like: The directive %HTML.Allowed is a convenience function that may be fully expressed with the legacy interface, and thus is -given its own setter.
+given its own setter, or implemented by intercepting the set() function +call, parsing, and assigning to the finer-grained directives accordingly. We currently support a separated interface, which also must be preserved:
@@ -170,23 +99,45 @@ $config->setAllowedHTML('Hypertext,Text,Lists'); But it is not expected that this feature will be widely used.
-The granularity of these modules is too coarse for
-the average user (for example, the core module loads everything from
-the essential p
element to the not-so-safe h1
-element). How do we make this still a viable solution? Possible answers
-may be sub-modules or module parameters. This may not even be a problem,
-considering that most people won't be selecting modules.
Module selection will work slightly differently +from the other AllowedElements and AllowedAttributes directives by +directly modifying the doctype you are operating in. You cannot, +however, add modules: there is a separate interface for that.
Modules are distinguished from regular elements by the case of their first letter. While XML distinguishes between and allows -lower and uppercase letters in element names, most well-known XML -languages use only lower-case +lower and uppercase letters in element names, XHTML uses only lower-case element names for sake of consistency.
-Considering that, internally speaking, as mandated by -the XHTML 1.1 Modularization specification, we have organized our -elements around modules, considerable gymnastics will be needed to -get this sort of functionality working.
+The name of this segment of functionality is inspired by Dave +Raggett's program HTML Tidy, which purported to help clean up HTML. In +HTML Purifier, Tidy functionality involves turning unsupported and +deprecated elements into standards-compliant ones, maintaining +backwards compatibility, and enforcing best practices.
+ +Tidy is optional; when on, it has several coarse +levels of operation, as well as directives that can be used to fine-tune +the output. The coarse levels, set at %HTML.TidyLevel, are:
+ +The distinction between correctional and aggressive is fuzzy, +so the user will also have %HTML.TidyAdd and %HTML.TidyRemove, in +which they may list the names of transforms they want and don't want, +using the broad level as a starting point. The naming convention +has not been established yet, but it will be something along the lines +of 'element.attribute', with globs and special cases supported.
function selectFilter($doctype, $filterset, $mode)+
function selectFilter($doctype, $filterset, $tidy)
...which is simply a light wrapper over the individual configuration calls. A custom config file format or text format could also be adopted.
@@ -255,7 +206,8 @@ the usual things required are: This suggests an API like this:
-function addElement($element, $type, $content_model, $attributes = array());+
function addElement($element, $type, $contents, + $attr_collections = array(), $attributes = array());
Each parameter explained in depth:
@@ -264,11 +216,15 @@ the usual things required are:$type
$content_model
$contents
HTMLPurifier_ElementDef
's member variables
$content_model
and $content_model_type
,
- where the form is Type: Model, ex. 'Optional: Inline'.
Type: Model, ex. 'Optional: Inline'. + There are also a number of predefined templates one may use. +
$attr_collections
$attributes
A possible usage:
-$def->addElement('font', 'Inline', 'Optional: Inline', - array(0 => array('Common'), 'color' => 'Color'));+
$def->addElement('font', 'Inline', 'Optional: Inline', 'Common', + array('color' => 'Color'));-
We may want to Common attribute collection inclusion to be added -by default.
+See HTMLPurifier/HTMLModule.php
for details.