Warning: This document may be out-of-date. When in doubt, consult the source code documentation.
HTML Purifier currently natively supports only a subset of HTML's allowed elements, attributes, and behavior; specifically, this subset is the set of elements that are safe for untrusted users to use. However, HTML Purifier is often utilized to ensure standards-compliance from input that is trusted (making it a sort of Tidy substitute), and often users need to define new elements or attributes. The advanced API is oriented specifically for these use-cases.
Our goals are to let the user:
For basic use, the user will have to specify some basic parameters. This is not strictly necessary, as HTML Purifier's default setting will always output safe code, but is required for standards-compliant output.
The first thing to select is the doctype. This is essential for standards-compliant output.
This identifier is based on the name the W3C has given to the document type and not the DTD identifier.
This parameter is set via the configuration object:
$config->set('HTML', 'Doctype', 'XHTML 1.0 Transitional');
For historical reasons, the default doctype is XHTML 1.0 Transitional; however, we really shouldn't be guessing what the user's doctype is. Fortunately, people who can't be bothered to set this won't be bothered when their pages stop validating.
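As a minimal sketch of the doctype setting in context, the snippet below builds a configuration, picks a doctype, and purifies a small fragment. It assumes HTML Purifier 4.x is installed and that its autoloader is reachable as HTMLPurifier.auto.php. The include path and the sample markup are assumptions for illustration. Releases from 4.0 onward take the directive as a single dotted key, which is the form used here, rather than the two-argument namespace form shown above.

```php
<?php
// Assumption: HTML Purifier 4.x is on the include path.
require_once 'HTMLPurifier.auto.php';

// Start from the library defaults, then override the doctype.
$config = HTMLPurifier_Config::createDefault();

// Dotted key form (4.x); older releases wrote set('HTML', 'Doctype', ...).
$config->set('HTML.Doctype', 'XHTML 1.0 Strict');

// The purifier reads the doctype when deciding which elements to keep.
$purifier = new HTMLPurifier($config);
echo $purifier->purify('<p>Hello <b>world</b></p>');
```

Switching the string to another supported doctype name changes which elements and attributes survive, so it is worth setting this explicitly rather than relying on the default.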
HTML Purifier will, by default, allow as many elements and attributes as possible. However, a user may decide to roll their own filterset by selecting modules, elements and attributes to allow for their own specific use-case. This can be done using %HTML.Allowed:
$config->set('HTML', 'Allowed', 'a[href|title],em,p,blockquote');
The directive %HTML.Allowed is a convenience feature that may be fully expressed with the legacy interface.
We currently support another interface from older versions:
$config->set('HTML', 'AllowedElements', 'a,em,p,blockquote');
$config->set('HTML', 'AllowedAttributes', 'a.href,a.title');
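To illustrate how the %HTML.Allowed string relates to the two legacy directives, here is a small standalone sketch. The function splitAllowed is hypothetical and is not part of HTML Purifier. It only handles the simple element[attr|attr] syntax shown above and ignores anything else the real directive may accept.

```php
<?php
// Hypothetical helper (not part of HTML Purifier): converts an
// %HTML.Allowed-style string into the element list and attribute list
// used by the legacy AllowedElements / AllowedAttributes directives.
function splitAllowed(string $allowed): array
{
    $elements = [];
    $attributes = [];

    // Match each "name" or "name[attr1|attr2]" chunk in turn.
    preg_match_all('/([a-z0-9]+)(?:\[([^\]]*)\])?/i', $allowed, $matches, PREG_SET_ORDER);

    foreach ($matches as $m) {
        $elements[] = $m[1];
        // The bracket group is absent when the element lists no attributes.
        if (isset($m[2]) && $m[2] !== '') {
            foreach (explode('|', $m[2]) as $attr) {
                $attributes[] = $m[1] . '.' . trim($attr);
            }
        }
    }

    return [implode(',', $elements), implode(',', $attributes)];
}

list($elements, $attributes) = splitAllowed('a[href|title],em,p,blockquote');
echo $elements, "\n";   // a,em,p,blockquote
echo $attributes, "\n"; // a.href,a.title
```

The two printed strings match the values passed to the legacy directives in the example above.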
A user may also choose to allow modules using a specialized directive:
$config->set('HTML', 'AllowedModules', 'Hypertext,Text,Lists');
But it is not expected that this feature will be widely used.
Module selection will work slightly differently from the other AllowedElements and AllowedAttributes directives by directly modifying the doctype you are operating in, in the spirit of XHTML 1.1's modularization. We stop users from shooting themselves in the foot by mandating the modules in %HTML.CoreModules be used.
Modules are distinguished from regular elements by the case of their first letter. While XML distinguishes between and allows lower and uppercase letters in element names, XHTML uses only lower-case element names for the sake of consistency.
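The first-letter convention can be sketched in a few lines. The helper name looksLikeModuleName is hypothetical and is only meant to show the rule. It is not how the library itself performs the check.

```php
<?php
// Hypothetical helper (illustration only): a name that starts with an
// uppercase letter is treated as a module, anything else as an element.
function looksLikeModuleName(string $name): bool
{
    return $name !== '' && ctype_upper($name[0]);
}

var_dump(looksLikeModuleName('Hypertext'));  // bool(true)
var_dump(looksLikeModuleName('blockquote')); // bool(false)
```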
The name of this segment of functionality is inspired by Dave Raggett's program HTML Tidy, which purported to help clean up HTML. In HTML Purifier, Tidy functionality involves turning unsupported and deprecated elements into standards-compliant ones, maintaining backwards compatibility, and enforcing best practices.
This is a complicated feature, and is explained more in depth at the Tidy documentation page.
By reviewing topic posts in the support forum, we determined that people wanted two customization features above all: adding an attribute to an existing element, and adding a new element. Thus, we'll want to create convenience functions for these common use-cases.
Note that the functions described here are only available if a raw copy of HTMLPurifier_HTMLDefinition was retrieved. Furthermore, caching may prevent your changes from immediately being seen: consult enduser-customize.html on how to work around this.
An attribute is bound to an element by a name and has a specific AttrDef that validates it. The interface is therefore:
function addAttribute($element, $attribute, $attribute_def);
Example of the functionality in action:
$def->addAttribute('a', 'rel', 'Enum#nofollow');
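For context, the call above needs a raw definition object to operate on. The following is a sketch of one way to obtain it, assuming HTML Purifier 4.x with maybeGetRawHTMLDefinition() available and the autoloader at the path shown. The definition ID, revision number, and sample markup are made up for illustration.

```php
<?php
// Assumption: HTML Purifier 4.x is on the include path.
require_once 'HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();

// A custom definition needs an ID and a revision so it can be cached.
// Both values below are placeholders chosen for this example.
$config->set('HTML.DefinitionID', 'example-rel-nofollow');
$config->set('HTML.DefinitionRev', 1);

// Returns the raw definition, or null when a cached copy already exists
// and no further edits are needed.
if ($def = $config->maybeGetRawHTMLDefinition()) {
    // Allow rel="nofollow" on links, as in the example above.
    $def->addAttribute('a', 'rel', 'Enum#nofollow');
}

$purifier = new HTMLPurifier($config);
echo $purifier->purify('<a href="http://example.com/" rel="nofollow">example</a>');
```

When the definition is edited later, bumping the revision number is one way to make sure a stale cached copy is not reused.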
The $attribute_def value is flexible, to make things simpler. It can be a literal object or a string, in which case HTMLPurifier_AttrTypes is used to resolve it for you. Any data that follows a hash mark (#) will be used to customize the attribute type: in the example above, we specify which values for Enum to allow.
An element requires certain information as specified by HTMLPurifier_ElementDef. However, not all of it is necessary; the usual things required are:
This suggests an API like this:
function addElement($element, $type, $contents, $attr_collections = array(), $attributes = array());
Each parameter explained in depth:
$element
$type
$contents: corresponds to HTMLPurifier_ElementDef's member variables $content_model and $content_model_type, written in the form Type: Model, e.g. 'Optional: Inline'. There are also a number of predefined templates one may use.
$attr_collections
$attributes
A possible usage:
$def->addElement('font', 'Inline', 'Optional: Inline', 'Common', array('color' => 'Color'));
See HTMLPurifier/HTMLModule.php for details.
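The Type: Model form of the $contents argument can be pictured as two values joined by a colon. The sketch below splits such a string into its two halves. The function splitContents is hypothetical and written for illustration, so it should not be read as the library's own parsing code.

```php
<?php
// Hypothetical helper (illustration only): splits a "Type: Model" string
// into the content model type and the content model.
function splitContents(string $contents): array
{
    // Split on the first colon only, so the model may itself contain colons.
    $parts = explode(':', $contents, 2);
    $type = trim($parts[0]);
    $model = isset($parts[1]) ? trim($parts[1]) : '';
    return [$type, $model];
}

list($type, $model) = splitContents('Optional: Inline');
echo $type, ' / ', $model, "\n"; // Optional / Inline
```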