Advanced API

Filed under Development

HTML Purifier End-User Documentation

It makes no sense to adopt a one-size-fits-all approach to filtersets. Users must therefore be able to define their own sets of allowed elements, as well as switch between HTML doctypes.

Our goals are to let the user:

Select

Doctype
Filtersets: Rich / Plain / Full ...
Collections: Safe / Unsafe / Leniency(?) / Corrections(?) [advanced]
Modules / Tags / Attributes

Customize

Tags / Attributes / Attribute Types
Filtersets
Root Node

Create

Modules / Tags / Attributes / Attribute Types
Filtersets
Doctype

Select

Selecting a Doctype

By default, users will use a doctype-based, permissive but secure whitelist. They must define a doctype, and this serves as the first method of determining a filterset.

The doctype identifier is based on the name the W3C has given to the document type (for example, XHTML 1.0 Transitional), not on the DTD identifier.

This parameter is set via the configuration object:

$config->set('HTML', 'Doctype', 'XHTML 1.0 Transitional');
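
For context, the call above might sit in a script like the following sketch. It assumes HTML Purifier is installed and that the include path shown (an assumption, adjust as needed) points at the library's auto-loader. The three-argument set() form is the one this document uses; later releases also accept a single dotted key such as 'HTML.Doctype', so check the version in use.

```php
<?php
// Assumed install location; adjust to wherever the library lives.
require_once 'library/HTMLPurifier.auto.php';

// Start from the default configuration object.
$config = HTMLPurifier_Config::createDefault();

// Use the W3C name of the document type, not the DTD identifier.
$config->set('HTML', 'Doctype', 'XHTML 1.0 Transitional');

// Purify untrusted input against the chosen doctype.
$purifier = new HTMLPurifier($config);
$clean = $purifier->purify('<b>Hello</b><script>alert(1)</script>');
```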

Selecting a Filterset

However, selecting a doctype alone doesn't mean much: if we adhered exactly to its definition, we would be letting XSS and other nasties through. HTML Purifier must instead allow only a subset of the doctype, which we shall call a filterset.

By default, HTML Purifier will use the Rich filterset, which allows as many elements as can safely be accepted from untrusted sources. Other possible filtersets could be:

Full: Allows the full span of elements in the doctype, good if you want HTML Purifier to work as a Tidy substitute but not to strip anything out.
Plain: Provides a minimum set of tags for semantic markup of things like blog comments.

Extension authors would be able to define custom filtersets for other users to use.

A possible call to select a filterset would be:

$config->set('HTML', 'Filterset', 'Rich');
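
To illustrate the idea that a filterset is a subset of the doctype, here is a minimal, self-contained sketch. It does not use HTML Purifier at all: the element lists, the contents of each filterset, and the function name resolve_allowed_elements() are all hypothetical and chosen only for illustration.

```php
<?php
// Hypothetical data: a tiny slice of a doctype's element set.
$doctypeElements = array('a', 'b', 'em', 'p', 'script', 'strong');

// Hypothetical filtersets, each naming the elements it would permit.
$filtersets = array(
    'Full'  => array('a', 'b', 'em', 'p', 'script', 'strong'),
    'Rich'  => array('a', 'b', 'em', 'p', 'strong'),
    'Plain' => array('em', 'p', 'strong'),
);

// A filterset can only narrow the doctype, never widen it, so the
// result is the intersection of the two element lists.
function resolve_allowed_elements(array $doctypeElements, array $filtersets, $name)
{
    if (!isset($filtersets[$name])) {
        throw new InvalidArgumentException("Unknown filterset: $name");
    }
    return array_values(array_intersect($doctypeElements, $filtersets[$name]));
}

$allowed = resolve_allowed_elements($doctypeElements, $filtersets, 'Rich');
```

Taking the intersection means an element named by a filterset but absent from the selected doctype is simply dropped.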

Selecting Modules / Tags / Attributes

If this cookie-cutter approach doesn't appeal to a user, they may decide to roll their own filterset by selecting the modules, tags and attributes to allow.

This would make use of the same facilities as a filterset author would use, except that the selection would go into an anonymous filterset, which would be auto-selected whenever any of the relevant module/tag/attribute selection configuration directives is non-null.
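
The auto-selection rule could be sketched as follows. This is a standalone illustration, not library code: the directive names, the sample values, and the function choose_filterset() are placeholders, and a return value of null stands in for the anonymous filterset.

```php
<?php
// Placeholder directive names; null means "not set by the user".
$directives = array(
    'Filterset'         => 'Rich',
    'AllowedModules'    => null,
    'AllowedElements'   => array('p', 'em', 'a'),
    'AllowedAttributes' => array('a.href'),
);

// Returns the named filterset to use, or null for the anonymous filterset.
function choose_filterset(array $directives)
{
    $selection = array('AllowedModules', 'AllowedElements', 'AllowedAttributes');
    foreach ($selection as $key) {
        // isset() is false for missing keys and for null values,
        // so any non-null selection directive triggers the override.
        if (isset($directives[$key])) {
            return null;
        }
    }
    // No custom selection: fall back to the named filterset, Rich by default.
    return isset($directives['Filterset']) ? $directives['Filterset'] : 'Rich';
}

$chosen = choose_filterset($directives);
```

Here the custom element and attribute lists take precedence over the named Rich filterset.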
