mirror of
https://github.com/ezyang/htmlpurifier.git
synced 2025-01-05 06:01:52 +00:00
12b811d749
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
219 lines
8.2 KiB
HTML
219 lines
8.2 KiB
HTML
<?xml version="1.0" encoding="UTF-8"?>
|
|
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
|
|
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
|
|
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head>
|
|
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
|
|
<meta name="description" content="Specification for HTML Purifier's advanced API for defining custom filtering behavior." />
|
|
<link rel="stylesheet" type="text/css" href="style.css" />
|
|
|
|
<title>Advanced API - HTML Purifier</title>
|
|
|
|
</head><body>
|
|
|
|
<h1>Advanced API</h1>
|
|
|
|
<div id="filing">Filed under Development</div>
|
|
<div id="index">Return to the <a href="index.html">index</a>.</div>
|
|
<div id="home"><a href="http://htmlpurifier.org/">HTML Purifier</a> End-User Documentation</div>
|
|
|
|
<p>
|
|
<strong>Warning:</strong> This document may be out-of-date. When in doubt,
|
|
consult the source code documentation.
|
|
</p>
|
|
|
|
<p>HTML Purifier currently natively supports only a subset of HTML's
|
|
allowed elements, attributes, and behavior; specifically, this subset
|
|
is the set of elements that are safe for untrusted users to use.
|
|
However, HTML Purifier is often utilized to ensure standards-compliance
|
|
from input that is trusted (making it a sort of Tidy substitute),
|
|
and often users need to define new elements or attributes. The
|
|
advanced API is oriented specifically for these use-cases.</p>
|
|
|
|
<p>Our goals are to let the user:</p>
|
|
|
|
<dl>
|
|
<dt>Select</dt>
|
|
<dd><ul>
|
|
<li>Doctype</li>
|
|
<!-- <li>Filterset</li> -->
|
|
<li>Elements / Attributes / Modules</li>
|
|
<li>Tidy</li>
|
|
</ul></dd>
|
|
<dt>Customize</dt>
|
|
<dd><ul>
|
|
<li>Attributes</li>
|
|
<li>Elements</li>
|
|
<!--<li>Doctypes</li>-->
|
|
</ul></dd>
|
|
</dl>
|
|
|
|
<h2>Select</h2>
|
|
|
|
<p>For basic use, the user will have to specify some basic parameters. This
|
|
is not strictly necessary, as HTML Purifier's default setting will always
|
|
output safe code, but is required for standards-compliant output.</p>
|
|
|
|
<h3>Selecting a Doctype</h3>
|
|
|
|
<p>The first thing to select is the <strong>doctype</strong>. This
|
|
is essential for standards-compliant output.</p>
|
|
|
|
<p class="technical">This identifier is based
|
|
on the name the W3C has given to the document type and <em>not</em>
|
|
the DTD identifier.</p>
|
|
|
|
<p>This parameter is set via the configuration object:</p>
|
|
|
|
<pre>$config->set('HTML', 'Doctype', 'XHTML 1.0 Transitional');</pre>
|
|
|
|
<p>Due to historical reasons, the default doctype is XHTML 1.0
|
|
Transitional, however, we really shouldn't be guessing what the user's
|
|
doctype is. Fortunantely, people who can't be bothered to set this won't
|
|
be bothered when their pages stop validating.</p>
|
|
|
|
<h3>Selecting Elements / Attributes / Modules</h3>
|
|
|
|
<p>HTML Purifier will, by default, allow as many elements and attributes
|
|
as possible. However, a user may decide to roll their own filterset by
|
|
selecting modules, elements and attributes to allow for their own
|
|
specific use-case. This can be done using %HTML.Allowed:</p>
|
|
|
|
<pre>$config->set('HTML', 'Allowed', 'a[href|title],em,p,blockquote');</pre>
|
|
|
|
<p class="technical">The directive %HTML.Allowed is a convenience feature
|
|
that may be fully expressed with the legacy interface.</p>
|
|
|
|
<p>We currently support another interface from older versions:</p>
|
|
|
|
<pre>$config->set('HTML', 'AllowedElements', 'a,em,p,blockquote');
|
|
$config->set('HTML', 'AllowedAttributes', 'a.href,a.title');</pre>
|
|
|
|
<p>A user may also choose to allow modules using a specialized
|
|
directive:</p>
|
|
|
|
<pre>$config->set('HTML', 'AllowedModules', 'Hypertext,Text,Lists');</pre>
|
|
|
|
<p>But it is not expected that this feature will be widely used.</p>
|
|
|
|
<p class="technical">Module selection will work slightly differently
|
|
from the other AllowedElements and AllowedAttributes directives by
|
|
directly modifying the doctype you are operating in, in the spirit of
|
|
XHTML 1.1's modularization. We stop users from shooting themselves in the
|
|
foot by mandating the modules in %HTML.CoreModules be used.</p>
|
|
|
|
<p class="technical">Modules are distinguished from regular elements by the
|
|
case of their first letter. While XML distinguishes between and allows
|
|
lower and uppercase letters in element names, XHTML uses only lower-case
|
|
element names for sake of consistency.</p>
|
|
|
|
<h3>Selecting Tidy</h3>
|
|
|
|
<p>The name of this segment of functionality is inspired off of Dave
|
|
Ragget's program HTML Tidy, which purported to help clean up HTML. In
|
|
HTML Purifier, Tidy functionality involves turning unsupported and
|
|
deprecated elements into standards-compliant ones, maintaining
|
|
backwards compatibility, and enforcing best practices.</p>
|
|
|
|
<p>This is a complicated feature, and is explained more in depth at
|
|
<a href="enduser-tidy.html">the Tidy documentation page</a>.</p>
|
|
|
|
<!--
|
|
<h3>Unified selector</h3>
|
|
|
|
<p>Because selecting each and every one of these configuration options
|
|
is a chore, we may wish to offer a specialized configuration method
|
|
for selecting a filterset. Possibility:</p>
|
|
|
|
<pre>function selectFilter($doctype, $filterset, $tidy)</pre>
|
|
|
|
<p>...which is simply a light wrapper over the individual configuration
|
|
calls. A custom config file format or text format could also be adopted.</p>
|
|
-->
|
|
|
|
<h2>Customize</h2>
|
|
|
|
<p>By reviewing topic posts in the support forum, we determined that
|
|
there were two primarily demanded customization features people wanted:
|
|
to add an attribute to an existing element, and to add an element.
|
|
Thus, we'll want to create convenience functions for these common
|
|
use-cases.</p>
|
|
|
|
<p>Note that the functions described here are only available if
|
|
a raw copy of <code>HTMLPurifier_HTMLDefinition</code> was retrieved.
|
|
Furthermore, caching may prevent your changes from immediately
|
|
being seen: consult <a href="enduser-customize.html">enduser-customize.html</a> on how
|
|
to work around this.</p>
|
|
|
|
<h3>Attributes</h3>
|
|
|
|
<p>An attribute is bound to an element by a name and has a specific
|
|
<code>AttrDef</code> that validates it. The interface is therefore:</p>
|
|
|
|
<pre>function addAttribute($element, $attribute, $attribute_def);</pre>
|
|
|
|
<p>Example of the functionality in action:</p>
|
|
|
|
<pre>$def->addAttribute('a', 'rel', 'Enum#nofollow');</pre>
|
|
|
|
<p>The <code>$attribute_def</code> value is flexible,
|
|
to make things simpler. It can be a literal object or:</p>
|
|
|
|
<ul>
|
|
<!--<li>Class name: We'll instantiate it for you</li>
|
|
<li>Function name: We'll create an <code>HTMLPurifier_AttrDef_Anonymous</code>
|
|
class with that function registered as a callback.</li>-->
|
|
<li>String attribute type: We'll use <code>HTMLPurifier_AttrTypes</code>
|
|
to resolve it for you. Any data that follows a hash mark (#) will
|
|
be used to customize the attribute type: in the example above,
|
|
we specify which values for Enum to allow.</li>
|
|
</ul>
|
|
|
|
<h3>Elements</h3>
|
|
|
|
<p>An element requires certain information as specified by
|
|
<code>HTMLPurifier_ElementDef</code>. However, not all of it is necessary,
|
|
the usual things required are:</p>
|
|
|
|
<ul>
|
|
<li>Attributes</li>
|
|
<li>Content model/type</li>
|
|
<li>Registration in a content set</li>
|
|
</ul>
|
|
|
|
<p>This suggests an API like this:</p>
|
|
|
|
<pre>function addElement($element, $type, $contents,
|
|
$attr_collections = array(); $attributes = array());</pre>
|
|
|
|
<p>Each parameter explained in depth:</p>
|
|
|
|
<dl>
|
|
<dt><code>$element</code></dt>
|
|
<dd>Element name, ex. 'label'</dd>
|
|
<dt><code>$type</code></dt>
|
|
<dd>Content set to register in, ex. 'Inline' or 'Flow'</dd>
|
|
<dt><code>$contents</code></dt>
|
|
<dd>Description of allowed children. This is a merged form of
|
|
<code>HTMLPurifier_ElementDef</code>'s member variables
|
|
<code>$content_model</code> and <code>$content_model_type</code>,
|
|
where the form is <q>Type: Model</q>, ex. 'Optional: Inline'.
|
|
There are also a number of predefined templates one may use.</dd>
|
|
<dt><code>$attr_collections</code></dt>
|
|
<dd>Array (or string if only one) of attribute collection(s) to
|
|
merge into the attributes array.</dd>
|
|
<dt><code>$attributes</code></dt>
|
|
<dd>Array of attribute names to attribute definitions, much like
|
|
the above-described attribute customization.</dd>
|
|
</dl>
|
|
|
|
<p>A possible usage:</p>
|
|
|
|
<pre>$def->addElement('font', 'Inline', 'Optional: Inline', 'Common',
|
|
array('color' => 'Color'));</pre>
|
|
|
|
<p>See <code>HTMLPurifier/HTMLModule.php</code> for details.</p>
|
|
|
|
</body></html>
|
|
|
|
<!-- vim: et sw=4 sts=4 -->
|