diff --git a/INSTALL b/INSTALL
index 8ee41e86..2b4069e2 100644
--- a/INSTALL
+++ b/INSTALL
@@ -231,12 +231,12 @@ HTML Purifier uses iconv to support other character encodings, as such,
 any encoding that iconv supports
 HTML Purifier supports with this code:
 
-    $config->set('Core', 'Encoding', /* put your encoding here */);
+    $config->set('Core.Encoding', /* put your encoding here */);
 
 An example usage for Latin-1 websites (the most common encoding for English
 websites):
 
-    $config->set('Core', 'Encoding', 'ISO-8859-1');
+    $config->set('Core.Encoding', 'ISO-8859-1');
 
 Note that HTML Purifier's support for non-Unicode encodings is crippled by the
 fact that any character not supported by that encoding will be silently
@@ -251,7 +251,7 @@ reason, I do not include the solution in this document).
 For those of you using HTML 4.01 Transitional, you can disable
 XHTML output like this:
 
-    $config->set('HTML', 'Doctype', 'HTML 4.01 Transitional');
+    $config->set('HTML.Doctype', 'HTML 4.01 Transitional');
 
 Other supported doctypes include:
 
@@ -277,14 +277,14 @@ are, respectively, %HTML.Allowed, %URI.MakeAbsolute and %URI.Base, and
 %AutoFormat.AutoParagraph. The %Namespace.Directive naming convention
 translates to:
 
-    $config->set('Namespace', 'Directive', $value);
+    $config->set('Namespace.Directive', $value);
 
 E.g.
 
-    $config->set('HTML', 'Allowed', 'p,b,a[href],i');
-    $config->set('URI', 'Base', 'http://www.example.com');
-    $config->set('URI', 'MakeAbsolute', true);
-    $config->set('AutoFormat', 'AutoParagraph', true);
+    $config->set('HTML.Allowed', 'p,b,a[href],i');
+    $config->set('URI.Base', 'http://www.example.com');
+    $config->set('URI.MakeAbsolute', true);
+    $config->set('AutoFormat.AutoParagraph', true);
 
 
 ---------------------------------------------------------------------------
@@ -318,11 +318,11 @@ If you are unable or unwilling to give write permissions to the cache
 directory, you can either disable the cache (and suffer a performance
 hit):
 
-    $config->set('Core', 'DefinitionCache', null);
+    $config->set('Core.DefinitionCache', null);
 
 Or move the cache directory somewhere else (no trailing slash):
 
-    $config->set('Cache', 'SerializerPath', '/home/user/absolute/path');
+    $config->set('Cache.SerializerPath', '/home/user/absolute/path');
 
 
 ---------------------------------------------------------------------------
@@ -363,8 +363,8 @@ If your website is in a different encoding or doctype, use this code:
     require_once '/path/to/htmlpurifier/library/HTMLPurifier.auto.php';
 
     $config = HTMLPurifier_Config::createDefault();
-    $config->set('Core', 'Encoding', 'ISO-8859-1'); // replace with your encoding
-    $config->set('HTML', 'Doctype', 'HTML 4.01 Transitional'); // replace with your doctype
+    $config->set('Core.Encoding', 'ISO-8859-1'); // replace with your encoding
+    $config->set('HTML.Doctype', 'HTML 4.01 Transitional'); // replace with your doctype
     $purifier = new HTMLPurifier($config);
 
     $clean_html = $purifier->purify($dirty_html);
diff --git a/TODO b/TODO
index 6bc6aa71..565ad487 100644
--- a/TODO
+++ b/TODO
@@ -18,14 +18,13 @@ afraid to cast your vote for the next feature to be implemented!
 - Incorporate download and resize support as implemented here:
     http://htmlpurifier.org/phorum/read.php?3,2795,3628
 - Think about allowing explicit order of operations hooks for transforms
-- Make it dead easy for other authors to maintain their own configuration
-  pools. Encourage them to namespace them (this flies counter to our
-  "hey, let's use convention idea", so that's why the "register" extra
-  field will end up being a good idea: because it means we can forgo
-  convention for external things
+- Add "register" field to config schemas to eliminate dependence on
+  naming conventions
 - Make it easy for people to cache their entire configuration (so that
   they have one script they run to change configuration, and then a
   stub loader to get that configuration)
+- Add examples to everything (make built-in which also automatically
+  gives output)
 
 FUTURE VERSIONS
 ---------------
diff --git a/docs/dev-advanced-api.html b/docs/dev-advanced-api.html
index 0233a56d..5b7aaa3c 100644
--- a/docs/dev-advanced-api.html
+++ b/docs/dev-advanced-api.html
@@ -17,202 +17,9 @@
HTML Purifier End-User Documentation

- Warning: This document may be out-of-date. When in doubt,
- consult the source code documentation.
+ Please see Customize!

-

HTML Purifier currently natively supports only a subset of HTML's
-allowed elements, attributes, and behavior; specifically, this subset
-is the set of elements that are safe for untrusted users to use.
-However, HTML Purifier is often utilized to ensure standards-compliance
-from input that is trusted (making it a sort of Tidy substitute),
-and often users need to define new elements or attributes. The
-advanced API is oriented specifically for these use-cases.

- -

Our goals are to let the user:

- -
-
Select
-
  • Doctype
  • Elements / Attributes / Modules
  • Tidy
-
Customize
-
  • Attributes
  • Elements
-
- -

Select

- -

For basic use, the user will have to specify some basic parameters. This
-is not strictly necessary, as HTML Purifier's default setting will always
-output safe code, but is required for standards-compliant output.

- -

Selecting a Doctype

- -

The first thing to select is the doctype. This
-is essential for standards-compliant output.

- -

This identifier is based
-on the name the W3C has given to the document type and not
-the DTD identifier.

- -

This parameter is set via the configuration object:

- -
$config->set('HTML', 'Doctype', 'XHTML 1.0 Transitional');
- -

Due to historical reasons, the default doctype is XHTML 1.0
-Transitional; however, we really shouldn't be guessing what the user's
-doctype is. Fortunately, people who can't be bothered to set this won't
-be bothered when their pages stop validating.

- -

Selecting Elements / Attributes / Modules

- -

HTML Purifier will, by default, allow as many elements and attributes
-as possible. However, a user may decide to roll their own filterset by
-selecting modules, elements and attributes to allow for their own
-specific use-case. This can be done using %HTML.Allowed:

- -
$config->set('HTML', 'Allowed', 'a[href|title],em,p,blockquote');
- -

The directive %HTML.Allowed is a convenience feature
-that may be fully expressed with the legacy interface.

- -

We currently support another interface from older versions:

- -
$config->set('HTML', 'AllowedElements', 'a,em,p,blockquote');
-$config->set('HTML', 'AllowedAttributes', 'a.href,a.title');
- -

A user may also choose to allow modules using a specialized
-directive:

- -
$config->set('HTML', 'AllowedModules', 'Hypertext,Text,Lists');
- -

But it is not expected that this feature will be widely used.

- -

Module selection will work slightly differently
-from the other AllowedElements and AllowedAttributes directives by
-directly modifying the doctype you are operating in, in the spirit of
-XHTML 1.1's modularization. We stop users from shooting themselves in the
-foot by mandating the modules in %HTML.CoreModules be used.

- -

Modules are distinguished from regular elements by the
-case of their first letter. While XML distinguishes between and allows
-lower and uppercase letters in element names, XHTML uses only lower-case
-element names for sake of consistency.

- -

Selecting Tidy

- -

The name of this segment of functionality is inspired by Dave
-Raggett's program HTML Tidy, which purported to help clean up HTML. In
-HTML Purifier, Tidy functionality involves turning unsupported and
-deprecated elements into standards-compliant ones, maintaining
-backwards compatibility, and enforcing best practices.

- -

This is a complicated feature, and is explained more in depth at
-the Tidy documentation page.

- - - -

Customize

- -

By reviewing topic posts in the support forum, we determined that
-there were two primarily demanded customization features people wanted:
-to add an attribute to an existing element, and to add an element.
-Thus, we'll want to create convenience functions for these common
-use-cases.

- -

Note that the functions described here are only available if
-a raw copy of HTMLPurifier_HTMLDefinition was retrieved.
-Furthermore, caching may prevent your changes from immediately
-being seen: consult enduser-customize.html on how
-to work around this.

- -

Attributes

- -

An attribute is bound to an element by a name and has a specific
-AttrDef that validates it. The interface is therefore:

- -
function addAttribute($element, $attribute, $attribute_def);
- -

Example of the functionality in action:

- -
$def->addAttribute('a', 'rel', 'Enum#nofollow');
- -

The $attribute_def value is flexible,
-to make things simpler. It can be a literal object or:

- - - -

Elements

- -

An element requires certain information as specified by
-HTMLPurifier_ElementDef. However, not all of it is necessary;
-the usual things required are:

- - - -

This suggests an API like this:

- -
function addElement($element, $type, $contents,
-    $attr_collections = array(), $attributes = array());
- -

Each parameter explained in depth:

- -
-
$element
-
Element name, ex. 'label'
-
$type
-
Content set to register in, ex. 'Inline' or 'Flow'
-
$contents
-
Description of allowed children. This is a merged form of
-    HTMLPurifier_ElementDef's member variables
-    $content_model and $content_model_type,
-    where the form is Type: Model, ex. 'Optional: Inline'.
-    There are also a number of predefined templates one may use.
-
$attr_collections
-
Array (or string if only one) of attribute collection(s) to
-    merge into the attributes array.
-
$attributes
-
Array of attribute names to attribute definitions, much like
-    the above-described attribute customization.
-
- -

A possible usage:

- -
$def->addElement('font', 'Inline', 'Optional: Inline', 'Common',
-    array('color' => 'Color'));
- -

See HTMLPurifier/HTMLModule.php for details.

-