From a846f4e70b8c8a97cd187633ddc17887e5109b96 Mon Sep 17 00:00:00 2001
From: "Edward Z. Yang" <edwardzyang@thewritingpot.com>
Date: Wed, 16 May 2007 03:35:57 +0000
Subject: [PATCH] [1.7.0] Update Advanced API documentation to reflect new
 changes.

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1066 48356398-32a2-884e-a903-53898d9a118a
---
 docs/dev-advanced-api.html | 175 ++++++++++++++-----------------------
 1 file changed, 65 insertions(+), 110 deletions(-)

diff --git a/docs/dev-advanced-api.html b/docs/dev-advanced-api.html
index a9d9f745..907e9e08 100644
--- a/docs/dev-advanced-api.html
+++ b/docs/dev-advanced-api.html
@@ -17,9 +17,12 @@
 <div id="home"><a href="http://htmlpurifier.org/">HTML Purifier</a> End-User Documentation</div>
 
 <p>HTML Purifier currently natively supports only a subset of HTML's
-allowed elements, attributes, and behavior. This is by design,
-but as the user is always right, they'll need some method to overload
-these behaviors.</p>
+allowed elements, attributes, and behavior; specifically, this subset
+is the set of elements that are safe for untrusted users to use.
+However, HTML Purifier is often utilized to ensure standards-compliance
+from input that is trusted (making it a sort of Tidy substitute),
+and often users need to define new elements or attributes. The
+advanced API is oriented specifically for these use-cases.</p>
 
 <p>Our goals are to let the user:</p>
 
@@ -27,20 +30,15 @@ these behaviors.</p>
     <dt>Select</dt>
     <dd><ul>
         <li>Doctype</li>
-        <li>Mode: Lenient / Correctional</li>
+        <li><em>Filterset</em></li>
         <li>Elements / Attributes / Modules</li>
-        <li>Filterset</li>
+        <li>Tidy</li>
     </ul></dd>
     <dt>Customize</dt>
     <dd><ul>
         <li>Attributes</li>
         <li>Elements</li>
-    </ul></dd>
-    <dt>Internals</dt>
-    <dd><ul>
-        <li>Modules / Elements / Attributes / Attribute Types</li>
-        <li>Filtersets</li>
-        <li>Doctype</li>
+        <li>Doctypes</li>
     </ul></dd>
 </dl>
 
@@ -57,7 +55,7 @@ is essential for standards-compliant output.</p>
 
 <p class="technical">This identifier is based
 on the name the W3C has given to the document type and <em>not</em>
-the DTD identifier.</p>
+the DTD identifier, although that may be included as an alias.</p>
 
 <p>This parameter is set via the configuration object:</p>
 
@@ -68,86 +66,16 @@ Transitional, however, we really shouldn't be guessing what the user's
 doctype is. Fortunantely, people who can't be bothered to set this won't
 be bothered when their pages stop validating.</p>
 
-<h3>Selecting Mode</h3>
-
-<p>Within doctypes, there are various <strong>modes</strong> of operation.
-These indicate variant behaviors that, while not strictly changing the
-allowed set of elements and attributes, definitely affect the output.
-Currently, we have two modes, which may be used together:</p>
-
-<dl>
-    <dt>Lenient</dt>
-    <dd>
-        <p>Deprecated elements and attributes will be transformed into
-        standards-compliant alternatives when explicitly disallowed.</p>
-        <p>For example, in the XHTML 1.0 Strict doctype, a <code>center</code>
-        element would be turned into a <code>div</code> with the CSS property
-        <code>text-align:center;</code>, but in XHTML 1.0 Transitional
-        the element would be preserved.</p>
-        <p>This mode is on by default.</p>
-    </dd>
-    <dt>Correctional[items to correct]</dt>
-    <dd>
-        <p>Deprecated elements and attributes will be transformed into
-        standards-compliant alternatives whenever possible.
-        It may have various levels of operation.</p>
-        <p>Referring back to the previous example, the <code>center</code> element would
-        be transformed in both cases. However, elements without a
-        reasonable standards-compliant alternative will be preserved
-        in their form.</p>
-        <p>A user may want to correct certain deprecated attributes, but
-        not others. For example, the <code>bgcolor</code> attribute may be
-        acceptable, but the <code>center</code> element not; also, possibly,
-        an HTML Purifier transformation may be buggy, so the user wants
-        to forgo it. Thus, correctional accepts an array defining which
-        elements and attributes to cleanup, or no parameter at all, which
-        means everything gets corrected. This also means that each
-        correction needs to be given a unique ID that can be referenced
-        in this manner. (We may also allow globbing, like *.name or a.*
-        for mass-enabling correction, and subtractive mode, where things
-        specified stop correction.) This array gets passed into the
-        constructor of the mode's module.</p>
-        <p>This mode is on by default.</p>
-    </dd>
-</dl>
-
-<p>A possible call to select modes would be:</p>
-
-<pre>$config->set('HTML', 'Mode', array('correctional', 'lenient'));</pre>
-
-<p>If modes have extra parameters, a hash is necessary:</p>
-
-<pre>$config->set('HTML', 'Mode', array(
-    'correctional' => 'center,a.name',
-    'lenient' => true // this one's just boolean
-));</pre>
-
-<p>Modes may be specified along with the doctype declaration (we may want
-to get a better set of separator characters):</p>
-
-<pre>$config->setDoctype('XHTML Transitional 1.0', '+correctional[center,a.name] -lenient');</pre>
-
-<p>
-With regards to the various levels of operation conjectured in the
-Correctional mode, this is prompted by the fact that a user may want to
-correct certain problems but not others, for example, fix the <code>center</code>
-element but not the <code>u</code> element, both of which are deprecated.
-Having an integer <q>level</q> will not work very well for such fine
-grained tweaking, but an array of specific settings might.</p>
-
 <h3>Selecting Elements / Attributes / Modules</h3>
 
-<p></p>
+<p>HTML Purifier will, by default, allow as many elements and attributes
+as possible. However, a user may decide to roll their own filterset by
+selecting modules, elements and attributes to allow for their own
+specific use-case.</p>
 
-<p>If this cookie cutter approach doesn't appeal to a user, they may
-decide to roll their own filterset by selecting modules, elements and
-attributes to allow.</p>
-
-<p class="technical">This would make use of the same facilities
-as a filterset author would use, except that it would go under an
-<q>anonymous</q> filterset that would be auto-selected if any of the
-relevant module/elements/attribute selection configuration directives were
-non-null.</p>
+<p class="technical">The currently undocumented Filterset interface
+will offer a way of encapsulating the following declarations, so that
+a user can pick a recipe of tags that is thought to be commonly used.</p>
 
 <p>In practice, this is the most commonly demanded feature. Most users are
 perfectly happy defining a filterset that looks like:</p>
@@ -156,7 +84,8 @@ perfectly happy defining a filterset that looks like:</p>
 
 <p class="technical">The directive %HTML.Allowed is a convenience function
 that may be fully expressed with the legacy interface, and thus is
-given its own setter.</p>
+given its own setter, or implemented by intercepting the set() function
+call, parsing, and assigning to the finer-grained directives accordingly.</p>
 
 <p>We currently support a separated interface, which also must be preserved:</p>
 
@@ -170,23 +99,45 @@ $config->setAllowedHTML('Hypertext,Text,Lists');</pre>
 
 <p>But it is not expected that this feature will be widely used.</p>
 
-<p class="fixme">The granularity of these modules is too coarse for
-the average user (for example, the core module loads everything from
-the essential <code>p</code> element to the not-so-safe <code>h1</code>
-element). How do we make this still a viable solution? Possible answers
-may be sub-modules or module parameters. This may not even be a problem,
-considering that most people won't be selecting modules.</p>
+<p class="technical">Module selection will work slightly differently
+from the other AllowedElements and AllowedAttributes directives by
+directly modifying the doctype you are operating in. You cannot,
+however, add modules: there is a separate interface for that.</p>
 
 <p class="technical">Modules are distinguished from regular elements by the
 case of their first letter. While XML distinguishes between and allows
-lower and uppercase letters in element names, most well-known XML
-languages use only lower-case
+lower and uppercase letters in element names, XHTML uses only lower-case
 element names for sake of consistency.</p>
 
-<p class="technical">Considering that, internally speaking, as mandated by
-the XHTML 1.1 Modularization specification, we have organized our
-elements around modules, considerable gymnastics will be needed to
-get this sort of functionality working.</p>
+<h3>Selecting Tidy</h3>
+
+<p>The name of this segment of functionality is inspired by Dave
+Raggett's program HTML Tidy, which purported to help clean up HTML. In
+HTML Purifier, Tidy functionality involves turning unsupported and
+deprecated elements into standards-compliant ones, maintaining
+backwards compatibility, and enforcing best practices.</p>
+
+<p>Tidy is optional; when on, it has several coarse
+levels of operation, as well as directives that can be used to fine-tune
+the output. The coarse levels, set at %HTML.TidyLevel, are:</p>
+
+<dl>
+    <dt>Lenient</dt>
+    <dd>Preserve any non-standards-compliant aspects by transforming
+        them into standards-compliant equivalents.</dd>
+    <dt>Correctional</dt>
+    <dd>Default: Be lenient and enforce good practices.</dd>
+    <dt>Aggressive</dt>
+    <dd>Be correctional and transform as many deprecated elements as
+        possible to CSS forms.</dd>
+</dl>
+
+<p>The distinction between correctional and aggressive is fuzzy,
+so the user will also have %HTML.TidyAdd and %HTML.TidyRemove, in
+which they may list the names of transforms they want and don't want,
+using the broad level as a starting point. The naming convention
+has not been established yet, but it will be something along the lines
+of 'element.attribute', with globs and special cases supported.</p>
 
 <h3>Unified selector</h3>
 
@@ -194,7 +145,7 @@ get this sort of functionality working.</p>
 is a chore, we may wish to offer a specialized configuration method
 for selecting a filterset. Possibility:</p>
 
-<pre>function selectFilter($doctype, $filterset, $mode)</pre>
+<pre>function selectFilter($doctype, $filterset, $tidy)</pre>
 
 <p>...which is simply a light wrapper over the individual configuration
 calls. A custom config file format or text format could also be adopted.</p>
@@ -255,7 +206,8 @@ the usual things required are:</p>
 
 <p>This suggests an API like this:</p>
 
-<pre>function addElement($element, $type, $content_model, $attributes = array());</pre>
+<pre>function addElement($element, $type, $contents,
+    $attr_collections = array(), $attributes = array());</pre>
 
 <p>Each parameter explained in depth:</p>
 
@@ -264,11 +216,15 @@ the usual things required are:</p>
     <dd>Element name, ex. 'label'</dd>
     <dt><code>$type</code></dt>
     <dd>Content set to register in, ex. 'Inline' or 'Flow'</dd>
-    <dt><code>$content_model</code></dt>
+    <dt><code>$contents</code></dt>
     <dd>Description of allowed children. This is a merged form of
         <code>HTMLPurifier_ElementDef</code>'s member variables
         <code>$content_model</code> and <code>$content_model_type</code>,
-        where the form is <q>Type: Model</q>, ex. 'Optional: Inline'.</dd>
+        where the form is <q>Type: Model</q>, ex. 'Optional: Inline'.
+        There are also a number of predefined templates one may use.</dd>
+    <dt><code>$attr_collections</code></dt>
+    <dd>Array (or string if only one) of attribute collection(s) to
+        merge into the attributes array.</dd>
     <dt><code>$attributes</code></dt>
     <dd>Array of attribute names to attribute definitions, much like
         the above-described attribute customization.</dd>
@@ -276,11 +232,10 @@ the usual things required are:</p>
 
 <p>A possible usage:</p>
 
-<pre>$def->addElement('font', 'Inline', 'Optional: Inline',
-    array(0 => array('Common'), 'color' => 'Color'));</pre>
+<pre>$def->addElement('font', 'Inline', 'Optional: Inline', 'Common',
+    array('color' => 'Color'));</pre>
 
-<p>We may want to Common attribute collection inclusion to be added
-by default.</p>
+<p>See <code>HTMLPurifier/HTMLModule.php</code> for details.</p>
 
 <div id="version">$Id$</div>