Update documentation.

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@147 48356398-32a2-884e-a903-53898d9a118a
2024-12-22 08:21:52 +00:00 · 2006-08-03 01:37:28 +00:00 · f0deae1fc0
commit f0deae1fc0
parent 26733183b7
2 changed files with 27 additions and 115 deletions
--- a/docs/spec.txt
+++ b/docs/spec.txt
@@ -56,111 +56,6 @@ all this functionality (or not written in HTML at all).

 The rest of this document is pending moving into their associated classes.

-
-
-
-
-
-
-
-
-
-
-
-
-== STAGE 4 - check nesting ==
-
-    Status: B (table custom definition needs to be implemented)
-
-We know that the document is now well formed. The tokenizer should now take
-things in nodes: when you hit a start tag, keep on going until you get its
-ending tag, and then handle everything inside there. Fortunantely, no
-fancy recursion is necessary as going to the next node is as simple as
-scrolling to the next start tag.
-
-Suppose we have a node and encounter a problem with one of its children.
-Depending on the complexity of the rule, we will either delete the children,
-or delete the entire node itself.
-
-The simplest type of rule is zero or more valid elements, denoted like:
-
-  ( el1 | el2 | el3 )*
-
-The next simplest is with one or more valid elements:
-
-  ( li )+
-
-And then you have complex cases:
-
- table (caption?, (col*|colgroup*), thead?, tfoot?, (tbody+|tr+))
- map ((%block; | form | %misc;)+ | area+)
- html (head, body)
- head (%head.misc;,
-     ((title, %head.misc;, (base, %head.misc;)?) |
-      (base, %head.misc;, (title, %head.misc;))))
-
-Each of these has to be dealt with. Case 1 is a joy, because you can zap
-as many as you want, but you'll never actually have to kill the node. Two
-and three need the entire node to be killed if you have a problem. This
-can be problematic, as the missing node might cause its parent node to now
-be incorrect. Granted, it's unlikely, and I'm fairly certain that HTML, let
-alone the simplified set I'm allowing will have this problem, but it's worth
-checking for.
-
-The way, I suppose, one would check for it, is whenever a node is removed,
-scroll to it's parent start, and re-evaluate it. Make sure you're able to do
-that with minimal code repetition.
-
-The most complex case can probably be done by using some fancy regexp
-expressions and transformations. However, it doesn't seem right that, say,
-a stray <b> in a <table> can cause the entire table to be removed. Depending
-on how much work we want to do, this will at least need a custom child
-definition, and at most require extra element bubbling capabilities to be
-added.
-
--
-
-So, the way we define these cases should work like this:
-
-class ChildDef with validateChildren($children_tags)
-
-The function needs to parse into nodes, then into the regex array.
-It can result in one of three actions: the removal of the entire parent node,
-replacement of all of the original child tags with a new set of child
-tags which it returns, or no changes. They shall be denoted as, respectively,
-
-Remove entire parent node    = false
-Replace child tags with this = array of tags
-No changes                   = true
-
-If we remove the entire parent node, we must scroll back to the parent of the
-parent.
-
--
-
-Also, what do we do with elements if they're not allowed somewhere? We need
-some sort of default behavior. I reckon that we should be allowed to:
-
-1. Delete the node
-2. Translate it into text (not okay for areas that don't allow #PCDATA)
-3. Move the node to somewhere where it is okay
-
-What complicates the matter is that Firefox has the ability to construct
-DOMs and render invalid nestings of elements (like <b><div>asdf</div></b>).
-This means that behavior for stray pcdata in ul/ol is undefined. Behavior
-with data in a table gets bubbled to the start of the table (assuming
-that we actually custom-make the table child validation class).
-
-So... I say delete the node when PCDATA isn't allowed (or the regex is too
-complicated to determine where PCDATA could be inserted), and translate the node
-to text when PCDATA is allowed.
-
--
-
-ins/del are allowed in block and inline content, but it is
-inappropriate to include block content within an ins element
-occurring in inline content. How would we fix this?
-
 == STAGE 4 - check attributes ==

    STATUS: F (currently implementing core/i18n)
@@ -261,11 +156,3 @@ These are the elements that only have %attrs and need an alignment transform
 ----

 Prepend style transformations, as CSS takes precedence.
-
-== PART 5 - stringify ==
-
-    Status: A+ (done completely!)
-
-Should be fairly simple as long as we delegate to appropriate functions.
-It's probably too much trouble to indent the stuff properly, so just output
-stuff raw.
--- a/library/HTMLPurifier/Strategy/FixNesting.php
+++ b/library/HTMLPurifier/Strategy/FixNesting.php
@@ -3,8 +3,33 @@
 require_once 'HTMLPurifier/Strategy.php';
 require_once 'HTMLPurifier/Definition.php';

-// EXTRA: provide a mechanism for elements to be bubbled OUT of a node
-// or "Replace Nodes while including the parent nodes too"
+/**
+ * Takes a well formed list of tokens and fixes their nesting.
+ * 
+ * HTML elements dictate which elements are allowed to be their children,
+ * for example, you can't have a p tag in a span tag.  Other elements have
+ * much more rigorous definitions: tables, for instance, require a specific
+ * order for their elements.  There are also constraints not expressible by
+ * document type definitions, such as the chameleon nature of ins/del
+ * tags and global child exclusions.
+ * 
+ * The first major objective of this strategy is to iterate through all the
+ * nodes (not tokens) of the list of tokens and determine whether or not
+ * their children conform to the element's definition.  If they do not, the
+ * child definition may optionally supply an amended list of elements that
+ * is valid or require that the entire node be deleted (and the previous
+ * node rescanned).
+ * 
+ * The second objective is to ensure that explicitly excluded elements of
+ * an element do not appear in its children.  Code that accomplishes this
+ * task is pervasive through the strategy, though the two are distinct tasks
+ * and could, theoretically, be separated (although it's not recommended).
+ * 
+ * @note Whether or not unrecognized children are silently dropped or
+ *       translated into text depends on the child definitions.
+ * 
+ * @todo Enable nodes to be bubbled out of the structure.
+ */

 class HTMLPurifier_Strategy_FixNesting extends HTMLPurifier_Strategy
 {