This mega-patch rips out the FixNesting implementation and the related
ChildDef components. The primary algorithmic change is to convert from
use of tokens to tree nodes, which are far more amenable to the style
of processing that FixNesting uses. Additionally, FixNesting has been
changed to go bottom-up rather than top-down, in order to avoid needing
to implement backtracking.
This patch simplifies a good deal of the relevant logic, since we no
longer need to continually recalculate the nesting structure when
processing things. However, the conversion to the alternate format
incurs some overhead, so for small inputs these changes are not a win.
One possibility to greatly reduce the constant factors here is to switch
to entirely using libxml's representation, and never serializing tokens;
this would require one to rewrite injectors, however.
The iterative post-order traversal in FixNesting is a bit subtle, but
we have essentially reified the stack and continuations.
We've removed support for %Core.EscapeInvalidChildren.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
This commit is a limited implementation of the "active formatting
elements" algorithm implemented in HTML5, which preserves certain
formatting elements such as <a> and <b> when exiting or entering nodes.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
The newest autoclose code uses the elements property in whether or not an
element should be closed by a particular tag. The heuristic is simple; if
the element doesn't allow that tag as a child, it closes the parent
container. This doesn't work, however, with <blockquote>, which while not
allowing inline styles under Strict doctypes, requires them to be passed
through MakeWellFormed.
The fix was to transition MakeWellFormed to call a method to retrieve the
elements, and then have StrictBlockquote implement a special version of
this method. Future versions of HTML Purifier may be more flexible in this
regard--further study of the HTML5 specification is required.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
- Removed HTMLPurifier_Error
- Documentation updates
- Removed more copy() methods in favor of clone
- HTMLPurifier::getInstance() to HTMLPurifier::instance()
- Fix InterchangeBuilder to use HTMLPURIFIER_PREFIX
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1689 48356398-32a2-884e-a903-53898d9a118a
- Update TODO list
- URISchemeRegistry doesn't return a reference for instance anymore, should do the same for other singletons
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1477 48356398-32a2-884e-a903-53898d9a118a
- Added @public identifiers to properties that the Printers are using.
- Augmented Printer::getClass() to include meta-info about the object (contained inside parentheses). Currently supports: enum, composite and multiple.
- Remove all linebreaks from Printer output
- Document Printer_HTMLDefinition's methods.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@581 48356398-32a2-884e-a903-53898d9a118a
- Partially finished migrating to new Context object (done in r485).
- Created HTMLPurifier_Harness to assist with testing, ChildDefTest migrated to that framework.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@484 48356398-32a2-884e-a903-53898d9a118a
- Added notes on HTML versus XML attribute whitespace handling
- Noted that HTMLPurifier_ChildDef_Custom isn't being used
- Noted that config object's definitions are cached versions
- Hooked up HTMLPurifier_ChildDef_Custom's unit tests (they weren't being run)
- Tester named "HTML Purifier" not "HTMLPurifier"
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@472 48356398-32a2-884e-a903-53898d9a118a
- Defined new directive %Core.EscapeInvalidChildren, for previously commented out functionality
- Removed convenience configuration generation: you *have* to pass it unless you're interfacing with HTMLPurifier
- Homogenized function parameters even when only a few of them are used
- Rewrote unit tests that expected previous behavior
- Introduced configuration object to ChildDef tests
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@243 48356398-32a2-884e-a903-53898d9a118a
- Move around child definitions so they make a little more sense (rename to Custom) and also add $allow_empty property to help FixNesting.php determine whether or not to double-check.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@136 48356398-32a2-884e-a903-53898d9a118a