mirror of
https://github.com/ezyang/htmlpurifier.git
synced 2024-11-10 15:48:42 +00:00
700d5bcbfc
Injector rewind: Injectors can now use the method rewind() in order to move the input index backwards, so that they can reprocess tokens (other injectors are not affected by a rewind). This functionality was necessary to implement nested node removals in %AutoFormat.RemoveEmpty.
End to start ref: To facilitate rewinding, HTMLPurifier_Token_End now maintains a reference called $start to the starting token for their node.
%AutoFormat.RemoveEmpty removes empty nodes. Lots of people have requested it, so here is a partially effective implementation. Because it is implemented as an Injector, it's not possible for it to handle newly introduced empty nodes by later validators, specifically auto-closing and child validation. The Injector is only meant to be used on HTML-ish languages.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
45 lines
1.6 KiB
Plaintext
AutoFormat.RemoveEmpty
TYPE: bool
VERSION: 3.1.2
DEFAULT: false
--DESCRIPTION--
<p>
When enabled, HTML Purifier will attempt to remove empty elements that
contribute no semantic information to the document. The following types
of nodes will be removed:
</p>
<ul><li>
Tags with no attributes and no content, and that are not empty
elements (remove <code>&lt;a&gt;&lt;/a&gt;</code> but not
<code>&lt;br /&gt;</code>), and
</li>
<li>
Tags with no content, except for:<ul>
<li>The <code>colgroup</code> element, or</li>
<li>
Elements with the <code>id</code> or <code>name</code> attribute,
when those attributes are permitted on those elements.
</li>
</ul></li>
</ul>
<p>
Please be very careful when using this functionality; while it may not
seem that empty elements contain useful information, they can alter the
layout of a document given appropriate styling. This directive is most
useful when you are processing machine-generated HTML; please avoid using
it on regular user HTML.
</p>
<p>
Elements that contain only whitespace will be treated as empty. Non-breaking
spaces, however, do not count as whitespace.
</p>
<p>
This algorithm is not perfect; you may still notice some empty tags,
particularly if a node had elements that were later removed because they
were not permitted in that context, or if a tag turned out to be empty
after being auto-closed by another tag. This is for safety reasons, to
prevent clever code from breaking validation. The general rule of thumb:
if a tag looked empty on the way in, it will get removed; if HTML Purifier
made it empty, it will stay.
</p>
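<p>
As a rough illustration, the directive can be switched on like any other
boolean setting. The sketch below assumes that
<code>HTMLPurifier.auto.php</code> is on the include path and uses the
dotted directive name accepted by 4.x releases (3.x releases took the
namespace and directive as two separate arguments). The input string is a
made-up example, and the exact output may differ depending on the rest of
your configuration.
</p>
<pre>
&lt;?php
// Assumption: the library's autoloader is reachable from the include path.
require_once 'HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();
// Off by default; enable removal of empty elements.
$config-&gt;set('AutoFormat.RemoveEmpty', true);

$purifier = new HTMLPurifier($config);

// Hypothetical input: the &lt;b&gt; has no attributes and no content,
// so it falls under the first rule listed above.
$dirty = '&lt;p&gt;Some &lt;b&gt;&lt;/b&gt;text&lt;/p&gt;';
echo $purifier-&gt;purify($dirty);
</pre>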