0
0
mirror of https://github.com/ezyang/htmlpurifier.git synced 2024-11-08 14:58:42 +00:00

Rename and rewrite content models docs.

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1123 48356398-32a2-884e-a903-53898d9a118a
This commit is contained in:
Edward Z. Yang 2007-06-02 18:51:50 +00:00
parent b442d09ea6
commit 2e089477a5
2 changed files with 48 additions and 18 deletions

View File

@ -0,0 +1,48 @@
Handling Content Model Changes
1. Context
The distinction between Transitional and Strict document types is somewhat
of an anomaly in the lineage of XHTML document types (following 1.0, no
doctypes do not have flavors: instead, modularization is used to let
document authors vary their elements). This transition is usually quite
straight-forward, as W3C usually deprecates attributes or elements, which
are quite easily handled using tag and attribute transforms.
However, for two elements, <blockquote>, <body> and <address>, W3C elected
to also change the content model. <blockquote> and <body> originally
accepted both inline and block elements, but in the strict doctype they
only allow block elements. With <address>, the situation is inverted:
<p> tags were now forbidden from appearing within this tag.
2. Current situation
Currently, HTML Purifier treats <blockquote> specially during Tidy mode
using a custom ChildDef class StrictBlockquote. StrictBlockquote
operates similarly to Required, except that when it encounters an inline
element, it will wrap it in a block tag (as specified by
%HTML.BlockWrapper, the default is <p>). The naming suggests it can
only be used for <blockquote>s, although it may be possible to
genericize it to work on other cases of this nature (this would be of
little practical application, as no other element in XHTML 1.1 or earlier
has a block-only content model).
Tidy currently contains no custom, lenient implementation for <address>.
If one were to be written, it would likely operate on the principle that,
when a <p> tag were to be encountered, it would be replaced with a
leading and trailing <br /> tag (the contents of <p>, being inline, are
not an issue). There is no prior work with this sort of operation.
3. Outside applicability
There are a number of other elements that contain restrictive content
models, such as <ul> or <span> (the latter is restrictive in that it
does not allow block elements). In the former case, an errant node
is eliminated completely, in the latter case, the text of the node
would is preserved (as the parent node does allow PCDATA). Custom
content model implementations probably are not the best way of handling
these cases, instead, node bubbling should be implemented instead.

View File

@ -1,18 +0,0 @@
Loose versus Strict
[rename/deprecation pending]
The most common change between doctypes are between the two flavors of HTML 4.01 and
XHTML 1.0: Transitional (Loose) and Strict. Besides deprecated attributes and elements
(which are quite easy to identify), there are two content model changes that were
made:
BLOCKQUOTE changes from 'flow' to 'block'
current behavior: inline inner contents should not be nuked, block-ify as necessary
ADDRESS from potpourri to Inline (removes p tags)
current behavior: block tags silently dropped
ideal behavior: replace block elements with something like <br>. (not high priority,
somewhat difficult to implement)
We're missing strict support for U, S, STRIKE: this needs to be fixed soon (and
is quite simple to fix).