Mirror of https://github.com/ezyang/htmlpurifier.git (synced 2024-12-23 00:41:52 +00:00)
Update documentation (esp. revamp status)
git-svn-id: http://htmlpurifier.org/svnroot/html_purifier/trunk@59 48356398-32a2-884e-a903-53898d9a118a
parent ff8f24458d
commit dadfa87acc
@@ -56,9 +56,11 @@ HTML tags. Things like blog comments are, in all likelihood, most appropriately
 written in an extremely restrictive set of markup that doesn't require
 all this functionality (or not written in HTML at all).
 
+
+
 == STAGE 1 - parsing ==
 
-: Status - largely FINISHED with a few quirks to work out
+Status: A (see source, mainly internal raw)
 
 We've got two options for this: HTMLSax or my MarkupLexer. Hopefully, we
 can make the two interfaces compatible. This means that we need a lot
@@ -77,9 +79,11 @@ Ignorable/not being implemented (although we probably want to output them raw):
 
 Prefixed with MF (Markup Fragment). We'll make 'em all immutable value objects.
 
+
+
 == STAGE 2 - remove foreign elements ==
 
-: Status - Core functionality finished, transformations not started
+Status: A- (transformations need to be implemented)
 
 At this point, the parser needs to start knowing about the DTD. Since we
 hold everything in an associative $info array, if it's set, it's valid, and
@@ -93,9 +97,11 @@ into Data (although I don't see why we can't do that at the start).
 One last thing: the remove foreign elements has to do the element
 transformations, from FONT to SPAN, etc.
 
+
+
 == STAGE 3 - make well formed ==
 
-: Finished, but could have better well-formedness fixing
+Status: A- (not as good as possible)
 
 Now we step through the whole thing and correct nesting issues. Most of the
 time, it's making sure the tags match up, but there's some trickery going on
@@ -112,9 +118,11 @@ for HTML's quirks. They are:
 
 We also want to do translations, like from FONT to SPAN with STYLE.
 
+
+
 == STAGE 4 - check nesting ==
 
-: Child definitions finished, actual function body not started
+Status: B (table custom definition needs to be implemented)
 
 We know that the document is now well formed. The tokenizer should now take
 things in nodes: when you hit a start tag, keep on going until you get its
@@ -272,6 +280,8 @@ implement custom handlers for each one that specify the stuff correctly.
 
 == STAGE 4 - check attributes ==
 
+STATUS: N (not started)
+
 While we're doing all this nesting hocus-pocus, attributes are also being
 checked. The reason why we need this to be done with the nesting stuff
 is if a REQUIRED attribute is not there, we might need to kill the tag (or
@@ -341,6 +351,8 @@ that HTML_Safe defines to help determine a whitelist.
 
 == PART 5 - stringify ==
 
+Status: A+ (done completely!)
+
 Should be fairly simple as long as we delegate to appropriate functions.
 It's probably too much trouble to indent the stuff properly, so just output
 stuff raw.