mirror of https://github.com/ezyang/htmlpurifier.git synced 2025-01-09 07:21:54 +00:00

Update documentation (esp. revamp status)

git-svn-id: http://htmlpurifier.org/svnroot/html_purifier/trunk@59 48356398-32a2-884e-a903-53898d9a118a
Edward Z. Yang 2006-07-20 00:40:04 +00:00
parent ff8f24458d
commit dadfa87acc


@ -56,9 +56,11 @@ HTML tags. Things like blog comments are, in all likelihood, most appropriately
 written in an extremely restrictive set of markup that doesn't require
 all this functionality (or not written in HTML at all).
 == STAGE 1 - parsing ==
-: Status - largely FINISHED with a few quirks to work out
+Status: A (see source, mainly internal raw)
 We've got two options for this: HTMLSax or my MarkupLexer. Hopefully, we
 can make the two interfaces compatible. This means that we need a lot
@ -77,9 +79,11 @@ Ignorable/not being implemented (although we probably want to output them raw):
 Prefixed with MF (Markup Fragment). We'll make 'em all immutable value objects.
 == STAGE 2 - remove foreign elements ==
-: Status - Core functionality finished, transformations not started
+Status: A- (transformations need to be implemented)
 At this point, the parser needs to start knowing about the DTD. Since we
 hold everything in an associative $info array, if it's set, it's valid, and
@ -93,9 +97,11 @@ into Data (although I don't see why we can't do that at the start).
 One last thing: the remove foreign elements has to do the element
 transformations, from FONT to SPAN, etc.
 == STAGE 3 - make well formed ==
-: Finished, but could have better well-formedness fixing
+Status: A- (not as good as possible)
 Now we step through the whole thing and correct nesting issues. Most of the
 time, it's making sure the tags match up, but there's some trickery going on
@ -112,9 +118,11 @@ for HTML's quirks. They are:
 We also want to do translations, like from FONT to SPAN with STYLE.
 == STAGE 4 - check nesting ==
-: Child definitions finished, actual function body not started
+Status: B (table custom definition needs to be implemented)
 We know that the document is now well formed. The tokenizer should now take
 things in nodes: when you hit a start tag, keep on going until you get its
@ -272,6 +280,8 @@ implement custom handlers for each one that specify the stuff correctly.
 == STAGE 4 - check attributes ==
+STATUS: N (not started)
 While we're doing all this nesting hocus-pocus, attributes are also being
 checked. The reason why we need this to be done with the nesting stuff
 is if a REQUIRED attribute is not there, we might need to kill the tag (or
@ -341,6 +351,8 @@ that HTML_Safe defines to help determine a whitelist.
 == PART 5 - stringify ==
+Status: A+ (done completely!)
 Should be fairly simple as long as we delegate to appropriate functions.
 It's probably too much trouble to indent the stuff properly, so just output
 stuff raw.
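
The stages described in the diff above (immutable MF token objects, removing foreign elements by checking a lookup table, and stringifying) can be sketched roughly as follows. This is an illustrative Python sketch, not code from HTML Purifier itself, which is written in PHP. The class names `MFStartTag`, `MFEndTag` and `MFText`, the `INFO` and `TRANSFORMS` tables, and both functions are hypothetical. Turning unknown tags into escaped text and dropping attributes on a transformed tag are simplifying assumptions, not behavior confirmed by the commit.

```python
from dataclasses import dataclass
from html import escape
from typing import Dict, List, Tuple, Union


# Immutable value objects for tokens (names are hypothetical).
@dataclass(frozen=True)
class MFStartTag:
    name: str
    attrs: Tuple[Tuple[str, str], ...] = ()


@dataclass(frozen=True)
class MFEndTag:
    name: str


@dataclass(frozen=True)
class MFText:
    data: str


Token = Union[MFStartTag, MFEndTag, MFText]

# Assumed stand-in for the associative $info array: if a name is set, it is valid.
INFO: Dict[str, bool] = {"b": True, "i": True, "span": True}
# Assumed element transformations, e.g. FONT to SPAN.
TRANSFORMS: Dict[str, str] = {"font": "span"}


def stringify(tokens: List[Token]) -> str:
    """Output tokens raw, with no indentation, escaping text and attribute values."""
    parts = []
    for tok in tokens:
        if isinstance(tok, MFStartTag):
            attrs = "".join(f' {k}="{escape(v, quote=True)}"' for k, v in tok.attrs)
            parts.append(f"<{tok.name}{attrs}>")
        elif isinstance(tok, MFEndTag):
            parts.append(f"</{tok.name}>")
        else:
            parts.append(escape(tok.data, quote=False))
    return "".join(parts)


def remove_foreign_elements(tokens: List[Token]) -> List[Token]:
    """Keep whitelisted tags, rename transformed ones, turn the rest into text."""
    out: List[Token] = []
    for tok in tokens:
        if isinstance(tok, MFText):
            out.append(tok)
            continue
        name = TRANSFORMS.get(tok.name, tok.name)
        if name not in INFO:
            # Foreign tag: keep its raw source as text so it gets escaped later.
            out.append(MFText(stringify([tok])))
        elif name != tok.name:
            # Transformed tag: attributes are dropped in this simplified sketch.
            out.append(MFStartTag(name) if isinstance(tok, MFStartTag) else MFEndTag(name))
        else:
            out.append(tok)
    return out
```

A usage example under the same assumptions: passing the tokens for `<font color="red">hi</font><script>x</script>` through `remove_foreign_elements` and then `stringify` renames the FONT tags to SPAN and emits the SCRIPT tags as escaped text rather than markup.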