mirror of
https://github.com/ezyang/htmlpurifier.git
synced 2025-01-08 15:11:51 +00:00
Update documentation.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@319 48356398-32a2-884e-a903-53898d9a118a
parent
dcec92e7b3
commit
ca1453401f
19
TODO
@ -3,23 +3,32 @@ Todo List
|
|||||||
Core:
|
Core:
|
||||||
- Finish table and shorthand CSS attributes
|
- Finish table and shorthand CSS attributes
|
||||||
- border-collapse, caption-side, empty-cells, table-layout, vertical-align
|
- border-collapse, caption-side, empty-cells, table-layout, vertical-align
|
||||||
- background
|
- background (and friends)
|
||||||
- border, border-*
|
- border, border-*
|
||||||
- font
|
- font
|
||||||
- list-style
|
- list-style
|
||||||
- Implement all non-essential attribute transforms
|
- Implement all non-essential attribute transforms
|
||||||
- Microsoft Word HTML cleaning
|
- Microsoft Word HTML cleaning
|
||||||
- Plugins for major CMSes
|
- Plugins for major CMSes
|
||||||
|
- Rewrite *Definition and Config relationship, add various "levels" of cleaning
|
||||||
|
- Support other character encodings out-of-the-box
|
||||||
|
- Allow strict HTML 4.01, loose HTML 4.01 and strict XHTML 1.0 output
|
||||||
|
|
||||||
Code issues:
|
Code issues:
|
||||||
- Massive profiling, make it faster!
|
- Massive profiling, make it faster!
|
||||||
- Make URI validation routines tighter (especially mailto)
|
- Make URI validation routines tighter (especially mailto)
|
||||||
- Distinguish between different types of URIs, for instance, a mailto URI
|
- Distinguish between different types of URIs, for instance, a mailto URI
|
||||||
in IMG SRC is nonsensical
|
in IMG SRC is nonsensical
|
||||||
- Factor out Host validation to its own AttrDef
|
- Rewrite table's child definition to be faster, smart, and regexp free
|
||||||
- Rewrite table's child definition
|
- Silently drop content inbetween SCRIPT tags (can be generalized to allow
|
||||||
- Silently drop content inbetween SCRIPT tags
|
specification of elements that, when detected as foreign, trigger removal
|
||||||
|
of children, although unbalanced tags could wreck havoc (or at least delete
|
||||||
|
the rest of the document).
|
||||||
|
|
||||||
Enhancements:
|
Enhancements:
|
||||||
- Do fixes for Firefox's inability to handle COL alignment props (Bug 915)
|
- Fixes for Firefox's inability to handle COL alignment props (Bug 915)
|
||||||
- Pretty-printing HTML
|
- Pretty-printing HTML
|
||||||
|
- Hooks for adding custom processors to custom namespaced tags and attributes,
|
||||||
|
offer default implementation
|
||||||
|
- Auto-paragraphing (be sure to leverage fact that we know when things
|
||||||
|
shouldn't be paragraphed, such as lists and tables).
|
||||||
|
@ -21,13 +21,15 @@ AttrDef
|
|||||||
variable overwriting, missing validation for query, fragment and path,
|
variable overwriting, missing validation for query, fragment and path,
|
||||||
no percent-encode fixing
|
no percent-encode fixing
|
||||||
CSS - parser doesn't accept advanced CSS (fringe)
|
CSS - parser doesn't accept advanced CSS (fringe)
|
||||||
|
Number - constructor interface is inconsistent with Integer
|
||||||
AttrTransform - doesn't accept AttrContext, non-validating
|
AttrTransform - doesn't accept AttrContext, non-validating
|
||||||
Lang - invalid xml:lang value can overwrite valid lang value (fringe)
|
|
||||||
ChildDef - not-allowed nodes translated to text, likely invalid handling
|
ChildDef - not-allowed nodes translated to text, likely invalid handling
|
||||||
Config - "load configuration" hooks missing, rich set* accessors missing
|
Config - "load configuration" hooks missing, rich set* accessors missing,
|
||||||
|
needs redefined relationship with the definitions
|
||||||
Strategy
|
Strategy
|
||||||
FixNesting - cannot bubble nodes out of structures
|
FixNesting - cannot bubble nodes out of structures
|
||||||
MakeWellFormed - insufficient automatic closing definitions
|
MakeWellFormed - insufficient automatic closing definitions (check HTML
|
||||||
|
spec for optional end tags).
|
||||||
RemoveForeignElements - should be run in parallel with MakeWellFormed
|
RemoveForeignElements - should be run in parallel with MakeWellFormed
|
||||||
URIScheme - needs to have callable generic checks
|
URIScheme - needs to have callable generic checks
|
||||||
ftp - missing typecode check
|
ftp - missing typecode check
|
||||||
|
@ -28,6 +28,7 @@ time. Note the naming convention: %Namespace.Directive
|
|||||||
|
|
||||||
%Attr.MaxWidth,
|
%Attr.MaxWidth,
|
||||||
%Attr.MaxHeight - caps for width and height related checks.
|
%Attr.MaxHeight - caps for width and height related checks.
|
||||||
|
(a hack in Pixels for an image crashing attack could be replaced by this)
|
||||||
|
|
||||||
%URI.Munge - will munge all URIs to a different URI, which should redirect
|
%URI.Munge - will munge all URIs to a different URI, which should redirect
|
||||||
the user to the applicable page. A urlencoded version of the URI
|
the user to the applicable page. A urlencoded version of the URI
|
||||||
|
@ -17,6 +17,8 @@ are passed. These classes are: HTMLPurifier::*, Generator::generateFromTokens
|
|||||||
and Lexer::tokenizeHTML. However, whenever a valid configuration object
|
and Lexer::tokenizeHTML. However, whenever a valid configuration object
|
||||||
is defined, that object should be used.
|
is defined, that object should be used.
|
||||||
|
|
||||||
|
-- the following is projected changes to the configuration system --
|
||||||
|
|
||||||
In relation to HTMLDefinition and CSSDefinition, there are going to be some
|
In relation to HTMLDefinition and CSSDefinition, there are going to be some
|
||||||
major structural changes to enable the easy configuration of these objects.
|
major structural changes to enable the easy configuration of these objects.
|
||||||
Due to the intricacy of these objects, it's not feasible to ask an average
|
Due to the intricacy of these objects, it's not feasible to ask an average
|
||||||
|
@ -9,11 +9,11 @@ to be effective. Things to remember:
|
|||||||
1. UTF-8. Currently, the parser runs under the assumption that it is dealing
|
1. UTF-8. Currently, the parser runs under the assumption that it is dealing
|
||||||
with UTF-8. Not ISO-8859-1 or Windows-1252, UTF-8. And definitely not "no
|
with UTF-8. Not ISO-8859-1 or Windows-1252, UTF-8. And definitely not "no
|
||||||
character encoding explicitly stated" or UTF-7. If you're not using UTF-8 as
|
character encoding explicitly stated" or UTF-7. If you're not using UTF-8 as
|
||||||
your character encoding, you should switch. Now. (in future versions, however,
|
your character encoding, you should switch. Now. Make sure any input is
|
||||||
I may make the character encoding configurable, but there's only so much I
|
properly converted to UTF-8, or the parser will mangle it badly
|
||||||
can do). Make sure any input is properly converted to UTF-8, or the parser
|
(though it won't be a security risk if you're outputting it as UTF-8 though).
|
||||||
will mangle it badly (though it won't be a security risk if you're outputting
|
We will be adding out-of-the-box support for the other major character
|
||||||
it as UTF-8 though).
|
encodings shortly.
|
||||||
|
|
||||||
2. XHTML 1.0 Transitional. This is what the parser is outputting. For the most
|
2. XHTML 1.0 Transitional. This is what the parser is outputting. For the most
|
||||||
part, it's compatible with HTML 4.01, but XHTML enforces some very nice things
|
part, it's compatible with HTML 4.01, but XHTML enforces some very nice things
|
||||||
@ -23,8 +23,9 @@ strict in order to prevent ourselves from being too draconic on users, but
|
|||||||
this may be configurable in the future.
|
this may be configurable in the future.
|
||||||
|
|
||||||
3. IDs. They need to be unique, but without some knowledge of the
|
3. IDs. They need to be unique, but without some knowledge of the
|
||||||
rest of the document, it's difficult to know what's unique. Without setting
|
rest of the document, it's difficult to know what's unique. %Attr.IDBlacklist
|
||||||
%Attr.IDBlacklist to the proper
|
needs to be set: we may want to consider disallowing IDs by default to
|
||||||
|
save lazy programmers.
|
||||||
|
|
||||||
4. [PROJECTED] Links. We're not going to try for spam protection (although
|
4. [PROJECTED] Links. We're not going to try for spam protection (although
|
||||||
some hooks for such a module might be nice) but we may offer the ability to
|
some hooks for such a module might be nice) but we may offer the ability to
|
||||||
@ -36,4 +37,4 @@ to protect your pages from being attacked by garish colors and plain old
|
|||||||
bad taste. A neat feature would be the ability to define acceptable colors
|
bad taste. A neat feature would be the ability to define acceptable colors
|
||||||
in a document, but that's not likely to be implemented for a while. In the
|
in a document, but that's not likely to be implemented for a while. In the
|
||||||
meantime, be sure to make sure that floated elements (permitted, since they
|
meantime, be sure to make sure that floated elements (permitted, since they
|
||||||
can be quite useful) cna't mess up your layout.
|
can be quite useful) can't mess up your layout.
|
||||||
|
@ -29,7 +29,8 @@ output is valid XHTML or send the HTML through a draconic XML parser (and yet
|
|||||||
still get the nesting wrong: SafeHtmlChecker.class.php does not prevent <a>
|
still get the nesting wrong: SafeHtmlChecker.class.php does not prevent <a>
|
||||||
tags from being nested within each other).
|
tags from being nested within each other).
|
||||||
|
|
||||||
This document seeks to detail the inner workings of HTML Purifier. The first
|
This document no longer is a detailed description of how HTMLPurifier works,
|
||||||
|
as those descriptions have been moved to the appropriate code. The first
|
||||||
draft was drawn up after two rough code sketches and the implementation of a
|
draft was drawn up after two rough code sketches and the implementation of a
|
||||||
forgiving lexer. You may also be interested in the unit tests located in the
|
forgiving lexer. You may also be interested in the unit tests located in the
|
||||||
tests/ folder, which provide a living document on how exactly the filter deals
|
tests/ folder, which provide a living document on how exactly the filter deals
|
||||||
@ -52,4 +53,5 @@ In summary:
|
|||||||
HTML Purifier is best suited for documents that require a rich array of
|
HTML Purifier is best suited for documents that require a rich array of
|
||||||
HTML tags. Things like blog comments are, in all likelihood, most appropriately
|
HTML tags. Things like blog comments are, in all likelihood, most appropriately
|
||||||
written in an extremely restrictive set of markup that doesn't require
|
written in an extremely restrictive set of markup that doesn't require
|
||||||
all this functionality (or not written in HTML at all).
|
all this functionality (or not written in HTML at all), although this may
|
||||||
|
be changing in the future.
|
||||||
|