0
0
mirror of https://github.com/ezyang/htmlpurifier.git synced 2024-11-08 14:58:42 +00:00

Update docs, esp in context of soon to be added tag transforms.

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@666 48356398-32a2-884e-a903-53898d9a118a
This commit is contained in:
Edward Z. Yang 2007-01-20 03:59:07 +00:00
parent fbe2c25f8a
commit a8db22dfff
4 changed files with 20 additions and 10 deletions

View File

@ -7,6 +7,7 @@ and it's up to you to provide it the proper information and proper context
to be effective. Things to remember:
1. Character Encoding: UTF-8.
This segment will soon be obsoleted by enduser-utf8.html
Currently, the parser runs under the assumption that it is dealing
with UTF-8. Not ISO-8859-1 or Windows-1252, UTF-8. And definitely not "no
character encoding explicitly stated" or UTF-7. If you're not using UTF-8 as
@ -27,6 +28,7 @@ this may be configurable in the future. Do you want standards compliance?
The doctype is a good place to start.
3. IDs
This segment is obsoleted by enduser-id.html
They need to be unique, but without some knowledge of the
rest of the document, it's difficult to know what's unique. %Attr.IDBlacklist
needs to be set: we may want to consider disallowing IDs by default to

View File

@ -14,15 +14,15 @@ Since configuration is dependant on context, internal classes require a
configuration object to be passed as a parameter. (They also require a
Context object).
In relation to HTMLDefinition and CSSDefinition, there is a special class
In relation to HTMLDefinition and CSSDefinition, there could be a special class
of directives that influence the *construction* of the Definition object.
A standard call pattern would look like:
A theoretical call pattern would look like:
1. Client calls Config->getHTMLDefinition()
2. Config calls HTMLDefinition->createNew(this)
3. HTMLDefinition constructs itself with base configuration
4. HTMLDefinition calls Config->get('HTMLDefinition')
5. Config returns array of directives that later construction
4. HTMLDefinition calls Config->get('HTML')
5. Config returns array of directives
6. HTMLDefinition performs operations and changes specified by directives
7. HTMLPurifier returns constructed definition
8. Config caches definition so it doesn't have to be generated again
@ -33,3 +33,7 @@ custom copy, which OVERRIDES all directives. Only the base, vanilla copy
is the Singleton, the object actually interfaced with is a operated-upon
clone of that object. Also, if an update to the directives would update
the definition, you'd have to force reconstruction.
In practice, the pulling directives from the config object are
solely need-based, and the flex points are littered throughout the
setup() function. Some sort of refactoring is likely in order.

View File

@ -15,7 +15,10 @@ and properties to allow. HTMLDefinition makes a big part of what HTMLPurifier
is.
The idea, then, is to setup fundamentally different set of definitions, which
can further be customized using simpler configuration options.
can further be customized using simpler configuration options. Alternatively,
they could be implemented as configuration profiles, which simply load
a set of recommended directives to acheive a desired affect (no simpler
config options though).
Here are some fuzzy levels you could set:

View File

@ -2,8 +2,8 @@
Is HTML Purifier Strict or Transitional?
A little bit of helpful guidance
Despite the fact that HTML Purifier professes only to support transitional
HTML, it rejects a lot of attributes and elements that are actually, indeed,
Despite the fact that HTML Purifier professes to support both transitional and
strict HTML, it rejects a lot of attributes and elements that are actually, indeed,
valid. You can investigate progress.html to find out precisely what we
are doing to these *deprecated* attributes.
@ -11,8 +11,8 @@ However, users have found that Strict HTML imposes some quite unreasonable
restrictions on certain things. The start and value attributes in ol and
li (respectively) perhaps are the most contested. There's is currently no
widely supported browser method short of JavaScript that can replace these
two deprecated elements. HTML Purifier does not currently support them, but
it might behoove us to do so while our output is still transitional.
two deprecated elements. It behooves us to allow these deprecated
attributes when the output is transitional.
Fortunantely, that's the only real bugger case. The others have near-perfect
CSS equivalents, and were presentational anyway. However, the other question
@ -32,5 +32,6 @@ these loose-only constructs in loose mode:
The changed child definitions as well as the ul.start li.value are the most
compelling reasons why loose should be used. We may want offer disabling <u>,
<strike> and <s> by themselves.
<strike> and <s> by themselves. We may also want to offer no pre-emptive
deprecated conversions. This all must be unified.