Update txt docs.

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1134 48356398-32a2-884e-a903-53898d9a118a
2024-12-22 08:21:52 +00:00 · 2007-06-09 14:53:21 +00:00 · 2007-06-09 14:53:21 +00:00 · 58f00105c8
commit 58f00105c8
parent 8d15d1ce13
6 changed files with 42 additions and 92 deletions
--- a/docs/enduser-security.txt
+++ b/docs/enduser-security.txt
@@ -8,15 +8,11 @@ to be effective. Things to remember:

 1. Character Encoding: see enduser-utf8.html for more info.

-2. Doctype: document pending feature completion
-Not strictly necessary, actually. More in-depth discussion once we figure
-out how to get strict loose mode working.
+2. IDs: see enduser-id.html for more info

-3. IDs: see enduser-id.html for more info
-
-4. Links: document pending feature completion
+3. Links: document pending feature completion
 Rudimentary blacklisting, we should also allow only relative URIs. We
 need a doc to explain the stuff.

-5. CSS: document pending
+4. CSS: document pending
 Explain which CSS styles we blocked and why.
--- a/docs/index.html
+++ b/docs/index.html
@@ -141,12 +141,6 @@ the code. They may be upgraded to HTML files or stay as TXT scratchpads.</p>
    <td>List of vendor-specific tags we may want to transform to W3C compliant markup.</td>
 </tr>

-<tr>
-    <td>Reference</td>
-    <td><a href="ref-strictness.txt">Strictness</a></td>
-    <td>Short essay on how loose definition isn't really loose.</td>
-</tr>
-
 <tr>
    <td>Reference</td>
    <td><a href="ref-html-modularization.txt">Modularization of HTMLDefinition</a></td>
--- a/docs/proposal-config.txt
+++ b/docs/proposal-config.txt
@@ -1,6 +1,5 @@

 Configuration
-    [needs updating]

 Configuration is documented on a per-use case: if a class uses a certain
 value from the configuration object, it has to define its name and what the
@@ -13,29 +12,10 @@ the documentation in ConfigDef for more information on these namespaces.

 Since configuration is dependant on context, internal classes require a
 configuration object to be passed as a parameter.  (They also require a
-Context object).
+Context object). A majority of classes do not need the config object,
+but for those that do, it is a lifesaver.

-In relation to HTMLDefinition and CSSDefinition, there could be a special class
-of directives that influence the *construction* of the Definition object.
-A theoretical call pattern would look like:
-
-1. Client calls Config->getHTMLDefinition()
-2. Config calls HTMLDefinition->createNew(this)
-3. HTMLDefinition constructs itself with base configuration
-4. HTMLDefinition calls Config->get('HTML')
-5. Config returns array of directives
-6. HTMLDefinition performs operations and changes specified by directives
-7. HTMLPurifier returns constructed definition
-8. Config caches definition so it doesn't have to be generated again
-9. Config returns definition
-
-You could also override Config's copy of the definition with your own
-custom copy, which OVERRIDES all directives.  Only the base, vanilla copy
-is the Singleton, the object actually interfaced with is a operated-upon
-clone of that object.  Also, if an update to the directives would update
-the definition, you'd have to force reconstruction.
-
-In practice, the pulling directives from the config object are
-solely need-based, and the flex points are littered throughout the
-setup() function.  Some sort of refactoring is likely in order. See
-ref-xhtml-1.1.txt for more info.
+Definition objects are complex datatypes influenced by their respective
+directive namespaces (HTMLDefinition with HTML and CSSDefinition with CSS).
+If any of these directives is updated, HTML Purifier forces the definition
+to be regenerated.
--- a/docs/proposal-filter-levels.txt
+++ b/docs/proposal-filter-levels.txt
@@ -2,23 +2,16 @@
 Filter Levels
    When one size *does not* fit all

-The more I think about it, the less sense it makes for maintaining one huge
-monolithic HTMLDefinition class.  There's simply so much variation that
-could go into this definition: the set of HTML good for blog entries is
-definitely too large for HTML that would be allowed in blog comments. Going
-from Transitional to Strict requires changes to the definition.
+It makes little sense to constrain users to one set of HTML elements and
+attributes and tell them that they are not allowed to mold this in
+any fashion.  Many users demand to be able to custom-select which elements
+and attributes they want.  This is fine: because HTML Purifier keeps close
+track of what elements are safe to use, there is no way for them to
+accidentally allow an XSS-able tag.

-Allowing users to specify their own whitelists is one step (implemented, btw), 
-but I have doubts on only doing this. Simply put, the typical programmer is too 
-lazy to actually go through the trouble of investigating which tags, attributes 
-and properties to allow. HTMLDefinition makes a big part of what HTMLPurifier 
-is. 
-
-The idea, then, is to setup fundamentally different set of definitions, which
-can further be customized using simpler configuration options.  Alternatively,
-they could be implemented as configuration profiles, which simply load
-a set of recommended directives to acheive a desired affect (no simpler
-config options though).
+However, combing through the HTML spec to make your own whitelist can
+be a daunting task.  HTML Purifier ought to offer pre-canned filter levels
+that amateur users can select based on what they think is their use-case.

 Here are some fuzzy levels you could set:

@@ -46,6 +39,10 @@ make forbidden element to text transformations desirable (for example, images).

 == Element Risk Analysis ==

+Although none of the currently supported elements presents a security
+threat per se, some can cause problems for page layouts or be
+extremely complicated.
+
 Legend:
    [danger level] - regular tags / uncommon tags ~ deprecated tags
    [danger level]* - rare tags
@@ -130,6 +127,7 @@ any CSS properties that are not currently implemented (such as position).
 Dangerous, can go outside container - float
 Easy to abuse - font-size, font-family (font), width
 Colored - background-color (background), border-color (border), color
+    (see proposal-colors.html)
 Dramatic - border, list-style-position (list-style), margin, padding,
    text-align, text-indent, text-transform, vertical-align, line-height

--- a/docs/ref-strictness.txt
+++ b/docs/ref-strictness.txt
@@ -1,33 +0,0 @@
-
-Is HTML Purifier Strict or Transitional?
-    [rename/deprecation pending]
-
-Despite the fact that HTML Purifier professes to support both transitional and
-strict HTML, it rejects a lot of attributes and elements that are actually, indeed,
-valid. You can investigate progress.html to find out precisely what we
-are doing to these *deprecated* attributes.
-
-However, users have found that Strict HTML imposes some quite unreasonable
-restrictions on certain things. The start and value attributes in ol and
-li (respectively) perhaps are the most contested. There's is currently no
-widely supported browser method short of JavaScript that can replace these
-two deprecated elements. It behooves us to allow these deprecated
-attributes when the output is transitional.
-
-Fortunantely, that's the only real bugger case. The others have near-perfect
-CSS equivalents, and were presentational anyway. However, the other question
-pops up: should we always convert these to the CSS forms when 1. the spec
-allows them anyway and 2. older browsers support them better? After all, the
-whole point about CSS is to seperate styling from content, so inline styling
-doesn't solve that problem.
-
-[new material]
-
-HTML Purifier 1.7 creates a new organizational system for deprecated attribute/
-element transformations. They will be unified under the title of "Tidy", which
-is what they are: cleaning up after deprecated user markup into standards-compliant
-versions. There will also be a change in the default behavior (athough, to the
-end user not inspecting the HTML, there will be no change: in fact, it may
-work even better).
-
-Consult the Advanced API for more details.
--- a/docs/ref-whatwg.txt
+++ b/docs/ref-whatwg.txt
@@ -2,8 +2,23 @@
 Web Hypertext Application Technology Working Group
    WHATWG

-I don't think we need to worry about them.  Untrusted users shouldn't be
-submitting applications, eh?  But if some interesting attribute pops up in
-their spec, and might be worth supporting, stick it here.
+== HTML 5 ==

-HTML 5!!!
+URL: http://www.whatwg.org/specs/web-apps/current-work/
+
+HTML 5 defines a kaboodle of new elements and attributes, as well as
+some well-defined, "quirks mode" HTML parsing.  Although WHATWG professes
+to be targeted towards web applications, many of their semantic additions
+would be quite useful in regular documents. Eventually, HTML
+Purifier will need to audit their lists and figure out what changes need
+to be made.  This process is complicated by the fact that the WHATWG
+doesn't buy into W3C's modularization of XHTML 1.1: we may need
+to remodularize HTML 5 (probably done by section name). No sense in
+committing ourselves till the spec stabilizes, though.
+
+Of more immediate interest, however, is the well-defined parsing
+behavior that HTML 5 adds. While I have little interest in writing
+another DirectLex parser, other parsers like ph5p
+<http://jero.net/lab/ph5p/> can be adapted to DOMLex to support much more
+flexible HTML parsing (a cool feature I've seen is how they resolve
+<b>bold<i>both</b>italic</i>).
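
Note on the misnested-tag example at the end of ref-whatwg.txt: the sketch below shows one simplified way such input could be resolved. It is neither ph5p's code nor the full HTML 5 parsing algorithm, and Python is used purely for illustration. The names MisnestFixer, fix_misnesting and the FORMATTING set are hypothetical. The sketch assumes only inline formatting tags appear, and it drops attributes. When a close tag arrives for an element that is not the innermost open one, the elements in between are closed first and then reopened afterwards.

```python
from html import escape
from html.parser import HTMLParser

# Formatting elements that get reopened after a misnested close tag.
# Assumption for this sketch: a small subset, not the full HTML 5 list.
FORMATTING = {"b", "i", "em", "strong", "u"}


class MisnestFixer(HTMLParser):
    """Rewrites misnested formatting tags into properly nested ones."""

    def __init__(self):
        super().__init__()
        self.stack = []  # currently open tags, outermost first
        self.out = []    # output fragments

    def handle_starttag(self, tag, attrs):
        # Attributes are dropped to keep the sketch short.
        self.stack.append(tag)
        self.out.append("<%s>" % tag)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray close tag: drop it
        reopen = []
        # Close everything opened after the target tag, innermost first.
        while self.stack[-1] != tag:
            inner = self.stack.pop()
            self.out.append("</%s>" % inner)
            if inner in FORMATTING:
                reopen.append(inner)
        self.stack.pop()
        self.out.append("</%s>" % tag)
        # Reopen the formatting tags that were cut short, outermost first.
        for inner in reversed(reopen):
            self.stack.append(inner)
            self.out.append("<%s>" % inner)

    def handle_data(self, data):
        # The parser unescapes character references, so escape again.
        self.out.append(escape(data, quote=False))


def fix_misnesting(html):
    fixer = MisnestFixer()
    fixer.feed(html)
    fixer.close()
    # Close anything still open at end of input.
    while fixer.stack:
        fixer.out.append("</%s>" % fixer.stack.pop())
    return "".join(fixer.out)


print(fix_misnesting("<b>bold<i>both</b>italic</i>"))
# prints <b>bold<i>both</i></b><i>italic</i>
```

A real HTML 5 parser handles many more cases (block elements inside formatting elements, void elements, attributes on reopened tags), so this should be read only as an illustration of the close-and-reopen idea.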