Update docs.

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@254 48356398-32a2-884e-a903-53898d9a118a
2025-01-03 05:11:52 +00:00 · 2006-08-14 21:21:54 +00:00 · 4ef26bbd31
commit 4ef26bbd31
parent 218eb67167
2 changed files with 14 additions and 114 deletions
--- a/docs/security.txt
+++ b/docs/security.txt
@@ -13,24 +13,27 @@ your character encoding, you should switch. Now. (in future versions, however,
 I may make the character encoding configurable, but there's only so much I
 can do). Make sure any input is properly converted to UTF-8, or the parser
 will mangle it badly (though it won't be a security risk if you're outputting
-it as UTF-8).
+it as UTF-8 though).

 2. XHTML 1.0 Transitional. This is what the parser is outputting. For the most
 part, it's compatible with HTML 4.01, but XHTML enforces some very nice things
 that all web developers should use. Regardless, NO DOCTYPE is a NO. Quirks mode
-has waaaay too many quirks for a little parser to handle. We did not select
-strict in order to prevent ourselves from being too draconic on users.
+has waaaay too many quirks for a little parser to handle.  We did not select
+strict in order to prevent ourselves from being too draconic on users, but
+this may be configurable in the future.

-3. [PROJECTED] IDs. They need to be unique, but without some knowledge of the
-rest of the document, it's difficult to know what's unique. I project default
-behavior being a customizable prefix to all ID declarations in the document,
-so make sure you don't use that prefix. Might cause problems for multiple
-instances of HTML escaped output too (especially when it comes to caching).
-Best to just zap them completely, perhaps. This will be configurable, and you'll
-have to pick the correct one.
+3. IDs. They need to be unique, but without some knowledge of the
+rest of the document, it's difficult to know what's unique. Without setting
+%Attr.IDBlacklist to the proper 

 4. [PROJECTED] Links. We're not going to try for spam protection (although
 some hooks for such a module might be nice) but we may offer the ability to
 only accept relative URLs. Pick the one that's right for you.

-5. [PROJECTED] CSS. What a knotty issue. Probably will have to be configurable.
+5. CSS. While we can prevent the most flagrant cases from affecting your
+layout (such as absolutely positioned elements), no amount of code is going
+to protect your pages from being attacked by garish colors and plain old
+bad taste.  A neat feature would be the ability to define acceptable colors
+in a document, but that's not likely to be implemented for a while.  In the
+meantime, make sure that floated elements (permitted, since they
+can be quite useful) can't mess up your layout.
--- a/docs/spec.txt
+++ b/docs/spec.txt
@@ -53,106 +53,3 @@ HTML Purifier is best suited for documents that require a rich array of
 HTML tags.  Things like blog comments are, in all likelihood, most appropriately
 written in an extremely restrictive set of markup that doesn't require
 all this functionality (or not written in HTML at all).
-
-The rest of this document is pending moving into their associated classes.
-
-== STAGE 4 - check attributes ==
-
-    STATUS: F (currently implementing core/i18n)
-
-While we're doing all this nesting hocus-pocus, attributes are also being
-checked. The reason why we need this to be done with the nesting stuff
-is if a REQUIRED attribute is not there, we might need to kill the tag (or
-replace it with data). Fortunantely, this is rare enough that we only have
-to worry about it for certain things:
-
-* ! bdo - dir > replace with span, preserve attributes
-* ! img - src, alt > if only alt is missing, insert filename, else remove img
-* basefont - size
-* param - name
-* applet - width, height
-* map - id
-* area - alt
-* form - action
-* optgroup - label
-* textarea - rows, cols
-
-As you can see, only two of them we would remotely consider for our simplified
-tag set. But each has a different set of challenges. For the img tag, we'd
-have to be careful about deleting it. If we do hit a snag, we can supply
-a default "blank" image.
-
-So after that's all said and done, each of the different types of content
-inside the attributes needs to be handled differently.
-
-ContentType(s)  [RFC2045]
-Charset(s)      [RFC2045]
-LanguageCode    [RFC3066] (NMTOKEN)
-Character       [XML][2.2] (a single character)
-Number          /^\d+$/
-LinkTypes       [HTML][6.12] <space>
-MediaDesc       [HTML][6.13] <comma>
-URI/UriList     [RFC2396] <space>
-Datetime        (ISO date format)
-Script          ...
-StyleSheet      [CSS] (complex)
-Text            CDATA
-FrameTarget     NMTOKEN
-Length          (pixel, percentage) (?:px suffix allowed?)
-MultiLength     (pixel, percentage, or relative)
-Pixels          (integer)
-// map attributes omitted
-ImgAlign        (top|middle|bottom|left|right)
-Color           #NNNNNN, #NNN or color name (translate it
-    Black  = #000000    Green  = #008000
-    Silver = #C0C0C0    Lime   = #00FF00
-    Gray   = #808080    Olive  = #808000
-    White  = #FFFFFF    Yellow = #FFFF00
-    Maroon = #800000    Navy   = #000080
-    Red    = #FF0000    Blue   = #0000FF
-    Purple = #800080    Teal   = #008080
-    Fuchsia= #FF00FF    Aqua   = #00FFFF
-// plus some directly in the spec
-
-Everything else is either ID, or defined as a certain set of values.
-
-Unless we use reflection (which then we have to make sure the attribute exists),
-we probably want to have a function like...
-
-  validate($type, $value) where $type is like ContentType or Number
-
-and then pass it to a switch.
-
-The final problem is CSS. Get intimate with the syntax here:
-http://www.w3.org/TR/CSS21/syndata.html and also note the "bad" CSS elements
-that HTML_Safe defines to help determine a whitelist.
-
----
-
-<!ENTITY % coreattrs
- "id          ID             #IMPLIED
-  class       CDATA          #IMPLIED
-  style       %StyleSheet;   #IMPLIED
-  title       %Text;         #IMPLIED"
-  >
-
-<!ENTITY % i18n
- "lang        %LanguageCode; #IMPLIED
-  xml:lang    %LanguageCode; #IMPLIED
-  dir         (ltr|rtl)      #IMPLIED"
-  >
-
-<!ENTITY % attrs "%coreattrs; %i18n;">
-
----
-
-These are the elements that only have %attrs:
-    ul, dl, dt, dd, address, span, em, strong, dfn, code, samp, kbd, var,
-    cite, abbr, acronym, sub, sup, tt, i, b, big, small, u, s, strike
-
-These are the elements that only have %attrs and need an alignment transform
-    div, p, h1, h2, h3, h4, h5, h6
-
----
-
-Prepend style transformations, as CSS takes precedence.