From 7a6de55f76839980dc237e0c52d8b3036fdf6fa4 Mon Sep 17 00:00:00 2001
From: "Edward Z. Yang" <edwardzyang@thewritingpot.com>
Date: Sun, 17 Sep 2006 21:53:12 +0000
Subject: [PATCH] [1.1.1] Update documentation.

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@435 48356398-32a2-884e-a903-53898d9a118a
---
 docs/code-quality.txt | 22 +++++++++++-----------
 docs/optimization.txt |  3 ++-
 docs/security.txt     | 27 ++++++++++++++++++---------
 3 files changed, 31 insertions(+), 21 deletions(-)

diff --git a/docs/code-quality.txt b/docs/code-quality.txt
index 18b6a7dd..5b54b699 100644
--- a/docs/code-quality.txt
+++ b/docs/code-quality.txt
@@ -11,24 +11,24 @@ profiling.
 Here we go:
 
 AttrDef
-    Class - doesn't support Unicode characters, uses regular expressions
-    Lang - code duplication, premature optimization, doesn't consult official
-        lists
-    Pixels/Length/MultiLength - implemented according to HTML spec (excludes
-        code reuse in CSS)
-    URI - multiple regular expressions, needs host validation routines factored
-        out for mailto scheme, IPv6 validation is broken (fringe), unintuitive
-        variable overwriting, missing validation for query, fragment and path,
+    Class - doesn't support Unicode characters (fringe); uses regular
+        expressions
+    Lang - code duplication; premature optimization; doesn't consult official
+        lists (fringe)
+    Length - easily mistaken for CSSLength
+    URI - multiple regular expressions; needs host validation routines factored
+        out for mailto scheme; missing validation for query, fragment and path;
         no percent-encode fixing
     CSS - parser doesn't accept advanced CSS (fringe)
     Number - constructor interface is inconsistent with Integer
-AttrTransform - doesn't accept AttrContext, non-validating
-ChildDef - not-allowed nodes translated to text, likely invalid handling
+AttrTransform - doesn't accept AttrContext
 Config - "load configuration" hooks missing, rich set* accessors missing
+ConfigSchema - redefinition is a mess
 Strategy
     FixNesting - cannot bubble nodes out of structures
     MakeWellFormed - insufficient automatic closing definitions (check HTML
-        spec for optional end tags).
+        spec for optional end tags; closing based on type (block/inline)
+        might also be efficient).
     RemoveForeignElements - should be run in parallel with MakeWellFormed
 URIScheme - needs to have callable generic checks
     ftp - missing typecode check
diff --git a/docs/optimization.txt b/docs/optimization.txt
index 84c49c85..49a51794 100644
--- a/docs/optimization.txt
+++ b/docs/optimization.txt
@@ -2,7 +2,8 @@
 Optimization
 
 Here are some possible optimization techniques we can apply to code sections if
-they turn out to be slow.  Be sure not to prematurely optimize though!
+they turn out to be slow.  Be sure not to prematurely optimize: if you get
+that itch, put it here!
 
  - Make Tokens Flyweights (may prove problematic, probably not worth it)
  - Rewrite regexps into PHP code
diff --git a/docs/security.txt b/docs/security.txt
index d5b71295..695853d5 100644
--- a/docs/security.txt
+++ b/docs/security.txt
@@ -6,30 +6,39 @@ through negligence of people. This class will do its job: no more, no less,
 and it's up to you to provide it the proper information and proper context
 to be effective. Things to remember:
 
-1. UTF-8. Currently, the parser runs under the assumption that it is dealing
+1. Character Encoding: UTF-8
+Currently, the parser runs under the assumption that it is dealing
 with UTF-8. Not ISO-8859-1 or Windows-1252, UTF-8. And definitely not "no
 character encoding explicitly stated" or UTF-7. If you're not using UTF-8 as
-your character encoding, you should switch. Now. Make sure any input is
-properly converted to UTF-8, or the parser will mangle it badly
-(though it won't be a security risk if you're outputting it as UTF-8 though).
+your character encoding, configure HTML Purifier accordingly or switch
+to UTF-8. Now. Also, make sure any input is properly converted to UTF-8, or
+the parser will mangle it badly (though it won't be a security risk if you're
+outputting it as UTF-8).  Character encoding is, in general, a knotty
+issue, but do yourself a favor and learn about it:
+<http://www.joelonsoftware.com/articles/Unicode.html>
 
-2. XHTML 1.0 Transitional. This is what the parser is outputting. For the most
+2. Doctype: XHTML 1.0 Transitional
+This is what the parser is outputting. For the most
 part, it's compatible with HTML 4.01, but XHTML enforces some very nice things
 that all web developers should use. Regardless, NO DOCTYPE is a NO. Quirks mode
 has waaaay too many quirks for a little parser to handle.  We did not select
 strict in order to prevent ourselves from being too draconic on users, but
-this may be configurable in the future.
+this may be configurable in the future.  Do you want standards compliance?
+The doctype is a good place to start.
 
-3. IDs. They need to be unique, but without some knowledge of the
+3. IDs
+They need to be unique, but without some knowledge of the
 rest of the document, it's difficult to know what's unique. %Attr.IDBlacklist
 needs to be set: we may want to consider disallowing IDs by default to
 save lazy programmers.
 
-4. [PROJECTED] Links. We're not going to try for spam protection (although
+4. [PROJECTED] Links
+We're not going to try for spam protection (although
 some hooks for such a module might be nice) but we may offer the ability to
 only accept relative URLs. Pick the one that's right for you.
 
-5. CSS. While we can prevent the most flagrant cases from affecting your
+5. CSS
+While we can prevent the most flagrant cases from affecting your
 layout (such as absolutely positioned elements), no amount of code is going
 to protect your pages from being attacked by garish colors and plain old
 bad taste.  A neat feature would be the ability to define acceptable colors