From a43a2730bc24459a44225aabb2b45eda2a1da27f Mon Sep 17 00:00:00 2001
From: "Edward Z. Yang" <edwardzyang@thewritingpot.com>
Date: Sat, 26 Aug 2006 18:44:50 +0000
Subject: [PATCH] Add filter levels document, detailing how to extend
 Definition.

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@320 48356398-32a2-884e-a903-53898d9a118a
---
 docs/filter-levels.txt | 67 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 67 insertions(+)
 create mode 100644 docs/filter-levels.txt

diff --git a/docs/filter-levels.txt b/docs/filter-levels.txt
new file mode 100644
index 00000000..09b0563f
--- /dev/null
+++ b/docs/filter-levels.txt
@@ -0,0 +1,67 @@
+
+Filter Levels
+    When one size *does not* fit all
+
+The more I think about it, the less sense it makes to maintain one huge
+monolithic HTMLDefinition class.  There's simply so much variation that
+could go into this definition: the set of HTML good for blog entries is
+definitely too large for HTML that would be allowed in blog comments. Going
+from Transitional to Strict requires changes to the definition.
+
+However, allowing users to specify their own whitelists was an idea I
+rejected from the start.  Simply put, the typical programmer is too lazy
+to actually go through the trouble of investigating which tags, attributes
+and properties to allow.  HTMLDefinition makes up a big part of what
+HTMLPurifier is.
+
+The idea, then, is to set up fundamentally different sets of definitions, which
+can be further customized using simpler configuration options.
+
+Here are some fuzzy levels you could set:
+
+1. Comments - WordPress recommends a, abbr, acronym, b, blockquote, cite,
+    code, em, i, strike, strong; however, you could get away with only a, b and
+    i; also having p and pre tags would be helpful.
+2. Pages - As permissive as possible without allowing XSS.  No protection
+    against bad design sense, unfortunately.  Suitable for wiki and page
+    environments.
+3. Lint - Accept everything in the spec, a Tidy wannabe.
+
+I've also decomposed tags into risk levels.  An asterisk indicates that no one
+really uses that tag; a tilde indicates it's deprecated.
+
+1 - blockquote, code, em, i, p, tt / strong, sub, sup
+1* - abbr, acronym, bdo, cite, dfn, kbd, q, samp
+2 - b, br, del, div, pre, span / ins, s, strike ~ u
+3 - h2, h3, h4, h5, h6 ~ center
+4 - h1, big ~ font
+5 - a
+7 - area, map
+
+Lists - dd, dl, dt, li, ol, ul ~ menu, dir
+Tables - caption, table, td, th, tr / col, colgroup, tbody, tfoot, thead
+Forms - fieldset, form, input, label, legend, optgroup, option, select, textarea
+XSS - noscript, object, script ~ applet
+
+Meta - base, basefont, body, head, html, link, meta, style, title
+Frames - frame, frameset, iframe
+
+And tag specific notes:
+
+a - general problems involving linkspam
+b - too much bold is bad; typographically speaking, bold is discouraged
+br - often misused
+center - CSS, usually no legit use
+del - only useful in editing context
+div - little meaning in certain contexts, e.g. blog comments
+h1 - usually no legit use, as header is already set by application
+h* - not needed in blog comments
+hr - usually not necessary in blog comments
+img - could be extremely undesirable if linking to external pics
+pre - could use formatting, only useful in code contexts
+q - very little support
+s - transform into span with styling or del?
+small - technically presentational
+span - depends on attribute allowances
+sub, sup - specialized
+u - little legit use, prefer class with text-decoration