mirror of
https://github.com/ezyang/htmlpurifier.git
synced 2024-12-22 16:31:53 +00:00
- Update filter-levels document to cover CSS and attributes
- Add colors proposal, for constraining allowed colors in document
- Add strictness proposal, for attributes that are permitted by Transitional but not by HTML Purifier

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@442 48356398-32a2-884e-a903-53898d9a118a
parent 4f8d83506d
commit 801dbcafb7
23
docs/colors.txt
Normal file
@@ -0,0 +1,23 @@
+
+Colors
+Hammering some sense into those content-makers
+
+Your website probably has a color-scheme. Green on white, purple on yellow,
+whatever. When you give users the ability to style their content, you may
+want them to keep in line with your styling. If your website is all
+about light colors, you don't want a user to come in and vandalize your
+page with a deep maroon.
+
+This is an extremely silly feature proposal, but I'm writing it down anyway.
+
+What if the user could constrain the colors specified in inline styles? You
+are only allowed to use these shades of dark green for text and these shades
+of light yellow for the background. At the very least, you could ensure
+that we did not have pale yellow text on a white background.
+
+Implementation issues:
+1. Requires the color attribute definition to know, currently, what the text
+   and background colors are. This becomes difficult when classes are thrown
+   into the mix.
+2. The user still has to define the permissible colors; how does one do
+   something like that?
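One way the palette idea in docs/colors.txt could look, as a rough Python sketch and not as HTML Purifier code: the palette values, the names `ALLOWED_TEXT_COLORS`, `ALLOWED_BACKGROUND_COLORS` and `constrain_inline_style` are all assumptions made for illustration. It only handles hex colors given directly in an inline style, so it sidesteps implementation issue 1 (knowing the effective text and background colors once classes are involved) rather than solving it.

```python
import re

# Hypothetical palettes; the proposal leaves open how a site owner
# would actually define the permissible colors.
ALLOWED_TEXT_COLORS = {"#003300", "#004d00", "#006600"}        # dark greens
ALLOWED_BACKGROUND_COLORS = {"#ffffcc", "#ffffe0", "#fffff0"}  # light yellows

HEX_COLOR = re.compile(r"^#(?:[0-9a-fA-F]{3}|[0-9a-fA-F]{6})$")


def normalize_hex(color):
    """Return a lowercase #rrggbb string, or None if this is not a hex color."""
    color = color.strip()
    if not HEX_COLOR.match(color):
        return None
    if len(color) == 4:
        # Expand the #rgb shorthand to #rrggbb.
        color = "#" + "".join(ch * 2 for ch in color[1:])
    return color.lower()


def constrain_inline_style(style):
    """Drop color and background-color declarations that fall outside the palette."""
    palettes = {
        "color": ALLOWED_TEXT_COLORS,
        "background-color": ALLOWED_BACKGROUND_COLORS,
    }
    kept = []
    for declaration in style.split(";"):
        if ":" not in declaration:
            continue
        prop, value = declaration.split(":", 1)
        prop = prop.strip().lower()
        value = value.strip()
        # Non-hex values normalize to None and are therefore rejected too.
        if prop in palettes and normalize_hex(value) not in palettes[prop]:
            continue
        kept.append(f"{prop}: {value}")
    return "; ".join(kept)
```

With these made-up palettes, a deep maroon text color is removed while a light yellow background and unrelated declarations pass through unchanged. Named colors and rgb() values are rejected outright here, which is stricter than a real implementation would probably want.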
@@ -20,15 +20,32 @@ can further be customized using simpler configuration options.
 Here are some fuzzy levels you could set:
 
 1. Comments - Wordpress recommends a, abbr, acronym, b, blockquote, cite,
-   code, em, i, strike, strong; however, you could get away with only a, b and
-   i; also having p and pre tags would be helpful.
-2. Pages - As permissive as possible without allowing XSS. No protection
+   code, em, i, strike, strong; however, you could get away with only a, em and
+   p; also having blockquote and pre tags would be helpful.
+2. BBCode - Emulate the usual tagset for forums: b, i, img, a, blockquote,
+   pre, div, span and h[2-6] (the last three are for specially formatted
+   posts; div and span require associated classes or inline styling enabled
+   to be useful)
+3. Pages - As permissive as possible without allowing XSS. No protection
    against bad design sense, unfortunately. Suitable for wiki and page
    environments.
-3. Lint - Accept everything in the spec, a Tidy wannabe.
+4. Lint - Accept everything in the spec, a Tidy wannabe. (This probably won't
+   get implemented as it would require routines for things like <object>
+   and friends to be implemented, which is a lot of work for not a lot of
+   benefit)
 
-I've also decomposed tags into risk levels. An asterisk indicates that no one
-really uses that tag, tilde indicates it's deprecated.
+One final note: when you start axing tags that are more commonly used, you
+run the risk of accidentally destroying user data, especially if the data
+is incoming from a WYSIWYG editor that hasn't been synced accordingly. This may
+make forbidden-element-to-text transformations desirable (for example, images).
+
+
+
+== Element Risk Analysis ==
+
+Legend:
+    [danger level] - regular tags / uncommon tags ~ deprecated tags
+    [danger level]* - rare tags
 
 1 - blockquote, code, em, i, p, tt / strong, sub, sup
 1* - abbr, acronym, bdo, cite, dfn, kbd, q, samp
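To make the fuzzy levels and the "forbidden element to text" note in this hunk more concrete, here is a small Python sketch. It is not HTML Purifier's implementation: the tag sets are copied from the Comments and BBCode descriptions above, while the level keys, `LevelFilter` and `filter_html` are hypothetical names. Attributes are dropped entirely to keep it short, so treat it as an illustration of tag allowlisting only, not as a sanitizer.

```python
from html import escape
from html.parser import HTMLParser

# Tag sets taken from the level descriptions in this hunk; the dictionary
# keys and all other names here are illustrative assumptions.
LEVELS = {
    "comments": {"a", "em", "p", "blockquote", "pre"},
    "bbcode": {"b", "i", "img", "a", "blockquote", "pre", "div", "span",
               "h2", "h3", "h4", "h5", "h6"},
}


class LevelFilter(HTMLParser):
    """Keep allowed tags (without attributes) and turn forbidden images into text."""

    def __init__(self, level):
        super().__init__(convert_charrefs=True)
        self.allowed = LEVELS[level]
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in self.allowed:
            self.out.append(f"<{tag}>")
        elif tag == "img":
            # Forbidden element to text: keep the alt text so that the
            # user's content is not silently destroyed.
            alt = dict(attrs).get("alt")
            if alt:
                self.out.append(escape(alt))

    def handle_endtag(self, tag):
        if tag in self.allowed and tag != "img":
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data))


def filter_html(html, level):
    parser = LevelFilter(level)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

For example, `filter_html('<p>Hi <b>there</b> <img src="x.png" alt="a cat"></p>', "comments")` keeps the paragraph, drops the `b` tags but not their text, and replaces the image with its alt text.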
@@ -38,30 +55,76 @@ really uses that tag, tilde indicates it's deprecated.
 5 - a
 7 - area, map
 
+These are special-use tags; they should be enabled on a blanket basis.
+
 Lists - dd, dl, dt, li, ol, ul ~ menu, dir
 Tables - caption, table, td, th, tr / col, colgroup, tbody, tfoot, thead
 
 Forms - fieldset, form, input, label, legend, optgroup, option, select, textarea
 XSS - noscript, object, script ~ applet
 
 Meta - base, basefont, body, head, html, link, meta, style, title
 Frames - frame, frameset, iframe
 
 And tag specific notes:
 
 a - general problems involving linkspam
 b - too much bold is bad, typographically speaking bold is discouraged
 br - often misused
 center - CSS, usually no legit use
 del - only useful in editing context
 div - little meaning in certain contexts, e.g. blog comments
 h1 - usually no legit use, as header is already set by application
 h* - not needed in blog comments
 hr - usually not necessary in blog comments
-img - could be extremely undesirable if linking to external pics
+img - could be extremely undesirable if linking to external pics (CSRF, goatse)
 pre - could use formatting, only useful in code contexts
 q - very little support
 s - transform into span with styling or del?
 small - technically presentational
 span - depends on attribute allowances
 sub, sup - specialized
 u - little legit use, prefer class with text-decoration
+
+Based on the riskiness of the items, we may want to offer a %HTML.DisableImages
+directive and put URI filtering higher up on the priority list.
+
+
+== Attribute Risk Analysis ==
+
+We actually have a surprisingly small assortment of allowed attributes (the
+rest are deprecated in strict, and thus we opted not to allow them, even
+though our output is XHTML Transitional by default).
+
+Required URI - img.alt, img.src, a.href
+Medium risk - *.class, *.dir
+High risk - img.height, img.width, *.id, *.style
+
+Table - colgroup/col.span, td/th.rowspan, td/th.colspan
+Uncommon - *.title, *.lang, *.xml:lang
+Rare - td/th.abbr, table.summary, {table}.charoff
+Rare URI - del.cite, ins.cite, blockquote.cite, q.cite, img.longdesc
+Presentational - {table}.align, {table}.valign, table.frame, table.rules,
+    table.border
+Partially presentational - table.cellpadding, table.cellspacing,
+    table.width, col.width, colgroup.width
+
+
+== CSS Risk Analysis ==
+
+Certain CSS properties are extremely useful inline, but as you get to more
+presentation-oriented styling it may not always be
+appropriate to inline them.
+
+Useful - clear, float, border-collapse, caption-side
+
+These CSS properties can break layouts if used improperly. We have excluded
+any CSS properties that are not currently implemented (such as position).
+
+Dangerous, can go outside container - float
+Easy to abuse - font-size, font-family (font), width
+Colored - background-color (background), border-color (border), color
+Dramatic - border, list-style-position (list-style), margin, padding,
+    text-align, text-indent, text-transform, vertical-align, line-height
+
+Dramatic properties substantially change the look of text in ways that should
+probably have been reserved for other areas.
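The CSS categories in this hunk could drive a simple per-category switch. The following Python sketch assumes a hypothetical `filter_style` helper and category keys; the property lists are copied from the analysis above, with the shorthands given in parentheses there treated as members of the same category. It splits declarations naively on semicolons, so it is only meant to show the shape of the idea.

```python
# Property lists copied from the CSS Risk Analysis above; the category keys
# and the function name are assumptions made for this sketch.
CSS_CATEGORIES = {
    "useful": {"clear", "float", "border-collapse", "caption-side"},
    "dangerous": {"float"},
    "easy_to_abuse": {"font-size", "font-family", "font", "width"},
    "colored": {"background-color", "background", "border-color", "border",
                "color"},
    "dramatic": {"border", "list-style-position", "list-style", "margin",
                 "padding", "text-align", "text-indent", "text-transform",
                 "vertical-align", "line-height"},
}


def filter_style(style, disabled):
    """Remove declarations whose property belongs to any disabled category."""
    blocked = set().union(*(CSS_CATEGORIES[name] for name in disabled))
    kept = []
    for declaration in style.split(";"):
        if ":" not in declaration:
            continue
        prop, value = declaration.split(":", 1)
        prop = prop.strip().lower()
        if prop in blocked:
            continue
        kept.append(f"{prop}: {value.strip()}")
    return "; ".join(kept)
```

Note that float sits in both the useful and the dangerous lists, and border in both colored and dramatic, so disabling either category removes it. Whether that overlap is the desired behavior is a design question the analysis leaves open.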
25
docs/strictness.txt
Normal file
@@ -0,0 +1,25 @@
+
+Is HTML Purifier Strict or Transitional?
+A little bit of helpful guidance
+
+Although HTML Purifier professes to support only Transitional
+HTML, it rejects a lot of attributes and elements that are, in fact,
+valid. You can investigate progress.html to find out precisely what we
+are doing to these *deprecated* attributes.
+
+However, users have found that Strict HTML imposes some quite unreasonable
+restrictions on certain things. The start and value attributes in ol and
+li (respectively) perhaps are the most contested. There is currently no
+widely supported browser method short of JavaScript that can replace these
+two deprecated attributes. HTML Purifier does not currently support them, but
+it might behoove us to do so while our output is still transitional.
+
+Fortunately, that's the only real bugger case. The others have near-perfect
+CSS equivalents, and were presentational anyway. However, the other question
+pops up: should we always convert these to the CSS forms when 1. the spec
+allows them anyway and 2. older browsers support them better? After all, the
+whole point of CSS is to separate styling from content, so inline styling
+doesn't solve that problem.
+
+It's an icky question, and we'll have to deal with it as more and more
+transforms get implemented.
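The "convert to CSS or leave alone" question from docs/strictness.txt can be sketched as a switchable transform. This Python fragment is an assumption-laden illustration: the `ATTR_TO_CSS` mapping and the `attributes_to_css` function are hypothetical, and which attributes HTML Purifier actually transforms is recorded in progress.html, not here. Values are copied without validation, and align only maps cleanly onto text-align for some elements.

```python
# Hypothetical mapping from deprecated presentational attributes to CSS
# properties, chosen for illustration only.
ATTR_TO_CSS = {
    "bgcolor": "background-color",
    "align": "text-align",
}


def attributes_to_css(attrs, convert=True):
    """Fold mapped attributes into an inline style; leave the rest untouched.

    With convert=False the attributes are returned unchanged, which models
    the option of keeping the Transitional forms that older browsers support.
    """
    if not convert:
        return dict(attrs)
    result = {}
    declarations = []
    for name, value in attrs.items():
        if name in ATTR_TO_CSS:
            declarations.append(f"{ATTR_TO_CSS[name]}: {value}")
        elif name != "style":
            # Attributes without a CSS equivalent, such as start on ol or
            # value on li, pass through as they are.
            result[name] = value
    # Existing inline style goes last so that it wins over converted values.
    existing = attrs.get("style", "").strip().rstrip(";")
    if existing:
        declarations.append(existing)
    if declarations:
        result["style"] = "; ".join(declarations)
    return result
```

Calling `attributes_to_css({"align": "center", "start": "3"})` moves align into the style attribute and keeps start, which matches the observation above that start and value have no CSS replacement.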