Mirror of https://github.com/ezyang/htmlpurifier.git, synced 2024-11-09 15:28:40 +00:00
Add a security document, detailing issues that white-listing won't resolve.
git-svn-id: http://htmlpurifier.org/svnroot/html_purifier/trunk@45 48356398-32a2-884e-a903-53898d9a118a
Parent: 83f735ea7e
Commit: 4d2ec806ac
docs/security.txt (new file, 34 lines)

== Possible Security Issues ==

Like anything that claims to afford security, HTML_Purifier can be circumvented
through negligence on the part of the people using it. This class will do its
job, no more and no less, and it's up to you to give it the proper information
and proper context to be effective. Things to remember:

1. UTF-8. Currently, the parser runs under the assumption that it is dealing
with UTF-8. Not ISO-8859-1 or Windows-1252: UTF-8. And definitely not "no
character encoding explicitly stated" or UTF-7. If you're not using UTF-8 as
your character encoding, you should switch. Now. (In future versions I may
make the character encoding configurable, but there's only so much I can do.)
Make sure any input is properly converted to UTF-8, or the parser will mangle
it badly (though that won't be a security risk as long as you're outputting it
as UTF-8).
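
A minimal Python sketch of that conversion step, assuming raw input arrives as
bytes together with whatever charset its source declared (if any). The helper
name normalize_to_utf8() is hypothetical and not part of HTML_Purifier. When no
charset is declared it refuses to guess and accepts only valid UTF-8, and it
rejects UTF-7 outright.

```python
import codecs
from typing import Optional


def normalize_to_utf8(raw: bytes, declared_charset: Optional[str] = None) -> bytes:
    """Hypothetical pre-filter: return the input re-encoded as UTF-8.

    Raises UnicodeDecodeError if the bytes don't match the charset,
    LookupError for unknown charset names, and ValueError for UTF-7.
    """
    if declared_charset is None:
        # No charset stated: don't guess, only accept input that is valid UTF-8.
        text = raw.decode("utf-8")
    else:
        # codecs.lookup() maps aliases like "UTF7" or "windows-1252" to one codec.
        info = codecs.lookup(declared_charset)
        if info.name == "utf-7":
            raise ValueError("UTF-7 input is refused")
        text = raw.decode(info.name)
    # From here on, everything handed to the parser is UTF-8.
    return text.encode("utf-8")
```

Refusing to guess is a deliberate choice in this sketch: a wrong guess silently
produces mangled text, while an exception forces the caller to find out what
the input really is.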

2. XHTML 1.0. This is what the parser outputs. For the most part it's
compatible with HTML 4.01, but XHTML enforces some very nice things that all
web developers should use. Whichever you use, NO DOCTYPE is a NO: quirks mode
has waaaay too many quirks for a little parser to handle.
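
As an illustration, a page embedding purified output could always open with a
full XHTML 1.0 doctype so browsers stay out of quirks mode. This Python sketch
assumes the fragment has already been purified. wrap_fragment() is a
hypothetical helper, and picking the Strict DTD (rather than Transitional) and
adding a UTF-8 meta tag are assumptions of the sketch.

```python
import html

# XHTML 1.0 Strict doctype, as published by the W3C.
XHTML1_STRICT_DOCTYPE = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
)


def wrap_fragment(purified_fragment: str, title: str) -> str:
    """Hypothetical helper: embed an already-purified fragment in a full page.

    The doctype keeps browsers out of quirks mode, and the meta tag declares
    UTF-8 so the page's encoding matches what the parser assumes.
    """
    return "\n".join([
        XHTML1_STRICT_DOCTYPE,
        '<html xmlns="http://www.w3.org/1999/xhtml">',
        "<head>",
        '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />',
        # The title is plain text, so escape it instead of trusting it.
        "<title>" + html.escape(title) + "</title>",
        "</head>",
        "<body>",
        purified_fragment,
        "</body>",
        "</html>",
    ])
```

If you control the server, also send a "Content-Type: text/html; charset=UTF-8"
HTTP header; browsers give the header precedence over the meta tag.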

3. [PROJECTED] IDs. They need to be unique, but without some knowledge of the
rest of the document, it's difficult to know what's unique. I expect the
default behavior to be a customizable prefix added to every ID declared in the
filtered HTML, so make sure your own IDs don't use that prefix. This might also
cause problems when several pieces of purified output appear on the same page
(especially when caching is involved). Perhaps it's best to just zap IDs
completely. This will be configurable, and you'll have to pick the correct
option.
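
Since this behavior is only projected, here is one possible reading of the
prefix idea as a Python sketch. The prefix_id() helper, the seen set and the
"user_" prefix are all hypothetical. Each user-supplied ID gets the prefix,
must still form a valid HTML 4.01 ID token, and is dropped if it would repeat
an ID already emitted. Always returning None would be the "zap them completely"
option.

```python
import re
from typing import Optional, Set

# HTML 4.01 ID token: a letter, then letters, digits, "-", "_", ":" or ".".
_ID_TOKEN = re.compile(r"[A-Za-z][A-Za-z0-9_:.-]*")


def prefix_id(user_id: str, prefix: str, seen: Set[str]) -> Optional[str]:
    """Hypothetical ID policy: return the prefixed ID, or None to drop it."""
    candidate = prefix + user_id
    if not _ID_TOKEN.fullmatch(candidate):
        return None  # not a valid ID token, even after prefixing
    if candidate in seen:
        return None  # would collide with an ID already on the page
    seen.add(candidate)
    return candidate
```

Sharing one seen set across every fragment rendered on a page handles the
multiple-instances case. It can't help fragments that were purified separately
and cached, which is the caching problem noted above.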

4. [PROJECTED] Links. We're not going to try for spam protection (although
some hooks for such a module might be nice), but we may offer the option to
accept only relative URLs. Pick the one that's right for you.
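
If a relative-URLs-only option does materialize, the core check might look like
this Python sketch. is_relative_url() is a hypothetical name, and the attribute
value is assumed to be entity-decoded already (otherwise "javascript&#58;" would
sneak through). A URL is accepted only if it has no scheme, judged by the
RFC 3986 scheme syntax, and does not start with "//", which browsers treat as a
link to another host.

```python
import re

# A scheme is a letter followed by letters, digits, "+", "-" or ".", then ":".
_SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*:")


def is_relative_url(url: str) -> bool:
    """Hypothetical filter: accept only URLs with no scheme and no host."""
    # Browsers silently drop tabs/newlines and, on http(s) pages, treat a
    # backslash like "/", so "java<TAB>script:..." or "\\evil.example" could
    # slip past a naive check. Rejecting every ASCII control character and
    # every backslash is a conservative superset of those tricks.
    if "\\" in url or any(ord(ch) < 0x20 or ord(ch) == 0x7F for ch in url):
        return False
    # Leading/trailing spaces are ignored by browsers; check what they will see.
    candidate = url.strip(" ")
    if _SCHEME.match(candidate):
        return False  # absolute URL: http:, mailto:, javascript:, ...
    if candidate.startswith("//"):
        return False  # protocol-relative URL pointing at another host
    return True
```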

5. [PROJECTED] CSS. What a knotty issue. It will probably have to be
configurable.