0
0
mirror of https://github.com/ezyang/htmlpurifier.git synced 2024-12-22 16:31:53 +00:00
htmlpurifier/docs/security.txt
Edward Z. Yang 4ef26bbd31 Update docs.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@254 48356398-32a2-884e-a903-53898d9a118a
2006-08-14 21:21:54 +00:00

40 lines
2.1 KiB
Plaintext

Security
Like anything that claims to afford security, HTML_Purifier can be circumvented
through negligence of people. This class will do its job: no more, no less,
and it's up to you to provide it the proper information and proper context
to be effective. Things to remember:
1. UTF-8. Currently, the parser runs under the assumption that it is dealing
with UTF-8. Not ISO-8859-1 or Windows-1252, UTF-8. And definitely not "no
character encoding explicitly stated" or UTF-7. If you're not using UTF-8 as
your character encoding, you should switch. Now. (in future versions, however,
I may make the character encoding configurable, but there's only so much I
can do). Make sure any input is properly converted to UTF-8, or the parser
will mangle it badly (though it won't be a security risk if you're outputting
it as UTF-8 though).
2. XHTML 1.0 Transitional. This is what the parser is outputting. For the most
part, it's compatible with HTML 4.01, but XHTML enforces some very nice things
that all web developers should use. Regardless, NO DOCTYPE is a NO. Quirks mode
has waaaay too many quirks for a little parser to handle. We did not select
strict in order to prevent ourselves from being too draconic on users, but
this may be configurable in the future.
3. IDs. They need to be unique, but without some knowledge of the
rest of the document, it's difficult to know what's unique. Without setting
%Attr.IDBlacklist to the proper
4. [PROJECTED] Links. We're not going to try for spam protection (although
some hooks for such a module might be nice) but we may offer the ability to
only accept relative URLs. Pick the one that's right for you.
5. CSS. While we can prevent the most flagrant cases from affecting your
layout (such as absolutely positioned elements), no amount of code is going
to protect your pages from being attacked by garish colors and plain old
bad taste. A neat feature would be the ability to define acceptable colors
in a document, but that's not likely to be implemented for a while. In the
meantime, be sure to make sure that floated elements (permitted, since they
can be quite useful) cna't mess up your layout.