0
0
mirror of https://github.com/ezyang/htmlpurifier.git synced 2024-12-22 08:21:52 +00:00

Update INSTALL document.

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1176 48356398-32a2-884e-a903-53898d9a118a
This commit is contained in:
Edward Z. Yang 2007-06-20 22:36:10 +00:00
parent 10c970760d
commit 840f9f7434

34
INSTALL
View File

@ -8,13 +8,11 @@ installation GUI, you've come to the wrong place!) The impatient can scroll
down to the bottom of this INSTALL document to see the code, but you really
should make sure a few things are properly done.
Todo: Convert to using the array syntax for configuration.
1. Compatibility
HTML Purifier works in both PHP 4 and PHP 5, from PHP 4.3.2 and up. It has no
core dependencies with other libraries. (Whoopee!)
core dependencies with other libraries.
Optional extensions are iconv (usually installed) and tidy (also common).
If you use UTF-8 and don't plan on pretty-printing HTML, you can get away with
@ -50,6 +48,7 @@ be standards compliant. HTML Purifier can deal with these doctypes:
* XHTML 1.0 Strict
* HTML 4.01 Transitional
* HTML 4.01 Strict
* XHTML 1.1 sans Ruby
...and these character encodings:
@ -68,11 +67,7 @@ the doctype from this code in your HTML documents:
<meta http-equiv="Content-type" content="text/html;charset=ENCODING">
For legacy codebases these declarations may be missing. If that is the case,
STOP, and read up on character encodings and doctypes (in that order). Here
are some links:
* http://www.joelonsoftware.com/articles/Unicode.html
* http://alistapart.com/stories/doctype/
STOP, and read docs/enduser-utf8.html
You may currently be vulnerable to XSS and other security threats, and HTML
Purifier won't be able to fix that.
@ -116,23 +111,20 @@ websites):
Note that HTML Purifier's support for non-Unicode encodings is crippled by the
fact that any character not supported by that encoding will be silently
dropped, EVEN if it is ampersand escaped. This is a current limitation of
HTML Purifier that we are NOT actively working to fix. Patches are welcome,
but there are so many other gotchas and problems in I18N for non-Unicode
encodings that this functionality is low priority. See
<http://ppewww.ph.gla.ac.uk/~flavell/charset/form-i18n.html> for a more
detailed lowdown on the topic.
dropped, EVEN if it is ampersand escaped. If you want to work around
this, you are welcome to read docs/enduser-utf8.html for a workaround,
but please be cognizant of the issues the "solution" creates.
4.2. Setting a different doctype
For those of you stuck using HTML 4.01 Transitional, you can disable
For those of you using HTML 4.01 Transitional, you can disable
XHTML output like this:
$config->set('HTML', 'Doctype', 'HTML 4.01 Transitional');
Supported doctypes include:
Other supported doctypes include:
* HTML 4.01 Strict
* HTML 4.01 Transitional
@ -191,4 +183,12 @@ If your website is in a different encoding or doctype, use this code:
$purifier = new HTMLPurifier($config);
$clean_html = $purifier->purify($dirty_html);
?>
?>
7. Caching
HTML Purifier generates some cache files to speed up its execution. For
maximum performance, make sure that library/HTMLPurifier/DefinitionCache/Serializer
is writeable by the webserver.