Install
    How to install HTML Purifier

Being a library, there's no fancy GUI that will take you step-by-step through
configuring database credentials and other mumbo-jumbo. HTML Purifier is
designed to run "out of the box." Regardless, there are still a couple of
things you should be mindful of.

0. Compatibility

HTML Purifier works in both PHP 4 and PHP 5. I have run the test suite on
these versions:

- 4.3.9, 4.3.11
- 4.4.0, 4.4.4
- 5.0.0, 5.0.4
- 5.1.0, 5.1.6

and can confidently say that HTML Purifier should work in all versions
between and afterwards. HTML Purifier definitely does not support PHP 4.2.
Within the PHP 4.3 branch, support may go further back than 4.3.9, but I
haven't tested any earlier versions.

I have been unable to get PHP 5.0.5 working on my computer, so if someone
wants to test that, be my guest. All tests were done on Windows XP Home,
but the operating system should not be a major factor for the library.
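
If you would rather fail early than discover an incompatibility halfway
through a request, you can compare PHP_VERSION against the versions listed
above with version_compare(). This is a minimal sketch: the function name
htmlpurifier_php_status() is hypothetical and not part of HTML Purifier, and
the 4.3.0 and 4.3.9 cut-offs simply restate the notes above (PHP 4.2 is
unsupported, 4.3.9 is the oldest tested release).

```php
<?php
// Hypothetical helper (not part of HTML Purifier). The thresholds restate the
// compatibility notes: PHP 4.2 and earlier are unsupported, and 4.3.9 is the
// oldest version the test suite was run on.
function htmlpurifier_php_status($version)
{
    if (version_compare($version, '4.3.0', '<')) {
        return 'unsupported'; // e.g. the PHP 4.2 branch
    }
    if (version_compare($version, '4.3.9', '<')) {
        return 'untested';    // 4.3 branch, older than any tested release
    }
    return 'ok';              // 4.3.9 or later, including PHP 5
}

// Example: warn early instead of failing somewhere inside the library.
if (htmlpurifier_php_status(PHP_VERSION) === 'unsupported') {
    trigger_error('HTML Purifier does not support PHP ' . PHP_VERSION,
        E_USER_WARNING);
}
```

version_compare() compares each dotted component numerically, so 4.3.11 is
correctly treated as newer than 4.3.9.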

1. Including the proper files

The library/ directory must be added to your include path: HTML Purifier will
not be able to find the necessary includes otherwise. This is as simple as:

    set_include_path('/path/to/htmlpurifier/library' . PATH_SEPARATOR .
        get_include_path());

...replacing /path/to/htmlpurifier with the actual location of the folder. Don't
worry, HTML Purifier is namespaced, so unless you have another file named
HTMLPurifier.php, the files won't collide with any of your includes.

Then, it's a simple matter of including the base file:

    require_once 'HTMLPurifier.php';

...and you're good to go.
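
If require_once 'HTMLPurifier.php' fails, the library/ directory is usually
not on the include path. The hypothetical helper below (not part of HTML
Purifier) walks the include path the same way PHP does, so you can see where,
if anywhere, HTMLPurifier.php would be found. It is a simplified sketch: PHP
itself also falls back to the calling script's own directory and the current
working directory, which this helper does not check.

```php
<?php
// Hypothetical helper (not part of HTML Purifier): return the first location
// on the include path where $file exists, or false if it cannot be found.
function find_on_include_path($file)
{
    $dirs = explode(PATH_SEPARATOR, get_include_path());
    foreach ($dirs as $dir) {
        if ($dir === '') {
            continue; // skip empty entries, e.g. from a trailing separator
        }
        $candidate = $dir . DIRECTORY_SEPARATOR . $file;
        if (file_exists($candidate)) {
            return $candidate;
        }
    }
    return false;
}

// Usage, after calling set_include_path() as shown above:
// if (find_on_include_path('HTMLPurifier.php') === false) {
//     die('HTMLPurifier.php not found; check your set_include_path() call');
// }
```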

2. Preparing the proper environment

While no configuration is necessary, you should first take precautions regarding
the rest of the HTML output that the filtered content will be embedded in. Here
is a (short) checklist:

* Have I specified XHTML 1.0 Transitional as the doctype?
* Have I specified UTF-8 as the character encoding?

To find out what your site currently uses, browse to your website and view its
source code. You can figure out the doctype from a declaration that looks like

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

or you may find no doctype at all. You can figure out the character encoding
by looking for

    <meta http-equiv="Content-type" content="text/html;charset=ENCODING">
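
Checking by eye works, but you can also let PHP do it. The following sketch
uses a hypothetical function, inspect_page_source(), that is not part of HTML
Purifier. It looks for the XHTML 1.0 Transitional public identifier and for a
charset declaration using plain string and regular-expression matching rather
than a real HTML parser, so treat a negative result as "look by hand", not as
proof. Keep in mind that a charset sent by your web server in the HTTP
Content-Type header takes precedence over the meta tag.

```php
<?php
// Hypothetical helper (not part of HTML Purifier): report the doctype and
// character encoding a page declares, using simple pattern matching.
function inspect_page_source($html)
{
    $info = array('xhtml_transitional' => false, 'charset' => null);

    // The public identifier of the XHTML 1.0 Transitional doctype.
    if (strpos($html, '-//W3C//DTD XHTML 1.0 Transitional//EN') !== false) {
        $info['xhtml_transitional'] = true;
    }

    // Matches e.g. <meta http-equiv="Content-type" content="text/html;charset=UTF-8">
    if (preg_match('/<meta[^>]+charset=\s*["\']?([A-Za-z0-9_.:-]+)/i', $html, $m)) {
        $info['charset'] = strtoupper($m[1]);
    }
    return $info;
}

// Usage (the URL is a placeholder for your own page):
// $info = inspect_page_source(file_get_contents('http://www.example.com/'));
```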

I cannot stress the importance of these two bullets enough. Omitting either
of them could have dire consequences, not only for security but for plain
old usability. You can find a more in-depth discussion of why this is needed
in docs/security.txt; in the meantime, try to change your output so this is
the case. If you can't, well, we might be able to accommodate you (read
section 3).
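
If your pages are generated by PHP, one way to satisfy both checklist items is
to send the encoding in the HTTP Content-Type header and print the doctype and
meta tag from a single place. This is a sketch only: page_prologue() is a
hypothetical name, not part of HTML Purifier, and it hard-codes XHTML 1.0
Transitional and UTF-8, the combination HTML Purifier handles without any
configuration.

```php
<?php
// Hypothetical helper: returns the top of an XHTML 1.0 Transitional page
// that declares UTF-8, matching the two checklist items above.
function page_prologue($title)
{
    return '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"' . "\n"
        . '    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">' . "\n"
        . '<html xmlns="http://www.w3.org/1999/xhtml">' . "\n"
        . '<head>' . "\n"
        . '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />' . "\n"
        // Escape the title so it cannot break out of the <title> element.
        . '<title>' . htmlspecialchars($title, ENT_QUOTES, 'UTF-8') . '</title>' . "\n"
        . '</head>' . "\n";
}

// Usage: send the header before any other output, then print the prologue.
// header('Content-Type: text/html; charset=UTF-8');
// echo page_prologue('My page');
```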

3. Configuring HTML Purifier

HTML Purifier is designed to run out-of-the-box, but occasionally HTML
Purifier needs to be told what to do. Settings are made on a configuration
object, which you create like this:

    $config = HTMLPurifier_Config::createDefault();

If, for some reason, you are unable to switch to UTF-8 immediately, you can
switch HTML Purifier's encoding. Note that the availability of encodings is
dependent on iconv, and you'll be missing characters if the charset you
choose doesn't have them.

    $config->set('Core', 'Encoding', /* put your encoding here */);

An example usage for Latin-1 websites:

    $config->set('Core', 'Encoding', 'ISO-8859-1');
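
Because the available encodings come from iconv, it can be worth checking that
iconv recognises your charset before setting it. A minimal sketch, assuming
the iconv extension is loaded; iconv_knows_encoding() is a hypothetical name,
not part of HTML Purifier. Converting a single ASCII character only tells you
that iconv knows the charset name, not that every character your users type
can be represented in it.

```php
<?php
// Hypothetical helper: true if iconv is available and recognises $encoding.
// Plain ASCII "a" converts into practically any charset iconv supports,
// so a false result means the charset name itself was rejected.
function iconv_knows_encoding($encoding)
{
    if (!function_exists('iconv')) {
        return false;
    }
    // @ suppresses the warning iconv emits for an unknown charset.
    return @iconv('UTF-8', $encoding, 'a') !== false;
}

// Usage with the configuration object:
// if (iconv_knows_encoding('ISO-8859-1')) {
//     $config->set('Core', 'Encoding', 'ISO-8859-1');
// }
```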

For those of you stuck using HTML 4.01 Transitional, you can disable
XHTML output like this:

    $config->set('Core', 'XHTML', false);

However, I strongly recommend that you use XHTML. Currently, we can only
guarantee transitional-compliant output; future versions will also allow
strict output.
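
If one piece of code serves several sites, you may prefer to derive the XHTML
setting from the doctype each site declares instead of hard-coding it. The
sketch below is hypothetical (xhtml_setting_for_doctype() is not part of HTML
Purifier): it turns XHTML off only when the doctype names HTML 4.01, in line
with the recommendation to use XHTML everywhere else.

```php
<?php
// Hypothetical helper: returns the value to pass as the third argument of
// $config->set('Core', 'XHTML', ...). Only an HTML 4.01 doctype (Strict or
// Transitional) disables XHTML; anything else, including no doctype, keeps it.
function xhtml_setting_for_doctype($doctype)
{
    // strtoupper() keeps the check case-insensitive without needing PHP 5's stripos().
    return strpos(strtoupper($doctype), '//DTD HTML 4.01') === false;
}

// Usage:
// $config->set('Core', 'XHTML', xhtml_setting_for_doctype($doctype));
```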

4. Using the code

The interface is mind-numbingly simple:

    $purifier = new HTMLPurifier();
    $clean_html = $purifier->purify($dirty_html);

Or, if you're using the configuration object:

    $purifier = new HTMLPurifier($config);
    $clean_html = $purifier->purify($dirty_html);

Here, $dirty_html is the untrusted HTML you want to filter, and $clean_html is
the filtered result. That's it. For more examples, check out docs/examples/.
Also, the SLOW file gives advice on what to do if HTML Purifier is slowing down
your application.

5. Quick install

If your website is in UTF-8 and XHTML Transitional, use this code:

    <?php
    set_include_path('/path/to/htmlpurifier/library'
        . PATH_SEPARATOR . get_include_path());
    require_once 'HTMLPurifier.php';

    $purifier = new HTMLPurifier();
    // $dirty_html holds the untrusted HTML you want to filter
    $clean_html = $purifier->purify($dirty_html);
    ?>

If your website is in a different encoding or doctype, use this code:

    <?php
    set_include_path('/path/to/htmlpurifier/library'
        . PATH_SEPARATOR . get_include_path());
    require_once 'HTMLPurifier.php';

    $config = HTMLPurifier_Config::createDefault();
    $config->set('Core', 'Encoding', 'ISO-8859-1'); // replace with your encoding
    $config->set('Core', 'XHTML', true); // replace with false if HTML 4.01
    $purifier = new HTMLPurifier($config);

    // $dirty_html holds the untrusted HTML you want to filter
    $clean_html = $purifier->purify($dirty_html);
    ?>