Install
How to install HTML Purifier
Because HTML Purifier is a library, there's no fancy GUI that will take you
step-by-step through configuring database credentials and other mumbo-jumbo.
HTML Purifier is
designed to run "out of the box." Regardless, there are still a couple of
things you should be mindful of.
0. Compatibility
HTML Purifier works in both PHP 4 and PHP 5. I have run the test suite on
these versions:
- 4.3.9, 4.3.11
- 4.4.0, 4.4.4
- 5.0.0, 5.0.4
- 5.1.0, 5.1.6
I can confidently say that HTML Purifier should work in all versions in
between these and in later ones. HTML Purifier definitely does not support
PHP 4.2. Support within the PHP 4.3 branch may extend further back than
4.3.9, but I haven't tested any earlier versions.
I have been unable to get PHP 5.0.5 working on my computer, so if someone
wants to test that, be my guest. All tests were done on Windows XP Home,
but the operating system is largely irrelevant in this particular case.
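If you want your application to notice an untested PHP version at startup, a check along these lines may help. This is only a sketch: purifier_php_version_ok() is a hypothetical helper, not part of HTML Purifier, and the 4.3.9 threshold is simply the oldest version listed above.

```php
<?php
// Hypothetical helper (not part of HTML Purifier): reports whether the
// given PHP version is at least the oldest release the test suite ran on.
function purifier_php_version_ok($version)
{
    // 4.3.9 is the oldest tested version listed in this document.
    return version_compare($version, '4.3.9', '>=');
}

// Warn, rather than abort, when running on an untested older version.
if (!purifier_php_version_ok(PHP_VERSION)) {
    trigger_error('PHP ' . PHP_VERSION . ' is older than any tested version',
        E_USER_WARNING);
}
```

A warning is used here instead of a hard failure because earlier 4.3 releases are untested, not known to be broken.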
1. Including the proper files
The library/ directory must be added to your include path: HTML Purifier will
not be able to find the necessary includes otherwise. This is as simple as:
set_include_path('/path/to/htmlpurifier/library' . PATH_SEPARATOR .
get_include_path() );
...replacing /path/to/htmlpurifier with the actual location of the folder. Don't
worry: HTML Purifier's files are namespaced under the HTMLPurifier name, so
unless you have another file named HTMLPurifier.php, the files won't collide
with any of your includes.
Then, it's a simple matter of including the base file:
require_once 'HTMLPurifier.php';
...and you're good to go.
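If the include fails, the usual culprit is a wrong library path. The steps above can be wrapped in a small guard, sketched here under the assumption that HTMLPurifier.php sits directly inside the library/ directory; purifier_add_to_include_path() is a hypothetical name, not something HTML Purifier provides.

```php
<?php
// Hypothetical helper (not part of HTML Purifier): prepends $library_dir
// to the include path and reports whether HTMLPurifier.php exists there.
function purifier_add_to_include_path($library_dir)
{
    set_include_path($library_dir . PATH_SEPARATOR . get_include_path());
    return file_exists($library_dir . '/HTMLPurifier.php');
}

// Replace the placeholder path with the actual location of the folder.
if (purifier_add_to_include_path('/path/to/htmlpurifier/library')) {
    require_once 'HTMLPurifier.php';
}
```

Checking for the file first lets you report a readable error of your own instead of a fatal require_once failure.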
2. Preparing the proper environment
While no configuration is necessary, you should first take precautions regarding
the rest of the HTML page in which the filtered content will be output. Here
is a (short) checklist:
* Have I specified XHTML 1.0 Transitional as the doctype?
* Have I specified UTF-8 as the character encoding?
To find out what these are, browse to your website and view its source code.
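You can figure out the doctype from a declaration near the top of the source;
for XHTML 1.0 Transitional it looks like
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
although some pages have no doctype at all. You can figure out the character
encoding by looking for a meta tag such as
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">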
I cannot stress the importance of these two bullets enough. Omitting either
of them could have dire consequences not only for security but for plain
old usability. You can find a more in-depth discussion of why this is needed
in docs/security.txt. In the meantime, try to change your output so this is
the case.
If, for some reason, you are unable to switch to UTF-8 immediately, you can
switch HTML Purifier's encoding. Note that the availability of encodings is
dependent on iconv, and you'll be missing characters if the charset you
choose doesn't have them.
$config = HTMLPurifier_Config::createDefault();
$config->set('Core', 'Encoding', /* put your encoding here */);
An example usage for Latin-1 websites:
$config = HTMLPurifier_Config::createDefault();
$config->set('Core', 'Encoding', 'ISO-8859-1');
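Because the available encodings depend on iconv, it can be worth probing for an encoding before you configure it. The following is a rough sketch using only PHP's own iconv() function; purifier_encoding_available() is a hypothetical helper and is not part of HTML Purifier.

```php
<?php
// Hypothetical helper (not part of HTML Purifier): reports whether iconv
// is installed and can convert text from $encoding to UTF-8.
function purifier_encoding_available($encoding)
{
    if (!function_exists('iconv')) {
        return false;
    }
    // The @ suppresses the error iconv raises for an unknown charset;
    // in that case iconv() returns false.
    $probe = @iconv($encoding, 'UTF-8', 'test');
    return $probe !== false;
}

// Intended use: only call $config->set('Core', 'Encoding', ...) with an
// encoding for which this check returns true.
```

This only confirms that iconv recognizes the charset. It cannot tell you whether the charset contains every character your content needs.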
3. Using the code
The interface is mind-numbingly simple:
$purifier = new HTMLPurifier();
$clean_html = $purifier->purify($dirty_html);
Or, if you're using the configuration object:
$purifier = new HTMLPurifier($config);
$clean_html = $purifier->purify($dirty_html);
That's it. For more examples, check out docs/examples/. Also, the SLOW file
gives advice on what to do if HTML Purifier is slowing down your application.
4. Quick install
If your website is in UTF-8, use this code:
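<?php
set_include_path('/path/to/htmlpurifier/library' . PATH_SEPARATOR .
    get_include_path() );
require_once 'HTMLPurifier.php';
$purifier = new HTMLPurifier();
$clean_html = $purifier->purify($dirty_html);
?>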
If your website is in a different encoding, use this code:
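<?php
set_include_path('/path/to/htmlpurifier/library' . PATH_SEPARATOR .
    get_include_path() );
require_once 'HTMLPurifier.php';
$config = HTMLPurifier_Config::createDefault();
$config->set('Core', 'Encoding', 'ISO-8859-1'); //replace with your encoding
$purifier = new HTMLPurifier($config);
$clean_html = $purifier->purify($dirty_html);
?>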