Install
    How to install HTML Purifier

HTML Purifier is designed to run out of the box,  so actually using the library
is extremely easy. (Although, if you were looking for a step-by-step
installation GUI, you've come to the wrong place!)  The impatient can scroll
down to the bottom of this INSTALL document to see the code, but you really
should make sure a few things are properly done.




1.  Compatibility

HTML Purifier works in both PHP 4 and PHP 5, from PHP 4.3.2 and up. It has no
core dependencies with other libraries.

Optional extensions are iconv (usually installed) and tidy (also common).
If you use UTF-8 and don't plan on pretty-printing HTML, you can get away with
not having either of these extensions.



2.  Including the library

Simply use:

    require_once '/path/to/library/HTMLPurifier.auto.php';

...and you're good to go.  Since HTML Purifier's codebase is fairly
large, I recommend only including HTML Purifier when you need it.

If you don't like your include_path to be fiddled around with, simply set
HTML Purifier's library/ directory to the include path yourself and then:

    require_once 'HTMLPurifier.php';

Only the contents in the library/ folder are necessary, so you can remove
everything else when using HTML Purifier in a production environment.  



3.  Preparing the proper output environment

HTML Purifier is all about web-standards, so accordingly your webpages should
be standards compliant.  HTML Purifier can deal with these doctypes:

* XHTML 1.0 Transitional (default)
* XHTML 1.0 Strict
* HTML 4.01 Transitional
* HTML 4.01 Strict
* XHTML 1.1 (sans Ruby)

...and these character encodings:

* UTF-8 (default)
* Any encoding iconv supports (support is crippled for i18n though)

The defaults are there for a reason: they are best-practice choices that
should not be changed lightly.  For those of you in the dark, you can determine
the doctype from this code in your HTML documents:

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

...and the character encoding from this code:

    <meta http-equiv="Content-type" content="text/html;charset=ENCODING">

For legacy codebases these declarations may be missing.  If that is the case,
STOP, and read docs/enduser-utf8.html





You may currently be vulnerable to XSS and other security threats, and HTML
Purifier won't be able to fix that.



4. Configuration

HTML Purifier is designed to run out-of-the-box, but occasionally HTML
Purifier needs to be told what to do.  If you answered no to any of these
questions, read on, otherwise, you can skip to the next section (or, if you're
into configuring things just for the heck of it, skip to 4.3).

* Am I using UTF-8?
* Am I using XHTML 1.0 Transitional?

If you answered no to any of these questions, instantiate a configuration
object and read on:

    $config = HTMLPurifier_Config::createDefault();



4.1. Setting a different character encoding

You really shouldn't use any other encoding except UTF-8, especially if you
plan to support multilingual websites (read section three for more details).
However, switching to UTF-8 is not always immediately feasible, so we can
adapt.

HTML Purifier uses iconv to support other character encodings, as such,
any encoding that iconv supports <http://www.gnu.org/software/libiconv/>
HTML Purifier supports with this code:

    $config->set('Core', 'Encoding', /* put your encoding here */);

An example usage for Latin-1 websites (the most common encoding for English
websites):

    $config->set('Core', 'Encoding', 'ISO-8859-1');

Note that HTML Purifier's support for non-Unicode encodings is crippled by the
fact that any character not supported by that encoding will be silently
dropped, EVEN if it is ampersand escaped.  If you want to work around
this, you are welcome to read docs/enduser-utf8.html for a fix,
but please be cognizant of the issues the "solution" creates (for this
reason, I do not include the solution in this document).






4.2. Setting a different doctype

For those of you using HTML 4.01 Transitional, you can disable
XHTML output like this:

    $config->set('HTML', 'Doctype', 'HTML 4.01 Transitional');

Other supported doctypes include:


    * HTML 4.01 Strict
    * HTML 4.01 Transitional
    * XHTML 1.0 Strict
    * XHTML 1.0 Transitional
    * XHTML 1.1



4.3. Other settings

There are more configuration directives which can be read about
here: <http://htmlpurifier.org/live/configdoc/plain.html>  They're a bit boring,
but they can help out for those of you who like to exert maximum control over
your code.  Some of the more interesting ones are configurable at the
demo <http://htmlpurifier.org/demo.php> and are well worth looking into
for your own system.



5.   Using the code

The interface is mind-numbingly simple:

    $purifier = new HTMLPurifier();
    $clean_html = $purifier->purify( $dirty_html );

...or, if you're using the configuration object:

    $purifier = new HTMLPurifier($config);
    $clean_html = $purifier->purify( $dirty_html );

That's it!  For more examples, check out docs/examples/ (they aren't very
different though).  Also, docs/enduser-slow.html gives advice on what to
do if HTML Purifier is slowing down your application.



6.   Quick install

First, make sure library/HTMLPurifier/DefinitionCache/Serializer is
writable by the webserver (see Section 7: Caching below for details).
If your website is in UTF-8 and XHTML Transitional, use this code:

<?php
    require_once '/path/to/htmlpurifier/library/HTMLPurifier.auto.php';
    
    $purifier = new HTMLPurifier();
    $clean_html = $purifier->purify($dirty_html);
?>

If your website is in a different encoding or doctype, use this code:

<?php
    require_once '/path/to/htmlpurifier/library/HTMLPurifier.auto.php';
    
    $config = HTMLPurifier_Config::createDefault();
    $config->set('Core', 'Encoding', 'ISO-8859-1'); // replace with your encoding
    $config->set('HTML', 'Doctype', 'HTML 4.01 Transitional'); // replace with your doctype
    $purifier = new HTMLPurifier($config);
    
    $clean_html = $purifier->purify($dirty_html);
?>



7. Caching

HTML Purifier generates some cache files (generally one or two) to speed up
its execution. For maximum performance, make sure that
library/HTMLPurifier/DefinitionCache/Serializer is writeable by the webserver.

If you are in the library/ folder of HTML Purifier, you can set the
appropriate permissions using:

    chmod -R 0755 HTMLPurifier/DefinitionCache/Serializer

If the above command doesn't work, you may need to assign write permissions
to all. This may be necessary if your webserver runs as nobody, but is
not recommended since it means any other user can write files in the
directory. Use:

    chmod -R 0777 HTMLPurifier/DefinitionCache/Serializer

You can also chmod files via your FTP client; this option
is usually accessible by right clicking the corresponding directory and
then selecting "chmod" or "file permissions".

Starting with 2.0.1, HTML Purifier will generate friendly error messages
that will tell you exactly what you have to chmod the directory to, if in doubt,
follow its advice.

If you are unable or unwilling to give write permissions to the cache
directory, you can either disable the cache (and suffer a performance
hit):

    $config->set('Core', 'DefinitionCache', null);

Or move the cache directory somewhere else (no trailing slash):

    $config->set('Cache', 'SerializerPath', '/home/user/absolute/path');