diff --git a/INSTALL b/INSTALL
index d33f9612..090b157d 100644
--- a/INSTALL
+++ b/INSTALL
@@ -10,6 +10,7 @@ While the impatient can get going immediately with some of the sample
 code at the bottom of this library, it's well worth performing some
 basic sanity checks to get the most out of this library.
 
+
 ---------------------------------------------------------------------------
 1. Compatibility
 
@@ -23,6 +24,7 @@ These optional extensions can enhance the capabilities of HTML Purifier:
 * iconv : Converts text to and from non-UTF-8 encodings
 * tidy : Used for pretty-printing HTML
 
+
 ---------------------------------------------------------------------------
 2. Reconnaissance
 
@@ -42,7 +44,7 @@ HTML Purifier can process these doctypes:
 ...and these character encodings:
 
 * UTF-8 (default)
-* Any encoding iconv supports (but crippled internationalization support)
+* Any encoding iconv supports (with crippled internationalization support)
 
 These defaults reflect what my choices where be if I were authoring an
 HTML document, however, what you choose depends on the nature of your
@@ -59,8 +61,9 @@ the doctype from this identifier at the top of your source code:
 If the character encoding declaration is missing, STOP NOW, and read
 'docs/enduser-utf8.html' (web accessible at
 http://htmlpurifier.org/docs/enduser-utf8.html). In fact, even if it is
-present, read that anyway: most websites specify character encoding
-incorrectly.
+present, read this document anyway, as most websites specify character
+encoding incorrectly.
+
 
 ---------------------------------------------------------------------------
 3. Including the library
@@ -70,7 +73,8 @@ The procedure is quite simple:
     require_once '/path/to/library/HTMLPurifier.auto.php';
 
 I recommend only including HTML Purifier when you need it, because that
-call represents the inclusion of a lot of PHP files.
+call represents the inclusion of a lot of PHP files which constitute
+the bulk of HTML Purifier's memory usage.
 
 If you don't like your include_path to be fiddled around with, simply set
 HTML Purifier's library/ directory to the include path yourself and then:
@@ -98,7 +102,6 @@ object and read on:
     $config = HTMLPurifier_Config::createDefault();
 
 
-
 4.1. Setting a different character encoding
 
 You really shouldn't use any other encoding except UTF-8, especially if you
@@ -125,7 +128,6 @@ but please be cognizant of the issues the "solution" creates (for this
 reason, I do not include the solution in this document).
 
 
-
 4.2. Setting a different doctype
 
 For those of you using HTML 4.01 Transitional, you can disable
@@ -142,7 +144,6 @@ Other supported doctypes include:
 * XHTML 1.1
 
 
-
 4.3. Other settings
 
 There are more configuration directives which can be read about
@@ -151,56 +152,25 @@ but they can help out for those of you who like to exert maximum control
 over your code. Some of the more interesting ones are configurable at the
 demo and are well worth looking into for your own system.
 
-
----------------------------------------------------------------------------
-5. Using the code
+For example, you can fine tune allowed elements and attributes, convert
+relative URLs to absolute ones, and even autoparagraph input text! These
+are, respectively, %HTML.Allowed, %URI.MakeAbsolute and %URI.Base, and
+%AutoFormat.AutoParagraph. The %Namespace.Directive naming convention
+translates to:
 
-The interface is mind-numbingly simple:
+    $config->set('Namespace', 'Directive', $value);
 
-    $purifier = new HTMLPurifier();
-    $clean_html = $purifier->purify( $dirty_html );
+E.g.
 
-...or, if you're using the configuration object:
-
-    $purifier = new HTMLPurifier($config);
-    $clean_html = $purifier->purify( $dirty_html );
-
-That's it! For more examples, check out docs/examples/ (they aren't very
-different though). Also, docs/enduser-slow.html gives advice on what to
-do if HTML Purifier is slowing down your application.
+    $config->set('HTML', 'Allowed', 'p,b,a[href],i');
+    $config->set('URI', 'Base', 'http://www.example.com');
+    $config->set('URI', 'MakeAbsolute', true);
+    $config->set('AutoFormat', 'AutoParagraph', true);
 
 
 ---------------------------------------------------------------------------
-6. Quick install
-
-First, make sure library/HTMLPurifier/DefinitionCache/Serializer is
-writable by the webserver (see Section 7: Caching below for details).
-If your website is in UTF-8 and XHTML Transitional, use this code:
-
-purify($dirty_html);
-?>
-
-If your website is in a different encoding or doctype, use this code:
-
-set('Core', 'Encoding', 'ISO-8859-1'); // replace with your encoding
-    $config->set('HTML', 'Doctype', 'HTML 4.01 Transitional'); // replace with your doctype
-    $purifier = new HTMLPurifier($config);
-
-    $clean_html = $purifier->purify($dirty_html);
-?>
-
-
----------------------------------------------------------------------------
-7. Caching
+5. Caching
 
 HTML Purifier generates some cache files (generally one or two) to speed up
 its execution. For maximum performance, make sure that
@@ -236,3 +206,49 @@ Or move the cache directory somewhere else (no trailing slash):
 
     $config->set('Cache', 'SerializerPath', '/home/user/absolute/path');
 
+
+---------------------------------------------------------------------------
+6. Using the code
+
+The interface is mind-numbingly simple:
+
+    $purifier = new HTMLPurifier();
+    $clean_html = $purifier->purify( $dirty_html );
+
+...or, if you're using the configuration object:
+
+    $purifier = new HTMLPurifier($config);
+    $clean_html = $purifier->purify( $dirty_html );
+
+That's it! For more examples, check out docs/examples/ (they aren't very
+different though). Also, docs/enduser-slow.html gives advice on what to
+do if HTML Purifier is slowing down your application.
+
+
+---------------------------------------------------------------------------
+7. Quick install
+
+First, make sure library/HTMLPurifier/DefinitionCache/Serializer is
+writable by the webserver (see Section 5: Caching above for details).
+If your website is in UTF-8 and XHTML Transitional, use this code:
+
+purify($dirty_html);
+?>
+
+If your website is in a different encoding or doctype, use this code:
+
+set('Core', 'Encoding', 'ISO-8859-1'); // replace with your encoding
+    $config->set('HTML', 'Doctype', 'HTML 4.01 Transitional'); // replace with your doctype
+    $purifier = new HTMLPurifier($config);
+
+    $clean_html = $purifier->purify($dirty_html);
+?>
+
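
Note on the %Namespace.Directive convention added in section 4.3: the
mapping from a directive name to the three-argument set() call can be
sketched as follows. This is only an illustration under stated
assumptions. The helper split_directive() is hypothetical and is not part
of HTML Purifier; it merely splits a directive name into the two strings
that the patch passes to $config->set(). The directive names and values
are the ones from the added INSTALL text, and the real set() call is left
as a comment so the snippet runs without the library.

```php
<?php
// Hypothetical helper (NOT part of HTML Purifier): split a name written
// in the %Namespace.Directive notation into [namespace, directive].
function split_directive(string $name): array
{
    $trimmed = ltrim($name, '%');          // drop the leading '%' marker
    $parts = explode('.', $trimmed, 2);    // split on the first dot only
    if (count($parts) !== 2 || $parts[0] === '' || $parts[1] === '') {
        throw new InvalidArgumentException(
            "Expected %Namespace.Directive, got: $name"
        );
    }
    return $parts;
}

// Directives and values taken from the INSTALL text in this patch.
$settings = [
    '%HTML.Allowed'             => 'p,b,a[href],i',
    '%URI.Base'                 => 'http://www.example.com',
    '%URI.MakeAbsolute'         => true,
    '%AutoFormat.AutoParagraph' => true,
];

// Collect the argument triples that would be handed to set().
$calls = [];
foreach ($settings as $name => $value) {
    [$namespace, $directive] = split_directive($name);
    $calls[] = [$namespace, $directive, $value];
    // With the library loaded and $config created as in section 4,
    // each triple would be applied as:
    //     $config->set($namespace, $directive, $value);
}
```

Keeping the settings in one array keyed by the documented directive names
makes it easy to compare a configuration against the configuration
documentation, at the cost of one extra (hypothetical) helper.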