diff --git a/NEWS b/NEWS
index 34d9e1c8..9ef2fe26 100644
--- a/NEWS
+++ b/NEWS
@@ -13,7 +13,7 @@ NEWS ( CHANGELOG and HISTORY ) HTMLPurifier
 (major feature release)
 
 1.3.3, unknown release date, may be dropped
-(security/bugfix/minor feature release)
+! Moved SLOW to docs/enduser-slow.html and added code examples
 
 1.3.2, released 2006-12-25
 ! HTMLPurifier object now accepts configuration arrays, no need to manually
diff --git a/README b/README
index 78e171ad..bfd270d8 100644
--- a/README
+++ b/README
@@ -1,13 +1,22 @@
 
 README
-    All about HTMLPurifier
+    All about HTML Purifier
 
-HTMLPurifier is an HTML filtering solution. It uses a unique combination of
-robust whitelists and agressive parsing to ensure that not only are XSS
-attacks thwarted, but the resulting HTML is standards compliant.
+HTML Purifier is an HTML filtering solution that uses a unique combination
+of robust whitelists and aggressive parsing to ensure that not only are
+XSS attacks thwarted, but the resulting HTML is standards compliant.
 
-See INSTALL on how to use the library. See docs/ for more developer-oriented
-documentation as well as some code examples. Users of TinyMCE or FCKeditor
-may be especially interested in WYSIWYG.
+HTML Purifier is oriented towards richly formatted documents from
+untrusted sources that require CSS and a full tag-set. This library can
+be configured to accept a more restrictive set of tags, but it won't be
+as efficient as more bare-bones parsers. It will, however, do the job
+right, which may be more important.
 
-HTMLPurifier can be found on the web at: http://hp.jpsband.org/
+Places to go:
+
+* See INSTALL for a quick installation guide
+* See docs/ for developer-oriented documentation, code examples and
+  an in-depth installation guide.
+* See WYSIWYG for information on editors like TinyMCE and FCKeditor
+
+HTML Purifier can be found on the web at: http://hp.jpsband.org/
diff --git a/SLOW b/SLOW
deleted file mode 100644
index bc8616d9..00000000
--- a/SLOW
+++ /dev/null
@@ -1,40 +0,0 @@
-
-SLOW
-    also known as the HELP ME LIBRARY IS TOO SLOW MY PAGE TAKE TOO LONG LOAD page
-
-HTML Purifier is a very powerful library. But with power comes great
-responsibility, or, at least, longer execution times. Remember, this
-library isn't lightly grazing over submitted HTML: it's deconstructing
-the whole thing, rigorously checking the parts, and then putting it
-back together.
-
-So, if it so turns out that HTML Purifier is kinda too slow for outbound
-filtering, you've got a few options:
-
-1. Inbound filtering - perform filtering of HTML when it's submitted by the
-user. Since the user is already submitting something, an extra half a
-second tacked on to the load time probably isn't going to be that huge of
-a problem. Then, displaying the content is a simple a manner of outputting
-it directly from your database/filesystem. The trouble with this method is
-that your user loses the original text, and when doing edits, will be
-handling the filtered text. While this may be a good thing, especially if
-you're using a WYSIWYG editor, it can also result in data-loss if a user
-makes a typo.
-
-2. Caching the filtered output - accept the submitted text and put it
-unaltered into the database, but then also generate a filtered version and
-stash that in the database. Serve the filtered version to readers, and the
-unaltered version to editors. If need be, you can invalidate the cache and
-have the cached filtered version be regenerated on the first page view. Pros?
-Full data retention. Cons? It's more complicated, and opens other editors
-up to XSS if they are using a WYSIWYG editor (to fix that, they'd have to
-be able to get their hands on the *really* original text served in plaintext
-mode).
-
-In short, inbound filtering is almost as simple as outbound filtering, but
-it has some drawbacks which cannot be fixed unless you save both the original
-and the filtered versions.
-
-There is a third option: profile and optimize HTMLPurifier yourself. Be sure
-to report back your results if you decide to do that! Especially if you
-port HTML Purifier to C++. ;-)
diff --git a/WYSIWYG b/WYSIWYG
index 6fab8bcc..718f8959 100644
--- a/WYSIWYG
+++ b/WYSIWYG
@@ -18,4 +18,5 @@ HTML Purifier is perfect for filtering pure-HTML input from WYSIWYG editors.
 Enough said.
 
 There is a proof-of-concept integration of HTML Purifier with the Mantis
-bugtracker at http://hp.jpsband.org/mantis/
+bugtracker at http://hp.jpsband.org/mantis/ You can see notes on how
+this integration was achieved at http://hp.jpsband.org/mantis_notes.txt
diff --git a/docs/enduser-slow.html b/docs/enduser-slow.html
new file mode 100644
index 00000000..bac0704d
--- /dev/null
+++ b/docs/enduser-slow.html
@@ -0,0 +1,116 @@ + + +
+ + + + +HTML Purifier is a very powerful library. But with power comes great +responsibility, in the form of longer execution times. Remember, this +library isn't lightly grazing over submitted HTML: it's deconstructing +the whole thing, rigorously checking the parts, and then putting it back +together.
+ +So, if it turns out that HTML Purifier is too slow for outbound +filtering (filtering the HTML every time it is displayed), you've got a +few options:
+ +Perform filtering of HTML when it's submitted by the user. Since the +user is already submitting something, an extra half a second tacked on +to the load time probably isn't going to be a huge problem. +Then, displaying the content is a simple matter of outputting it +directly from your database/filesystem. The trouble with this method is +that your user loses the original text and, when editing, will be +working with the filtered text. While this may be a good thing, especially +if you're using a WYSIWYG editor, it can also result in data loss if a +user makes a typo.
+ +Example (non-functional):
<?php + /** + * FORM SUBMISSION PAGE + * display_error($message) : displays nice error page with message + * display_success() : displays a nice success page + * display_form() : displays the HTML submission form + * database_insert($html) : inserts data into database as new row + */ + if (!empty($_POST)) { + require_once '/path/to/library/HTMLPurifier.auto.php'; + require_once 'HTMLPurifier.func.php'; + $dirty_html = isset($_POST['html']) ? $_POST['html'] : false; + if (!$dirty_html) { + display_error('You must write some HTML!'); + exit; + } + $html = HTMLPurifier($dirty_html); + database_insert($html); + display_success(); + // notice that $dirty_html is *not* saved + } else { + display_form(); + } +?>+ +
Accept the submitted text and put it unaltered into the database, but +then also generate a filtered version and stash that in the database. +Serve the filtered version to readers, and the unaltered version to +editors. If need be, you can invalidate the cache and have the cached +filtered version be regenerated on the first page view. Pros? Full data +retention. Cons? It's more complicated, and opens other editors up to +XSS if they are using a WYSIWYG editor (to fix that, they'd have to be +able to get their hands on the *really* original text served in +plaintext mode).
+ +Example (non-functional):
+ +<?php + /** + * VIEW PAGE + * display_error($message) : displays nice error page with message + * cache_get($id) : retrieves HTML from fast cache (db or file) + * cache_insert($id, $html) : inserts good HTML into cache system + * database_get($id) : retrieves raw HTML from database + */ + $id = isset($_GET['id']) ? (int) $_GET['id'] : false; + if (!$id) { + display_error('Must specify ID.'); + exit; + } + $html = cache_get($id); // filesystem or database + if ($html === false) { + // cache didn't have the HTML, generate it + $raw_html = database_get($id); + require_once '/path/to/library/HTMLPurifier.auto.php'; + require_once 'HTMLPurifier.func.php'; + $html = HTMLPurifier($raw_html); + cache_insert($id, $html); + } + echo $html; +?>+ +
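The caching option above says the cache can be invalidated so the filtered copy is regenerated on the first page view, but the view-page example only shows the regeneration half. The following is a minimal, self-contained sketch of both halves under stated assumptions: `FilteredHtmlCache` is a hypothetical class (not part of HTML Purifier), plain PHP arrays stand in for the database and the cache, and the injected `$filter` callable stands in for a call to HTML Purifier. It assumes PHP 7 or later.

```php
<?php
// Hypothetical sketch of "store raw, cache filtered": arrays stand in for
// the database and the cache, and $filter stands in for HTML Purifier.
class FilteredHtmlCache
{
    /** @var callable raw HTML in, filtered HTML out */
    private $filter;
    /** @var array<int, string> unaltered submissions, served to editors */
    private $raw = [];
    /** @var array<int, string> filtered copies, served to readers */
    private $filtered = [];

    public function __construct(callable $filter)
    {
        $this->filter = $filter;
    }

    // Editor saves: keep the original text and drop any stale filtered copy.
    public function save(int $id, string $rawHtml): void
    {
        $this->raw[$id] = $rawHtml;
        unset($this->filtered[$id]); // invalidate; regenerated on next view
    }

    // Editors get the original text back, so nothing is lost to filtering.
    public function getRaw(int $id)
    {
        return $this->raw[$id] ?? false;
    }

    // Readers get the filtered copy; the filter only runs on a cache miss.
    public function getFiltered(int $id)
    {
        if (!isset($this->raw[$id])) {
            return false;
        }
        if (!isset($this->filtered[$id])) {
            $this->filtered[$id] = ($this->filter)($this->raw[$id]);
        }
        return $this->filtered[$id];
    }
}

// Usage with a stand-in filter that counts its runs. strip_tags() is NOT
// a safe substitute for HTML Purifier; it only keeps the demo self-contained.
$runs = 0;
$store = new FilteredHtmlCache(function (string $html) use (&$runs): string {
    $runs++;
    return strip_tags($html, '<b>');
});

$store->save(1, '<b>Hi</b> <blink>there</blink>');
echo $store->getFiltered(1), "\n"; // filter runs: prints "<b>Hi</b> there"
echo $store->getFiltered(1), "\n"; // cache hit: same output, filter not rerun
echo $runs, "\n";                  // prints "1"
```

In real code the filter would wrap HTML Purifier (for instance, the `HTMLPurifier()` function used in the examples) and the arrays would be database rows or cache files. Invalidating on save instead of refiltering immediately keeps the save path cheap and defers the filtering cost to the first reader, which is the regenerate-on-first-view behavior described for this option.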
In short, inbound filtering is the simple option and caching is the +robust option (albeit with bigger storage requirements).
+ +There is a third option, independent of the two we've discussed: profile +and optimize HTMLPurifier yourself. Be sure to report back your results +if you decide to do that! Especially if you port HTML Purifier to C++. +;-)
+ + + \ No newline at end of file diff --git a/docs/index.html b/docs/index.html index 12d839db..5179205a 100644 --- a/docs/index.html +++ b/docs/index.html @@ -28,6 +28,9 @@ information for casual developers using HTML Purifier.