Mirror of https://github.com/ezyang/htmlpurifier.git, synced 2024-12-22 08:21:52 +00:00
[1.3.3]
- Move SLOW to docs/enduser-slow.html and add code examples
- Update README and WYSIWYG
- Add warning to HTMLPurifier.func.php about naming similarities

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@635 48356398-32a2-884e-a903-53898d9a118a
parent 3ad6239dc3
commit e2cc37724b
NEWS (2 lines changed)
@@ -13,7 +13,7 @@ NEWS ( CHANGELOG and HISTORY ) HTMLPurifier
 (major feature release)
 
 1.3.3, unknown release date, may be dropped
-(security/bugfix/minor feature release)
+! Moved SLOW to docs/enduser-slow.html and added code examples
 
 1.3.2, released 2006-12-25
 ! HTMLPurifier object now accepts configuration arrays, no need to manually
README (25 lines changed)
@@ -1,13 +1,22 @@
 
 README
-All about HTMLPurifier
+All about HTML Purifier
 
-HTMLPurifier is an HTML filtering solution. It uses a unique combination of
-robust whitelists and aggressive parsing to ensure that not only are XSS
-attacks thwarted, but the resulting HTML is standards compliant.
+HTML Purifier is an HTML filtering solution that uses a unique combination
+of robust whitelists and aggressive parsing to ensure that not only are
+XSS attacks thwarted, but the resulting HTML is standards compliant.
 
-See INSTALL on how to use the library. See docs/ for more developer-oriented
-documentation as well as some code examples. Users of TinyMCE or FCKeditor
-may be especially interested in WYSIWYG.
+HTML Purifier is oriented towards richly formatted documents from
+untrusted sources that require CSS and a full tag-set. This library can
+be configured to accept a more restrictive set of tags, but it won't be
+as efficient as more bare-bones parsers. It will, however, do the job
+right, which may be more important.
 
-HTMLPurifier can be found on the web at: http://hp.jpsband.org/
+Places to go:
+
+* See INSTALL for a quick installation guide
+* See docs/ for developer-oriented documentation, code examples and
+  an in-depth installation guide.
+* See WYSIWYG for information on editors like TinyMCE and FCKeditor
+
+HTML Purifier can be found on the web at: http://hp.jpsband.org/
SLOW (40 lines changed)
@@ -1,40 +0,0 @@
-
-SLOW
-also known as the HELP ME LIBRARY IS TOO SLOW MY PAGE TAKE TOO LONG LOAD page
-
-HTML Purifier is a very powerful library. But with power comes great
-responsibility, or, at least, longer execution times. Remember, this
-library isn't lightly grazing over submitted HTML: it's deconstructing
-the whole thing, rigorously checking the parts, and then putting it
-back together.
-
-So, if it so turns out that HTML Purifier is kinda too slow for outbound
-filtering, you've got a few options:
-
-1. Inbound filtering - perform filtering of HTML when it's submitted by the
-   user. Since the user is already submitting something, an extra half a
-   second tacked on to the load time probably isn't going to be that huge of
-   a problem. Then, displaying the content is a simple matter of outputting
-   it directly from your database/filesystem. The trouble with this method is
-   that your user loses the original text, and when doing edits, will be
-   handling the filtered text. While this may be a good thing, especially if
-   you're using a WYSIWYG editor, it can also result in data-loss if a user
-   makes a typo.
-
-2. Caching the filtered output - accept the submitted text and put it
-   unaltered into the database, but then also generate a filtered version and
-   stash that in the database. Serve the filtered version to readers, and the
-   unaltered version to editors. If need be, you can invalidate the cache and
-   have the cached filtered version be regenerated on the first page view. Pros?
-   Full data retention. Cons? It's more complicated, and opens other editors
-   up to XSS if they are using a WYSIWYG editor (to fix that, they'd have to
-   be able to get their hands on the *really* original text served in plaintext
-   mode).
-
-In short, inbound filtering is almost as simple as outbound filtering, but
-it has some drawbacks which cannot be fixed unless you save both the original
-and the filtered versions.
-
-There is a third option: profile and optimize HTMLPurifier yourself. Be sure
-to report back your results if you decide to do that! Especially if you
-port HTML Purifier to C++. ;-)
WYSIWYG (3 lines changed)
@@ -18,4 +18,5 @@ HTML Purifier is perfect for filtering pure-HTML input from WYSIWYG editors.
 Enough said.
 
 There is a proof-of-concept integration of HTML Purifier with the Mantis
-bugtracker at http://hp.jpsband.org/mantis/
+bugtracker at http://hp.jpsband.org/mantis/ You can see notes on how
+this integration was achieved at http://hp.jpsband.org/mantis_notes.txt
docs/enduser-slow.html (new file, 116 lines changed)
@@ -0,0 +1,116 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
+   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
+<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head>
+<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
+<meta name="description" content="Explains how to speed up HTML Purifier through caching or inbound filtering." />
+<link rel="stylesheet" type="text/css" href="./style.css" />
+
+<title>Speeding up HTML Purifier - HTML Purifier</title>
+
+</head><body>
+
+<h1 class="subtitled">Speeding up HTML Purifier</h1>
+<div class="subtitle">...also known as the HELP ME LIBRARY IS TOO SLOW MY PAGE TAKE TOO LONG page</div>
+
+<div id="filing">Filed under End-User</div>
+<div id="index">Return to the <a href="index.html">index</a>.</div>
+
+<p>HTML Purifier is a very powerful library. But with power comes great
+responsibility, in the form of longer execution times. Remember, this
+library isn't lightly grazing over submitted HTML: it's deconstructing
+the whole thing, rigorously checking the parts, and then putting it back
+together. </p>
+
+<p>So, if it so turns out that HTML Purifier is kinda too slow for outbound
+filtering, you've got a few options: </p>
+
+<h2>Inbound filtering</h2>
+
+<p>Perform filtering of HTML when it's submitted by the user. Since the
+user is already submitting something, an extra half a second tacked on
+to the load time probably isn't going to be that huge of a problem.
+Then, displaying the content is a simple matter of outputting it
+directly from your database/filesystem. The trouble with this method is
+that your user loses the original text, and when doing edits, will be
+handling the filtered text. While this may be a good thing, especially
+if you're using a WYSIWYG editor, it can also result in data-loss if a
+user makes a typo. </p>
+
+<p>Example (non-functional):</p>
+
+<pre><?php
+    /**
+     * FORM SUBMISSION PAGE
+     * display_error($message) : displays nice error page with message
+     * display_success() : displays a nice success page
+     * display_form() : displays the HTML submission form
+     * database_insert($html) : inserts data into database as new row
+     */
+    if (!empty($_POST)) {
+        require_once '/path/to/library/HTMLPurifier.auto.php';
+        require_once 'HTMLPurifier.func.php';
+        $dirty_html = isset($_POST['html']) ? $_POST['html'] : false;
+        if (!$dirty_html) {
+            display_error('You must write some HTML!'); exit;
+        }
+        $html = HTMLPurifier($dirty_html);
+        database_insert($html);
+        display_success();
+        // notice that $dirty_html is *not* saved
+    } else {
+        display_form();
+    }
+?></pre>
+
+<h2>Caching the filtered output</h2>
+
+<p>Accept the submitted text and put it unaltered into the database, but
+then also generate a filtered version and stash that in the database.
+Serve the filtered version to readers, and the unaltered version to
+editors. If need be, you can invalidate the cache and have the cached
+filtered version be regenerated on the first page view. Pros? Full data
+retention. Cons? It's more complicated, and opens other editors up to
+XSS if they are using a WYSIWYG editor (to fix that, they'd have to be
+able to get their hands on the *really* original text served in
+plaintext mode). </p>
+
+<p>Example (non-functional):</p>
+
+<pre><?php
+    /**
+     * VIEW PAGE
+     * display_error($message) : displays nice error page with message
+     * cache_get($id) : retrieves HTML from fast cache (db or file)
+     * cache_insert($id, $html) : inserts good HTML into cache system
+     * database_get($id) : retrieves raw HTML from database
+     */
+    $id = isset($_GET['id']) ? (int) $_GET['id'] : false;
+    if (!$id) {
+        display_error('Must specify ID.');
+        exit;
+    }
+    $html = cache_get($id); // filesystem or database
+    if ($html === false) {
+        // cache didn't have the HTML, generate it
+        $raw_html = database_get($id);
+        require_once '/path/to/library/HTMLPurifier.auto.php';
+        require_once 'HTMLPurifier.func.php';
+        $html = HTMLPurifier($raw_html);
+        cache_insert($id, $html);
+    }
+    echo $html;
+?></pre>
+
+<h2>Summary</h2>
+
+<p>In short, inbound filtering is the simple option and caching is the
+robust option (albeit with bigger storage requirements). </p>
+
+<p>There is a third option, independent of the two we've discussed: profile
+and optimize HTML Purifier yourself. Be sure to report back your results
+if you decide to do that! Especially if you port HTML Purifier to C++.
+<tt>;-)</tt></p>
+
+</body>
+</html>
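The view-page example in docs/enduser-slow.html covers only the read path of the caching approach. A minimal sketch of the write path and of cache invalidation might look like the following. It assumes PHP 7.1 or later. `PostStore`, its methods, and the in-memory arrays are hypothetical stand-ins for a real database and cache, and the `htmlspecialchars()` closure is only a placeholder for the real purifier call.

```php
<?php
// Hypothetical sketch: raw text is kept for editors, filtered text is cached
// for readers, and saving an edit invalidates the cached copy.
final class PostStore
{
    /** @var array<int, string> raw HTML exactly as submitted */
    private $raw = [];
    /** @var array<int, string> filtered HTML, regenerated on demand */
    private $cache = [];
    /** @var callable filter applied when the cache is (re)built */
    private $filter;

    public function __construct(callable $filter)
    {
        $this->filter = $filter;
    }

    /** Store the unaltered submission and drop any stale filtered copy. */
    public function save(int $id, string $rawHtml): void
    {
        $this->raw[$id] = $rawHtml;
        unset($this->cache[$id]); // regenerated on the next view
    }

    /** Return filtered HTML for readers, filtering only on a cache miss. */
    public function view(int $id): ?string
    {
        if (!isset($this->raw[$id])) {
            return null;
        }
        if (!isset($this->cache[$id])) {
            $this->cache[$id] = ($this->filter)($this->raw[$id]);
        }
        return $this->cache[$id];
    }

    /** Return the original text for editing; escape it before display. */
    public function rawForEditing(int $id): ?string
    {
        return $this->raw[$id] ?? null;
    }
}

// Demo with a placeholder filter that also counts how often it runs.
$filterCalls = 0;
$store = new PostStore(function (string $html) use (&$filterCalls): string {
    $filterCalls++;
    return htmlspecialchars($html, ENT_QUOTES, 'UTF-8'); // stand-in only
});

$store->save(1, '<b>hi</b>');
$first  = $store->view(1);          // cache miss: filter runs
$second = $store->view(1);          // cache hit: filter does not run
$store->save(1, '<i>edited</i>');   // edit invalidates the cached copy
$third  = $store->view(1);          // cache miss again
```

In a real application the callable would wrap HTML Purifier, and the raw text handed back to editors should be escaped (for example inside a textarea) so that it is served as plain text, which is the XSS caveat the page raises for WYSIWYG editors.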
@@ -28,6 +28,9 @@ information for casual developers using HTML Purifier.</p>
 <dt><a href="enduser-youtube.html">Embedding YouTube videos</a></dt>
 <dd>Explains how to safely allow the embedding of flash from trusted sites.</dd>
 
+<dt><a href="enduser-slow.html">Speeding up HTML Purifier</a></dt>
+<dd>Explains how to speed up HTML Purifier through caching or inbound filtering.</dd>
+
 </dl>
 
 <h2>Development</h2>
@@ -6,6 +6,7 @@
  * this is efficient for instances when you only use HTML Purifier
  * on a few of your pages, it murders bytecode caching. You still
  * need to add HTML Purifier to your path.
+ * @note ''HTMLPurifier()'' is NOT the same as ''new HTMLPurifier()''
  */
 
 function HTMLPurifier($html, $config = null) {
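The naming similarity flagged in the @note can be seen with a small runnable toy. PHP keeps functions and classes in separate symbol tables, so a function and a class may share a name. The sketch below uses a hypothetical `DemoPurifier` class and function, not the real library, whose helper body is not shown in this diff. It also assumes, purely for illustration, that the helper lazily creates one shared instance.

```php
<?php
// Stand-in class; NOT the real HTMLPurifier class.
class DemoPurifier
{
    /** @var int number of objects constructed so far */
    public static $instances = 0;

    public function __construct()
    {
        self::$instances++;
    }

    public function purify(string $html): string
    {
        // Placeholder filter only; the real library parses and whitelists.
        return htmlspecialchars($html, ENT_QUOTES, 'UTF-8');
    }
}

// Function with the same name as the class: calling it returns a string.
function DemoPurifier(string $html): string
{
    static $purifier = null; // assumed: one instance shared across calls
    if ($purifier === null) {
        $purifier = new DemoPurifier();
    }
    return $purifier->purify($html);
}

$a   = DemoPurifier('<p>one</p>'); // function call, yields filtered text
$b   = DemoPurifier('<p>two</p>'); // reuses the object created above
$obj = new DemoPurifier();         // class instantiation, yields an object
```

Dropping or adding the `new` keyword therefore changes what comes back: a filtered string in one case, an object that still needs `purify()` called on it in the other.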