Commit strict version of HTML Purifier.

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk-strict@647 48356398-32a2-884e-a903-53898d9a118a
2025-04-24 03:24:36 +00:00 · 2007-01-16 21:59:29 +00:00 · 2007-01-16 21:59:29 +00:00 · 2bf912d528
commit 2bf912d528
parent a909632d2d
45 changed files with 1022 additions and 153 deletions
--- a/1
+++ b/1
@@ -8,6 +8,7 @@ installation GUI, you've come to the wrong place!)  The impatient can scroll
 down to the bottom of this INSTALL document to see the code, but you really
 should make sure a few things are properly done.

+Todo: Convert to using the array syntax for configuration.


 1.  Compatibility
--- a/10
+++ b/10
@@ -10,10 +10,14 @@ NEWS ( CHANGELOG and HISTORY )                                     HTMLPurifier
 ==========================

 1.4.0, unknown release date
-(major feature release)
+! Implemented list-style-image, URIs now allowed in list-style
+! Implemented background-image, background-repeat and background-attachment
+  CSS properties. background shorthand property HAS NOT been extended
+  to allow these, and background-position IS NOT implemented yet.
+. Implemented AttrDef_CSSURI for url(http://google.com) style declarations

-1.3.3, unknown release date, may be dropped
-(security/bugfix/minor feature release)
+1.3.3, unknown release date, likely to be dropped
+! Moved SLOW to docs/enduser-slow.html and added code examples

 1.3.2, released 2006-12-25
 ! HTMLPurifier object now accepts configuration arrays, no need to manually
--- a/25
+++ b/25
@@ -1,13 +1,22 @@

 README
-    All about HTMLPurifier
+    All about HTML Purifier

-HTMLPurifier is an HTML filtering solution.  It uses a unique combination of
-robust whitelists and agressive parsing to ensure that not only are XSS
-attacks thwarted, but the resulting HTML is standards compliant.
+HTML Purifier is an HTML filtering solution that uses a unique combination 
+of robust whitelists and aggressive parsing to ensure that not only are 
+XSS attacks thwarted, but the resulting HTML is standards compliant. 

-See INSTALL on how to use the library.  See docs/ for more developer-oriented
-documentation as well as some code examples.  Users of TinyMCE or FCKeditor
-may be especially interested in WYSIWYG.
+HTML Purifier is oriented towards richly formatted documents from 
+untrusted sources that require CSS and a full tag-set.  This library can 
+be configured to accept a more restrictive set of tags, but it won't be 
+as efficient as more bare-bones parsers. It will, however, do the job 
+right, which may be more important. 

-HTMLPurifier can be found on the web at: http://hp.jpsband.org/
+Places to go:
+
+* See INSTALL for a quick installation guide
+* See docs/ for developer-oriented documentation, code examples and
+  an in-depth installation guide.
+* See WYSIWYG for information on editors like TinyMCE and FCKeditor
+
+HTML Purifier can be found on the web at: http://hp.jpsband.org/
--- a/40
+++ b/40
@@ -1,40 +0,0 @@
-
-SLOW
-  also known as the HELP ME LIBRARY IS TOO SLOW MY PAGE TAKE TOO LONG LOAD page
-
-HTML Purifier is a very powerful library.  But with power comes great
-responsibility, or, at least, longer execution times.  Remember, this
-library isn't lightly grazing over submitted HTML: it's deconstructing
-the whole thing, rigorously checking the parts, and then putting it
-back together.
-
-So, if it so turns out that HTML Purifier is kinda too slow for outbound
-filtering, you've got a few options:
-
-1. Inbound filtering - perform filtering of HTML when it's submitted by the
-user.  Since the user is already submitting something, an extra half a
-second tacked on to the load time probably isn't going to be that huge of
-a problem.  Then, displaying the content is a simple a manner of outputting
-it directly from your database/filesystem.  The trouble with this method is
-that your user loses the original text, and when doing edits, will be
-handling the filtered text.  While this may be a good thing, especially if
-you're using a WYSIWYG editor, it can also result in data-loss if a user
-makes a typo.
-
-2. Caching the filtered output - accept the submitted text and put it
-unaltered into the database, but then also generate a filtered version and
-stash that in the database.  Serve the filtered version to readers, and the
-unaltered version to editors.  If need be, you can invalidate the cache and
-have the cached filtered version be regenerated on the first page view.  Pros?
-Full data retention.  Cons?  It's more complicated, and opens other editors
-up to XSS if they are using a WYSIWYG editor (to fix that, they'd have to
-be able to get their hands on the *really* original text served in plaintext
-mode).
-
-In short, inbound filtering is almost as simple as outbound filtering, but
-it has some drawbacks which cannot be fixed unless you save both the original
-and the filtered versions.
-
-There is a third option: profile and optimize HTMLPurifier yourself.  Be sure
-to report back your results if you decide to do that!  Especially if you
-port HTML Purifier to C++.  ;-)
--- a/3
+++ b/3
@@ -18,4 +18,5 @@ HTML Purifier is perfect for filtering pure-HTML input from WYSIWYG editors.
 Enough said.

 There is a proof-of-concept integration of HTML Purifier with the Mantis
-bugtracker at http://hp.jpsband.org/mantis/
+bugtracker at http://hp.jpsband.org/mantis/. You can see notes on how
+this integration was achieved at http://hp.jpsband.org/mantis_notes.txt
--- a/docs/dev-progress.html
+++ b/docs/dev-progress.html
@@ -59,7 +59,7 @@ thead th {text-align:left;padding:0.1em;background-color:#EEE;}
 <tbody>
 <tr><th colspan="2">Standard</th></tr>
 <tr class="css1 impl-yes"><td>background-color</td><td>COMPOSITE(&lt;color&gt;, transparent)</td></tr>
-<tr class="css1 impl-yes"><td>background</td><td>SHORTHAND, only for color, see below for info on background-image and friends</td></tr>
+<tr class="css1 impl-partial"><td>background</td><td>SHORTHAND</td></tr>
 <tr class="css1 impl-yes"><td>border</td><td>SHORTHAND, MULTIPLE</td></tr>
 <tr class="css1 impl-yes"><td>border-color</td><td>MULTIPLE</td></tr>
 <tr class="css1 impl-yes"><td>border-style</td><td>MULTIPLE</td></tr>
@@ -141,8 +141,8 @@ thead th {text-align:left;padding:0.1em;background-color:#EEE;}

 <tbody>
 <tr><th colspan="2">Unknown</th></tr>
-<tr class="danger css1"><td>background-image</td><td>Dangerous, target milestone 1.3</td></tr>
-<tr class="css1"><td>background-attachment</td><td>ENUM(scroll, fixed),
+<tr class="danger css1 impl-yes"><td>background-image</td><td>Dangerous, target milestone 1.3</td></tr>
+<tr class="css1 impl-yes"><td>background-attachment</td><td>ENUM(scroll, fixed),
    Depends on background-image</td></tr>
 <tr class="css1"><td>background-position</td><td>Depends on background-image</td></tr>
 <tr class="danger impl-no"><td>cursor</td><td>Dangerous but fluffy</td></tr>
@@ -151,7 +151,7 @@ thead th {text-align:left;padding:0.1em;background-color:#EEE;}
    inline-block has incomplete IE6 support and requires -moz-inline-box
    for Mozilla. Unknown target milestone.</td></tr>
 <tr><td class="css1">height</td><td>Interesting, why use it? Unknown target milestone.</td></tr>
-<tr class="danger css1"><td>list-style-image</td><td>Dangerous? Target milestone 1.3</td></tr>
+<tr class="danger css1 impl-yes"><td>list-style-image</td><td>Dangerous?</td></tr>
 <tr class="impl-no"><td>max-height</td><td rowspan="4">No IE 5/6</td></tr>
 <tr class="impl-no"><td>min-height</td></tr>
 <tr class="impl-no"><td>max-width</td></tr>
--- a/docs/enduser-slow.html
+++ b/docs/enduser-slow.html
@@ -0,0 +1,116 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
+    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
+<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head>
+<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
+<meta name="description" content="Explains how to speed up HTML Purifier through caching or inbound filtering." />
+<link rel="stylesheet" type="text/css" href="./style.css" />
+
+<title>Speeding up HTML Purifier - HTML Purifier</title>
+
+</head><body>
+
+<h1 class="subtitled">Speeding up HTML Purifier</h1>
+<div class="subtitle">...also known as the HELP ME LIBRARY IS TOO SLOW MY PAGE TAKE TOO LONG page</div>
+
+<div id="filing">Filed under End-User</div>
+<div id="index">Return to the <a href="index.html">index</a>.</div>
+
+<p>HTML Purifier is a very powerful library. But with power comes great 
+responsibility, in the form of longer execution times.  Remember, this 
+library isn't lightly grazing over submitted HTML: it's deconstructing 
+the whole thing, rigorously checking the parts, and then putting it back 
+together. </p>
+
+<p>So, if it so turns out that HTML Purifier is kinda too slow for outbound 
+filtering, you've got a few options: </p>
+
+<h2>Inbound filtering</h2>
+
+<p>Perform filtering of HTML when it's submitted by the user. Since the 
+user is already submitting something, an extra half a second tacked on 
+to the load time probably isn't going to be that huge of a problem.  
+Then, displaying the content is a simple matter of outputting it 
+directly from your database/filesystem. The trouble with this method is 
+that your user loses the original text, and when doing edits, will be 
+handling the filtered text.  While this may be a good thing, especially 
+if you're using a WYSIWYG editor, it can also result in data-loss if a 
+user makes a typo. </p>
+
+<p>Example (non-functional):</p>
+
+<pre>&lt;?php
+    /**
+     * FORM SUBMISSION PAGE
+     * display_error($message) : displays nice error page with message
+     * display_success() : displays a nice success page
+     * display_form() : displays the HTML submission form
+     * database_insert($html) : inserts data into database as new row
+     */
+    if (!empty($_POST)) {
+        require_once '/path/to/library/HTMLPurifier.auto.php';
+        require_once 'HTMLPurifier.func.php';
+        $dirty_html = isset($_POST['html']) ? $_POST['html'] : false;
+        if (!$dirty_html) {
+            display_error('You must write some HTML!');
+            exit; // stop here so an empty submission is never saved
+        }
+        $html = HTMLPurifier($dirty_html);
+        database_insert($html);
+        display_success();
+        // notice that $dirty_html is *not* saved
+    } else {
+        display_form();
+    }
+?&gt;</pre>
+
+<h2>Caching the filtered output</h2>
+
+<p>Accept the submitted text and put it unaltered into the database, but 
+then also generate a filtered version and stash that in the database.  
+Serve the filtered version to readers, and the unaltered version to 
+editors.  If need be, you can invalidate the cache and have the cached 
+filtered version be regenerated on the first page view.  Pros? Full data 
+retention. Cons? It's more complicated, and opens other editors up to 
+XSS if they are using a WYSIWYG editor (to fix that, they'd have to be 
+able to get their hands on the *really* original text served in 
+plaintext mode). </p>
+
+<p>Example (non-functional):</p>
+
+<pre>&lt;?php
+    /**
+     * VIEW PAGE
+     * display_error($message) : displays nice error page with message
+     * cache_get($id) : retrieves HTML from fast cache (db or file)
+     * cache_insert($id, $html) : inserts good HTML into cache system
+     * database_get($id) : retrieves raw HTML from database
+     */
+    $id = isset($_GET['id']) ? (int) $_GET['id'] : false;
+    if (!$id) {
+        display_error('Must specify ID.');
+        exit;
+    }
+    $html = cache_get($id); // filesystem or database
+    if ($html === false) {
+        // cache didn't have the HTML, generate it
+        $raw_html = database_get($id);
+        require_once '/path/to/library/HTMLPurifier.auto.php';
+        require_once 'HTMLPurifier.func.php';
+        $html = HTMLPurifier($raw_html);
+        cache_insert($id, $html);
+    }
+    echo $html;
+?&gt;</pre>
+
+<h2>Summary</h2>
+
+<p>In short, inbound filtering is the simple option and caching is the
+robust option (albeit with bigger storage requirements). </p>
+
+<p>There is a third option, independent of the two we've discussed: profile 
+and optimize HTML Purifier yourself. Be sure to report back your results 
+if you decide to do that! Especially if you port HTML Purifier to C++. 
+<tt>;-)</tt></p>
+
+</body>
+</html>
--- a/docs/enduser-utf8.html
+++ b/docs/enduser-utf8.html
@@ -0,0 +1,623 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
+    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
+<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en"><head>
+<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
+<meta name="description" content="Describes the rationale for using UTF-8, the ramifications otherwise, and how to make the switch." />
+<link rel="stylesheet" type="text/css" href="./style.css" />
+<script defer="defer" type="text/javascript" src="./toc-gen.js"></script>
+<style type="text/css">
+    .minor td {font-style:italic;}
+</style>
+
+<title>UTF-8 - HTML Purifier</title>
+
+<!-- Note to users: this document, though professing to be UTF-8, attempts
+to use only ASCII characters, because most webservers are configured
+to send HTML as ISO-8859-1. So I will, many times, go against my
+own advice for sake of portability.  -->
+
+</head><body>
+
+<h1>UTF-8</h1>
+
+<div id="filing">Filed under End-User</div>
+<div id="index">Return to the <a href="index.html">index</a>.</div>
+
+<p>Character encoding and character sets, in truth, are not that
+difficult to understand. But if you don't understand them, you are going
+to be caught by surprise by some of HTML Purifier's behavior, namely
+the fact that it operates in UTF-8 or the limitations of the character
+encoding transformations it does. This document will walk you through
+determining the encoding of your system and how you should handle
+this information. It will stay away from excessive discussion on
+the internals of character encoding, but offer the information in
+asides that can easily be skipped.</p>
+
+<blockquote class="aside">
+<div class="label">Asides</div>
+    <p>Text in this formatting is an <strong>aside</strong>,
+    interesting tidbits for the curious but not strictly necessary material to
+    do the tutorial. If you read this text, you'll come out
+    with a greater understanding of the underlying issues.</p>
+</blockquote>
+
+<h2 id="findcharset">Finding the real encoding</h2>
+
+<p>In the beginning, there was ASCII, and things were simple. But they
+weren't good, for no one could write in Cyrillic or Thai. So there
+exploded a proliferation of character encodings to remedy the problem
+by extending the characters ASCII could express. This ridiculously
+simplified version of the history of character encodings shows us that
+there are now many character encodings floating around.</p>
+
+<blockquote class="aside">
+    <p>A <strong>character encoding</strong> tells the computer how to
+    interpret raw zeroes and ones into real characters. It
+    usually does this by pairing numbers with characters.</p>
+    <p>There are many different types of character encodings floating
+    around, but the ones we deal most frequently with are ASCII, 
+    8-bit encodings, and Unicode-based encodings.</p>
+    <ul>
+        <li><strong>ASCII</strong> is a 7-bit encoding based on the
+            English alphabet.</li>
+        <li><strong>8-bit encodings</strong> are extensions to ASCII
+            that add a potpourri of useful, non-standard characters
+            like &eacute; and &aelig;. They can only add 128 characters,
+            so usually only support one script at a time. When you
+            see a page on the web, chances are it's encoded in one
+            of these encodings.</li>
+        <li><strong>Unicode-based encodings</strong> implement the
+            Unicode standard and include UTF-8, UCS-2 and UTF-16.
+            They go beyond 8-bits (UTF-8 and UTF-16 are variable length,
+            while UCS-2 always uses 16-bits), and support almost
+            every language in the world. UTF-8 is gaining traction
+            as the dominant international encoding of the web.</li>
+    </ul>
+</blockquote>
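+
+<blockquote class="aside">
+    <p>To make the difference concrete, here is a minimal sketch (not part
+    of HTML Purifier, and assuming PHP's iconv extension is available).
+    The same character, &eacute;, takes one byte in ISO-8859-1 but two
+    in UTF-8, while plain ASCII text is byte-for-byte identical in both:</p>
+<pre>&lt;?php
+    // the same character (e-acute) as raw bytes in two encodings
+    $latin1 = &quot;\xE9&quot;;     // ISO-8859-1: a single byte
+    $utf8   = &quot;\xC3\xA9&quot;; // UTF-8: two bytes
+    echo strlen($latin1) . ' vs ' . strlen($utf8) . &quot;\n&quot;; // strlen() counts bytes
+
+    // converting between the two changes the bytes, not the character
+    var_dump(iconv('ISO-8859-1', 'UTF-8', $latin1) === $utf8);
+
+    // plain ASCII is left untouched by the conversion
+    var_dump(iconv('ISO-8859-1', 'UTF-8', 'abc') === 'abc');
+?&gt;</pre>
+</blockquote>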
+
+<p>The first step of our journey is to find out what the encoding of
+your website is. The most reliable way is to ask your
+browser:</p>
+
+<dl>
+    <dt>Mozilla Firefox</dt>
+    <dd>Tools &gt; Page Info: Encoding</dd>
+    <dt>Internet Explorer</dt>
+    <dd>View &gt; Encoding: bulleted item is unofficial name</dd>
+</dl>
+
+<p>Internet Explorer won't give you the mime (i.e. useful/real) name of the
+character encoding, so you'll have to look it up using their description.
+Some common ones:</p>
+
+<table class="table">
+    <thead><tr>
+        <th>IE's Description</th>
+        <th>Mime Name</th>
+    </tr></thead>
+    <tbody>
+        <tr><th colspan="2">Windows</th></tr>
+        <tr><td>Arabic (Windows)</td><td>Windows-1256</td></tr>
+        <tr><td>Baltic (Windows)</td><td>Windows-1257</td></tr>
+        <tr><td>Central European (Windows)</td><td>Windows-1250</td></tr>
+        <tr><td>Cyrillic (Windows)</td><td>Windows-1251</td></tr>
+        <tr><td>Greek (Windows)</td><td>Windows-1253</td></tr>
+        <tr><td>Hebrew (Windows)</td><td>Windows-1255</td></tr>
+        <tr><td>Thai (Windows)</td><td>TIS-620</td></tr>
+        <tr><td>Turkish (Windows)</td><td>Windows-1254</td></tr>
+        <tr><td>Vietnamese (Windows)</td><td>Windows-1258</td></tr>
+        <tr><td>Western European (Windows)</td><td>Windows-1252</td></tr>
+    </tbody>
+    <tbody>
+        <tr><th colspan="2">ISO</th></tr>
+        <tr><td>Arabic (ISO)</td><td>ISO-8859-6</td></tr>
+        <tr><td>Baltic (ISO)</td><td>ISO-8859-4</td></tr>
+        <tr><td>Central European (ISO)</td><td>ISO-8859-2</td></tr>
+        <tr><td>Cyrillic (ISO)</td><td>ISO-8859-5</td></tr>
+        <tr class="minor"><td>Estonian (ISO)</td><td>ISO-8859-13</td></tr>
+        <tr class="minor"><td>Greek (ISO)</td><td>ISO-8859-7</td></tr>
+        <tr><td>Hebrew (ISO-Logical)</td><td>ISO-8859-8-I</td></tr>
+        <tr><td>Hebrew (ISO-Visual)</td><td>ISO-8859-8</td></tr>
+        <tr class="minor"><td>Latin 9 (ISO)</td><td>ISO-8859-15</td></tr>
+        <tr class="minor"><td>Turkish (ISO)</td><td>ISO-8859-9</td></tr>
+        <tr><td>Western European (ISO)</td><td>ISO-8859-1</td></tr>
+    </tbody>
+    <tbody>
+        <tr><th colspan="2">Other</th></tr>
+        <tr><td>Chinese Simplified (GB18030)</td><td>GB18030</td></tr>
+        <tr><td>Chinese Simplified (GB2312)</td><td>GB2312</td></tr>
+        <tr><td>Chinese Simplified (HZ)</td><td>HZ</td></tr>
+        <tr><td>Chinese Traditional (Big5)</td><td>Big5</td></tr>
+        <tr><td>Japanese (Shift-JIS)</td><td>Shift_JIS</td></tr>
+        <tr><td>Japanese (EUC)</td><td>EUC-JP</td></tr>
+        <tr><td>Korean</td><td>EUC-KR</td></tr>
+        <tr><td>Unicode (UTF-8)</td><td>UTF-8</td></tr>
+    </tbody>
+</table>
+
+<p>Internet Explorer does not recognize some of the more obscure
+character encodings, and having to look up the real names with a table
+is a pain, so I recommend using Mozilla Firefox to find out your
+character encoding.</p>
+
+<h2 id="findmetacharset">Finding the embedded encoding</h2>
+
+<p>At this point, you may be asking, &quot;Didn't we already find out our
+encoding?&quot; Well, as it turns out, there are multiple places where
+a web developer can specify a character encoding, and one such place
+is in a <code>META</code> tag:</p>
+
+<pre>&lt;meta http-equiv=&quot;Content-Type&quot; content=&quot;text/html; charset=UTF-8&quot; /&gt;</pre>
+
+<p>You'll find this in the <code>HEAD</code> section of an HTML document.
+The text to the right of <code>charset=</code> is the &quot;claimed&quot;
+encoding: the HTML claims to be this encoding, but whether or not this
+is actually the case depends on other factors. For now, take note
+if your <code>META</code> tag claims that either:</p>
+
+<ol>
+    <li>The character encoding is the same as the one reported by the
+        browser,</li>
+    <li>The character encoding is different from the browser's, or</li>
+    <li>There is no <code>META</code> tag at all! (horror, horror!)</li>
+</ol>
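+
+<p>These three outcomes can be told apart mechanically. The sketch below
+is only an illustration: <code>compare_encodings()</code> is a hypothetical
+helper, and the two strings passed to it stand in for the encoding your
+browser reported and the <code>content</code> attribute of your
+<code>META</code> tag (an empty string if there is no tag).</p>
+
+<pre>&lt;?php
+    /**
+     * compare_encodings($real, $meta_content) : hypothetical helper,
+     *      returns 'same', 'different' or 'missing'
+     */
+    function compare_encodings($real, $meta_content) {
+        // pull the charset parameter out of the META content attribute
+        if (!preg_match('/charset\s*=\s*([A-Za-z0-9._:-]+)/i', $meta_content, $m)) {
+            return 'missing';
+        }
+        // encoding names are case-insensitive
+        return strcasecmp($real, $m[1]) === 0 ? 'same' : 'different';
+    }
+    echo compare_encodings('ISO-8859-1', 'text/html; charset=UTF-8');
+?&gt;</pre>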
+
+<h2 id="fixcharset">Fixing the encoding</h2>
+
+<p>If your <code>META</code> encoding and your real encoding match,
+savvy! You can skip this section. If they don't...</p>
+
+<h3 id="fixcharset-none">No embedded encoding</h3>
+
+<p>If this is the case, you'll want to add in the appropriate
+<code>META</code> tag to your website. It's as simple as copy-pasting
+the code snippet above and replacing UTF-8 with whatever is the mime name
+of your real encoding.</p>
+
+<blockquote class="aside">
+    <p>For all those skeptics out there, there is a very good reason
+    why the character encoding should be explicitly stated. When the
+    browser isn't told what the character encoding of a text is, it
+    has to guess: and sometimes the guess is wrong. Hackers can manipulate
+    this guess in order to slip XSS past filters and then fool the
+    browser into executing it as active code. A great example of this
+    is the <a href="http://shiflett.org/archive/177">Google UTF-7
+    exploit</a>.</p>
+    <p>You might be able to get away with not specifying a character
+    encoding with the <code>META</code> tag as long as your webserver
+    sends the right Content-Type header, but why risk it? Besides, if
+    the user downloads the HTML file, there is no longer any webserver
+    to define the character encoding.</p>
+</blockquote>
+
+<h3 id="fixcharset-diff">Embedded encoding disagrees</h3>
+
+<p>This is an extremely common mistake: another source is telling
+the browser what the
+character encoding is and is overriding the embedded encoding. This
+source usually is the Content-Type HTTP header that the webserver (e.g.
+Apache) sends. A usual Content-Type header sent with a page might
+look like this:</p>
+
+<pre>Content-Type: text/html; charset=ISO-8859-1</pre>
+
+<p>Notice how there is a charset parameter: this is the webserver's
+way of telling a browser what the character encoding is, much like
+the <code>META</code> tags we touched upon previously.</p>
+
+<blockquote class="aside"><p>In fact, the <code>META</code> tag is
+designed as a substitute for the HTTP header for contexts where
+sending headers is impossible (such as locally stored files without
+a webserver). Thus the name <code>http-equiv</code> (HTTP equivalent).
+</p></blockquote>
+
+<p>There are two ways to go about fixing this: changing the <code>META</code>
+tag to match the HTTP header, or changing the HTTP header to match
+the <code>META</code> tag. How do we know which to do? It depends
+on the website's content: after all, headers and tags are only ways of
+describing the actual characters on the web page.</p>
+
+<p>If your website:</p>
+
+<dl>
+    <dt>...only uses ASCII characters,</dt>
+    <dd>Either way is fine, but I recommend switching both to
+        UTF-8 (more on this later).</dd>
+    <dt>...uses special characters, and they display
+        properly,</dt>
+    <dd>Change the embedded encoding to the server encoding.</dd>
+    <dt>...uses special characters, but users often complain that
+        they come out garbled,</dt>
+    <dd>Change the server encoding to the embedded encoding.</dd>
+</dl>
+
+<p>Changing a META tag is easy: just swap out the old encoding
+for the new. Changing the server (HTTP header) encoding, however,
+is slightly more difficult.</p>
+
+<h3 id="fixcharset-server">Changing the server encoding</h3>
+
+<h4 id="fixcharset-server-php">PHP header() function</h4>
+
+<p>The simplest way to handle this problem is to send the encoding
+yourself, via your programming language. Since you're using HTML
+Purifier, I'll assume PHP, although it's not too difficult to do
+similar things in
+<a href="http://www.w3.org/International/O-HTTP-charset#scripting">other
+languages</a>. The appropriate code is:</p>
+
+<pre><a href="http://php.net/function.header">header</a>('Content-Type:text/html; charset=UTF-8');</pre>
+
+<p>...replacing UTF-8 with whatever your embedded encoding is.
+This code must come before any output, so be careful about
+stray whitespace in your application.</p>
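+
+<p>If you are not sure whether output has already started, PHP can tell
+you. A minimal sketch: <code>send_charset()</code> is a hypothetical
+wrapper, while <code>headers_sent()</code> and <code>header()</code> are
+standard PHP functions.</p>
+
+<pre>&lt;?php
+    /**
+     * send_charset($charset) : hypothetical wrapper, returns true if the
+     *      header could still be sent
+     */
+    function send_charset($charset) {
+        if (headers_sent($file, $line)) {
+            // too late: $file and $line say where output started
+            trigger_error(&quot;Output already started at $file:$line&quot;, E_USER_WARNING);
+            return false;
+        }
+        header('Content-Type: text/html; charset=' . $charset);
+        return true;
+    }
+    send_charset('UTF-8');
+?&gt;</pre>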
+
+<h4 id="fixcharset-server-phpini">PHP ini directive</h4>
+
+<p>PHP also has a neat little ini directive that can save you a
+header call: <code><a href="http://php.net/ini.core#ini.default-charset">default_charset</a></code>. Using this code:</p>
+
+<pre><a href="http://php.net/function.ini_set">ini_set</a>('default_charset', 'UTF-8');</pre>
+
+<p>...will also do the trick. If PHP is running as an Apache module (and
+not as FastCGI, consult
+<a href="http://php.net/phpinfo">phpinfo</a>() for details), you can even use htaccess to apply this property
+globally:</p>
+
+<pre><a href="http://php.net/configuration.changes#configuration.changes.apache">php_value</a> default_charset &quot;UTF-8&quot;</pre>
+
+<blockquote class="aside"><p>As with all INI directives, this can
+also go in your php.ini file. Some hosting providers allow you to customize
+your own php.ini file; ask your support for details. Use:</p>
+<pre>default_charset = &quot;utf-8&quot;</pre></blockquote>
+
+<h4 id="fixcharset-server-nophp">Non-PHP</h4>
+
+<p>You may, for whatever reason, need to set the character encoding
+on non-PHP files, usually plain ol' HTML files. Doing this
+is more of a hit-or-miss process: depending on the software being
+used as a webserver and the configuration of that software, certain
+techniques may work, or may not work.</p>
+
+<h4 id="fixcharset-server-htaccess">.htaccess</h4>
+
+<p>On Apache, you can use an .htaccess file to change the character
+encoding. I'll defer to
+<a href="http://www.w3.org/International/questions/qa-htaccess-charset">W3C</a>
+for the in-depth explanation, but it boils down to creating a file
+named .htaccess with the contents:</p>
+
+<pre><a href="http://httpd.apache.org/docs/1.3/mod/mod_mime.html#addcharset">AddCharset</a> UTF-8 .html</pre>
+
+<p>Where UTF-8 is replaced with the character encoding you want to
+use and .html is a file extension that this will be applied to. This
+character encoding will then be set for any file directly in
+or in the subdirectories of the directory you place this file in.</p>
+
+<p>If you're feeling particularly courageous, you can use:</p>
+
+<pre><a href="http://httpd.apache.org/docs/1.3/mod/core.html#adddefaultcharset">AddDefaultCharset</a> UTF-8</pre>
+
+<p>...which changes the character set Apache adds to any document that
+doesn't have any Content-Type parameters. This directive, which the
+default configuration file sets to iso-8859-1 for security
+reasons, is probably why your headers mismatch
+with the <code>META</code> tag. If you would prefer Apache not to be
+butting in on your character encodings, you can tell it not
+to send anything at all:</p>
+
+<pre><a href="http://httpd.apache.org/docs/1.3/mod/core.html#adddefaultcharset">AddDefaultCharset</a> Off</pre>
+
+<p>...making your <code>META</code> tags the sole source of
+character encoding information. In these cases, it is
+<em>especially</em> important to make sure you have valid <code>META</code>
+tags on your pages and all the text before them is ASCII.</p>
+
+<blockquote class="aside"><p>These directives can also be
+placed in httpd.conf file for Apache, but
+in most shared hosting situations you won't be able to edit this file.
+</p></blockquote>
+
+<h4 id="fixcharset-server-ext">File extensions</h4>
+
+<p>If you're not allowed to use .htaccess files, you can often
+piggy-back off of Apache's default AddCharset declarations to get
+your files sent with the proper charset. Here are Apache's default
+character set declarations:</p>
+
+<table class="table">
+    <thead><tr>
+        <th>Charset</th>
+        <th>File extension(s)</th>
+    </tr></thead>
+    <tbody>
+        <tr><td>ISO-8859-1</td><td>.iso8859-1 .latin1</td></tr>
+        <tr><td>ISO-8859-2</td><td>.iso8859-2 .latin2 .cen</td></tr>
+        <tr><td>ISO-8859-3</td><td>.iso8859-3 .latin3</td></tr>
+        <tr><td>ISO-8859-4</td><td>.iso8859-4 .latin4</td></tr>
+        <tr><td>ISO-8859-5</td><td>.iso8859-5 .latin5 .cyr .iso-ru</td></tr>
+        <tr><td>ISO-8859-6</td><td>.iso8859-6 .latin6 .arb</td></tr>
+        <tr><td>ISO-8859-7</td><td>.iso8859-7 .latin7 .grk</td></tr>
+        <tr><td>ISO-8859-8</td><td>.iso8859-8 .latin8 .heb</td></tr>
+        <tr><td>ISO-8859-9</td><td>.iso8859-9 .latin9 .trk</td></tr>
+        <tr><td>ISO-2022-JP</td><td>.iso2022-jp .jis</td></tr>
+        <tr><td>ISO-2022-KR</td><td>.iso2022-kr .kis</td></tr>
+        <tr><td>ISO-2022-CN</td><td>.iso2022-cn .cis</td></tr>
+        <tr><td>Big5</td><td>.Big5 .big5 .b5</td></tr>
+        <tr><td>WINDOWS-1251</td><td>.cp-1251 .win-1251</td></tr>
+        <tr><td>CP866</td><td>.cp866</td></tr>
+        <tr><td>KOI8-r</td><td>.koi8-r .koi8-ru</td></tr>
+        <tr><td>KOI8-ru</td><td>.koi8-uk .ua</td></tr>
+        <tr><td>ISO-10646-UCS-2</td><td>.ucs2</td></tr>
+        <tr><td>ISO-10646-UCS-4</td><td>.ucs4</td></tr>
+        <tr><td>UTF-8</td><td>.utf8</td></tr>
+        <tr><td>GB2312</td><td>.gb2312 .gb </td></tr>
+        <tr><td>utf-7</td><td>.utf7</td></tr>
+        <tr><td>EUC-TW</td><td>.euc-tw</td></tr>
+        <tr><td>EUC-JP</td><td>.euc-jp</td></tr>
+        <tr><td>EUC-KR</td><td>.euc-kr</td></tr>
+        <tr><td>shift_jis</td><td>.sjis</td></tr>
+    </tbody>
+</table>
+
+<p>So, for example, a file named <code>page.utf8.html</code> or
+<code>page.html.utf8</code> will probably be sent with the UTF-8 charset
+attached, the difference being that if there is an
+<code>AddCharset charset .html</code> declaration, it will override
+the .utf8 extension in <code>page.utf8.html</code> (precedence moves
+from right to left). By default, Apache has no such declaration.</p>
+
+<h4 id="fixcharset-server-iis">Microsoft IIS</h4>
+
+<p>If anyone can contribute information on how to configure Microsoft
+IIS to change character encodings, I'd be grateful.</p>
+
+<h3 id="fixcharset-xml">XML</h3>
+
+<p><code>META</code> tags are the most common source of embedded
+encodings, but they can also come from somewhere else: XML
+processing instructions. They look like:</p>
+
+<pre>&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;</pre>
+
+<p>...and are most often found in XML documents (including XHTML).</p>
+
+<p>For XHTML, this processing instruction theoretically
+overrides the <code>META</code> tag. In reality, this happens only when the
+XHTML is actually served as legit XML and not HTML, which is almost
+never the case due to Internet Explorer's lack of support for 
+<code>application/xhtml+xml</code> (even though doing so is often
+argued to be <a href="http://www.hixie.ch/advocacy/xhtml">good practice</a>).</p>
+
+<p>For XML, however, this processing instruction is extremely important.
+Since most webservers are not configured to send charsets for .xml files,
+this is the only thing a parser has to go on. Furthermore, the default
+for XML files is UTF-8, which often butts heads with the more common
+ISO-8859-1 encoding (you see this in garbled RSS feeds).</p>
+
+<p>In short, if you use XHTML and have gone through the
+trouble of adding the XML header, make sure it jibes
+with your <code>META</code> tags and HTTP headers.</p>
+
+<h3>Inside the process</h3>
+
+<p>This section is not required reading,
+but may answer some of your questions on what's going on in all
+this character encoding hocus pocus. If you're interested in
+moving on to the next phase, skip this section.</p>
+
+<p>A logical question that follows all of our wheeling and dealing
+with multiple sources of character encodings is &quot;Why are there
+so many options?&quot; To answer this question, we have to turn
+back to our definition of character encodings: they allow a program
+to interpret bytes into human-readable characters.</p>
+
+<p>Thus, a chicken-egg problem: a character encoding
+is necessary to interpret the
+text of a document. A <code>META</code> tag is in the text of a document.
+The <code>META</code> tag gives the character encoding. How can we
+determine the contents of a <code>META</code> tag, inside the text,
+if we don't know its character encoding? And how do we figure out
+the character encoding, if we don't know the contents of the
+<code>META</code> tag?</p>
+
+<p>Fortunately for us, the characters we need to write the
+<code>META</code> tag are in ASCII, which is pretty much universal
+across every character encoding that is in common use today. So,
+all the web browser has to do is parse all the way down until
+it gets to the Content-Type tag, extract the character encoding,
+then re-parse the document according to this new information.</p>
+
+<p>Obviously this is complicated, so browsers prefer the simpler
+and more efficient solution: get the character encoding from
+somewhere other than the document itself, i.e. the HTTP headers,
+much to the chagrin of HTML authors who can't set these headers.</p>
+
+<h2 id="whyutf8">Why UTF-8?</h2>
+
+<p>So, you've gone through all the trouble of ensuring that your
+server and embedded characters all line up properly and are
+present.  Good job: at
+this point, you could quit and rest easy knowing that your pages
+are not vulnerable to character encoding style XSS attacks.
+However, just as having a character encoding is better than
+having no character encoding at all, having UTF-8 as your
+character encoding is better than having some other random
+character encoding, and the next step is to convert to UTF-8.
+But why?</p>
+
+<h3 id="whyutf8-i18n">Internationalization</h3>
+
+<p>Many software projects, at one point or another, suddenly realize
+that they should be supporting more than one language. Even regular
+usage in one language sometimes requires the occasional special character
+that, unsurprisingly, is not available in your character set. Sometimes
+developers get around this by adding support for multiple encodings: when
+using Chinese, use Big5; when using Japanese, use Shift-JIS; and so on
+for Greek and every other language. Other times, they use character
+entities with great zeal.</p>
+
+<p>UTF-8, however, obviates the need for any of these complicated
+measures. After getting the system to use UTF-8 and adjusting for
+sources that are outside the browser's control (more on this later),
+UTF-8 just works. You can use it for any language, even many languages
+at once; you don't have to worry about managing multiple encodings,
+and you don't have to use those user-unfriendly entities.</p>
+
+<h3 id="whyutf8-user">User-friendly</h3>
+
+<p>Websites encoded in Latin-1 (ISO-8859-1) which occasionally need
+a special character outside of their scope will often use a character
+entity to achieve the desired effect. For instance, &theta; can be
+written <code>&amp;theta;</code>, regardless of the character encoding's
+support of Greek letters.</p>
+
+<p>This works nicely for limited use of special characters, but
+say you wanted this sentence of Chinese text: &#28608;&#20809;,
+&#36889;&#20841;&#20491;&#23383;&#26159;&#29978;&#40636;&#24847;&#24605;.
+The entity-ized version would look like this:</p>
+
+<pre>&amp;#28608;&amp;#20809;, &amp;#36889;&amp;#20841;&amp;#20491;&amp;#23383;&amp;#26159;&amp;#29978;&amp;#40636;&amp;#24847;&amp;#24605;</pre>
+
+<p>Extremely inconvenient for those of us who actually know what
+character entities are, totally unintelligible to poor users who don't!
+Even the slightly more user-friendly, &quot;intelligible&quot; character
+entities like <code>&amp;theta;</code> will leave users who are
+uninterested in learning HTML scratching their heads. On the other
+hand, if they see &theta; in an edit box, they'll know that it's a
+special character, and treat it accordingly, even if they don't know
+how to write that character themselves.</p>
+
+<blockquote class="aside"><p>Wikipedia is a great case study for
+an application that originally used ISO-8859-1 but switched to UTF-8
+when it became far too cumbersome to support foreign languages. Bots
+will now actually go through articles and convert character entities
+to their corresponding real characters for the sake of user-friendliness
+and searchability. See
+<a href="http://meta.wikimedia.org/wiki/Help:Special_characters">Meta's
+page on special characters</a> for more details.
+</p></blockquote>
+
+<h3 id="whyutf8-forms">Forms</h3>
+
+<p>While we're on the topic of users, how do non-UTF-8 web forms deal
+with characters that are outside of their character set? Rather than
+discuss what UTF-8 does right, we're going to show what could go wrong
+if you didn't use UTF-8 and people tried to use characters outside
+of your character encoding.</p>
+
+<p>The troubles are large, extensive, and extremely difficult to fix (or,
+at least, difficult enough that if you had the time and resources to invest
+in doing the fix, you would be probably better off migrating to UTF-8).
+There are two types of form submission: <code>application/x-www-form-urlencoded</code>
+which is used for GET and by default for POST, and <code>multipart/form-data</code>
+which may be used by POST, and is required when you want to upload
+files.</p>
+
+<p>The following is a summary of notes from
+<a href="http://ppewww.physics.gla.ac.uk/~flavell/charset/form-i18n.html">
+<code>FORM</code> submission and i18n</a>. That document contains lots
+of useful information, but is written in a rambling manner, so
+here I try to get right to the point.</p>
+
+<h4 id="whyutf8-forms-urlencoded"><code>application/x-www-form-urlencoded</code></h4>
+
+<p>This is the Content-Type that GET requests must use, and POST requests
+use by default. It involves the ubiquitous percent encoding format that
+looks something like: <code>%C3%86</code>. There is no official way of
+determining the character encoding of such a request, since the percent
+encoding operates on a byte level, so it is usually assumed that it
+is the same as the encoding of the page containing the form.
+You'll run into very few problems if you only use characters in
+the character encoding you chose.</p>
+
+<p>However, once you start adding characters outside of your encoding
+(and this is a lot more common than you may think: take curly
+&quot;smart&quot; quotes from Microsoft as an example),
+all manner of strange things start to happen. Depending on the
+browser you're using, they might:</p>
+
+<ul>
+    <li>Replace the unsupported characters with useless question marks,</li>
+    <li>Attempt to fix the characters (example: smart quotes to regular quotes),</li>
+    <li>Replace the character with a character entity, or</li>
+    <li>Send it anyway as a different character encoding mixed in
+        with the original encoding (usually Windows-1252 rather than
+        iso-8859-1 or UTF-8 interspersed in 8-bit)</li>
+</ul>
+
+<p>To properly guard against these behaviors, you'd have to sniff out
+the browser agent, compile a database of different behaviors, and
+take appropriate conversion action against the string (disregarding
+a spate of extremely mysterious, random and devastating bugs Internet
+Explorer manifests every once in a while). Or you could
+use UTF-8 and rest easy knowing that none of this could possibly happen
+since UTF-8 supports every character.</p>
+
+<h4 id="whyutf8-forms-multipart"><code>multipart/form-data</code></h4>
+
+<p>Multipart form submission takes away a lot of the ambiguity
+that percent-encoding had: the server now can explicitly ask for
+certain encodings, and the client can explicitly tell the server
+during the form submission what encoding the fields are in.</p>
+
+<p>There are two ways you can go with this functionality: leave it
+unset and have the browser send in the same encoding as the page,
+or set it to UTF-8 and then do another conversion server-side.
+Each method has deficiencies, especially the former.</p>
+
+<p>If you tell the browser to send the form in the same encoding as
+the page, you still have the trouble of what to do with characters
+that are outside of the character encoding's range. The behavior, once
+again, varies: Firefox 2.0 entity-izes them while Internet Explorer
+7.0 mangles them beyond intelligibility. For serious I18N purposes,
+this is not an option.</p>
+
+<p>The other possibility is to set <code>accept-charset</code> to UTF-8, which
+raises the question: Why aren't you using UTF-8 for everything then?
+This route is more palatable, but there's a notable caveat: your data
+will come in as UTF-8, so you will have to explicitly convert it into
+your favored local character encoding.</p>
+
+<p>I object to this approach on ideological grounds: you're
+digging yourself deeper into
+the hole when you could have been converting to UTF-8
+instead. And, of course, you can't use this method for GET requests.</p>
+
+<h3 id="whyutf8-support">Well supported</h3>
+
+<h3 id="whyutf8-htmlpurifier">HTML Purifier</h3>
+
+<h2 id="migrate">Migrate to UTF-8</h2>
+
+<h3 id="migrate-editor">Text editor</h3>
+
+<h3 id="migrate-db">Configuring your database</h3>
+
+<h3 id="migrate-convert">Convert old text</h3>
+
+<h3 id="migrate-bom">Byte Order Mark (headers already sent!)</h3>
+
+<h3 id="migrate-variablewidth">Dealing with variable width in functions</h3>
+
+<h2 id="externallinks">Further Reading</h2>
+
+<p>Many other developers have already discussed the subject of Unicode,
+UTF-8 and internationalization, and I would like to defer to them for
+a more in-depth look into character sets and encodings.</p>
+
+<ul>
+    <li><a href="http://www.joelonsoftware.com/articles/Unicode.html">
+        The Absolute Minimum Every Software Developer Absolutely,
+        Positively Must Know About Unicode and Character Sets
+        (No Excuses!)</a> by Joel Spolsky, provides a <em>very</em>
+        good high-level look at Unicode and character sets in general.</li>
+    <li><a href="http://en.wikipedia.org/wiki/UTF-8">UTF-8 on Wikipedia</a>,
+        provides a lot of useful details into the innards of UTF-8, although
+        it may be a little off-putting to people who don't know much
+        about Unicode to begin with.</li>
+</ul>
+
+</body>
+</html>
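
The point the document makes about `application/x-www-form-urlencoded`, that percent encoding works on bytes and carries no charset information, can be seen in a small PHP sketch. The byte strings below are written out by hand as an assumption for illustration (they are the UTF-8 and ISO-8859-1 encodings of the letter Æ), and `rawurlencode()` is used only to show what such bytes look like once escaped.

```php
<?php
// The same character, Æ (U+00C6), as raw bytes in two encodings.
$utf8   = "\xC3\x86"; // UTF-8: two bytes
$latin1 = "\xC6";     // ISO-8859-1: one byte

// Percent encoding escapes each byte; it records nothing about the charset.
$utf8_encoded   = rawurlencode($utf8);
$latin1_encoded = rawurlencode($latin1);

// Decoding only gives the bytes back; the receiver still has to know
// (or guess) which encoding they were written in.
$roundtrip = rawurldecode($utf8_encoded);

echo $utf8_encoded, "\n", $latin1_encoded, "\n";
```

The first value is the `%C3%86` form quoted in the document; a server that assumes ISO-8859-1 would read those two bytes as two separate characters.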
--- a/docs/examples/basic.php
+++ b/docs/examples/basic.php
@ -1,15 +1,14 @@
-<?php
+<?php exit;

 // This file demonstrates basic usage of HTMLPurifier.

-exit; // not to be called directly, it will fail fantastically!
-
-set_include_path('/path/to/htmlpurifier/library' . PATH_SEPARATOR . get_include_path());
-require_once 'HTMLPurifier.php';
+require_once '/path/to/htmlpurifier/library/HTMLPurifier.auto.php';

 $purifier = new HTMLPurifier();
 $html = '<b>Simple and short';

 $pure_html = $purifier->purify($html);

+echo $pure_html;
+
 ?>
--- a/docs/index.html
+++ b/docs/index.html
@ -28,6 +28,9 @@ information for casual developers using HTML Purifier.</p>
 <dt><a href="enduser-youtube.html">Embedding YouTube videos</a></dt>
 <dd>Explains how to safely allow the embedding of flash from trusted sites.</dd>

+<dt><a href="enduser-slow.html">Speeding up HTML Purifier</a></dt>
+<dd>Explains how to speed up HTML Purifier through caching or inbound filtering.</dd>
+
 </dl>

 <h2>Development</h2>
--- a/docs/style.css
+++ b/docs/style.css
@ -23,6 +23,8 @@ h4 {font-family:sans-serif; font-size:0.9em; font-weight:bold; }

 /* Marks off asides, discussions on why something is the way it is */
 .aside {margin-left:2em; font-family:sans-serif; font-size:0.9em; }
+blockquote .label {font-weight:bold; font-size:1em; margin:0 0 .1em;
+    border-bottom:1px solid #CCC;}

 /* A regular table */
 .table {border-collapse:collapse; border-bottom:2px solid #888; margin-left:2em; }
@ -37,4 +39,4 @@ h4 {font-family:sans-serif; font-size:0.9em; font-weight:bold; }
 #index {font-size:smaller; }

 /* Contains, without exception, $Id$, for SVN version info. */
-#version {text-align:right; font-style:italic; margin:2em 0;}
+#version {text-align:right; font-style:italic; margin:2em 0;}
--- a/library/HTMLPurifier.func.php
+++ b/library/HTMLPurifier.func.php
@ -6,12 +6,12 @@
 *       this is efficient for instances when you only use HTML Purifier
 *       on a few of your pages, it murders bytecode caching. You still
 *       need to add HTML Purifier to your path.
+ * @note ''HTMLPurifier()'' is NOT the same as ''new HTMLPurifier()''
 */

 function HTMLPurifier($html, $config = null) {
    static $purifier = false;
    if (!$purifier) {
-        $init = true;
        require_once 'HTMLPurifier.php';
        $purifier = new HTMLPurifier();
    }
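
The `@note` added in this hunk is worth spelling out: PHP keeps functions and classes in separate symbol tables, so a function may share a class's name while doing something different. Below is a minimal sketch of the cached-instance pattern the function appears to rely on. `Demo_Filter` and its `filter()` method are hypothetical stand-ins, not part of HTML Purifier.

```php
<?php
// Hypothetical stand-in for the HTMLPurifier class.
class Demo_Filter {
    public static $constructed = 0; // counts how many objects were built

    function __construct() {
        self::$constructed++;
    }

    function filter($text) {
        return trim($text);
    }
}

// Same name as the class: legal, because functions and classes
// live in separate symbol tables.
function Demo_Filter($text) {
    static $filter = false; // survives between calls
    if (!$filter) {
        $filter = new Demo_Filter();
    }
    return $filter->filter($text);
}

$a = Demo_Filter('  one ');
$b = Demo_Filter(' two  ');
```

Calling the function returns the processed string and reuses one object, whereas `new Demo_Filter()` would return a fresh object each time.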
--- a/library/HTMLPurifier.php
+++ b/library/HTMLPurifier.php
@ -109,7 +109,7 @@ class HTMLPurifier
        
        $config = $config ? HTMLPurifier_Config::create($config) : $this->config;
        
-        $context =& new HTMLPurifier_Context();
+        $context = new HTMLPurifier_Context();
        $html = $this->encoder->convertToUTF8($html, $config, $context);
        
        // purified HTML
--- a/library/HTMLPurifier/AttrDef/CSS.php
+++ b/library/HTMLPurifier/AttrDef/CSS.php
@ -8,6 +8,11 @@ require_once 'HTMLPurifier/CSSDefinition.php';
 * @note We don't implement the whole CSS specification, so it might be
 *       difficult to reuse this component in the context of validating
 *       actual stylesheet declarations.
+ * @note If we were really serious about validating the CSS, we would
+ *       tokenize the styles and then parse the tokens. Obviously, we
+ *       are not doing that. Doing that could seriously harm performance,
+ *       but would make these components a lot more viable for a CSS
+ *       filtering solution.
 */
 class HTMLPurifier_AttrDef_CSS extends HTMLPurifier_AttrDef
 {
@ -20,6 +25,9 @@ class HTMLPurifier_AttrDef_CSS extends HTMLPurifier_AttrDef
        
        // we're going to break the spec and explode by semicolons.
        // This is because semicolon rarely appears in escaped form
+        // Doing this is generally flaky but fast
+        // IT MIGHT APPEAR IN URIs, see HTMLPurifier_AttrDef_CSSURI
+        // for details
        
        $declarations = explode(';', $css);
        $propvalues = array();
--- a/library/HTMLPurifier/AttrDef/CSSURI.php
+++ b/library/HTMLPurifier/AttrDef/CSSURI.php
@ -0,0 +1,58 @@
+<?php
+
+require_once 'HTMLPurifier/AttrDef/URI.php';
+
+/**
+ * Validates a URI in CSS syntax, which uses url('http://example.com')
+ * @note While theoretically speaking a URI in a CSS document could
+ *       be non-embedded, as of CSS2 there is no such usage so we're
+ *       generalizing it. This may need to be changed in the future.
+ * @warning Since HTMLPurifier_AttrDef_CSS blindly uses semicolons as
+ *          the separator, you cannot put a literal semicolon in
+ *          the URI. Try percent encoding it, in that case.
+ */
+class HTMLPurifier_AttrDef_CSSURI extends HTMLPurifier_AttrDef_URI
+{
+    
+    function HTMLPurifier_AttrDef_CSSURI() {
+        $this->HTMLPurifier_AttrDef_URI(true); // always embedded
+    }
+    
+    function validate($uri_string, $config, &$context) {
+        // parse the URI out of the string and then pass it onto
+        // the parent object
+        
+        $uri_string = $this->parseCDATA($uri_string);
+        if (strpos($uri_string, 'url(') !== 0) return false;
+        $uri_string = substr($uri_string, 4);
+        $new_length = strlen($uri_string) - 1;
+        if ($uri_string[$new_length] != ')') return false;
+        $uri = trim(substr($uri_string, 0, $new_length));
+        
+        if (isset($uri[0]) && ($uri[0] == "'" || $uri[0] == '"')) {
+            $quote = $uri[0];
+            $new_length = strlen($uri) - 1;
+            if ($uri[$new_length] !== $quote) return false;
+            $uri = substr($uri, 1, $new_length - 1);
+        }
+        
+        $keys   = array(  '(',   ')',   ',',   ' ',   '"',   "'");
+        $values = array('\\(', '\\)', '\\,', '\\ ', '\\"', "\\'");
+        $uri = str_replace($values, $keys, $uri);
+        
+        $result = parent::validate($uri, $config, $context);
+        
+        if ($result === false) return false;
+        
+        // escape necessary characters according to CSS spec
+        // except for the comma, none of these should appear in the
+        // URI at all
+        $result = str_replace($keys, $values, $result);
+        
+        return "url($result)";
+        
+    }
+    
+}
+
+?>
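
For readers who want to try the `url(...)` unwrapping outside the class, here is a standalone sketch of the same steps: strip the wrapper, strip matching quotes, undo the backslash escapes. `css_unwrap_url()` is a hypothetical helper, it skips the parent URI validation entirely, and it uses negative `substr()` offsets instead of the index arithmetic in the committed code.

```php
<?php
// Hypothetical helper, not part of HTML Purifier: extracts the raw URI
// from a CSS url(...) token, or returns false if the token is malformed.
function css_unwrap_url($value) {
    $value = trim($value);
    if (strpos($value, 'url(') !== 0) return false; // must start with url(
    if (substr($value, -1) !== ')') return false;   // must end with )
    $uri = trim(substr($value, 4, -1));

    // Strip one pair of matching quotes, if present.
    if ($uri !== '' && ($uri[0] === "'" || $uri[0] === '"')) {
        $quote = $uri[0];
        if (strlen($uri) < 2 || substr($uri, -1) !== $quote) return false;
        $uri = substr($uri, 1, -1);
    }

    // Undo the CSS backslash escapes the validator handles.
    $plain   = array(  '(',   ')',   ',',   ' ',   '"',   "'");
    $escaped = array('\\(', '\\)', '\\,', '\\ ', '\\"', "\\'");
    return str_replace($escaped, $plain, $uri);
}
```

In the committed class the unwrapped string is then handed to `parent::validate()`, and the result is re-escaped and wrapped in `url(...)` again.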
--- a/library/HTMLPurifier/AttrDef/ListStyle.php
+++ b/library/HTMLPurifier/AttrDef/ListStyle.php
@ -4,8 +4,7 @@ require_once 'HTMLPurifier/AttrDef.php';

 /**
 * Validates shorthand CSS property list-style.
- * @note This currently does not support list-style-image, as that functionality
- *       is not implemented yet elsewhere.
+ * @warning Does not support url tokens that have internal spaces.
 */
 class HTMLPurifier_AttrDef_ListStyle extends HTMLPurifier_AttrDef
 {
@ -20,6 +19,7 @@ class HTMLPurifier_AttrDef_ListStyle extends HTMLPurifier_AttrDef
        $def = $config->getCSSDefinition();
        $this->info['list-style-type']     = $def->info['list-style-type'];
        $this->info['list-style-position'] = $def->info['list-style-position'];
+        $this->info['list-style-image'] = $def->info['list-style-image'];
    }
    
    function validate($string, $config, &$context) {
@ -28,48 +28,49 @@ class HTMLPurifier_AttrDef_ListStyle extends HTMLPurifier_AttrDef
        $string = $this->parseCDATA($string);
        if ($string === '') return false;
        
+        // assumes URI doesn't have spaces in it
        $bits = explode(' ', strtolower($string)); // bits to process
        
-        $caught_type = false;
-        $caught_position = false;
-        $caught_none = false; // as in keyword none, which is in all of them
+        $caught = array();
+        $caught['type']     = false;
+        $caught['position'] = false;
+        $caught['image']    = false;
        
-        $ret = '';
+        $i = 0; // number of catches
+        $none = false;
        
        foreach ($bits as $bit) {
-            if ($caught_none && ($caught_type || $caught_position)) break;
-            if ($caught_type && $caught_position) break;
-            
+            if ($i >= 3) return; // optimization bit
            if ($bit === '') continue;
-            
-            if ($bit === 'none') {
-                if ($caught_none) continue;
-                $caught_none = true;
-                $ret .= 'none ';
-                continue;
-            }
-            
-            // if we add anymore, roll it into a loop
-            
-            $r = $this->info['list-style-type']->validate($bit, $config, $context);
-            if ($r !== false) {
-                if ($caught_type) continue;
-                $caught_type = true;
-                $ret .= $r . ' ';
-                continue;
-            }
-            
-            $r = $this->info['list-style-position']->validate($bit, $config, $context);
-            if ($r !== false) {
-                if ($caught_position) continue;
-                $caught_position = true;
-                $ret .= $r . ' ';
-                continue;
+            foreach ($caught as $key => $status) {
+                if ($status !== false) continue;
+                $r = $this->info['list-style-' . $key]->validate($bit, $config, $context);
+                if ($r === false) continue;
+                if ($r === 'none') {
+                    if ($none) continue;
+                    else $none = true;
+                    if ($key == 'image') continue;
+                }
+                $caught[$key] = $r;
+                $i++;
            }
        }
        
-        $ret = rtrim($ret);
-        return $ret ? $ret : false;
+        if (!$i) return false;
+        
+        $ret = array();
+        
+        // construct type
+        if ($caught['type']) $ret[] = $caught['type'];
+        
+        // construct image
+        if ($caught['image']) $ret[] = $caught['image'];
+        
+        // construct position
+        if ($caught['position']) $ret[] = $caught['position'];
+        
+        if (empty($ret)) return false;
+        return implode(' ', $ret);
        
    }
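
The "fill each slot at most once" idea behind the rewritten loop can be sketched without the validator objects. This is a simplified illustration under stated assumptions: plain whitelists replace the `AttrDef` validators, `list-style-image` and the special `none` handling are left out, and (unlike the committed loop) a token stops at the first slot it fills. `parse_list_style_sketch()` is a hypothetical name.

```php
<?php
// Hypothetical, simplified version of the shorthand parsing loop.
function parse_list_style_sketch($string) {
    $allowed = array(
        'type'     => array('disc', 'circle', 'square', 'decimal'),
        'position' => array('inside', 'outside'),
    );
    $caught = array('type' => false, 'position' => false);

    foreach (explode(' ', strtolower(trim($string))) as $bit) {
        if ($bit === '') continue; // skip runs of spaces
        foreach ($allowed as $key => $values) {
            if ($caught[$key] !== false) continue;       // slot already filled
            if (!in_array($bit, $values, true)) continue; // not valid here
            $caught[$key] = $bit;
            break; // a token fills at most one slot
        }
    }

    // Emit the caught values in a fixed order, dropping empty slots.
    $ret = array();
    foreach (array('type', 'position') as $key) {
        if ($caught[$key] !== false) $ret[] = $caught[$key];
    }
    return $ret ? implode(' ', $ret) : false;
}
```

Unknown tokens and repeated values are dropped, and the output order is fixed regardless of the input order.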
    
--- a/library/HTMLPurifier/AttrDef/URI.php
+++ b/library/HTMLPurifier/AttrDef/URI.php
@ -139,10 +139,10 @@ class HTMLPurifier_AttrDef_URI extends HTMLPurifier_AttrDef
            // no need to validate the scheme's fmt since we do that when we
            // retrieve the specific scheme object from the registry
            $scheme = ctype_lower($scheme) ? $scheme : strtolower($scheme);
-            $scheme_obj =& $registry->getScheme($scheme, $config, $context);
+            $scheme_obj = $registry->getScheme($scheme, $config, $context);
            if (!$scheme_obj) return false; // invalid scheme, clean it out
        } else {
-            $scheme_obj =& $registry->getScheme(
+            $scheme_obj = $registry->getScheme(
                $config->get('URI', 'DefaultScheme'), $config, $context
            );
        }
--- a/library/HTMLPurifier/AttrTransform/BdoDir.php
+++ b/library/HTMLPurifier/AttrTransform/BdoDir.php
@ -20,7 +20,7 @@ HTMLPurifier_ConfigSchema::defineAllowedValues(
 class HTMLPurifier_AttrTransform_BdoDir extends HTMLPurifier_AttrTransform
 {
    
-    function transform($attr, $config, $context) {
+    function transform($attr, $config, &$context) {
        if (isset($attr['dir'])) return $attr;
        $attr['dir'] = $config->get('Attr', 'DefaultTextDir');
        return $attr;
--- a/library/HTMLPurifier/AttrTransform/ImgRequired.php
+++ b/library/HTMLPurifier/AttrTransform/ImgRequired.php
@ -25,7 +25,7 @@ HTMLPurifier_ConfigSchema::define(
 class HTMLPurifier_AttrTransform_ImgRequired extends HTMLPurifier_AttrTransform
 {
    
-    function transform($attr, $config, $context) {
+    function transform($attr, $config, &$context) {
        
        $src = true;
        if (!isset($attr['src'])) {
--- a/library/HTMLPurifier/AttrTransform/Lang.php
+++ b/library/HTMLPurifier/AttrTransform/Lang.php
@ -10,7 +10,7 @@ require_once 'HTMLPurifier/AttrTransform.php';
 class HTMLPurifier_AttrTransform_Lang extends HTMLPurifier_AttrTransform
 {
    
-    function transform($attr, $config, $context) {
+    function transform($attr, $config, &$context) {
        
        $lang     = isset($attr['lang']) ? $attr['lang'] : false;
        $xml_lang = isset($attr['xml:lang']) ? $attr['xml:lang'] : false;
--- a/library/HTMLPurifier/AttrTransform/TextAlign.php
+++ b/library/HTMLPurifier/AttrTransform/TextAlign.php
@ -8,7 +8,7 @@ require_once 'HTMLPurifier/AttrTransform.php';
 class HTMLPurifier_AttrTransform_TextAlign
    extends HTMLPurifier_AttrTransform {

-    function transform($attr, $config, $context) {
+    function transform($attr, $config, &$context) {
        
        if (!isset($attr['align'])) return $attr;
        
--- a/library/HTMLPurifier/CSSDefinition.php
+++ b/library/HTMLPurifier/CSSDefinition.php
@ -11,6 +11,7 @@ require_once 'HTMLPurifier/AttrDef/FontFamily.php';
 require_once 'HTMLPurifier/AttrDef/Font.php';
 require_once 'HTMLPurifier/AttrDef/Border.php';
 require_once 'HTMLPurifier/AttrDef/ListStyle.php';
+require_once 'HTMLPurifier/AttrDef/CSSURI.php';

 /**
 * Defines allowed CSS attributes and what their values are.
@ -51,11 +52,19 @@ class HTMLPurifier_CSSDefinition
        $this->info['font-variant'] = new HTMLPurifier_AttrDef_Enum(
            array('normal', 'small-caps'), false);
        
+        $uri_or_none = new HTMLPurifier_AttrDef_Composite(
+            array(
+                new HTMLPurifier_AttrDef_Enum(array('none')),
+                new HTMLPurifier_AttrDef_CSSURI()
+            )
+        );
+        
        $this->info['list-style-position'] = new HTMLPurifier_AttrDef_Enum(
            array('inside', 'outside'), false);
        $this->info['list-style-type'] = new HTMLPurifier_AttrDef_Enum(
            array('disc', 'circle', 'square', 'decimal', 'lower-roman',
-            'upper-roman', 'lower-alpha', 'upper-alpha'), false);
+            'upper-roman', 'lower-alpha', 'upper-alpha', 'none'), false);
+        $this->info['list-style-image'] = $uri_or_none;
        
        $this->info['list-style'] = new HTMLPurifier_AttrDef_ListStyle($config);
        
@ -63,13 +72,15 @@ class HTMLPurifier_CSSDefinition
            array('capitalize', 'uppercase', 'lowercase', 'none'), false);
        $this->info['color'] = new HTMLPurifier_AttrDef_Color();
        
-        // technically speaking, this one should get its own validator, but
-        // since we don't support background images, it effectively is
-        // equivalent to color.  The only trouble is that if the author
-        // specifies an image and a color, they'll both end up getting dropped,
-        // even though we ought to implement it and just discard the image
-        // info.  This will be fixed in a later version (see TODO) when
-        // better URI filtering is implemented.
+        $this->info['background-image'] = $uri_or_none;
+        $this->info['background-repeat'] = new HTMLPurifier_AttrDef_Enum(
+            array('repeat', 'repeat-x', 'repeat-y', 'no-repeat')
+        );
+        $this->info['background-attachment'] = new HTMLPurifier_AttrDef_Enum(
+            array('scroll', 'fixed')
+        );
+        
+        // pending its own validator as a shorthand
        $this->info['background'] = 
        
        $border_color = 
--- a/library/HTMLPurifier/Config.php
+++ b/library/HTMLPurifier/Config.php
@ -51,8 +51,8 @@ class HTMLPurifier_Config
     *                      an array of directives based on loadArray().
     * @return Configured HTMLPurifier_Config object
     */
-    function create($config) {
-        if (is_a($config, 'HTMLPurifier_Config')) return $config;
+    static function create($config) {
+        if ($config instanceof HTMLPurifier_Config) return $config;
        $ret = HTMLPurifier_Config::createDefault();
        if (is_array($config)) $ret->loadArray($config);
        return $ret;
@ -62,7 +62,7 @@ class HTMLPurifier_Config
     * Convenience constructor that creates a default configuration object.
     * @return Default HTMLPurifier_Config object.
     */
-    function createDefault() {
+    static function createDefault() {
        $definition =& HTMLPurifier_ConfigSchema::instance();
        $config = new HTMLPurifier_Config($definition);
        return $config;
--- a/library/HTMLPurifier/ConfigSchema.php
+++ b/library/HTMLPurifier/ConfigSchema.php
@ -68,7 +68,7 @@ class HTMLPurifier_ConfigSchema {
    /**
     * Retrieves an instance of the application-wide configuration definition.
     */
-    function &instance($prototype = null) {
+    static function &instance($prototype = null) {
        static $instance;
        if ($prototype !== null) {
            $instance = $prototype;
@ -89,7 +89,7 @@ class HTMLPurifier_ConfigSchema {
     *      HTMLPurifier_DirectiveDef::$type for allowed values
     * @param $description Description of directive for documentation
     */
-    function define(
+    static function define(
        $namespace, $name, $default, $type, 
        $description
    ) {
@ -147,7 +147,7 @@ class HTMLPurifier_ConfigSchema {
     * @param $namespace Namespace's name
     * @param $description Description of the namespace
     */
-    function defineNamespace($namespace, $description) {
+    static function defineNamespace($namespace, $description) {
        $def =& HTMLPurifier_ConfigSchema::instance();
        if (isset($def->info[$namespace])) {
            trigger_error('Cannot redefine namespace', E_USER_ERROR);
@ -174,7 +174,7 @@ class HTMLPurifier_ConfigSchema {
     * @param $alias Name of aliased value
     * @param $real Value aliased value will be converted into
     */
-    function defineValueAliases($namespace, $name, $aliases) {
+    static function defineValueAliases($namespace, $name, $aliases) {
        $def =& HTMLPurifier_ConfigSchema::instance();
        if (!isset($def->info[$namespace][$name])) {
            trigger_error('Cannot set value alias for non-existant directive',
@ -204,7 +204,7 @@ class HTMLPurifier_ConfigSchema {
     * @param $name Name of directive
     * @param $allowed_values Arraylist of allowed values
     */
-    function defineAllowedValues($namespace, $name, $allowed_values) {
+    static function defineAllowedValues($namespace, $name, $allowed_values) {
        $def =& HTMLPurifier_ConfigSchema::instance();
        if (!isset($def->info[$namespace][$name])) {
            trigger_error('Cannot define allowed values for undefined directive',
@ -305,7 +305,7 @@ class HTMLPurifier_ConfigSchema {
     */
    function isError($var) {
        if (!is_object($var)) return false;
-        if (!is_a($var, 'HTMLPurifier_Error')) return false;
+        if (!($var instanceof HTMLPurifier_Error)) return false;
        return true;
    }
 }
--- a/library/HTMLPurifier/Encoder.php
+++ b/library/HTMLPurifier/Encoder.php
@ -67,7 +67,7 @@ class HTMLPurifier_Encoder
     *       would need that, and I'm probably not going to implement them.
     *       Once again, PHP 6 should solve all our problems.
     */
-    function cleanUTF8($str, $force_php = false) {
+    static function cleanUTF8($str, $force_php = false) {
        
        static $non_sgml_chars = array();
        if (empty($non_sgml_chars)) {
@ -249,7 +249,7 @@ class HTMLPurifier_Encoder
    // | 00000000 | 00010000 | 11111111 | 11111111 | Defined upper limit of legal scalar codes
    // +----------+----------+----------+----------+ 
    
-    function unichr($code) {
+    static function unichr($code) {
        if($code > 1114111 or $code < 0 or
          ($code >= 55296 and $code <= 57343) ) {
            // bits are set outside the "valid" range as defined
--- a/library/HTMLPurifier/EntityLookup.php
+++ b/library/HTMLPurifier/EntityLookup.php
@ -28,7 +28,7 @@ class HTMLPurifier_EntityLookup {
     * Retrieves sole instance of the object.
     * @param Optional prototype of custom lookup table to overload with.
     */
-    function instance($prototype = false) {
+    static function instance($prototype = false) {
        // no references, since PHP doesn't copy unless modified
        static $instance = null;
        if ($prototype) {
--- a/library/HTMLPurifier/HTMLDefinition.php
+++ b/library/HTMLPurifier/HTMLDefinition.php
@ -300,9 +300,6 @@ class HTMLPurifier_HTMLDefinition
        $this->info['b']->child    =
        $this->info['big']->child  =
        $this->info['small']->child=
-        $this->info['u']->child    =
-        $this->info['s']->child    =
-        $this->info['strike']->child    =
        $this->info['bdo']->child  =
        $this->info['span']->child =
        $this->info['dt']->child   =
@ -314,6 +311,12 @@ class HTMLPurifier_HTMLDefinition
        $this->info['h5']->child   = 
        $this->info['h6']->child   = $e_Inline;
        
+        if (!$this->strict) {
+            $this->info['u']->child    =
+            $this->info['s']->child    =
+            $this->info['strike']->child    = $e_Inline;
+        }
+        
        // the only three required definitions, besides custom table code
        $this->info['ol']->child   =
        $this->info['ul']->child   = new HTMLPurifier_ChildDef_Required('li');
@ -355,10 +358,12 @@ class HTMLPurifier_HTMLDefinition
        // reuses $e_Inline and $e_Block
        foreach ($e_Inline->elements as $name => $bool) {
            if ($name == '#PCDATA') continue;
+            if (!isset($this->info[$name])) continue;
            $this->info[$name]->type = 'inline';
        }
        
        foreach ($e_Block->elements as $name => $bool) {
+            if (!isset($this->info[$name])) continue;
            $this->info[$name]->type = 'block';
        }
        
@ -531,7 +536,7 @@ class HTMLPurifier_HTMLDefinition
        
        // protect against stdclasses floating around
        foreach ($this->info as $key => $obj) {
-            if (is_a($obj, 'stdclass')) {
+            if ($obj instanceof stdClass) {
                unset($this->info[$key]);
            }
        }
--- a/library/HTMLPurifier/Lexer.php
+++ b/library/HTMLPurifier/Lexer.php
@ -145,7 +145,7 @@ class HTMLPurifier_Lexer
     * @param $prototype Optional prototype lexer.
     * @return Concrete lexer.
     */
-    function create($prototype = null) {
+    static function create($prototype = null) {
        // we don't really care if it's a reference or a copy
        static $lexer = null;
        if ($prototype) {
@ -170,7 +170,7 @@ class HTMLPurifier_Lexer
     * @param $string HTML string to process.
     * @returns HTML with CDATA sections escaped.
     */
-    function escapeCDATA($string) {
+    static function escapeCDATA($string) {
        return preg_replace_callback(
            '/<!\[CDATA\[(.+?)\]\]>/',
            array('HTMLPurifier_Lexer', 'CDATACallback'),
@ -187,7 +187,7 @@ class HTMLPurifier_Lexer
     *                  and 1 the inside of the CDATA section.
     * @returns Escaped internals of the CDATA section.
     */
-    function CDATACallback($matches) {
+    static function CDATACallback($matches) {
        // not exactly sure why the character set is needed, but whatever
        return htmlspecialchars($matches[1], ENT_COMPAT, 'UTF-8');
    }
--- a/library/HTMLPurifier/Lexer/DOMLex.php
+++ b/library/HTMLPurifier/Lexer/DOMLex.php
@ -88,6 +88,11 @@ class HTMLPurifier_Lexer_DOMLex extends HTMLPurifier_Lexer
        } elseif ($node->nodeType === XML_COMMENT_NODE) {
            $tokens[] = $this->factory->createComment($node->data);
            return;
+        } elseif (
+            // not well tested: there may be other node types we have to handle
+            $node->nodeType !== XML_ELEMENT_NODE
+        ) {
+            return;
        }
        
        $attr = $node->hasAttributes() ?
--- a/library/HTMLPurifier/Lexer/PEARSax3.php
+++ b/library/HTMLPurifier/Lexer/PEARSax3.php
@ -37,7 +37,7 @@ class HTMLPurifier_Lexer_PEARSax3 extends HTMLPurifier_Lexer
        
        $string = $this->normalize($string, $config, $context);
        
-        $parser=& new XML_HTMLSax3();
+        $parser= new XML_HTMLSax3();
        $parser->set_object($this);
        $parser->set_element_handler('openHandler','closeHandler');
        $parser->set_data_handler('dataHandler');
--- a/library/HTMLPurifier/Printer/HTMLDefinition.php
+++ b/library/HTMLPurifier/Printer/HTMLDefinition.php
@ -10,10 +10,10 @@ class HTMLPurifier_Printer_HTMLDefinition extends HTMLPurifier_Printer
     */
    var $def;
    
-    function render(&$config) {
+    function render($config) {
        $ret = '';
        $this->config =& $config;
-        $this->def =& $config->getHTMLDefinition();
+        $this->def = $config->getHTMLDefinition();
        $def =& $this->def;
        
        $ret .= $this->start('div', array('class' => 'HTMLPurifier_Printer'));
--- a/library/HTMLPurifier/URISchemeRegistry.php
+++ b/library/HTMLPurifier/URISchemeRegistry.php
@ -37,7 +37,7 @@ class HTMLPurifier_URISchemeRegistry
     * @note Pass a registry object $prototype with a compatible interface and
     *       the function will copy it and return it all further times.
     */
-    function &instance($prototype = null) {
+    static function &instance($prototype = null) {
        static $instance = null;
        if ($prototype !== null) {
            $instance = $prototype;
--- a/smoketests/common.php
+++ b/smoketests/common.php
@ -3,6 +3,7 @@
 header('Content-type: text/html; charset=UTF-8');

 require_once '../library/HTMLPurifier.auto.php';
+error_reporting(E_ALL | E_STRICT);

 function escapeHTML($string) {
    $string = HTMLPurifier_Encoder::cleanUTF8($string);
--- a/smoketests/printDefinition.php
+++ b/smoketests/printDefinition.php
@ -54,11 +54,15 @@ echo '<?xml version="1.0" encoding="UTF-8" ?>';
    </script>
 </head>
 <body>
+
 <h1>HTML Purifier Printer Smoketest</h1>
-<p>This page will allow you to see precisely what HTML Purifier's internal
+
+<p>HTML Purifier claims to have a robust yet permissive whitelist: this
+page will allow you to see precisely what HTML Purifier's internal
 whitelist is. You can
 also twiddle with the configuration settings to see how a directive
 influences the internal workings of the definition objects.</p>
+
 <h2>Modify configuration</h2>

 <p>You can specify an array by typing in a comma-separated
--- a/smoketests/utf8.php
+++ b/smoketests/utf8.php
@ -1,5 +1,7 @@
 <?php

+// this file is encoded in UTF-8; please don't let your editor mangle it
+
 require_once 'common.php';

 echo '<?xml version="1.0" encoding="UTF-8" ?>';
--- a/smoketests/xssAttacks.xml
+++ b/smoketests/xssAttacks.xml
@ -978,8 +978,6 @@ alert(a.source)&lt;/SCRIPT&gt;</code>

 -onErrorUpdate() (fires on a databound object when an error occurs while updating the associated data in the data source object)

-onExit() (fires when someone clicks on a link or presses the back button)
-
 -onFilterChange() (fires when a visual filter completes state change)

 -onFinish() (attacker could create the exploit when marquee is finished looping)
--- a/tests/Debugger.php
+++ b/tests/Debugger.php
@ -70,7 +70,7 @@ class Debugger
        $this->add_pre = !extension_loaded('xdebug');
    }
    
-    function &instance() {
+    static function &instance() {
        static $soleInstance = false;
        if (!$soleInstance) $soleInstance = new Debugger();
        return $soleInstance;
--- a/tests/HTMLPurifier/AttrDef/CSSTest.php
+++ b/tests/HTMLPurifier/AttrDef/CSSTest.php
@ -1,6 +1,7 @@
 <?php

 require_once 'HTMLPurifier/AttrDef/CSS.php';
+require_once 'HTMLPurifier/AttrDefHarness.php';

 class HTMLPurifier_AttrDef_CSSTest extends HTMLPurifier_AttrDefHarness
 {
@ -71,6 +72,12 @@ class HTMLPurifier_AttrDef_CSSTest extends HTMLPurifier_AttrDefHarness
        $this->assertDef('vertical-align:12px;');
        $this->assertDef('vertical-align:50%;');
        $this->assertDef('table-layout:fixed;');
+        $this->assertDef('list-style-image:url(nice.jpg);');
+        $this->assertDef('list-style:disc url(nice.jpg) inside;');
+        $this->assertDef('background-image:url(foo.jpg);');
+        $this->assertDef('background-image:none;');
+        $this->assertDef('background-repeat:repeat-y;');
+        $this->assertDef('background-attachment:fixed;');
        
        // duplicates
        $this->assertDef('text-align:right;text-align:left;',
--- a/tests/HTMLPurifier/AttrDef/CSSURITest.php
+++ b/tests/HTMLPurifier/AttrDef/CSSURITest.php
@ -0,0 +1,37 @@
+<?php
+
+require_once 'HTMLPurifier/AttrDef/CSSURI.php';
+require_once 'HTMLPurifier/AttrDefHarness.php';
+
+class HTMLPurifier_AttrDef_CSSURITest extends HTMLPurifier_AttrDefHarness
+{
+    
+    function test() {
+        
+        $this->def = new HTMLPurifier_AttrDef_CSSURI();
+        
+        $this->assertDef('', false);
+        
+        // we could be nice but we won't be
+        $this->assertDef('http://www.example.com/', false);
+        
+        // no quotes are used, since that's the most widely supported
+        // syntax
+        $this->assertDef('url(', false);
+        $this->assertDef('url()', true);
+        $result = "url(http://www.example.com/)";
+        $this->assertDef('url(http://www.example.com/)', $result);
+        $this->assertDef('url("http://www.example.com/")', $result);
+        $this->assertDef("url('http://www.example.com/')", $result);
+        $this->assertDef(
+            '  url(  "http://www.example.com/" )   ', $result);
+        
+        // escaping
+        $this->assertDef("url(http://www.example.com/foo,bar\))", 
+            "url(http://www.example.com/foo\,bar\))");
+        
+    }
+    
+}
+
+?>
--- a/tests/HTMLPurifier/AttrDef/CompositeTest.php
+++ b/tests/HTMLPurifier/AttrDef/CompositeTest.php
@ -28,10 +28,10 @@ class HTMLPurifier_AttrDef_CompositeTest extends HTMLPurifier_AttrDefHarness
        // first test: value properly validates on first definition
        // so second def is never called
        
-        $def1 =& new HTMLPurifier_AttrDefMock($this);
-        $def2 =& new HTMLPurifier_AttrDefMock($this);
+        $def1 = new HTMLPurifier_AttrDefMock($this);
+        $def2 = new HTMLPurifier_AttrDefMock($this);
        $defs = array(&$def1, &$def2);
-        $def =& new HTMLPurifier_AttrDef_Composite_Testable($defs);
+        $def = new HTMLPurifier_AttrDef_Composite_Testable($defs);
        $input = 'FOOBAR';
        $output = 'foobar';
        $def1_params = array($input, $config, $context);
@ -47,10 +47,10 @@ class HTMLPurifier_AttrDef_CompositeTest extends HTMLPurifier_AttrDefHarness
        
        // second test, first def fails, second def works
        
-        $def1 =& new HTMLPurifier_AttrDefMock($this);
-        $def2 =& new HTMLPurifier_AttrDefMock($this);
+        $def1 = new HTMLPurifier_AttrDefMock($this);
+        $def2 = new HTMLPurifier_AttrDefMock($this);
        $defs = array(&$def1, &$def2);
-        $def =& new HTMLPurifier_AttrDef_Composite_Testable($defs);
+        $def = new HTMLPurifier_AttrDef_Composite_Testable($defs);
        $input = 'BOOMA';
        $output = 'booma';
        $def_params = array($input, $config, $context);
@ -67,10 +67,10 @@ class HTMLPurifier_AttrDef_CompositeTest extends HTMLPurifier_AttrDefHarness
        
        // third test, all fail, so composite faiils
        
-        $def1 =& new HTMLPurifier_AttrDefMock($this);
-        $def2 =& new HTMLPurifier_AttrDefMock($this);
+        $def1 = new HTMLPurifier_AttrDefMock($this);
+        $def2 = new HTMLPurifier_AttrDefMock($this);
        $defs = array(&$def1, &$def2);
-        $def =& new HTMLPurifier_AttrDef_Composite_Testable($defs);
+        $def = new HTMLPurifier_AttrDef_Composite_Testable($defs);
        $input = 'BOOMA';
        $output = false;
        $def_params = array($input, $config, $context);
--- a/tests/HTMLPurifier/AttrDef/ListStyleTest.php
+++ b/tests/HTMLPurifier/AttrDef/ListStyleTest.php
@ -15,9 +15,20 @@ class HTMLPurifier_AttrDef_ListStyleTest extends HTMLPurifier_AttrDefHarness
        $this->assertDef('circle outside');
        $this->assertDef('inside');
        $this->assertDef('none');
+        $this->assertDef('url(foo.gif)');
+        $this->assertDef('circle url(foo.gif) inside');
        
+        // invalid values
        $this->assertDef('outside inside', 'outside');
+        
+        // ordering
+        $this->assertDef('url(foo.gif) none', 'none url(foo.gif)');
        $this->assertDef('circle lower-alpha', 'circle');
+        // the spec is ambiguous about what happens in these
+        // cases, so we're going off the W3C CSS validator
+        $this->assertDef('disc none', 'disc');
+        $this->assertDef('none disc', 'none');
+        
        
    }
    
--- a/tests/HTMLPurifier/AttrDef/URITest.php
+++ b/tests/HTMLPurifier/AttrDef/URITest.php
@ -206,7 +206,7 @@ class HTMLPurifier_AttrDef_URITest extends HTMLPurifier_AttrDefHarness
        $registry =& HTMLPurifier_URISchemeRegistry::instance($fake_registry);
        
        // now, let's add a pseudo-scheme to the registry
-        $this->scheme =& new HTMLPurifier_URISchemeMock($this);
+        $this->scheme = new HTMLPurifier_URISchemeMock($this);
        
        // here are the schemes we will support with overloaded mocks
        $registry->setReturnReference('getScheme', $this->scheme, array('http', $this->config, $this->context));
--- a/tests/HTMLPurifier/ContextTest.php
+++ b/tests/HTMLPurifier/ContextTest.php
@ -20,7 +20,7 @@ class HTMLPurifier_ContextTest extends UnitTestCase
        
        $this->assertFalse($this->context->exists('IDAccumulator'));
        
-        $accumulator =& new HTMLPurifier_IDAccumulatorMock($this);
+        $accumulator = new HTMLPurifier_IDAccumulatorMock($this);
        $this->context->register('IDAccumulator', $accumulator);
        $this->assertTrue($this->context->exists('IDAccumulator'));
        
--- a/tests/HTMLPurifier/LexerTest.php
+++ b/tests/HTMLPurifier/LexerTest.php
@ -16,7 +16,9 @@ class HTMLPurifier_LexerTest extends UnitTestCase
        
        $this->DirectLex = new HTMLPurifier_Lexer_DirectLex();
        
-        if ( $GLOBALS['HTMLPurifierTest']['PEAR'] ) {
+        if ( $GLOBALS['HTMLPurifierTest']['PEAR'] && 
+             ((error_reporting() & E_STRICT) != E_STRICT)
+        ) {
            $this->_has_pear = true;
            require_once 'HTMLPurifier/Lexer/PEARSax3.php';
            $this->PEARSax3  = new HTMLPurifier_Lexer_PEARSax3();
--- a/tests/index.php
+++ b/tests/index.php
@ -1,6 +1,6 @@
 <?php

-error_reporting(E_ALL);
+error_reporting(E_ALL | E_STRICT);

 // wishlist: automated calling of this file from multiple PHP versions so we
 // don't have to constantly switch around
@ -84,6 +84,7 @@ $test_files[] = 'AttrDef/FontTest.php';
 $test_files[] = 'AttrDef/BorderTest.php';
 $test_files[] = 'AttrDef/ListStyleTest.php';
 $test_files[] = 'AttrDef/Email/SimpleCheckTest.php';
+$test_files[] = 'AttrDef/CSSURITest.php';
 $test_files[] = 'IDAccumulatorTest.php';
 $test_files[] = 'TagTransformTest.php';
 $test_files[] = 'AttrTransform/LangTest.php';