mirror of https://github.com/ezyang/htmlpurifier.git synced 2025-01-24 06:11:52 +00:00

Merge in r649-656, prompted by changing two of Encoder's functions to static.

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/branches/strict@657 48356398-32a2-884e-a903-53898d9a118a
Edward Z. Yang 2007-01-19 02:28:53 +00:00
parent 5395d8b4bd
commit 37ea1673dd
30 changed files with 173 additions and 29 deletions

NEWS

@ -14,6 +14,8 @@ NEWS ( CHANGELOG and HISTORY ) HTMLPurifier
! Implemented background-image, background-repeat and background-attachment
CSS properties. background shorthand property HAS NOT been extended
to allow these, and background-position IS NOT implemented yet.
! Configuration documentation looks nicer
! Added smoketest 'all.php', which loads all other smoketests via frames
. Implemented AttrDef_CSSURI for url(http://google.com) style declarations
1.3.3, unknown release date, likely to be dropped


@ -1,3 +1,6 @@
body {margin:1em 4em;}
table {border-collapse:collapse;}
table td, table th {padding:0.2em;}
@ -8,3 +11,14 @@ table.constraints td pre {margin:0;}
#toc {list-style-type:none; font-weight:bold;}
#toc ul {list-style-type:disc; font-weight:normal;}
.description p {margin-top:0;margin-bottom:1em;}
#library, h1 {text-align:center; font-family:Garamond, serif;
font-variant:small-caps;}
#library {font-size:1em;}
h1 {margin-top:0;}
h2 {border-bottom:1px solid #CCC; font-family:sans-serif; font-weight:normal;
font-size:1.3em;}
h3 {font-family:sans-serif; font-size:1.1em; font-weight:bold; }
h4 {font-family:sans-serif; font-size:0.9em; font-weight:bold; }


@ -18,12 +18,13 @@
<xsl:template match="/">
<html lang="en" xml:lang="en">
<head>
<title><xsl:value-of select="/configdoc/title" /> Configuration Documentation</title>
<title>Configuration Documentation - <xsl:value-of select="/configdoc/title" /></title>
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8" />
<link rel="stylesheet" type="text/css" href="styles/plain.css" />
</head>
<body>
<h1><xsl:value-of select="/configdoc/title" /> Configuration Documentation</h1>
<div id="library"><xsl:value-of select="/configdoc/title" /></div>
<h1>Configuration Documentation</h1>
<h2>Table of Contents</h2>
<ul id="toc">
<xsl:apply-templates mode="toc" />


@ -14,6 +14,7 @@
<div id="filing">Filed under Development</div>
<div id="index">Return to the <a href="index.html">index</a>.</div>
<div id="home"><a href="http://hp.jpsband.org/">HTML Purifier</a> End-User Documentation</div>
<p>Okay, face it. Programmers can get lazy, cut corners, or make mistakes. They
also can do quick prototypes, and then forget to rewrite them later. Well,


@ -14,6 +14,7 @@
<div id="filing">Filed under Development</div>
<div id="index">Return to the <a href="index.html">index</a>.</div>
<div id="home"><a href="http://hp.jpsband.org/">HTML Purifier</a> End-User Documentation</div>
<p>The classes in this library follow a few naming conventions, which may
help you find the correct functionality more quickly. Here they are:</p>


@ -14,6 +14,7 @@
<div id="filing">Filed under Development</div>
<div id="index">Return to the <a href="index.html">index</a>.</div>
<div id="home"><a href="http://hp.jpsband.org/">HTML Purifier</a> End-User Documentation</div>
<p>Here are some possible optimization techniques we can apply to code sections if
they turn out to be slow. Be sure not to prematurely optimize: if you get


@ -32,6 +32,7 @@ thead th {text-align:left;padding:0.1em;background-color:#EEE;}
<div id="filing">Filed under Development</div>
<div id="index">Return to the <a href="index.html">index</a>.</div>
<div id="home"><a href="http://hp.jpsband.org/">HTML Purifier</a> End-User Documentation</div>
<h2>Key</h2>


@ -15,6 +15,7 @@
<div id="filing">Filed under End-User</div>
<div id="index">Return to the <a href="index.html">index</a>.</div>
<div id="home"><a href="http://hp.jpsband.org/">HTML Purifier</a> End-User Documentation</div>
<p>Prior to HTML Purifier 1.2.0, this library blithely accepted user input that
looked like this:</p>


@ -15,6 +15,7 @@
<div id="filing">Filed under End-User</div>
<div id="index">Return to the <a href="index.html">index</a>.</div>
<div id="home"><a href="http://hp.jpsband.org/">HTML Purifier</a> End-User Documentation</div>
<p>HTML Purifier is a very powerful library. But with power comes great
responsibility, in the form of longer execution times. Remember, this


@ -23,6 +23,7 @@ own advice for sake of portability. -->
<div id="filing">Filed under End-User</div>
<div id="index">Return to the <a href="index.html">index</a>.</div>
<div id="home"><a href="http://hp.jpsband.org/">HTML Purifier</a> End-User Documentation</div>
<p>Character encoding and character sets, in truth, are not that
difficult to understand. But if you don't understand them, you are going
@ -587,8 +588,24 @@ instead. And, of course, you can't use this method for GET requests.</p>
<h3 id="whyutf8-support">Well supported</h3>
<p>Almost every modern browser in the wild today has full UTF-8 and Unicode
support: the number of troublesome cases can be counted with the
fingers of one hand, and these browsers usually have trouble with
other character encodings too. Problems users usually encounter stem
from the lack of appropriate fonts to display the characters (once
again, this applies to all character encodings and HTML entities) or
Internet Explorer's lack of intelligent font picking (which can be
worked around).</p>
<p>We will go into more detail about how to deal with edge cases in
the browser world in the Migration section, but rest assured that
converting to UTF-8, if done correctly, will not result in users
hounding you about broken pages.</p>
<h3 id="whyutf8-htmlpurifier">HTML Purifier</h3>
<p>And finally, we get to HTML Purifier.</p>
<h2 id="migrate">Migrate to UTF-8</h2>
<h3 id="migrate-editor">Text editor</h3>


@ -15,6 +15,7 @@
<div id="filing">Filed under End-User</div>
<div id="index">Return to the <a href="index.html">index</a>.</div>
<div id="home"><a href="http://hp.jpsband.org/">HTML Purifier</a> End-User Documentation</div>
<p>Clients like their YouTube videos. It gives them a warm fuzzy feeling when
they see a neat little embedded video player on their websites that can play


@ -13,7 +13,7 @@
<h1>Documentation</h1>
<p><strong>HTML Purifier</strong> has documentation for all types of people.
<p><strong><a href="http://hp.jpsband.org/">HTML Purifier</a></strong> has documentation for all types of people.
Here is an index of all of them.</p>
<h2>End-user</h2>


@ -15,6 +15,7 @@
<div id="filing">Filed under Proposals</div>
<div id="index">Return to the <a href="index.html">index</a>.</div>
<div id="home"><a href="http://hp.jpsband.org/">HTML Purifier</a> End-User Documentation</div>
<p>Your website probably has a color-scheme.
<span style="color:#090; background:#FFF;">Green on white</span>,


@ -15,6 +15,7 @@
<div id="filing">Filed under Reference</div>
<div id="index">Return to the <a href="index.html">index</a>.</div>
<div id="home"><a href="http://hp.jpsband.org/">HTML Purifier</a> End-User Documentation</div>
<p>Many thanks to the DevNetwork community for answering questions,
theorizing about design, and offering encouragement during


@ -38,5 +38,7 @@ blockquote .label {font-weight:bold; font-size:1em; margin:0 0 .1em;
/* Contains, without exception, Return to index. */
#index {font-size:smaller; }
#home {font-size:smaller;}
/* Contains, without exception, $Id$, for SVN version info. */
#version {text-align:right; font-style:italic; margin:2em 0;}


@ -91,7 +91,6 @@ class HTMLPurifier
$this->lexer = HTMLPurifier_Lexer::create();
$this->strategy = new HTMLPurifier_Strategy_Core();
$this->generator = new HTMLPurifier_Generator();
$this->encoder = new HTMLPurifier_Encoder();
}
@ -110,7 +109,7 @@ class HTMLPurifier
$config = $config ? HTMLPurifier_Config::create($config) : $this->config;
$context = new HTMLPurifier_Context();
$html = $this->encoder->convertToUTF8($html, $config, $context);
$html = HTMLPurifier_Encoder::convertToUTF8($html, $config, $context);
// purified HTML
$html =
@ -127,7 +126,7 @@ class HTMLPurifier
$config, $context
);
$html = $this->encoder->convertFromUTF8($html, $config, $context);
$html = HTMLPurifier_Encoder::convertFromUTF8($html, $config, $context);
$this->context =& $context;
return $html;
}


@ -46,6 +46,7 @@ class HTMLPurifier_Config
/**
* Convenience constructor that creates a config object based on a mixed var
* @static
* @param mixed $config Variable that defines the state of the config
* object. Can be: a HTMLPurifier_Config() object or
* an array of directives based on loadArray().
@ -60,6 +61,7 @@ class HTMLPurifier_Config
/**
* Convenience constructor that creates a default configuration object.
* @static
* @return Default HTMLPurifier_Config object.
*/
static function createDefault() {
@ -178,4 +180,4 @@ class HTMLPurifier_Config
}
?>
?>


@ -67,6 +67,7 @@ class HTMLPurifier_ConfigSchema {
/**
* Retrieves an instance of the application-wide configuration definition.
* @static
*/
static function &instance($prototype = null) {
static $instance;
@ -81,6 +82,7 @@ class HTMLPurifier_ConfigSchema {
/**
* Defines a directive for configuration
* @static
* @warning Will fail if directive's namespace is not defined
* @param $namespace Namespace the directive is in
* @param $name Key of directive
@ -144,6 +146,7 @@ class HTMLPurifier_ConfigSchema {
/**
* Defines a namespace for directives to be put into.
* @static
* @param $namespace Namespace's name
* @param $description Description of the namespace
*/
@ -169,6 +172,7 @@ class HTMLPurifier_ConfigSchema {
*
* Directive value aliases are convenient for developers because it lets
* them set a directive to several values and get the same result.
* @static
* @param $namespace Directive's namespace
* @param $name Name of Directive
* @param $alias Name of aliased value
@ -200,6 +204,7 @@ class HTMLPurifier_ConfigSchema {
/**
* Defines a set of allowed values for a directive.
* @static
* @param $namespace Namespace of directive
* @param $name Name of directive
* @param $allowed_values Arraylist of allowed values
@ -380,4 +385,4 @@ class HTMLPurifier_ConfigEntity_Directive extends HTMLPurifier_ConfigEntity
}
?>
?>
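Several hunks above only add a `@static` tag to `instance()` factories (ConfigSchema, EntityLookup, URISchemeRegistry, Debugger). The visible context shows `ConfigSchema::instance()` keeping its object in a function-level `static $instance;`. The sketch below illustrates that static-local singleton pattern in modern PHP. The class `Registry` is hypothetical. Its prototype handling, where an object replaces the sole instance and `true` resets it to a fresh default, is modelled on the URISchemeRegistry docblock rather than copied from any of these classes.

```php
<?php
// Hypothetical class illustrating the static-local singleton pattern.
class Registry
{
    /** @var array<string, string> Example state held by the singleton. */
    public $schemes = array();

    /**
     * Returns the sole instance.
     * @param Registry|bool|null $prototype A Registry object replaces the
     *        instance, true resets it to a fresh default, null keeps it.
     */
    public static function instance($prototype = null)
    {
        static $instance = null; // persists between calls to instance()
        if ($prototype instanceof Registry) {
            $instance = $prototype;
        } elseif ($prototype === true || $instance === null) {
            $instance = new Registry();
        }
        return $instance;
    }
}

$a = Registry::instance();
$a->schemes['http'] = 'HTTP';
var_dump(Registry::instance() === $a); // bool(true)
```

The instance lives in a function-scoped static rather than in a global or a class property, so in this sketch passing `true` is the only way to get back to a default instance.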


@ -38,16 +38,25 @@ HTMLPurifier_ConfigSchema::define(
/**
* A UTF-8 specific character encoder that handles cleaning and transforming.
* @note All functions in this class should be static.
*/
class HTMLPurifier_Encoder
{
/**
* Constructor throws fatal error if you attempt to instantiate class
*/
function HTMLPurifier_Encoder() {
trigger_error('Cannot instantiate encoder, call methods statically', E_USER_ERROR);
}
/**
* Cleans a UTF-8 string for well-formedness and SGML validity
*
* It will parse according to UTF-8 and return a valid UTF8 string, with
* non-SGML codepoints excluded.
*
* @static
* @note Just for reference, the non-SGML code points are 0 to 31 and
* 127 to 159, inclusive. However, we allow code points 9, 10
* and 13, which are the tab, line feed and carriage return
@ -225,6 +234,7 @@ class HTMLPurifier_Encoder
/**
* Translates a Unicode codepoint into its corresponding UTF-8 character.
* @static
* @note Based on Feyd's function at
* <http://forums.devnetwork.net/viewtopic.php?p=191404#191404>,
* which is in public domain.
@ -288,8 +298,9 @@ class HTMLPurifier_Encoder
/**
* Converts a string to UTF-8 based on configuration.
* @static
*/
function convertToUTF8($str, $config, &$context) {
static function convertToUTF8($str, $config, &$context) {
static $iconv = null;
if ($iconv === null) $iconv = function_exists('iconv');
$encoding = $config->get('Core', 'Encoding');
@ -303,10 +314,11 @@ class HTMLPurifier_Encoder
/**
* Converts a string from UTF-8 based on configuration.
* @static
* @note Currently, this is a lossy conversion, with inexpressible
* characters being omitted.
*/
function convertFromUTF8($str, $config, &$context) {
static function convertFromUTF8($str, $config, &$context) {
static $iconv = null;
if ($iconv === null) $iconv = function_exists('iconv');
$encoding = $config->get('Core', 'Encoding');


@ -26,6 +26,7 @@ class HTMLPurifier_EntityLookup {
/**
* Retrieves sole instance of the object.
* @static
* @param Optional prototype of custom lookup table to overload with.
*/
static function instance($prototype = false) {


@ -653,4 +653,4 @@ class HTMLPurifier_ElementDef
}
?>
?>


@ -56,7 +56,6 @@ class HTMLPurifier_Lexer
{
function HTMLPurifier_Lexer() {
$this->_encoder = new HTMLPurifier_Encoder();
$this->_entity_parser = new HTMLPurifier_EntityParser();
}
@ -114,8 +113,6 @@ class HTMLPurifier_Lexer
return $string;
}
var $_encoder;
/**
* Lexes an HTML string into tokens.
*
@ -138,6 +135,8 @@ class HTMLPurifier_Lexer
* default with your own implementation. A copy/reference of the prototype
* lexer will now be returned when you request a new lexer.
*
* @static
*
* @note
* Though it is possible to call this factory method from subclasses,
* such usage is not recommended.
@ -166,6 +165,7 @@ class HTMLPurifier_Lexer
/**
* Translates CDATA sections into regular sections (through escaping).
*
* @static
* @protected
* @param $string HTML string to process.
* @returns HTML with CDATA sections escaped.
@ -181,6 +181,7 @@ class HTMLPurifier_Lexer
/**
* Callback function for escapeCDATA() that does the work.
*
* @static
* @warning Though this is public in order to let the callback happen,
* calling it directly is not recommended.
* @param $matches PCRE matches array, with index 0 the entire match
@ -212,7 +213,7 @@ class HTMLPurifier_Lexer
// clean into wellformed UTF-8 string for an SGML context: this has
// to be done after entity expansion because the entities sometimes
// represent non-SGML characters (horror, horror!)
$html = $this->_encoder->cleanUTF8($html);
$html = HTMLPurifier_Encoder::cleanUTF8($html);
return $html;
}
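`normalize()` now calls `HTMLPurifier_Encoder::cleanUTF8()` statically. According to its docblock, that function yields well-formed UTF-8 with the non-SGML code points 0 to 31 and 127 to 159 removed, except tab, line feed and carriage return. The sketch below covers only the second half of that job, for input that is already valid UTF-8. The function `strip_non_sgml()` is hypothetical. Unlike `cleanUTF8()`, it rejects malformed input instead of repairing it.

```php
<?php
/**
 * Hypothetical helper: removes non-SGML code points (0-31 and 127-159,
 * except tab, LF and CR) from a string that is already well-formed UTF-8.
 *
 * @throws InvalidArgumentException if $s is not well-formed UTF-8.
 */
function strip_non_sgml(string $s): string
{
    // With the /u modifier, preg_match() returns false on malformed UTF-8.
    if (preg_match('//u', $s) !== 1) {
        throw new InvalidArgumentException('Input is not well-formed UTF-8');
    }
    // \x{9} (tab), \x{A} (LF) and \x{D} (CR) are deliberately not in the class.
    return preg_replace('/[\x{0}-\x{8}\x{B}\x{C}\x{E}-\x{1F}\x{7F}-\x{9F}]/u', '', $s);
}

var_dump(strip_non_sgml("a\x01b\tc")); // string(4) "ab	c" (with a tab)
```

Validating first matters because the comment in `normalize()` notes that entity expansion can introduce non-SGML characters. Filtering must therefore run on the final string.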


@ -37,7 +37,7 @@ class HTMLPurifier_Lexer_PEARSax3 extends HTMLPurifier_Lexer
$string = $this->normalize($string, $config, $context);
$parser= new XML_HTMLSax3();
$parser = new XML_HTMLSax3();
$parser->set_object($this);
$parser->set_element_handler('openHandler','closeHandler');
$parser->set_data_handler('dataHandler');


@ -32,6 +32,7 @@ class HTMLPurifier_URISchemeRegistry
/**
* Retrieve sole instance of the registry.
* @static
* @param $prototype Optional prototype to overload sole instance with,
* or bool true to reset to default registry.
* @note Pass a registry object $prototype with a compatible interface and

smoketests/all.php Normal file

@ -0,0 +1,40 @@
<?php
require_once 'common.php';
header('Content-type: text/html; charset=UTF-8');
echo '<?xml version="1.0" encoding="UTF-8" ?>';
?><!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-loose.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>HTML Purifier: All Smoketests</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<style type="text/css">
#content {margin:5em;}
iframe {width:100%;height:30em;}
</style>
</head>
<body>
<h1>HTML Purifier: All Smoketests</h1>
<div id="content">
<?php
$dir = './';
$dh = opendir($dir);
while (false !== ($filename = readdir($dh))) {
if ($filename[0] == '.') continue;
if (strpos($filename, '.php') === false) continue;
if ($filename == 'common.php') continue;
if ($filename == 'all.php') continue;
?>
<iframe src="<?php echo escapeHTML($filename); ?>"></iframe>
<?php
}
?>
</div>
</body>
</html>
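The loop in `all.php` embeds every `.php` file in its directory in an iframe, skipping hidden files, `common.php`, and `all.php` itself. The sketch below moves that filter into a hypothetical pure function, `smoketest_files()`, so it can be tested without a real directory. It also sorts the result, because `readdir()` returns entries in no guaranteed order. One deliberate tightening is an assumption on my part. The original `strpos($filename, '.php') === false` test also accepts names such as `foo.php.bak`, whereas the sketch checks the extension itself.

```php
<?php
/**
 * Hypothetical helper: picks the smoketest scripts to embed from a list
 * of directory entries, following the skip rules in all.php.
 *
 * @param string[] $entries File names, e.g. from scandir().
 * @return string[] Sorted file names to load in iframes.
 */
function smoketest_files(array $entries): array
{
    $skip = array('common.php', 'all.php');
    $files = array();
    foreach ($entries as $filename) {
        if ($filename === '' || $filename[0] === '.') continue; // ., .., hidden
        if (substr($filename, -4) !== '.php') continue;          // scripts only
        if (in_array($filename, $skip, true)) continue;          // helper and self
        $files[] = $filename;
    }
    sort($files);
    return $files;
}

// Possible use inside all.php (escapeHTML() is available via common.php):
// foreach (smoketest_files(scandir('./')) as $f) {
//     echo '<iframe src="' . escapeHTML($f) . '"></iframe>';
// }
```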


@ -11,4 +11,4 @@ function escapeHTML($string) {
return $string;
}
?>
?>


@ -46,6 +46,7 @@ echo '<?xml version="1.0" encoding="UTF-8" ?>';
.HTMLPurifier_Printer caption {font-size:1.5em; font-weight:bold;
width:100%;}
.HTMLPurifier_Printer .heavy {background:#99C;text-align:center;}
dt {font-weight:bold;}
</style>
<script type="text/javascript">
function toggleWriteability(id_of_patient, checked) {
@ -97,13 +98,14 @@ transformation into a real array list or a lookup table).</p>
<label for="<?php echo $directive; ?>">%<?php echo $directive; ?></label>
</a>
</th>
<td>
<?php if (is_bool($value)) { ?>
<td id="<?php echo $directive; ?>">
<label for="Yes_<?php echo $directive; ?>"><span class="c">%<?php echo $directive; ?>:</span> Yes</label>
<input type="radio" name="<?php echo $directive; ?>" id="Yes_<?php echo $directive; ?>" value="1"<?php if ($value) { ?> checked="checked"<?php } ?> /> &nbsp;
<label for="No_<?php echo $directive; ?>"><span class="c">%<?php echo $directive; ?>:</span> No</label>
<input type="radio" name="<?php echo $directive; ?>" id="No_<?php echo $directive; ?>" value="0"<?php if (!$value) { ?> checked="checked"<?php } ?> />
<?php } else { ?>
<td>
<?php if($allow_null) { ?>
<label for="Null_<?php echo $directive; ?>"><span class="c">%<?php echo $directive; ?>:</span> Null/Disabled*</label>
<input
@ -140,6 +142,40 @@ variable and a null variable. A whitelist, for example, will take an
empty array as meaning <em>no</em> allowed elements, while checking
Null/Disabled will mean that user whitelisting functionality is disabled.</p>
</form>
<h2>Definitions</h2>
<dl>
<dt>Parent of Fragment</dt>
<dd>HTML that HTML Purifier outputs does not live in a void: when it's
output, it has to be placed in another element by means of
something like <code>&lt;element&gt; &lt;?php echo $html
?&gt; &lt;/element&gt;</code>. The parent in this example
is <code>element</code>.</dd>
<dt>Strict mode</dt>
<dd>Whether HTML Purifier's output is Transitional or
Strict compliant. Non-strict mode is still somewhat strict
and converts many deprecated elements.</dd>
<dt>#PCDATA</dt>
<dd>Short for <strong>Parsed Character Data</strong>: regular
text. Tags like <code>ul</code> don't allow text in them, so
#PCDATA is absent from their content.</dd>
<dt>Tag transform</dt>
<dd>A tag transform will change one tag to another. Example: <code>font</code>
turns into a <code>span</code> tag with appropriate CSS.</dd>
<dt>Attr Transform</dt>
<dd>An attribute transform changes a group of attributes based on one
another. Currently, only <code>lang</code> and <code>xml:lang</code>
use this hook, to synchronize each other's values. Pre/Post indicates
whether or not the transform is done before/after validation.</dd>
<dt>Excludes</dt>
<dd>Tags that an element excludes are excluded for all descendants of
that element, not just its children.</dd>
<dt>Name(Param1, Param2)</dt>
<dd>Represents an internal data-structure. You'll have to check out
the corresponding classes in HTML Purifier to find out more.</dd>
</dl>
<h2>HTMLDefinition</h2>
<?php echo $printer_html_definition->render($config) ?>
<h2>CSSDefinition</h2>
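The Attr Transform entry in the new Definitions list says that `lang` and `xml:lang` use this hook to synchronize each other's values. Below is a minimal sketch of such a transform over an attribute array. The function `sync_lang()` is hypothetical. When both attributes are present and disagree, the sketch lets `xml:lang` win; that is an assumption, because the definition does not say which one takes precedence.

```php
<?php
/**
 * Hypothetical attribute transform: makes lang and xml:lang agree.
 * If only one is present it is copied to the other; if both are present,
 * xml:lang is assumed to take precedence.
 *
 * @param array<string, string> $attr Attribute name => value.
 * @return array<string, string>
 */
function sync_lang(array $attr): array
{
    $lang = isset($attr['lang']) ? $attr['lang'] : null;
    $xmlLang = isset($attr['xml:lang']) ? $attr['xml:lang'] : null;
    if ($xmlLang !== null) {
        $attr['lang'] = $xmlLang;      // xml:lang overrides (assumed)
    } elseif ($lang !== null) {
        $attr['xml:lang'] = $lang;     // copy lang across
    }
    return $attr;
}

print_r(sync_lang(array('lang' => 'en')));
// Array ( [lang] => en [xml:lang] => en )
```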


@ -70,6 +70,9 @@ class Debugger
$this->add_pre = !extension_loaded('xdebug');
}
/**
* @static
*/
static function &instance() {
static $soleInstance = false;
if (!$soleInstance) $soleInstance = new Debugger();
@ -142,4 +145,4 @@ class Debugger
}
?>
?>


@ -8,14 +8,13 @@ class HTMLPurifier_EncoderTest extends UnitTestCase
var $Encoder;
function setUp() {
$this->Encoder = new HTMLPurifier_Encoder();
$this->_entity_lookup = HTMLPurifier_EntityLookup::instance();
}
function assertCleanUTF8($string, $expect = null) {
if ($expect === null) $expect = $string;
$this->assertIdentical($this->Encoder->cleanUTF8($string), $expect, 'iconv: %s');
$this->assertIdentical($this->Encoder->cleanUTF8($string, true), $expect, 'PHP: %s');
$this->assertIdentical(HTMLPurifier_Encoder::cleanUTF8($string), $expect, 'iconv: %s');
$this->assertIdentical(HTMLPurifier_Encoder::cleanUTF8($string, true), $expect, 'PHP: %s');
}
function test_cleanUTF8() {
@ -35,7 +34,7 @@ class HTMLPurifier_EncoderTest extends UnitTestCase
// UTF-8 means that we don't touch it
$this->assertIdentical(
$this->Encoder->convertToUTF8("\xF6", $config, $context),
HTMLPurifier_Encoder::convertToUTF8("\xF6", $config, $context),
"\xF6" // this is invalid
);
$this->assertNoErrors();
@ -44,14 +43,14 @@ class HTMLPurifier_EncoderTest extends UnitTestCase
// Now it gets converted
$this->assertIdentical(
$this->Encoder->convertToUTF8("\xF6", $config, $context),
HTMLPurifier_Encoder::convertToUTF8("\xF6", $config, $context),
"\xC3\xB6"
);
$config->set('Test', 'ForceNoIconv', true);
$this->assertIdentical(
$this->Encoder->convertToUTF8("\xF6", $config, $context),
HTMLPurifier_Encoder::convertToUTF8("\xF6", $config, $context),
"\xC3\xB6"
);
@ -63,7 +62,7 @@ class HTMLPurifier_EncoderTest extends UnitTestCase
// UTF-8 means that we don't touch it
$this->assertIdentical(
$this->Encoder->convertFromUTF8("\xC3\xB6", $config, $context),
HTMLPurifier_Encoder::convertFromUTF8("\xC3\xB6", $config, $context),
"\xC3\xB6"
);
@ -71,14 +70,14 @@ class HTMLPurifier_EncoderTest extends UnitTestCase
// Now it gets converted
$this->assertIdentical(
$this->Encoder->convertFromUTF8("\xC3\xB6", $config, $context),
HTMLPurifier_Encoder::convertFromUTF8("\xC3\xB6", $config, $context),
"\xF6"
);
$config->set('Test', 'ForceNoIconv', true);
$this->assertIdentical(
$this->Encoder->convertFromUTF8("\xC3\xB6", $config, $context),
HTMLPurifier_Encoder::convertFromUTF8("\xC3\xB6", $config, $context),
"\xF6"
);
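The Encoder tests fix the expected bytes for ISO-8859-1. `\xF6` (ö) becomes `\xC3\xB6` and converts back again, both with iconv and with `Test.ForceNoIconv` set. ISO-8859-1 needs no lookup table for this, because each byte's value is its Unicode code point. The sketch below shows that conversion in plain PHP. The function names are hypothetical, and this is not presented as the library's non-iconv code path. It does share the documented behaviour of `convertFromUTF8()`: characters that cannot be expressed in the target encoding are omitted.

```php
<?php
// Hypothetical pure-PHP ISO-8859-1 <-> UTF-8 converters.

/** Each Latin-1 byte equals its code point, so it is encoded directly. */
function latin1_to_utf8(string $s): string
{
    $out = '';
    $len = strlen($s);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($s[$i]);
        $out .= $b < 0x80
            ? $s[$i]                                            // ASCII unchanged
            : chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F)); // 2-byte form
    }
    return $out;
}

/**
 * Keeps only code points U+0000..U+00FF; everything else (including
 * malformed bytes) is dropped, so the conversion is lossy.
 */
function utf8_to_latin1(string $s): string
{
    $out = '';
    $len = strlen($s);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($s[$i]);
        if ($b < 0x80) {
            $out .= $s[$i];
        } elseif (($b === 0xC2 || $b === 0xC3) && $i + 1 < $len
                  && (ord($s[$i + 1]) & 0xC0) === 0x80) {
            // U+0080..U+00FF are encoded as 110000xx 10xxxxxx
            $out .= chr((($b & 0x03) << 6) | (ord($s[$i + 1]) & 0x3F));
            $i++;
        }
        // Any other byte (longer sequences, stray continuations) is skipped.
    }
    return $out;
}

echo bin2hex(latin1_to_utf8("\xF6")), "\n"; // c3b6
```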


@ -326,4 +326,4 @@ class HTMLPurifier_LexerTest extends UnitTestCase
}
?>
?>