mirror of
https://github.com/ezyang/htmlpurifier.git
synced 2025-01-20 12:31:53 +00:00
[1.2.0] [BC] ID attributes now disabled by default. New directives:
+ %HTML.EnableAttrID - restores old behavior by allowing IDs + %Attr.IDPrefix - %Attr.IDBlacklist alternative that munges all user IDs so that they don't collide with your IDs + %Attr.IDPrefixLocal - Same as above, but for when there are multiple instances of user content on the page + Profuse documentation on how to use these available in id.txt git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@526 48356398-32a2-884e-a903-53898d9a118a
This commit is contained in:
parent
2dc8e9c3d5
commit
7a4c7b3777
8
NEWS
8
NEWS
@ -2,6 +2,7 @@ NEWS ( CHANGELOG and HISTORY ) HTMLPurifier
|
|||||||
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
||||||
|
|
||||||
= KEY ====================
|
= KEY ====================
|
||||||
|
# Breaks back-compat
|
||||||
! Feature
|
! Feature
|
||||||
- Bugfix
|
- Bugfix
|
||||||
+ Sub-comment
|
+ Sub-comment
|
||||||
@ -9,6 +10,13 @@ NEWS ( CHANGELOG and HISTORY ) HTMLPurifier
|
|||||||
==========================
|
==========================
|
||||||
|
|
||||||
1.2.0, unknown projected release date
|
1.2.0, unknown projected release date
|
||||||
|
# ID attributes now disabled by default. New directives:
|
||||||
|
+ %HTML.EnableAttrID - restores old behavior by allowing IDs
|
||||||
|
+ %Attr.IDPrefix - %Attr.IDBlacklist alternative that munges all user IDs
|
||||||
|
so that they don't collide with your IDs
|
||||||
|
+ %Attr.IDPrefixLocal - Same as above, but for when there are multiple
|
||||||
|
instances of user content on the page
|
||||||
|
+ Profuse documentation on how to use these available in id.txt
|
||||||
! Added MODx plugin <http://modxcms.com/forums/index.php/topic,6604.0.html>
|
! Added MODx plugin <http://modxcms.com/forums/index.php/topic,6604.0.html>
|
||||||
! Added percent encoding normalization
|
! Added percent encoding normalization
|
||||||
! XSS attacks smoketest given facelift
|
! XSS attacks smoketest given facelift
|
||||||
|
2
TODO
2
TODO
@ -48,6 +48,8 @@ Unknown release (on a scratch-an-itch basis)
|
|||||||
empty-cells:show is applied to have compatibility with Internet Explorer
|
empty-cells:show is applied to have compatibility with Internet Explorer
|
||||||
- Convert RTL/LTR override characters to <bdo> tags, or vice versa on demand.
|
- Convert RTL/LTR override characters to <bdo> tags, or vice versa on demand.
|
||||||
Also, enable disabling of directionality
|
Also, enable disabling of directionality
|
||||||
|
- Append something to duplicate IDs so they're still usable (impl. note: the
|
||||||
|
dupe detector would also need to detect the suffix as well)
|
||||||
|
|
||||||
Encoding workarounds
|
Encoding workarounds
|
||||||
- Non-lossy dumb alternate character encoding transformations, achieved by
|
- Non-lossy dumb alternate character encoding transformations, achieved by
|
||||||
|
124
docs/id.txt
Normal file
124
docs/id.txt
Normal file
@ -0,0 +1,124 @@
|
|||||||
|
|
||||||
|
IDs
|
||||||
|
What they are, why you should(n't) wear them, and how to deal with it
|
||||||
|
|
||||||
|
Prior to HTML Purifier 1.2.0, this library blithely accepted user input that
|
||||||
|
looked like this:
|
||||||
|
|
||||||
|
<a id="fragment">Anchor</a>
|
||||||
|
|
||||||
|
...presenting an attractive vector for those that would destroy standards
|
||||||
|
compliance: simply set the ID to one that is already used elsewhere in the
|
||||||
|
document and voila: validation breaks. There was a half-hearted attempt to
|
||||||
|
prevent this by allowing users to blacklist IDs, but I suspect that no one
|
||||||
|
really bothered, and thus, with the release of 1.2.0, IDs are now *removed*
|
||||||
|
by default.
|
||||||
|
|
||||||
|
IDs, however, are quite useful functionality to have, so if users start
|
||||||
|
complaining about broken anchors you'll probably want to turn them back on
|
||||||
|
with %HTML.EnableAttrID. But before you go mucking around with the config
|
||||||
|
object, it's probably worth to take some precautions to keep your page
|
||||||
|
validating. Why?
|
||||||
|
|
||||||
|
1. Standards-compliant pages are good
|
||||||
|
2. Duplicated IDs interfere with anchors. If there are two id="foobar"s in a
|
||||||
|
document, which spot does a browser presented with the fragment #foobar go
|
||||||
|
to? Most browsers opt for the first appearing ID, making it impossible
|
||||||
|
to references the second section. Similarly, duplicated IDs can hijack
|
||||||
|
client-side scripting that relies on the IDs of elements.
|
||||||
|
|
||||||
|
You have (currently) four ways of dealing with the problem.
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
Road #1: Blacklisting IDs
|
||||||
|
Good for pages with single content source and stable templates
|
||||||
|
|
||||||
|
Keeping in terms with the KISS (Keep It Simple, Stupid) principle, let us
|
||||||
|
deal with the most obvious solution: preventing users from using any IDs that
|
||||||
|
appear elsewhere on the document. The method is simple:
|
||||||
|
|
||||||
|
$config->set('HTML', 'EnableAttrID', true);
|
||||||
|
$config->set('Attr', 'IDBlacklist' array(
|
||||||
|
'list', 'of', 'attributes', 'that', 'are', 'forbidden'
|
||||||
|
));
|
||||||
|
|
||||||
|
That being said, there are some notable drawbacks. First of all, you have to
|
||||||
|
know precisely which IDs are being used by the HTML surrounding the user code.
|
||||||
|
This is easier said than done: quite often the page designer and the system
|
||||||
|
coder work separately, so the designer has to constantly be talking with the
|
||||||
|
coder whenever he decides to add a new anchor. Miss one and you open yourself
|
||||||
|
to possible standards-compliance issues.
|
||||||
|
|
||||||
|
Furthermore, this position becomes untenable when a single web page must hold
|
||||||
|
multiple portions of user-submitted content. Since there's obviously no way
|
||||||
|
to find out before-hand what IDs users will use, the blacklist is helpless.
|
||||||
|
And even since HTML Purifier validates each segment seperately, perhaps doing
|
||||||
|
so at different times, it would be extremely difficult to dynamically update
|
||||||
|
the blacklist inbetween runs.
|
||||||
|
|
||||||
|
Finally, simply destroying the ID is extremely un-userfriendly behavior: after
|
||||||
|
all, they might have simply specified a duplicate ID by accident.
|
||||||
|
|
||||||
|
Thus, we get to our second method.
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
Road #2: Namespacing IDs
|
||||||
|
Lazy developer's way, but needs user education
|
||||||
|
|
||||||
|
This method, too, is quite simple: add a prefix to all user IDs. With this
|
||||||
|
code:
|
||||||
|
|
||||||
|
$config->set('HTML', 'EnableAttrID', true);
|
||||||
|
$config->set('Attr', 'IDPrefix', 'user_');
|
||||||
|
|
||||||
|
...this:
|
||||||
|
|
||||||
|
<a id="foobar">Anchor!</a>
|
||||||
|
|
||||||
|
...turns into:
|
||||||
|
|
||||||
|
<a id="user_foobar">Anchor!</a>
|
||||||
|
|
||||||
|
As long as you don't have any IDs that start with user_, collisions are
|
||||||
|
guaranteed not to happen. The drawback is obvious: if a user submits
|
||||||
|
id="foobar", they probably expect to be able to reference their page with
|
||||||
|
#foobar. You'll have to tell them, "No, that doesn't work, you have to add
|
||||||
|
user_ to the beginning."
|
||||||
|
|
||||||
|
And yes, things get hairier. Even with a nice prefix, we still have done
|
||||||
|
nothing about multiple HTML Purifier outputs on one page. Thus, we have
|
||||||
|
a second configuration value to piggy-back off of: %Attr.IDPrefixLocal:
|
||||||
|
|
||||||
|
$config->set('Attr', 'IDPrefixLocal', 'comment' . $id . '_');
|
||||||
|
|
||||||
|
This new attributes does nothing but append on to regular IDPrefix, but is
|
||||||
|
special in that it is volatile: it's value is determined at run-time and
|
||||||
|
cannot possibly be cordoned into, say, a .ini config file. As for what to
|
||||||
|
put into the directive, is up to you, but I would recommend the ID number
|
||||||
|
the text has been assigned in the database. Whatever you pick, however, it
|
||||||
|
has to be unique and stable for the text you are validating. Note, however,
|
||||||
|
that we require that %Attr.IDPrefix be set before you use this directive.
|
||||||
|
|
||||||
|
And also remember: the user has to know what this prefix is too!
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
Path #3: Abstinence
|
||||||
|
|
||||||
|
You may not want to bother. That's okay too, just don't enable IDs.
|
||||||
|
|
||||||
|
Personally, I would take this road whenever user-submitted content would be
|
||||||
|
possibly be shown together on one page. Why a blog comment would need to use
|
||||||
|
anchors is beyond me.
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
Path #4: Denial
|
||||||
|
|
||||||
|
To revert back to pre-1.2.0 behavior, simply:
|
||||||
|
|
||||||
|
$config->set('HTML', 'EnableAttrID', true);
|
||||||
|
|
||||||
|
Don't come crying to me when your page mysteriously stops validating, though.
|
@ -3,6 +3,30 @@
|
|||||||
require_once 'HTMLPurifier/AttrDef.php';
|
require_once 'HTMLPurifier/AttrDef.php';
|
||||||
require_once 'HTMLPurifier/IDAccumulator.php';
|
require_once 'HTMLPurifier/IDAccumulator.php';
|
||||||
|
|
||||||
|
HTMLPurifier_ConfigSchema::define(
|
||||||
|
'Attr', 'IDPrefix', '', 'string',
|
||||||
|
'String to prefix to IDs. If you have no idea what IDs your pages '.
|
||||||
|
'may use, you may opt to simply add a prefix to all user-submitted ID '.
|
||||||
|
'attributes so that they are still usable, but will not conflict with '.
|
||||||
|
'core page IDs. Example: setting the directive to \'user_\' will result in '.
|
||||||
|
'a user submitted \'foo\' to become \'user_foo\' Be sure to set '.
|
||||||
|
'%HTML.EnableAttrID to true before using '.
|
||||||
|
'this. This directive was available since 1.2.0.'
|
||||||
|
);
|
||||||
|
|
||||||
|
HTMLPurifier_ConfigSchema::define(
|
||||||
|
'Attr', 'IDPrefixLocal', '', 'string',
|
||||||
|
'Temporary prefix for IDs used in conjunction with %Attr.IDPrefix. If '.
|
||||||
|
'you need to allow multiple sets of '.
|
||||||
|
'user content on web page, you may need to have a seperate prefix that '.
|
||||||
|
'changes with each iteration. This way, seperately submitted user content '.
|
||||||
|
'displayed on the same page doesn\'t clobber each other. Ideal values '.
|
||||||
|
'are unique identifiers for the content it represents (i.e. the id of '.
|
||||||
|
'the row in the database). Be sure to add a seperator (like an underscore) '.
|
||||||
|
'at the end. Warning: this directive will not work unless %Attr.IDPrefix '.
|
||||||
|
'is set to a non-empty value! This directive was available since 1.2.0.'
|
||||||
|
);
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Validates the HTML attribute ID.
|
* Validates the HTML attribute ID.
|
||||||
* @warning Even though this is the id processor, it
|
* @warning Even though this is the id processor, it
|
||||||
@ -21,6 +45,16 @@ class HTMLPurifier_AttrDef_ID extends HTMLPurifier_AttrDef
|
|||||||
|
|
||||||
if ($id === '') return false;
|
if ($id === '') return false;
|
||||||
|
|
||||||
|
$prefix = $config->get('Attr', 'IDPrefix');
|
||||||
|
if ($prefix !== '') {
|
||||||
|
$prefix .= $config->get('Attr', 'IDPrefixLocal');
|
||||||
|
// prevent re-appending the prefix
|
||||||
|
if (strpos($id, $prefix) !== 0) $id = $prefix . $id;
|
||||||
|
} elseif ($config->get('Attr', 'IDPrefixLocal') !== '') {
|
||||||
|
trigger_error('%Attr.IDPrefixLocal cannot be used unless '.
|
||||||
|
'%Attr.IDPrefix is set', E_USER_WARNING);
|
||||||
|
}
|
||||||
|
|
||||||
$id_accumulator =& $context->get('IDAccumulator');
|
$id_accumulator =& $context->get('IDAccumulator');
|
||||||
if (isset($id_accumulator->ids[$id])) return false;
|
if (isset($id_accumulator->ids[$id])) return false;
|
||||||
|
|
||||||
|
@ -22,6 +22,19 @@ require_once 'HTMLPurifier/Generator.php';
|
|||||||
require_once 'HTMLPurifier/Token.php';
|
require_once 'HTMLPurifier/Token.php';
|
||||||
require_once 'HTMLPurifier/TagTransform.php';
|
require_once 'HTMLPurifier/TagTransform.php';
|
||||||
|
|
||||||
|
HTMLPurifier_ConfigSchema::define(
|
||||||
|
'HTML', 'EnableAttrID', false, 'bool',
|
||||||
|
'Allows the ID attribute in HTML. This is disabled by default '.
|
||||||
|
'due to the fact that without proper configuration user input can '.
|
||||||
|
'easily break the validation of a webpage by specifying an ID that is '.
|
||||||
|
'already on the surrounding HTML. If you don\'t mind throwing caution to '.
|
||||||
|
'the wind, enable this directive, but I strongly recommend you also '.
|
||||||
|
'consider blacklisting IDs you use (%Attr.IDBlacklist) or prefixing all '.
|
||||||
|
'user supplied IDs (%Attr.IDPrefix). This directive has been available '.
|
||||||
|
'since 1.2.0, and when set to true reverts to the behavior of pre-1.2.0 '.
|
||||||
|
'versions.'
|
||||||
|
);
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Defines the purified HTML type with large amounts of objects.
|
* Defines the purified HTML type with large amounts of objects.
|
||||||
*
|
*
|
||||||
@ -271,7 +284,6 @@ class HTMLPurifier_HTMLDefinition
|
|||||||
// which manually override these in their local definitions
|
// which manually override these in their local definitions
|
||||||
$this->info_global_attr = array(
|
$this->info_global_attr = array(
|
||||||
// core attrs
|
// core attrs
|
||||||
'id' => new HTMLPurifier_AttrDef_ID(),
|
|
||||||
'class' => new HTMLPurifier_AttrDef_Class(),
|
'class' => new HTMLPurifier_AttrDef_Class(),
|
||||||
'title' => $e_Text,
|
'title' => $e_Text,
|
||||||
'style' => new HTMLPurifier_AttrDef_CSS(),
|
'style' => new HTMLPurifier_AttrDef_CSS(),
|
||||||
@ -281,6 +293,10 @@ class HTMLPurifier_HTMLDefinition
|
|||||||
'xml:lang' => new HTMLPurifier_AttrDef_Lang(),
|
'xml:lang' => new HTMLPurifier_AttrDef_Lang(),
|
||||||
);
|
);
|
||||||
|
|
||||||
|
if ($config->get('HTML', 'EnableAttrID')) {
|
||||||
|
$this->info_global_attr['id'] = new HTMLPurifier_AttrDef_ID();
|
||||||
|
}
|
||||||
|
|
||||||
// required attribute stipulation handled in attribute transformation
|
// required attribute stipulation handled in attribute transformation
|
||||||
$this->info['bdo']->attr = array(); // nothing else
|
$this->info['bdo']->attr = array(); // nothing else
|
||||||
|
|
||||||
|
@ -7,13 +7,17 @@ require_once 'HTMLPurifier/IDAccumulator.php';
|
|||||||
class HTMLPurifier_AttrDef_IDTest extends HTMLPurifier_AttrDefHarness
|
class HTMLPurifier_AttrDef_IDTest extends HTMLPurifier_AttrDefHarness
|
||||||
{
|
{
|
||||||
|
|
||||||
function test() {
|
function setUp() {
|
||||||
|
parent::setUp();
|
||||||
|
|
||||||
$this->context = new HTMLPurifier_Context();
|
|
||||||
$id_accumulator = new HTMLPurifier_IDAccumulator();
|
$id_accumulator = new HTMLPurifier_IDAccumulator();
|
||||||
$this->context->register('IDAccumulator', $id_accumulator);
|
$this->context->register('IDAccumulator', $id_accumulator);
|
||||||
$this->def = new HTMLPurifier_AttrDef_ID();
|
$this->def = new HTMLPurifier_AttrDef_ID();
|
||||||
|
|
||||||
|
}
|
||||||
|
|
||||||
|
function test() {
|
||||||
|
|
||||||
// valid ID names
|
// valid ID names
|
||||||
$this->assertDef('alpha');
|
$this->assertDef('alpha');
|
||||||
$this->assertDef('al_ha');
|
$this->assertDef('al_ha');
|
||||||
@ -34,6 +38,44 @@ class HTMLPurifier_AttrDef_IDTest extends HTMLPurifier_AttrDefHarness
|
|||||||
|
|
||||||
}
|
}
|
||||||
|
|
||||||
|
function testPrefix() {
|
||||||
|
|
||||||
|
$this->config->set('Attr', 'IDPrefix', 'user_');
|
||||||
|
|
||||||
|
$this->assertDef('alpha', 'user_alpha');
|
||||||
|
$this->assertDef('<asa', false);
|
||||||
|
$this->assertDef('once', 'user_once');
|
||||||
|
$this->assertDef('once', false);
|
||||||
|
|
||||||
|
// if already prefixed, leave alone
|
||||||
|
$this->assertDef('user_alas');
|
||||||
|
$this->assertDef('user_user_alas'); // how to bypass
|
||||||
|
|
||||||
|
}
|
||||||
|
|
||||||
|
function testTwoPrefixes() {
|
||||||
|
|
||||||
|
$this->config->set('Attr', 'IDPrefix', 'user_');
|
||||||
|
$this->config->set('Attr', 'IDPrefixLocal', 'story95_');
|
||||||
|
|
||||||
|
$this->assertDef('alpha', 'user_story95_alpha');
|
||||||
|
$this->assertDef('<asa', false);
|
||||||
|
$this->assertDef('once', 'user_story95_once');
|
||||||
|
$this->assertDef('once', false);
|
||||||
|
|
||||||
|
$this->assertDef('user_story95_alas');
|
||||||
|
$this->assertDef('user_alas', 'user_story95_user_alas'); // !
|
||||||
|
|
||||||
|
$this->config->set('Attr', 'IDPrefix', '');
|
||||||
|
$this->assertDef('amherst'); // no affect when IDPrefix isn't set
|
||||||
|
$this->assertError('%Attr.IDPrefixLocal cannot be used unless '.
|
||||||
|
'%Attr.IDPrefix is set');
|
||||||
|
// SimpleTest has a bug and throws a sprintf error
|
||||||
|
// $this->assertNoErrors();
|
||||||
|
$this->swallowErrors();
|
||||||
|
|
||||||
|
}
|
||||||
|
|
||||||
}
|
}
|
||||||
|
|
||||||
?>
|
?>
|
@ -21,17 +21,25 @@ class HTMLPurifier_Strategy_ValidateAttributesTest extends
|
|||||||
$this->assertResult('');
|
$this->assertResult('');
|
||||||
|
|
||||||
// test ids
|
// test ids
|
||||||
$this->assertResult('<div id="valid">Preserve the ID.</div>');
|
$this->assertResult(
|
||||||
|
'<div id="valid">Kill the ID.</div>',
|
||||||
|
'<div>Kill the ID.</div>'
|
||||||
|
);
|
||||||
|
|
||||||
|
$this->assertResult('<div id="valid">Preserve the ID.</div>', true,
|
||||||
|
array('HTML.EnableAttrID' => true));
|
||||||
|
|
||||||
$this->assertResult(
|
$this->assertResult(
|
||||||
'<div id="0invalid">Kill the ID.</div>',
|
'<div id="0invalid">Kill the ID.</div>',
|
||||||
'<div>Kill the ID.</div>'
|
'<div>Kill the ID.</div>',
|
||||||
|
array('HTML.EnableAttrID' => true)
|
||||||
);
|
);
|
||||||
|
|
||||||
// test id accumulator
|
// test id accumulator
|
||||||
$this->assertResult(
|
$this->assertResult(
|
||||||
'<div id="valid">Valid</div><div id="valid">Invalid</div>',
|
'<div id="valid">Valid</div><div id="valid">Invalid</div>',
|
||||||
'<div id="valid">Valid</div><div>Invalid</div>'
|
'<div id="valid">Valid</div><div>Invalid</div>',
|
||||||
|
array('HTML.EnableAttrID' => true)
|
||||||
);
|
);
|
||||||
|
|
||||||
$this->assertResult(
|
$this->assertResult(
|
||||||
@ -42,20 +50,25 @@ class HTMLPurifier_Strategy_ValidateAttributesTest extends
|
|||||||
// test attribute key case sensitivity
|
// test attribute key case sensitivity
|
||||||
$this->assertResult(
|
$this->assertResult(
|
||||||
'<div ID="valid">Convert ID to lowercase.</div>',
|
'<div ID="valid">Convert ID to lowercase.</div>',
|
||||||
'<div id="valid">Convert ID to lowercase.</div>'
|
'<div id="valid">Convert ID to lowercase.</div>',
|
||||||
|
array('HTML.EnableAttrID' => true)
|
||||||
);
|
);
|
||||||
|
|
||||||
// test simple attribute substitution
|
// test simple attribute substitution
|
||||||
$this->assertResult(
|
$this->assertResult(
|
||||||
'<div id=" valid ">Trim whitespace.</div>',
|
'<div id=" valid ">Trim whitespace.</div>',
|
||||||
'<div id="valid">Trim whitespace.</div>'
|
'<div id="valid">Trim whitespace.</div>',
|
||||||
|
array('HTML.EnableAttrID' => true)
|
||||||
);
|
);
|
||||||
|
|
||||||
// test configuration id blacklist
|
// test configuration id blacklist
|
||||||
$this->assertResult(
|
$this->assertResult(
|
||||||
'<div id="invalid">Invalid</div>',
|
'<div id="invalid">Invalid</div>',
|
||||||
'<div>Invalid</div>',
|
'<div>Invalid</div>',
|
||||||
array('Attr.IDBlacklist' => array('invalid'))
|
array(
|
||||||
|
'Attr.IDBlacklist' => array('invalid'),
|
||||||
|
'HTML.EnableAttrID' => true
|
||||||
|
)
|
||||||
);
|
);
|
||||||
|
|
||||||
// test classes
|
// test classes
|
||||||
|
Loading…
Reference in New Issue
Block a user