From f3d050c517aab64a24f077ee6ea9009381db9de4 Mon Sep 17 00:00:00 2001
From: "Edward Z. Yang"
Date: Thu, 30 Dec 2010 23:51:53 +0000
Subject: [PATCH] Fix two bugs with caching of customized raw definitions.

The first bug is that we will repeatedly write out the result of
a customized raw definition to the filesystem, even when a cache
entry already exists.

The second bug is that caching these definitions doesn't actually
work (the cache entry is written but never used.) A new API for
retrieving raw definitions permits the user to take advantage of
caching.

Signed-off-by: Edward Z. Yang
---
 NEWS                                               |   8 +-
 docs/enduser-customize.html                        |  93 ++++++--
 library/HTMLPurifier/Config.php                    | 225 ++++++++++++++----
 .../ConfigSchema/schema/HTML.Nofollow.txt          |   2 +-
 library/HTMLPurifier/Definition.php                |  11 +
 .../HTMLPurifier/AttrValidator_ErrorsTest.php      |   2 -
 tests/HTMLPurifier/ConfigTest.php                  | 100 +++++++-
 .../DefinitionCache/SerializerTest.php             |   1 +
 tests/HTMLPurifier/HTMLDefinitionTest.php          |  20 +-
 .../HTMLPurifier/HTMLModule/SafeEmbedTest.php      |   1 -
 .../Strategy/RemoveForeignElementsTest.php         |   3 -
 11 files changed, 375 insertions(+), 91 deletions(-)

diff --git a/NEWS b/NEWS
index 7cb9ba57..427aa132 100644
--- a/NEWS
+++ b/NEWS
@@ -9,7 +9,11 @@ NEWS ( CHANGELOG and HISTORY ) HTMLPurifier
 . Internal change
 ==========================
 
-4.2.1, unknown release date
+4.3.0, unknown release date
+# Fixed broken caching of customized raw definitions, but requires an
+  API change. The old API still works but will emit a warning,
+  see http://htmlpurifier.org/docs/enduser-customize.html#optimized
+  for how to upgrade your code.
 ! Added %HTML.Nofollow to add rel="nofollow" to external links.
 ! More types of SPL autoloaders allowed on later versions of PHP.
 ! Implementations for position, top, left, right, bottom, z-index
@@ -24,6 +28,8 @@ NEWS ( CHANGELOG and HISTORY ) HTMLPurifier
   This safety check is only done for HTMLPurifier.auto.php; if you are
   using standalone or the specialized includes files, you're
   expected to know what you're doing.
+- Stop repeatedly writing the cache file after I'm done customizing a
+  raw definition.
 
 4.2.0, released 2010-09-15
 ! Added %Core.RemoveProcessingInstructions, which lets you remove
diff --git a/docs/enduser-customize.html b/docs/enduser-customize.html
index 42756f11..7e1ffa26 100644
--- a/docs/enduser-customize.html
+++ b/docs/enduser-customize.html
@@ -146,7 +146,9 @@
$config = HTMLPurifier_Config::createDefault();
 $config->set('HTML.DefinitionID', 'enduser-customize.html tutorial');
 $config->set('HTML.DefinitionRev', 1);
-$def = $config->getHTMLDefinition(true);
+if ($def = $config->maybeGetRawHTMLDefinition()) {
+    // our code will go here
+}

Assuming that HTML Purifier has already been properly loaded (hint:
@@ -174,23 +176,15 @@ $def = $config->getHTMLDefinition(true);

  • The fourth line retrieves a raw HTMLPurifier_HTMLDefinition
-   object that we will be tweaking. If the parameter was removed, we
-   would be retrieving a fully formed definition object, which is somewhat
-   useless for customization purposes.
+   object that we will be tweaking. Interestingly enough, we have
+   placed it in an if block: this is because
+   maybeGetRawHTMLDefinition, as its name suggests, may
+   return a NULL, in which case we should skip doing any
+   initialization. This, in fact, will correspond to when our fully
+   customized object is already in the cache.
  • -

    Broken backwards-compatibility

    - -

    - Those of you who have already been twiddling around with the raw
    - HTML definition object, you'll be noticing that you're getting an error
    - when you attempt to retrieve the raw definition object without specifying
    - a DefinitionID. It is vital to caching (see below) that you make a unique
    - name for your customized definition, so make up something right now and
    - things will operate again.
    -

    -

    Turn off caching

    @@ -781,6 +775,75 @@ $form->excludes = array('form' => true);

  • library/HTMLPurifier/ElementDef.php
  • +

    Notes for HTML Purifier 4.2.0 and earlier

    + +

    + Previously, this tutorial gave some incorrect template code for
    + editing raw definitions, and that template code will now produce the
    + error "Due to a documentation error in previous version of HTML
    + Purifier..." Here is how to mechanically transform old-style
    + code into new-style code.
    +

    + +

    + First, identify all code that edits the raw definition object, and
    + put it together. Ensure none of this code must be run on every
    + request; if some sub-part needs to always be run, move it outside
    + this block. Here is an example below, with the raw definition
    + object code bolded.
    +

    + +
    $config = HTMLPurifier_Config::createDefault();
    +$config->set('HTML.DefinitionID', 'enduser-customize.html tutorial');
    +$config->set('HTML.DefinitionRev', 1);
    +$def = $config->getHTMLDefinition(true);
    +$def->addAttribute('a', 'target', 'Enum#_blank,_self,_target,_top');
    +$purifier = new HTMLPurifier($config);
    + +

    + Next, replace the raw definition retrieval with a
    + maybeGetRawHTMLDefinition method call inside an if conditional, and
    + place the editing code inside that if block.
    +

    + +
    $config = HTMLPurifier_Config::createDefault();
    +$config->set('HTML.DefinitionID', 'enduser-customize.html tutorial');
    +$config->set('HTML.DefinitionRev', 1);
    +if ($def = $config->maybeGetRawHTMLDefinition()) {
    +    $def->addAttribute('a', 'target', 'Enum#_blank,_self,_target,_top');
    +}
    +$purifier = new HTMLPurifier($config);
    + +

    + And you're done! Alternatively, if you're OK with never caching
    + your customized definition, the following will still work and not
    + emit warnings.
    +

    + +
    $config = HTMLPurifier_Config::createDefault();
    +$def = $config->getHTMLDefinition(true);
    +$def->addAttribute('a', 'target', 'Enum#_blank,_self,_target,_top');
    +$purifier = new HTMLPurifier($config);
    + +

    + A slightly less efficient version of this was what was going on with
    + old versions of HTML Purifier.
    +

    + +

    + Technical notes: ajh pointed out in a forum topic that
    + HTML Purifier appeared to be repeatedly writing to the cache even
    + when a cache entry already existed. Investigation led to the
    + discovery of the following infelicity: caching of customized
    + definitions didn't actually work! The problem was that even though
    + a cache file would be written out at the end of the process, there
    + was no way for HTML Purifier to say, "Actually, I've already got a
    + copy of your work, no need to reconfigure your
    + customizations." This required the API to change: placing
    + all of the customizations to the raw definition object in a
    + conditional which could be skipped.
    +

    +
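
Reviewer's sketch (not part of the patch): the technical notes above say the old API gave HTML Purifier no way to skip the customization code on a cache hit. The self-contained PHP below models that idea with an in-memory array standing in for the serializer cache. SketchDefinition, SketchConfig, maybeGetRawDefinition and getDefinition are hypothetical names invented for this illustration; they are not the classes or methods changed in library/HTMLPurifier/Config.php, and the real cache key and storage are assumptions here.

```php
<?php
// Hypothetical stand-ins for illustration only; NOT HTML Purifier's classes.
final class SketchDefinition
{
    /** @var array<string, string> customizations recorded on this definition */
    public array $attributes = [];

    public function addAttribute(string $element, string $name, string $type): void
    {
        $this->attributes["$element.$name"] = $type;
    }
}

final class SketchConfig
{
    /** @var array<string, SketchDefinition> in-memory stand-in for the cache */
    private static array $cache = [];

    private ?SketchDefinition $raw = null;

    public function __construct(private string $id, private int $rev)
    {
    }

    // Assumed cache key: definition ID plus revision.
    private function key(): string
    {
        return $this->id . '#' . $this->rev;
    }

    // Returns a raw definition to customize, or null when a cached,
    // already customized definition exists and the setup can be skipped.
    public function maybeGetRawDefinition(): ?SketchDefinition
    {
        if (isset(self::$cache[$this->key()])) {
            return null;
        }
        return $this->raw = new SketchDefinition();
    }

    // Returns the finished definition; the cache is written only on a miss.
    public function getDefinition(): SketchDefinition
    {
        $key = $this->key();
        if (!isset(self::$cache[$key])) {
            self::$cache[$key] = $this->raw ?? new SketchDefinition();
        }
        return self::$cache[$key];
    }
}

// Simulate two requests that share the same definition ID and revision.
$runs = 0;
foreach ([1, 2] as $request) {
    $config = new SketchConfig('enduser-customize.html tutorial', 1);
    if ($def = $config->maybeGetRawDefinition()) {
        $runs++; // the customization block runs only on a cache miss
        $def->addAttribute('a', 'target', 'Enum#_blank,_self');
    }
    $final = $config->getDefinition();
}
echo $runs, "\n";
```

In this model the second request gets null from maybeGetRawDefinition, skips the if block, and still receives the customized definition from the cache, which is the behavior the conditional in the new tutorial code is meant to allow.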