diff --git a/NEWS b/NEWS index 10794b77..9ac40031 100644 --- a/NEWS +++ b/NEWS @@ -17,6 +17,8 @@ NEWS ( CHANGELOG and HISTORY ) HTMLPurifier ! HTMLPurifier object now accepts configuration arrays, no need to manually instantiate a configuration object ! Context object now accessible to outside +! Added enduser-youtube.html, explains how to embed YouTube videos. See + also corresponding smoketest preserveYouTube.php. - printDefinition.php: added labels, added better clarification . HTMLPurifier_Config::create() added, takes mixed variable and converts into a HTMLPurifier_Config object. diff --git a/docs/enduser-youtube.html b/docs/enduser-youtube.html new file mode 100644 index 00000000..a237971a --- /dev/null +++ b/docs/enduser-youtube.html @@ -0,0 +1,174 @@ + + + + + + + +Embedding YouTube Videos - HTML Purifier + + + +

Embedding YouTube Videos

+
...as well as other dangerous active content
+ +
Filed under End-User
+
Return to the index.
+ +

Clients like their YouTube videos. It gives them a warm fuzzy feeling when +they see a neat little embedded video player on their websites that can play +the latest clips from their documentary "Fido and the Bones of Spring". +All joking aside, the ability to embed YouTube videos or other active +content in their pages is something that a lot of people like.

+ +

This is a bad idea. The moment you embed anything untrusted, +you will definitely be slammed by all manner of nasties that can be +embedded in things from your run-of-the-mill Flash movie to +QuickTime movies. +Allowing users to tell the browser to load content from other websites +is intrinsically dangerous: there are already security risks associated with +letting users include images from other sites!

+ +

Luckily for us, however, whitelisting saves the day. Sure, letting users +include any old random Flash file could be dangerous, but if it's +from a specific website, it is probably okay. If no amount of pleading will +convince the people upstairs that they should settle for just linking +to their movies, you may find this technique very useful.

+ +

Sample

+ +

Below is custom code that allows users to embed +YouTube videos. This is not favoritism: this trick can easily be adapted for +other forms of embeddable content.

+ +

Usually, websites like YouTube give you boilerplate code that you can insert +into your documents. YouTube's code goes like this:

+ +
+<object width="425" height="350">
+  <param name="movie" value="http://www.youtube.com/v/AyPzM5WK8ys" />
+  <param name="wmode" value="transparent" />
+  <embed src="http://www.youtube.com/v/AyPzM5WK8ys"
+         type="application/x-shockwave-flash"
+         wmode="transparent" width="425" height="350" />
+</object>
+
+ +

There are two things to note about this code:

+ +
    +
  1. <embed> is not recognized by W3C, so if you want + standards-compliant code, you'll have to get rid of it.
  2. The code is exactly the same for all instances, except for the + identifier AyPzM5WK8ys, which tells us which movie file + to retrieve.
+ +

What point 2 means is that if we have code like <span +class="youtube-embed">AyPzM5WK8ys</span>, your +application can reconstruct the full object from this small snippet, which +passes through HTML Purifier unharmed.

+ +
+<?php
+
+class HTMLPurifierX_PreserveYouTube extends HTMLPurifier
+{
+    function purify($html, $config = null) {
+        // The "s" modifier lets "." match newlines, since embed code usually
+        // spans several lines. Video IDs may also contain "-" and "_".
+        $pre_regex = '#<object[^>]+>.+?'.
+            'http://www\.youtube\.com/v/([A-Za-z0-9\-_]+).+?</object>#s';
+        $pre_replace = '<span class="youtube-embed">\1</span>';
+        $html = preg_replace($pre_regex, $pre_replace, $html);
+        $html = parent::purify($html, $config);
+        $post_regex = '#<span class="youtube-embed">([A-Za-z0-9\-_]+)</span>#';
+        // Trailing spaces inside the embed strings keep attributes from running together.
+        $post_replace = '<object width="425" height="350" '.
+            'data="http://www.youtube.com/v/\1">'.
+            '<param name="movie" value="http://www.youtube.com/v/\1"></param>'.
+            '<param name="wmode" value="transparent"></param>'.
+            '<!--[if IE]>'.
+            '<embed src="http://www.youtube.com/v/\1" '.
+            'type="application/x-shockwave-flash" '.
+            'wmode="transparent" width="425" height="350" />'.
+            '<![endif]-->'.
+            '</object>';
+        $html = preg_replace($post_regex, $post_replace, $html);
+        return $html;
+    }
+}
+
+$purifier = new HTMLPurifierX_PreserveYouTube();
+$html_still_with_youtube = $purifier->purify($html_with_youtube);
+
+?>
+
+ +

There is a bit going on here, so let's explain.

+ +
    +
  1. The class uses the prefix HTMLPurifierX because it's + userspace code. Don't use HTMLPurifier in front of your + class, since it might clobber another class in the library.
  2. In order to keep the interface compatible, we've extended HTMLPurifier + into a new class that preserves the YouTube videos. This means that + all you have to do is replace all instances of + new HTMLPurifier with new + HTMLPurifierX_PreserveYouTube. There are other ways to go about + doing this: if you were calling a function that wrapped HTML Purifier, + you could paste the PHP right there. If you wanted to be really + fancy, you could make a decorator for HTMLPurifier.
  3. The first preg_replace call replaces any YouTube code users may have + embedded with a benign span tag. Span is used because it is inline, + and objects are inline too. We are careful to be extremely + restrictive about what goes inside the span tag, because if errant code + gets in there, it could get messy.
  4. The HTML is then purified as usual.
  5. Then, another preg_replace replaces the span tag with a fully fledged + object. Note that the embed is removed, and, in its place, a data + attribute is added to the object. This makes the tag standards-compliant! + It also breaks Internet Explorer, so we add conditional comments + with the old embed code to make it work again. + It's all quite convoluted, but it works.
+ +
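
The decorator mentioned in the second point could look roughly like the sketch below. The names PreserveYouTubeDecorator and StubPurifier are hypothetical: the stub only stands in for a real HTMLPurifier instance so that the example runs on its own, and it is not part of the library. The embed fallback for Internet Explorer is left out to keep the sketch short.

```php
<?php
// Hypothetical stand-in for HTMLPurifier so the sketch is self-contained.
// It merely strips <object> blocks; the real library does far more.
class StubPurifier
{
    public function purify($html, $config = null)
    {
        return preg_replace('#<object\b.*?</object>#s', '', $html);
    }
}

// Decorator: wraps any object exposing purify($html, $config)
// instead of extending the HTMLPurifier class.
class PreserveYouTubeDecorator
{
    private $inner;

    public function __construct($inner)
    {
        $this->inner = $inner;
    }

    public function purify($html, $config = null)
    {
        // 1. Swap each YouTube <object> for a harmless placeholder span.
        $html = preg_replace(
            '#<object[^>]+>.+?http://www\.youtube\.com/v/([A-Za-z0-9\-_]+).+?</object>#s',
            '<span class="youtube-embed">\1</span>',
            $html
        );
        // 2. Delegate the actual purification to the wrapped object.
        $html = $this->inner->purify($html, $config);
        // 3. Rebuild a fixed-size object from each placeholder.
        return preg_replace(
            '#<span class="youtube-embed">([A-Za-z0-9\-_]+)</span>#',
            '<object width="425" height="350" data="http://www.youtube.com/v/\1">' .
            '<param name="movie" value="http://www.youtube.com/v/\1" />' .
            '<param name="wmode" value="transparent" />' .
            '</object>',
            $html
        );
    }
}

$purifier = new PreserveYouTubeDecorator(new StubPurifier());
$input = '<p>Hi</p><object width="425" height="350">' .
    '<param name="movie" value="http://www.youtube.com/v/AyPzM5WK8ys" /></object>';
$output = $purifier->purify($input);
```

In real use you would pass an HTMLPurifier instance to the constructor instead of the stub. The upside over subclassing is that the wrapper does not care which purifier class it receives.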

Warning

+ +

There are a number of possible problems with the code above, depending +on how you look at it.

+ +

Cannot change width and height

+ +

The width and height of the final YouTube movie cannot be adjusted. This +is because I am lazy. If you really insist on letting users change the size +of the movie, what you need to do is package up the attributes inside the +span tag (along with the movie ID). It gets complicated though: a malicious +user can specify an outrageously large height and width and attempt to crash +the user's operating system/browser. You need to cap the values, either by limiting +the number of digits allowed in the regex or by using a callback to check the +number.
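
A minimal sketch of the callback approach follows. Everything in it is an assumption made for illustration: the placeholder format WIDTHxHEIGHT:ID inside the span, the function names clamp_dimension and rebuild_youtube_object, and the limits of 100 and 800 pixels. It shows only the rebuild step that runs after purification.

```php
<?php
// Arbitrary illustrative limits; pick values that suit your layout.
const MIN_DIMENSION = 100;
const MAX_DIMENSION = 800;

// Force a user-supplied number into the allowed range.
function clamp_dimension($value)
{
    return max(MIN_DIMENSION, min(MAX_DIMENSION, (int) $value));
}

// Callback for preg_replace_callback: rebuilds the object from a placeholder
// of the hypothetical form <span class="youtube-embed">WIDTHxHEIGHT:ID</span>.
function rebuild_youtube_object(array $m)
{
    $width  = clamp_dimension($m[1]);
    $height = clamp_dimension($m[2]);
    $url    = 'http://www.youtube.com/v/' . $m[3];
    return '<object width="' . $width . '" height="' . $height . '" data="' . $url . '">' .
        '<param name="movie" value="' . $url . '" />' .
        '<param name="wmode" value="transparent" />' .
        '</object>';
}

// \d{1,4} caps the digit count in the regex; the callback enforces the range.
$post_regex = '#<span class="youtube-embed">(\d{1,4})x(\d{1,4}):([A-Za-z0-9\-_]+)</span>#';

// Pretend this came out of the purifier: width too large, height too small.
$purified = '<span class="youtube-embed">9999x50:AyPzM5WK8ys</span>';
$html = preg_replace_callback($post_regex, 'rebuild_youtube_object', $purified);
```

Here both techniques from the paragraph are combined: the regex rejects anything longer than four digits, and the callback pulls the remaining values into range.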

+ +

Trusts YouTube's security

+ +

By allowing this code onto our website, we are trusting that YouTube has +tech-savvy enough people not to allow their users to inject malicious +code into the Flash files. An exploit on YouTube means an exploit on your +site, and when you start allowing shadier sites, remember that trust +is important.

+ +

Poorly written adaptations compromise security

+ +

This should go without saying, but if you're going to adapt this code +for Google Video or the like, make sure you do it right. It's +extremely easy to allow one character too many in the final section and +suddenly introduce XSS into HTML Purifier's XSS-free output. HTML +Purifier may be well written, but it cannot guard against vulnerabilities +introduced after it has finished.
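
To make the risk concrete, here is a small sketch that compares a strict capture pattern with a sloppy one in the rebuild step. The helper name rebuild and the sample payload are hypothetical, and the sketch assumes the payload text reaches the rebuild step unchanged, since it is ordinary text inside a span.

```php
<?php
// Rebuild step shared by both variants; only the capture pattern differs.
function rebuild($pattern, $html)
{
    return preg_replace(
        '#<span class="youtube-embed">' . $pattern . '</span>#',
        '<object data="http://www.youtube.com/v/\1"></object>',
        $html
    );
}

$strict = '([A-Za-z0-9\-_]+)'; // only characters that occur in video IDs
$loose  = '([^<]+)';           // "anything up to the closing tag": too permissive

// Assumed to have survived purification as plain text inside the span.
$payload = '<span class="youtube-embed">x" onmouseover="alert(1)</span>';

// The strict pattern does not match, so the span is left untouched.
$safe_result = rebuild($strict, $payload);

// The loose pattern copies the quote into the attribute value,
// which closes it early and injects an event handler.
$unsafe_result = rebuild($loose, $payload);
```

The only difference between the two results is the character class, which is why the final regex deserves more scrutiny than anything else in an adaptation.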

+ +

Future plans

+ +

It would probably be a good idea if this code were added to the core +library. Look out for its inclusion in the core as a decorator +or the like.

+ + + \ No newline at end of file diff --git a/smoketests/preserveYouTube.php b/smoketests/preserveYouTube.php new file mode 100644 index 00000000..ef347b47 --- /dev/null +++ b/smoketests/preserveYouTube.php @@ -0,0 +1,65 @@ +'; +?> + + + HTML Purifier Preserve YouTube Smoketest + + + +

HTML Purifier Preserve YouTube Smoketest

+]+>.+?'. + 'http://www.youtube.com/v/([A-Za-z0-9]+).+?#'; + $pre_replace = '\1'; + $html = preg_replace($pre_regex, $pre_replace, $html); + $html = parent::purify($html, $config); + $post_regex = '#([A-Za-z0-9]+)#'; + $post_replace = ''. + ''. + ''. + ''. + ''; + $html = preg_replace($post_regex, $post_replace, $html); + return $html; + } +} + +$string = ''; + +$regular_purifier = new HTMLPurifier(); +$youtube_purifier = new HTMLPurifierX_PreserveYouTube(); + +?> +

Unpurified

+

Click here to see the unpurified version (breaks validation).

+
+ +

Without YouTube exception

+
purify($string); +?>
+ +

With YouTube exception

+
purify($string); +?>
+ + + \ No newline at end of file