Clients like their YouTube videos. It gives them a warm fuzzy feeling when they see a neat little embedded video player on their websites that can play the latest clips from their documentary "Fido and the Bones of Spring". All joking aside, the ability to embed YouTube videos or other active content in their pages is something that a lot of people like.
This is a bad idea. The moment you embed anything untrusted, you are exposed to all manner of nasties that can be hidden in anything from a run-of-the-mill Flash movie to a QuickTime movie. Even img tags, which HTML Purifier allows by default, can be dangerous. Be distrustful of anything that tells a browser to load content from another website automatically.
Luckily for us, however, whitelisting saves the day. Sure, letting users include any old random Flash file could be dangerous, but if it comes from a specific website, it is probably okay. If no amount of pleading will convince the people upstairs that they should settle for simply linking to their movies, you may find this technique very useful.
Below is custom code that allows users to embed YouTube videos. This is not favoritism: this trick can easily be adapted for other forms of embeddable content.
Usually, websites like YouTube give you boilerplate code that you can insert into your documents. YouTube's code goes like this:
<object width="425" height="350">
  <param name="movie" value="http://www.youtube.com/v/AyPzM5WK8ys" />
  <param name="wmode" value="transparent" />
  <embed src="http://www.youtube.com/v/AyPzM5WK8ys"
         type="application/x-shockwave-flash"
         wmode="transparent" width="425" height="350" />
</object>
There are two things to note about this code:

1. <embed> is not recognized by W3C, so if you want standards-compliant code, you'll have to get rid of it.
2. The code is exactly the same for every video, except for the identifier AyPzM5WK8ys, which tells the player which movie file to retrieve.

What point 2 means is that if we have code like <span class="youtube-embed">AyPzM5WK8ys</span>, your application can reconstruct the full object from this small snippet, which passes through HTML Purifier unharmed.
<?php

class HTMLPurifierX_PreserveYouTube extends HTMLPurifier
{
    function purify($html, $config = null)
    {
        // Collapse each YouTube <object> into a harmless placeholder span.
        // The "s" modifier lets "." match newlines, so multi-line embed code
        // is caught; the ID pattern also allows "-" and "_".
        $pre_regex = '#<object[^>]+>.+?'.
            'http://www\.youtube\.com/v/([A-Za-z0-9\-_]+).+?</object>#s';
        $pre_replace = '<span class="youtube-embed">\1</span>';
        $html = preg_replace($pre_regex, $pre_replace, $html);

        // Purify as usual; the span survives, the original object would not.
        $html = parent::purify($html, $config);

        // Expand the placeholder back into a full object. Note the trailing
        // spaces that keep the <embed> attributes from running together.
        $post_regex = '#<span class="youtube-embed">([A-Za-z0-9\-_]+)</span>#';
        $post_replace = '<object width="425" height="350" '.
            'data="http://www.youtube.com/v/\1">'.
            '<param name="movie" value="http://www.youtube.com/v/\1"></param>'.
            '<param name="wmode" value="transparent"></param>'.
            '<!--[if IE]>'.
            '<embed src="http://www.youtube.com/v/\1" '.
            'type="application/x-shockwave-flash" '.
            'wmode="transparent" width="425" height="350" />'.
            '<![endif]-->'.
            '</object>';
        $html = preg_replace($post_regex, $post_replace, $html);
        return $html;
    }
}

$purifier = new HTMLPurifierX_PreserveYouTube();
$html_still_with_youtube = $purifier->purify($html_with_youtube);

?>
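There is a bit going on here, so let's explain.

1. The class uses the prefix HTMLPurifierX because it's userspace code. Don't put HTMLPurifier in front of your class name, since it might clobber another class in the library.
2. To use it, change every new HTMLPurifier to new HTMLPurifierX_PreserveYouTube. There are other ways to go about doing this: if you were calling a function that wrapped HTML Purifier, you could paste the PHP right there. If you wanted to be really fancy, you could make a decorator for HTMLPurifier.
3. Inside purify(), the first preg_replace collapses each YouTube object into a span that holds only the video ID, the HTML is then purified as usual, and a second preg_replace expands the span back into a full object, with the old embed code wrapped in a conditional comment for Internet Explorer.

There are a number of possible problems with the code above, depending on how you look at it.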
The width and height of the final YouTube movie cannot be adjusted. This is because I am lazy. If you really insist on letting users change the size of the movie, what you need to do is package up the attributes inside the span tag (along with the movie ID). It gets complicated though: a malicious user can specify an outrageously large height and width and attempt to crash the visitor's browser or operating system. You need to cap the values, either by limiting the number of digits allowed in the regex or by using a callback to check the number.
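As a rough illustration of that capping idea, here is a minimal sketch of the restore step only. The placeholder format WIDTHxHEIGHT:VIDEOID, the function name restore_sized_youtube, and the YT_MAX_WIDTH and YT_MAX_HEIGHT limits are all assumptions made up for this example; none of them come from HTML Purifier. It combines both suggestions: the regex bounds the digit count, and a callback clamps the value.

```php
<?php
// Hypothetical placeholder format (an assumption for this sketch):
//   <span class="youtube-embed">WIDTHxHEIGHT:VIDEOID</span>
const YT_MAX_WIDTH  = 640;  // assumed caps; pick values that suit your layout
const YT_MAX_HEIGHT = 480;

function restore_sized_youtube($html)
{
    // \d{1,4} limits each dimension to four digits; anything longer
    // simply fails to match and is left as an inert span.
    $regex = '#<span class="youtube-embed">'
           . '(\d{1,4})x(\d{1,4}):([A-Za-z0-9\-_]+)</span>#';

    return preg_replace_callback($regex, function ($m) {
        // Clamp each dimension into the range 1..max.
        $width  = max(1, min((int) $m[1], YT_MAX_WIDTH));
        $height = max(1, min((int) $m[2], YT_MAX_HEIGHT));
        $url    = 'http://www.youtube.com/v/' . $m[3];

        return '<object width="' . $width . '" height="' . $height . '" '
             . 'data="' . $url . '">'
             . '<param name="movie" value="' . $url . '" />'
             . '<param name="wmode" value="transparent" />'
             . '</object>';
    }, $html);
}
```

The matching pre-purification step, which would have to copy the width and height out of the submitted object into the span, is left out here, as is the Internet Explorer embed fallback.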
By allowing this code onto our website, we are trusting that YouTube has tech-savvy enough people not to allow their users to inject malicious code into the Flash files. An exploit on YouTube means an exploit on your site. Even though YouTube is run by the reputable Google, it doesn't mean they are invulnerable. You're putting a certain measure of the job on an external provider (just as you have by entrusting your user input to HTML Purifier), and it is important that you are cognizant of the risk.
This should go without saying, but if you're going to adapt this code for Google Video or the like, make sure you do it right. It's extremely easy to allow one character too many in the final replacement step, and suddenly you're introducing XSS into HTML Purifier's XSS-free output. HTML Purifier may be well written, but it cannot guard against vulnerabilities introduced after it has finished.
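One way to keep an adaptation tight is to drive the restore step from a small table of trusted hosts and accept nothing in the placeholder except a short run of safe characters. The sketch below is only an illustration: the function name restore_embeds, the class name example-embed, the host video.example.com, and the 32-character limit are all hypothetical, and a real provider's URL scheme would need to be checked against that provider's own embed code.

```php
<?php
// Hypothetical provider table: placeholder class => trusted URL prefix.
// "video.example.com" is a stand-in, not a real service.
$providers = array(
    'youtube-embed' => 'http://www.youtube.com/v/',
    'example-embed' => 'http://video.example.com/v/',
);

function restore_embeds($html, array $providers)
{
    foreach ($providers as $class => $base_url) {
        // The ID may contain only letters, digits, "-" and "_": nothing that
        // can close an attribute or open a tag. The 32-character upper bound
        // is an arbitrary assumption.
        $regex = '#<span class="' . preg_quote($class, '#') . '">'
               . '([A-Za-z0-9\-_]{1,32})</span>#';

        $html = preg_replace_callback($regex, function ($m) use ($base_url) {
            // Escaping is redundant given the strict pattern, but it is cheap
            // insurance if the pattern is ever loosened.
            $url = htmlspecialchars($base_url . $m[1], ENT_QUOTES);

            return '<object width="425" height="350" data="' . $url . '">'
                 . '<param name="movie" value="' . $url . '" />'
                 . '</object>';
        }, $html);
    }
    return $html;
}
```

A placeholder whose contents stray outside the allowed characters does not match at all, so it stays behind as a plain span rather than being turned into markup.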
It would probably be a good idea if this code were added to the core library. Look out for its inclusion in the core as a decorator or the like.
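In the meantime, the decorator approach can be sketched in userspace. This is only one possible shape, not something taken from the library: the class name HTMLPurifierX_YouTubeDecorator is hypothetical, and the only thing it assumes about the wrapped object is a purify($html, $config) method. The Internet Explorer embed fallback is omitted to keep the sketch short.

```php
<?php
// Hypothetical decorator: wraps any object that exposes purify($html, $config)
// instead of extending HTMLPurifier.
class HTMLPurifierX_YouTubeDecorator
{
    private $inner;

    public function __construct($inner)
    {
        $this->inner = $inner;
    }

    public function purify($html, $config = null)
    {
        // 1. Collapse each YouTube <object> into a placeholder span.
        $html = preg_replace(
            '#<object[^>]+>.+?http://www\.youtube\.com/v/'
            . '([A-Za-z0-9\-_]+).+?</object>#s',
            '<span class="youtube-embed">\1</span>',
            $html
        );

        // 2. Let the wrapped purifier do its normal work.
        $html = $this->inner->purify($html, $config);

        // 3. Expand the placeholder back into a full object.
        return preg_replace(
            '#<span class="youtube-embed">([A-Za-z0-9\-_]+)</span>#',
            '<object width="425" height="350" '
            . 'data="http://www.youtube.com/v/\1">'
            . '<param name="movie" value="http://www.youtube.com/v/\1" />'
            . '<param name="wmode" value="transparent" />'
            . '</object>',
            $html
        );
    }
}
```

It would be used as new HTMLPurifierX_YouTubeDecorator(new HTMLPurifier()). The trade-off compared with subclassing is that the wrapper is not itself an HTMLPurifier, so any code that type-checks for that class would not accept it.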