<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta name="description" content="Explains how to safely allow the embedding of Flash from trusted sites in HTML Purifier." />
<link rel="stylesheet" type="text/css" href="./style.css" />

<title>Embedding YouTube Videos - HTML Purifier</title>

</head><body>

<h1 class="subtitled">Embedding YouTube Videos</h1>
<div class="subtitle">...as well as other dangerous active content</div>

<div id="filing">Filed under End-User</div>
<div id="index">Return to the <a href="index.html">index</a>.</div>

<p>Clients like their YouTube videos. It gives them a warm fuzzy feeling when
they see a neat little embedded video player on their websites that can play
the latest clips from their documentary "Fido and the Bones of Spring".
All joking aside, the ability to embed YouTube videos or other active
content in their pages is something that a lot of people like.</p>

<p>This is a <em>bad</em> idea. The moment you embed anything untrusted,
you open yourself up to all manner of nasties that can hide in
anything from your run-of-the-mill Flash movie to
<a href="http://blog.spywareguide.com/2006/12/myspace_phish_attack_leads_use.html">QuickTime movies</a>.
Even <code>img</code> tags, which HTML Purifier allows by default, can be
dangerous. Be distrustful of anything that tells a browser to load content
from another website automatically.</p>

<p>Luckily for us, however, whitelisting saves the day. Sure, letting users
include any old random Flash file could be dangerous, but if it comes
from a specific, trusted website, it is probably okay. If no amount of
pleading will convince the people upstairs that they should settle for
simply linking to their movies, you may find this technique very useful.</p>

<h2>Sample</h2>

<p>Below is custom code that allows users to embed
YouTube videos. This is not favoritism: this trick can easily be adapted for
other forms of embeddable content.</p>

<p>Usually, websites like YouTube give you boilerplate code that you can insert
into your documents. YouTube's code goes like this:</p>

<pre>
&lt;object width="425" height="350"&gt;
  &lt;param name="movie" value="http://www.youtube.com/v/AyPzM5WK8ys" /&gt;
  &lt;param name="wmode" value="transparent" /&gt;
  &lt;embed src="http://www.youtube.com/v/AyPzM5WK8ys"
         type="application/x-shockwave-flash"
         wmode="transparent" width="425" height="350" /&gt;
&lt;/object&gt;
</pre>

<p>There are two things to note about this code:</p>

<ol>
    <li><code>&lt;embed&gt;</code> is not recognized by W3C, so if you want
        standards-compliant code, you'll have to get rid of it.</li>
    <li>The code is exactly the same for all instances, except for the
        identifier <tt>AyPzM5WK8ys</tt> which tells us which movie file
        to retrieve.</li>
</ol>

<p>What point 2 means is that if we have code like <code>&lt;span
class="youtube-embed"&gt;AyPzM5WK8ys&lt;/span&gt;</code>, your
application can reconstruct the full object from this small snippet, which
passes through HTML Purifier <em>unharmed</em>.</p>

<pre>
&lt;?php

class HTMLPurifierX_PreserveYouTube extends HTMLPurifier
{
    function purify($html, $config = null) {
        // Before purification: collapse each YouTube object into a harmless
        // span that carries only the video ID. The "s" modifier lets "."
        // match the newlines in YouTube's multi-line boilerplate.
        $pre_regex = '#&lt;object[^&gt;]+&gt;.+?'.
            'http://www\.youtube\.com/v/([A-Za-z0-9_-]+).+?&lt;/object&gt;#s';
        $pre_replace = '&lt;span class="youtube-embed"&gt;\1&lt;/span&gt;';
        $html = preg_replace($pre_regex, $pre_replace, $html);
        $html = parent::purify($html, $config);
        // After purification: rebuild the full object from the ID alone.
        $post_regex = '#&lt;span class="youtube-embed"&gt;([A-Za-z0-9_-]+)&lt;/span&gt;#';
        $post_replace = '&lt;object width="425" height="350" '.
            'data="http://www.youtube.com/v/\1"&gt;'.
            '&lt;param name="movie" value="http://www.youtube.com/v/\1"&gt;&lt;/param&gt;'.
            '&lt;param name="wmode" value="transparent"&gt;&lt;/param&gt;'.
            '&lt;!--[if IE]&gt;'.
            // Trailing spaces keep the attributes apart once concatenated.
            '&lt;embed src="http://www.youtube.com/v/\1" '.
            'type="application/x-shockwave-flash" '.
            'wmode="transparent" width="425" height="350" /&gt;'.
            '&lt;![endif]--&gt;'.
            '&lt;/object&gt;';
        $html = preg_replace($post_regex, $post_replace, $html);
        return $html;
    }
}

$purifier = new HTMLPurifierX_PreserveYouTube();
$html_still_with_youtube = $purifier-&gt;purify($html_with_youtube);

?&gt;
</pre>

<p>There is a bit going on here, so let's explain.</p>

<ol>
    <li>The class uses the prefix <code>HTMLPurifierX</code> because it's
        userspace code. Don't use <code>HTMLPurifier</code> in front of your
        class, since it might clobber another class in the library.</li>
    <li>In order to keep the interface compatible, we've extended HTMLPurifier
        into a new class that preserves the YouTube videos. This means that
        all you have to do is replace all instances of
        <code>new HTMLPurifier</code> with <code>new
        HTMLPurifierX_PreserveYouTube</code>. There are other ways to go about
        doing this: if you were calling a function that wrapped HTML Purifier,
        you could paste the PHP right there. If you wanted to be really
        fancy, you could make a decorator for HTMLPurifier.</li>
    <li>The first <code>preg_replace</code> call replaces any YouTube code
        users may have embedded with a benign span tag. Span is used because
        it is inline, and objects are inline too. We are very careful to be
        extremely restrictive about what goes inside the span tag, because
        if errant code gets in there it could get messy.</li>
    <li>The HTML is then purified as usual.</li>
    <li>Then, another <code>preg_replace</code> replaces the span tag with a
        fully fledged object. Note that the embed is removed, and, in its
        place, a data attribute is added to the object. This makes the tag
        standards-compliant! It also breaks Internet Explorer, so we add in
        a bit of conditional comments with the old embed code to make it
        work again. It's all quite convoluted but works.</li>
</ol>
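
<p>The decorator alternative mentioned in point 2 could be sketched roughly as
follows. This is only an illustration, not part of the library: the names
<code>HTMLPurifierX_YouTubeDecorator</code> and <code>StubPurifier</code> are
hypothetical, the decorator assumes nothing more than a wrapped object with a
<code>purify($html, $config)</code> method, and the stub merely stands in for
a real purifier so that the example runs on its own. The Internet Explorer
fallback is left out for brevity.</p>

```php
<?php

// Hypothetical decorator: wraps any object exposing purify($html, $config)
// instead of extending HTMLPurifier directly.
class HTMLPurifierX_YouTubeDecorator
{
    private $purifier;

    public function __construct($purifier)
    {
        $this->purifier = $purifier;
    }

    public function purify($html, $config = null)
    {
        // Collapse YouTube objects into a span that carries only the ID.
        $html = preg_replace(
            '#<object[^>]+>.+?http://www\.youtube\.com/v/([A-Za-z0-9_-]+).+?</object>#s',
            '<span class="youtube-embed">\1</span>',
            $html
        );
        // Delegate the actual purification to the wrapped object.
        $html = $this->purifier->purify($html, $config);
        // Rebuild an object from whatever span survived purification.
        return preg_replace(
            '#<span class="youtube-embed">([A-Za-z0-9_-]+)</span>#',
            '<object width="425" height="350" data="http://www.youtube.com/v/\1">' .
            '<param name="movie" value="http://www.youtube.com/v/\1"></param>' .
            '</object>',
            $html
        );
    }
}

// Stand-in for a real purifier, for demonstration only: it drops every tag
// except span. It is NOT a safe sanitizer.
class StubPurifier
{
    public function purify($html, $config = null)
    {
        return strip_tags($html, '<span>');
    }
}

$decorated = new HTMLPurifierX_YouTubeDecorator(new StubPurifier());
$input = "<p>Hi</p><object width=\"425\" height=\"350\">\n"
    . "<param name=\"movie\" value=\"http://www.youtube.com/v/AyPzM5WK8ys\" />\n"
    . "</object>";
$output = $decorated->purify($input);
```

<p>In real use you would pass an actual <code>HTMLPurifier</code> instance to
the constructor in place of the stub. The appeal of this design is that the
wrapped purifier can be configured or swapped without touching the YouTube
handling.</p>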

<h2>Warning</h2>

<p>There are a number of possible problems with the code above, depending
on how you look at it.</p>

<h3>Cannot change width and height</h3>

<p>The width and height of the final YouTube movie cannot be adjusted. This
is because I am lazy. If you really insist on letting users change the size
of the movie, what you need to do is package up the attributes inside the
span tag (along with the movie ID). It gets complicated though: a malicious
user can specify an outrageously large height and width and attempt to crash
the user's operating system/browser. You need to cap the size, either by
limiting the number of digits allowed in the regex or by using a callback to
check the number.</p>
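
<p>As a rough sketch of the capping idea, the rebuilding step might combine
both safeguards: a digit limit in the regex and a callback that clamps the
numbers. Everything specific here is an assumption made for illustration: the
span format <code>ID:WIDTHxHEIGHT</code>, the function name
<code>rebuild_youtube_objects</code>, and the two limit constants. Only the
post-purification step is shown; the first regex would need a matching change
to capture the user's width and height.</p>

```php
<?php

// Hypothetical upper bounds; choose limits that suit your layout.
define('YOUTUBE_MAX_WIDTH', 800);
define('YOUTUBE_MAX_HEIGHT', 600);

// Rebuilds objects from spans of the assumed form
// <span class="youtube-embed">ID:WIDTHxHEIGHT</span>.
function rebuild_youtube_objects($html)
{
    // \d{1,4} limits each dimension to four digits before the callback runs.
    $regex = '#<span class="youtube-embed">([A-Za-z0-9_-]+):(\d{1,4})x(\d{1,4})</span>#';
    return preg_replace_callback($regex, function ($m) {
        // Clamp into [1, max] so huge or zero sizes cannot get through.
        $width  = max(1, min((int) $m[2], YOUTUBE_MAX_WIDTH));
        $height = max(1, min((int) $m[3], YOUTUBE_MAX_HEIGHT));
        $url = 'http://www.youtube.com/v/' . $m[1];
        return '<object width="' . $width . '" height="' . $height . '" data="' . $url . '">'
            . '<param name="movie" value="' . $url . '"></param>'
            . '</object>';
    }, $html);
}

$rebuilt = rebuild_youtube_objects('<span class="youtube-embed">AyPzM5WK8ys:9999x200</span>');
```

<p>A span whose dimensions have more than four digits does not match the
pattern at all, so it is left as a plain span rather than turned into an
object.</p>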

<h3>Trusts media host's security</h3>

<p>By allowing this code onto your website, you are trusting that YouTube has
people tech-savvy enough not to allow their users to inject malicious
code into the Flash files. An exploit on YouTube means an exploit on your
site. Even though YouTube is run by the reputable Google, that
<a href="http://ha.ckers.org/blog/20061213/google-xss-vuln/">doesn't</a>
mean they are
<a href="http://ha.ckers.org/blog/20061208/xss-in-googles-orkut/">invulnerable.</a>
You're handing a certain measure of the job to an external provider (just as
you have by entrusting your user input to HTML Purifier), and
it is important that you are cognizant of the risk.</p>

<h3>Poorly written adaptations compromise security</h3>

<p>This should go without saying, but if you're going to adapt this code
for Google Video or the like, make sure you do it <em>right</em>. It's
extremely easy to allow a character too many in the final replacement step,
and suddenly you're introducing XSS into HTML Purifier's XSS-free output. HTML
Purifier may be well written, but it cannot guard against vulnerabilities
introduced after it has finished.</p>
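
<p>To make the "character too many" risk concrete, here is a small sketch
comparing a permissive post-purification regex with a restrictive one. The
host <code>video.example.com</code>, the class name <code>video-embed</code>
and the sample input are all hypothetical; in particular, the input assumes
that a double quote in text content can come out of purification
unescaped, which you should verify for your own setup.</p>

```php
<?php

// Too permissive: ".+?" lets quotes and other markup characters through
// into the rebuilt attribute value.
$loose_regex = '#<span class="video-embed">(.+?)</span>#';
// Restrictive: only characters that are harmless inside an attribute value.
$strict_regex = '#<span class="video-embed">([A-Za-z0-9_-]+)</span>#';
$replace = '<object data="http://video.example.com/v/\1"></object>';

// Assumed shape of attacker-supplied text after purification.
$purified = '<span class="video-embed">x" onmouseover="alert(1)</span>';

// The loose pattern rebuilds an object with an injected event handler.
$loose_result = preg_replace($loose_regex, $replace, $purified);
// The strict pattern does not match, so the harmless span is left alone.
$strict_result = preg_replace($strict_regex, $replace, $purified);
```

<p>With the loose pattern the quote closes the <code>data</code> attribute
early and the rest of the text becomes a new attribute; with the strict
pattern nothing is rebuilt at all.</p>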

<h2>Future plans</h2>

<p>It would probably be a good idea if this code were added to the core
library. Look out for its inclusion in the core as a decorator
or the like.</p>

</body>
</html>