mirror of
https://github.com/ezyang/htmlpurifier.git
synced 2025-01-18 11:41:52 +00:00
[1.3.2] Added enduser-youtube.html, explains how to embed YouTube videos. See also corresponding smoketest preserveYouTube.php.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@612 48356398-32a2-884e-a903-53898d9a118a
This commit is contained in:
parent
360f984f63
commit
48da08ab78
2
NEWS
@@ -17,6 +17,8 @@ NEWS ( CHANGELOG and HISTORY ) HTMLPurifier
 ! HTMLPurifier object now accepts configuration arrays, no need to manually
   instantiate a configuration object
 ! Context object now accessible to outside
+! Added enduser-youtube.html, explains how to embed YouTube videos. See
+  also corresponding smoketest preserveYouTube.php.
 - printDefinition.php: added labels, added better clarification
 . HTMLPurifier_Config::create() added, takes mixed variable and converts into
   a HTMLPurifier_Config object.
174
docs/enduser-youtube.html
Normal file
@@ -0,0 +1,174 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta name="description" content="Explains how to safely allow the embedding of flash from trusted sites." />
<link rel="stylesheet" type="text/css" href="./style.css" />

<title>Embedding YouTube Videos - HTML Purifier</title>

</head><body>

<h1 class="subtitled">Embedding YouTube Videos</h1>
<div class="subtitle">...as well as other dangerous active content</div>

<div id="filing">Filed under End-User</div>
<div id="index">Return to the <a href="index.html">index</a>.</div>

<p>Clients like their YouTube videos. It gives them a warm fuzzy feeling when
they see a neat little embedded video player on their websites that can play
the latest clips from their documentary "Fido and the Bones of Spring".
All joking aside, the ability to embed YouTube videos or other active
content in their pages is something that a lot of people like.</p>

<p>This is a <em>bad</em> idea. The moment you embed anything untrusted,
you expose yourself to all manner of nasties that can be
embedded in things from your run-of-the-mill Flash movie to
<a href="http://blog.spywareguide.com/2006/12/myspace_phish_attack_leads_use.html">Quicktime movies</a>.
Allowing users to tell the browser to load content from other websites
is intrinsically dangerous: there are already security risks associated with
letting users include images from other sites!</p>

<p>Luckily for us, however, whitelisting saves the day. Sure, letting users
include any old random Flash file could be dangerous, but if it's
from a specific website, it probably is okay. If no amount of pleading will
convince the people upstairs that they should settle for just linking
to their movies, you may find this technique very useful.</p>

<h2>Sample</h2>

<p>Below is custom code that allows users to embed
YouTube videos. This is not favoritism: this trick can easily be adapted for
other forms of embeddable content.</p>

<p>Usually, websites like YouTube give you boilerplate code that you can insert
into your documents. YouTube's code goes like this:</p>

<pre>
&lt;object width="425" height="350"&gt;
  &lt;param name="movie" value="http://www.youtube.com/v/AyPzM5WK8ys" /&gt;
  &lt;param name="wmode" value="transparent" /&gt;
  &lt;embed src="http://www.youtube.com/v/AyPzM5WK8ys"
         type="application/x-shockwave-flash"
         wmode="transparent" width="425" height="350" /&gt;
&lt;/object&gt;
</pre>

<p>There are two things to note about this code:</p>

<ol>
    <li><code>&lt;embed&gt;</code> is not part of the W3C's HTML 4 or
    XHTML 1.0 specifications, so if you want
    standards-compliant code, you'll have to get rid of it.</li>
    <li>The code is exactly the same for all instances, except for the
    identifier <tt>AyPzM5WK8ys</tt> which tells us which movie file
    to retrieve.</li>
</ol>

<p>What point 2 means is that if we have code like <code>&lt;span
class="youtube-embed"&gt;AyPzM5WK8ys&lt;/span&gt;</code>, your
application can reconstruct the full object from this small snippet that
passes through HTML Purifier <em>unharmed</em>.</p>

<pre>
&lt;?php

class HTMLPurifierX_PreserveYouTube extends HTMLPurifier
{
    function purify($html, $config = null) {
        // Collapse each YouTube object into a placeholder span. The "s"
        // modifier lets "." span newlines, since the boilerplate may be
        // multi-line; video IDs may also contain "-" and "_".
        $pre_regex = '#&lt;object[^&gt;]+&gt;.+?'.
            'http://www.youtube.com/v/([A-Za-z0-9\-_]+).+?&lt;/object&gt;#s';
        $pre_replace = '&lt;span class="youtube-embed"&gt;\1&lt;/span&gt;';
        $html = preg_replace($pre_regex, $pre_replace, $html);
        $html = parent::purify($html, $config);
        // Expand surviving placeholders back into a fixed-size object.
        $post_regex = '#&lt;span class="youtube-embed"&gt;([A-Za-z0-9\-_]+)&lt;/span&gt;#';
        $post_replace = '&lt;object width="425" height="350" '.
            'data="http://www.youtube.com/v/\1"&gt;'.
            '&lt;param name="movie" value="http://www.youtube.com/v/\1"&gt;&lt;/param&gt;'.
            '&lt;param name="wmode" value="transparent"&gt;&lt;/param&gt;'.
            '&lt;!--[if IE]&gt;'.
            // Trailing spaces keep the concatenated attributes separated.
            '&lt;embed src="http://www.youtube.com/v/\1" '.
            'type="application/x-shockwave-flash" '.
            'wmode="transparent" width="425" height="350" /&gt;'.
            '&lt;![endif]--&gt;'.
            '&lt;/object&gt;';
        $html = preg_replace($post_regex, $post_replace, $html);
        return $html;
    }
}

$purifier = new HTMLPurifierX_PreserveYouTube();
$html_still_with_youtube = $purifier-&gt;purify($html_with_youtube);

?&gt;
</pre>

<p>There is a bit going on here, so let's explain.</p>

<ol>
    <li>The class uses the prefix <code>HTMLPurifierX</code> because it's
    userspace code. Don't use <code>HTMLPurifier</code> in front of your
    class, since it might clobber another class in the library.</li>
    <li>In order to keep the interface compatible, we've extended HTMLPurifier
    into a new class that preserves the YouTube videos. This means that
    all you have to do is replace all instances of
    <code>new HTMLPurifier</code> with <code>new
    HTMLPurifierX_PreserveYouTube</code>. There are other ways to go about
    doing this: if you were calling a function that wrapped HTML Purifier,
    you could paste the PHP right there. If you wanted to be really
    fancy, you could make a decorator for HTMLPurifier.</li>
    <li>The first preg_replace call converts any YouTube code users may have
    embedded into a benign span tag. Span is used because it is inline,
    and objects are inline too. We are very careful to be extremely
    restrictive about what goes inside the span tag, because if errant code
    gets in there it could get messy.</li>
    <li>The HTML is then purified as usual.</li>
    <li>Then, another preg_replace replaces the span tag with a fully fledged
    object. Note that the embed is removed, and, in its place, a data
    attribute is added to the object. This makes the tag standards
    compliant! It also breaks Internet Explorer, so we add in a bit of
    conditional comments with the old embed code to make it work again.
    It's all quite convoluted but works.</li>
</ol>
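
<p>The decorator alternative mentioned in the list above could look roughly
like the sketch below. It is only an illustration: the class names
PreserveYouTubeDecorator and StubPurifier are hypothetical and not part of
HTML Purifier, the stub merely stands in for a real purifier so the example
runs on its own, and the ID character class (letters, digits, hyphen and
underscore) is an assumption.</p>

```php
<?php

// Hypothetical decorator: wraps any object that exposes purify($html, $config).
// Not part of HTML Purifier; names are illustrative only.
class PreserveYouTubeDecorator
{
    private $purifier;

    public function __construct($purifier)
    {
        $this->purifier = $purifier;
    }

    public function purify($html, $config = null)
    {
        // 1. Collapse each YouTube object into a placeholder span.
        $pre_regex = '#<object[^>]+>.+?' .
            'http://www\.youtube\.com/v/([A-Za-z0-9\-_]+).+?</object>#s';
        $html = preg_replace($pre_regex, '<span class="youtube-embed">\1</span>', $html);

        // 2. Delegate the real work to the wrapped purifier.
        $html = $this->purifier->purify($html, $config);

        // 3. Expand surviving placeholders into a fixed-size object.
        $post_regex = '#<span class="youtube-embed">([A-Za-z0-9\-_]+)</span>#';
        $post_replace = '<object width="425" height="350" ' .
            'data="http://www.youtube.com/v/\1">' .
            '<param name="movie" value="http://www.youtube.com/v/\1"></param>' .
            '</object>';
        return preg_replace($post_regex, $post_replace, $html);
    }
}

// Stand-in for a real purifier (demonstration only): keeps <span>, strips
// every other tag. A real deployment would pass an HTMLPurifier instance.
class StubPurifier
{
    public function purify($html, $config = null)
    {
        return strip_tags($html, '<span>');
    }
}

$decorated = new PreserveYouTubeDecorator(new StubPurifier());
```

<p>Because the decorator only relies on a purify() method, the calling code
does not need to know whether it holds the plain purifier or the wrapped
one.</p>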

<h2>Warning</h2>

<p>There are a number of possible problems with the code above, depending
on how you look at it.</p>

<h3>Cannot change width and height</h3>

<p>The width and height of the final YouTube movie cannot be adjusted. This
is because I am lazy. If you really insist on letting users change the size
of the movie, what you need to do is package up the attributes inside the
span tag (along with the movie ID). It gets complicated though: a malicious
user can specify an outrageously large height and width and attempt to crash
the visitor's browser or operating system. You need to cap the values, either
by limiting the number of digits allowed in the regex or by using a callback
to check the number.</p>
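
<p>Both capping approaches can be combined, as in the rough sketch below.
Everything specific in it is an assumption made for illustration: the
placeholder layout WIDTHxHEIGHT:ID inside the span, the helper names
clamp_dimension and expand_youtube_spans, the defaults of 425 by 350 and the
upper bounds of 800 by 600.</p>

```php
<?php

// Hypothetical helper: clamp a user-supplied dimension to 1..$max.
// Falls back to $default when the value is not 1 to 4 decimal digits.
function clamp_dimension($value, $default, $max)
{
    if (!preg_match('/\A[0-9]{1,4}\z/', $value)) {
        return $default;
    }
    return max(1, min((int) $value, $max));
}

// Expand placeholders of the assumed form
//   <span class="youtube-embed">WIDTHxHEIGHT:ID</span>
// The regex limits each dimension to four digits; the callback then caps
// the numeric value itself.
function expand_youtube_spans($html)
{
    $regex = '#<span class="youtube-embed">' .
        '([0-9]{1,4})x([0-9]{1,4}):([A-Za-z0-9\-_]+)</span>#';
    return preg_replace_callback($regex, function ($m) {
        $w = clamp_dimension($m[1], 425, 800);
        $h = clamp_dimension($m[2], 350, 600);
        return '<object width="' . $w . '" height="' . $h . '" ' .
            'data="http://www.youtube.com/v/' . $m[3] . '"></object>';
    }, $html);
}
```

<p>A placeholder that does not fit the pattern is left untouched, so it stays
an inert span rather than becoming an object.</p>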

<h3>Trusts YouTube's security</h3>

<p>By allowing this code onto our website, we are trusting that YouTube has
people tech-savvy enough not to allow their users to inject malicious
code into the Flash files. An exploit on YouTube means an exploit on your
site, and when you start allowing shadier sites, remember that trust
is important.</p>

<h3>Poorly written adaptations compromise security</h3>

<p>This should go without saying, but if you're going to adapt this code
for Google Video or the like, make sure you do it <em>right</em>. It's
extremely easy to allow a character too many in the final section and
suddenly you're introducing XSS into HTML Purifier's XSS-free output. HTML
Purifier may be well written, but it cannot guard against vulnerabilities
introduced after it has finished.</p>
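
<p>As a rough illustration of what "right" can mean, the sketch below
rebuilds the markup only when the identifier passes a strict, fully anchored
whitelist. The host name video.example.com, the function name
build_video_object and the 32-character limit are all hypothetical; a real
adaptation would have to use whatever URL scheme the target site
documents.</p>

```php
<?php

// Hypothetical adapter for another video site. The host name and the
// 32-character ID limit are assumptions made for this example.
function build_video_object($id)
{
    // Whitelist letters and digits only, anchored at both ends, so quotes,
    // angle brackets and whitespace can never reach the generated markup.
    if (!is_string($id) || !preg_match('/\A[A-Za-z0-9]{1,32}\z/', $id)) {
        return '';
    }
    $url = 'http://video.example.com/v/' . $id;
    return '<object width="425" height="350" data="' . $url . '">' .
        '<param name="movie" value="' . $url . '"></param>' .
        '</object>';
}
```

<p>An identifier such as <code>x" onload="alert(1)</code> fails the check and
produces an empty string instead of markup.</p>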

<h2>Future plans</h2>

<p>It would probably be a good idea if this code were added to the core
library. Look out for the inclusion of this into the core as a decorator
or the like.</p>

</body>
</html>
65
smoketests/preserveYouTube.php
Normal file
@@ -0,0 +1,65 @@
<?php

require_once 'common.php';

echo '<?xml version="1.0" encoding="UTF-8" ?>';
?><!DOCTYPE html
     PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head>
    <title>HTML Purifier Preserve YouTube Smoketest</title>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
</head>
<body>
<h1>HTML Purifier Preserve YouTube Smoketest</h1>
<?php

class HTMLPurifierX_PreserveYouTube extends HTMLPurifier
{
    function purify($html, $config = null) {
        // Collapse each YouTube object into a placeholder span. The "s"
        // modifier lets "." span newlines; IDs may contain "-" and "_".
        $pre_regex = '#<object[^>]+>.+?'.
            'http://www.youtube.com/v/([A-Za-z0-9\-_]+).+?</object>#s';
        $pre_replace = '<span class="youtube-embed">\1</span>';
        $html = preg_replace($pre_regex, $pre_replace, $html);
        $html = parent::purify($html, $config);
        // Expand surviving placeholders back into a fixed-size object.
        $post_regex = '#<span class="youtube-embed">([A-Za-z0-9\-_]+)</span>#';
        $post_replace = '<object width="425" height="350" '.
            'data="http://www.youtube.com/v/\1">'.
            '<param name="movie" value="http://www.youtube.com/v/\1"></param>'.
            '<param name="wmode" value="transparent"></param>'.
            '<!--[if IE]>'.
            // Trailing spaces keep the concatenated attributes separated.
            '<embed src="http://www.youtube.com/v/\1" '.
            'type="application/x-shockwave-flash" '.
            'wmode="transparent" width="425" height="350" />'.
            '<![endif]-->'.
            '</object>';
        $html = preg_replace($post_regex, $post_replace, $html);
        return $html;
    }
}

$string = '<object width="425" height="350"><param name="movie" value="http://www.youtube.com/v/JzqumbhfxRo"></param><param name="wmode" value="transparent"></param><embed src="http://www.youtube.com/v/JzqumbhfxRo" type="application/x-shockwave-flash" wmode="transparent" width="425" height="350"></embed></object>';

$regular_purifier = new HTMLPurifier();
$youtube_purifier = new HTMLPurifierX_PreserveYouTube();

?>
<h2>Unpurified</h2>
<p><a href="?break">Click here to see the unpurified version (breaks validation).</a></p>
<div><?php
if (isset($_GET['break'])) echo $string;
?></div>

<h2>Without YouTube exception</h2>
<div><?php
echo $regular_purifier->purify($string);
?></div>

<h2>With YouTube exception</h2>
<div><?php
echo $youtube_purifier->purify($string);
?></div>

</body>
</html>