0
0
mirror of https://github.com/ezyang/htmlpurifier.git synced 2025-01-03 13:21:51 +00:00

Commit benchmarking code.

git-svn-id: http://htmlpurifier.org/svnroot/html_purifier/trunk@28 48356398-32a2-884e-a903-53898d9a118a
This commit is contained in:
Edward Z. Yang 2006-04-16 00:35:34 +00:00
parent 53b738c1e4
commit 20e027646d
4 changed files with 270 additions and 0 deletions

72
benchmarks/HTML_Lexer.php Normal file
View File

@ -0,0 +1,72 @@
<?php
// test our parser versus HTMLSax parser
set_time_limit(5);
// PEAR
require_once 'Benchmark/Timer.php';
require_once 'XML/HTMLSax3.php';
require_once 'Text/Password.php';
require_once '../MarkupFragment.php';
require_once '../HTML_Lexer.php';
?>
<html>
<head>
<title>Benchmark: HTML_Lexer versus HTMLSax</title>
</head>
<body>
<h1>Benchmark: HTML_Lexer versus HTMLSax</h1>
<?php
function do_benchmark($document) {
$timer = new Benchmark_Timer();
$timer->start();
$lexer = new HTML_Lexer();
$tokens = $lexer->tokenizeHTML($document);
$timer->setMarker('HTML_Lexer');
$lexer = new HTML_Lexer_Sax();
$sax_tokens = $lexer->tokenizeHTML($document);
$timer->setMarker('HTML_Lexer_Sax');
$timer->stop();
$timer->display();
}
// sample of html pages
$dir = 'samples/HTML_Lexer';
$dh = opendir($dir);
while (false !== ($filename = readdir($dh))) {
if (strpos($filename, '.html') !== strlen($filename) - 5) continue;
$document = file_get_contents($dir . '/' . $filename);
echo "<h2>File: $filename</h2>\n";
do_benchmark($document);
}
// crashers
$snippets = array();
$snippets[] = '<a href="foo>';
$snippets[] = '<a "=>';
foreach ($snippets as $snippet) {
echo '<h2>' . htmlentities($snippet) . '</h2>';
do_benchmark($snippet);
}
// random input
$document = Text_Password::create(80, 'unpronounceable', 'qwerty <>="\'');
echo "<h2>Random input</h2>\n";
echo '<p style="font-family:monospace;">' . htmlentities($document) . '</p>';
do_benchmark($document);
?></body></html>

View File

@ -0,0 +1,53 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
<title>Main Page - Huaxia Taiji Club</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<link rel="stylesheet" type="text/css" media="screen, projection" href="/screen.css" />
<link rel="stylesheet" type="text/css" media="print" href="/print.css" />
</head>
<body>
<div id="translation"><a href="/ch/Main_Page">&#20013;&#25991;</a></div>
<div id="heading"><a href="/en/Main_Page" title="English Main Page">Huaxia Taiji Club</a>
<a class="heading_ch" href="/ch/Main_Page" title="&#20013;&#25991;&#20027;&#39029;">&#21326;&#22799;&#22826;&#26497;&#20465;&#20048;&#37096;</a></div>
<ul id="menu">
<li><a href="/en/Main_Page" class="active">Main Page</a></li><li><a href="/en/About">About</a></li><li><a href="/en/News">News</a></li><li><a href="/en/Events">Events</a></li><li><a href="/en/Digest">Digest</a></li><li><a href="/en/Taiji_and_I">Taiji and I</a></li><li><a href="/en/Downloads">Downloads</a></li><li><a href="/en/Registration">Registration</a></li><li><a href="/en/Contact">Contact</a></li> <li><a href="http://www.taijiclub.org/gallery2/main.php">Gallery</a></li>
<li><a href="http://www.taijiclub.org/forums/index.php">Forums</a></li>
</ul>
<div id="content">
<h1 id="title">Main Page</h1><h2>Taiji (Tai Chi) </h2>
<div id="sidebar">
<h3>Recent News</h3>
<ul>
<li>Zou Xiaojun was elected as the new club vice president </li>
<li>HX Edison Taiji Club <a href="http://www.taijiclub.org/downloads/Taiji_club_regulation_.pdf">by-law</a> effective 3/28/2006</li>
<li>A new email account for our club: HXEdisontaijiclub@yahoo.com</li>
<li>Workshop conducted by <a href="http://www.taijiclub.org/ch/Digest/LiDeyin">?????</a> Li Deyin is set on June 4, 2006 at Clarion Hotel in Edison from 9:30am-12pm; <a href="http://www.taijiclub.org/en/Registration">Registration</a></li>
</ul>
</div>
<p><i>Taiji</i> is an ancient Chinese tradition of movement systems that is associated with philosophy, physiology, psychology, geometry and dynamics. It is the slowest form of martial arts and is meant to improve the internal spirit. It is soothing to the soul and extremely invigorating. </p>
<p>The founder of Taiji was Zhang Sanfeng (Chang San-feng), who was a monk of the Wu Dang (Wu Tang) Monastery and lived in the period from 1391 to 1459. His exercises stressed suppleness and elasticity as opposed to the hardness and force of other martial art styles. Several centuries old, Taiji was originally developed as a form of self-defense, emphasizing strength, balance, flexibility and speed. Tai Chi also differs from other martial arts in that it is based on the Taoist religion and aims to avoid aggressive forces. </p>
<p>Modern Taiji includes many forms &mdash; Quan, Sword and Fan. Impacting the mind and body of the practitioners, Taiji is practiced as a meditative exercise made up of a series of forms, or choreographed motions, requiring slow, gentle movement of the arms, legs and torso. Taiji practitioners learn to center their attention on their breathing and body movements so that the exercise strengthens their overall mental and physical awareness. In a sense, Taiji is similar to yoga in that it is also a form of moving meditation, with the goal of achieving stillness through the motion and awareness of breath. To perform Taiji, practitioners have to empty their mind of thoughts and worries in order to achieve harmony. It is a great aid for reducing stress and improving the quality of life. </p>
<p>In China and in communities all over the world, Taiji is practiced by young and old in the early morning hours. It's a great way to bring a new and fresh day!</p>
<p>Check out our <a href="/gallery2/main.php">gallery</a>.</p>
<div style="text-align:center;"><a href="http://www.taijiclub.org/gallery2/v/2006/group1b.jpg.html?g2_imageViewsIndex=1"><img src="/gallery2/d/1836-2/group1b.jpg" /></a></div>
<div style="text-align:center;">Click on photo to see HR version</div></div>
</body>
</html>

View File

@ -0,0 +1,17 @@
<html><head><meta http-equiv="content-type" content="text/html; charset=UTF-8"><title>Google</title><style><!--
body,td,a,p,.h{font-family:arial,sans-serif;}
.h{font-size: 20px;}
.q{color:#0000cc;}
//-->
</style>
<script>
<!--
function sf(){document.f.q.focus();}
function rwt(el,ct,cd,sg){var e = window.encodeURIComponent ? encodeURIComponent : escape;el.href="/url?sa=t&ct="+e(ct)+"&cd="+e(cd)+"&url="+e(el.href).replace(/\+/g,"%2B")+"&ei=fHNBRJDEG4HSaLONmIoP"+sg;el.onmousedown="";return true;}
// -->
</script>
</head><body bgcolor=#ffffff text=#000000 link=#0000cc vlink=#551a8b alink=#ff0000 onLoad=sf() topmargin=3 marginheight=3><center><table border=0 cellspacing=0 cellpadding=0 width=100%><tr><td align=right nowrap><font size=-1><b>edwardzyang@gmail.com</b>&nbsp;|&nbsp;<a href="/url?sa=p&pref=ig&pval=2&q=http://www.google.com/ig%3Fhl%3Den" onmousedown="return rwt(this,'pro','hppphou:def','&sig2=hDbTpsWIp9YG37a23n6krQ')">Personalized Home</a>&nbsp;|&nbsp;<a href="/searchhistory/?hl=en">Search History</a>&nbsp;|&nbsp;<a href="https://www.google.com/accounts/ManageAccount">My Account</a>&nbsp;|&nbsp;<a href="http://www.google.com/accounts/Logout?continue=http://www.google.com/">Sign out</a></font></td></tr><tr height=4><td><img alt="" width=1 height=1></td></tr></table><img src="/intl/en/images/logo.gif" width=276 height=110 alt="Google"><br><br>
<form action=/search name=f><script><!--
function qs(el) {if (window.RegExp && window.encodeURIComponent) {var ue=el.href;var qe=encodeURIComponent(document.f.q.value);if(ue.indexOf("q=")!=-1){el.href=ue.replace(new RegExp("q=[^&$]*"),"q="+qe);}else{el.href=ue+"&q="+qe;}}return 1;}
// -->
</script><table border=0 cellspacing=0 cellpadding=4><tr><td nowrap><font size=-1><b>Web</b>&nbsp;&nbsp;&nbsp;&nbsp;<a id=1a class=q href="/imghp?hl=en&tab=wi" onClick="return qs(this);">Images</a>&nbsp;&nbsp;&nbsp;&nbsp;<a id=2a class=q href="http://groups.google.com/grphp?hl=en&tab=wg" onClick="return qs(this);">Groups</a>&nbsp;&nbsp;&nbsp;&nbsp;<a id=4a class=q href="http://news.google.com/nwshp?hl=en&tab=wn" onClick="return qs(this);">News</a>&nbsp;&nbsp;&nbsp;&nbsp;<a id=5a class=q href="http://froogle.google.com/frghp?hl=en&tab=wf" onClick="return qs(this);">Froogle</a>&nbsp;&nbsp;&nbsp;&nbsp;<a id=8a class=q href="/lochp?hl=en&tab=wl" onClick="return qs(this);">Local</a>&nbsp;&nbsp;&nbsp;&nbsp;<b><a href="/intl/en/options/" class=q>more&nbsp;&raquo;</a></b></font></td></tr></table><table cellspacing=0 cellpadding=0><tr><td width=25%>&nbsp;</td><td align=center><input type=hidden name=hl value=en><input maxlength=2048 size=55 name=q value="" title="Google Search"><br><input type=submit value="Google Search" name=btnG><input type=submit value="I'm Feeling Lucky" name=btnI></td><td valign=top nowrap width=25%><font size=-2>&nbsp;&nbsp;<a href=/advanced_search?hl=en>Advanced Search</a><br>&nbsp;&nbsp;<a href=/preferences?hl=en>Preferences</a><br>&nbsp;&nbsp;<a href=/language_tools?hl=en>Language Tools</a></font></td></tr></table></form><br><br><font size=-1><a href="/ads/">Advertising&nbsp;Programs</a> - <a href=/services/>Business Solutions</a> - <a href=/about.html>About Google</a></font><p><font size=-2>&copy;2006 Google</font></p></center></body></html>

View File

@ -0,0 +1,128 @@
<html>
<head>
<title>Anime Digi-Lib Index</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>
<div id="tb">
<form name="lycos_search" method="get" target="_new" style="margin: 0px"
action="http://r.hotbot.com/r/memberpgs_lycos_searchbox_af/http://www.angelfire.lycos.com/cgi-bin/search/pursuit">
<table id="tbtable" cellpadding="0" cellspacing="0" border="0" width="100%" style="border: 1px solid black;">
<tr style="background-color: #dcf7ff">
<td colspan="3">
<table cellpadding="0" cellspacing="0" border="0">
<tr>
<td>&nbsp;Search:</td>
<td><input type="radio" name="cat" value="lycos" checked></td>
<td nowrap="nowrap">The Web</td>
<td><input type="radio" name="cat" value="angelfire"></td>
<td nowrap="nowrap">Angelfire</td>
<td nowrap="nowrap">&nbsp;&nbsp;&nbsp;<img src="http://af.lygo.com/d/toolbar/planeticon.gif"></td><td nowrap="nowrap">&nbsp;<a href="http://r.lycos.com/r/tlbr_planet/http://planet.lycos.com" target="_new">Planet</a></td>
</tr>
</table>
<td nowrap="nowrap"><a href="http://lt.angelfire.com/af_toolbar/edit/_h_/www.angelfire.lycos.com/build/index.tmpl" target="_top">
<span id="build">Edit your Site</span></a>&nbsp;</td>
<td><img src="http://af.lygo.com/d/toolbar/dir.gif" alt="show site directory" border="0" height="10" hspace="3" width="8"></td>
<td nowrap="nowrap"><a href="http://lt.angelfire.com/af_toolbar/browse/_h_/www.angelfire.lycos.com/directory/index.tmpl" target="_top">Browse Sites</a>&nbsp;</td>
<td><a href="http://lt.angelfire.com/af_toolbar/angelfire/_h_/www.angelfire.lycos.com" target="_top"><img src="http://af.lygo.com/d/toolbar/aflogo_top.gif" alt="hosted by angelfire" border="0" height="26" width="143"></a></td>
</tr>
<tr style="background-color: #dcf7ff">
<td nowrap="nowrap" valign="middle">&nbsp;<input size="30" style="font-size: 10px; background-color: #fff;" type="text" name="query" id="searchbox"></td>
<td style="background: #fff url(http://af.lygo.com/d/toolbar/bg.gif) repeat-x; text-align: center;" colspan="3" align="center">
<a href="http://clk.atdmt.com/VON/go/lycsnvon0710000019von/direct/01/"><img src="/sys/free_logo_xxxx_157x20.gif" height="20" width="157" border="0" alt="Vonage"></a><img src="http://view.atdmt.com/VON/view/lycsnvon0710000019von/direct/01/"></td>
<span style="font-size: 11px;">
<span style="color:#00f; font-weight:bold;">&#171;</span>
<span id="top100">
<a href="javascript:void top100('prev')" target="_top">Previous</a> |
<a href="http://lt.angelfire.com/af_toolbar/top100/_h_/www.angelfire.lycos.com/cgi-bin/top100/pagelist?start=1" target="_top">Top 100</a> |
<a href="javascript:void top100('next')" target="_top">Next</a>
</span>
<span style="color: #00f; font-weight: bold;">&#187;</span>
</span>
</td>
<td valign="top" style="background: #fff url(http://af.lygo.com/d/toolbar/bg.gif) repeat-x;"><a href="http://lt.angelfire.com/af_toolbar/angelfire/_h_/www.angelfire.lycos.com" target="_top"><img src="http://af.lygo.com/d/toolbar/aflogo_bot.gif" alt="hosted by angelfire" border="0" height="22" width="143"></a></td>
</tr>
</table>
</form>
</div>
<table border="0" cellpadding="0" cellspacing="0" width="728"><tr><td>
<script type="text/javascript">
if (objAdMgr.isSlotAvailable("leaderboard")) {
objAdMgr.renderSlot("leaderboard")
}
</script>
<noscript>
<a href="http://network.realmedia.com/RealMedia/ads/click_nx.ads/lycosangelfire/ros/728x90/wp/ss/a/491169@Top1?x"><img border="0" src="http://network.realmedia.com/RealMedia/ads/adstream_nx.ads/lycosangelfire/ros/728x90/wp/ss/a/491169@Top1" alt="leaderboard ad" /></a>
</noscript>
</td></tr>
</table>
<table width="86%" border="0" cellspacing="0" cellpadding="2">
<tr>
<td height="388" width="19%" bgcolor="#FFCCFF" valign="top">
<p>May 1, 2000</p>
<p><b>Pop Culture</b> </p>
<p>by. H. Finkelstein</p>
</td>
<td height="388" width="52%" valign="top">
<p>Welcome to the <b>Anime Digi-Lib</b>, a virtual index to anime on the
internet. This site strives to house a comprehensive index to both personal
and commercial websites and provides reviews to these sites. We hope to
be a gateway for people who've never imagined they'd ever be interested
in Japanese Animation. </p>
<table width="99%" border="1" cellspacing="0" cellpadding="2" height="320" name="Searchnservices">
<tr>
<td height="263" valign="top" width="58%">
<p>&nbsp; </p>
<p>&nbsp;</p>
<FORM ACTION="/cgi-bin/script_library/site_search/search" METHOD="GET">
<table border="0" cellpadding="2" cellspacing="0">
<tr>
<td colspan="2">Search term: <INPUT NAME="search_term"><br></td>
</tr>
<tr>
<td colspan="2" align="center">Case-sensitive -
<INPUT TYPE="checkbox" NAME="case_sensitive">yes<br></td>
</tr>
<tr>
<td align="right"><INPUT TYPE="radio" NAME="search_type" VALUE="exact" CHECKED>exact</td>
<td><INPUT TYPE="radio" NAME="search_type" VALUE="fuzzy">fuzzy<br></td>
</tr>
<tr>
<td colspan="2" align="center"><INPUT TYPE="hidden" NAME="display" VALUE="#FF0000"><INPUT TYPE="submit"></td>
</tr>
</table>
</form>
<td>
<table border="0" cellpadding="0" cellspacing="0" width="100%">
<tr><td><font face="verdana,geneva" color="#000011" size="1">What is better, subtitled or dubbed anime?</font></td></tr>
<tr><td><input type="radio" name="rd" value="1"><font face="verdana" size="2" color="#000011">Subtitled</font></td></tr>
<tr><td align="middle"><font face="verdana" size="1"><a href="http://pub.alxnet.com/poll?id=2079873&q=view">Current results</a></font></td></tr>
</table></td></tr>
<tr>
<td><font face="verdana" size="1"><a href="http://www.alxnet.com/services/poll/">Free
Web Polls</a></font></td>
</tr>
</table></form>
<!-- Alxnet.com -- web poll code ends -->
</td>
</tr>
</table>
</body>
</html>