[2.1.2] Merge in Brett Zamir's patches.

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1397 48356398-32a2-884e-a903-53898d9a118a
2025-04-25 03:44:37 +00:00 · 2007-08-26 18:20:46 +00:00
commit 29c3c21b34
parent e45cc503a2
5 changed files with 59 additions and 36 deletions
--- a/1
+++ b/1
@ -27,6 +27,7 @@ NEWS ( CHANGELOG and HISTORY )                                     HTMLPurifier
 - Hammer out a bunch of edge-case bugs in the standalone distribution
 - Inclusion reflection removed from URISchemeRegistry; you must manually
  include any new schema files you wish to use
+- Numerous typo fixes in documentation thanks to Brett Zamir
 . Unit test refactoring for one logical test per test function
 . Config and context parameters in ComplexHarness deprecated: instead, edit
  the $config and $context member variables
--- a/docs/enduser-customize.html
+++ b/docs/enduser-customize.html
@ -32,7 +32,7 @@
  Before we even write any code, it is paramount to consider whether
  the code we're writing is necessary. HTML Purifier, by default,
  contains a large set of elements and attributes: large enough so that
-  <em>any</em> element or attribute in XHTML 1.0 (and its HTML variant)
+  <em>any</em> element or attribute in XHTML 1.0 or 1.1 (and its HTML variants)
  that can be safely used by the general public is implemented.
 </p>

@ -76,11 +76,12 @@
 <h3>XHTML 1.1</h3>

 <p>
-  We have not implemented the
+  As of HTMLPurifier 2.1.0, we have implemented the
  <a href="http://www.w3.org/TR/2001/REC-ruby-20010531/">Ruby module</a>,
  which defines a set of tags
  for publishing short annotations for text, used mostly in Japanese
-  and Chinese school texts.
+  and Chinese school texts, but applicable for positioning any text (not
+  limited to translations) above or below other corresponding text.
 </p>

 <h3>XHTML 2.0</h3>
@ -492,10 +493,11 @@ $def =& $config->getHTMLDefinition(true);
 <p>
  The <code>(%flow;)*</code> indicates the allowed children of the
  <code>li</code> tag: <code>li</code> allows any number of flow
-  elements as its children. In HTML Purifier, we'd write it like
-  <code>Flow</code> (here's where the content sets we were
-  discussing earlier come into play). There are three shorthand content models you
-  can specify:
+  elements as its children. (The <code>- O</code> allows the closing tag to be 
+  omitted, though in XML this is not allowed.) In HTML Purifier, 
+  we'd write it like <code>Flow</code> (here's where the content sets 
+  we were discussing earlier come into play). There are three shorthand
+  content models you can specify:
 </p>

 <table class="table">
@ -668,12 +670,22 @@ $def =& $config->getHTMLDefinition(true);
  Common is a combination of the above-mentioned collections.
 </p>

+<p class="aside">
+  Readers familiar with the modularization may have noticed that the Core
+  attribute collection differs from that specified by the <a
+  href="http://www.w3.org/TR/xhtml-modularization/abstract_modules.html#s_commonatts">abstract
+  modules of the XHTML Modularization 1.1</a>. We believe this section
+  to be in error, as <code>br</code> permits the use of the <code>style</code>
+  attribute even though it uses the <code>Core</code> collection, and 
+  the DTD and XML Schemas supplied by W3C support our interpretation.
+</p>
+
 <h3>Attributes</h3>

 <p>
-  If you didn't read the <a href="#addAttribute">previous section on
+  If you didn't read the <a href="#addAttribute">earlier section on
  adding attributes</a>, read it now.  The last parameter is simply
-  array of attribute names to attribute implementations, in the exact
+  an array of attribute names to attribute implementations, in the exact
  same format as <code>addAttribute()</code>.
 </p>

--- a/docs/enduser-id.html
+++ b/docs/enduser-id.html
@ -58,7 +58,7 @@ appear elsewhere on the document.  The method is simple:</p>

 <pre>$config->set('HTML', 'EnableAttrID', true);
 $config->set('Attr', 'IDBlacklist', array(
-    'list', 'of', 'attributes', 'that', 'are', 'forbidden'
+    'list', 'of', 'attribute', 'values', 'that', 'are', 'forbidden'
 ));</pre>

 <p>That being said, there are some notable drawbacks.  First of all, you have to
@ -71,9 +71,9 @@ to possible standards-compliance issues.</p>
 <p>Furthermore, this position becomes untenable when a single web page must hold
 multiple portions of user-submitted content.  Since there's obviously no way
 to find out before-hand what IDs users will use, the blacklist is helpless.
-And even since HTML Purifier validates each segment seperately, perhaps doing
+And since HTML Purifier validates each segment separately, perhaps doing
 so at different times, it would be extremely difficult to dynamically update
-the blacklist inbetween runs.</p>
+the blacklist in between runs.</p>

 <p>Finally, simply destroying the ID is extremely user-unfriendly behavior: after
 all, they might have simply specified a duplicate ID by accident.</p>
--- a/docs/enduser-tidy.html
+++ b/docs/enduser-tidy.html
@ -22,7 +22,7 @@ out:</p>

 <p class="emphasis">This ain't HTML Tidy!</p>

-<p>Rather, Tidy stands for a cool set of Tidy-inspired in HTML Purifier
+<p>Rather, Tidy stands for a cool set of Tidy-inspired features in HTML Purifier
 that allows users to submit deprecated elements and attributes and get
 valid strict markup back. For example:</p>

@ -33,8 +33,8 @@ valid strict markup back. For example:</p>
 <pre>&lt;div style=&quot;text-align:center;&quot;&gt;Centered&lt;/div&gt;</pre>

 <p>...when this particular fix is run on the HTML. This tutorial will give
-you down the lowdown of what exactly HTML Purifier will do when Tidy
-is on, and how to fine tune this behavior. Once again, <strong>you do
+you the lowdown of what exactly HTML Purifier will do when Tidy
+is on, and how to fine-tune this behavior. Once again, <strong>you do
 not need Tidy installed on your PHP to use these features!</strong></p>

 <h2>What does it do?</h2>
@ -221,7 +221,7 @@ general syntax:</p>

 <p>The lowdown is, quite frankly, HTML Purifier's default settings are
 probably good enough. The next step is to bump the level up to heavy,
-and if that still doesn't satisfy your appetite, do some fine tuning.
+and if that still doesn't satisfy your appetite, do some fine-tuning.
 Other than that, don't worry about it: this all works silently and
 effectively in the background.</p>

--- a/docs/enduser-utf8.html
+++ b/docs/enduser-utf8.html
@ -96,7 +96,7 @@ which can be a rewarding (but difficult) task.</p>
 <h2 id="findcharset">Finding the real encoding</h2>

 <p>In the beginning, there was ASCII, and things were simple. But they
-weren't good, for no one could write in Cryllic or Thai. So there
+weren't good, for no one could write in Cyrillic or Thai. So there
 exploded a proliferation of character encodings to remedy the problem
 by extending the characters ASCII could express. This ridiculously
 simplified version of the history of character encodings shows us that
@ -138,7 +138,7 @@ browser:</p>
    <dd>View &gt; Encoding: bulleted item is unofficial name</dd>
 </dl>

-<p>Internet Explorer won't give you the mime (i.e. useful/real) name of the
+<p>Internet Explorer won't give you the MIME (i.e. useful/real) name of the
 character encoding, so you'll have to look it up using their description.
 Some common ones:</p>

@ -216,6 +216,12 @@ if your <code>META</code> tag claims that either:</p>

 <h2 id="fixcharset">Fixing the encoding</h2>

+<p class="aside">The advice given here is for pages being served as
+vanilla <code>text/html</code>.  Different practices must be used
+for <code>application/xml</code> or <code>application/xhtml+xml</code>; see
+<a href="http://www.w3.org/TR/2002/NOTE-xhtml-media-types-20020430/">W3C's
+document on XHTML media types</a> for more information.</p>
+
 <p>If your <code>META</code> encoding and your real encoding match,
 savvy! You can skip this section. If they don't...</p>

@ -302,7 +308,8 @@ languages</a>. The appropriate code is:</p>

 <p>...replacing UTF-8 with whatever your embedded encoding is.
 This code must come before any output, so be careful about
-stray whitespace in your application.</p>
+stray whitespace in your application (i.e., any whitespace before 
+output excluding whitespace within &lt;?php ?&gt; tags).</p>

 <h4 id="fixcharset-server-phpini">PHP ini directive</h4>

@ -313,8 +320,8 @@ header call: <code><a href="http://php.net/ini.core#ini.default-charset">default

 <p>...will also do the trick. If PHP is running as an Apache module (and
 not as FastCGI, consult
-<a href="http://php.net/phpinfo">phpinfo</a>() for details), you can even use htaccess do apply this property
-globally:</p>
+<a href="http://php.net/phpinfo">phpinfo</a>() for details), you can even use htaccess to apply this property
+across many PHP files:</p>

 <pre><a href="http://php.net/configuration.changes#configuration.changes.apache">php_value</a> default_charset &quot;UTF-8&quot;</pre>

@ -360,10 +367,11 @@ to send anything at all:</p>

 <pre><a href="http://httpd.apache.org/docs/1.3/mod/core.html#adddefaultcharset">AddDefaultCharset</a> Off</pre>

-<p>...making your <code>META</code> tags the sole source of
-character encoding information. In these cases, it is
-<em>especially</em> important to make sure you have valid <code>META</code>
-tags on your pages and all the text before them is ASCII.</p>
+<p>...making your internal charset declaration (usually the <code>META</code> tags)
+the sole source of character encoding 
+information. In these cases, it is <em>especially</em> important to make 
+sure you have valid <code>META</code> tags on your pages and all the 
+text before them is ASCII.</p>

 <blockquote class="aside"><p>These directives can also be
 placed in httpd.conf file for Apache, but
@ -428,28 +436,30 @@ IIS to change character encodings, I'd be grateful.</p>

 <p><code>META</code> tags are the most common source of embedded
 encodings, but they can also come from somewhere else: XML
-processing instructions. They look like:</p>
+Declarations. They look like:</p>

 <pre>&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;</pre>

 <p>...and are most often found in XML documents (including XHTML).</p>

-<p>For XHTML, this processing instruction theoretically
+<p>For XHTML, this XML Declaration theoretically
 overrides the <code>META</code> tag. In reality, this happens only when the
 XHTML is actually served as legit XML and not HTML, which almost never
 happens due to Internet Explorer's lack of support for 
 <code>application/xhtml+xml</code> (even though doing so is often
-argued to be <a href="http://www.hixie.ch/advocacy/xhtml">good practice</a>).</p>
+argued to be <a href="http://www.hixie.ch/advocacy/xhtml">good 
+practice</a> and is required by the XHTML 1.1 specification).</p>

-<p>For XML, however, this processing instruction is extremely important.
+<p>For XML, however, this XML Declaration is extremely important.
 Since most webservers are not configured to send charsets for .xml files,
 this is the only thing a parser has to go on. Furthermore, the default
 for XML files is UTF-8, which often butts heads with the more common
 ISO-8859-1 encoding (you see this in garbled RSS feeds).</p>

 <p>In short, if you use XHTML and have gone through the
-trouble of adding the XML header, make sure it jives
-with your <code>META</code> tags and HTTP headers.</p>
+trouble of adding the XML Declaration, make sure it jibes
+with your <code>META</code> tags (which should only be present 
+if served in text/html) and HTTP headers.</p>

 <h3 id="fixcharset-internals">Inside the process</h3>

@ -545,7 +555,7 @@ an application that originally used ISO-8859-1 but switched to UTF-8
 when it became far too cumbersome to support foreign languages. Bots
 will now actually go through articles and convert character entities
 to their corresponding real characters for the sake of user-friendliness
-and searcheability. See
+and searchability. See
 <a href="http://meta.wikimedia.org/wiki/Help:Special_characters">Meta's
 page on special characters</a> for more details.
 </p></blockquote>
@ -609,7 +619,7 @@ since UTF-8 supports every character.</p>

 <h4 id="whyutf8-forms-multipart"><code>multipart/form-data</code></h4>

-<p>Multipart form submission takes a way a lot of the ambiguity
+<p>Multipart form submission takes away a lot of the ambiguity
 that percent-encoding had: the server now can explicitly ask for
 certain encodings, and the client can explicitly tell the server
 during the form submission what encoding the fields are in.</p>
@ -678,7 +688,7 @@ set the encoding correctly using %Core.Encoding):</p>

 <ul>
    <li>The <code>Encoder</code> will transform the text from ISO 8859-1 to UTF-8
-        (note that theta is preserved since it doesn't actually use
+        (note that theta is preserved here since it doesn't actually use
        any non-ASCII characters): <code>&amp;theta;</code></li>
    <li>The <code>EntityParser</code> will transform all named and numeric
        character entities to their corresponding raw UTF-8 equivalents:
@ -723,7 +733,7 @@ by the target encoding, but that would require reimplementing iconv
 with HTML awareness, something I will not do.</p>

 <p>So there: either it's UTF-8 or crippled international support. Your pick! (and I'm
-not being sarcastic here: some people could care less about other languages)</p>
+not being sarcastic here: some people couldn't care less about other languages).</p>

 <h2 id="migrate">Migrate to UTF-8</h2>

@ -985,7 +995,7 @@ and yes, it is variable width. Other traits:</p>
 in different ways. It is beyond the scope of this document to explain
 what precisely these implications are. PHPWact provides
 a very good <a href="http://www.phpwact.org/php/i18n/utf-8">reference document</a>
-on what to expect from each functions, although coverage is spotty in
+on what to expect from each function, although coverage is spotty in
 some areas. Their more general notes on
 <a href="http://www.phpwact.org/php/i18n/charsets">character sets</a>
 are also worth looking at for information on UTF-8. Some rules of thumb