diff --git a/NEWS b/NEWS
index 8bd8c3b6..ca3e02eb 100644
--- a/NEWS
+++ b/NEWS
@@ -27,6 +27,7 @@ NEWS ( CHANGELOG and HISTORY ) HTMLPurifier
- Hammer out a bunch of edge-case bugs in the standalone distribution
- Inclusion reflection removed from URISchemeRegistry; you must manually
include any new schema files you wish to use
+- Numerous typo fixes in documentation thanks to Brett Zamir
. Unit test refactoring for one logical test per test function
. Config and context parameters in ComplexHarness deprecated: instead, edit
the $config and $context member variables
diff --git a/docs/enduser-customize.html b/docs/enduser-customize.html
index 8e9fe1dd..8634021c 100644
--- a/docs/enduser-customize.html
+++ b/docs/enduser-customize.html
@@ -32,7 +32,7 @@
Before we even write any code, it is paramount to consider whether or
not the code we're writing is necessary or not. HTML Purifier, by default,
contains a large set of elements and attributes: large enough so that
- any element or attribute in XHTML 1.0 (and its HTML variant)
+ any element or attribute in XHTML 1.0 or 1.1 (and its HTML variants)
that can be safely used by the general public is implemented.
@@ -76,11 +76,12 @@
- We have not implemented the
+ As of HTMLPurifier 2.1.0, we have implemented the
Ruby module,
which defines a set of tags
for publishing short annotations for text, used mostly in Japanese
- and Chinese school texts.
+ and Chinese school texts, but applicable for positioning any text (not
+ limited to translations) above or below other corresponding text.
@@ -668,12 +670,22 @@ $def =& $config->getHTMLDefinition(true);
Common is a combination of the above-mentioned collections.
+
+ Readers familiar with the modularization may have noticed that the Core
+ attribute collection differs from that specified by the abstract
+ modules of the XHTML Modularization 1.1. We believe this section
+ to be in error, as br permits the use of the style
+ attribute even though it uses the Core collection, and
+ the DTD and XML Schemas supplied by W3C support our interpretation.
+
+
Attributes
- If you didn't read the previous section on
+ If you didn't read the earlier section on
adding attributes, read it now. The last parameter is simply
- array of attribute names to attribute implementations, in the exact
+ an array of attribute names to attribute implementations, in the exact
same format as addAttribute().
diff --git a/docs/enduser-id.html b/docs/enduser-id.html
index 8321a0a2..051ae7ca 100644
--- a/docs/enduser-id.html
+++ b/docs/enduser-id.html
@@ -58,7 +58,7 @@ appear elsewhere on the document. The method is simple:
$config->set('HTML', 'EnableAttrID', true);
$config->set('Attr', 'IDBlacklist', array(
- 'list', 'of', 'attributes', 'that', 'are', 'forbidden'
+ 'list', 'of', 'attribute', 'values', 'that', 'are', 'forbidden'
));
That being said, there are some notable drawbacks. First of all, you have to
@@ -71,9 +71,9 @@ to possible standards-compliance issues.
Furthermore, this position becomes untenable when a single web page must hold
multiple portions of user-submitted content. Since there's obviously no way
to find out before-hand what IDs users will use, the blacklist is helpless.
-And even since HTML Purifier validates each segment seperately, perhaps doing
+And since HTML Purifier validates each segment separately, perhaps doing
so at different times, it would be extremely difficult to dynamically update
-the blacklist inbetween runs.
+the blacklist in between runs.
Finally, simply destroying the ID is extremely un-userfriendly behavior: after
all, they might have simply specified a duplicate ID by accident.
diff --git a/docs/enduser-tidy.html b/docs/enduser-tidy.html
index b3f79f60..56c9b288 100644
--- a/docs/enduser-tidy.html
+++ b/docs/enduser-tidy.html
@@ -22,7 +22,7 @@ out:
This ain't HTML Tidy!
-Rather, Tidy stands for a cool set of Tidy-inspired in HTML Purifier
+Rather, Tidy stands for a cool set of Tidy-inspired features in HTML Purifier
that allows users to submit deprecated elements and attributes and get
valid strict markup back. For example:
@@ -33,8 +33,8 @@ valid strict markup back. For example:
<div style="text-align:center;">Centered</div>
...when this particular fix is run on the HTML. This tutorial will give
-you down the lowdown of what exactly HTML Purifier will do when Tidy
-is on, and how to fine tune this behavior. Once again, you do
+you the lowdown of what exactly HTML Purifier will do when Tidy
+is on, and how to fine-tune this behavior. Once again, you do
not need Tidy installed on your PHP to use these features!
What does it do?
@@ -221,7 +221,7 @@ general syntax:
The lowdown is, quite frankly, HTML Purifier's default settings are
probably good enough. The next step is to bump the level up to heavy,
-and if that still doesn't satisfy your appetite, do some fine tuning.
+and if that still doesn't satisfy your appetite, do some fine-tuning.
Other than that, don't worry about it: this all works silently and
effectively in the background.
diff --git a/docs/enduser-utf8.html b/docs/enduser-utf8.html
index b8cee57d..9933f1dd 100644
--- a/docs/enduser-utf8.html
+++ b/docs/enduser-utf8.html
@@ -96,7 +96,7 @@ which can be a rewarding (but difficult) task.
Finding the real encoding
In the beginning, there was ASCII, and things were simple. But they
-weren't good, for no one could write in Cryllic or Thai. So there
+weren't good, for no one could write in Cyrillic or Thai. So there
exploded a proliferation of character encodings to remedy the problem
by extending the characters ASCII could express. This ridiculously
simplified version of the history of character encodings shows us that
@@ -138,7 +138,7 @@ browser:
View > Encoding: bulleted item is unofficial name
-Internet Explorer won't give you the mime (i.e. useful/real) name of the
+Internet Explorer won't give you the MIME (i.e. useful/real) name of the
character encoding, so you'll have to look it up using their description.
Some common ones:
@@ -216,6 +216,12 @@ if your META tag claims that either:
Fixing the encoding
+The advice given here is for pages being served as
+vanilla text/html. Different practices must be used
+for application/xml or application/xhtml+xml; see
+W3C's
+document on XHTML media types for more information.
+
If your META encoding and your real encoding match,
savvy! You can skip this section. If they don't...
@@ -302,7 +308,8 @@ languages. The appropriate code is:
...replacing UTF-8 with whatever your embedded encoding is.
This code must come before any output, so be careful about
-stray whitespace in your application.
+stray whitespace in your application (i.e., any whitespace before
+output excluding whitespace within <?php ?> tags).
PHP ini directive
@@ -313,8 +320,8 @@ header call: default
...will also do the trick. If PHP is running as an Apache module (and
not as FastCGI, consult
-phpinfo() for details), you can even use htaccess do apply this property
-globally:
+phpinfo() for details), you can even use htaccess to apply this property
+across many PHP files:
php_value default_charset "UTF-8"
@@ -360,10 +367,11 @@ to send anything at all:
AddDefaultCharset Off
-...making your META tags the sole source of
-character encoding information. In these cases, it is
-especially important to make sure you have valid META
-tags on your pages and all the text before them is ASCII.
+...making your internal charset declaration (usually the META tags)
+the sole source of character encoding
+information. In these cases, it is especially important to make
+sure you have valid META tags on your pages and all the
+text before them is ASCII.
These directives can also be
placed in httpd.conf file for Apache, but
@@ -428,28 +436,30 @@ IIS to change character encodings, I'd be grateful.
META tags are the most common source of embedded
encodings, but they can also come from somewhere else: XML
-processing instructions. They look like:
+Declarations. They look like:
<?xml version="1.0" encoding="UTF-8"?>
...and are most often found in XML documents (including XHTML).
-For XHTML, this processing instruction theoretically
+For XHTML, this XML Declaration theoretically
overrides the META tag. In reality, this happens only when the
XHTML is actually served as legit XML and not HTML, which is almost always
never due to Internet Explorer's lack of support for
application/xhtml+xml (even though doing so is often
-argued to be good practice).
+argued to be good
+practice and is required by the XHTML 1.1 specification).
-For XML, however, this processing instruction is extremely important.
+For XML, however, this XML Declaration is extremely important.
Since most webservers are not configured to send charsets for .xml files,
this is the only thing a parser has to go on. Furthermore, the default
for XML files is UTF-8, which often butts heads with more common
ISO-8859-1 encoding (you see this in garbled RSS feeds).
In short, if you use XHTML and have gone through the
-trouble of adding the XML header, make sure it jives
-with your META tags and HTTP headers.
+trouble of adding the XML Declaration, make sure it jives
+with your META tags (which should only be present
+if served in text/html) and HTTP headers.
Inside the process
@@ -545,7 +555,7 @@ an application that originally used ISO-8859-1 but switched to UTF-8
when it became far to cumbersome to support foreign languages. Bots
will now actually go through articles and convert character entities
to their corresponding real characters for the sake of user-friendliness
-and searcheability. See
+and searchability. See
Meta's
page on special characters for more details.
@@ -609,7 +619,7 @@ since UTF-8 supports every character.
-Multipart form submission takes a way a lot of the ambiguity
+Multipart form submission takes away a lot of the ambiguity
that percent-encoding had: the server now can explicitly ask for
certain encodings, and the client can explicitly tell the server
during the form submission what encoding the fields are in.
@@ -678,7 +688,7 @@ set the encoding correctly using %Core.Encoding):