mirror of
https://github.com/ezyang/htmlpurifier.git
synced 2025-01-03 05:11:52 +00:00
Update gitignore with post-release files, new NEWS entry and spellcheck UTF-8.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
parent
280211f70b
commit
6fe6cc8901
3
.gitignore
vendored
@@ -3,8 +3,11 @@ test-settings.php
 library/HTMLPurifier/DefinitionCache/Serializer/*/
 library/standalone/
 library/HTMLPurifier.standalone.php
+library/HTMLPurifier*.tgz
+library/package*.xml
 configdoc/*.html
 configdoc/configdoc.xml
+docs/doxygen*
 *.phpt.diff
 *.phpt.exp
 *.phpt.log
2
NEWS
@@ -9,6 +9,8 @@ NEWS ( CHANGELOG and HISTORY ) HTMLPurifier
 . Internal change
 ==========================
 
+3.3.0, unknown release date
+
 3.2.0, released 2008-10-31
 # Using %Core.CollectErrors forces line number/column tracking on, whereas
 previously you could theoretically turn it off.
@@ -481,7 +481,7 @@ if we don't know it's character encoding? And how do we figure out
 the character encoding, if we don't know the contents of the
 <code>META</code> tag?</p>
 
-<p>Fortunantely for us, the characters we need to write the
+<p>Fortunately for us, the characters we need to write the
 <code>META</code> are in ASCII, which is pretty much universal
 over every character encoding that is in common use today. So,
 all the web-browser has to do is parse all the way down until
@@ -526,7 +526,7 @@ you don't have to use those user-unfriendly entities.</p>
 
 <h3 id="whyutf8-user">User-friendly</h3>
 
-<p>Websites encoded in Latin-1 (ISO-8859-1) which ocassionally need
+<p>Websites encoded in Latin-1 (ISO-8859-1) which occasionally need
 a special character outside of their scope often will use a character
 entity reference to achieve the desired effect. For instance, θ can be
 written <code>&theta;</code>, regardless of the character encoding's
@@ -584,7 +584,7 @@ disappeared off the web, so I am linking to the Web Archive copy.)</p>
 <h4 id="whyutf8-forms-urlencoded"><code>application/x-www-form-urlencoded</code></h4>
 
 <p>This is the Content-Type that GET requests must use, and POST requests
-use by default. It involves the ubiquituous percent encoding format that
+use by default. It involves the ubiquitous percent encoding format that
 looks something like: <code>%C3%86</code>. There is no official way of
 determining the character encoding of such a request, since the percent
 encoding operates on a byte level, so it is usually assumed that it
@@ -674,7 +674,7 @@ it up to the module iconv to do the dirty work.</p>
 <p>This approach, however, is not perfect. iconv is blithely unaware
 of HTML character entities. HTML Purifier, in order to
 protect against sophisticated escaping schemes, normalizes all character
-and numeric entitie references before processing the text. This leads to
+and numeric entity references before processing the text. This leads to
 one important ramification:</p>
 
 <p><strong>Any character that is not supported by the target character
@@ -770,7 +770,7 @@ the text when you try to convert it to UTF-8. You'll have to convert
 it to a binary field, convert it to a Shift-JIS field (the real encoding),
 and then finally to UTF-8. Many a website had pages irreversibly mangled
 because they didn't realize that they'd been deluding themselves about
-the character encoding all along, don't become the next victim.</p>
+the character encoding all along; don't become the next victim.</p>
 
 <p>For <a href="http://www.postgresql.org/docs/8.2/static/multibyte.html">PostgreSQL</a>, there appears to be no direct way to change the
 encoding of a database (as of 8.2). You will have to dump the data, and then reimport
@@ -790,7 +790,7 @@ usually supported).</p>
 
 <h4 id="migrate-db-binary">Binary</h4>
 
-<p>Due to the abovementioned compatibility issues, a more interoperable
+<p>Due to the aforementioned compatibility issues, a more interoperable
 way of storing UTF-8 text is to stuff it in a binary datatype.
 <code>CHAR</code> becomes <code>BINARY</code>, <code>VARCHAR</code> becomes
 <code>VARBINARY</code> and <code>TEXT</code> becomes <code>BLOB</code>.
@@ -917,8 +917,8 @@ anyway. So we'll deal with the other two edge cases.</p>
 would like to read your website but get heaps of question marks or
 other meaningless characters. Fixing this problem requires the
 installation of a font or language pack which is often highly
-dependent on what the language is. <a href="http://bn.wikipedia.org/wiki/%E0%A6%89%E0%A6%87%E0%A6%95%E0%A6%BF%E0%A6%AA%E0%A7%87%E0%A6%A1%E0%A6%BF%E0%A6%AF%E0%A6%BC%E0%A6%BE:Bangla_script_display_help">Here is an example</a>
-of such a help file for the Bengali language, I am sure there are
+dependent on what the language is. <a href="http://bn.wikipedia.org/wiki/%E0%A6%89%E0%A6%87%E0%A6%95%E0%A6%BF%E0%A6%AA%E0%A7%87%E0%A6%A1%E0%A6%BF%E0%A6%AF%E0%A6%BC%E0%A6%BE:Bangla_script_display_and_input_help">Here is an example</a>
+of such a help file for the Bengali language; I am sure there are
 others out there too. You just have to point users to the appropriate
 help file.</p>
 
@@ -928,7 +928,7 @@ help file.</p>
 characters embedded in what otherwise would be very bland ASCII are
 letters of the
 <a href="http://en.wikipedia.org/wiki/International_Phonetic_Alphabet">International
-Phonetic Alphabet (IPA)</a>, use to designate pronounciations in a very standard
+Phonetic Alphabet (IPA)</a>, use to designate pronunciations in a very standard
 manner (you probably see them all the time in your dictionary). Your
 average font probably won't have support for all of the IPA characters
 like ʘ (bilabial click) or ʒ (voiced postalveolar fricative).
@@ -941,11 +941,11 @@ most widely used browser in the entire world? Microsoft IE 6
 is not smart enough to borrow from other fonts when a character isn't
 present, so more often than not you'll be slapped with a nice big �.
 To get things to work, MSIE 6 needs a little nudge. You could configure it
-to use a different font to render the text, but you can acheive the same
+to use a different font to render the text, but you can achieve the same
 effect by selectively changing the font for blocks of special characters
 to known good Unicode fonts.</p>
 
-<p>Fortunantely, the folks over at Wikipedia have already done all the
+<p>Fortunately, the folks over at Wikipedia have already done all the
 heavy lifting for you. Get the CSS from the horses mouth here:
 <a href="http://en.wikipedia.org/wiki/MediaWiki:Common.css">Common.css</a>,
 and search for ".IPA" There are also a smattering of
@@ -972,7 +972,7 @@ users.</p>
 <h3 id="migrate-variablewidth">Dealing with variable width in functions</h3>
 
 <p>When people claim that PHP6 will solve all our Unicode problems, they're
-misinformed. It will not fix any of the abovementioned troubles. It will,
+misinformed. It will not fix any of the aforementioned troubles. It will,
 however, fix the problem we are about to discuss: processing UTF-8 text
 in PHP.</p>
 
@@ -1035,7 +1035,7 @@ directory.</p>
 <p>Well, that's it. Hopefully this document has served as a very
 practical springboard into knowledge of how UTF-8 works. You may have
 decided that you don't want to migrate yet: that's fine, just know
-what will happen to your output and what bug reports you may recieve.</p>
+what will happen to your output and what bug reports you may receive.</p>
 
 <p>Many other developers have already discussed the subject of Unicode,
 UTF-8 and internationalization, and I would like to defer to them for