mirror of
https://github.com/ezyang/htmlpurifier.git
synced 2024-11-09 15:28:40 +00:00
4919187fc6
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1335 48356398-32a2-884e-a903-53898d9a118a
409 lines
20 KiB
Plaintext
409 lines
20 KiB
Plaintext
NEWS ( CHANGELOG and HISTORY ) HTMLPurifier
|
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
|
|
= KEY ====================
|
|
# Breaks back-compat
|
|
! Feature
|
|
- Bugfix
|
|
+ Sub-comment
|
|
. Internal change
|
|
==========================
|
|
|
|
2.1.0, unknown release date
|
|
# flush-htmldefinition-cache.php superseded in favor of a generic
|
|
flush-definition-cache.php script
|
|
! Phorum mod implemented for HTML Purifier
|
|
! With %Core.AggressivelyFixLt, <3 and similar emoticons no longer
|
|
trigger HTML removal in PHP5 (DOMLex). This directive is not necessary
|
|
for PHP4 (DirectLex).
|
|
! Standalone file now available, which greatly reduces the amount of
|
|
includes (although there are still a few files that reside in the
|
|
standalone folder)
|
|
- AutoFormatters emit friendly error messages if tags or attributes they
|
|
need are not allowed
|
|
- ConfigForm's compactification of directive names is now configurable
|
|
- AutoParagraph autoformatter algorithm refined after field-testing
|
|
- XHTML 1.1 now applies XHTML 1.0 Strict cleanup routines, namely
|
|
blockquote wrapping
|
|
- Contents of <style> tags removed by default when tags are removed
|
|
. HTMLPurifier_Config->getSerial() implemented, this is extremely useful
|
|
for output cache invalidation
|
|
. ConfigForm printer now can retrieve CSS and JS files as strings, in
|
|
case HTML Purifier's directory is not publically accessible
|
|
. Introduce new text/itext configuration directive values: these represent
|
|
longer strings that would be more appropriately edited with a textarea
|
|
. Allow newlines to act as separators for lists, hashes, lookups and
|
|
%HTML.Allowed
|
|
. ConfigForm generates textareas instead of text inputs for lists, hashes,
|
|
lookups, text and itext fields
|
|
. Hidden element content removal genericized: %Core.HiddenElements can
|
|
be used to customize this behavior, by default <script> and <style> are
|
|
hidden
|
|
. Added HTMLPURIFIER_PREFIX constant, should be used instead of dirname(__FILE__)
|
|
. Custom ChildDef added to default include list
|
|
. URIScheme reflection improved: will not attempt to include file if class
|
|
already exists. May clobber autoload, so I need to keep an eye on it
|
|
. ConfigSchema heavily optimized, will only collect information and validate
|
|
definitions when HTMLPURIFIER_SCHEMA_STRICT is true.
|
|
. AttrDef_URI unit tests and implementation refactored
|
|
. benchmarks/ directory now protected from public view with .htaccess file;
|
|
run the tests via command line
|
|
. URI scheme is munged off if there is no authority and the scheme is the
|
|
default one
|
|
. All unit tests inherit from HTMLPurifier_Harness, not UnitTestCase
|
|
. Interface for URIScheme changed
|
|
. Generic URI object to hold components of URI added, most systems involved
|
|
in URI validation have been migrated to use it
|
|
. Custom filtering for URIs factored out to URIDefinition interface for
|
|
maximum extensibility
|
|
|
|
2.0.1, released 2007-06-27
|
|
! Tag auto-closing now based on a ChildDef heuristic rather than a
|
|
manually set auto_close array; some behavior may change
|
|
! Experimental AutoFormat functionality added: auto-paragraph and
|
|
linkify your HTML input by setting %AutoFormat.AutoParagraph and
|
|
%AutoFormat.Linkify to true
|
|
! Newlines normalized internally, and then converted back to the
|
|
value of PHP_EOL. If this is not desired, set your newline format
|
|
using %Output.Newline.
|
|
! Beta error collection, messages are implemented for the most generic
|
|
cases involving Lexing or Strategies
|
|
- Clean up special case code for <script> tags
|
|
- Reorder includes for DefinitionCache decorators, fixes a possible
|
|
missing class error
|
|
- Fixed bug where manually modified definitions were not saved via cache
|
|
(mostly harmless, except for the fact that it would be a little slower)
|
|
- Configuration objects with different serials do not clobber each
|
|
others when revision numbers are unequal
|
|
- Improve Serializer DefinitionCache directory permissions checks
|
|
- DefinitionCache no longer throws errors when it encounters old
|
|
serial files that do not conform to the current style
|
|
- Stray xmlns attributes removed from configuration documentation
|
|
- configForm.php smoketest no longer has XSS vulnerability due to
|
|
unescaped print_r output
|
|
- Printer adheres to configuration's directives on output format
|
|
- Fix improperly named form field in ConfigForm printer
|
|
. Rewire some test-cases to swallow errors rather than expect them
|
|
. HTMLDefinition printer updated with some of the new attributes
|
|
. DefinitionCache keys reordered to reflect precedence: version number,
|
|
hash, then revision number
|
|
. %Core.DefinitionCache renamed to %Cache.DefinitionImpl
|
|
. Interlinking in configuration documentation added using
|
|
Injector_PurifierLinkify
|
|
. Directives now keep track of aliases to themselves
|
|
. Error collector now requires a severity to be passed, use PHP's internal
|
|
error constants for this
|
|
. HTMLPurifier_Config::getAllowedDirectivesForForm implemented, allows
|
|
much easier selective embedding of configuration values
|
|
. Doctype objects now accept public and system DTD identifiers
|
|
. %HTML.Doctype is now constrained by specific values, to specify a custom
|
|
doctype use new %HTML.CustomDoctype
|
|
. ConfigForm truncates long directives to keep the form small, and does
|
|
not re-output namespaces
|
|
|
|
2.0.0, released 2007-06-20
|
|
# Completely refactored HTMLModuleManager, decentralizing safety
|
|
information
|
|
# Transform modules changed to Tidy modules, which offer more flexibility
|
|
and better modularization
|
|
# Configuration object now finalizes itself when a read operation is
|
|
performed on it, ensuring that its internal state stays consistent.
|
|
To revert this behavior, you can set the $autoFinalize member variable
|
|
off, but it's not recommended.
|
|
# New compact syntax for AttrDef objects that can be used to instantiate
|
|
new objects via make()
|
|
# Definitions (esp. HTMLDefinition) are now cached for a significant
|
|
performance boost. You can disable caching by setting %Core.DefinitionCache
|
|
to null. You CANNOT edit raw definitions without setting the corresponding
|
|
DefinitionID directive (%HTML.DefinitionID for HTMLDefinition).
|
|
# Contents between <script> tags are now completely removed if <script>
|
|
is not allowed
|
|
# Prototype-declarations for Lexer removed in favor of configuration
|
|
determination of Lexer implementations.
|
|
! HTML Purifier now works in PHP 4.3.2.
|
|
! Configuration form-editing API makes tweaking HTMLPurifier_Config a
|
|
breeze!
|
|
! Configuration directives that accept hashes now allow new string
|
|
format: key1:value1,key2:value2
|
|
! ConfigDoc now factored into OOP design
|
|
! All deprecated elements now natively supported
|
|
! Implement TinyMCE styled whitelist specification format in
|
|
%HTML.Allowed
|
|
! Config object gives more friendly error messages when things go wrong
|
|
! Advanced API implemented: easy functions for creating elements (addElement)
|
|
and attributes (addAttribute) on HTMLDefinition
|
|
! Add native support for required attributes
|
|
- Deprecated and removed EnableRedundantUTF8Cleaning. It didn't even work!
|
|
- DOMLex will not emit errors when a custom error handler that does not
|
|
honor error_reporting is used
|
|
- StrictBlockquote child definition refrains from wrapping whitespace
|
|
in tags now.
|
|
- Bug resulting from tag transforms to non-allowed elements fixed
|
|
- ChildDef_Custom's regex generation has been improved, removing several
|
|
false positives
|
|
. Unit test for ElementDef created, ElementDef behavior modified to
|
|
be more flexible
|
|
. Added convenience functions for HTMLModule constructors
|
|
. AttrTypes now has accessor functions that should be used instead
|
|
of directly manipulating info
|
|
. TagTransform_Center deprecated in favor of generic TagTransform_Simple
|
|
. Add extra protection in AttrDef_URI against phantom Schemes
|
|
. Doctype object added to HTMLDefinition which describes certain aspects
|
|
of the operational document type
|
|
. Lexer is now pre-emptively included, with a conditional include for the
|
|
PHP5 only version.
|
|
. HTMLDefinition and CSSDefinition have a common parent class: Definition.
|
|
. DirectLex can now track line-numbers
|
|
. Preliminary error collector is in place, although no code actually reports
|
|
errors yet
|
|
. Factor out most of ValidateAttributes to new AttrValidator class
|
|
|
|
1.6.1, released 2007-05-05
|
|
! Support for more deprecated attributes via transformations:
|
|
+ hspace and vspace in img
|
|
+ size and noshade in hr
|
|
+ nowrap in td
|
|
+ clear in br
|
|
+ align in caption, table, img and hr
|
|
+ type in ul, ol and li
|
|
! DirectLex now preserves text in which a < bracket is followed by
|
|
a non-alphanumeric character. This means that certain emoticons
|
|
are now preserved.
|
|
! %Core.RemoveInvalidImg is now operational, when set to false invalid
|
|
images will hang around with an empty src
|
|
! target attribute in a tag supported, use %Attr.AllowedFrameTargets
|
|
to enable
|
|
! CSS property white-space now allows nowrap (supported in all modern
|
|
browsers) but not others (which have spotty browser implementations)
|
|
! XHTML 1.1 mode now sort-of works without any fatal errors, and
|
|
lang is now moved over to xml:lang.
|
|
! Attribute transformation smoketest available at smoketests/attrTransform.php
|
|
! Transformation of font's size attribute now handles super-large numbers
|
|
- Possibly fatal bug with __autoload() fixed in module manager
|
|
- Invert HTMLModuleManager->addModule() processing order to check
|
|
prefixes first and then the literal module
|
|
- Empty strings get converted to empty arrays instead of arrays with
|
|
an empty string in them.
|
|
- Merging in attribute lists now works.
|
|
. Demo script removed: it has been added to the website's repository
|
|
. Basic.php script modified to work out of the box
|
|
. Refactor AttrTransform classes to reduce duplication
|
|
. AttrTransform_TextAlign axed in favor of a more general
|
|
AttrTransform_EnumToCSS, refer to HTMLModule/TransformToStrict.php to
|
|
see how the new equivalent is implemented
|
|
. Unit tests now use exclusively assertIdentical
|
|
|
|
1.6.0, released 2007-04-01
|
|
! Support for most common deprecated attributes via transformations:
|
|
+ bgcolor in td, th, tr and table
|
|
+ border in img
|
|
+ name in a and img
|
|
+ width in td, th and hr
|
|
+ height in td, th
|
|
! Support for CSS attribute 'height' added
|
|
! Support for rel and rev attributes in a tags added, use %Attr.AllowedRel
|
|
and %Attr.AllowedRev to activate
|
|
- You can define ID blacklists using regular expressions via
|
|
%Attr.IDBlacklistRegexp
|
|
- Error messages are emitted when you attempt to "allow" elements or
|
|
attributes that HTML Purifier does not support
|
|
- Fix segfault in unit test. The problem is not very reproduceable and
|
|
I don't know what causes it, but a six line patch fixed it.
|
|
|
|
1.5.0, released 2007-03-23
|
|
! Added a rudimentary I18N and L10N system modeled off MediaWiki. It
|
|
doesn't actually do anything yet, but keep your eyes peeled.
|
|
! docs/enduser-utf8.html explains how to use UTF-8 and HTML Purifier
|
|
! Newly structured HTMLDefinition modeled off of XHTML 1.1 modules.
|
|
I am loathe to release beta quality APIs, but this is exactly that;
|
|
don't use the internal interfaces if you're not willing to do migration
|
|
later on.
|
|
- Allow 'x' subtag in language codes
|
|
- Fixed buggy chameleon-support for ins and del
|
|
. Added support for IDREF attributes (i.e. for)
|
|
. Renamed HTMLPurifier_AttrDef_Class to HTMLPurifier_AttrDef_Nmtokens
|
|
. Removed context variable ParentType, replaced with IsInline, which
|
|
is false when you're not inline and an integer of the parent that
|
|
caused you to become inline when you are (so possibly zero)
|
|
. Removed ElementDef->type in favor of ElementDef->descendants_are_inline
|
|
and HTMLDefinition->content_sets
|
|
. StrictBlockquote now reports what elements its supposed to allow,
|
|
rather than what it does allow
|
|
. Removed HTMLDefinition->info_flow_elements in favor of
|
|
HTMLDefinition->content_sets['Flow']
|
|
. Removed redundant "exclusionary" definitions from DTD roster
|
|
. StrictBlockquote now requires a construction parameter as if it
|
|
were an Required ChildDef, this is the "real" set of allowed elements
|
|
. AttrDef partitioned into HTML, CSS and URI segments
|
|
. Modify Youtube filter regexp to be multiline
|
|
. Require both PHP5 and DOM extension in order to use DOMLex, fixes
|
|
some edge cases where a DOMDocument class exists in a PHP4 environment
|
|
due to DOM XML extension.
|
|
|
|
1.4.1, released 2007-01-21
|
|
! docs/enduser-youtube.html updated according to new functionality
|
|
- YouTube IDs can have underscores and dashes
|
|
|
|
1.4.0, released 2007-01-21
|
|
! Implemented list-style-image, URIs now allowed in list-style
|
|
! Implemented background-image, background-repeat, background-attachment
|
|
and background-position CSS properties. Shorthand property background
|
|
supports all of these properties.
|
|
! Configuration documentation looks nicer
|
|
! Added %Core.EscapeNonASCIICharacters to workaround loss of Unicode
|
|
characters while %Core.Encoding is set to a non-UTF-8 encoding.
|
|
! Support for configuration directive aliases added
|
|
! Config object can now be instantiated from ini files
|
|
! YouTube preservation code added to the core, with two lines of code
|
|
you can add it as a filter to your code. See smoketests/preserveYouTube.php
|
|
for sample code.
|
|
! Moved SLOW to docs/enduser-slow.html and added code examples
|
|
- Replaced version check with functionality check for DOM (thanks Stephen
|
|
Khoo)
|
|
. Added smoketest 'all.php', which loads all other smoketests via frames
|
|
. Implemented AttrDef_CSSURI for url(http://google.com) style declarations
|
|
. Added convenient single test selector form on test runner
|
|
|
|
1.3.2, released 2006-12-25
|
|
! HTMLPurifier object now accepts configuration arrays, no need to manually
|
|
instantiate a configuration object
|
|
! Context object now accessible to outside
|
|
! Added enduser-youtube.html, explains how to embed YouTube videos. See
|
|
also corresponding smoketest preserveYouTube.php.
|
|
! Added purifyArray(), which takes a list of HTML and purifies it all
|
|
! Added static member variable $version to HTML Purifier with PHP-compatible
|
|
version number string.
|
|
- Fixed fatal error thrown by upper-cased language attributes
|
|
- printDefinition.php: added labels, added better clarification
|
|
. HTMLPurifier_Config::create() added, takes mixed variable and converts into
|
|
a HTMLPurifier_Config object.
|
|
|
|
1.3.1, released 2006-12-06
|
|
! Added HTMLPurifier.func.php stub for a convenient function to call the library
|
|
- Fixed bug in RemoveInvalidImg code that caused all images to be dropped
|
|
(thanks to .mario for reporting this)
|
|
. Standardized all attribute handling variables to attr, made it plural
|
|
|
|
1.3.0, released 2006-11-26
|
|
# Invalid images are now removed, rather than replaced with a dud
|
|
<img src="" alt="Invalid image" />. Previous behavior can be restored
|
|
with new directive %Core.RemoveInvalidImg set to false.
|
|
! (X)HTML Strict now supported
|
|
+ Transparently handles inline elements in block context (blockquote)
|
|
! Added GET method to demo for easier validation, added 50kb max input size
|
|
! New directive %HTML.BlockWrapper, for block-ifying inline elements
|
|
! New directive %HTML.Parent, allows you to only allow inline content
|
|
! New directives %HTML.AllowedElements and %HTML.AllowedAttributes to let
|
|
users narrow the set of allowed tags
|
|
! <li value="4"> and <ul start="2"> now allowed in loose mode
|
|
! New directives %URI.DisableExternalResources and %URI.DisableResources
|
|
! New directive %Attr.DisableURI, which eliminates all hyperlinking
|
|
! New directive %URI.Munge, munges URI so you can use some sort of redirector
|
|
service to avoid PageRank leaks or warn users that they are exiting your site.
|
|
! Added spiffy new smoketest printDefinition.php, which lets you twiddle with
|
|
the configuration settings and see how the internal rules are affected.
|
|
! New directive %URI.HostBlacklist for blocking links to bad hosts.
|
|
xssAttacks.php smoketest updated accordingly.
|
|
- Added missing type to ChildDef_Chameleon
|
|
- Remove Tidy option from demo if there is not Tidy available
|
|
. ChildDef_Required guards against empty tags
|
|
. Lookup table HTMLDefinition->info_flow_elements added
|
|
. Added peace-of-mind variable initialization to Strategy_FixNesting
|
|
. Added HTMLPurifier->info_parent_def, parent child processing made special
|
|
. Added internal documents briefly summarizing future progression of HTML
|
|
. HTMLPurifier_Config->getBatch($namespace) added
|
|
. More lenient casting to bool from string in HTMLPurifier_ConfigSchema
|
|
. Refactored ChildDef classes into their own files
|
|
|
|
1.2.0, released 2006-11-19
|
|
# ID attributes now disabled by default. New directives:
|
|
+ %HTML.EnableAttrID - restores old behavior by allowing IDs
|
|
+ %Attr.IDPrefix - %Attr.IDBlacklist alternative that munges all user IDs
|
|
so that they don't collide with your IDs
|
|
+ %Attr.IDPrefixLocal - Same as above, but for when there are multiple
|
|
instances of user content on the page
|
|
+ Profuse documentation on how to use these available in docs/enduser-id.txt
|
|
! Added MODx plugin <http://modxcms.com/forums/index.php/topic,6604.0.html>
|
|
! Added percent encoding normalization
|
|
! XSS attacks smoketest given facelift
|
|
! Configuration documentation now has table of contents
|
|
! Added %URI.DisableExternal, which prevents links to external websites. You
|
|
can also use %URI.Host to permit absolute linking to subdomains
|
|
! Non-accessible resources (ex. mailto) blocked from embedded URIs (img src)
|
|
- Type variable in HTMLDefinition was not being set properly, fixed
|
|
- Documentation updated
|
|
+ TODO added request Phalanger
|
|
+ TODO added request Native compression
|
|
+ TODO added request Remove redundant tags
|
|
+ TODO added possible plaintext formatter for HTML Purifier documentation
|
|
+ Updated ConfigDoc TODO
|
|
+ Improved inline comments in AttrDef/Class.php, AttrDef/CSS.php
|
|
and AttrDef/Host.php
|
|
+ Revamped documentation into HTML, along with misc updates
|
|
- HTMLPurifier_Context doesn't throw a variable reference error if you attempt
|
|
to retrieve a non-existent variable
|
|
. Switched to purify()-wide Context object registry
|
|
. Refactored unit tests to minimize duplication
|
|
. XSS attack sheet updated
|
|
. configdoc.xml now has xml:space attached to default value nodes
|
|
. Allow configuration directives to permit null values
|
|
. Cleaned up test-cases to remove unnecessary swallowErrors()
|
|
|
|
1.1.2, released 2006-09-30
|
|
! Add HTMLPurifier.auto.php stub file that configures include_path
|
|
- Documentation updated
|
|
+ INSTALL document rewritten
|
|
+ TODO added semi-lossy conversion
|
|
+ API Doxygen docs' file exclusions updated
|
|
+ Added notes on HTML versus XML attribute whitespace handling
|
|
+ Noted that HTMLPurifier_ChildDef_Custom isn't being used
|
|
+ Noted that config object's definitions are cached versions
|
|
- Fixed lack of attribute parsing in HTMLPurifier_Lexer_PEARSax3
|
|
- ftp:// URIs now have their typecodes checked
|
|
- Hooked up HTMLPurifier_ChildDef_Custom's unit tests (they weren't being run)
|
|
. Line endings standardized throughout project (svn:eol-style standardized)
|
|
. Refactored parseData() to general Lexer class
|
|
. Tester named "HTML Purifier" not "HTMLPurifier"
|
|
|
|
1.1.1, released 2006-09-24
|
|
! Configuration option to optionally Tidy up output for indentation to make up
|
|
for dropped whitespace by DOMLex (pretty-printing for the entire application
|
|
should be done by a page-wide Tidy)
|
|
- Various documentation updates
|
|
- Fixed parse error in configuration documentation script
|
|
- Fixed fatal error in benchmark scripts, slightly augmented
|
|
- As far as possible, whitespace is preserved in-between table children
|
|
- Sample test-settings.php file included
|
|
|
|
1.1.0, released 2006-09-16
|
|
! Directive documentation generation using XSLT
|
|
! XHTML can now be turned off, output becomes <br>
|
|
- Made URI validator more forgiving: will ignore leading and trailing
|
|
quotes, apostrophes and less than or greater than signs.
|
|
- Enforce alphanumeric namespace and directive names for configuration.
|
|
- Table child definition made more flexible, will fix up poorly ordered elements
|
|
. Renamed ConfigDef to ConfigSchema
|
|
|
|
1.0.1, released 2006-09-04
|
|
- Fixed slight bug in DOMLex attribute parsing
|
|
- Fixed rejection of case-insensitive configuration values when there is a
|
|
set of allowed values. This manifested in %Core.Encoding.
|
|
- Fixed rejection of inline style declarations that had lots of extra
|
|
space in them. This manifested in TinyMCE.
|
|
|
|
1.0.0, released 2006-09-01
|
|
! Shorthand CSS properties implemented: font, border, background, list-style
|
|
! Basic color keywords translated into hexadecimal values
|
|
! Table CSS properties implemented
|
|
! Support for charsets other than UTF-8 (defined by iconv)
|
|
! Malformed UTF-8 and non-SGML character detection and cleaning implemented
|
|
- Fixed broken numeric entity conversion
|
|
- API documentation completed
|
|
. (HTML|CSS)Definition de-singleton-ized
|
|
|
|
1.0.0beta, released 2006-08-16
|
|
! First public release, most functionality implemented. Notable omissions are:
|
|
+ Shorthand CSS properties
|
|
+ Table CSS properties
|
|
+ Deprecated attribute transformations
|