NEWS ( CHANGELOG and HISTORY ) HTMLPurifier ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| = KEY ==================== # Breaks back-compat ! Feature - Bugfix + Sub-comment . Internal change ========================== 1.7.0, unknown release date # Completely refactored HTMLModuleManager, decentralizing safety information # Transform modules changed to Tidy modules, which offer more flexibility and better modularization . Unit test for ElementDef created, ElementDef behavior modified to be more flexible . Added convenience functions for HTMLModule constructors 1.6.1, released 2007-05-05 ! Support for more deprecated attributes via transformations: + hspace and vspace in img + size and noshade in hr + nowrap in td + clear in br + align in caption, table, img and hr + type in ul, ol and li ! DirectLex now preserves text in which a < bracket is followed by a non-alphanumeric character. This means that certain emoticons are now preserved. ! %Core.RemoveInvalidImg is now operational, when set to false invalid images will hang around with an empty src ! target attribute in a tag supported, use %Attr.AllowedFrameTargets to enable ! CSS property white-space now allows nowrap (supported in all modern browsers) but not others (which have spotty browser implementations) ! XHTML 1.1 mode now sort-of works without any fatal errors, and lang is now moved over to xml:lang. ! Attribute transformation smoketest available at smoketests/attrTransform.php ! Transformation of font's size attribute now handles super-large numbers - Possibly fatal bug with __autoload() fixed in module manager - Invert HTMLModuleManager->addModule() processing order to check prefixes first and then the literal module - Empty strings get converted to empty arrays instead of arrays with an empty string in them. - Merging in attribute lists now works. . Demo script removed: it has been added to the website's repository . Basic.php script modified to work out of the box . Refactor AttrTransform classes to reduce duplication . AttrTransform_TextAlign axed in favor of a more general AttrTransform_EnumToCSS, refer to HTMLModule/TransformToStrict.php to see how the new equivalent is implemented . Unit tests now use exclusively assertIdentical 1.6.0, released 2007-04-01 ! Support for most common deprecated attributes via transformations: + bgcolor in td, th, tr and table + border in img + name in a and img + width in td, th and hr + height in td, th ! Support for CSS attribute 'height' added ! Support for rel and rev attributes in a tags added, use %Attr.AllowedRel and %Attr.AllowedRev to activate - You can define ID blacklists using regular expressions via %Attr.IDBlacklistRegexp - Error messages are emitted when you attempt to "allow" elements or attributes that HTML Purifier does not support - Fix segfault in unit test. The problem is not very reproduceable and I don't know what causes it, but a six line patch fixed it. 1.5.0, released 2007-03-23 ! Added a rudimentary I18N and L10N system modeled off MediaWiki. It doesn't actually do anything yet, but keep your eyes peeled. ! docs/enduser-utf8.html explains how to use UTF-8 and HTML Purifier ! Newly structured HTMLDefinition modeled off of XHTML 1.1 modules. I am loathe to release beta quality APIs, but this is exactly that; don't use the internal interfaces if you're not willing to do migration later on. - Allow 'x' subtag in language codes - Fixed buggy chameleon-support for ins and del . Added support for IDREF attributes (i.e. for) . Renamed HTMLPurifier_AttrDef_Class to HTMLPurifier_AttrDef_Nmtokens . Removed context variable ParentType, replaced with IsInline, which is false when you're not inline and an integer of the parent that caused you to become inline when you are (so possibly zero) . Removed ElementDef->type in favor of ElementDef->descendants_are_inline and HTMLDefinition->content_sets . StrictBlockquote now reports what elements its supposed to allow, rather than what it does allow . Removed HTMLDefinition->info_flow_elements in favor of HTMLDefinition->content_sets['Flow'] . Removed redundant "exclusionary" definitions from DTD roster . StrictBlockquote now requires a construction parameter as if it were an Required ChildDef, this is the "real" set of allowed elements . AttrDef partitioned into HTML, CSS and URI segments . Modify Youtube filter regexp to be multiline . Require both PHP5 and DOM extension in order to use DOMLex, fixes some edge cases where a DOMDocument class exists in a PHP4 environment due to DOM XML extension. 1.4.1, released 2007-01-21 ! docs/enduser-youtube.html updated according to new functionality - YouTube IDs can have underscores and dashes 1.4.0, released 2007-01-21 ! Implemented list-style-image, URIs now allowed in list-style ! Implemented background-image, background-repeat, background-attachment and background-position CSS properties. Shorthand property background supports all of these properties. ! Configuration documentation looks nicer ! Added %Core.EscapeNonASCIICharacters to workaround loss of Unicode characters while %Core.Encoding is set to a non-UTF-8 encoding. ! Support for configuration directive aliases added ! Config object can now be instantiated from ini files ! YouTube preservation code added to the core, with two lines of code you can add it as a filter to your code. See smoketests/preserveYouTube.php for sample code. ! Moved SLOW to docs/enduser-slow.html and added code examples - Replaced version check with functionality check for DOM (thanks Stephen Khoo) . Added smoketest 'all.php', which loads all other smoketests via frames . Implemented AttrDef_CSSURI for url(http://google.com) style declarations . Added convenient single test selector form on test runner 1.3.2, released 2006-12-25 ! HTMLPurifier object now accepts configuration arrays, no need to manually instantiate a configuration object ! Context object now accessible to outside ! Added enduser-youtube.html, explains how to embed YouTube videos. See also corresponding smoketest preserveYouTube.php. ! Added purifyArray(), which takes a list of HTML and purifies it all ! Added static member variable $version to HTML Purifier with PHP-compatible version number string. - Fixed fatal error thrown by upper-cased language attributes - printDefinition.php: added labels, added better clarification . HTMLPurifier_Config::create() added, takes mixed variable and converts into a HTMLPurifier_Config object. 1.3.1, released 2006-12-06 ! Added HTMLPurifier.func.php stub for a convenient function to call the library - Fixed bug in RemoveInvalidImg code that caused all images to be dropped (thanks to .mario for reporting this) . Standardized all attribute handling variables to attr, made it plural 1.3.0, released 2006-11-26 # Invalid images are now removed, rather than replaced with a dud <img src="" alt="Invalid image" />. Previous behavior can be restored with new directive %Core.RemoveInvalidImg set to false. ! (X)HTML Strict now supported + Transparently handles inline elements in block context (blockquote) ! Added GET method to demo for easier validation, added 50kb max input size ! New directive %HTML.BlockWrapper, for block-ifying inline elements ! New directive %HTML.Parent, allows you to only allow inline content ! New directives %HTML.AllowedElements and %HTML.AllowedAttributes to let users narrow the set of allowed tags ! <li value="4"> and <ul start="2"> now allowed in loose mode ! New directives %URI.DisableExternalResources and %URI.DisableResources ! New directive %Attr.DisableURI, which eliminates all hyperlinking ! New directive %URI.Munge, munges URI so you can use some sort of redirector service to avoid PageRank leaks or warn users that they are exiting your site. ! Added spiffy new smoketest printDefinition.php, which lets you twiddle with the configuration settings and see how the internal rules are affected. ! New directive %URI.HostBlacklist for blocking links to bad hosts. xssAttacks.php smoketest updated accordingly. - Added missing type to ChildDef_Chameleon - Remove Tidy option from demo if there is not Tidy available . ChildDef_Required guards against empty tags . Lookup table HTMLDefinition->info_flow_elements added . Added peace-of-mind variable initialization to Strategy_FixNesting . Added HTMLPurifier->info_parent_def, parent child processing made special . Added internal documents briefly summarizing future progression of HTML . HTMLPurifier_Config->getBatch($namespace) added . More lenient casting to bool from string in HTMLPurifier_ConfigSchema . Refactored ChildDef classes into their own files 1.2.0, released 2006-11-19 # ID attributes now disabled by default. New directives: + %HTML.EnableAttrID - restores old behavior by allowing IDs + %Attr.IDPrefix - %Attr.IDBlacklist alternative that munges all user IDs so that they don't collide with your IDs + %Attr.IDPrefixLocal - Same as above, but for when there are multiple instances of user content on the page + Profuse documentation on how to use these available in docs/enduser-id.txt ! Added MODx plugin <http://modxcms.com/forums/index.php/topic,6604.0.html> ! Added percent encoding normalization ! XSS attacks smoketest given facelift ! Configuration documentation now has table of contents ! Added %URI.DisableExternal, which prevents links to external websites. You can also use %URI.Host to permit absolute linking to subdomains ! Non-accessible resources (ex. mailto) blocked from embedded URIs (img src) - Type variable in HTMLDefinition was not being set properly, fixed - Documentation updated + TODO added request Phalanger + TODO added request Native compression + TODO added request Remove redundant tags + TODO added possible plaintext formatter for HTML Purifier documentation + Updated ConfigDoc TODO + Improved inline comments in AttrDef/Class.php, AttrDef/CSS.php and AttrDef/Host.php + Revamped documentation into HTML, along with misc updates - HTMLPurifier_Context doesn't throw a variable reference error if you attempt to retrieve a non-existent variable . Switched to purify()-wide Context object registry . Refactored unit tests to minimize duplication . XSS attack sheet updated . configdoc.xml now has xml:space attached to default value nodes . Allow configuration directives to permit null values . Cleaned up test-cases to remove unnecessary swallowErrors() 1.1.2, released 2006-09-30 ! Add HTMLPurifier.auto.php stub file that configures include_path - Documentation updated + INSTALL document rewritten + TODO added semi-lossy conversion + API Doxygen docs' file exclusions updated + Added notes on HTML versus XML attribute whitespace handling + Noted that HTMLPurifier_ChildDef_Custom isn't being used + Noted that config object's definitions are cached versions - Fixed lack of attribute parsing in HTMLPurifier_Lexer_PEARSax3 - ftp:// URIs now have their typecodes checked - Hooked up HTMLPurifier_ChildDef_Custom's unit tests (they weren't being run) . Line endings standardized throughout project (svn:eol-style standardized) . Refactored parseData() to general Lexer class . Tester named "HTML Purifier" not "HTMLPurifier" 1.1.1, released 2006-09-24 ! Configuration option to optionally Tidy up output for indentation to make up for dropped whitespace by DOMLex (pretty-printing for the entire application should be done by a page-wide Tidy) - Various documentation updates - Fixed parse error in configuration documentation script - Fixed fatal error in benchmark scripts, slightly augmented - As far as possible, whitespace is preserved in-between table children - Sample test-settings.php file included 1.1.0, released 2006-09-16 ! Directive documentation generation using XSLT ! XHTML can now be turned off, output becomes <br> - Made URI validator more forgiving: will ignore leading and trailing quotes, apostrophes and less than or greater than signs. - Enforce alphanumeric namespace and directive names for configuration. - Table child definition made more flexible, will fix up poorly ordered elements . Renamed ConfigDef to ConfigSchema 1.0.1, released 2006-09-04 - Fixed slight bug in DOMLex attribute parsing - Fixed rejection of case-insensitive configuration values when there is a set of allowed values. This manifested in %Core.Encoding. - Fixed rejection of inline style declarations that had lots of extra space in them. This manifested in TinyMCE. 1.0.0, released 2006-09-01 ! Shorthand CSS properties implemented: font, border, background, list-style ! Basic color keywords translated into hexadecimal values ! Table CSS properties implemented ! Support for charsets other than UTF-8 (defined by iconv) ! Malformed UTF-8 and non-SGML character detection and cleaning implemented - Fixed broken numeric entity conversion - API documentation completed . (HTML|CSS)Definition de-singleton-ized 1.0.0beta, released 2006-08-16 ! First public release, most functionality implemented. Notable omissions are: + Shorthand CSS properties + Table CSS properties + Deprecated attribute transformations