HTMLT tests are a compact and easy-to-use way of making assertPurification
type tests. They take the format of:
--INI--
Ns.Directive = "directive value"
--HTML--
Input HTML
--EXPECT--
Expected HTML
Expect more features and migration to be coming soon.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
The following changes were made:
* Create --type parameter which accepts 'htmlpurifier', 'phpt', 'vtest', etc.
in order to execute only that class of tests. This supercedes --only-phpt.
* Create --quick parameter for multitest.php, run only the tips of each
release series.
* Create --distro parameter for multitest.php, supercedes --exclude-normal
and --exclude-standalone.
Also, a grep for htmlt tests was added, although add_tests() doesn't do
anything with it yet.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
By default, the DirectLex and DOMLex behavior with stray angled brackets
varied a great deal due to their implementations. A little known directive
%Core.AggressivelyFixLt attempted to match DOMLex's behavior with DirectLex's,
but it was off by default. By turning it on by default, users now enjoy these
benefits, and performance-minded users can turn it back off.
Also, several refinements to stray angled bracket parsing was made. Specifically:
* DirectLex: Handle each left angled bracket individually, which prevents
strange behavior as reported by eon.
* DOMLex: Iterate aggressive lt fix, so that stacked brackets like << are
handled.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
Previously, attempting to set %Core.Encoding to an encoding iconv didn't
know about would result in a silent failure, with the return of the
boolean false. Now it will fatally error out.
Reported-by: mcgrailm <mgm19@psu.edu>
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
The bugs are:
* Undefined $is_folder variable when path is empty, and
* Improper concatenation of host and path together.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
The MakeWellFormed strategy uses metadata from HTMLDefinition in order to
determine whether or not tokens need to be converted or tags need to be
auto-closed. While this functionality is good to have, it is by no means
essential, and MakeWellFormed should not error when this information is not
available.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
Injector rewind: Injectors can now use the method rewind() in order to move
the input index backwards, so that they can reprocess tokens (other injectors
are not affected by a rewind). This functionality was necessary to implement
nested node removals in %AutoFormat.RemoveEmpty.
End to start ref: To facilitate rewinding, HTMLPurifier_Token_End now
maintains a reference called $start to the starting token for their node.
%AutoFormat.RemoveEmpty removes empty nodes. Lots of people have requested
it, so here is a partially effective implementation. Because it is implemented
as an Injector, it's not possible for it to handle newly introduced empty
nodes by later validators, specifically auto-closing and child validation.
The Injector is only meant to be used on HTML-ish languages.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
Prior to this commit, the name attribute was unilaterally removed, except
for Strict doctypes or a heavy TidyLevel, when it was converted to an id
attribute. As name is actually permitted in both HTML 4.01 Strict and
XHTML 1.0 Strict, although deprecated, the more sensible default behavior
is to allow it unless TidyLevel is heavy.
Our implementation is slightly stricter than the specs, as name attributes are
treated as first class IDs, disallowing <a name="foo" id="foo"> or duplicate
names. The former should be treated as a special case, but that will be
a separate commit.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
Previously, MakeWellFormed processed tokens and appended them onto an output
array, which was presumably immutable and inaccessible to Injectors. By
having MakeWellFormed operate directly on the input array, the strategy
saves memory and will also allow for a rewind implementation, as a unifying
the two arrays allows Injectors to easily determine an index behind them they'd
like to reset state to.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
Some implementation notes: not all comments are valid; HTML makes sure
double-hyphens and trailing hyphens are not found in comments. In addition,
two new localizable messages were added.
Requested-by: Waldo Jaquith <waldo@vqronline.org>
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
If %Output.SortAttr is true, attributes are sorted to be
in alphabetical order. This was requested by frank farmer.
See also: http://htmlpurifier.org/phorum/read.php?2,1576
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
UTF-8 is a variable-width encoding that uses octets, UTF-16
is a variable-width encoding that uses 16-bit words, and
UCS-2 is an obsolete fixed-width encoding that doesn't not
support characters beyond the BMP. Explaining this would be
unwieldly, so we just removed the information.
See also: http://www.reddit.com/info/6mlqc/comments/c04aold
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
* Setup usage.xml to be binary, as XMLWriter does not honor operating
system's newline format.
* Setup various files to ignore (svn:ignore was not carried over)
* Add dummy files to prevent git from ignoring empty directories
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
- Add CurrentCSSProperty context variable
- Move Munge to its own class, derived off of SecureMunge.
- Rename %URI.SecureMunge to %URI.Munge
- Rename %URI.SecureMungeSecretKey to %URI.MungeSecretKey
- Add extra substitutions for munge
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1803 48356398-32a2-884e-a903-53898d9a118a
- URIFilter->prepare can return false in order to abort loading of the filter
- Implemented post URI filtering. Set member variable $post to true to set a URIFilter as such.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1772 48356398-32a2-884e-a903-53898d9a118a
- Convert a number of calls to use new constructor signature for Generator
- Make generator require configuration; this exposes a number of latent bugs
- Removed generator hack
- Convert Printers to use new optimized ConfigSchema format
- Hack with Printer configuration; pass an array(generator config, render config) to distinguish between output and target.
- HTML/CSS Printers need to be primed, otherwise fatal errors
- Convert a few test-cases to use member properties
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1770 48356398-32a2-884e-a903-53898d9a118a
- Improve parseCDATA algorithm to take into account newline normalization
- Fix regression in FontFamily validator. We now have a legit parser in place, albeit somewhat limited in use. Will be superseded by parser for entire grammar
- Convert EncoderTest to new format
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1769 48356398-32a2-884e-a903-53898d9a118a