0
0
mirror of https://github.com/ezyang/htmlpurifier.git synced 2024-12-22 16:31:53 +00:00
Commit Graph

558 Commits

Author SHA1 Message Date
Edward Z. Yang
0124605918 Fix CSS URL innerHTML/cssText escaping bug.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2011-03-27 21:24:32 +01:00
Edward Z. Yang
afb007d22f Protect against font family innerHTML/cssText attacks.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2011-03-27 20:35:43 +01:00
Edward Z. Yang
0dd9e4faf4 Fix Internet Explorer innerHTML bug.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2011-03-27 11:50:52 +01:00
Edward Z. Yang
94ed3b1231 Implement CSS.AllowedFonts.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2011-03-24 22:54:39 +00:00
Edward Z. Yang
6a6c0ed5d7 Don't autoclose if no parents support the tag.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2011-03-22 00:26:41 +00:00
Edward Z. Yang
ee9c70ab7f Fix E_NOTICE from indexing into empty string.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2011-03-17 17:33:11 +00:00
Edward Z. Yang
b4469f17aa Fix missing numeric entities (shows up when DirectLexing).
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2011-02-27 11:58:37 +00:00
Edward Z. Yang
e76f4b45d0 Dramatically rewrite null host URI handling.
Basically, browsers don't parse what should be valid URIs correctly, so
we have to go through some backbends to accomodate them.  Specifically,
for browseable URIs, the following URIs have unintended behavior:

    - ///example.com
    - http:/example.com
    - http:///example.com

Furthermore, if the path begins with //, modifying these URLs must
be done with care, as if you remove the host-name component, the
parse tree changes.

I've modified the engine to follow correct URI semantics as much
as possible while outputting browser compatible code, and invalidate
the URI in cases where we can't deal.  There has been a refactoring
of URIScheme so that this important check is always performed,
introducing a new member variable allow_empty_host which is true
on data, file, mailto and news schemes.

This also fixes bypass bugs on URI.Munge.

Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2011-01-25 18:56:46 +00:00
Edward Z. Yang
a32d5b52e1 Fix embedding flash on non-IE browsers and allow more wmode.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2011-01-22 12:28:57 +00:00
Maxim Krizhanovsky
a3d71fe606 Iterative traversal of DOM.
There are some deep DOMs you can hit the maximum nesting level
limit in tokenizeDOM (we've experienced this even with maximum nesting
level of 300). Here is an iterative version of the same function with
simple queue/dequeue approach.

Signed-off-by: Maxim Krizhanovsky <darhazer@gmail.com>
2011-01-19 22:06:40 +00:00
Petr Skoda
78c4e62245 Add new Cache.SerializerPermissions option. 2011-01-13 22:57:40 +00:00
Edward Z. Yang
b63569ac22 Fix bad interaction between bootstrap autoloader and Zend Debugger/APC.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-12-31 09:48:28 +00:00
Edward Z. Yang
f3d050c517 Fix two bugs with caching of customized raw definitions.
The first bug is that we will repeatedly write out the result
of a customized raw definition to the filesystem, even when a cache
entry already exists.

The second bug is that caching these definitions doesn't actually
work (the cache entry is written but never used.)  A new API
for retrieving raw definitions permits the user to take advantage
of caching.

Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-12-30 23:51:53 +00:00
Edward Z. Yang
cfc4ee1faf Add initial implementation of CSS.Trusted.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-11-12 18:45:03 +00:00
Edward Z. Yang
598c5b60c9 Add sanity check against ze1_compatibility_mode.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-11-12 16:15:03 +00:00
Edward Z. Yang
feeffe6ed2 Check if schema.ser was corrupted.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-10-29 14:47:40 +01:00
Edward Z. Yang
4754d407aa Fix removal of id with DirectLex by preserving armor.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-10-28 17:25:31 +01:00
Nick Pope
0b9db1f54b Allow non-static autoload methods w/ PHP >= 5.2.11
HTML Purifier loads itself as the first autoload function by
unregistering all existing functions and re-registering them after
registering itself.

Originally an exception was thrown when a non-static object method was
encountered as the behaviour of spl_autoload_functions() did not return
the object instance, but only the class name.  This was filed on PHP
bugs (#44144).

The bug was fixed for PHP >= 5.2.11 and >= 5.3

Signed-off-by: Nick Pope <nick@nickpope.me.uk>
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-10-28 17:25:17 +01:00
Edward Z. Yang
1d4a38d055 Escape CDATA before handling conditional comments.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-09-28 12:11:26 -04:00
Edward Z. Yang
8c80349f9d Implement HTML.Nofollow for external links.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-09-28 12:01:57 -04:00
Edward Z. Yang
d848c99b74 Make IE conditional comment matching ungreedy.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-09-28 10:22:38 -04:00
Edward Z. Yang
882ffed9ba Release 4.2.0.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-09-15 02:52:57 -04:00
Edward Z. Yang
86990a21f1 Rename newline normalization directive to something better.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-09-15 02:50:39 -04:00
Edward Z. Yang
632bf2bbd4 Shift to 4.2.0 release cycle.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-09-14 23:38:51 -04:00
Edward Z. Yang
ec86598446 Add support for file:// URI scheme.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-09-09 00:01:26 -04:00
Edward Z. Yang
7c91104532 Implement HTML.FlashAllowFullScreen.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-09-08 23:39:20 -04:00
Edward Z. Yang
eac628f490 Add %CSS.ForbiddenProperties directive.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-09-04 02:59:03 -04:00
Edward Z. Yang
92913bc816 Add documentation about configuration directive types.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-09-04 02:28:53 -04:00
Edward Z. Yang
479d793562 Reword documentation to be clearer, and give warning on common user error.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-09-04 01:31:20 -04:00
Edward Z. Yang
e2c15f1c98 Fix Mac Snow Leopard APC bug.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-08-26 21:40:58 -07:00
Edward Z. Yang
c04a441b3e Actually make URI.DisableResources do something.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-06-30 05:59:17 -07:00
Edward Z. Yang
1bed8b6d5f Added %Core.RemoveProcessingInstructions.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-06-20 18:26:44 -07:00
Edward Z. Yang
33afd7d9e0 Fix improper handling of IE conditional comments.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-06-18 06:08:54 -07:00
Edward Z. Yang
18e538317a Release 4.1.1.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-05-31 20:17:31 -07:00
Edward Z. Yang
96a4193fc9 Fix undefined index warnings in maintenance scripts.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-05-31 20:07:27 -07:00
Edward Z. Yang
00c66fa9cb Fix bug in parsing single attribute with entities.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-05-31 19:44:18 -07:00
Edward Z. Yang
d3abcb90e3 Rewrite CSS url() and font-family output logic.
The new logic is as follows:

* Given a URL to insert into url(), check that it is properly URL
  encoded (in particular, a doublequote and backslash never occurs
  within it) and then place it as url("http://example.com").

* Given a font name, if it is strictly alphanumeric, it is safe to omit
  quotes. Otherwise, wrap in double quotes and replace '"' with '\22 '
  (note trailing space) and '\' with '\5C ' (ditto).

We introduce expandCSSEscape() which is a hack for common parsing
idioms in CSS; this means that CSS escapes are now recognized inside
URLs as well as unquoted font names.

Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-05-31 18:45:21 -07:00
Edward Z. Yang
df3100b1b3 Make test script less chatty when log_errors is on.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-05-20 21:50:44 -04:00
Edward Z. Yang
143e1ad718 Remove shebang and +x from test script.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-05-20 21:21:26 -04:00
Edward Z. Yang
875b0febde Fix infinite loop involving wrapping formedness.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-05-17 23:22:51 -04:00
Edward Z. Yang
3166b8a10f Fix bug in background-position with center keyword.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-05-05 15:08:57 -04:00
Edward Z. Yang
1a70bffd5a Emit errors when body is extracted.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-05-04 13:41:09 -04:00
Edward Z. Yang
f4c6e10ff7 Release 4.1.0.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-04-26 18:31:40 -04:00
Edward Z. Yang
da94d3d6ac Always quote the contents of url() in CSS.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-04-26 12:10:15 -04:00
Edward Z. Yang
8ef4fb22db Support for flashvars in HTML.SafeEmbed.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-03-30 13:33:13 -04:00
Edward Z. Yang
70a7a3f5dd Handle <ol><ol> properly by adding missing <li> tag.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-03-10 00:58:37 -05:00
Edward Z. Yang
0229458f8f Implement Internet Explorer compatibility code for embedded content.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-03-08 01:56:40 -05:00
Edward Z. Yang
dc90e8e85b Support flashvars.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-03-08 01:16:57 -05:00
Edward Z. Yang
97125ed18b Implement data URI scheme.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-03-07 21:45:39 -05:00
Edward Z. Yang
aea7d02dfe Support YouTube slideshow embedding.
YouTube slideshows contain a /cp/, not a /v/, in their URL;
relax the YouTube filter to allow them.

Signed-off-by: Nigel McNie <nigel@catalyst.net.nz>
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2010-03-07 18:57:22 -05:00
Edward Z. Yang
5b4e5c983e Support proprietary height attribute on table.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2009-08-27 20:17:24 -04:00
Edward Z. Yang
2b72d0445f Add 4.1.0 release NEWS entry.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2009-07-09 21:03:46 -04:00
Edward Z. Yang
53ff3e2744 Release 4.0.0.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2009-07-07 22:41:01 -04:00
Edward Z. Yang
ba9fd175d7 Make extractBody not terminate prematurely on first </body>.
Previously, if two </body> tags were present, HTML Purifier
would truncate everything after the first </body>.  This is
not ideal behavior; so HTML Purifier has been changed to
match up to the last </body>.

Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2009-07-07 22:19:04 -04:00
Edward Z. Yang
4d27906b02 Make %URI.Munge respect %URI.Host (don't munge).
%URI.Munge incorrectly munged URIs that pointed to the
same host as the current website (it did, however, have
the correct behavior for when the munge URL was on the
same server).

Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2009-07-06 22:04:51 -04:00
Edward Z. Yang
c7594487a2 Fix inability to totally override content model.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2009-06-10 18:24:52 -04:00
Edward Z. Yang
6e66dc9cad Add HTMLPurifier_config->serialize()
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2009-05-30 00:25:14 -04:00
Edward Z. Yang
5bf7ac4e9f Add docs and facilities for having separate directories of schemas.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2009-05-29 22:16:35 -04:00
Edward Z. Yang
777781a95c Don't have mute error handler be private.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2009-05-29 17:59:30 -04:00
Edward Z. Yang
84abae08f5 Relax allowed values of class for certain doctypes, see %Attr.ClassUseCDATA
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2009-05-26 01:07:40 -04:00
Edward Z. Yang
10e2d32a79 Lock configuration objects to a single namespace, to help prevent bugs.
* Also, fix a slight bug with URI definition clearing.

Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2009-05-25 23:38:49 -04:00
Edward Z. Yang
baf053b016 Implement %Attr.AllowedClasses and %Attr.ForbiddenClasses.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2009-05-25 22:08:45 -04:00
Edward Z. Yang
bfbe29d5a1 Rename ExtractStyleBlocks configuration parameters.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2009-05-25 21:54:39 -04:00
Edward Z. Yang
e194b8efc6 Rename AutoFormatParam.PurifierLinkifyDocURL.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2009-05-25 21:51:08 -04:00
Edward Z. Yang
e3c2063f69 Implement %AutoFormat.RemoveEmpty.RemoveNbsp, by popular demand.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2009-04-09 00:53:19 -04:00
Edward Z. Yang
398a02039e Implement %HTML.Attr.Name.UseCDATA which relaxes name validation rules.
Sponsored-by: Ian Cook <thinkspill@gmail.com>
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2009-03-20 19:34:38 -04:00
Edward Z. Yang
eaa906f8fc Implement configuration inheritance.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2009-02-21 03:01:02 -05:00
Edward Z. Yang
b107eec452 Revamp configuration backend.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2009-02-21 03:00:33 -05:00
Edward Z. Yang
fcbf724e6e Make name="" and id="" play nicely together.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2009-02-21 02:58:30 -05:00
Edward Z. Yang
92344cc83a Add 4.0.0 release information.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2009-02-16 22:00:22 -05:00
Edward Z. Yang
e9f529e78f Release 3.3.0.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2009-02-16 17:18:31 -05:00
Edward Z. Yang
1d70929eba Add text parameter to unit tests, forces text output.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2009-02-16 17:18:30 -05:00
Edward Z. Yang
77f57aa264 Fix CSSDefinition Printer problems with important decorator.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2009-02-15 14:11:22 -05:00
Edward Z. Yang
db218c7b2b Fix YouTube rendering problem on versions of Firefox.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2009-02-15 14:11:21 -05:00
Edward Z. Yang
b9094d5ec8 Convert HTMLPurifier_Config to use property list backend.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2009-02-02 18:42:23 -05:00
Edward Z. Yang
3dfcd016d3 Fix standards-compliance issue with YouTube filter with double hyphens.
Thanks Pierre Attar for reporting.

Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-12-12 16:27:23 -05:00
Edward Z. Yang
12b811d749 Add vim modelines to all files.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-12-06 04:24:59 -05:00
Edward Z. Yang
781f9a4084 Update PH5P.patch, and add NEWS entry for trailing whitespace purge.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-12-06 02:30:52 -05:00
Edward Z. Yang
2c955af135 Remove trailing whitespace.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-12-06 02:28:20 -05:00
Edward Z. Yang
3a6b63dff1 Generic implementation of property-lists.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-12-06 00:43:42 -05:00
Edward Z. Yang
90110a4e3a Fix broken test-suite in early versions of PHP.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-12-05 15:50:59 -05:00
Edward Z. Yang
5cfecebb33 Fix bug involving whitespace-only nodes. Thanks Eric Wald for reporting.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-12-02 20:13:47 -05:00
Edward Z. Yang
f5cd2c07ea Implement 'overflow' CSS property.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-11-27 16:14:50 -05:00
Edward Z. Yang
6691676666 Fix newline issues in tests.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-11-26 15:30:59 -05:00
Edward Z. Yang
e128c09132 Fix bug with testEncodingSupportsASCII() with strange iconv
implementations.

Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-11-26 15:17:09 -05:00
Edward Z. Yang
6fe6cc8901 Update gitignore with post-release files, new NEWS entry and spellcheck UTF-8.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-11-01 01:51:51 -04:00
Edward Z. Yang
280211f70b Release 3.2.0.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-10-31 16:30:54 -04:00
Edward Z. Yang
0e6e2c4edf Bump descriptions to 3.2.0.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-10-31 12:25:43 -04:00
Edward Z. Yang
3a2fd0b5db Improve floating point scaling in UnitConverter.
When precision dictates that a number be zero padded, we cannot give sprintf()
a negative precision specifier.  This commit implements manual negative precision
printing of floats, taking into account common rounding errors with floating
point numbers.

Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-10-24 12:50:59 -04:00
Edward Z. Yang
25fa53c15b Fix misleading entry in NEWS.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-10-24 12:28:19 -04:00
Edward Z. Yang
d304c5c976 Detect if domxml extension is loaded, and use DirectLex accordingly.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-10-08 17:06:10 -04:00
Edward Z. Yang
f7bc0b0875 Implement %Attr.DefaultImageAlt, allowing overriding default behavior for alt attributes.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-10-06 14:51:03 -04:00
Edward Z. Yang
fa413e96ac Implement Injector->handleEnd, with lots of refactoring for injector.
Previous design of injector streaming involved editability only to start, empty
and text tokens, because they could be safely modified without causing formedness
errors.  By modifying notifyEnd to operate before MakeWellFormed's safeguards
kick into effect, it can be converted into a handle function, allowing for
arbitrary modification of end tags.

This change involved quite a bit of restructuring of the MakeWellFormed code,
including the moving of end of document tags to inside the loop, so rewinding
on those tags would be functional, increased reuse of the end tag codepath by
code that inserts end tags (as they could be changed out from under you), and
processToken modified to have an extra parameter to force re-processing of
a token if the original token was an end token.

We're not exactly sure if handleEnd works at this point, but the important
talking point about this refactoring is that nothing else broke. Also, a number
of convenience functions were moved from AutoParagraph to the Injector
supertype (specifically: forward, forwardToEndToken, backward, and current).

Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-10-01 00:54:51 -04:00
Edward Z. Yang
d0fdcc103e Add support for proprietary "background" attribute in table elements.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-09-27 21:19:35 -04:00
Edward Z. Yang
ed7983b559 Refactor lexer instantiation logic with exceptions and forced line tracking.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-09-05 14:04:23 -04:00
Edward Z. Yang
c6914dce51 Track column numbers in addition to line numbers.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-09-01 14:10:10 -04:00
Edward Z. Yang
9977350143 Fix bug with anonymous module and the SafeObject/SafeEmbed modules.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-08-31 19:06:25 -04:00
Edward Z. Yang
c9b6f125aa Forms implementation for %HTML.Trusted. Some backend changes:
* Added Charsets and Character attribute types
* Fix a heavily recursive form of ContentSets, this allows a content-set
  to include another content-set which includes another content-set, and
  so forth.

Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-08-15 18:57:44 -04:00
Edward Z. Yang
dc28346677 Fix bug where absolute paths with dots/double-dots were not collapsed.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-08-15 13:12:54 -04:00
Edward Z. Yang
617f70a8ac Improve auto-paragraph to preserve newlines and handle edge-cases better.
This is a very large commit that includes numerous improvements to the
AutoParagraph injector.  These are:

* Rewritten flow control of the injector to use almost exclusively
  binary conditionals.
* Improved inline documentation with "State" comments, which give concise
  examples of what the token stack looks like at flow points.
* Documentation for all flow branches, even those with no actions.
* Factoring out of common operations to improve readability, especially the
  new iterator private methods.
* Expanded test-suite which covers new flow points, and corrects some errors
  in previous cases.

Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-08-10 00:32:29 -04:00
Edward Z. Yang
0423985b45 Detect if HTML support in DOM is disabled by checking loadHTML().
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-08-07 18:44:21 -04:00
Edward Z. Yang
e013bc9126 Fix bug involving autoclose and inline elements in strict <blockquote>.
The newest autoclose code uses the elements property in whether or not an
element should be closed by a particular tag.  The heuristic is simple; if
the element doesn't allow that tag as a child, it closes the parent
container.  This doesn't work, however, with <blockquote>, which while not
allowing inline styles under Strict doctypes, requires them to be passed
through MakeWellFormed.

The fix was to transition MakeWellFormed to call a method to retrieve the
elements, and then have StrictBlockquote implement a special version of
this method.  Future versions of HTML Purifier may be more flexible in this
regard--further study of the HTML5 specification is required.

Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-08-01 20:52:06 -04:00
Edward Z. Yang
1d90bb2397 Allow <![CDATA[<body>...</body>]]> not to trigger Core.ConvertDocumentToFragment
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-08-01 19:06:28 -04:00
Edward Z. Yang
03dabec2c0 Fix documentation error in Filter.ExtractStyleBlocks and give better example.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-08-01 18:58:47 -04:00
Edward Z. Yang
85090520f1 Add double-munging protection by checking if the domains are the same.
Previously, if an absolute munge URL location was used, HTML passed through
HTML Purifier multiple times would be munged multiple times. This patch
checks if the output URI has the same URI as the input URI; if they do,
the munge is considered unnecessary and discarded.

Requested-by: Chris <justbittin@gmail.com>
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-07-26 22:45:19 -06:00
Edward Z. Yang
3b6aa10592 %URI.DisableExternal(Resources) uses %URI.Base if %URI.Host is not available.
As part of its duties, URIDefinition determine the base URL and the host URL
of the page based on the two corresponding configuration directives. The
DisableExternal URIFilter, however, bypassed this check by directly checking
%URI.Host. This fix forwards the call through URIDefinition.

Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-07-10 18:46:46 -04:00
Edward Z. Yang
334ffac5b4 Various improvements to test script command line options, i.e. --type
The following changes were made:
* Create --type parameter which accepts 'htmlpurifier', 'phpt', 'vtest', etc.
  in order to execute only that class of tests. This supercedes --only-phpt.
* Create --quick parameter for multitest.php, run only the tips of each
  release series.
* Create --distro parameter for multitest.php, supercedes --exclude-normal
  and --exclude-standalone.

Also, a grep for htmlt tests was added, although add_tests() doesn't do
anything with it yet.

Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-07-07 08:59:29 -04:00
Edward Z. Yang
a227cb483a Allow empty sections in string hashes; previously they were left undefined.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-07-07 08:57:16 -04:00
Edward Z. Yang
aa0fdeee30 Refine Lexers for parsing stray angled brackets; %Core.AggressivelyFixLt = true
By default, the DirectLex and DOMLex behavior with stray angled brackets
varied a great deal due to their implementations. A little known directive
%Core.AggressivelyFixLt attempted to match DOMLex's behavior with DirectLex's,
but it was off by default. By turning it on by default, users now enjoy these
benefits, and performance-minded users can turn it back off.

Also, several refinements to stray angled bracket parsing was made. Specifically:

* DirectLex: Handle each left angled bracket individually, which prevents
  strange behavior as reported by eon.
* DOMLex: Iterate aggressive lt fix, so that stacked brackets like << are
  handled.

Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-07-07 08:52:29 -04:00
Edward Z. Yang
ba418a1f19 Redirect stderr to stdout when calling flush.php
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-07-05 03:15:36 -04:00
Edward Z. Yang
c845f0bb78 Give warnings when attempting to use encoding iconv doesn't support.
Previously, attempting to set %Core.Encoding to an encoding iconv didn't
know about would result in a silent failure, with the return of the
boolean false. Now it will fatally error out.

Reported-by: mcgrailm <mgm19@psu.edu>
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-07-05 03:14:32 -04:00
Edward Z. Yang
594268ca3b Fix two bugs in MakeAbsolute filter involving base URIs that have empty path.
The bugs are:
* Undefined $is_folder variable when path is empty, and
* Improper concatenation of host and path together.

Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-07-05 03:12:44 -04:00
Edward Z. Yang
965be3bd73 Add support for unrecognized elements in MakeWellFormed.
The MakeWellFormed strategy uses metadata from HTMLDefinition in order to
determine whether or not tokens need to be converted or tags need to be
auto-closed. While this functionality is good to have, it is by no means
essential, and MakeWellFormed should not error when this information is not
available.

Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-07-05 03:11:29 -04:00
Edward Z. Yang
700d5bcbfc Implement %AutoFormat.RemoveEmpty, end to start ref, and injector rewind.
Injector rewind: Injectors can now use the method rewind() in order to move
the input index backwards, so that they can reprocess tokens (other injectors
are not affected by a rewind). This functionality was necessary to implement
nested node removals in %AutoFormat.RemoveEmpty.

End to start ref: To facilitate rewinding, HTMLPurifier_Token_End now
maintains a reference called $start to the starting token for their node.

%AutoFormat.RemoveEmpty removes empty nodes. Lots of people have requested
it, so here is a partially effective implementation. Because it is implemented
as an Injector, it's not possible for it to handle newly introduced empty
nodes by later validators, specifically auto-closing and child validation.
The Injector is only meant to be used on HTML-ish languages.

Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-06-27 16:09:14 -04:00
Edward Z. Yang
fd384129bf Proper support for name attribute in <a> and <img>
Prior to this commit, the name attribute was unilaterally removed, except
for Strict doctypes or a heavy TidyLevel, when it was converted to an id
attribute. As name is actually permitted in both HTML 4.01 Strict and
XHTML 1.0 Strict, although deprecated, the more sensible default behavior
is to allow it unless TidyLevel is heavy.

Our implementation is slightly stricter than the specs, as name attributes are
treated as first class IDs, disallowing <a name="foo" id="foo"> or duplicate
names. The former should be treated as a special case, but that will be
a separate commit.

Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-06-27 15:44:27 -04:00
Edward Z. Yang
f8b47c64dd Make Strategy_MakeWellFormed operate in place.
Previously, MakeWellFormed processed tokens and appended them onto an output
array, which was presumably immutable and inaccessible to Injectors. By
having MakeWellFormed operate directly on the input array, the strategy
saves memory and will also allow for a rewind implementation, as a unifying
the two arrays allows Injectors to easily determine an index behind them they'd
like to reset state to.

Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-06-27 01:33:48 -04:00
Edward Z. Yang
dba3ed7770 [3.1.2] Implement comments when %HTML.Trusted is on.
Some implementation notes: not all comments are valid; HTML makes sure
double-hyphens and trailing hyphens are not found in comments. In addition,
two new localizable messages were added.

Requested-by: Waldo Jaquith <waldo@vqronline.org>
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-06-25 23:12:19 -04:00
Edward Z. Yang
24f6db6fb2 [3.1.2] Add %Output.SortAttr to deal with FCKeditor bug
If %Output.SortAttr is true, attributes are sorted to be
in alphabetical order. This was requested by frank farmer.

See also: http://htmlpurifier.org/phorum/read.php?2,1576

Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-06-24 22:36:27 -04:00
Edward Z. Yang
a84b6d5be0 Add new NEWS entries
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1824 48356398-32a2-884e-a903-53898d9a118a
2008-06-21 05:00:17 +00:00
Edward Z. Yang
7015aaff46 Release 3.1.1
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1808 48356398-32a2-884e-a903-53898d9a118a
2008-06-19 21:43:57 +00:00
Edward Z. Yang
511dfe2d4a [3.1.1] Update Munge docs.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1804 48356398-32a2-884e-a903-53898d9a118a
2008-06-19 19:06:55 +00:00
Edward Z. Yang
463aa3a0fa [3.1.1] General munge improvements
- Add CurrentCSSProperty context variable
- Move Munge to its own class, derived off of SecureMunge.
- Rename %URI.SecureMunge to %URI.Munge
- Rename %URI.SecureMungeSecretKey to %URI.MungeSecretKey
- Add extra substitutions for munge

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1803 48356398-32a2-884e-a903-53898d9a118a
2008-06-18 03:29:27 +00:00
Edward Z. Yang
643ed1bddc [3.1.1] Fix text-decoration: none bug
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1799 48356398-32a2-884e-a903-53898d9a118a
2008-06-17 03:12:50 +00:00
Edward Z. Yang
261aa1aeaa Update news, installer, and add an extra specimen.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1796 48356398-32a2-884e-a903-53898d9a118a
2008-06-15 22:13:16 +00:00
Edward Z. Yang
36bd06d53e [3.1.1] Implement SafeEmbed. Also, miscellaneous bugfixes.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1781 48356398-32a2-884e-a903-53898d9a118a
2008-06-10 01:18:03 +00:00
Edward Z. Yang
13eb016e06 [3.1.1] Implement SafeObject.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1780 48356398-32a2-884e-a903-53898d9a118a
2008-06-10 00:13:44 +00:00
Edward Z. Yang
32025a12e1 [3.1.1] Allow injectors to be specified by modules.
- Make method for URI implemented
- Split out checkNeeded in Injector from prepare

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1779 48356398-32a2-884e-a903-53898d9a118a
2008-06-09 01:23:05 +00:00
Edward Z. Yang
8d1f1e8e73 [3.1.1] Improved adherence to Unicode by checking for non-character codepoints. Thanks Geoffrey Sneddon for reporting.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1773 48356398-32a2-884e-a903-53898d9a118a
2008-05-26 21:27:52 +00:00
Edward Z. Yang
322288e6c0 [3.1.1] Implement %URI.SecureMunge and %URI.SecureMungeSecretKey, thanks Chris!
- URIFilter->prepare can return false in order to abort loading of the filter
- Implemented post URI filtering. Set member variable $post to true to set a URIFilter as such.

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1772 48356398-32a2-884e-a903-53898d9a118a
2008-05-26 16:26:47 +00:00
Edward Z. Yang
14d934c7ca [3.1.1] Land vs's HTMLPurifier_Generator patch, and a number of other bugfixes for that change
- Convert a number of calls to use new constructor signature for Generator
- Make generator require configuration; this exposes a number of latent bugs
- Removed generator hack
- Convert Printers to use new optimized ConfigSchema format
- Hack with Printer configuration; pass an array(generator config, render config) to distinguish between output and target.
- HTML/CSS Printers need to be primed, otherwise fatal errors
- Convert a few test-cases to use member properties

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1770 48356398-32a2-884e-a903-53898d9a118a
2008-05-26 04:05:48 +00:00
Edward Z. Yang
bb16d8eae5 [3.1.1] Fix Shift_JIS encoding wonkiness with yen symbols and whatnot
- Improve parseCDATA algorithm to take into account newline normalization
- Fix regression in FontFamily validator. We now have a legit parser in place, albeit somewhat limited in use. Will be superseded by parser for entire grammar
- Convert EncoderTest to new format

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1769 48356398-32a2-884e-a903-53898d9a118a
2008-05-25 05:40:20 +00:00
Edward Z. Yang
10530d7f81 [3.1.1] Fix stray backslashes in font-family.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1768 48356398-32a2-884e-a903-53898d9a118a
2008-05-24 18:19:36 +00:00
Edward Z. Yang
c7e172f660 Update news: not 1/2, more like 1/3
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1767 48356398-32a2-884e-a903-53898d9a118a
2008-05-23 17:15:13 +00:00
Edward Z. Yang
8ab30e24b7 [3.1.1] Memory optimizations for ConfigSchema. Changes include:
- Elimination of ConfigDef and subclasses in favor of stdclass. Most property names stay the same
- Added benchmark script for ConfigSchema
- Types are internally handled as magic integers. Use HTMLPurifier_VarParser->getTypeName to convert to human readable form. HTMLPurifier_VarParser still accepts strings.
- Parser in config schema only used for legacy interface


git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1764 48356398-32a2-884e-a903-53898d9a118a
2008-05-23 16:43:24 +00:00
Edward Z. Yang
eb9f9bc7f6 [3.1.1] Round up imagecrash support with HTML.MaxImgLength
- Add $max to AttrDef/HTML/Pixels.php
- Add %HTML.MaxImgLength
- CSS width/height allows percents when MaxImgLength is disabled


git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1762 48356398-32a2-884e-a903-53898d9a118a
2008-05-23 02:09:43 +00:00
Edward Z. Yang
fcebb7731d [3.1.1] Migrate all HTMLModules to use setup($config) rather than __construct
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1761 48356398-32a2-884e-a903-53898d9a118a
2008-05-22 19:36:59 +00:00
Edward Z. Yang
8d0d0d1a03 [3.1.1] construct() to setup() in HTMLModules
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1760 48356398-32a2-884e-a903-53898d9a118a
2008-05-22 04:34:19 +00:00
Edward Z. Yang
80f59206d7 [3.1.1] Implement percent encoding for URI query and fragment
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1758 48356398-32a2-884e-a903-53898d9a118a
2008-05-21 02:58:41 +00:00
Edward Z. Yang
af3f5190dc [3.1.1] Lazy token updating for HTMLPurifier/AttrValidator.php
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1757 48356398-32a2-884e-a903-53898d9a118a
2008-05-21 02:30:27 +00:00
Edward Z. Yang
5620241165 [3.1.1] Disable percent height/width attributes for img
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1756 48356398-32a2-884e-a903-53898d9a118a
2008-05-21 02:01:25 +00:00
Edward Z. Yang
c06727190e Update NEWS accordingly.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1755 48356398-32a2-884e-a903-53898d9a118a
2008-05-21 01:58:48 +00:00
Edward Z. Yang
1a95852007 [3.1.1] Implement more robust imagecrash protection for CSS width/height.
- Change API for HTMLPurifier_AttrDef_CSS_Length
- Implement HTMLPurifier_AttrDef_Switch class
- Implement HTMLPurifier_Length->compareTo, and make make() accept object instances

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1754 48356398-32a2-884e-a903-53898d9a118a
2008-05-21 01:56:48 +00:00
Edward Z. Yang
64b5581bf2 [3.1.1] Have CSS/Length.php use the new Length class. Also, put onus of non-negative to callee, which would compare $n.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1751 48356398-32a2-884e-a903-53898d9a118a
2008-05-20 23:15:20 +00:00
Edward Z. Yang
16fa73afa0 [3.1.1] Added HTMLPurifier_UnitConverter and HTMLPurifier_Length for convenient handling of CSS-style lengths.
- Fixed another de-underscoring in the SimpleTest library

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1746 48356398-32a2-884e-a903-53898d9a118a
2008-05-20 01:19:00 +00:00
Edward Z. Yang
32a6afa27c Update news.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1744 48356398-32a2-884e-a903-53898d9a118a
2008-05-18 20:12:25 +00:00
Edward Z. Yang
587d642826 Release 3.1.0.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1728 48356398-32a2-884e-a903-53898d9a118a
2008-05-18 05:46:06 +00:00
Edward Z. Yang
0bef016271 [3.1.0] Get testing working again for all versions
- Standalone testing setup properly with autoload
- Bootstrap autoloader deals more robustly with classes that don't exist, preventing class_exists($class, true) from barfing.
- Cleanup $_reporter to $reporter

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1727 48356398-32a2-884e-a903-53898d9a118a
2008-05-16 01:49:33 +00:00
Edward Z. Yang
86b1da9b6f [3.1.0] Fixed bug with fallback languages in LanguageFactory
- Also, reverted bogus Generator changes

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1723 48356398-32a2-884e-a903-53898d9a118a
2008-05-15 23:04:46 +00:00
Edward Z. Yang
cb5d5d0648 [3.1.0] Revamp URI handling of percent encoding and validation.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1709 48356398-32a2-884e-a903-53898d9a118a
2008-05-14 02:19:00 +00:00
Edward Z. Yang
77ce3e8b4a [3.1.0] Extend scanner to catch $this->config; chmod new directories from Serializer. I'm not exactly sure what the implications of the bugfix are, but hopefully it won't blow up.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1708 48356398-32a2-884e-a903-53898d9a118a
2008-05-13 03:17:38 +00:00