0
0
mirror of https://github.com/ezyang/htmlpurifier.git synced 2024-11-14 01:08:41 +00:00
Commit Graph

48 Commits

Author SHA1 Message Date
Mateusz Turcza
ab2887e423 Fix DOM Lexer for PHP versions older than 5.4 (#225) 2019-08-09 17:01:13 -04:00
msuzuki
8c153eef3a Supported hundreds of nested HTML (#202)
* Supported hundreds of nested HTML (#201)

* Add Core.AllowParseManyTags
2019-07-14 13:15:31 -04:00
Mateusz Turcza
3cb77da11d Make tagName and node data detection hhvm compatible (#170) 2018-03-04 13:22:03 -05:00
Edward Z. Yang
64baeda65c Deal with old libxml incompatibilities.
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2017-12-22 22:03:02 -05:00
Edward Z. Yang
7e11c271b9 Revamp entity decoding to be more like HTML5.
See %Core.LegacyEntityDecoder for more details.

Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2017-03-07 17:34:59 -08:00
Edward Z. Yang
5662efc936 Fix #78.
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2017-03-06 22:54:54 -08:00
Edward Z. Yang
15d1a3003a Don't truncate in DOMLex when seeing closing div
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2014-08-31 08:50:33 +01:00
Edward Z. Yang
412bae13b5 Fix quadratic behavior in DOMLex due to array_shift.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2013-09-17 00:48:42 -07:00
Marcus Bointon
fac747bdbd PSR-2 reformatting PHPDoc corrections
With minor corrections.

Signed-off-by: Marcus Bointon <marcus@synchromedia.co.uk>
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2013-08-17 22:27:26 -04:00
Maxim Krizhanovsky
a3d71fe606 Iterative traversal of DOM.
There are some deep DOMs you can hit the maximum nesting level
limit in tokenizeDOM (we've experienced this even with maximum nesting
level of 300). Here is an iterative version of the same function with
simple queue/dequeue approach.

Signed-off-by: Maxim Krizhanovsky <darhazer@gmail.com>
2011-01-19 22:06:40 +00:00
Edward Z. Yang
86ca784da3 Convert all to new configuration get/set format.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2009-02-21 03:00:34 -05:00
Edward Z. Yang
12b811d749 Add vim modelines to all files.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-12-06 04:24:59 -05:00
Edward Z. Yang
2c955af135 Remove trailing whitespace.
Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-12-06 02:28:20 -05:00
Edward Z. Yang
aa0fdeee30 Refine Lexers for parsing stray angled brackets; %Core.AggressivelyFixLt = true
By default, the DirectLex and DOMLex behavior with stray angled brackets
varied a great deal due to their implementations. A little known directive
%Core.AggressivelyFixLt attempted to match DOMLex's behavior with DirectLex's,
but it was off by default. By turning it on by default, users now enjoy these
benefits, and performance-minded users can turn it back off.

Also, several refinements to stray angled bracket parsing was made. Specifically:

* DirectLex: Handle each left angled bracket individually, which prevents
  strange behavior as reported by eon.
* DOMLex: Iterate aggressive lt fix, so that stacked brackets like << are
  handled.

Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>
2008-07-07 08:52:29 -04:00
Edward Z. Yang
6c9c8f2380 [3.1.0] [BACKPORT] Fix bug with comments in styles, and some associated issues
- Restore printTokens()

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1570 48356398-32a2-884e-a903-53898d9a118a
2008-02-20 00:15:44 +00:00
Edward Z. Yang
5c0a1d467a [3.1.0] [BACKPORT] Fix bug with trusted script handling for versions of libxml 2.6.28 or later
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1553 48356398-32a2-884e-a903-53898d9a118a
2008-02-16 05:44:14 +00:00
Edward Z. Yang
522c8ed7c2 [3.1.0] The bulk of autoload support added
- Add FSTools:globr()
- require_once removed from all files
- HTMLPurifier.autoload.php added to register autoload handler
- Removed redundant chdir in maintenance script
- Modified standalone to use HTMLPurifier.includes.php for including stuff
- Added maintenance script remove-require-once.php which we used once and should never use again

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1516 48356398-32a2-884e-a903-53898d9a118a
2008-01-27 01:54:41 +00:00
Edward Z. Yang
a7fab00cdd [3.0.0] Convert all $context calls away from references
- Update TODO list
- URISchemeRegistry doesn't return a reference for instance anymore, should do the same for other singletons

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1477 48356398-32a2-884e-a903-53898d9a118a
2008-01-05 00:10:43 +00:00
Edward Z. Yang
831f552ec5 [3.0.0] <style> tags can now be extracted from input HTML using %HTML.ExtractStyleBlocks. These contents can be retrieved from $context->get('StyleBlocks');
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1468 48356398-32a2-884e-a903-53898d9a118a
2007-12-12 03:29:12 +00:00
Edward Z. Yang
3ef9bdf8a2 __construct'ify all main library classes.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1459 48356398-32a2-884e-a903-53898d9a118a
2007-11-29 04:29:51 +00:00
Edward Z. Yang
43f01925cd Convert to PHP 5 only codebase, adding visibility modifiers to all members and methods in the main library area (function only for test methods)
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1458 48356398-32a2-884e-a903-53898d9a118a
2007-11-25 02:24:39 +00:00
Edward Z. Yang
cb92a57e4e [2.1.2] Implement experimental HTML5 parsing using PH5P
- Fix debugger so that tokens can be printed without an index
- Fix some broken PEAR unit tests

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1383 48356398-32a2-884e-a903-53898d9a118a
2007-08-19 18:49:35 +00:00
Edward Z. Yang
710820cbe9 [2.1.0] Repair minor PHP4 regression due to undefined configuration directive
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1338 48356398-32a2-884e-a903-53898d9a118a
2007-08-02 01:48:43 +00:00
Edward Z. Yang
a6ede3804e [2.1.0] True emoticon < fix.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1260 48356398-32a2-884e-a903-53898d9a118a
2007-06-27 16:40:18 +00:00
Edward Z. Yang
e99520ab96 Remove trailing ?> in PHP library files, add trailing newlines to all other files.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1253 48356398-32a2-884e-a903-53898d9a118a
2007-06-27 13:58:32 +00:00
Edward Z. Yang
dda4038446 [2.0.1] Reorder definition cache includes
- Update some comments, rename some variables

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1196 48356398-32a2-884e-a903-53898d9a118a
2007-06-21 23:56:19 +00:00
Edward Z. Yang
bf0d659c47 [2.0.1] Improve special case handling for <script>
- DirectLex now honors comments with greater than or less than signs in them
- Comments are transformed into script elements, ending comments are scrapped
- Buggy generator code rewritten to be more error-proof
- AttrValidator checks if token has attributes before processing
- Remove invalid documentation from Scripting
- "Commenting" of script elements switched to the more advanced version

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1189 48356398-32a2-884e-a903-53898d9a118a
2007-06-21 14:44:26 +00:00
Edward Z. Yang
bd44105ca9 [1.7.0] DOMLex will not emit errors when a custom error handler that does not honor error_reporting is used
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@1152 48356398-32a2-884e-a903-53898d9a118a
2007-06-17 20:36:29 +00:00
Edward Z. Yang
01c85b71d2 Fix minor typo.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@699 48356398-32a2-884e-a903-53898d9a118a
2007-01-28 22:19:05 +00:00
Edward Z. Yang
61f852d429 Merge in PHP5 strict changes that are applicable to PHP4.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@650 48356398-32a2-884e-a903-53898d9a118a
2007-01-16 22:22:08 +00:00
Edward Z. Yang
8f515b9cda [1.2.0]
- Partially finished migrating to new Context object (done in r485).
- Created HTMLPurifier_Harness to assist with testing, ChildDefTest migrated to that framework.

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@484 48356398-32a2-884e-a903-53898d9a118a
2006-10-01 20:47:07 +00:00
Edward Z. Yang
6c04bbdac1 [1.1.1]
- Update documentation
- Fix parse error in configuration documentation

git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@444 48356398-32a2-884e-a903-53898d9a118a
2006-09-24 02:06:12 +00:00
Edward Z. Yang
b93892a3b6 [1.1.1] Update documentation and TODO.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@436 48356398-32a2-884e-a903-53898d9a118a
2006-09-17 21:59:40 +00:00
Edward Z. Yang
6de42d8d1d Attempt to fix strange foreach troubles.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@364 48356398-32a2-884e-a903-53898d9a118a
2006-09-01 16:17:56 +00:00
Edward Z. Yang
0e715bdda6 Fix PHP 5.0 bug involving isset and DOM.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@354 48356398-32a2-884e-a903-53898d9a118a
2006-09-01 14:44:50 +00:00
Edward Z. Yang
89376a11e3 Remove a huge swath of duplicated function calls by factoring them into a normalize() function. Also made DirectLex's variable names consistent with the rest of the classes.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@340 48356398-32a2-884e-a903-53898d9a118a
2006-08-29 20:05:26 +00:00
Edward Z. Yang
1de3088276 Refactor encoding and entity specific processing to HTMLPurifier_Encoder. We also need to refactor the escaping to this class too.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@339 48356398-32a2-884e-a903-53898d9a118a
2006-08-29 19:36:40 +00:00
Edward Z. Yang
7588068b7b Hacky full docuement parse thingy removed from DOMLex, fixes barfing on full HTML documents.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@328 48356398-32a2-884e-a903-53898d9a118a
2006-08-27 22:06:58 +00:00
Edward Z. Yang
973cc43b64 Malformed UTF-8 and non-SGML character detection and cleaning implemented
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@303 48356398-32a2-884e-a903-53898d9a118a
2006-08-19 17:53:59 +00:00
Edward Z. Yang
53808ee34a Attempt to fix inconsistent DOM behavior regarding insertion of P tags.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@302 48356398-32a2-884e-a903-53898d9a118a
2006-08-19 16:24:17 +00:00
Edward Z. Yang
5690c9e0a2 Further optimization: 20% - 12%. Also fixed broken benchmarks.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@266 48356398-32a2-884e-a903-53898d9a118a
2006-08-15 21:19:45 +00:00
Edward Z. Yang
acd7ceb940 Major optimization on tokenizeDOM(), reduce execution time from 75% to 20% by passing tokens by reference and using a token factory.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@265 48356398-32a2-884e-a903-53898d9a118a
2006-08-15 20:19:16 +00:00
Edward Z. Yang
9a35dfa6b9 Add support for full document parsing, aka discard everything that's not in-between body if applicable.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@258 48356398-32a2-884e-a903-53898d9a118a
2006-08-15 00:53:24 +00:00
Edward Z. Yang
d7140f2e05 Outfit a bunch of other classes so they can accept a configuration object. Put in basic scaffolding for extractBody() functionality.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@257 48356398-32a2-884e-a903-53898d9a118a
2006-08-15 00:31:12 +00:00
Edward Z. Yang
299236f695 Fix DOM bug where default encoding for HTML docs is not UTF-8.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@252 48356398-32a2-884e-a903-53898d9a118a
2006-08-14 13:27:18 +00:00
Edward Z. Yang
609977f9f5 Add CDATA support to the Lexers, as well as give PEARSax3 entity replacement.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@106 48356398-32a2-884e-a903-53898d9a118a
2006-07-23 23:04:34 +00:00
Edward Z. Yang
bcc2b09ac7 Finish documenting PEARSax3, touch up the other docs. Nuke the original lexer.txt document.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@102 48356398-32a2-884e-a903-53898d9a118a
2006-07-23 18:56:00 +00:00
Edward Z. Yang
1ab3ae160a Move classes into Zend style setup.
git-svn-id: http://htmlpurifier.org/svnroot/htmlpurifier/trunk@88 48356398-32a2-884e-a903-53898d9a118a
2006-07-22 15:38:41 +00:00