htmlpurifier/library/HTMLPurifier/URIScheme.php

<?php

/**
 * Validator for the components of a URI for a specific scheme
 */
abstract class HTMLPurifier_URIScheme
{

    /**
     * Scheme's default port (integer). If an explicit port number is
     * specified that coincides with the default port, it will be
     * elided.
     * @type int
     */
    public $default_port = null;

    /**
     * Whether or not URIs of this scheme are locatable by a browser:
     * http and ftp are accessible, while mailto and news are not.
     * @type bool
     */
    public $browsable = false;

    /**
     * Whether or not data transmitted over this scheme is encrypted.
     * https is secure, http is not.
     * @type bool
     */
    public $secure = false;

    /**
     * Whether or not the URI always uses <hier_part>. This resolves edge
     * cases when making relative URIs absolute.
     * @type bool
     */
    public $hierarchical = false;

    /**
     * Whether or not the URI may omit a hostname when the scheme is
     * explicitly specified, as in file:///path/to/file. As of writing,
     * 'file' is the only scheme for which browsers support this properly.
     * @type bool
     */
    public $may_omit_host = false;

    /**
     * Validates the components of a URI for a specific scheme.
     * @param HTMLPurifier_URI $uri Reference to a HTMLPurifier_URI object
     * @param HTMLPurifier_Config $config
     * @param HTMLPurifier_Context $context
     * @return bool success or failure
     */
    abstract public function doValidate(&$uri, $config, $context);

    /**
     * Public interface for validating components of a URI. Elides the
     * default port and resolves or rejects a missing host before
     * delegating to doValidate(). Don't override this method.
     * @param HTMLPurifier_URI $uri Reference to a HTMLPurifier_URI object
     * @param HTMLPurifier_Config $config
     * @param HTMLPurifier_Context $context
     * @return bool success or failure
     */
    public function validate(&$uri, $config, $context)
    {
        if ($this->default_port == $uri->port) {
            $uri->port = null;
        }
        // kludge: browsers do funny things when the scheme but not the
        // authority is set
        // note: && binds tighter than ||, so the grouping is spelled out
        if ((!$this->may_omit_host &&
                // if the scheme is present, a missing host is always in error
                !is_null($uri->scheme) && ($uri->host === '' || is_null($uri->host))) ||
            // if the scheme is not present, a *blank* host is in error,
            // since this translates into '///path' which most browsers
            // interpret as being 'http://path'.
            (is_null($uri->scheme) && $uri->host === '')
        ) {
            do {
                if (is_null($uri->scheme)) {
                    if (substr($uri->path, 0, 2) != '//') {
                        $uri->host = null;
                        break;
                    }
                    // URI is '////path', so we cannot nullify the
                    // host to preserve semantics.  Try expanding the
                    // hostname instead (fall through)
                }
                // first see if we can manually insert a hostname
                $host = $config->get('URI.Host');
                if (!is_null($host)) {
                    $uri->host = $host;
                } else {
                    // we can't do anything sensible, reject the URL.
                    return false;
                }
            } while (false);
        }
        return $this->doValidate($uri, $config, $context);
    }
}
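
/**
 * Illustrative sketch only, not part of the library: a hypothetical
 * concrete scheme showing how a subclass might set the properties
 * above and implement doValidate(). The class name, the port number
 * and the validation policy are assumptions made for this example;
 * the schemes actually shipped with the library may differ.
 */
class HTMLPurifier_URIScheme_Example extends HTMLPurifier_URIScheme
{
    /** @type int Assumed default port, chosen for illustration */
    public $default_port = 8080;

    /** @type bool */
    public $browsable = true;

    /** @type bool */
    public $hierarchical = true;

    /**
     * @param HTMLPurifier_URI $uri
     * @param HTMLPurifier_Config $config
     * @param HTMLPurifier_Context $context
     * @return bool
     */
    public function doValidate(&$uri, $config, $context)
    {
        // By the time this runs, validate() has already elided the
        // default port and either resolved or rejected a missing host.
        // Example policy (an assumption): drop embedded credentials,
        // assuming the URI object exposes a public userinfo property.
        $uri->userinfo = null;
        return true;
    }
}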

// vim: et sw=4 sts=4