<?php

/**
 * Validator for the components of a URI for a specific scheme
 */
Dramatically rewrite null host URI handling.
Basically, browsers don't parse what should be valid URIs correctly, so
we have to go through some backbends to accommodate them. Specifically,
for browseable URIs, the following URIs have unintended behavior:
- ///example.com
- http:/example.com
- http:///example.com
Furthermore, if the path begins with //, modifying these URLs must
be done with care, as if you remove the host-name component, the
parse tree changes.
I've modified the engine to follow correct URI semantics as much
as possible while outputting browser compatible code, and invalidate
the URI in cases where we can't deal. There has been a refactoring
of URIScheme so that this important check is always performed,
introducing a new member variable allow_empty_host which is true
on data, file, mailto and news schemes.
This also fixes bypass bugs on URI.Munge.
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
2011-01-25 18:56:46 +00:00
abstract class HTMLPurifier_URIScheme
{

    /**
     * Scheme's default port (integer). If an explicit port number is
     * specified that coincides with the default port, it will be
     * elided.
     */
    public $default_port = null;

    /**
     * Whether or not URIs of this scheme are locatable by a browser.
     * http and ftp are accessible, while mailto and news are not.
     */
    public $browsable = false;

    /**
     * Whether or not the URI always uses <hier_part>; this resolves edge
     * cases when making relative URIs absolute.
     */
    public $hierarchical = false;

    /**
     * Whether or not the URI may omit a hostname when the scheme is
     * explicitly specified, a la file:///path/to/file. As of writing,
     * 'file' is the only scheme for which browsers support this properly.
     */
    public $may_omit_host = false;

    /**
     * Validates the components of a URI for a specific scheme.
     * @param $uri Reference to a HTMLPurifier_URI object
     * @param $config HTMLPurifier_Config object
     * @param $context HTMLPurifier_Context object
     * @return Bool success or failure
     */
    public abstract function doValidate(&$uri, $config, $context);

    /**
     * Public interface for validating components of a URI. Performs the
     * default actions (default-port elision and the empty-host check)
     * before delegating to doValidate(). Don't override this method.
     * @param $uri Reference to a HTMLPurifier_URI object
     * @param $config HTMLPurifier_Config object
     * @param $context HTMLPurifier_Context object
     * @return Bool success or failure
     */
    public function validate(&$uri, $config, $context) {
        if ($this->default_port == $uri->port) $uri->port = null;
        // kludge: browsers do funny things when the scheme but not the
        // authority is set
        // note: && binds tighter than ||, so may_omit_host guards only
        // the first parenthesized clause below
        if (!$this->may_omit_host &&
            // if the scheme is present, a missing host is always in error
            (!is_null($uri->scheme) && ($uri->host === '' || is_null($uri->host))) ||
            // if the scheme is not present, a *blank* host is in error,
            // since this translates into '///path' which most browsers
            // interpret as being 'http://path'.
            (is_null($uri->scheme) && $uri->host === '')
        ) {
            do {
                if (is_null($uri->scheme)) {
                    if (substr($uri->path, 0, 2) != '//') {
                        $uri->host = null;
                        break;
                    }
                    // URI is '////path', so we cannot nullify the
                    // host to preserve semantics. Try expanding the
                    // hostname instead (fall through)
                }
                // first see if we can manually insert a hostname
                $host = $config->get('URI.Host');
                if (!is_null($host)) {
                    $uri->host = $host;
                } else {
                    // we can't do anything sensible, reject the URL.
                    return false;
                }
            } while (false);
        }
        return $this->doValidate($uri, $config, $context);
    }

}
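
/*
 * Illustrative sketch only, not part of the library: a hypothetical
 * concrete scheme showing how a subclass might set the flags declared
 * above and implement doValidate(). The class name
 * Example_URIScheme_gopher and its validation rule are assumptions made
 * for this example; 70 is the port conventionally used by gopher.
 * Callers would go through validate(), which elides the default port
 * and runs the empty-host check before reaching doValidate().
 */
class Example_URIScheme_gopher extends HTMLPurifier_URIScheme
{
    public $default_port = 70;
    public $browsable = true;
    public $hierarchical = true;

    public function doValidate(&$uri, $config, $context) {
        // assumption for this sketch: drop any user:pass@ component
        $uri->userinfo = null;
        return true;
    }
}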

// vim: et sw=4 sts=4