GLib.Uri¶
record (struct)
The GUri type and related functions can be used to parse URIs into
their components, and build valid URIs from individual components.
Since GUri only represents absolute URIs, all GUris will have a
URI scheme, so Uri.get_scheme will always return a non-NULL
answer. Likewise, by definition, all URIs have a path component, so
Uri.get_path will always return a non-NULL string (which may
be empty).
If the URI string has an
‘authority’ component (that
is, if the scheme is followed by :// rather than just :), then the
GUri will contain a hostname, and possibly a port and ‘userinfo’.
Additionally, depending on how the GUri was constructed/parsed (for example,
using the G_URI_FLAGS_HAS_PASSWORD and G_URI_FLAGS_HAS_AUTH_PARAMS flags),
the userinfo may be split out into a username, password, and
additional authorization-related parameters.
Normally, the components of a GUri will have all %-encoded
characters decoded. However, if you construct/parse a GUri with
G_URI_FLAGS_ENCODED, then the %-encoding will be preserved instead in
the userinfo, path, and query fields (and in the host field if also
created with G_URI_FLAGS_NON_DNS). In particular, this is necessary if
the URI may contain binary data or non-UTF-8 text, or if decoding
the components might change the interpretation of the URI.
For example, with the encoded flag:
g_autoptr(GError) err = NULL;
g_autoptr(GUri) uri = g_uri_parse ("http://host/path?query=http%3A%2F%2Fhost%2Fpath%3Fparam%3Dvalue", G_URI_FLAGS_ENCODED, &err);
g_assert_no_error (err);
g_assert_cmpstr (g_uri_get_query (uri), ==, "query=http%3A%2F%2Fhost%2Fpath%3Fparam%3Dvalue");
While the default %-decoding behaviour would give:
g_autoptr(GError) err = NULL;
g_autoptr(GUri) uri = g_uri_parse ("http://host/path?query=http%3A%2F%2Fhost%2Fpath%3Fparam%3Dvalue", G_URI_FLAGS_NONE, &err);
g_assert_no_error (err);
g_assert_cmpstr (g_uri_get_query (uri), ==, "query=http://host/path?param=value");
During decoding, if an invalid UTF-8 string is encountered, parsing will fail with an error indicating the bad string location:
g_autoptr(GError) err = NULL;
g_autoptr(GUri) uri = g_uri_parse ("http://host/path?query=http%3A%2F%2Fhost%2Fpath%3Fbad%3D%00alue", G_URI_FLAGS_NONE, &err);
g_assert_null (uri);
g_assert_error (err, G_URI_ERROR, G_URI_ERROR_BAD_QUERY);
You should pass G_URI_FLAGS_ENCODED or G_URI_FLAGS_ENCODED_QUERY if you
need to handle that case manually. In particular, if the query string
contains = characters that are %-encoded, you should keep the query encoded
and let Uri.parse_params decode it, so that it is decoded exactly once.
GUri is immutable once constructed, and can safely be accessed from
multiple threads. Its reference counting is atomic.
Note that GUri is intended to help applications manipulate URIs following
RFC 3986. In particular, it is not intended to cover web browser needs, and it
does not implement the WHATWG URL standard. No APIs are provided to help
prevent homograph attacks, so GUri is not suitable for formatting URIs for
display to the user when making security-sensitive decisions.
Relative and absolute URIs¶
As defined in RFC 3986, the hierarchical nature of URIs means that they can either be ‘relative references’ (sometimes referred to as ‘relative URIs’) or ‘URIs’ (for clarity, ‘URIs’ are referred to in this documentation as ‘absolute URIs’ — although in contrast to RFC 3986, fragment identifiers are always allowed).
Relative references have one or more components of the URI missing. In
particular, they have no scheme. Any other component, such as hostname,
query, etc. may be missing, apart from a path, which has to be specified (but
may be empty). The path may be relative, starting with ./ rather than /.
Examples of valid relative references are ./path?query,
/?query#fragment and //example.com.
Absolute URIs have a scheme specified. Any other components of the URI which
are missing are specified as explicitly unset in the URI, rather than being
resolved relative to a base URI using Uri.parse_relative.
Examples of valid absolute URIs are file:///home/bob and
https://search.com?query=string.
A GUri instance is always an absolute URI. A string may be an absolute URI
or a relative reference; see the documentation for individual functions as to
what forms they accept.
Parsing URIs¶
The most minimalist APIs for parsing URIs are Uri.split and
Uri.split_with_user. These split a URI into its component
parts, and return the parts; the difference between the two is that
Uri.split treats the ‘userinfo’ component of the URI as a
single element, while Uri.split_with_user can (depending on the
UriFlags you pass) treat it as containing a username, password,
and authentication parameters. Alternatively, Uri.split_network
can be used when you are only interested in the components that are
needed to initiate a network connection to the service (scheme,
host, and port).
Uri.parse is similar to Uri.split, but instead of
returning individual strings, it returns a GUri structure (and it requires
that the URI be an absolute URI).
Uri.resolve_relative and Uri.parse_relative allow
you to resolve a relative URI relative to a base URI.
Uri.resolve_relative takes two strings and returns a string,
and Uri.parse_relative takes a GUri and a string and returns a
GUri.
All of the parsing functions take a UriFlags argument describing
exactly how to parse the URI; see the documentation for that type
for more details on the specific flags that you can pass. If you
need to choose different flags based on the type of URI, you can
use Uri.peek_scheme on the URI string to check the scheme
first, and use that to decide what flags to parse it with.
For example, you might want to use G_URI_PARAMS_WWW_FORM when parsing the
params for a web URI, so compare the result of Uri.peek_scheme
against http and https.
Building URIs¶
Uri.join and Uri.join_with_user can be used to construct
valid URI strings from a set of component strings. They are the
inverse of Uri.split and Uri.split_with_user.
Similarly, Uri.build and Uri.build_with_user can be
used to construct a GUri from a set of component strings.
As with the parsing functions, the building functions take a
UriFlags argument. In particular, it is important to keep in mind
whether the URI components you are using are already %-encoded. If so,
you must pass the G_URI_FLAGS_ENCODED flag.
file:// URIs¶
Note that Windows and Unix both define special rules for parsing
file:// URIs (involving non-UTF-8 character sets on Unix, and the
interpretation of path separators on Windows). GUri does not
implement these rules. Use filename_from_uri and
filename_to_uri if you want to properly convert between
file:// URIs and local filenames.
URI Equality¶
Note that there is no g_uri_equal () function, because comparing
URIs usefully requires scheme-specific knowledge that GUri does
not have. GUri can help with normalization if you use the various
encoded UriFlags as well as G_URI_FLAGS_SCHEME_NORMALIZE, but this
normalization is not comprehensive.
For example, data:,foo and data:;base64,Zm9v are equivalent according
to the data: URI specification, which GLib does not implement.
Methods¶
get_auth_params¶
Gets uri's authentication parameters, which may contain
%-encoding, depending on the flags with which uri was created.
(If uri was not created with UriFlags.HAS_AUTH_PARAMS then this will
be None.)
Depending on the URI scheme, Uri.parse_params may be useful for
further parsing this information.
get_flags¶
Gets uri's flags set upon construction.
get_fragment¶
Gets uri's fragment, which may contain %-encoding, depending on
the flags with which uri was created.
get_host¶
Gets uri's host. This will never have %-encoded characters,
unless it is non-UTF-8 (which can only be the case if uri was
created with UriFlags.NON_DNS).
If uri contained an IPv6 address literal, this value will be just
that address, without the brackets around it that are necessary in
the string form of the URI. Note that in this case there may also
be a scope ID attached to the address, for example fe80::1234%em1 (or
fe80::1234%25em1 if the string is still encoded).
get_password¶
Gets uri's password, which may contain %-encoding, depending on
the flags with which uri was created. (If uri was not created
with UriFlags.HAS_PASSWORD then this will be None.)
get_path¶
Gets uri's path, which may contain %-encoding, depending on the
flags with which uri was created.
get_port¶
Gets uri's port.
get_query¶
Gets uri's query, which may contain %-encoding, depending on the
flags with which uri was created.
For queries consisting of a series of name=value parameters,
UriParamsIter or Uri.parse_params may be useful.
get_scheme¶
Gets uri's scheme. Note that this will always be all-lowercase,
regardless of the string or strings that uri was created from.
get_user¶
Gets the ‘username’ component of uri's userinfo, which may contain
%-encoding, depending on the flags with which uri was created.
If uri was not created with UriFlags.HAS_PASSWORD or
UriFlags.HAS_AUTH_PARAMS, this is the same as Uri.get_userinfo.
get_userinfo¶
Gets uri's userinfo, which may contain %-encoding, depending on
the flags with which uri was created.
parse_relative¶
Parses uri_ref according to flags and, if it is a
relative URI, resolves it relative to base_uri.
If the result is not a valid absolute URI, it will be discarded, and an error
returned.
Parameters:
uri_ref: a string representing a relative or absolute URI
flags: flags describing how to parse uri_ref
to_string¶
Returns a string representing uri.
This is not guaranteed to return a string which is identical to the
string that uri was parsed from. However, if the source URI was
syntactically correct (according to RFC 3986), and it was parsed
with UriFlags.ENCODED, then Uri.to_string is guaranteed to return
a string which is at least semantically equivalent to the source
URI (according to RFC 3986).
If uri might contain sensitive details, such as authentication parameters,
or private data in its query string, and the returned string is going to be
logged, then consider using Uri.to_string_partial to redact parts.
to_string_partial¶
Returns a string representing uri, subject to the options in
flags. See Uri.to_string and UriHideFlags for more details.
Parameters:
flags: flags describing what parts of uri to hide
Static functions¶
build¶
@staticmethod
def build(flags: UriFlags | int, scheme: str, userinfo: str | None, host: str | None, port: int, path: str, query: str | None = ..., fragment: str | None = ...) -> Uri
Creates a new Uri from the given components according to flags.
See also Uri.build_with_user, which allows specifying the
components of the "userinfo" separately.
Parameters:
flags: flags describing how to build the Uri
scheme: the URI scheme
userinfo: the userinfo component, or None
host: the host component, or None
port: the port, or -1
path: the path component
query: the query component, or None
fragment: the fragment, or None
build_with_user¶
@staticmethod
def build_with_user(flags: UriFlags | int, scheme: str, user: str | None, password: str | None, auth_params: str | None, host: str | None, port: int, path: str, query: str | None = ..., fragment: str | None = ...) -> Uri
Creates a new Uri from the given components according to flags
(UriFlags.HAS_PASSWORD is added unconditionally). The flags must be
coherent with the passed values, in particular use %-encoded values with
UriFlags.ENCODED.
In contrast to Uri.build, this allows specifying the components
of the ‘userinfo’ field separately. Note that user must be non-None
if either password or auth_params is non-None.
Parameters:
flags: flags describing how to build the Uri
scheme: the URI scheme
user: the user component of the userinfo, or None
password: the password component of the userinfo, or None
auth_params: the auth params of the userinfo, or None
host: the host component, or None
port: the port, or -1
path: the path component
query: the query component, or None
fragment: the fragment, or None
error_quark¶
escape_bytes¶
@staticmethod
def escape_bytes(unescaped: list[int], reserved_chars_allowed: str | None = ...) -> str
Escapes arbitrary data for use in a URI.
Normally all characters that are not ‘unreserved’ (i.e. ASCII
alphanumerical characters plus dash, dot, underscore and tilde) are
escaped. But if you specify characters in reserved_chars_allowed
they are not escaped. This is useful for the ‘reserved’ characters
in the URI specification, since those are allowed unescaped in some
portions of a URI.
Though technically incorrect, this will also allow escaping nul
bytes as %00.
Parameters:
unescaped: the unescaped input data.
reserved_chars_allowed: a string of reserved characters that are allowed to be used, or None.
escape_string¶
@staticmethod
def escape_string(unescaped: str, reserved_chars_allowed: str | None, allow_utf8: bool) -> str
Escapes a string for use in a URI.
Normally all characters that are not "unreserved" (i.e. ASCII
alphanumerical characters plus dash, dot, underscore and tilde) are
escaped. But if you specify characters in reserved_chars_allowed
they are not escaped. This is useful for the "reserved" characters
in the URI specification, since those are allowed unescaped in some
portions of a URI.
Parameters:
unescaped: the unescaped input string.
reserved_chars_allowed: a string of reserved characters that are allowed to be used, or None.
allow_utf8: True if the result can include UTF-8 characters.
is_valid¶
Parses uri_string according to flags, to determine whether it is a valid
absolute URI, i.e. it does not need to be resolved
relative to another URI using Uri.parse_relative.
If it’s not a valid URI, an error is returned explaining how it’s invalid.
See Uri.split, and the definition of UriFlags, for more
information on the effect of flags.
Parameters:
uri_string: a string containing an absolute URI
flags: flags for parsing uri_string
join¶
@staticmethod
def join(flags: UriFlags | int, scheme: str | None, userinfo: str | None, host: str | None, port: int, path: str, query: str | None = ..., fragment: str | None = ...) -> str
Joins the given components together according to flags to create
an absolute URI string. path may not be None (though it may be the empty
string).
When host is present, path must either be empty or begin with a slash (/)
character. When host is not present, path cannot begin with two slash
characters (//). See
RFC 3986, section 3.
See also Uri.join_with_user, which allows specifying the
components of the ‘userinfo’ separately.
UriFlags.HAS_PASSWORD and UriFlags.HAS_AUTH_PARAMS are ignored if set
in flags.
Parameters:
flags: flags describing how to build the URI string
scheme: the URI scheme, or None
userinfo: the userinfo component, or None
host: the host component, or None
port: the port, or -1
path: the path component
query: the query component, or None
fragment: the fragment, or None
join_with_user¶
@staticmethod
def join_with_user(flags: UriFlags | int, scheme: str | None, user: str | None, password: str | None, auth_params: str | None, host: str | None, port: int, path: str, query: str | None = ..., fragment: str | None = ...) -> str
Joins the given components together according to flags to create
an absolute URI string. path may not be None (though it may be the empty
string).
In contrast to Uri.join, this allows specifying the components
of the ‘userinfo’ separately. It otherwise behaves the same.
UriFlags.HAS_PASSWORD and UriFlags.HAS_AUTH_PARAMS are ignored if set
in flags.
Parameters:
flags: flags describing how to build the URI string
scheme: the URI scheme, or None
user: the user component of the userinfo, or None
password: the password component of the userinfo, or None
auth_params: the auth params of the userinfo, or None
host: the host component, or None
port: the port, or -1
path: the path component
query: the query component, or None
fragment: the fragment, or None
list_extract_uris¶
Splits a URI list conforming to the text/uri-list MIME type defined in RFC 2483 into individual URIs, discarding any comments. The URIs are not validated.
Parameters:
uri_list: a URI list
parse¶
Parses uri_string according to flags. If the result is not a
valid absolute URI, it will be discarded, and an
error returned.
Parameters:
uri_string: a string representing an absolute URI
flags: flags describing how to parse uri_string
parse_params¶
@staticmethod
def parse_params(params: str, length: int, separators: str, flags: UriParamsFlags | int) -> dict[str, str]
Many URI schemes include one or more attribute/value pairs as part of the URI
value. This method can be used to parse them into a hash table. When an
attribute has multiple occurrences, the last value is the final returned
value. If you need to handle repeated attributes differently, use
UriParamsIter.
The params string is assumed to still be %-encoded, but the returned
values will be fully decoded. (Thus it is possible that the returned values
may contain = or separators, if the value was encoded in the input.)
Invalid %-encoding is treated as with the UriFlags.PARSE_RELAXED
rules for Uri.parse. (However, if params is the path or query string
from a Uri that was parsed without UriFlags.PARSE_RELAXED and
UriFlags.ENCODED, then you already know that it does not contain any
invalid encoding.)
UriParamsFlags.WWW_FORM is handled as documented for UriParamsIter.init.
If UriParamsFlags.CASE_INSENSITIVE is passed to flags, attributes will be
compared case-insensitively, so a params string attr=123&Attr=456 will only
return a single attribute–value pair, Attr=456. Case will be preserved in
the returned attributes.
If params cannot be parsed (for example, it contains two separator
characters in a row), then error is set and None is returned.
Parameters:
params: a %-encoded string containing attribute=value parameters
length: the length of params, or -1 if it is nul-terminated
separators: the separator byte character set between parameters (usually &, but sometimes ; or both &;). Note that this function works on bytes, not characters, so it can't be used to delimit UTF-8 strings for anything but ASCII characters. You may pass an empty set, in which case no splitting will occur.
flags: flags to modify the way the parameters are handled.
parse_scheme¶
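Gets the scheme portion of a URI string. RFC 3986 defines the generic URI syntax, whose first component is the scheme, as:
URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
Common schemes include file, https, svn+ssh, etc.
Parameters:
uri: a valid URI.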
peek_scheme¶
Gets the scheme portion of a URI string. RFC 3986 defines the generic URI syntax, whose first component is the scheme, as:
URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
Common schemes include file, https, svn+ssh, etc.
Unlike Uri.parse_scheme, the returned scheme is normalized to
all-lowercase and does not need to be freed.
Parameters:
uri: a valid URI.
resolve_relative¶
@staticmethod
def resolve_relative(base_uri_string: str | None, uri_ref: str, flags: UriFlags | int) -> str
Parses uri_ref according to flags and, if it is a
relative URI, resolves it relative to
base_uri_string. If the result is not a valid absolute URI, it will be
discarded, and an error returned.
(If base_uri_string is None, this just returns uri_ref, or
None if uri_ref is invalid or not absolute.)
Parameters:
base_uri_string: a string representing a base URI, or None
uri_ref: a string representing a relative or absolute URI
flags: flags describing how to parse uri_ref
split¶
@staticmethod
def split(uri_ref: str, flags: UriFlags | int) -> tuple[bool, str, str, str, int, str, str, str]
Parses uri_ref (which can be an
absolute or relative URI) according to flags, and
returns the pieces. Any component that doesn't appear in uri_ref will be
returned as None (but note that all URIs always have a path component,
though it may be the empty string).
If flags contains UriFlags.ENCODED, then %-encoded characters in
uri_ref will remain encoded in the output strings. (If not,
then all such characters will be decoded.) Note that decoding will
only work if the URI components are ASCII or UTF-8, so you will
need to use UriFlags.ENCODED if they are not.
Note that the UriFlags.HAS_PASSWORD and
UriFlags.HAS_AUTH_PARAMS flags are ignored by Uri.split,
since it always returns only the full userinfo; use
Uri.split_with_user if you want it split up.
Parameters:
uri_ref: a string containing a relative or absolute URI
flags: flags for parsing uri_ref
split_network¶
@staticmethod
def split_network(uri_string: str, flags: UriFlags | int) -> tuple[bool, str, str, int]
Parses uri_string (which must be an absolute URI)
according to flags, and returns the pieces relevant to connecting to a host.
See the documentation for Uri.split for more details; this is
mostly a wrapper around that function with simpler arguments.
However, it will return an error if uri_string is a relative URI,
or does not contain a hostname component.
Parameters:
uri_string: a string containing an absolute URI
flags: flags for parsing uri_string
split_with_user¶
@staticmethod
def split_with_user(uri_ref: str, flags: UriFlags | int) -> tuple[bool, str, str, str, str, str, int, str, str, str]
Parses uri_ref (which can be an
absolute or relative URI) according to flags, and
returns the pieces. Any component that doesn't appear in uri_ref will be
returned as None (but note that all URIs always have a path component,
though it may be the empty string).
See Uri.split, and the definition of UriFlags, for more
information on the effect of flags. Note that password will only
be parsed out if flags contains UriFlags.HAS_PASSWORD, and
auth_params will only be parsed out if flags contains
UriFlags.HAS_AUTH_PARAMS.
Parameters:
uri_ref: a string containing a relative or absolute URI
flags: flags for parsing uri_ref
unescape_bytes¶
@staticmethod
def unescape_bytes(escaped_string: str, length: int, illegal_characters: str | None = ...) -> Bytes
Unescapes a segment of an escaped string as binary data.
Note that in contrast to Uri.unescape_string, this does allow
nul bytes to appear in the output.
If any of the characters in illegal_characters appears as an escaped
character in escaped_string, then that is an error and None will be
returned. This is useful if you want to avoid, for instance, an escaped
slash being expanded in a path element, which might confuse pathname
handling.
Parameters:
escaped_string: a URI-escaped string
length: the length (in bytes) of escaped_string to unescape, or -1 if it is nul-terminated.
illegal_characters: a string of illegal characters not to be allowed, or None.
unescape_segment¶
@staticmethod
def unescape_segment(escaped_string: str | None = ..., escaped_string_end: str | None = ..., illegal_characters: str | None = ...) -> str | None
Unescapes a segment of an escaped string.
If any of the characters in illegal_characters or the NUL
character appears as an escaped character in escaped_string, then
that is an error and None will be returned. This is useful if you
want to avoid, for instance, an escaped slash being expanded in a path
element, which might confuse pathname handling.
Note: NUL bytes are not accepted in the output, in contrast to
Uri.unescape_bytes.
Parameters:
escaped_string: a string, may be None
escaped_string_end: pointer to the end of escaped_string, may be None
illegal_characters: an optional string of illegal characters not to be allowed, may be None
unescape_string¶
@staticmethod
def unescape_string(escaped_string: str, illegal_characters: str | None = ...) -> str | None
Unescapes a whole escaped string.
If any of the characters in illegal_characters or the NUL
character appears as an escaped character in escaped_string, then
that is an error and None will be returned. This is useful if you
want to avoid, for instance, an escaped slash being expanded in a path
element, which might confuse pathname handling.
Parameters:
escaped_string: an escaped string to be unescaped.
illegal_characters: a string of illegal characters not to be allowed, or None.