URL
Revision as of 11:18, 12 February 2007
Definition
- A URL is an Internet address for a resource
- A Uniform Resource Locator (URL) is a compact string representation for a resource available via the Internet
URLs are just one kind of Uniform Resource Identifier (URI)
- This piece is just a short (cut&paste) summary from the specification...
Formal Syntax
According to the RFC 1738 specification, URLs are written as follows:
<scheme>:<scheme-specific-part>
A URL contains the name of the scheme being used (<scheme>) followed by a colon and then a string (the <scheme-specific-part>) whose interpretation depends on the scheme.
A scheme usually refers to an Internet protocol such as HTTP, Telnet or e-mail. This is why one could also write:
<protocol>:<protocol-specific-part>
Scheme names consist of a sequence of characters. The lower case letters "a"--"z", digits, and the characters plus ("+"), period ("."), and hyphen ("-") are allowed. For resiliency, programs interpreting URLs should treat upper case letters as equivalent to lower case in scheme names (e.g., allow "HTTP" as well as "http").
Each scheme (protocol) further defines specific parts, e.g. see HTTP Scheme below.
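The splitting and scheme-name rules above can be sketched in a few lines of Python. This is only an illustration, not part of the specification: the function name split_url and the regular expression are assumptions made for this example.

```python
import re

# Characters allowed in a scheme name according to the text above:
# lower case letters, digits, "+", "." and "-".
_SCHEME_RE = re.compile(r"[a-z0-9+.-]+")

def split_url(url):
    """Split a URL into (scheme, scheme_specific_part) at the first colon."""
    scheme, colon, rest = url.partition(":")
    if not colon:
        raise ValueError("no colon found, so there is no scheme")
    # Treat upper case letters as equivalent to lower case ("HTTP" == "http").
    scheme = scheme.lower()
    if not _SCHEME_RE.fullmatch(scheme):
        raise ValueError(f"invalid scheme name: {scheme!r}")
    return scheme, rest

print(split_url("HTTP://example.org/"))          # ('http', '//example.org/')
print(split_url("mailto:someone@example.org"))   # ('mailto', 'someone@example.org')
```

Only the scheme is examined here; how the rest of the string is interpreted is left to each scheme.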
Unsafe characters
Do not use the following characters (unless you know what you are doing)
- The space character, because significant spaces may disappear
- "<" and ">" are unsafe because they are used as the delimiters around URLs in free text
- the quote mark (""") is used to delimit URLs in some systems
- "#", because it is used in World Wide Web and in other systems to delimit a URL from a fragment/anchor
- "%", because it is used for encodings of other characters.
- The following characters are unsafe because some gateways and other transport agents may modify them: "{", "}", "|", "\", "^", "~", "[", "]", and "`".
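When such characters cannot be avoided, the usual approach is to percent-encode them ("%" followed by two hexadecimal digits). The following Python sketch checks a string against the list above and encodes it with the standard library's urllib.parse.quote. The names UNSAFE, find_unsafe and encode_segment are made up for this example, and quote follows a newer specification than RFC 1738, so it leaves "~" unencoded.

```python
from urllib.parse import quote

# The unsafe characters listed above, collected into a set for lookup.
UNSAFE = set(' <>"#%{}|\\^~[]`')

def find_unsafe(text):
    """Return the unsafe characters found in text, in order of appearance."""
    return [ch for ch in text if ch in UNSAFE]

def encode_segment(text):
    """Percent-encode one URL component (safe="" also encodes "/")."""
    return quote(text, safe="")

name = "my file<1>.txt"
print(find_unsafe(name))     # [' ', '<', '>']
print(encode_segment(name))  # my%20file%3C1%3E.txt
```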
Reserved characters
Many URL schemes reserve certain characters for a special meaning, e.g. ";", "/", "?", ":", "@", "=" and "&"
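A reserved character that is meant as plain data has to be encoded, otherwise it is read as a delimiter. As a small illustration (the parameter names and values are invented for this example), Python's urllib.parse.urlencode encodes "&" and "=" inside values when it builds a query string, and parse_qs recovers the original values:

```python
from urllib.parse import urlencode, parse_qs

# Values that contain the reserved characters "&" and "=".
params = {"title": "Q&A", "expr": "a=b"}

# urlencode keeps "&" and "=" as delimiters and encodes them inside values.
searchpart = urlencode(params)
print(searchpart)  # title=Q%26A&expr=a%3Db

# Decoding splits on the delimiters first and then restores the values.
decoded = parse_qs(searchpart)
print(decoded)     # {'title': ['Q&A'], 'expr': ['a=b']}
```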
Major Schemes
http      Hypertext Transfer Protocol
ftp       File Transfer Protocol
mailto    Electronic mail address
news      USENET news
nntp      USENET news using NNTP access
telnet    Reference to interactive sessions
file      Host-specific file names
- Past (popular in the early nineties)
prospero  Prospero Directory Service
gopher    The Gopher protocol
wais      Wide Area Information Servers
The HTTP Scheme
An HTTP URL takes the form:
http://<host>:<port>/<path>?<searchpart>
If :<port> is omitted, the port defaults to 80. No user name or password is allowed. <path> is an HTTP selector, and <searchpart> is a query string. The <path> is optional, as is the <searchpart> and its preceding "?". If neither <path> nor <searchpart> is present, the "/" may also be omitted.
Within the <path> and <searchpart> components, "/", ";", "?" are reserved. The "/" character may be used within HTTP to designate a hierarchical structure.
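The components of an HTTP URL can be extracted with Python's standard urllib.parse.urlsplit. The sketch below is one possible reading of the rules above, not a reference implementation: the helper name http_parts is an assumption, the default port of 80 is applied by hand, and an omitted path is reported as "/".

```python
from urllib.parse import urlsplit

def http_parts(url):
    """Return (host, port, path, searchpart) for an http URL."""
    parts = urlsplit(url)  # urlsplit lower-cases the scheme
    if parts.scheme != "http":
        raise ValueError("not an http URL")
    port = parts.port if parts.port is not None else 80  # default port
    path = parts.path or "/"                             # path may be omitted
    return parts.hostname, port, path, parts.query

print(http_parts("http://example.org:8080/docs/index.html?q=url"))
# ('example.org', 8080, '/docs/index.html', 'q=url')
print(http_parts("HTTP://example.org"))
# ('example.org', 80, '/', '')
```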
References
- Standards
- Related standards