URL
Latest revision as of 09:48, 31 July 2009
Definition
- A URL is an Internet address for a resource
- A Uniform Resource Locator (URL) is a compact string representation for a resource available via the Internet
URLs are just one kind of Uniform Resource Identifier (URI). Formally speaking, the URL specification is obsolete and has been replaced by the URI specification (RFC 3986). In practical terms, however, it is still useful, since it is much easier to understand than the URI specification.
This page is just a short (cut-and-paste) summary of the obsolete RFC 1738 specification.
Formal Syntax
According to the RFC 1738 specification, URLs are written as follows:
<scheme>:<scheme-specific-part>
A URL contains the name of the scheme being used (<scheme>) followed by a colon and then a string (the <scheme-specific-part>) whose interpretation depends on the scheme.
A scheme usually refers to an Internet protocol such as HTTP, Telnet or e-mail. This is why one could also write:
<protocol>:<protocol-specific-part>
Scheme names consist of a sequence of characters. The lower case letters "a"--"z", digits, and the characters plus ("+"), period ("."), and hyphen ("-") are allowed. For resiliency, programs interpreting URLs should treat upper case letters as equivalent to lower case in scheme names (e.g., allow "HTTP" as well as "http").
Each scheme (protocol) further defines specific parts, e.g. see HTTP Scheme below.
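The split into scheme and scheme-specific part can be sketched in a few lines of Python. This is only an illustration of the rules described above, not a full URL parser; the function name split_url and the example addresses are made up for this page.

```python
import re

# Allowed scheme characters as described above:
# lower case letters, digits, "+", "." and "-".
_SCHEME_CHARS = re.compile(r"[a-z0-9+.-]+")


def split_url(url):
    """Split a URL into (scheme, scheme_specific_part).

    Hypothetical helper for illustration only.
    """
    scheme, sep, rest = url.partition(":")
    if not sep:
        raise ValueError("no ':' found, so there is no scheme")
    # Treat upper case as equivalent to lower case ("HTTP" == "http").
    scheme = scheme.lower()
    if not _SCHEME_CHARS.fullmatch(scheme):
        raise ValueError(f"invalid scheme name: {scheme!r}")
    return scheme, rest


print(split_url("HTTP://www.example.org/index.html"))
print(split_url("mailto:someone@example.org"))
```

Only the first colon is treated as the separator, so colons inside the scheme-specific part (for example in front of a port number) are left untouched.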
Unsafe characters
Do not use the following characters unencoded (unless you know what you are doing):
- The space character, because significant spaces may disappear
- "<" and ">", because they are used as the delimiters around URLs in free text
- The quote mark ('"'), because it is used to delimit URLs in some systems
- "#", because it is used on the World Wide Web and in other systems to delimit a URL from a fragment/anchor
- "%", because it is used for encodings of other characters
- The following characters, because some gateways and other transport agents may modify them: "{", "}", "|", "\", "^", "~", "[", "]", and "`"
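If such characters have to appear in a URL, they are normally written in their "%" encoded form. A minimal sketch using Python's standard urllib.parse module follows; the file name is an invented example. Note that this module follows the newer RFC 3986 rules, so it leaves "~" as it is.

```python
from urllib.parse import quote, unquote

# Invented example string containing a space, "<", ">", "#" and "%".
raw = "my report <draft> #1 100%.txt"

# safe="" asks quote() to encode every character except
# letters, digits and "_", ".", "-", "~".
encoded = quote(raw, safe="")
print(encoded)

# Decoding gives back the original string.
print(unquote(encoded) == raw)
```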
Reserved characters
Many URL schemes reserve certain characters for a special meaning, e.g. ";", "/", "?", ":", "@", "=" and "&". If a reserved character is used for any other purpose within such a scheme, it must be encoded.
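As an illustration, the following sketch builds a query string whose values themselves contain "&", "=" and "/". It relies on urlencode and parse_qs from Python's standard urllib.parse module; the parameter names q and path are invented for this example.

```python
from urllib.parse import urlencode, parse_qs

# Invented parameters; the values contain reserved characters.
params = {"q": "Q&A=fun", "path": "a/b"}

# urlencode() encodes reserved characters inside names and values,
# so only the separating "&" and "=" keep their special meaning.
query = urlencode(params)
print(query)

# Parsing the query string recovers the original values.
print(parse_qs(query))
```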
Major Schemes
http        Hypertext Transfer Protocol
ftp         File Transfer Protocol
mailto      Electronic mail address
news        USENET news
nntp        USENET news using NNTP access
telnet      Reference to interactive sessions
file        Host-specific file names
- Past (popular in the early nineties)
prospero    Prospero Directory Service
gopher      The Gopher protocol
wais        Wide Area Information Servers
The HTTP Scheme
An HTTP URL takes the form:
http://<host>:<port>/<path>?<searchpart>
If :<port> is omitted, the port defaults to 80. No user name or password is allowed. <path> is an HTTP selector, and <searchpart> is a query string. The <path> is optional, as is the <searchpart> and its preceding "?". If neither <path> nor <searchpart> is present, the "/" may also be omitted.
Within the <path> and <searchpart> components, "/", ";", "?" are reserved. The "/" character may be used within HTTP to designate a hierarchical structure.
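The HTTP form above can be taken apart with urlsplit from Python's standard urllib.parse module. The sketch below is a rough illustration under the rules stated on this page (default port 80, no user name or password); the helper name parse_http_url and the example host are assumptions, not part of any specification.

```python
from urllib.parse import urlsplit


def parse_http_url(url):
    """Return host, port, path and searchpart of an http URL.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    if parts.scheme != "http":
        raise ValueError("not an http URL")
    if parts.username is not None or parts.password is not None:
        raise ValueError("no user name or password is allowed")
    return {
        "host": parts.hostname,
        # The port defaults to 80 when ":<port>" is omitted.
        "port": parts.port if parts.port is not None else 80,
        # An omitted path is treated here as the root "/".
        "path": parts.path or "/",
        "searchpart": parts.query,
    }


print(parse_http_url("http://www.example.org:8080/docs/a.html?x=1"))
print(parse_http_url("http://www.example.org"))
```

urlsplit already lower-cases the scheme and the host name, so "HTTP://WWW.Example.org" is handled like its lower case form.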
Links
- The Beginners Guide to URLs (Investintech, a page with some good links)
References
- Standards
- Related standards