URL
Latest revision as of 09:48, 31 July 2009
Definition
- A URL is an Internet address for a resource
- A Uniform Resource Locator (URL) is a compact string representation for a resource available via the Internet
URLs are one kind of Uniform Resource Identifier (URI). Formally speaking, the URL specification is obsolete and has been replaced by the URI specification (RFC 3986). In practical terms, however, it is still useful, since it is much easier to understand than the URI specifications.
This page is a short summary of the obsolete RFC 1738 specification, largely cut and pasted from it.
Formal Syntax
According to RFC 1738, URLs are written as follows:
<scheme>:<scheme-specific-part>
A URL contains the name of the scheme being used (<scheme>), followed by a colon and then a string (the <scheme-specific-part>) whose interpretation depends on the scheme.
A scheme refers to an Internet protocol such as HTTP, Telnet or e-mail. This is why one could also write:
<protocol>:<protocol-specific-part>
Scheme names consist of a sequence of characters. The lowercase letters "a" to "z", digits, and the characters plus ("+"), period (".") and hyphen ("-") are allowed. For resiliency, programs interpreting URLs should treat uppercase letters as equivalent to lowercase in scheme names (e.g., allow "HTTP" as well as "http").
Each scheme (protocol) defines the structure of its own scheme-specific part; see, for example, the HTTP scheme below.
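As a rough illustration of the rules above, the following Python sketch splits a URL at the first colon, folds the scheme name to lowercase and checks it against the RFC 1738 character set. The function name `split_url` is hypothetical, and the check follows only the character rules quoted here (the newer URI specification additionally requires a scheme name to start with a letter).

```python
import re

# RFC 1738 scheme characters: lowercase letters, digits, "+", "." and "-"
_SCHEME_RE = re.compile(r"[a-z0-9+.\-]+")

def split_url(url: str) -> tuple[str, str]:
    """Return (scheme, scheme-specific part); hypothetical helper for illustration."""
    scheme, sep, rest = url.partition(":")
    if not sep:
        raise ValueError("missing ':' after the scheme name")
    # Treat uppercase as equivalent to lowercase, as the text recommends
    scheme = scheme.lower()
    if not _SCHEME_RE.fullmatch(scheme):
        raise ValueError(f"invalid scheme name: {scheme!r}")
    return scheme, rest
```

With this split, `split_url("HTTP://www.example.com/")` gives `("http", "//www.example.com/")`; how the second element is interpreted is up to the scheme.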
Unsafe characters
Do not use the following characters unless you know what you are doing:
- The space character, because significant spaces may disappear and insignificant spaces may be introduced when URLs are transcribed or typeset
- "<" and ">", because they are used as the delimiters around URLs in free text
- The double quote mark ("), because it is used to delimit URLs in some systems
- "#", because it is used on the World Wide Web and in other systems to separate a URL from a fragment (anchor) identifier
- "%", because it is used to encode other characters
- The following characters, because some gateways and other transport agents are known to modify them: "{", "}", "|", "\", "^", "~", "[", "]" and "`"
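Because "%" is the escape character, an unsafe character that has to appear in a URL is written as "%" followed by the two hexadecimal digits of its character code. A minimal Python sketch, assuming plain ASCII input; the helper name `escape_unsafe` is hypothetical, and it covers only the characters listed above (RFC 1738 also requires control characters and non-ASCII octets to be encoded). In practice, Python's standard `urllib.parse.quote` does this job, following the newer RFC 3986 rules.

```python
# Characters listed as unsafe in RFC 1738 (the list above)
UNSAFE_CHARS = frozenset(' <>"#%{}|\\^~[]`')

def escape_unsafe(text: str) -> str:
    """Replace each unsafe character with "%" plus its two-digit hex code.

    Assumes ASCII input; control and non-ASCII characters are left untouched.
    """
    return "".join(f"%{ord(ch):02X}" if ch in UNSAFE_CHARS else ch for ch in text)

print(escape_unsafe("my file#1.txt"))  # my%20file%231.txt
```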
Reserved characters
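Many URL schemes reserve certain characters for a special meaning, e.g. ";", "/", "?", ":", "@", "=" and "&". If such a character is needed in a scheme for anything other than its reserved purpose, it must be encoded.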
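To see why reserved characters matter, consider a query string value that itself contains "&" or "=": it has to be percent-encoded so that it is not mistaken for a separator. The sketch below uses Python's standard `urllib.parse.quote` with `safe=""` so that no reserved character is exempt. The `key=value&key=value` layout it produces is the common HTML form convention, an assumption here rather than something RFC 1738 defines, and `build_query` is a hypothetical name.

```python
from urllib.parse import quote

def build_query(params: dict[str, str]) -> str:
    """Join key=value pairs with "&", percent-encoding reserved characters in the data."""
    # safe="" exempts nothing, so "&", "=", "/" and "?" inside keys or values are encoded
    return "&".join(f"{quote(key, safe='')}={quote(value, safe='')}"
                    for key, value in params.items())

print(build_query({"q": "Tom & Jerry", "a=b": "c/d"}))
# q=Tom%20%26%20Jerry&a%3Db=c%2Fd
```

Only the "&" and "=" written by `build_query` itself keep their reserved meaning; every occurrence inside the data is encoded.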
Major Schemes
http      Hypertext Transfer Protocol
ftp       File Transfer Protocol
mailto    Electronic mail address
news      USENET news
nntp      USENET news using NNTP access
telnet    Reference to interactive sessions
file      Host-specific file names
- Past (popular in the early nineties)
prospero  Prospero Directory Service
gopher    The Gopher protocol
wais      Wide Area Information Servers
The HTTP Scheme
An HTTP URL takes the form:
http://<host>:<port>/<path>?<searchpart>
If :<port> is omitted, the port defaults to 80. No user name or password is allowed. <path> is an HTTP selector, and <searchpart> is a query string. The <path> is optional, as is the <searchpart> and its preceding "?". If neither <path> nor <searchpart> is present, the "/" may also be omitted.
Within the <path> and <searchpart> components, "/", ";", "?" are reserved. The "/" character may be used within HTTP to designate a hierarchical structure.
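The HTTP form above can be turned into a small hand-written parser. This Python sketch (the names `HttpUrl` and `parse_http_url` are hypothetical) follows the RFC 1738 layout: it applies the default port 80, rejects a user name or password, and splits on the first "/" and then the first "?". It assumes well-formed input and does not validate host names; for real code, Python's standard `urllib.parse.urlsplit` implements the newer RFC 3986 grammar.

```python
from typing import NamedTuple

class HttpUrl(NamedTuple):
    host: str
    port: int
    path: str        # <path> without the leading "/"; "" if absent
    searchpart: str  # text after "?"; "" if absent

def parse_http_url(url: str) -> HttpUrl:
    """Split http://<host>:<port>/<path>?<searchpart> into its parts."""
    scheme, sep, rest = url.partition("://")
    if not sep or scheme.lower() != "http":
        raise ValueError("not an http URL")
    # <host>:<port> runs up to the first "/"; the "/" may be omitted
    hostport, _, pathsearch = rest.partition("/")
    if "@" in hostport:
        raise ValueError("user name and password are not allowed in http URLs")
    host, colon, port_text = hostport.partition(":")
    if not host:
        raise ValueError("missing host")
    if colon:
        if not port_text or not all(c in "0123456789" for c in port_text):
            raise ValueError(f"invalid port: {port_text!r}")
        port = int(port_text)
    else:
        port = 80  # default when :<port> is omitted
    # "?" separates <path> from <searchpart>
    path, _, searchpart = pathsearch.partition("?")
    return HttpUrl(host, port, path, searchpart)
```

For example, `parse_http_url("http://www.example.com:8080/docs/index.html?lang=en")` gives host `www.example.com`, port 8080, path `docs/index.html` and searchpart `lang=en`; splitting the path on "/" then exposes its hierarchical segments.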
Links
- The Beginners Guide to URLs (Investintech, a page with some good links)
References
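- Standards
  - RFC 1738 - URL Syntax (http://www.faqs.org/rfcs/rfc1738.html)
- Related standards
  - RFC 2141 - URN Syntax (http://www.rfc-editor.org/rfc/rfc2141.txt)
  - RFC 3986 - URI Syntax (http://www.rfc-editor.org/rfc/rfc3986.txt)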