URL

The educational technology and digital learning wiki

Latest revision as of 09:48, 31 July 2009

Definition

  • A URL is an Internet address for a resource
  • A Uniform Resource Locator (URL) is a compact string representation for a resource available via the Internet

URLs are just one kind of Uniform Resource Identifier (URI). Formally speaking, the URL specification (RFC 1738) is obsolete and has been replaced by the URI specification (RFC 3986). In practical terms, however, it is still useful because it is much easier to understand than the URI specification.

This article is just a short (cut-and-paste) summary of the obsolete RFC 1738 specification.

Formal Syntax

According to the RFC 1738 specification, URLs are written as follows:

 <scheme>:<scheme-specific-part>

A URL contains the name of the scheme being used (<scheme>), followed by a colon and then a string (the <scheme-specific-part>) whose interpretation depends on the scheme.

A scheme usually refers to an Internet protocol or service such as HTTP, Telnet or e-mail. This is why one could also write:

 <protocol>:<protocol-specific-part>

Scheme names consist of a sequence of characters. The lower case letters "a"--"z", digits, and the characters plus ("+"), period ("."), and hyphen ("-") are allowed. For resiliency, programs interpreting URLs should treat upper case letters as equivalent to lower case in scheme names (e.g., allow "HTTP" as well as "http").

Each scheme (protocol) further defines specific parts, e.g. see HTTP Scheme below.
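
The syntax rules above can be illustrated with a short Python sketch. It is only a minimal reading of the rules quoted in this section, not a complete RFC 1738 parser; the function name split_url and the example addresses are made up for illustration.

```python
import re

# Scheme names: lower case letters, digits, "+", "." and "-".
SCHEME_RE = re.compile(r"[a-z0-9+.-]+")


def split_url(url):
    """Split a URL into (scheme, scheme-specific-part).

    The scheme is lower-cased first, so "HTTP" is accepted as "http".
    Raises ValueError if there is no colon or the scheme name
    contains characters that are not allowed.
    """
    scheme, colon, rest = url.partition(":")  # split at the FIRST colon
    scheme = scheme.lower()
    if not colon or not SCHEME_RE.fullmatch(scheme):
        raise ValueError(f"not a valid URL: {url!r}")
    return scheme, rest


print(split_url("HTTP://example.org/index.html"))
print(split_url("mailto:someone@example.org"))
```

Only the first colon separates the scheme from the rest, so a later colon (for example in front of a port number) stays inside the scheme-specific part.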

Unsafe characters

Do not use the following characters unencoded (unless you know what you are doing):

  • The SPACE, because significant spaces may disappear
  • "<" and ">", because they are used as the delimiters around URLs in free text
  • The quote mark ("), because it is used to delimit URLs in some systems
  • "#", because it is used in the World Wide Web and in other systems to delimit a URL from a fragment/anchor
  • "%", because it is used for encodings of other characters
  • The following characters are unsafe because some gateways and other transport agents may modify them: "{", "}", "|", "\", "^", "~", "[", "]", and "`"
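
If such a character has to appear in a URL, it is written as "%" followed by its two-digit hexadecimal code. The following Python sketch encodes exactly the characters listed above (plus control and non-ASCII characters). The function name encode_unsafe is an assumption made for this example; it is not part of any library.

```python
# The unsafe characters from the list above, including the space.
UNSAFE = set(' <>"#%{}|\\^~[]`')


def encode_unsafe(text):
    """Replace unsafe characters by their %XX encoding.

    Control characters and non-ASCII characters are encoded too
    (byte by byte, using UTF-8, which is an assumption of this sketch).
    """
    out = []
    for ch in text:
        if ch in UNSAFE or not (0x20 < ord(ch) < 0x7F):
            out.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
        else:
            out.append(ch)
    return "".join(out)


print(encode_unsafe("my file <draft>.html"))  # my%20file%20%3Cdraft%3E.html
print(encode_unsafe("100%"))                  # 100%25
```

Note that the function also encodes "%", so it should only be applied to text that has not been encoded yet. Python's own urllib.parse.quote does a similar job but follows the newer URI rules, e.g. it leaves "~" untouched.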

Reserved characters

Many URL schemes reserve certain characters for a special meaning, e.g. ";", "/", "?", ":", "@", "=" and "&".
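
When a reserved character is meant as plain data rather than in its special meaning, it is encoded like an unsafe character. As a small sketch, the Python code below builds a query string from name/value pairs so that a "&" or "=" inside a value cannot be mistaken for a separator. It uses quote and unquote from Python's standard urllib.parse module; the helper name build_searchpart and the sample data are made up for this example.

```python
from urllib.parse import quote, unquote


def build_searchpart(pairs):
    """Join (name, value) pairs as name=value&name=value.

    safe="" tells quote() to encode "/" as well, so every reserved
    character inside a name or value is percent-encoded.
    """
    return "&".join(
        f"{quote(name, safe='')}={quote(value, safe='')}"
        for name, value in pairs
    )


searchpart = build_searchpart([("title", "Q&A"), ("path", "a/b?c")])
print(searchpart)            # title=Q%26A&path=a%2Fb%3Fc
print(unquote("Q%26A"))      # Q&A
```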

Major Schemes

http                    Hypertext Transfer Protocol
ftp                     File Transfer Protocol
mailto                  Electronic mail address
news                    USENET news
nntp                    USENET news using NNTP access
telnet                  Reference to interactive sessions
file                    Host-specific file names

Past schemes (popular in the early nineties):

prospero                Prospero Directory Service
gopher                  The Gopher protocol
wais                    Wide Area Information Servers

The HTTP Scheme

An HTTP URL takes the form:

http://<host>:<port>/<path>?<searchpart>

If :<port> is omitted, the port defaults to 80. No user name or password is allowed. <path> is an HTTP selector, and <searchpart> is a query string. The <path> is optional, as is the <searchpart> and its preceding "?". If neither <path> nor <searchpart> is present, the "/" may also be omitted.

Within the <path> and <searchpart> components, "/", ";", "?" are reserved. The "/" character may be used within HTTP to designate a hierarchical structure.
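
As an illustration, the parts of an HTTP URL can be pulled apart with urlsplit from Python's standard urllib.parse module. This is a sketch only: urlsplit follows the newer URI rules rather than RFC 1738 to the letter, and the function name parse_http_url and the host example.org are assumptions made for the example.

```python
from urllib.parse import urlsplit


def parse_http_url(url):
    """Return host, port, path and searchpart of an HTTP URL."""
    parts = urlsplit(url)          # scheme and host name are lower-cased
    if parts.scheme != "http":
        raise ValueError(f"not an HTTP URL: {url!r}")
    return {
        "host": parts.hostname,
        "port": parts.port or 80,  # default port when :<port> is omitted
        "path": parts.path or "/", # the "/" may be omitted in the URL
        "searchpart": parts.query,
    }


print(parse_http_url("http://example.org:8080/docs/index.html?lang=en"))
print(parse_http_url("http://example.org"))
```

The second call shows the defaults described above: the port becomes 80, the path becomes "/", and the searchpart is empty.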

Links

References

Standards
Related standards