URL

The educational technology and digital learning wiki
Revision as of 12:04, 12 February 2007 by Daniel K. Schneider (talk | contribs) (using an external editor)

Definition

  • A URL is an Internet address for a resource
  • A Uniform Resource Locator (URL) is a compact string representation for a resource available via the Internet

This piece is just a short (cut&paste) summary from the specification...

Formal Syntax

According to the RFC 1738 specification, URLs are written as follows:

 <scheme>:<scheme-specific-part>

A URL contains the name of the scheme being used (<scheme>) followed by a colon and then a string (the <scheme-specific-part>) whose interpretation depends on the scheme.

Scheme names consist of a sequence of characters. The lower case letters "a"--"z", digits, and the characters plus ("+"), period ("."), and hyphen ("-") are allowed. For resiliency, programs interpreting URLs should treat upper case letters as equivalent to lower case in scheme names (e.g., allow "HTTP" as well as "http").
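The splitting and case rule described above can be sketched in a few lines of Python. This is only an illustration, not part of the specification: the function name split_scheme and the example addresses are made up for this page.

```python
import re

# Scheme characters allowed by the rule above: lower case letters,
# digits, plus, period and hyphen (at least one character).
SCHEME_RE = re.compile(r"[a-z0-9+.-]+")

def split_scheme(url):
    """Split a URL into (scheme, scheme-specific-part).

    The scheme is lower-cased first, so "HTTP" is accepted like "http".
    Raises ValueError if there is no colon or the scheme is malformed.
    """
    scheme, colon, rest = url.partition(":")
    scheme = scheme.lower()
    if not colon or not SCHEME_RE.fullmatch(scheme):
        raise ValueError("no valid scheme in %r" % url)
    return scheme, rest

print(split_scheme("HTTP://www.example.org/"))     # ('http', '//www.example.org/')
print(split_scheme("mailto:someone@example.org"))  # ('mailto', 'someone@example.org')
```

Everything after the first colon is returned untouched, since its interpretation depends on the scheme.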

Unsafe characters

Do not use the following characters (unless you know what you are doing):

  • The SPACE because significant spaces may disappear
  • "<" and ">" are unsafe because they are used as the delimiters around URLs in free text
  • the quote mark (""") is used to delimit URLs in some systems
  • "#", because it is used in the World Wide Web and in other systems to delimit a URL from a fragment/anchor
  • "%", because it is used for encodings of other characters.
  • The following characters are unsafe because some gateways and other transport agents may modify them: "{", "}", "|", "\", "^", "~", "[", "]", and "`".
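If one of these characters has to appear in a URL, it is written as "%" followed by its two-digit hexadecimal code. Below is a minimal Python sketch of that escaping, assuming only the characters listed above need treatment; the names UNSAFE and escape_unsafe are made up for this example.

```python
# The unsafe characters from the list above.
UNSAFE = set(' <>"#%{}|\\^~[]`')

def escape_unsafe(text):
    """Replace each unsafe character by "%" plus its hexadecimal code."""
    out = []
    for ch in text:
        if ch in UNSAFE:
            out.append("%%%02X" % ord(ch))
        else:
            out.append(ch)
    return "".join(out)

print(escape_unsafe("my file<1>.html"))  # my%20file%3C1%3E.html
print(escape_unsafe("~user/100%"))       # %7Euser/100%25
```

Since "%" is escaped too, the function should only be applied to raw text. Running it on an already escaped string would encode it a second time.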

Reserved characters

Many URL schemes reserve certain characters for a special meaning, e.g. ";", "/", "?", ":", "@", "=" and "&"
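When a reserved character is meant as plain data rather than in its special role, it has to be encoded. A short sketch using quote and unquote from Python's standard urllib.parse module; the helper name encode_component and the sample string are assumptions made for this example.

```python
from urllib.parse import quote, unquote

def encode_component(value):
    """Encode a piece of data so that no reserved character keeps
    its special meaning (safe="" also encodes the "/")."""
    return quote(value, safe="")

raw = "a/b;c?d:e@f=g&h"
encoded = encode_component(raw)
print(encoded)           # a%2Fb%3Bc%3Fd%3Ae%40f%3Dg%26h
print(unquote(encoded))  # a/b;c?d:e@f=g&h
```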

Major Schemes

  • http: Hypertext Transfer Protocol
  • ftp: File Transfer Protocol
  • mailto: Electronic mail address
  • news: USENET news
  • nntp: USENET news using NNTP access
  • telnet: Reference to interactive sessions
  • file: Host-specific file names

Past (popular in the early nineties)

  • prospero: Prospero Directory Service
  • gopher: The Gopher protocol
  • wais: Wide Area Information Servers

The HTTP Scheme

An HTTP URL takes the form:

http://<host>:<port>/<path>?<searchpart>

If :<port> is omitted, the port defaults to 80. No user name or password is allowed. <path> is an HTTP selector, and <searchpart> is a query string. The <path> is optional, as is the <searchpart> and its preceding "?". If neither <path> nor <searchpart> is present, the "/" may also be omitted.
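These rules can be tried out with urlsplit from Python's standard urllib.parse module. The sketch below is an illustration only: the function name http_parts and the example.org addresses are made up. Note that urlsplit keeps the leading "/" as part of the path, whereas <path> above does not include it.

```python
from urllib.parse import urlsplit

def http_parts(url):
    """Break an HTTP URL into host, port, path and searchpart."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        raise ValueError("not an http URL: %r" % url)
    return {
        "host": parts.hostname,
        # An omitted port defaults to 80.
        "port": parts.port if parts.port is not None else 80,
        # An omitted path is treated like "/".
        "path": parts.path or "/",
        "searchpart": parts.query,
    }

print(http_parts("http://www.example.org:8080/a/b?x=1"))
# {'host': 'www.example.org', 'port': 8080, 'path': '/a/b', 'searchpart': 'x=1'}
print(http_parts("http://example.org"))
# {'host': 'example.org', 'port': 80, 'path': '/', 'searchpart': ''}
```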

Within the <path> and <searchpart> components, "/", ";", "?" are reserved. The "/" character may be used within HTTP to designate a hierarchical structure.

References

Standards

RFC 1738