HTML and XHTML elements and attributes

Revision as of 19:00, 1 September 2009

This article or section is currently under construction

In principle, someone is working on it and there should be a better version in a not so distant future.
If you want to modify this page, please discuss it with the person working on it (see the "history")


Introduction

Learning goals
  • Learn basic HTML and XHTML markup
Prerequisites
  • None
Moving on
Level and target population
  • Beginners
Remarks
  • For the moment, this article is intended to be a "handout" for "lab" teaching. In other words, a teacher + hands-on activities are needed.

SGML and XML markup

SGML and XML are the formalisms with which formal languages like HTML (in SGML) and XHTML (in XML) are defined. XML was designed as a simplified subset of SGML and has largely replaced it: it is simpler in structure, and a much richer set of tools has been built around it.

Definitions of such formal languages are called "document type definitions" (DTDs), "schemas" or "grammars". Read the DTD tutorial if you wish to know the details. For the moment, you just need to understand that these grammars are sets of rules that define:

  • a set of elements (tags) and their attributes that identify the various structural elements of an HTML page;
  • how these elements can be nested inside each other;
  • different sorts of entities (reusable fragments, special characters).

SGML and XML languages, e.g. HTML and XHTML, have three kinds of components:

  • elements
  • attributes
  • character and entity references (not explained here)

These components use a special syntax that we shall now introduce with an explanation taken from Wikipedia (http://en.wikipedia.org/wiki/HTML).

An introduction to the (X)HTML markup formalism

HTML elements are the basic components of HTML markup. Elements have two basic properties: attributes and content. Each element's attributes and content are subject to certain restrictions that must be followed for an HTML document to be considered valid. An element usually has a start tag (e.g. <element-name>) and an end tag (e.g. </element-name>). The element's attributes are contained in the start tag and its content is located between the tags (e.g. <element-name attribute="value">Content</element-name>). Some elements, such as <br>, do not have any content; in HTML they must not have an end tag, whereas in XHTML they must be closed explicitly (e.g. <br />). Listed below are several types of markup elements used in HTML.
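
As a small illustration of the anatomy described above, the fragment below shows a start tag with an attribute, content and an end tag, followed by an empty element in its HTML and XHTML forms. The paragraph texts and the class value "intro" are made up for this sketch.

```html
<!-- start tag with one attribute, then content, then end tag -->
<p class="intro">Elements have attributes and content.</p>

<!-- br is an empty element: no content, and in HTML 4.01 no end tag -->
<p>First line<br>second line</p>

<!-- in XHTML the same empty element must be closed explicitly -->
<p>First line<br />second line</p>
```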

Structural markup describes the purpose of text. For example, <h2>Golf</h2> establishes "Golf" as a second-level heading, which would be rendered in a browser in a manner similar to the section titles on this page. Structural markup does not denote any specific rendering, but most Web browsers have standardized default styles for element formatting. Text may be further styled with Cascading Style Sheets (CSS).

Presentational markup describes the appearance of the text, regardless of its function. For example, <b>boldface</b> indicates that visual output devices should render "boldface" in bold text, but gives no indication of what devices that are unable to do this (such as aural devices that read the text aloud) should do. For both <b>bold</b> and <i>italic</i>, there are elements which usually have an equivalent visual rendering but are more semantic in nature, namely <strong>strong emphasis</strong> and <em>emphasis</em> respectively. It is easier to see how an aural user agent should interpret the latter two elements. However, they are not equivalent to their presentational counterparts: it would be undesirable for a screen reader to emphasize the name of a book, for instance, but on a screen such a name would be italicized. Most presentational markup elements have become deprecated under the HTML 4.0 specification, in favor of CSS-based style design.
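
The contrast between presentational and semantic markup can be sketched as follows. The sentences are invented for illustration; the cite element is one possible way to mark up a book title without emphasizing it.

```html
<!-- presentational: says how the text looks, not why -->
<p>Do <b>not</b> press the <i>red</i> button.</p>

<!-- semantic: says the text is emphasized; a speech browser can change its tone -->
<p>Do <strong>not</strong> press the <em>red</em> button.</p>

<!-- a book title is usually shown in italics but is not emphasized;
     cite marks it as a reference to another work -->
<p>We read <cite>Moby-Dick</cite> last term.</p>
```

Most visual browsers will probably render the first two paragraphs identically, which is why the difference only becomes visible on non-visual devices.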

Hypertext markup links parts of the document to other documents. In HTML up through XHTML 1.1, a hyperlink in the flow of text is created with an anchor element: <a>Wikipedia</a>. On its own this does nothing; the href attribute must also be set to a valid URL. For example, the markup <a href="http://en.wikipedia.org/">Wikipedia</a> renders the word "Wikipedia" as a hyperlink. To make an image act as a link, place an img element inside the anchor element: <a href="url"><img src="image.gif" alt="alternative text" width="50" height="50"></a>
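
The two kinds of links can be sketched as below. The file name image.gif is a placeholder, and the image link is written in XHTML style, i.e. with the empty img element closed.

```html
<!-- text link: the href attribute holds the target URL -->
<p>Read more on <a href="http://en.wikipedia.org/">Wikipedia</a>.</p>

<!-- image link: the img element is the content of the anchor;
     the " />" ending closes the empty img element as XHTML requires -->
<p>
  <a href="http://en.wikipedia.org/"><img src="image.gif"
     alt="alternative text" width="50" height="50" /></a>
</p>
```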

Let's now look at attributes.

Most of the attributes of an element are name-value pairs, separated by "=", and written within the start tag of an element, after the element's name. The value may be enclosed in single or double quotes, although values consisting of certain characters can be left unquoted in HTML (but not XHTML).
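
The quoting rules can be illustrated with a short sketch. The file names page.html and image.gif are hypothetical. In HTML 4.01, an unquoted value may only consist of letters, digits, hyphens, periods, underscores and colons.

```html
<!-- double quotes: accepted in HTML and XHTML -->
<a href="page.html" title="Next page">Next</a>

<!-- single quotes: also accepted, handy when the value itself contains double quotes -->
<a href="page.html" title='The "next" page'>Next</a>

<!-- unquoted values: allowed in HTML 4.01 for simple values, but not in XHTML -->
<img src=image.gif alt=Logo width=50 height=50>
```

Always quoting values, as in the first two lines, is the safer habit since it works in both dialects.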

Most elements can take any of several common attributes:

  • The id attribute provides a document-wide unique identifier for an element. This can be used by stylesheets to provide presentational properties, by browsers to focus attention on the specific element, or by scripts to alter the contents or presentation of an element. Appended to the URL of the page after a "#", it provides a globally unique identifier for an element, typically a sub-section of the page; for example, the ID "Attributes" in http://en.wikipedia.org/wiki/HTML#Attributes.
  • The class attribute provides a way of classifying similar elements. This can be used for presentation purposes for example. An HTML document might use the designation class="notation" to indicate that all elements with this class value are subordinate to the main text of the document. Such elements might be gathered together and presented as footnotes on a page instead of appearing in the place where they occur in the HTML source.
  • The style attribute lets an author assign presentational properties to a particular element. It is considered better practice to use an element's id or class attributes to select the element with a stylesheet, though sometimes this can be too cumbersome for a simple ad hoc application of styled properties.
  • The title attribute is used to attach subtextual explanation to an element. In most browsers this attribute is displayed as what is often referred to as a tooltip.

The abbreviation element, abbr, can be used to demonstrate these various attributes:

<abbr id="anId" class="aClass" style="color:blue;" title="Hypertext Markup Language">HTML</abbr>

This example displays as HTML; in most browsers, pointing the cursor at the abbreviation should display the title text "Hypertext Markup Language."
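
A further sketch shows how id and class are typically used in a page. The id value "attributes" and the texts are invented for this example; the class value "notation" is the one mentioned above.

```html
<!-- the id makes this heading addressable by appending #attributes to the page URL -->
<h2 id="attributes">Attributes</h2>

<!-- several elements can share one class -->
<p class="notation">A first side remark.</p>
<p class="notation">A second side remark.</p>

<!-- a link to the heading above, within the same page -->
<p><a href="#attributes">Jump to the Attributes section</a></p>
```

A stylesheet could then select the heading with the selector #attributes and both remarks with the selector .notation.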

Most elements also take the language-related attributes lang and dir.
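
As a minimal sketch of these two attributes, the fragment below marks a French word inside an English paragraph and a Hebrew paragraph written from right to left. The sample texts are assumptions chosen for illustration.

```html
<!-- lang names the language of the content; it can be overridden on a nested element -->
<p lang="en">The French word <span lang="fr">bonjour</span> means hello.</p>

<!-- dir sets the base text direction: ltr (left to right) or rtl (right to left) -->
<p lang="he" dir="rtl">שלום עולם</p>
```

In XHTML 1.0 documents, the xml:lang attribute is normally given alongside lang, as the XHTML example further down does on its html tag.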

Structure of an HTML page

Markup of an HTML page is divided into two big parts: the head contains information that the user will not see inside the browser window and the body contains the contents to be displayed. We can express this with a simple formula:

html = head + body

A minimal HTML document that just displays "Hello EdutechWiki reader!" would look like this:

<html>
  <head>
    <title>Hello Page</title>
  </head>
  <body>
    <p>Hello EdutechWiki reader!</p>
  </body>
</html>

In addition, the first line(s) of an (X)HTML page usually contain a declaration that states precisely which HTML dialect is being used.

HTML and XHTML code examples

HTML 4.01 strict example

Source: http://www.w3.org/TR/html4/struct/global.html

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
   "http://www.w3.org/TR/html4/strict.dtd">
<HTML>
   <HEAD>
      <TITLE>My first HTML document</TITLE>
   </HEAD>
   <BODY>
      <P>Hello world!
   </BODY>
</HTML>

HTML tag names are not case-sensitive, e.g. HEAD, Head, head and heaD are all correct. To ensure XHTML compatibility, we suggest adopting the following strategy:

  • use only lower case, as in the example below, which is formally equivalent to the one above;
  • always close tags (more on that later ...)
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
   "http://www.w3.org/TR/html4/strict.dtd">
<html>
   <head>
      <title>My first HTML document</title>
   </head>
   <body>
      <p>Hello world!</p>
   </body>
</html>
XHTML 1.0 strict example

Source: http://www.w3.org/TR/xhtml1/

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html 
     PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
      <title>My first XHTML document</title>
  </head>
  <body>
      <p>Hello world!</p>
  </body>
</html>

As you can see, HTML and XHTML look very similar. The major differences between HTML and XHTML are the following:

  • In HTML, some tags, e.g. the p and li tags, can be left "open", i.e. it is not necessary to add a closing tag. In XHTML, every tag must be closed.
  • Attributes in HTML do not always need a value. In XHTML, every attribute must have a quoted value.
  • In XHTML, the html tag needs a namespace declaration (but not in HTML).
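
The first two differences can be sketched side by side. The list items and the checkbox named "agree" are made up for this illustration.

```html
<!-- valid HTML 4.01: li and p end tags omitted, "checked" attribute without a value -->
<ul>
  <li>First item
  <li>Second item
</ul>
<p><input type="checkbox" name="agree" checked> I agree

<!-- the same markup as XHTML: every element closed, every attribute given a value -->
<ul>
  <li>First item</li>
  <li>Second item</li>
</ul>
<p><input type="checkbox" name="agree" checked="checked" /> I agree</p>
```

Apart from the " />" ending on the empty input element, the second form is also valid HTML 4.01, which is why closing every tag is a reasonable habit in both dialects.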

This may be confusing for a beginner. So to make things simple:

  • Always start with one of the two templates above (your web authoring system may do this automatically for you).
  • Always close all tags, even when you just write "old" HTML code.

HTML and XHTML structure and document type information

Let's now have a look at the lines before the html tag.

Correct HTML files should include a document type declaration starting on line 1. Before we add more explanation, we suggest that you use either HTML 4.01 Transitional or XHTML 1.0 Transitional for pages meant for reading on a computer, and XHTML Basic for (modern) cellphones and PDAs.

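The rationale for including this information is that pages display more predictably when the browser knows what kind of (X)HTML you intended to use. In practice, browsers use the declaration to choose between a standards-compliant rendering mode and a backward-compatible "quirks" mode.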

There exist three major HTML document types:

HTML 4.01 Strict
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
        "http://www.w3.org/TR/html4/strict.dtd">
HTML 4.01 Transitional
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
        "http://www.w3.org/TR/html4/loose.dtd">
HTML 4.01 Frameset
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN"
        "http://www.w3.org/TR/html4/frameset.dtd">

There exist four major XHTML document types:

XHTML 1.0 Strict
<!DOCTYPE html 
     PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
XHTML 1.0 Transitional
<!DOCTYPE html 
     PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
XHTML 1.0 Frameset
<!DOCTYPE html 
     PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
XHTML Basic
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML Basic 1.1//EN"
    "http://www.w3.org/TR/xhtml-basic/xhtml-basic11.dtd">

Note regarding XHTML and XML:

  • If you intend to serve XHTML as XML (e.g. in order to include other XML languages within your document) we suggest adding an XML declaration at the very beginning of the file.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html 
     PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
  • In addition, do not forget to declare the XHTML namespace (the xmlns attribute) in the html tag.

The head element

Definition of the character set:

 <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
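
Put into context, a head element could look like the following sketch. The title text is taken from the simple example earlier on this page; the order of the two child elements is a choice made for this illustration.

```html
<head>
  <!-- character set of the document; the " />" ending is XHTML syntax -->
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
  <!-- the title appears in the browser's title bar or tab, not in the page itself -->
  <title>Hello Page</title>
</head>
```

Placing the character set declaration before the title is a common precaution so that the browser knows the encoding before it reads any text.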


Structuring the document body

Inside the body tag, a variety of high-level elements may be used in any order, as the following pseudo-formal rule shows:

body = ( address | blockquote | div | dl | h1 | h2 | h3 | ol | p | pre | table | ul )*
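
A body built according to this rule might look like the sketch below. It uses only a few of the listed elements, and all texts are invented for illustration.

```html
<body>
  <h1>HTML handout</h1>
  <p>A first paragraph of text.</p>

  <h2>Things to remember</h2>
  <ul>
    <li>Use lower case tag names.</li>
    <li>Close all tags.</li>
  </ul>

  <!-- a blockquote contains block elements such as p -->
  <blockquote>
    <p>A quoted paragraph.</p>
  </blockquote>

  <address>Written by the course teacher.</address>
</body>
```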

Headings (titles)

Paragraphs

Lists

Adding markup to block elements

inline element = ( Your text | a | abbr | acronym | br | cite | code | em | img | kbd | q | samp | span | strong ) *
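
The rule above can be illustrated with a single paragraph that mixes plain text with several of the listed inline elements. The sentence contents and the class value "aClass" are assumptions made for this sketch.

```html
<p>
  The <abbr title="World Wide Web Consortium">W3C</abbr> publishes the
  <a href="http://www.w3.org/TR/html4/">HTML 4.01 specification</a>.
  Use <code>em</code> for <em>emphasis</em> and
  <code>strong</code> for <strong>strong emphasis</strong>.<br />
  A <span class="aClass">span</span> marks a stretch of text without giving it a meaning.
</p>
```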

Acknowledgement and copyright