XML principles


Learning goals

  • Understand the technical structure of an XML document
  • Understand well-formed documents
  • Understand valid documents


The two most fundamental ideas you should acquire are the following:

  • XML does not exist as a language. It is just a formalism for creating languages
  • XML can be used to describe almost any kind of structure

Structure of an XML document

An XML document usually includes:

  1. Processing instructions (at least an XML declaration on top!)
  2. Declarations, in particular a document schema like a DTD
  3. Element markup: content delimited by tags like <my_tag>contents</my_tag> or tags without contents like <self_closing_tag/>
  4. Attribute markup like <my_tag style="green">....</my_tag>
  5. Entities (i.e. symbols that are replaced by other content when the document is parsed)
  6. comments: <!-- .... -->

XML documents are trees

For a computer person, an XML document is a tree (“boxes within boxes”). Inside a browser or most other clients, the document is represented as a tree-based data structure, the so-called Document Object Model (DOM).

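Below is a simple table example, shown both as XML markup and as a figure of its tree structure. (The element names follow HTML table conventions; a CALS table, as used in DocBook, would use tgroup, row and entry instead.)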

 <TABLE>  
   <TBODY>
     <TR> 
       <TD>Pierre Muller</TD> 
       <TD>http://pm.com/</TD> 
     </TR>
     <TR> <TD>Elisabeth Dupont</TD> <TD></TD> </TR> 
   </TBODY> 
 </TABLE>
Figure: Tree representation of a table display structure
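To make the tree idea concrete, here is a minimal sketch that parses the table markup with Python's standard xml.etree.ElementTree module and prints one indented line per element. The helper name outline is made up for this illustration, and ElementTree is a lightweight tree API, not the browser DOM.

```python
import xml.etree.ElementTree as ET

TABLE_XML = """
<TABLE>
  <TBODY>
    <TR>
      <TD>Pierre Muller</TD>
      <TD>http://pm.com/</TD>
    </TR>
    <TR> <TD>Elisabeth Dupont</TD> <TD></TD> </TR>
  </TBODY>
</TABLE>
"""

def outline(element, depth=0):
    """Return one indented line per element, visiting the tree depth-first."""
    text = (element.text or "").strip()
    line = "  " * depth + element.tag + (f": {text}" if text else "")
    lines = [line]
    for child in element:  # iterating an element yields its direct children
        lines.extend(outline(child, depth + 1))
    return lines

root = ET.fromstring(TABLE_XML)
tree_lines = outline(root)
print("\n".join(tree_lines))
```

Each level of indentation in the printed outline corresponds to one "box within a box" in the markup.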

The totally different example below (inspired by Wikipedia's RDF entry) shows an RDF fragment. RDF is a language for defining relationships between concepts.

<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/elements/1.1/">

        <rdf:Description rdf:about="http://edutechwiki.unige.ch/en/XML_principles">
                <dc:title>DKS</dc:title>
                <dc:publisher>EduTechWiki</dc:publisher>
        </rdf:Description>
</rdf:RDF>


All XML documents must be well-formed. XML documents can be valid with respect to a grammar (also called schema, document type, language, etc.). See below for details.

Well-formed XML documents

Any XML document must be at least well-formed. Well-formed XML documents obey the following rules:

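(1) A document should start with an XML declaration that includes the XML version number. (XML 1.0 parsers accept documents without one, but if the declaration is present it must be the very first thing in the file.)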

<?xml version="1.0"?>

You may specify an encoding (the default is UTF-8). Of course, this means the file must actually be saved in that encoding, so make sure to check your editor's settings.

<?xml version="1.0" encoding="ISO-8859-1"?> 

We suggest not using any encoding other than UTF-8. However, you may have to deal with legacy XML documents that do use a restricted encoding scheme like ISO-8859-1.
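Why the declared encoding has to match the actual bytes can be seen in the following small sketch, again using Python's standard xml.etree.ElementTree. The element name and its content are invented for the illustration.

```python
import xml.etree.ElementTree as ET

# A document saved as ISO-8859-1 bytes, with a declaration that says so.
latin1_doc = (
    '<?xml version="1.0" encoding="ISO-8859-1"?><name>Géraldine</name>'
).encode("iso-8859-1")
name = ET.fromstring(latin1_doc).text

# The same content bytes, but the declaration now wrongly claims UTF-8.
mislabeled_doc = latin1_doc.replace(b"ISO-8859-1", b"UTF-8")
try:
    ET.fromstring(mislabeled_doc)
    mislabeled_accepted = True
except ET.ParseError:
    # The byte for "é" in ISO-8859-1 is not a valid UTF-8 sequence here.
    mislabeled_accepted = False

print(name, mislabeled_accepted)
```

The first document parses fine because declaration and bytes agree; the second is rejected even though only the declaration changed.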

(2) The XML structure must be hierarchical

  • start-tags and end-tags must match
  • no cross-overs like in the following bad example
  <i>...<b>...</i> .... </b>
  • pay attention to case sensitivity, e.g. "LI" is not "li"
  • empty elements must still be closed, e.g. write <br/> (the longer form <br></br> is also well-formed); a lonely <br> would be illegal
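The rules above can be tried out with any XML parser. The sketch below uses Python's standard xml.etree.ElementTree; the helper name well_formed and the sample fragments are made up for this illustration.

```python
import xml.etree.ElementTree as ET

def well_formed(fragment):
    """Return True if the parser accepts the fragment, False otherwise."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False

samples = [
    "<p><i>..<b>..</b></i></p>",    # properly nested
    "<p><i>..<b>..</i>..</b></p>",  # cross-over
    "<ul><LI>item</li></ul>",       # case mismatch between start and end tag
    "<p>line<br/>break</p>",        # self-closing empty element
    "<p>line<br></br>break</p>",    # empty element with explicit end tag
    "<p>line<br>break</p>",         # lonely <br>, never closed
]

results = {sample: well_formed(sample) for sample in samples}
for sample, ok in results.items():
    print("ok " if ok else "BAD", sample)
```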

(3) Attributes must have values and values must be quoted:

e.g. <a href="http://scholar.google.com"> or <person status="employed">
e.g. <input type="radio" checked="checked">
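As a quick check of the attribute rule, this sketch (Python standard library; the function name attributes_or_error is hypothetical) returns the attributes of a single element, or the parser's complaint when the element is not well-formed. The elements are written in self-closing form so that each sample is a complete document.

```python
import xml.etree.ElementTree as ET

def attributes_or_error(element_source):
    """Return the element's attributes as a dict, or an error string."""
    try:
        return dict(ET.fromstring(element_source).attrib)
    except ET.ParseError as err:
        return f"not well-formed: {err}"

quoted = attributes_or_error('<input type="radio" checked="checked"/>')
single_quoted = attributes_or_error("<person status='employed'/>")
unquoted = attributes_or_error("<person status=employed/>")
no_value = attributes_or_error('<input type="radio" checked/>')

for result in (quoted, single_quoted, unquoted, no_value):
    print(result)
```

Single quotes are accepted as well as double quotes; what is not accepted is a missing value or a value without quotes.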

(4) A single root element per document

The root element encloses all other content: it is opened first and closed last.
The root element should not appear as a child of any other element (in a DTD, it should not be part of the definition of another element).
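A short sketch of the single-root rule, using Python's standard xml.etree.ElementTree (the sample element names are only for illustration):

```python
import xml.etree.ElementTree as ET

one_root = "<page><title>Hello</title><content>...</content></page>"
two_roots = "<title>Hello</title><content>...</content>"

# With a single root, everything else hangs below it.
root = ET.fromstring(one_root)
child_tags = [child.tag for child in root]

# Two top-level elements are rejected by the parser.
try:
    ET.fromstring(two_roots)
    error_message = None
except ET.ParseError as err:
    error_message = str(err)

print(root.tag, child_tags)
print("second document rejected:", error_message)
```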

(5) Special characters: <, &, >, " and '. In text and attribute values, use the five predefined entities:

 &lt; &amp; &gt; &quot; &apos;

instead of

 <, &, >, ", '

This principle also applies to URLs!

bad: http://truc.unige.ch/programme?bla&machin
good: http://truc.unige.ch/programme?bla&amp;machin
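When XML is generated by a program, escaping is usually left to a library. The following sketch assumes Python and its standard xml.sax.saxutils helpers escape and quoteattr; the link element is invented for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

url = "http://truc.unige.ch/programme?bla&machin"
text = "if x < y & y > z"

# quoteattr() adds the surrounding quotes and escapes the value;
# escape() replaces &, < and > in element content.
link_xml = f"<link href={quoteattr(url)}>{escape(text)}</link>"
link = ET.fromstring(link_xml)

# Pasting the raw URL into the attribute is not well-formed.
try:
    ET.fromstring(f'<link href="{url}"/>')
    raw_accepted = True
except ET.ParseError:
    raw_accepted = False

print(link_xml)
print(link.get("href"), "|", link.text, "|", raw_accepted)
```

After parsing, the application sees the original characters again: the entities only exist in the serialized document.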

Example of a minimal well-formed XML document:

 <?xml version="1.0" ?>
 <page updated="jan 2007">
  <title>Hello friend</title>
  <content> Here is some content :) </content> 
  <comment> Written by DKS/Tecfa </comment>
 </page>

This example:

  • has an XML declaration on top
  • has a root element (i.e. page)
  • has properly nested elements, and all tags are closed
  • has an updated attribute with a quoted value

XML names and CDATA Sections

Names used for elements should start with a letter and only use letters, digits, the underscore, the hyphen and the period (no other punctuation marks and no spaces).

Good: <driversLicenceNo> <drivers_licence_no>
Bad: <driver’s_licence_number> <driver’s_licence_#> <drivers licence number>

When you want to display data that includes "XMLish" things like the < sign that should not be interpreted, you can use so-called CDATA sections:

 <example>
  <![CDATA[
   (x < y) is a math expression
  ]]>
 </example>
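A CDATA section is only a different way of writing the same text. The sketch below (Python standard library, whitespace removed from the example for easier comparison) shows that a parser hands the application identical content whether the < sign was protected by CDATA or by an entity.

```python
import xml.etree.ElementTree as ET

with_cdata = "<example><![CDATA[(x < y) is a math expression]]></example>"
with_entity = "<example>(x &lt; y) is a math expression</example>"

cdata_text = ET.fromstring(with_cdata).text
entity_text = ET.fromstring(with_entity).text

print(cdata_text)
print(cdata_text == entity_text)
```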

Valid XML documents

A valid document must

  1. be “well-formed” (see above)
  2. conform to a grammar (also called "schema"), e.g. only use tags defined by the grammar and respect nesting, ordering and other constraints defined by that grammar.

Kinds of XML grammars:

  • DTDs are part of the XML standard
  • XML Schema (XSD) is a more recent W3C standard, used to express stronger constraints
  • Relax NG (RNG, RNC) is an OASIS standard (made by well-known XML experts who don’t like XML Schema ...). It has functionality comparable to XML Schema.
  • Schematron is a complementary standard used to define additional constraints that can't be expressed with either XML Schema or Relax NG

Name spaces

It is possible to use several vocabularies within a well-formed document. If a markup language formally defines such a combination of languages, these compound documents can also be validated.

E.g. there is a so-called profile for XHTML + SVG + MathML.

Now, imagine that you could just mix tags from different languages together. The client application could not know which tags belong to which XML language. Also, there could be so-called naming conflicts (e.g. "title" does not mean the same thing in XHTML and SVG). To address these problems, so-called namespaces have been invented: one can prefix element and attribute names with a label that represents a namespace.

Declaring name spaces for additional vocabularies

The "xmlns:name_space" attribute introduces a new vocabulary. It states that all elements or attributes prefixed by "name_space:" belong to a different vocabulary.

Syntax:

xmlns:name_space="URL_name_of_name_space"

SVG within XHTML example

 <html xmlns:svg="http://www.w3.org/2000/svg">
    <svg:rect x="50" y="50" rx="5" ry="5" width="200" height="100" ....
xmlns:svg = "..." means that svg: prefixed elements are part of SVG
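How a parser sees prefixed elements can be sketched as follows with Python's standard xml.etree.ElementTree, which reports each element name together with its namespace URL in braces. The document below is a hypothetical, completed variant of the fragment above with a shortened attribute list and an added body element.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Hypothetical complete document: unprefixed elements plus one SVG element.
mixed = f'<html xmlns:svg="{SVG_NS}"><body><svg:rect x="50" y="50"/></body></html>'
root = ET.fromstring(mixed)

# Unprefixed elements are in no namespace here; the prefixed one carries the SVG URL.
tags = [element.tag for element in root.iter()]

# The prefix is only a local label: "s:" bound to the same URL names the same element.
renamed = ET.fromstring(
    f'<html xmlns:s="{SVG_NS}"><body><s:rect x="50" y="50"/></body></html>'
)
same_element_name = (
    renamed.find("body/s:rect", {"s": SVG_NS}).tag
    == root.find("body/svg:rect", {"svg": SVG_NS}).tag
)

print(tags)
print(same_element_name)
```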

Xlink example:

XLink is a language to define links (only works with Firefox-based browsers)

 <RECIT xmlns:xlink="http://www.w3.org/1999/xlink">
  <INFOS>
   <Date>30 octobre 2003 - </Date>
   <Auteur>DKS - </Auteur>
   <A xlink:href="http://jigsaw.w3.org/css-validator/check/referer"
      xlink:type="simple">CSS Validator</A>
  </INFOS>
 </RECIT>

Namespace declaration for the main vocabulary

The main vocabulary can be introduced by an attribute like:

 xmlns="URL_name_of_name_space"

Some specifications (e.g. SVG or XHTML) require a namespace declaration in any case, even if you do not use any other vocabulary.

SVG namespace example

 <svg xmlns="http://www.w3.org/2000/svg">
    <rect x="50" y="50" rx="5" ry="5" width="200" height="100" ....
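The effect of a default namespace can be observed with a parser. This is a small sketch using Python's standard xml.etree.ElementTree and a completed, shortened version of the SVG fragment (the closing tags and reduced attribute list are assumptions made for the example).

```python
import xml.etree.ElementTree as ET

with_default_ns = '<svg xmlns="http://www.w3.org/2000/svg"><rect x="50" y="50"/></svg>'
without_ns = '<svg><rect x="50" y="50"/></svg>'

# The default namespace applies to the element that declares it and to its
# unprefixed descendants.
ns_tags = [e.tag for e in ET.fromstring(with_default_ns).iter()]
plain_tags = [e.tag for e in ET.fromstring(without_ns).iter()]

# Unprefixed attributes are not placed in the default namespace.
rect_attributes = dict(ET.fromstring(with_default_ns)[0].attrib)

print(ns_tags)
print(plain_tags)
print(rect_attributes)
```

Without the xmlns attribute the elements are just called svg and rect, which is why an SVG client would not recognize them.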

What are Namespace URLs ?

URLs that define namespaces are just names; there doesn’t need to be a real link. E.g. for your own purposes you could very well make up something like:

 <account xmlns:pin = "http://joe.miller.com/pin">
   <pin:name>Joe</pin:name>
 </account>

... and the URL http://joe.miller.com/pin doesn’t need to exist for real.
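A minimal sketch with Python's standard xml.etree.ElementTree illustrates both points: the parser treats the URL as a plain string and never fetches it, and a prefix that is used without being declared makes the document unusable for a namespace-aware parser.

```python
import xml.etree.ElementTree as ET

declared = (
    '<account xmlns:pin="http://joe.miller.com/pin">'
    "<pin:name>Joe</pin:name></account>"
)
# The URL only becomes part of the element's name; nothing is downloaded.
name_tag = ET.fromstring(declared)[0].tag

# Same markup without the xmlns:pin declaration.
undeclared = "<account><pin:name>Joe</pin:name></account>"
try:
    ET.fromstring(undeclared)
    undeclared_error = None
except ET.ParseError as err:
    undeclared_error = str(err)

print(name_tag)
print("undeclared prefix rejected:", undeclared_error)
```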

XML with style

XML per se doesn't say anything about display and style, however:

  • Some languages like HTML or SVG or X3D do have built-in rendering mechanisms
  • XML documents can be associated with a CSS stylesheet for rendering in a web browser. However, using CSS only makes sense when the XML is text-centric and contents are embedded within tags (as opposed to attributes). Read the CSS for XML tutorial if you want to learn more.
  • XSLT allows translating an XML document into something else, e.g. you could translate your own little XML language into HTML or SVG for display.
  • Other specialized styling languages exist, like XSL-FO for producing print documents.