Editing XML tutorial: Difference between revisions

The educational technology and digital learning wiki
m (Text replacement - "<pageby nominor="false" comments="false"/>" to "<!-- <pageby nominor="false" comments="false"/> -->")
 
(36 intermediate revisions by the same user not shown)
Line 1: Line 1:
<pageby nominor="false" comments="false"/>
<!-- <pageby nominor="false" comments="false"/> -->
{{web technology tutorial|beginner}}
{{web technology tutorial|beginner}}
{{incomplete}}
{{incomplete}}
Line 7: Line 7:
This is a beginner's tutorial for XML editing, made from slides
This is a beginner's tutorial for XML editing, made from slides


;Objectives
<div class="tut_goals">
'''Learning goals'''


* Be able to read schemas and find other documentation
* Be able to somewhat understand Document Type Definition (DTD) schemas
* Understand the necessity of using an XML editor
* Understand the necessity of using an XML editor
* Be able to edit XML without hand-editing tags, profit from DTD and Schema awareness
* Be able to edit XML without hand-editing tags, profit from editors that have DTD and Schema awareness (most do not !)
* Be able to check well-formedness and validate
* Be able to check well-formedness and validate
* Be able to fix errors
* Be able to fix well-formedness and validity errors


; Prerequisites
; Prerequisites


* Some idea what [[XML]] is about
* Some idea what [[XML]] is about
* XML namespaces (some, for more information, have a look at the [[XML namespace]] article)
* [[XML principles]] (important !)
* HTML and CSS (some)
* [[XML namespace]] (optional)
* [[HTML]] and [[CSS]] (some)


; Next steps
; Next steps


* [[CSS for XML tutorial]]
* [[DTD tutorial]]
* [[DTD tutorial]]
* [[XML namespace]] (more details about XML namespaces)
* [[XML Schema tutorial - Basics]]
* [[XML Schema tutorial - Basics]]
* [[XSLT Tutorial - Basics]]
* [[XSLT Tutorial - Basics]]
Line 30: Line 34:
* [[PHP - MySQL - XML tutorial - basics]] (shows how to display an XML result-set retrieved from MySQL with XSLT)
* [[PHP - MySQL - XML tutorial - basics]] (shows how to display an XML result-set retrieved from MySQL with XSLT)


== XML Principles ==
</div>


Let's recall some principles that you also may have read in the [[XML]] article. In particular:
=== Recall of XML principles ===
 
Let us recall some principles that you also may have read in the [[XML principles]] article. In particular:
# An XML document is a hierarchical structure
# An XML document is a hierarchical structure
# Syntax must be well-formed (all tags closed, etc.)
# Syntax must be well-formed (all tags closed, etc.)
Line 39: Line 45:
# Often, more than one XML language appears in a document. In that case, so-called namespaces must be used
# Often, more than one XML language appears in a document. In that case, so-called namespaces must be used


We now shall provide some more details about these principles.
=== Defining XML languages ===


=== Structure of an XML document ===
Many XML languages are defined with so-called schemas, i.e. grammars that define elements (tags) and attributes and how they can be combined. There exist several schema formalisms. Other languages are defined with a simple textual description, e.g. the well-known RSS 0.9 syndication language. Often a language is defined using both schemas and text, e.g. HTML and SVG define the main structure with a DTD but add extra constraints for certain elements and attributes through simple descriptions. A good example is units of measure: a length can be expressed in m, cm, in, pt, px, %, etc., and that cannot be defined with the simple DTD language.


'''An XML document usually includes:'''
There are four more or less popular schema languages:


# Processing instructions (at least an XML declaration on top !)
(1) '''Document Type Definitions''' (DTDs)
# Declarations, in particular a Document Schema like a [[DTD tutorial|DTD]]
* DTD uses a terse formal syntax that declares in particular which elements and attributes may appear in a document and how they should be nested. One can define how elements can be nested within other elements, what attributes can be used within an element and finally it is possible to declare some very simple data types for attributes.
# Element markup: content delimited by tags like <nowiki><my_tag>contents</my_tag></nowiki> or tags without contents like <nowiki><self_closing_tag/></nowiki>
# Attribute markup like <nowiki><my_tag style="green">....</my_tag> </nowiki>
# Entities (i.e. symbols that are substituted by other contents at runtime)
# comments: <nowiki><!-- .... --></nowiki>


'''XML documents are trees'''
(2) '''XML Schema'''
* [[XML Schema]] has the same purpose as DTDs, but allows you to add additional constraints, e.g. you could require that an element should include only a number and that this number should be in the range 1 to 10. XML Schemas are mostly used to describe complex document and data formats, e.g. e-learning standards or Microsoft "dotX" formats.


For a computer person, an XML document is a tree (“boxes within boxes”). Inside a browser or most other clients, the document is represented as a tree-based data structure, the so-called Document Object Model (DOM)
(3) '''Relax NG'''
* is an XML Schema language that represents a sort of compromise between the simplicity of DTDs and the complexity of W3C [[XML Schema]]


Below is a CALS (Docbook) table example, i.e. both an XML markup and a graphic that shows its tree structure.
(4) '''Schematron'''
<source lang="XML">
* is a rule-based validation language for making assertions about the presence or absence of patterns in XML trees. It can be used in addition to other Schema languages.
<TABLE> 
  <TBODY>
    <TR>
      <TD>Pierre Muller</TD>
      <TD>http://pm.com/</TD>
    </TR>
    <TR> <TD>Elisabeth Dupont</TD> <TD></TD> </TR>
  </TBODY>
</TABLE>
</source>


[[image:xml-edit2.png|frame|none|Tree representation of a table display structure]]
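
The "boxes within boxes" idea can also be seen by letting a program walk the document. The following is only a minimal sketch, assuming Python and its standard ''xml.etree.ElementTree'' module (neither is part of the original slides); the helper name ''outline'' is made up for this illustration. It prints one line per element, indented by its depth in the tree.

```python
import xml.etree.ElementTree as ET

# The CALS-like table from the example above
TABLE_XML = """\
<TABLE>
  <TBODY>
    <TR>
      <TD>Pierre Muller</TD>
      <TD>http://pm.com/</TD>
    </TR>
    <TR> <TD>Elisabeth Dupont</TD> <TD></TD> </TR>
  </TBODY>
</TABLE>"""


def outline(element, depth=0):
    """Return one line per element, indented according to its depth."""
    text = (element.text or "").strip()
    line = "  " * depth + element.tag + (": " + text if text else "")
    lines = [line]
    for child in element:  # iterating an element yields its child elements
        lines.extend(outline(child, depth + 1))
    return lines


tree_lines = outline(ET.fromstring(TABLE_XML))
print("\n".join(tree_lines))
```

Each level of indentation in the output corresponds to one level of nesting in the markup, i.e. to one level of the tree shown in the picture.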
== Using DTDs (Document Type Definitions) ==


'''All XML documents must be well-formed'''. XML documents ''can'' be '''valid''' with respect to a '''grammar''' (also called schema, document type, language, etc.). See below for details.
=== Principles ===


===  Well-formed and valid XML documents ===
DTD grammars are just a set of rules that define:


'''Any''' XML document must be at least '''well-formed'''. Well-formed XML documents obey the following rules:
* a '''set of elements''' (tags) and '''their attributes''';
* how elements can be '''combined/embedded''';
* different sorts of '''entities''' (reusable fragments, special characters).


(1) A document must start with an XML declaration (including version number !)
The most important part in a formal XML specification making use of DTDs, is usually the DTD. In addition, other constraints can be added ! In particular:
<?xml version="1.0"?>
* The DTD does not identify the root element ! You have to tell the users what elements can be root elements
You may specify an encoding (default is utf-8). Of course this means that you'll have to stick to that encoding ! Make sure to check your editor's settings.
* Since DTDs can’t express data constraints, you may write out additional ones in a specification document
<?xml version="1.0" encoding="ISO-8859-1"?>  
: e.g. "the value of length attribute is a string composed of a number plus one of "inch", "em", "cm".
<source lang="XML">
<size length="10cm"/>
</source>


(2) XML structure must be hierarchical
'''DTD file association with an XML file'''
* '''start-tags''' and '''end-tags''' must '''match'''
* '''no cross-overs''' as in
<source lang="xml">
  <i>...<b>...</i> .... </b>
</source>
* pay attention to '''case sensitivity''', e.g. "LI" is not "li"
* "EMPTY" tags must use '''self-closing''', e.g. <nowiki><br></br></nowiki> should be written as <nowiki><br/></nowiki>, a lonely <nowiki><br></nowiki> would be illegal


(3) '''Attributes''' must have values and '''values are quoted''':
XML grammars like DTDs and XML Schemas can be directly associated with an XML file. This way, the XML carries information about its content structure that allows any client to verify if it is valid.
: e.g. <a href="http://scholar.google.com"> or <person status="employed">
: e.g. <input type="radio" checked="checked">


(4) A '''single root element''' per document
'''A simple DTD example''' (file "page.dtd")
: Root element opens and closes content
: The root element should not appear in any other element


(5) '''Special characters''': <, &, >, ", and ’. Use one of the five predefined entities:
<source lang="xml">
<source lang="xml">
  &lt; &amp; &gt; &quot; &apos;
  <!ELEMENT page  (title, content, comment?)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT content (#PCDATA)>
<!ELEMENT comment (#PCDATA)>
</source>
</source>
instead of
<source lang="xml">
<, &, >, ", '
</source>
This principle also applies to URLs !!
:bad:  http://truc.unige.ch/programme?bla&machin
:good: http://truc.unige.ch/programme?bla&amp;amp;machin
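
If XML is produced by a program rather than typed by hand, the escaping can be left to a library. The following is a small sketch, assuming Python's standard ''xml.sax.saxutils'' module (an assumption of this example, not something the tutorial relies on). By default, ''escape'' handles &, < and >; the two quote characters are added here through its optional second argument.

```python
from xml.sax.saxutils import escape

# The "bad" URL from above becomes the "good" one
url = "http://truc.unige.ch/programme?bla&machin"
escaped_url = escape(url)

# <, > and & in running text
escaped_text = escape("(x < y) & (y > z)")

# Quotes are only replaced if we ask for it explicitly
QUOTES = {'"': "&quot;", "'": "&apos;"}
escaped_quotes = escape("a \"b\" 'c'", QUOTES)

print(escaped_url)
print(escaped_text)
print(escaped_quotes)
```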


Example of a minimal well-formed XML document:
The following XML document is valid with respect to the grammar defined in "page.dtd" (just above)
<source lang="xml">
<source lang="xml">
  <?xml version="1.0" ?>
  <?xml version="1.0"?>
  <page updated="jan 2007">
  <!DOCTYPE page SYSTEM "page.dtd">
<page>
   <title>Hello friend</title>
   <title>Hello friend</title>
   <content> Here is some content :) </content>  
   <content>Here is some content :)</content>
   <comment> Written by DKS/Tecfa </comment>
   <comment>Written by Anonymous</comment>
  </page>
  </page>
</source>
</source>
This example:
* has an XML declaration on top
* has a root element (i.e. '''page''')
* elements are nested and tags are closed
* the ''updated'' attribute has quoted value
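
Well-formedness can be checked by any XML parser. Below is a minimal sketch, assuming Python and its standard ''xml.etree.ElementTree'' parser; the function name ''check_well_formed'' and the three broken test documents are made up for this illustration. The parser only checks well-formedness here, it does not validate against a DTD.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return (True, "") if xml_text parses, else (False, parser message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return False, str(err)
    return True, ""


GOOD = """<?xml version="1.0"?>
<page updated="jan 2007">
  <title>Hello friend</title>
  <content>Here is some content :)</content>
</page>"""

CROSSED = "<p><i>...<b>...</i> .... </b></p>"   # rule 2: no cross-overs
UNQUOTED = "<person status=employed/>"          # rule 3: values are quoted
TWO_ROOTS = "<page/><page/>"                    # rule 4: a single root element

for name, doc in [("good", GOOD), ("crossed", CROSSED),
                  ("unquoted", UNQUOTED), ("two roots", TWO_ROOTS)]:
    ok, message = check_well_formed(doc)
    print(name, "->", "well-formed" if ok else "NOT well-formed: " + message)
```

The error message usually includes a line and column number, which is what an XML editor uses to point you to the problem.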


=== XML names and CDATA Sections ===
A DTD document contains just definition of rules .... nothing else (see later for explanations). The "page" DTD defines the following:
* a ''page'' element, that must include a ''title'' followed by a ''content'' element and optionally a ''comment'' element.
* the ''title, content and comment'' elements can only include text, i.e. no other tags.
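
To get a feeling for what such rules mean, the content model of ''page'' can be translated by hand into a small check. This is only a sketch, assuming Python; it is '''not''' a DTD validator (the Python standard library does not include one), and the names ''PAGE_MODEL'' and ''matches_page_model'' are invented for this example. A real validating parser or XML editor reads the DTD itself.

```python
import re
import xml.etree.ElementTree as ET

# Hand translation of: <!ELEMENT page (title, content, comment?)>
PAGE_MODEL = re.compile(r"title content( comment)?")


def matches_page_model(xml_text):
    """Check the 'page' rules by hand: child order, and text-only children."""
    root = ET.fromstring(xml_text)
    if root.tag != "page":
        return False
    child_names = " ".join(child.tag for child in root)
    if PAGE_MODEL.fullmatch(child_names) is None:
        return False
    # (#PCDATA) means: character data only, no child elements
    return all(len(child) == 0 for child in root)


VALID = "<page><title>Hello</title><content>Text</content></page>"
WRONG_ORDER = "<page><content>Text</content><title>Hello</title></page>"
NESTED = "<page><title><b>Hello</b></title><content>Text</content></page>"

print(matches_page_model(VALID))
print(matches_page_model(WRONG_ORDER))
print(matches_page_model(NESTED))
```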


Names used for elements should start with a letter and only use letters, numbers, the underscore, the hyphen and the period (no other punctuation marks) !
'''Specification of a markup language. Is a DTD enough ?'''
: Good: <driversLicenceNo> <drivers_licence_no>
: Bad: <driver’s_licence_number> <driver’s_licence_#> <drivers licence number>
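
The naming rule stated above can be written as a regular expression. The sketch below assumes Python and implements only the simplified rule of this tutorial; the XML specification itself is more permissive (for example, it also accepts an underscore as first character and many non-ASCII letters). The function name ''is_simple_xml_name'' is hypothetical.

```python
import re

# A letter first, then letters, digits, underscore, hyphen or period
SIMPLE_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_.-]*")


def is_simple_xml_name(name):
    """True if name follows the simplified naming rule of this tutorial."""
    return SIMPLE_NAME.fullmatch(name) is not None


for candidate in ["driversLicenceNo", "drivers_licence_no",
                  "driver's_licence_#", "drivers licence number"]:
    print(candidate, "->", "good" if is_simple_xml_name(candidate) else "bad")
```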
When you want to display data that includes "XMLish" things like the &lt; sign that should not be interpreted, then you can use so called CDATA Sections:
<source lang="xml">
<example>
  <![CDATA[
  (x < y) is a math expression
]]>
</example>
</source>
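
A parser hands the contents of a CDATA section to the application as plain text. The following sketch, assuming Python's standard ''xml.etree.ElementTree'' module, shows that the &lt; sign survives inside a CDATA section, whereas the same text without the section is not well-formed.

```python
import xml.etree.ElementTree as ET

WITH_CDATA = "<example><![CDATA[(x < y) is a math expression]]></example>"
WITHOUT_CDATA = "<example>(x < y) is a math expression</example>"

# The CDATA markers disappear, the text is kept as is
text = ET.fromstring(WITH_CDATA).text
print(text)

# Without CDATA the unescaped "<" is a syntax error
try:
    ET.fromstring(WITHOUT_CDATA)
    plain_version_parses = True
except ET.ParseError:
    plain_version_parses = False
print(plain_version_parses)
```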


=== Valid XML documents ===
DTDs can’t define what the character data (element contents) and most attribute values should look like. For example, if you require that the user enters a number between 10 and 15 or the name of 15 different capitals, then you would have to use another formalism than DTD.


A valid document must be
We introduce some of the DTD "language" below, but details are explained in the [[DTD tutorial]]. But let us now first systematically describe how a DTD file can be associated with an XML document.


# “well-formed” (see above)
=== Associating a DTD with an XML document ===
# conform to a grammar (also called "schema"), e.g. only use tags defined by the grammar and respect the nesting, ordering and other constraints defined by that grammar.


'''Kinds of XML grammars''':
There are four ways of using a DTD with an XML file:


* '''DTD'''s are part of the XML standard
(1) '''No DTD'''  
* '''XML Schema''' (XSD) is a more recent W3C standard, used to express stronger constraints
* The XML document will just be well-formed, or validation takes place in some other context (e.g. there exist tools that allow you to find out if a given XML document is valid with respect to a given DTD file)
* '''Relax NG''' (RNG, RNC) is an OASIS standard (made by well-known XML experts who don’t like XML Schema ...). It has functionality comparable to XML Schema.
* '''Schematron'''. A complementary standard that is used to define additional constraints that can't be expressed with either XML Schema or Relax NG


=== Name spaces ===
It is possible to use several vocabularies within a well-formed document. If the markup language formally includes compound languages, such documents also can be validated
: E.g. there is a so-called profile for XHtml + SVG + MathML
Now, imagine that you could just mix tags from different languages together. The problem would be that the client application could not know which tags belong to which XML language. Also, there could be so-called naming conflicts (e.g. "title" does not mean the same thing in XHTML and SVG). To address these problems, so-called name-spaces have been invented: one can prefix element and attribute names with a label that represents a '''name space'''.
'''Declaring name spaces for additional vocabularies'''
The "'''xmlns:name_space'''" attribute introduces a new vocabulary. It declares that all elements or attributes prefixed by "''name_space''" belong to a different vocabulary
Syntax:
:xmlns:''name_space''="URL_name_of_name_space"
'''SVG within XHTML example'''
<source lang="xml">
<source lang="xml">
  <html xmlns:svg="http://www.w3.org/2000/svg">
  <?xml version="1.0" standalone="yes"?>
    <svg:rect x="50" y="50" rx="5" ry="5" width="200" height="100" ....
<hello> Hello XML et hello cher lecteur ! </hello>
</source>
</source>


: '''xmlns:svg''' = "..." means that '''svg:''' prefixed elements are part of SVG
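
What a namespace-aware program actually sees is the namespace name, not the prefix. The sketch below assumes Python's standard ''xml.etree.ElementTree'' module, which writes a qualified element name as ''{namespace}localname''; the helper ''make_doc'' and the alternative prefix ''shape'' are made up for this illustration.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def make_doc(prefix):
    """Build a tiny HTML-like document that embeds one SVG rect."""
    return (
        '<html xmlns:%s="%s">' % (prefix, SVG_NS)
        + '<%s:rect x="50" y="50" width="200" height="100"/>' % prefix
        + "<title>Demo</title></html>"
    )


root = ET.fromstring(make_doc("svg"))
tags = [child.tag for child in root]
print(tags)

# Searching also needs a prefix-to-namespace mapping
rects = root.findall("svg:rect", {"svg": SVG_NS})
print(len(rects))

# A different prefix bound to the same namespace gives the same names
other_tags = [child.tag for child in ET.fromstring(make_doc("shape"))]
print(other_tags == tags)
```

The prefix is therefore just a local abbreviation; two documents that use different prefixes for the same namespace name mean the same thing.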


'''Xlink example''':
(2) '''DTD rules are defined inside the XML document'''
 
* We get a "standalone" document (the XML document is self-sufficient)
XLink is a language to define links (only works with Firefox-based browsers)
* Notice the use of brackets [....]


<source lang="xml">
<source lang="xml">
<RECIT xmlns:xlink="http://www.w3.org/1999/xlink">
  <?xml version="1.0" standalone="yes"?>
<INFOS>
  <!DOCTYPE hello [
  <Date>30 octobre 2003 - </Date>
    <!ELEMENT hello (#PCDATA)>
  <Auteur>DKS - </Auteur>
    ]>
  <A xlink:href="http://jigsaw.w3.org/css-validator/check/referer"
  <hello> Hello XML et hello dear readers ! </hello>
      xlink:type="simple">CSS Validator</A>
  </INFOS>
</source>
</source>


'''Namespace declaration for the main vocabulary'''


The main vocabulary can be introduced by an attribute like:
(3) '''Private/System DTDs'''
  ''xmlns="URL_name_of_name_space"''
* the DTD is located on the system (own computer or the Internet).
Some specifications (e.g. SVG or XHTML) require a name space declaration in any case (even if you do not use any other vocabulary) !
* That's what you are going to use when you write your own DTDs.


'''SVG namespace example'''
<source lang="xml">
<source lang="xml">
  <svg xmlns="http://www.w3.org/2000/svg">
  <?xml version="1.0" ?>
    <rect x="50" y="50" rx="5" ry="5" width="200" height="100" ....
<!DOCTYPE hello SYSTEM "hello.dtd">
<hello> This is a very simple XML document </hello>
</source>
</source>


'''What are Namespace URLs''' ?


URLs that define namespaces are '''just names''', there doesn’t need to be a real link. E.g. for your own purposes you could very well make up something like:
(4) '''Public DTDs'''
* We use a name for the DTD. This means that both your XML editor and user software know the DTD. This is the strategy used for common Web DTDs like XHTML, SVG, MathML, etc.
* The naming convention also allows for a fallback URI that should include the physical DTD file.


<source lang="xml">
<source lang="xml">
 <account xmlns:pin = "http://joe.miller.com/pin">
 <?xml version="1.0"?>
  <pin:name>Joe</pin:name>
<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN"
</account>
  "http://my.netscape.com/publish/formats/rss-0.91.dtd">
</source>
  <rss version="0.91">
... and the URL http://joe.miller.com/pin doesn’t need to exist for real.
  <channel> ...... </channel>
 
  </rss>
=== XML with style ===
 
XML per se doesn't say anything about display and style, however:
 
* Some languages like [[HTML]] or [[SVG]] or [[X3D]] do have built-in rendering mechanisms
* XML documents can be associated with a CSS stylesheet for rendering in a web browser. However, using CSS only makes sense when the XML is text-centric and contents are embedded within tags (as opposed to attributes). Read the [[CSS for XML tutorial]] if you want to learn more.
* [[XSLT Tutorial - Basics|XSLT]] allows you to translate an XML document into something else, e.g. you could translate your own little XML language into HTML or SVG for display.
* Other specialized styling languages exist, like XSL-FO for producing print documents.
 
== Using DTDs (Document Type Definitions) ==
 
DTD grammars are just a set of rules that define:
 
* a '''set of elements''' (tags) and '''their attributes''';
* how elements can be '''combined/embedded''';
* different sorts of '''entities''' (reusable fragments, special characters).
 
'''Specification of a markup language. Is a DTD enough ?'''
 
DTDs can’t define what the character data (element contents) and most attribute values should look like. For example, one could require that the user enters a number between 10 and 15 or the name of 15 different capitals.
 
The most important part in a formal XML specification making use of DTDs, is usually the DTD. In addition, other constraints can be added ! In particular:
* The DTD does not identify the root element ! You have to tell the users what elements can be root elements
* Since DTDs can’t express data constraints, you may write out additional ones in a specification document
: e.g. "the value of length attribute is a string composed of a number plus one of "inch", "em", "cm".
<source lang="XML">
 <size length="10cm"/>
</source>
 
'''A simple DTD example'''
 
<source lang="xml">
 <!ELEMENT page  (title, content, comment?)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT content (#PCDATA)>
<!ELEMENT comment (#PCDATA)>
</source>
 
A DTD document contains just definition of rules .... nothing else (see later for explanations)
 
=== Using a DTD with an XML document ===
 
A valid XML document '''may include a declaration that identifies a DTD to be used'''. Therefore, the <!DOCTYPE...> declaration is part of the XML file, '''not''' of the DTD ....
 
'''Example of an XML file with a DTD declaration'''
 
<source lang="xml">
  <?xml version="1.0" encoding="ISO-8859-1" ?>
<!DOCTYPE hello SYSTEM "hello.dtd">
</source>
</source>


There are four ways of using a DTD, each one for a different use context:
=== Syntax of the DTD declaration in the XML document ===
 
(1) No DTD (XML document will just be well-formed)
 
(2) DTD rules are defined '''inside''' the XML document
 
: In that case, we get a "standalone" document (the XML document is self-sufficient)
 
(3) "Private/System" DTDs, the DTD is located on the system (own computer or the Internet)
 
: That’s what '''you''' are going to use when you write your own DTDs
 
(4) Public DTDs, i.e. we use an official name for the DTD.
 
: This implies that both your XML editor and the user software know the DTD. It's a strategy used for common Web technology DTDs like XHTML, SVG, MathML, etc.
 
'''Where to insert the DTD declaration?'''


A DTD is always declared on top of the file after the XML declaration. All XML declarations, DTD declaration etc. are part of the so-called '''prologue'''.
The syntax rules are fairly simple and can be understood from looking at the example above, and you may skip this section ....
 
'''Syntax of the DTD declaration in the XML document'''


(1) Every DTD declaration must start with
(1) Every DTD declaration must start with
  <!DOCTYPE .... >
  <!DOCTYPE .... >


(2) Then, the root element must be specified next. Remember that DTDs don’t know their root element, root is defined in the XML document ! DTDs must define this root element just like any other element ! In some cases, DTDs are meant to be used in different ways, i.e. several elements could be used as root elements.
(2) Then, '''the root element must be specified next'''. Remember that DTDs don’t know their root element, root is defined in the XML document ! DTDs must define this root element just like any other element ! In some cases, DTDs are meant to be used in different ways, i.e. several elements could be used as root elements.
   <!DOCTYPE hello .... >
   <!DOCTYPE hello .... >


(3) The next elements of the DTD declaration are different according to the DTD type (public or private)
(3) The next elements of the DTD declaration are different according to the DTD type (public or private)


(1) Syntax for internal DTDs (only !). DTD rules are inserted between brackets [ ... ]
(a) Syntax for internal DTDs (only !). DTD rules are inserted between brackets [ ... ]
 
<source lang="xml">
<source lang="xml">
  <!DOCTYPE hello [
  <!DOCTYPE hello [
Line 301: Line 185:
</source>
</source>


(2) Syntax to define "private" external DTDs: The DTD is identified by the URL after the "'''SYSTEM'''" keyword
(b) Syntax to define "private" external DTDs: The DTD is identified by the URL after the "'''SYSTEM'''" keyword
 
<source lang="xml">
<source lang="xml">
  <!DOCTYPE hello SYSTEM "hello.dtd">
  <!DOCTYPE hello SYSTEM "hello.dtd">
</source>
</source>
 
Example using an URL
(3) Syntax for public DTDs: After the "PUBLIC" keyword you have to specify an official name and a backup URL that a validator could use. For example:
<source lang="xml">
<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN"
  "http://my.netscape.com/publish/formats/rss-0.91.dtd">
</source>
 
=== Some examples ===
 
The DTD file itself does not contain any DTD declaration, just rules. Below are some examples of XML documents with DTD declarations:
 
'''Hello XML without DTD'''
 
<source lang="xml">
<?xml version="1.0" standalone="yes"?>
<hello> Hello XML et hello cher lecteur ! </hello>
</source>
 
'''Hello XML with an internal DTD'''
 
<source lang="xml">
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE hello [
    <!ELEMENT hello (#PCDATA)>
    ]>
<hello> Hello XML et hello chère lectrice ! </hello>
</source>
 
'''Hello XML with an external DTD'''
 
<source lang="xml">
<source lang="xml">
<?xml version="1.0" encoding="ISO-8859-1" ?>
  <!DOCTYPE hello SYSTEM "http://tecfa.unige.ch/guides/xml/examples/simple/hello-page.dtd">
  <!DOCTYPE hello SYSTEM "hello.dtd">
<hello> Hello XèMèLè et hello cher lectrice ! </hello>
</source>
</source>


That’s what you should do with your own home-made DTDs
(c) Syntax for public DTDs: After the "PUBLIC" keyword you have to specify an official name and a backup URL that a validator could use. For example:
 
'''XML with a public external DTD (RSS 0.91)'''


<source lang="xml">
<source lang="xml">
<?xml version="1.0" encoding="ISO-8859-1"?>
  <!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN"
  <!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN"
  "http://my.netscape.com/publish/formats/rss-0.91.dtd">
  "http://my.netscape.com/publish/formats/rss-0.91.dtd">
<rss version="0.91">
<channel> ...... </channel>
</rss>
</source>
</source>
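
The three parts of a DTD declaration (root element name, public name, system URL) can be read back by a program. The following is a minimal sketch, assuming Python's standard ''xml.dom.minidom'' module; the function name ''doctype_info'' is invented for this example. The parser only reports what the declaration says; in this sketch it neither loads the DTD nor validates the document.

```python
from xml.dom import minidom


def doctype_info(xml_text):
    """Return (root name, public id, system id), or None without a DOCTYPE."""
    doctype = minidom.parseString(xml_text).doctype
    if doctype is None:
        return None
    return doctype.name, doctype.publicId, doctype.systemId


NO_DTD = "<hello>Hello XML</hello>"

SYSTEM_DTD = """<?xml version="1.0"?>
<!DOCTYPE hello SYSTEM "hello.dtd">
<hello>This is a very simple XML document</hello>"""

PUBLIC_DTD = """<?xml version="1.0"?>
<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN"
  "http://my.netscape.com/publish/formats/rss-0.91.dtd">
<rss version="0.91"><channel/></rss>"""

for doc in (NO_DTD, SYSTEM_DTD, PUBLIC_DTD):
    print(doctype_info(doc))
```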


Line 362: Line 210:
Below is a simple XML document of type <page>:
Below is a simple XML document of type <page>:
<source lang="xml">
<source lang="xml">
<?xml version="1.0"?>
  <page>
  <page>
   <title>Hello friend</title>
   <title>Hello friend</title>
Line 382: Line 231:
</source>
</source>


[[image:xml-edit-3.png|frame|none|Simple page DTD explained]]
First, it defines a page element that must include a title element, a content element, and optionally a comment element.
Second, each of these sub-elements can only include text data, i.e. no other elements.
 
[[image:xml-edit-3.png|thumb|758px|none|Simple page DTD explained]]


=== Schemas for recipes ===
=== Schemas for recipes ===


Recipes are very popular in XML education. Let's first look at a quite simple example, originally published by Jay Greenspan (dead link)
Recipes are very popular in XML education.
 
'''Take one'''
Let's first look at a quite simple example, originally published by Jay Greenspan (dead link)
<source lang="xml">
<source lang="xml">
<?xml version="1.0"?>
<?xml version="1.0"?>
Line 406: Line 261:
</ingredients>
</ingredients>
<directions>
<directions>
Preheat oven to 350 degrees. Melt butter; combine with brown sugar and vanilla in large
Preheat oven to 350 degrees. Melt butter; combine with brown sugar and
mixing bowl. Set aside to cool. Combine flour, baking powder, and salt; set aside. Add
vanilla in large mixing bowl. Set aside to cool. Combine flour, baking
eggs to cooled sugar mixture; beat well. Stir in reserved dry ingredients, nuts, and
powder, and salt; set aside. Add eggs to cooled sugar mixture; beat
chips.
well. Stir in reserved dry ingredients, nuts, and chips. Spread in
Spread in greased 13-by-9-inch pan. Bake for 25 to 30 minutes until golden brown; cool.
greased 13-by-9-inch pan. Bake for 25 to 30 minutes until golden
Cut into squares.
brown; cool. Cut into squares.
</directions>
</directions>
</recipe>
</recipe>
Line 419: Line 274:


The DTD would look like this
The DTD would look like this
[[image:xml-edit-4.png|frame|none|A simple recipe DTD]]
[[image:xml-edit-4.png|thumb|758px|none|A simple recipe DTD]]
 
'''Take two'''


Below is half-filled in example of a sligthly more complex recipe list in XML.  
Below is a half-filled-in example of a slightly more complex recipe list in XML. As you can see, this example uses a more nested structure. For example, author, date, and version are children of a ''meta'' element. ''Directions'' includes a ''para'' element, i.e. a kind of formatting instruction which is meant to produce more legible text.


<source lang="xml">
<source lang="xml">
Line 445: Line 302:
     </ingredients>
     </ingredients>
     <directions>
     <directions>
     <para>Cut the vegies into little pieces. Then boil with water. Add some salt and pepper</para>
     <para>Cut the vegies into little pieces. Then boil with
    water. Add some salt and pepper</para>
     </directions>
     </directions>
   </recipe>
   </recipe>
Line 458: Line 316:
   list = a list of recipes
   list = a list of recipes
   recipe = container for a recipe
   recipe = container for a recipe
   meta = Metainformation: must include author of this file, date, version in this order
   meta = Metainformation: must include author of this file,  
        date, version in this order
   recipe_author = optional name of recipe author
   recipe_author = optional name of recipe author
   meal = title of meal
   meal = title of meal
Line 466: Line 325:


  <!ELEMENT list (recipe+)>
  <!ELEMENT list (recipe+)>
  <!ELEMENT recipe (meta, recipe_author?, recipe_name, meal, ingredients, directions)>
  <!ELEMENT recipe (meta, recipe_author?, recipe_name, meal,  
                  ingredients, directions)>
  <!ELEMENT meta (author, date, version)>
  <!ELEMENT meta (author, date, version)>
  <!ELEMENT version (#PCDATA)>
  <!ELEMENT version (#PCDATA)>
Line 487: Line 347:


<source lang="xml">
<source lang="xml">
<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml version="1.0"?>


<!-- DTD to write simple stories
<!-- DTD to write simple stories
Line 523: Line 383:
'''Below is a short story'''
'''Below is a short story'''


<source lang="xml">
<source lang="xml" enclose="div">
<?xml version="1.0" encoding="ISO-8859-1" ?>
<?xml version="1.0"?>
<!DOCTYPE STORY SYSTEM "story-grammar.dtd">
<!DOCTYPE STORY SYSTEM "story-grammar.dtd">
<?xml-stylesheet href="story-grammar.css" type="text/css"?>
<?xml-stylesheet href="story-grammar.css" type="text/css"?>
<STORY xmlns:xlink="http://www.w3.org/1999/xlink">
<STORY xmlns:xlink="http://www.w3.org/1999/xlink">
   <title>The little Flexer</title>
   <title>The little Flexer</title>
   <context>Once upon a time, in a dark small office.</context>
   <context>Once upon a time, in a dark small office.</context>
   <problem>Kaspar was trying to learn Flex but didn't have a real project. He then decided that it would be a good idea to look at Data-Driven Controls. These are most useful in combination with an external datasources in XML format.</problem>
 
   <goal>So he decided how to write a mx:Tree application that imports XML data.</goal>
   <problem>Kaspar was trying to learn Flex but didn't have a real
  project. He then decided that it would be a good idea to look at
  Data-Driven Controls. These are most useful in combination with an
  external datasources in XML format.</problem>
    
<goal>So he decided how to write a mx:Tree application that imports
XML data.</goal>
   <THREADS>
   <THREADS>


Line 537: Line 405:
       <subgoal>He decided to play with a little example.</subgoal>
       <subgoal>He decided to play with a little example.</subgoal>
       <ATTEMPT>
       <ATTEMPT>
<action>So he went to see the LiveDocs and copied an example.</action>
<action>So he went to see the LiveDocs and copied an
example.</action>
       </ATTEMPT>
       </ATTEMPT>
       <result>The example worked but he didn't understand why since he didn't know about E4X.</result>
       <result>The example worked but he didn't understand why since he
      didn't know about E4X.</result>
     </EPISODE>
     </EPISODE>


     <EPISODE>
     <EPISODE>
       <subgoal>He then decided to learn e4X first.
       <subgoal>He then decided to learn e4X first.</subgoal>
</subgoal>
       <ATTEMPT>
       <ATTEMPT>
<action>
<action>
Reading 2-3 tutorials and creating a simple example only took 2-3 hours.
  Reading 2-3 tutorials and creating a simple example only took
  2-3 hours.
         </action>
         </action>
       </ATTEMPT>
       </ATTEMPT>
       <result>
       <result>
      He now understood how to write e4X code in Flex.
He now understood how to write e4X code in Flex.
 
      </result>
</result>
     </EPISODE>
     </EPISODE>
   </THREADS>
   </THREADS>
 
 
   <moral>Divide a problem into subproblems and you will get there ...</moral>
   <moral>Divide a problem into subproblems and you will get there ...</moral>
   <INFOS>
   <INFOS>
Line 564: Line 433:
</STORY>
</STORY>
</source>
</source>
The story grammar is a text-centric DTD. Therefore, it can easily be styled with CSS. You can look at the file
The story grammar is a text-centric DTD. Therefore, it can easily be styled with CSS. You can look at the file
[http://tecfa.unige.ch/guides/xml/examples/recit/story-grammar.xml story-grammar.xml] and also consult [http://tecfa.unige.ch/guides/xml/examples/recit/story-grammar.css story-grammar.css].
[http://tecfa.unige.ch/guides/xml/examples/recit/story-grammar.xml story-grammar.xml] and also consult [http://tecfa.unige.ch/guides/xml/examples/recit/story-grammar.css story-grammar.css].
Line 570: Line 438:
=== A simple family DTD ===
=== A simple family DTD ===


[[image:xml-edit-6.png|frame|none|Simple family DTD]]
[[image:xml-edit-6.png|thumb|758px|none|Simple family DTD]]


'''A valid XML file'''
'''A valid XML file'''
Line 590: Line 458:
  <!ELEMENT rss (channel)>
  <!ELEMENT rss (channel)>
  <!ATTLIST rss version CDATA #REQUIRED> <!-- must be "0.91"> -->
  <!ATTLIST rss version CDATA #REQUIRED> <!-- must be "0.91"> -->
  <!ELEMENT channel (title | description | link | language | item+ | rating? | image? | textinput? | copyright? | pubDate? | lastBuildDate? | docs? | managingEditor? | webMaster? | skipHours? | skipDays?)*>
  <!ELEMENT channel (title | description | link | language | item+ | rating? |  
          image? | textinput? | copyright? | pubDate? | lastBuildDate? |  
          docs? | managingEditor? | webMaster? | skipHours? | skipDays?)*>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT description (#PCDATA)>
  <!ELEMENT description (#PCDATA)>
Line 632: Line 502:
     <item>
     <item>
       <title>http://www.course.com/</title>
       <title>http://www.course.com/</title>
       <description>You can find Thomson text-books materials (exercise data) on this web site</description>
       <description>You can find Thomson text-books materials
      (exercise data) on this web site</description>
       <link>http://www.course.com/</link>
       <link>http://www.course.com/</link>
     </item>
     </item>
Line 674: Line 545:
|}
|}


== Understanding DTD entities ==
'''Understanding DTD entities'''
 
Most professional DTDs use entities. Entities are just symbols that contain some information which substitutes when the symbol is used ...


There exist tow kinds of entities: XML entities and DTD entities
Most professional DTDs use so-called entities. Entities are just symbols that contain some information which substitutes when the symbol is used.


'''DTD entities'''
'''DTD entities''': Some more complex DTD use the same structures all over. Instead of typing these several times one can use a ENTITY construction like this:
 
Some more complex DTD use the same structures all over. Instead of typing these several times one can use a ENTITY construction like this:


  <!ENTITY % Content "(Para | List | Listing)*">
  <!ENTITY % Content "(Para | List | Listing)*">
Line 697: Line 564:


... think of these entities as shortcuts.
... think of these entities as shortcuts.
Note: There also exist tow kinds of entities XML entities. XML entities allow to define an XML fragment of text and then to include it later.


== Choosing and using an XML Editor ==
== Choosing and using an XML Editor ==
Line 741: Line 610:
* '''Download''': [[http://code.google.com/p/exchangerxml/ Download at Google]] (multi-platform, needs [http://www.java.com/ java] to be installed first).
* '''Download''': [[http://code.google.com/p/exchangerxml/ Download at Google]] (multi-platform, needs [http://www.java.com/ java] to be installed first).


'''Hints for editing witch Exchanger'''
'''Hints for editing with Exchanger'''


To insert an element or attribute:
To insert an element or attribute:
Line 767: Line 636:
* Most XML editors are written in Java an rely on the "Java RunTime engine". Both websites of the recommended editors above give you a choice: Download an editor with or without Java. If you don't have Java installed on your own PC, I suggest taking it '''first''' from [http://www.java.com/ http://www.java.com/] ... and then always download the "no java vm" versions of the editor software
* Most XML editors are written in Java an rely on the "Java RunTime engine". Both websites of the recommended editors above give you a choice: Download an editor with or without Java. If you don't have Java installed on your own PC, I suggest taking it '''first''' from [http://www.java.com/ http://www.java.com/] ... and then always download the "no java vm" versions of the editor software
* To test if you have java, open a command terminal and type "Java". To open a command terminal under Windows: Start Menu -> Execute and then type "cmd".
* To test if you have java, open a command terminal and type "Java". To open a command terminal under Windows: Start Menu -> Execute and then type "cmd".
== Links ==
; About XML
* [[XML principles]]
* [http://en.wikipedia.org/wiki/XML XML] (Wikipedia)
* [http://www.w3schools.com/xml/ XML Tutorial] (W3Schools)
; About DTDs
* [http://en.wikipedia.org/wiki/Document_Type_Definition Document Type Definition] (Wikipedia)
* [http://www.w3schools.com/dtd/default.asp DTD tutorial] (W3Schools)
; Related subjects
* [http://en.wikipedia.org/wiki/XML_Schema_(W3C) XML Schema (W3C)] (Wikipedia)
* [http://en.wikipedia.org/wiki/RELAX_NG RELAX NG] (Wikipedia)
* [http://en.wikipedia.org/wiki/XML_Schema_Language_comparison XML Schema languages] (Wikipedia)
* [http://en.wikipedia.org/wiki/Schematron Schematron] (Wikipedia)


[[Category: XML]]
[[Category: XML]]

Latest revision as of 19:35, 22 August 2016

Introduction

This is a beginner's tutorial on XML editing, made from slides.

Learning goals

  • Be able to somewhat understand Document Type Definition (DTD) schemas
  • Understand the necessity of using an XML editor
  • Be able to edit XML without hand-editing tags, profit from editors that have DTD and Schema awareness (most do not !)
  • Be able to check well-formedness and validate
  • Be able to fix well-formedness and validity errors
Prerequisites
Next steps

Recall of XML principles

Let us recall some principles that you also may have read in the XML principles article. In particular:

  1. An XML document is a hierarchical structure
  2. Syntax must be well-formed (all tags closed, etc.)
  3. Special XML characters like the < and the > must be dealt with in a special way
  4. Content may be validated by a schema (aka grammar)
  5. Often, more than one XML language appears in a document. In that case, so-called namespaces must be used
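
As a small illustration of points 2 and 3, the following sketch shows a well-formed document in which the special characters are either written as entity references or wrapped in a CDATA section. The element names (note, title, body, code) are invented for this example.

```xml
<?xml version="1.0"?>
<!-- Hypothetical example: element names are made up for illustration -->
<note>
  <title>Special characters</title>
  <!-- &lt; &gt; and &amp; stand for the characters <, > and & -->
  <body>In XML text, write 3 &lt; 5, 7 &gt; 2 and Tom &amp; Jerry.</body>
  <!-- Inside a CDATA section the characters can be typed as they are -->
  <code><![CDATA[if (a < b && b > c) { swap(); }]]></code>
</note>
```

Every opening tag has a matching closing tag and the elements nest properly, so an XML editor should report this file as well-formed.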

Defining XML languages

Many XML languages are defined with so-called schemas, i.e. grammars that define elements (tags) and attributes and how they can be combined. There exist several schema formalisms. Other languages are defined with a simple textual description, e.g. the well-known RSS 0.9 syndication language. Often a language is defined using both schemas and text, e.g. HTML and SVG define the main structure with a DTD but add extra constraints for certain elements and attributes through simple descriptions. A good example is measures: a length can be expressed in m, cm, in, pt, px, %, etc. and that cannot be defined with the simple DTD language.

There are four more or less popular schema languages:

(1) Document Type Definitions (DTDs)

  • A DTD uses a terse formal syntax to declare which elements and attributes may appear in a document. One can define how elements nest within other elements, which attributes an element may carry, and a few very simple data types for attributes.

(2) XML Schema

  • XML Schema has the same purpose as DTDs, but allows you to add additional constraints, e.g. you could require that an element should include only a number and that this number should be in the range of 1 to 10. XML Schemas are mostly used to describe complex document and data formats, e.g. e-learning standards or Microsoft "dotX" formats.

(3) Relax NG

  • is an XML Schema language that represents a sort of compromise between the simplicity of DTDs and the complexity of W3C XML Schema

(4) Schematron

  • is a rule-based validation language for making assertions about the presence or absence of patterns in XML trees. It can be used in addition to other Schema languages.
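
To give a feeling for how a schema language other than DTD looks, here is a minimal Relax NG sketch in its XML syntax. It describes a hypothetical page element with a title, a content and an optional comment child; the element names are chosen for illustration only.

```xml
<?xml version="1.0"?>
<!-- Relax NG (XML syntax): a page contains title, content
     and optionally a comment, in this order -->
<element name="page" xmlns="http://relaxng.org/ns/structure/1.0">
  <element name="title"><text/></element>
  <element name="content"><text/></element>
  <optional>
    <element name="comment"><text/></element>
  </optional>
</element>
```

Note that the schema is itself an XML document, which is one difference from DTDs.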

Using DTDs (Document Type Definitions)

Principles

DTD grammars are just a set of rules that define:

  • a set of elements (tags) and their attributes;
  • how elements can be combined/embedded;
  • different sorts of entities (reusable fragments, special characters).

In a formal XML specification that makes use of DTDs, the DTD itself is usually the most important part. In addition, other constraints can be added ! In particular:

  • The DTD does not identify the root element ! You have to tell the users what elements can be root elements
  • Since DTDs can’t express data constraints, you may write out additional ones in a specification document,
e.g. "the value of the length attribute is a string composed of a number plus one of inch, em or cm":
 <size length="10cm">

DTD file association with an XML file

XML grammars like DTDs and XML Schemas can be directly associated with an XML file. This way, the XML carries information about its content structure that allows any client to verify if it is valid.

A simple DTD example (file "page.dtd")

 <!ELEMENT page  (title, content, comment?)>
 <!ELEMENT title (#PCDATA)>
 <!ELEMENT content (#PCDATA)>
 <!ELEMENT comment (#PCDATA)>

The following XML document is valid with respect to the grammar defined in "page.dtd" (just above):

 <?xml version="1.0"?>
 <!DOCTYPE page SYSTEM "page.dtd">
 <page>
  <title>Hello friend</title>
  <content>Here is some content :)</content>
  <comment>Written by Anonymous</comment>
 </page>

A DTD document contains just rule definitions, nothing else (see later for explanations). The "page" DTD defines the following:

  • a page element, that must include a title followed by a content element and optionally a comment element.
  • the title, content and comment elements can only include text, i.e. no other tags.
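
To see the difference between well-formed and valid, consider the following sketch. It embeds the same four rules directly in the document and then breaks one of them. All tags are properly closed, so the file is well-formed, but a validating editor should flag it as invalid.

```xml
<?xml version="1.0"?>
<!DOCTYPE page [
  <!ELEMENT page  (title, content, comment?)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT content (#PCDATA)>
  <!ELEMENT comment (#PCDATA)>
]>
<!-- Well-formed, but NOT valid: content comes before title,
     whereas the DTD requires title first -->
<page>
  <content>Here is some content :)</content>
  <title>Hello friend</title>
</page>
```

Swapping the two child elements makes the document valid again.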

Specification of a markup language. Is a DTD enough ?

DTDs can’t define what the character data (element contents) and most attribute values should look like. For example, if you require that the user enters a number between 10 and 15 or the name of one of 15 capitals, then you would have to use a formalism other than DTD.
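
For comparison, here is how the "number between 10 and 15" constraint could be written in XML Schema. This is only a sketch: the element name level is hypothetical, and XML Schema itself is covered in a separate tutorial.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical element "level": an integer from 10 to 15 -->
  <xs:element name="level">
    <xs:simpleType>
      <xs:restriction base="xs:integer">
        <xs:minInclusive value="10"/>
        <xs:maxInclusive value="15"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>
```

A DTD could only declare level as (#PCDATA), i.e. accept any text.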

We introduce some of the DTD "language" below; details are explained in the DTD tutorial. Let us first describe systematically how a DTD file can be associated with an XML document.

Associating a DTD with an XML document

There are four ways of using a DTD with an XML file:

(1) No DTD

  • The XML document will just be well-formed, or validation takes place in some other context (e.g. there exist tools that allow you to find out if a given XML document is valid with respect to a given DTD file)
 <?xml version="1.0" standalone="yes"?>
 <hello> Hello XML and hello dear reader ! </hello>


(2) DTD rules are defined inside the XML document

  • We get a "standalone" document (the XML document is self-sufficient)
  • Notice the use of brackets [....]
  <?xml version="1.0" standalone="yes"?>
  <!DOCTYPE hello [
     <!ELEMENT hello (#PCDATA)>
     ]>
  <hello> Hello XML and hello dear readers ! </hello>


(3) Private/System DTDs

  • the DTD is located on the system (own computer or the Internet).
  • That's what you are going to use when you write your own DTDs.
 <?xml version="1.0" ?>
 <!DOCTYPE hello SYSTEM "hello.dtd">
 <hello> This is a very simple XML document </hello>


(4) Public DTDs

  • We use a name for the DTD. This means that both your XML editor and user software know the DTD. This is the strategy used for common Web DTDs like XHTML, SVG, MathML, etc.
  • The naming convention also allows for a fallback URI that should point to the physical DTD file.
 <?xml version="1.0"?>
 <!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN"
  "http://my.netscape.com/publish/formats/rss-0.91.dtd">
 <rss version="0.91">
 <channel> ...... </channel>
 </rss>
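
Another public DTD that you are likely to meet is XHTML 1.0 Strict. The sketch below shows a minimal document using its public identifier and the fallback URL published by the W3C; the title and paragraph text are placeholders.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Hello</title>
  </head>
  <body>
    <p>Hello XHTML</p>
  </body>
</html>
```

The first quoted string is the public name, the second one is the system identifier (the URL of the DTD file).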

Syntax of the DTD declaration in the XML document

The syntax rules are fairly simple and can be understood from looking at the examples above, so you may skip this section.

(1) Every DTD declaration must start with

<!DOCTYPE .... >

(2) The name of the root element must come next. Remember that DTDs don’t know their root element; the root is declared in the XML document ! The DTD must still define this root element just like any other element ! Some DTDs are meant to be used in different ways, i.e. several elements could serve as root elements.

 <!DOCTYPE hello .... >

(3) The next elements of the DTD declaration are different according to the DTD type (internal, private or public)

(a) Syntax for internal DTDs (only !). DTD rules are inserted between brackets [ ... ]

 <!DOCTYPE hello [
  <!ELEMENT hello (#PCDATA)>
 ]>

(b) Syntax to define "private" external DTDs: The DTD is identified by the URL after the "SYSTEM" keyword

 <!DOCTYPE hello SYSTEM "hello.dtd">

Example using a URL

 <!DOCTYPE hello SYSTEM "http://tecfa.unige.ch/guides/xml/examples/simple/hello-page.dtd">

(c) Syntax for public DTDs: After the "PUBLIC" keyword you have to specify an official name and a backup URL that a validator could use. For example:

 <!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN"
  "http://my.netscape.com/publish/formats/rss-0.91.dtd">

Understanding DTDs by example

Below we will present a few DTDs of increasing complexity.

Hello text with XML

Below is a simple XML document of type <page>:

<?xml version="1.0"?>
 <page>
  <title>Hello friend</title>
  <content>
       Here is some content :)
  </content> 
  <comment>
       Written by DKS/Tecfa, adapted from S.M./the Cocoon samples
  </comment>
 </page>

The following DTD could validate the document:

<!ELEMENT page (title, content, comment?)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT content (#PCDATA)>
<!ELEMENT comment (#PCDATA)>

First, it defines a page element that must include a title element, a content element, and optionally a comment element. Second, each of these sub-elements can only include text data, i.e. no other elements.

(Figure: Simple page DTD explained)

Schemas for recipes

Recipes are very popular in XML education.

Take one

Let's first look at a quite simple example, originally published by Jay Greenspan (dead link)

<?xml version="1.0"?>
<!DOCTYPE list SYSTEM "simple_recipe.dtd">
<list>
<recipe>
<author>Carol Schmidt</author>
<recipe_name>Chocolate Chip Bars</recipe_name>
<meal>Dinner
<course>Dessert</course>
</meal>
<ingredients>
<item>2/3 C butter</item> <item>2 C brown sugar</item>
<item>1 tsp vanilla</item> <item>1 3/4 C unsifted all-purpose flour</item>
<item>1 1/2 tsp baking powder</item>
<item>1/2 tsp salt</item> <item>3 eggs</item>
<item>1/2 C chopped nuts</item>
<item>2 cups (12-oz pkg.) semi-sweet choc. chips</item>
</ingredients>
<directions>
Preheat oven to 350 degrees. Melt butter; combine with brown sugar and
vanilla in large mixing bowl. Set aside to cool. Combine flour, baking
powder, and salt; set aside. Add eggs to cooled sugar mixture; beat
well. Stir in reserved dry ingredients, nuts, and chips.  Spread in
greased 13-by-9-inch pan. Bake for 25 to 30 minutes until golden
brown; cool.  Cut into squares.
</directions>
</recipe>
</list>


The DTD would look like this:

(Figure: A simple recipe DTD)
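
Since the figure is not reproduced in this text, here is one possible DTD that would accept the recipe above. It is a sketch reconstructed from the XML document, not necessarily identical to the original simple_recipe.dtd. The rules are placed in an internal subset and the recipe is shortened, so that the example can be validated as a single file.

```xml
<?xml version="1.0"?>
<!DOCTYPE list [
  <!ELEMENT list (recipe+)>
  <!ELEMENT recipe (author, recipe_name, meal, ingredients, directions)>
  <!ELEMENT author (#PCDATA)>
  <!ELEMENT recipe_name (#PCDATA)>
  <!-- meal mixes text ("Dinner") with course child elements -->
  <!ELEMENT meal (#PCDATA | course)*>
  <!ELEMENT course (#PCDATA)>
  <!ELEMENT ingredients (item+)>
  <!ELEMENT item (#PCDATA)>
  <!ELEMENT directions (#PCDATA)>
]>
<list>
  <recipe>
    <author>Carol Schmidt</author>
    <recipe_name>Chocolate Chip Bars</recipe_name>
    <meal>Dinner <course>Dessert</course></meal>
    <ingredients>
      <item>2/3 C butter</item>
      <item>2 C brown sugar</item>
    </ingredients>
    <directions>Preheat oven to 350 degrees.</directions>
  </recipe>
</list>
```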

Take two

Below is a half-filled example of a slightly more complex recipe list in XML. As you can see, this example uses a more nested structure. For example, author, date, and version are children of a meta element. Directions includes a para element, i.e. a kind of formatting instruction which is meant to produce more legible text.

<?xml version="1.0" encoding="ISO-8859-1" ?>
<!DOCTYPE list SYSTEM "recipe-2.dtd">
<?xml-stylesheet href="recipe-2.css" type="text/css"?>
<list>
  <recipe>
    <meta>
      <author>Joe</author>
      <date></date>
      <version></version>
    </meta>
    <recipe_name>Vegetable soup</recipe_name>
    <meal>dinner</meal>
    <ingredients>
      <item>4 Carrots</item>
      <item>2 Onions</item>
      <item>Garlic</item>
      <item>1/2 Cabbage</item>
      <item>Salt</item>
      <item>Pepper</item>
    </ingredients>
    <directions>
     <para>Cut the vegies into little pieces. Then boil with
     water. Add some salt and pepper</para>
    </directions>
  </recipe>
</list>

Contents of the DTD (recipe-2.dtd)

<!-- Simple recipe DTD -->

<!-- This DTD allows you to write simple recipes
  list = a list of recipes
  recipe = container for a recipe
  meta = Metainformation: must include author of this file, 
         date, version in this order
  recipe_author = optional name of recipe author
  meal = title of meal
  ingredients = list of items you need
  directions = How to cook, may include either paras or bullets.
-->

 <!ELEMENT list (recipe+)>
 <!ELEMENT recipe (meta, recipe_author?, recipe_name, meal, 
                  ingredients, directions)>
 <!ELEMENT meta (author, date, version)>
 <!ELEMENT version (#PCDATA)>
 <!ELEMENT date (#PCDATA)>
 <!ELEMENT author (#PCDATA)>
 <!ELEMENT recipe_author (#PCDATA)>
 <!ELEMENT recipe_name (#PCDATA)>
 <!ELEMENT meal (#PCDATA)>
 <!ELEMENT ingredients (item+)>
 <!ELEMENT item (#PCDATA)>
 <!ELEMENT directions (para | bullet)* >
 <!ELEMENT bullet (#PCDATA|strong)*>
 <!ELEMENT para (#PCDATA|strong)*>
 <!ELEMENT strong (#PCDATA)>

A simple story grammar

Let's present the grammar first

<?xml version="1.0"?>

<!-- DTD to write simple stories
     Made by Daniel K. Schneider / TECFA / University of Geneva
     VERSION 1.0
     30/10/2003
-->

<!ELEMENT STORY (title, context, problem, goal, THREADS, moral, INFOS)>
<!ATTLIST STORY xmlns:xlink CDATA #FIXED "http://www.w3.org/1999/xlink">

<!ELEMENT THREADS (EPISODE+)>
<!ELEMENT EPISODE (subgoal, ATTEMPT+, result) >
<!ELEMENT ATTEMPT (action | EPISODE) >
<!ELEMENT INFOS ( ( date | author | a )* ) >

<!ELEMENT title (#PCDATA) >
<!ELEMENT context (#PCDATA) >
<!ELEMENT problem (#PCDATA) >
<!ELEMENT goal (#PCDATA) >
<!ELEMENT subgoal (#PCDATA) >
<!ELEMENT result (#PCDATA) >
<!ELEMENT moral (#PCDATA) >
<!ELEMENT action (#PCDATA) >
<!ELEMENT date (#PCDATA) >
<!ELEMENT author (#PCDATA) >

<!ELEMENT a (#PCDATA)>
<!ATTLIST a
     xlink:href CDATA #REQUIRED
     xlink:type CDATA #FIXED "simple"
>


Below is a short story

<?xml version="1.0"?>
<!DOCTYPE STORY SYSTEM "story-grammar.dtd">
<?xml-stylesheet href="story-grammar.css" type="text/css"?>
<STORY xmlns:xlink="http://www.w3.org/1999/xlink">

  <title>The little Flexer</title>

  <context>Once upon a time, in a dark small office.</context>

  <problem>Kaspar was trying to learn Flex but didn't have a real
  project.  He then decided that it would be a good idea to look at
  Data-Driven Controls.  These are most useful in combination with an
  external datasources in XML format.</problem>
  
<goal>So he decided how to write a mx:Tree application that imports
XML data.</goal>
  <THREADS>

    <EPISODE>
      <subgoal>He decided to play with a little example.</subgoal>
      <ATTEMPT>
	<action>So he went to see the LiveDocs and copied an
	example.</action>
      </ATTEMPT>
      <result>The example worked but he didn't understand why since he
      didn't know about E4X.</result>
    </EPISODE>

    <EPISODE>
      <subgoal>He then decided to learn e4X first.</subgoal>
      <ATTEMPT>
	<action>
	  Reading 2-3 tutorials and creating a simple example only took
	  2-3 hours.
        </action>
      </ATTEMPT>
      <result>
	He now understood how to write e4X code in Flex.
      </result>
    </EPISODE>
  </THREADS>
  
  <moral>Divide a problem into subproblems and you will get there ...</moral>
  <INFOS>
    <a xlink:href="http://edutechwiki.unige.ch/en/ECMAscript_for_XML"
       xlink:type="simple">ECMAscript for XML</a>
  </INFOS>
</STORY>

The story grammar is a text-centric DTD. Therefore it can easily be styled with CSS. You can look at the file story-grammar.xml and also consult story-grammar.css.

A simple family DTD

(Figure: Simple family DTD)

A valid XML file

<?xml version="1.0" encoding="ISO-8859-1" ?>
<!DOCTYPE family SYSTEM "family.dtd">
<family>
  <person name="Joe Miller" gender="male" 
          type="father" id="123.456.789"/>
  <person name="Josette Miller" gender="female" 
          type="girl" id="123.456.987"/>
</family>
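
The family DTD itself is only shown as a figure. The following sketch gives one possible set of rules that would accept the file above; the attribute types and the list of allowed type values are assumptions made for illustration. It also shows how attributes are declared with ATTLIST.

```xml
<?xml version="1.0"?>
<!DOCTYPE family [
  <!ELEMENT family (person+)>
  <!-- person carries all its information in attributes -->
  <!ELEMENT person EMPTY>
  <!ATTLIST person
     name   CDATA                          #REQUIRED
     gender (male | female)                #IMPLIED
     type   (mother | father | boy | girl) #REQUIRED
     id     CDATA                          #REQUIRED>
  <!-- id is declared as CDATA here: a value like 123.456.789 starts
       with a digit and would not be accepted by the stricter ID type -->
]>
<family>
  <person name="Joe Miller" gender="male"
          type="father" id="123.456.789"/>
  <person name="Josette Miller" gender="female"
          type="girl" id="123.456.987"/>
</family>
```

With an enumerated type such as (male | female), a DTD-aware editor can offer the allowed values in a menu.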

RSS

RSS is a news syndication format. There are several RSS variants. RSS 0.91 is Netscape’s original format (still in use):

 <!ELEMENT rss (channel)>
 <!ATTLIST rss version CDATA #REQUIRED> <!-- must be "0.91"> -->
 <!ELEMENT channel (title | description | link | language | item+ | rating? | 
           image? | textinput? | copyright? | pubDate? | lastBuildDate? | 
           docs? | managingEditor? | webMaster? | skipHours? | skipDays?)*>
 <!ELEMENT title (#PCDATA)>
 <!ELEMENT description (#PCDATA)>
 <!ELEMENT link (#PCDATA)>
 <!ELEMENT image (title | url | link | width? | height? | description?)*>
 <!ELEMENT url (#PCDATA)>
 <!ELEMENT item (title | link | description)*>
 <!ELEMENT textinput (title | description | name | link)*>
 <!ELEMENT name (#PCDATA)>
 <!ELEMENT rating (#PCDATA)>
 <!ELEMENT language (#PCDATA)>
 <!ELEMENT width (#PCDATA)>
 <!ELEMENT height (#PCDATA)>
 <!ELEMENT copyright (#PCDATA)>
 <!ELEMENT pubDate (#PCDATA)>
 <!ELEMENT lastBuildDate (#PCDATA)>
 <!ELEMENT docs (#PCDATA)>
 <!ELEMENT managingEditor (#PCDATA)>
 <!ELEMENT webMaster (#PCDATA)>
 <!ELEMENT hour (#PCDATA)>
 <!ELEMENT day (#PCDATA)>
 <!ELEMENT skipHours (hour+)>
 <!ELEMENT skipDays (day+)>

Possible XML document for RSS

 <?xml version="1.0" encoding="ISO-8859-1" ?>
 <!DOCTYPE rss SYSTEM "rss-0.91.dtd">
 <rss version="0.91">
   <channel>
     <title>Webster University</title>
     <description>Home Page of Webster University</description>
     <link>http://www.webster.edu</link>
     <item>
       <title>Webster Univ. Geneva</title>
       <description>Home page of Webster University Geneva</description>
       <link>http://www.webster.ch</link>
     </item>
     <item>
       <title>http://www.course.com/</title>
       <description>You can find Thomson text-books materials
       (exercise data) on this web site</description>
       <link>http://www.course.com/</link>
     </item>
   </channel>
 </rss>

Summary syntax of DTD element definitions

We will come back to this when we learn how to write our own DTDs in the DTD tutorial (don’t worry too much about unexplained details).


Syntax element   Short explanation          Example
,                order of elements          <!ELEMENT Name (First, Middle, Last)>
?                optional element           MiddleName?
+                at least one element       movie+
*                zero or more elements      item*
|                pick one (or operator)     economics|law
()               grouping construct         (A,B,C)
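
The operators from the table can be combined. The sketch below uses all of them in one small, self-contained document; the person vocabulary is invented for this example.

```xml
<?xml version="1.0"?>
<!DOCTYPE person [
  <!-- ","  children must appear in this order
       "+"  one or more phone elements
       "*"  zero or more note elements
       "|"  and "()"  exactly one of student or employee -->
  <!ELEMENT person (name, phone+, note*, (student | employee))>
  <!-- "?"  middle is optional -->
  <!ELEMENT name (first, middle?, last)>
  <!ELEMENT first (#PCDATA)>
  <!ELEMENT middle (#PCDATA)>
  <!ELEMENT last (#PCDATA)>
  <!ELEMENT phone (#PCDATA)>
  <!ELEMENT note (#PCDATA)>
  <!ELEMENT student EMPTY>
  <!ELEMENT employee EMPTY>
]>
<person>
  <name><first>Ada</first><last>Miller</last></name>
  <phone>555-0100</phone>
  <phone>555-0101</phone>
  <student/>
</person>
```

The document is valid: middle and note are left out (both are optional), phone appears twice, and exactly one of student or employee is present.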

Understanding DTD entities

Most professional DTDs use so-called entities. Entities are just symbols that stand for some content; that content is substituted wherever the symbol is used.

DTD entities (parameter entities): Some more complex DTDs use the same structures all over. Instead of typing these several times, one can use an ENTITY construction like this:

<!ENTITY % Content "(Para | List | Listing)*">

Later in the DTD we can then have element definitions like this:

<!ELEMENT Intro (Title, %Content; ) >
<!ELEMENT Goal (Title, %Content; ) >

The computer will then simply translate these into:

<!ELEMENT Intro (Title, (Para | List | Listing)*) >
<!ELEMENT Goal (Title, (Para | List | Listing)* ) >

Think of these entities as shortcuts.

Note: There is also a second kind of entity, so-called XML entities (general entities). XML entities allow you to define a fragment of text once and then to include it later in the document.
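
The following sketch shows such an XML (general) entity in use. The entity name tecfa and its replacement text are invented for this example. One caveat: the %Content; shortcut shown above works when the declarations live in an external DTD file, since the XML specification does not allow parameter-entity references inside declarations of an internal DTD subset. General entities like the one below have no such restriction.

```xml
<?xml version="1.0"?>
<!DOCTYPE page [
  <!ELEMENT page (title, content)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT content (#PCDATA)>
  <!-- Define the text once ... -->
  <!ENTITY tecfa "TECFA, University of Geneva">
]>
<page>
  <!-- ... and include it wherever needed with &name; -->
  <title>About &tecfa;</title>
  <content>This page is maintained at &tecfa;.</content>
</page>
```

An XML parser replaces each &tecfa; with the declared text before the content reaches the application.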

Choosing and using an XML Editor

Requirements

There are lots of XML editors and there is no easy choice ! Depending on your needs you may choose a different editor:

  • To edit strongly structured data (i.e. data-centric XML) a sort of "tree" or "boxed" view is practical
  • To edit text-centric data (e.g. an article) you want either a word-processor-like tool or a structure editor.
  • Really good XML editors cost a lot ...

Here is my own little advice with respect to XML editors (also read the XML editor article)

Minimal things your XML editor should be able to do

  • Check for XML well-formedness
  • Check for validity against several kinds of XML grammars (DTD, Relax NG, XML Schema)
  • Highlight errors (of all sorts)
  • Suggest available XML tags (in a given context). Also clearly show which ones are mandatory and which ones are optional, and display them in the right order.
  • Allow the user to move/split/join elements in a more or less ergonomic way (although it is admitted that these operations need some training)
  • Include support for XSLT and XQuery. (However, if you have installation skills, you can easily compensate for a lack of support by installing a processor like Saxon.)

We then suggest some additional criteria depending on the kind of XML

For data-centric XML:

  • Allow viewing and editing of XML documents in a tree view or boxed view (or both together)
  • Provide a context-dependent choice of XML tags and attributes (DTD/XSD awareness)

For text-centric XML:

  • Allow editing of XML documents in a structure view
  • Allow editing of XML documents in a somewhat WYSIWYG view. Such a view can be based on an associated CSS (most common solution) or XSL-FO (I am dreaming here) or use some proprietary format (which is not very practical for casual users!). Also allow users to switch on/off tags or element boundary markers.
  • Provide a context-dependent choice of XML tags and attributes (DTD/XSD awareness). The user should be able to right-click within the XML text and not in some distant tree representation.
  • Automatically insert all mandatory sub-elements when an element is created.
  • Automatically complete XML Tags when working without a DTD or other schema.
  • Indent properly (and assist users to indent single lines as well as the whole document)
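
To illustrate the "insert all mandatory sub-elements" criterion: with the more complex recipe DTD shown earlier in this tutorial, a schema-aware editor could produce a skeleton like the following when a new recipe is created. This is a sketch of the idea, not the output of a particular editor; the file name recipe-2.dtd is the one used in the DOCTYPE of the second recipe example.

```xml
<?xml version="1.0"?>
<!DOCTYPE list SYSTEM "recipe-2.dtd">
<list>
  <recipe>
    <!-- mandatory children, in the order required by the DTD -->
    <meta>
      <author></author>
      <date></date>
      <version></version>
    </meta>
    <!-- the optional recipe_author is not inserted -->
    <recipe_name></recipe_name>
    <meal></meal>
    <ingredients>
      <!-- item+ means at least one item -->
      <item></item>
    </ingredients>
    <!-- directions may stay empty: (para | bullet)* allows zero children -->
    <directions></directions>
  </recipe>
</list>
```

The user then only has to fill in the text, which is much less error-prone than typing the tags by hand.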

Suggested free editors

Any XML editor is difficult to learn (because XML editing is not so easy). Please, make an effort to learn the interface, e.g. read the help !

(1) Exchanger XML Lite V3.3

If you are looking for a general-purpose editor that is both DTD and Schema aware and that offers XSLT support, I suggest trying this editor first. Try others if you are unhappy with it or if you plan to focus on a single kind of editing, e.g. just edit "data-centric" XML documents.

Hints for editing with Exchanger

To insert an element or attribute:

  • In the contents window press Ctrl-T to insert an element.
  • Pressing "<" in the editing window gives more options and you can do it in any place.
  • To insert an attribute, position the cursor after the element name and press the space bar
  • Alternatively (and better if you don't know your DTD): Select the Helper pane to the left. Then (in the editing window) click on the element tag you wish to edit or put your cursor in a location between child elements. The helper pane will then display the structure of the current parent element and list available elements on which you can click to insert.

Read more in Exchanger XML Editor

(2) XMLmind Standard Edition

XMLmind is another free editor. It may be a better choice if you plan to edit data-centric XML and/or if you like to work with "tree views". The free edition doesn't include XSLT processing, but you can do this with another tool (e.g. Exchanger lite or just a command line call).

Download: http://www.xmlmind.com/xmleditor/download.shtml (multi-platform, java-based)

Hints for editing with XMLmind

  • Element manipulation is done through the "tree view". After selecting an element you can insert elements either by using the (tiny) before/after/within buttons in the top right elements pane
  • or by using shortcuts (ctrl-h = insert before, ctrl-i = insert within, ctrl-j = insert after). The same principle applies to the attributes pane.

Other Alternatives

  • If you plan to edit DTD-based text-centric XML, you also should have a look at the user-friendly epcEdit (windows/linux).
  • If you can't or don't want to go through a Java installation, consider XML Copy Editor
  • Programmers also may consider using a programmer’s editor. However make sure that there is an XML plugin, that the editor is "DTD aware" (can show elements to insert in a given context) and that it can validate. Otherwise forget it !!

About Java

  • Most XML editors are written in Java and rely on the "Java RunTime engine". Both websites of the recommended editors above give you a choice: download an editor with or without Java. If you don't have Java installed on your own PC, I suggest taking it first from http://www.java.com/ ... and then always downloading the "no java vm" versions of the editor software
  • To test if you have Java, open a command terminal and type "java". To open a command terminal under Windows: Start Menu -> Run and then type "cmd".

Links

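About XML

  • XML principles
  • XML (Wikipedia): http://en.wikipedia.org/wiki/XML
  • XML Tutorial (W3Schools): http://www.w3schools.com/xml/

About DTDs

  • Document Type Definition (Wikipedia): http://en.wikipedia.org/wiki/Document_Type_Definition
  • DTD tutorial (W3Schools): http://www.w3schools.com/dtd/default.asp

Related subjects

  • XML Schema (W3C) (Wikipedia): http://en.wikipedia.org/wiki/XML_Schema_(W3C)
  • RELAX NG (Wikipedia): http://en.wikipedia.org/wiki/RELAX_NG
  • XML Schema languages (Wikipedia): http://en.wikipedia.org/wiki/XML_Schema_Language_comparison
  • Schematron (Wikipedia): http://en.wikipedia.org/wiki/Schematron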