DTD tutorial

The educational technology and digital learning wiki
Revision as of 18:40, 5 November 2010


Introduction

This is a short tutorial about creating simple DTDs.

Objectives

  • Be able to read a DTD
  • Be able to create a simple DTD with nested elements, i.e. understand combination features
  • Be able to define attributes for elements
  • Know how to create and use DTD entities

Prerequisites

  • XML (Conceptual overview)
  • Tour de XML or equivalent (having seen some real-world applications would be good for motivation)
  • Editing XML tutorial (optional, but strongly recommended)

Next steps

Executive overview

DTD grammars are a set of rules that define:

  1. a set of elements (tags) and their attributes that can be used to create an XML document;
  2. how elements can be embedded (nested);
  3. different sorts of entities (reusable fragments, special characters).

DTDs can't define content types, i.e. what kind of text may go inside elements. Most attributes can't be typed either. For example, with a DTD one cannot specify that a value should be a number from 0 to 100.

Specification of a markup language

  • The most important part of a DTD-based markup language is usually the DTD itself, but additional constraints can be stated in a design document.
  • The DTD does not identify the root element! You will have to tell users which elements can be root elements.
  • Since DTDs can't express data constraints, write them out in a specification document, e.g. "the value of the length attribute is a string composed of a number plus 'cm', 'inch' or 'em'". Example:
<size length="10cm">

Example 1: A simple DTD

 <!ELEMENT page  (title, content, comment?)>
 <!ELEMENT title (#PCDATA)>
 <!ELEMENT content (#PCDATA)>
 <!ELEMENT comment (#PCDATA)>

A DTD document contains just rules and nothing else (explanations follow later).

Declaring and using a DTD with an XML document

DTD stands for Document Type Definition. A DTD is a set of rules that constitute a grammar (also called a schema) that defines a so-called XML application. For example, the file DTD/xhtml1-transitional.dtd, available through the XHTML 1.0 specification, formally defines the grammar for the XHTML language.

Before we learn how to create our own grammars, let's have a look at how to use DTDs.

Document type declarations

  • A valid XML document usually includes a declaration that specifies the DTD.
  • The DTD is declared at the top of the file, after the XML declaration.
  • The XML declaration, the DTD declaration, etc. are part of the prologue.
Example XML contents with a DTD declaration
 <?xml version="1.0" ?>
 <!DOCTYPE hello SYSTEM "hello.dtd">

 <hello>Here we <strong>go</strong> ... </hello>
There are four ways of using a DTD

(1) No DTD (the XML document will just be well-formed)

(2) DTD rules are defined inside the XML document

  • We get a "standalone" document (the XML document is self-sufficient)

(3) "Private/System" DTDs, the DTD is located on the system (own computer or the Internet). That's what you are going to use when you write your own DTDs.

 <!DOCTYPE hello SYSTEM "hello.dtd">

(4) Public DTDs, we use a name for the DTD. This means that both your XML editor and user software know the DTD. This is the strategy used for common Web DTDs like XHTML, SVG, MathML, etc.

Syntax of the DTD declaration in the XML document

A DTD declaration starts with the keyword "DOCTYPE", followed by the root element:

  <!DOCTYPE hello .... >

Remember that DTDs don't know their root element; the root is defined in the XML document! Also note that the DTD must define this root element just like any other element. (A DTD can therefore have more than one possible root element.)

The root element is followed by the DTD definition or a reference to a DTD file, as you can see below.

There exist a few alternatives

Syntax for internal DTDs
DTD rules can be inserted within an XML document between brackets [ ... ]
    <!DOCTYPE hello [
        <!ELEMENT hello (#PCDATA)>
        ]>
That technique is very rarely used, since the user would have to copy the DTD lines from one document to another in order to reuse them.
Syntax to define "private" external DTDs
Private DTDs are the ones you would create yourself
The DTD is identified by the URL after the " SYSTEM " keyword
 <!DOCTYPE hello SYSTEM "hello.dtd">
Syntax for public DTDs
Public DTDs are DTDs that are intended for wider use
After the " PUBLIC " keyword one has to specify an official name and a backup URL that a validator could use.
Anyone can create an "official" name, but there exist rules that we will not explain here.
 <!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN"
  "http://my.netscape.com/publish/formats/rss-0.91.dtd">

Sample XML documents with DTD declarations

Example 2 - Hello XML without DTD
 <?xml version="1.0" standalone="yes"?>
 <hello> Hello XML and hello dear reader! </hello>
Example 3 - Hello XML with an internal DTD
  <?xml version="1.0" standalone="yes"?>
  <!DOCTYPE hello [
     <!ELEMENT hello (#PCDATA)>
     ]>

  <hello> Hello XML and hello dear readers! </hello>
Example 4 - Hello XML with an external DTD
That's what you should do with your own home-made DTDs
 <?xml version="1.0" ?>
 <!DOCTYPE hello SYSTEM "hello.dtd">
 <hello> This is a very simple XML document </hello>
Example 5 - XML with a public external DTD (RSS 0.91)
 <?xml version="1.0"?>
 <!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN"
  "http://my.netscape.com/publish/formats/rss-0.91.dtd">
 <rss version="0.91">
 <channel> ...... </channel>
 </rss>

Understanding DTDs by example

Recall that DTDs define all the elements and attributes and the way they can be combined

Example 6: Hello text with XML

A simple XML document of type <page>
 <page>
  <title>Hello friend</title>
  <content>Here is some content :)</content>
  <comment>Written by DKS/Tecfa, adapted from S.M./the Cocoon samples</comment>
 </page>
A DTD that would validate this "page" document

Xml-intro-edit-6.png
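The picture is not reproduced as text here. As a minimal sketch, the rules from Example 1 are enough to validate the "page" document. They are placed in an internal DTD below only so that the example is a single self-contained file; the shortened element contents are illustrative.

```xml
<?xml version="1.0"?>
<!DOCTYPE page [
  <!-- Same rules as in Example 1 -->
  <!ELEMENT page    (title, content, comment?)>
  <!ELEMENT title   (#PCDATA)>
  <!ELEMENT content (#PCDATA)>
  <!ELEMENT comment (#PCDATA)>
]>
<page>
  <title>Hello friend</title>
  <content>Here is some content :)</content>
</page>
```

The comment element is omitted in this instance, which is allowed because the rule for page declares it as optional (comment?).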

Example 7: A recipe list in XML

  • Source: Introduction to XML by Jay Greenspan (now dead URL)
  <?xml version="1.0"?>
  <!DOCTYPE list SYSTEM "simple_recipe.dtd">
  <list>
  <recipe>
    <author>Carol Schmidt</author>
    <recipe_name>Chocolate Chip Bars</recipe_name>
    <meal>Dinner</meal>
    <ingredients>
      <item>2/3 C butter</item>      <item>2 C brown sugar</item>
      <item>1 tsp vanilla</item>     <item>1 3/4 C unsifted all-purpose flour</item>
      <item>1 1/2 tsp baking powder</item>
      <item>1/2 tsp salt</item>      <item>3 eggs</item>
      <item>1/2 C chopped nuts</item>
      <item>2 cups (12-oz pkg.) semi-sweet choc. chips</item>
    </ingredients>
    <directions>
 Preheat oven to 350 degrees. Melt butter; combine with brown sugar and vanilla in  large mixing bowl. Set aside to cool. Combine flour, baking powder, and salt; set  aside. Add eggs to cooled sugar mixture; beat well. Stir in reserved dry  ingredients, nuts, and chips. Spread in greased 13-by-9-inch pan.
 Bake for 25 to 30 minutes until golden brown; cool.  Cut into squares.
    </directions>
  </recipe>
 </list>
Contents of the DTD

Xml-intro-edit-7.png
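Since the DTD is only shown as a picture, here is a text sketch that would validate the recipe document above. It is inferred from the XML alone, so the real simple_recipe.dtd may differ, for example in which elements are optional or repeatable.

```xml
<!-- simple_recipe.dtd: a sketch inferred from the XML example;
     cardinalities (+) are assumptions -->
<!ELEMENT list        (recipe+)>
<!ELEMENT recipe      (author, recipe_name, meal, ingredients, directions)>
<!ELEMENT ingredients (item+)>
<!ELEMENT author      (#PCDATA)>
<!ELEMENT recipe_name (#PCDATA)>
<!ELEMENT meal        (#PCDATA)>
<!ELEMENT item        (#PCDATA)>
<!ELEMENT directions  (#PCDATA)>
```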

Example 8: A simple story grammar

 <?xml version="1.0" encoding="UTF-8"?>
 <!-- DTD to write simple stories
      Made by Daniel K. Schneider / TECFA / University of Geneva
      VERSION 1.0
      30/10/2003 -->
 <!ELEMENT STORY (title, context, problem, goal, THREADS, moral, INFOS)>
 <!ATTLIST STORY xmlns:xlink CDATA #FIXED "http://www.w3.org/1999/xlink">
 <!ELEMENT THREADS (EPISODE+)>
 <!ELEMENT EPISODE (subgoal, ATTEMPT+, result) >
 <!ELEMENT ATTEMPT (action | EPISODE) >
 <!ELEMENT INFOS ( ( date | author | a )* ) >
 <!ELEMENT title (#PCDATA) >
 <!ELEMENT context (#PCDATA) >
 <!ELEMENT problem (#PCDATA) >
 <!ELEMENT goal (#PCDATA) >
 <!ELEMENT subgoal (#PCDATA) >
 <!ELEMENT result (#PCDATA) >
 <!ELEMENT moral (#PCDATA) >
 <!ELEMENT action (#PCDATA) >
 <!ELEMENT date (#PCDATA) >
 <!ELEMENT author (#PCDATA) >
 <!ELEMENT a (#PCDATA)>
 <!ATTLIST a
      xlink:href CDATA #REQUIRED
      xlink:type CDATA #FIXED "simple">
Here is a valid skeleton
  <?xml version="1.0" ?>
  <!DOCTYPE STORY SYSTEM "story-grammar.dtd">
  <?xml-stylesheet href="story-grammar.css" type="text/css"?>
  <STORY>
   <title>The little XMLer</title>
   <context></context>
  <problem></problem>
  <goal></goal>
  <THREADS>
    <EPISODE>
      <subgoal>I have to do it ...</subgoal>
      <ATTEMPT>
        <action></action>
      </ATTEMPT>
      <result></result>
    </EPISODE>
  </THREADS>
  <moral></moral>
  <INFOS>
  </INFOS>
 </STORY>

The picture gives some extra information

Xml-intro-edit-8.png

Example 9: Lone family DTD

Xml-intro-edit-9.png

A valid XML file
 <?xml version="1.0" ?>
 <!DOCTYPE family SYSTEM "family.dtd">
 <family>
   <person name="Joe Miller" gender="male"
           type="father" id="N123456789"/>
   <person name="Josette Miller" gender="female"
           type="girl" id="N123456987"/>
 </family>

Example 10: RSS

  • There are several RSS standards. RSS 0.91 is Netscape's original (still being used)
 <!ELEMENT rss (channel)>
 <!ATTLIST rss version CDATA #REQUIRED>
  <!-- must be "0.91"> -->
  <!ELEMENT channel (title | description | link | language | item  | rating? | image? | textinput? | 
               copyright? | pubDate? | lastBuildDate? | docs? | managingEditor? | 
               webMaster? | skipHours? | skipDays?)*>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT description (#PCDATA)>
  <!ELEMENT link (#PCDATA)>
  <!ELEMENT image (title | url | link | width? | height? | description?)*>
  <!ELEMENT url (#PCDATA)>
  <!ELEMENT item (title | link | description)*>
  <!ELEMENT textinput (title | description | name | link)*>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT rating (#PCDATA)>
  <!ELEMENT language (#PCDATA)>
  <!ELEMENT width (#PCDATA)>
  <!ELEMENT height (#PCDATA)>
  <!ELEMENT copyright (#PCDATA)>
  <!ELEMENT pubDate (#PCDATA)>
  <!ELEMENT lastBuildDate (#PCDATA)>
  <!ELEMENT docs (#PCDATA)>
  <!ELEMENT managingEditor (#PCDATA)>
  <!ELEMENT webMaster (#PCDATA)>
  <!ELEMENT hour (#PCDATA)>
  <!ELEMENT day (#PCDATA)>
  <!ELEMENT skipHours (hour )>
  <!ELEMENT skipDays (day )>
Possible XML document for RSS
  <?xml version="1.0" ?>
  <!DOCTYPE rss SYSTEM "rss-0.91.dtd">
  <rss version="0.91">
     <channel>
       <title>Webster University</title>
       <description>Home Page of Webster University</description>
       <link>http://www.webster.edu</link>
       <item>
	 <title>Webster Univ. Geneva</title>
	 <description>Home page of Webster University Geneva</description>
	 <link>http://www.webster.ch</link>
       </item>
       <item>
	 <title>http://www.course.com/</title>
	 <description>You can find Thomson text-books materials (exercise data) on this web site</description>
	 <link>http://www.course.com/</link>
       </item>
     </channel>
   </rss>

Definition of elements

Let's recall what DTD grammars are supposed to do. DTDs define:

  1. a set of elements (tags) and their attributes that can be used to create an XML document;
  2. how elements can be embedded.

In addition, a DTD may define different sorts of entities (reusable fragments, special characters) and attribute types for elements.

Syntax of a DTD rule to define elements:

 <!ELEMENT tag_name (child_element_specification) >

child_element_specification may contain:

  • A combination of child elements according to combination rules that we will introduce below.
 <!ELEMENT page  (title, content, comment?)>
  • Mixed contents, i.e. child elements mixed with data (#PCDATA)
 <!ELEMENT para (strong | #PCDATA )*>
  • #PCDATA (Just data)
 <!ELEMENT title (#PCDATA)>
  • ANY (only used during development)
 <!ELEMENT para ANY>
  • EMPTY (the element has no contents)
 <!ELEMENT person EMPTY>

Tag names

Each tag name must start with a letter or an underscore ('_')
followed by letters, numbers or the following characters: '_' , '-', '.', ':'

BAD example:
  <!ELEMENT 1st ...>
BAD example:

 <!ELEMENT My Home ...>
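For contrast, here is a short sketch of declarations whose names follow the rule above. All element names are made up for illustration; "first" and "my-home" are possible repairs of the two bad examples.

```xml
<!-- Valid names: start with a letter or "_", no spaces -->
<!ELEMENT first   (#PCDATA)>
<!ELEMENT _note   (#PCDATA)>
<!ELEMENT my-home (#PCDATA)>
<!ELEMENT item.1  (#PCDATA)>
```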

Combination rules

By using the combination rules an information architect (you!) defines how elements can be combined, i.e. mandatory and optional child elements, order of elements, and repetition.

Each and every element in your language must be defined with an <!ELEMENT > rule


In the list below, A and B stand for tags. Each combination rule comes with an explanation, DTD examples and XML examples.

A , B: A followed by B (elements in that order)

 <!ELEMENT person (name, email?)>
 <!ELEMENT Name (First, Middle, Last)>

 <person>
   <name>Joe</name>
   <email>x@x.x</email>
 </person>
 <Name>
   <First>D.</First><Middle>K.</Middle><Last>S.</Last>
 </Name>

A?: A is optional (it can be present or absent)

 <!ELEMENT person (name, email?)>
 <!ELEMENT Name (First, Middle?, Last)>

 <person>
   <name>Joe</name>
 </person>
 <Name><First>D.</First><Last>S.</Last></Name>

A+: At least one A

 <!ELEMENT person (name, email+)>
 <!ELEMENT list (movie+)>

 <person> <name>Joe</name>
   <email>x@x.x</email></person>
 <person> <name>Joe</name>
   <email>x@x.x</email>
   <email>x@y.x</email>
 </person>
 <list>
    <movie>Return of ...</movie>
    <movie>Comeback of ...</movie>
 </list>

A*: Zero, one or several A

 <!ELEMENT person (name, email*)>
 <!ELEMENT list (item*)>

 <person>
   <name>Joe</name>
 </person>
 <list>
   <item>Return of ...</item>
 </list>

A | B: Either A or B

 <!ELEMENT person (name, (email | fax))>
 <!ELEMENT major (economics | law)>

 <person> <name>Joe</name>
   <email>x@x.x</email>
 </person>
 <person> <name>Joe</name>
   <fax>123456789</fax>
 </person>
 <major>
   <economics> </economics>
 </major>

(A, B): Parentheses group elements, and you can apply the above combination rules to the whole group

 <!ELEMENT text (para | list | title)*>

 <text>
   <title>Story</title>
   <para>Once upon a time</para>
   <title>The awakening</title>
   <list> ... </list>
 </text>
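The combination rules can be mixed freely within one rule. The following self-contained sketch uses a hypothetical address book (the names addressbook, nickname and note are assumptions, not part of the tutorial's examples) to combine sequence, option, choice and repetition:

```xml
<?xml version="1.0"?>
<!DOCTYPE addressbook [
  <!-- At least one person -->
  <!ELEMENT addressbook (person+)>
  <!-- A name, then an optional nickname, then one or more
       contacts (each an email or a fax), then any number of notes -->
  <!ELEMENT person   (name, nickname?, (email | fax)+, note*)>
  <!ELEMENT name     (#PCDATA)>
  <!ELEMENT nickname (#PCDATA)>
  <!ELEMENT email    (#PCDATA)>
  <!ELEMENT fax      (#PCDATA)>
  <!ELEMENT note     (#PCDATA)>
]>
<addressbook>
  <person>
    <name>Joe</name>
    <email>x@x.x</email>
    <fax>123456789</fax>
  </person>
  <person>
    <name>Josette</name>
    <nickname>Jo</nickname>
    <fax>987654321</fax>
    <note>Prefers fax</note>
    <note>On holiday in August</note>
  </person>
</addressbook>
```

A person without any email or fax, or with the nickname placed before the name, would be rejected by a validating parser.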

Contents of DTD elements

Inside an element we either can find child elements only, data only or mixed data plus elements, and finally nothing.

Examples

  • Child elements only:
<ul><li></li><li></li><li></li></ul>
  • Data only:
<para>Here we go</para>
  • Mixed data:
<para>Here we go <bold>fast</bold></para>
  • Empty:
<newline/>
The special elements are listed below, each with an explanation, a DTD example and an XML example.

#PCDATA: "Parsed Character Data", i.e. the text contents of an element. It should not contain any <, >, & etc.

 <!ELEMENT email (#PCDATA)>

 <email>Daniel.Schneider@nowhere.org</email>

ANY: Allows any child elements (which must themselves be declared in the DTD) and parsed character data (avoid this!)

 <!ELEMENT person ANY>

 <person>
   <c>text</c>
   <a>some <b>bbb</b>inside</a>
 </person>

EMPTY: No contents

 <!ELEMENT br EMPTY>

 <br/>

Mixed element contents contain both text and tags (elements)

Example:

 <para> here is a <a href="xx">link</a>. <b>Check</b> it out </para>

To allow for these mixed contents, one must use the "|" construct with #PCDATA listed first and the whole group followed by "*", as shown in the good examples below.

Good examples
 <!ELEMENT para (#PCDATA|a|ul|b|i|em)* >  
 <!ELEMENT p (#PCDATA | a | abbr | acronym | br | cite | code | dfn | em | img | kbd |
           q | samp | span | strong | var )* >  
 <!ELEMENT par (#PCDATA | %font; | %phrase; | %special; | %form;)* >
Bad examples
 <!ELEMENT p (name, first_name, #PCDATA)*>
 <!ELEMENT p ( (#PCDATA) |a|ul|b|i|em)*>
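As a minimal self-contained sketch, the mixed-content paragraph shown earlier can be validated with an internal DTD like the following. The declarations for a and b are assumptions made for this illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE para [
  <!-- Mixed content: #PCDATA first, alternatives separated by "|",
       the whole group followed by "*" -->
  <!ELEMENT para (#PCDATA | a | b)*>
  <!ELEMENT a (#PCDATA)>
  <!ATTLIST a href CDATA #REQUIRED>
  <!ELEMENT b (#PCDATA)>
]>
<para> here is a <a href="xx">link</a>. <b>Check</b> it out </para>
```

With mixed content a DTD can only say which child elements may appear; it cannot constrain their order or number.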

Defining attributes

Usually an object is defined in terms of its properties and its contents. Translated to XML, objects are usually represented as elements and their properties as attributes. However, properties could also be expressed with child elements, and the other way round.

Rough syntax of Attribute rules:

<!ATTLIST element_name attr_name Attribute_type Type_Definition Default >

Here are some XML elements with attributes

<img src="picture.png"/>
<person name="Joe Miller" gender="male" type="father" id="N123456789"/>
<a href="http://no.com/bla">no company</a>

When we define such attributes we always start with the same pattern:

 <!ATTLIST name_of_element name_of_attribute ... >

for example

 <!ATTLIST person name ...>

Then we must provide an attribute type with a keyword. There exist six types: normal text, single words, an ID, a reference to the ID of another element, several such references, and a list of values. These types are summarized in the following table.

Keyword Attribute types
CDATA "Character Data" - Text data
NMTOKEN A single word (no spaces or punctuations)
ID Unique identifier of the element.
IDREF Reference to an identifier.
IDREFS Reference to one or more identifiers
(A|B|C|..) List of values (from which the user must choose)

Finally, we have to decide whether an attribute is mandatory, optional or fixed.

Keyword Type Definition
#IMPLIED Attribute is optional
#REQUIRED Attribute is mandatory
#FIXED Value Attribute has a fixed value (user can't change it)

Below are some illustrations of valid attribute definitions

DTD rule, followed by an example XML fragment
<!ATTLIST person first_name CDATA #REQUIRED> <person first_name="Joe">
<!ATTLIST person gender (male|female) #IMPLIED> <person gender="male">
<!ATTLIST form method CDATA #FIXED "POST"> <form method="POST">
<!ATTLIST list type (bullets|ordered) "ordered"> <list type="bullets">
<!ATTLIST sibling type (brother|sister) #REQUIRED> <sibling type="brother">
<!ATTLIST person id ID #REQUIRED> <person id="N1004">

Shortcut to define multiple attributes for an element:

<!ATTLIST target_tag
         attr1_name Attribute_type Type_Definition Default
         attr2_name Attribute_type Type_Definition Default
...>
Shortcut illustrations
 <!ATTLIST person ident ID #REQUIRED
      gender (male|female) #IMPLIED
      nom CDATA #REQUIRED
      prenom CDATA #REQUIRED
      relation (brother|sister) #REQUIRED >
 <!ATTLIST portable owner IDREF #REQUIRED >
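To see ID and IDREF working together, here is a self-contained sketch built around the two attribute lists above. The root element people, the EMPTY content models and the attribute values are assumptions made for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE people [
  <!ELEMENT people (person | portable)*>
  <!ELEMENT person EMPTY>
  <!ATTLIST person ident ID #REQUIRED
       gender (male|female) #IMPLIED
       nom CDATA #REQUIRED
       prenom CDATA #REQUIRED
       relation (brother|sister) #REQUIRED >
  <!ELEMENT portable EMPTY>
  <!ATTLIST portable owner IDREF #REQUIRED >
]>
<people>
  <!-- gender is #IMPLIED, so it may be left out -->
  <person ident="N1004" nom="Miller" prenom="Joe" relation="brother"/>
  <!-- owner must match the ID value of some element in the document -->
  <portable owner="N1004"/>
</people>
```

A validating parser rejects the document if two elements share the same ID value or if an IDREF points to an ID that does not exist. Note that ID values must be valid names, so they cannot start with a digit.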

Example: Lone family DTD (file family.dtd)

Xml-intro-edit-10.png

A valid family XML file
 <?xml version="1.0" ?>
 <!DOCTYPE family SYSTEM "family.dtd">
 <family>
   <person name="Joe Miller" gender="male"
           type="father" id="N123456789"/>
   <person name="Josette Miller" gender="female"
           type="girl" id="N123456987"/>
 </family>
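The DTD itself is only shown as a picture. A sketch that would validate the family file above could look as follows; it is inferred from the XML, so the actual family.dtd may differ, and the list of allowed values for type is a guess.

```xml
<!-- family.dtd: a sketch inferred from the XML example -->
<!ELEMENT family (person)+>
<!ELEMENT person EMPTY>
<!ATTLIST person
     name   CDATA #REQUIRED
     gender (male|female) #IMPLIED
     type   (mother|father|boy|girl) #REQUIRED
     id     ID #REQUIRED>
```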

Entities

General entities

Consider entities as abbreviations for some other content. An entity must be defined in the DTD, and its contents are substituted when the entity is encountered in the XML file. Recall that XML itself predefines only five entities, whereas HTML defines many more.

  • Use the &lt; &amp; &gt; &quot; &apos; entities to refer to <, &, >, " and '

Syntax of an internal entity definition: <!ENTITY entity_name "content">

Syntax of an external entity definition: <!ENTITY entity_name SYSTEM "URI">

Syntax of using an entity: &entity_name;

Illustrations of entity definitions

DTD rule: <!ENTITY jt "Joe Test">
XML example: <para> &jt; is here</para>
Result: <para> Joe Test is here</para>

DTD rule: <!ENTITY space "&#160;">
XML example: &space;
Result: a non-breaking space

DTD rule: <!ENTITY copyright "&#xA9;">
XML example: &copyright; D. Schneider
Result: © D. Schneider

DTD rule: <!ENTITY explanation SYSTEM "project1a.xml">
XML example: <citation> &explanation; </citation>
Result: <citation> .... contents of file project1a.xml inserted here ... </citation>
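A minimal self-contained sketch of internal general entities, using an internal DTD so that everything fits in one file (the para root element is chosen only for this illustration):

```xml
<?xml version="1.0"?>
<!DOCTYPE para [
  <!ELEMENT para (#PCDATA)>
  <!-- Two internal general entities -->
  <!ENTITY jt "Joe Test">
  <!ENTITY copyright "&#xA9;">
]>
<para>&jt; is here. &copyright; D. Schneider</para>
```

After entity substitution, an application sees the text "Joe Test is here. © D. Schneider" inside para.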

Parameter entities

Most professional DTDs use parameter entities to simplify DTD writing. More complex DTDs often use the same structures in many places. Instead of typing these several times for each element definition, one can use an ENTITY construction like these:

 <!ENTITY  % entity_name "content">
 <!ENTITY  % entity_name SYSTEM "URI">

Example DTD entities to define reusable child elements

 <!ENTITY % Content "(Para | List | Listing)*">

Later in the DTD we then can have element definitions like this:

 <!ELEMENT Intro (Title, %Content; ) >
 <!ELEMENT Goal (Title, %Content; ) >

The XML parser will then simply expand these %Content; references and we get:

 <!ELEMENT Intro (Title, (Para | List | Listing)*) >
 <!ELEMENT Goal (Title, (Para | List | Listing)* ) >

Example of a DTD entity declaration

The example below defines reusable attribute definitions.

 <!ENTITY % stamp '
   id ID #IMPLIED
   creation-day NMTOKEN #IMPLIED
   .......
   mod-by NMTOKEN #IMPLIED
   version NMTOKEN #IMPLIED
   status (draft|final|obsolete) #IMPLIED
   approval (ok|not-ok|so-so) #IMPLIED
   main-author CDATA #IMPLIED
   '>

ATTLIST definitions below use %stamp;

 <!ELEMENT main-goal (title, content, (after-thoughts)?, (teacher-comments)?)>
 <!ATTLIST main-goal %stamp; >
 <!ELEMENT title (...)>
 <!ATTLIST title %stamp; >
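Here is a shortened, complete sketch of this technique. The file name stamp-demo.dtd, the reduced attribute list and the simplified content models are assumptions for illustration. Note that parameter-entity references inside a declaration, such as %stamp; within an ATTLIST, are only allowed in an external DTD, not in an internal DTD subset.

```xml
<!-- stamp-demo.dtd (hypothetical external DTD file) -->
<!ENTITY % stamp '
   id ID #IMPLIED
   version NMTOKEN #IMPLIED
   status (draft|final|obsolete) #IMPLIED
   '>

<!ELEMENT main-goal (title, content)>
<!ATTLIST main-goal %stamp; >
<!ELEMENT title (#PCDATA)>
<!ATTLIST title %stamp; >
<!ELEMENT content (#PCDATA)>
```

A document using this DTD can then put any of the "stamp" attributes on both elements:

```xml
<?xml version="1.0"?>
<!DOCTYPE main-goal SYSTEM "stamp-demo.dtd">
<main-goal id="g1" status="draft">
  <title version="2">Learn DTDs</title>
  <content>...</content>
</main-goal>
```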

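You can also have a look at ibtwsh6.dtd (http://tecfa.unige.ch/guides/xml/examples/dtd-examples/ePBL11/ibtwsh6.dtd), a mini XHTML that shows good usage of entities. It is a professionally made DTD that we use in production, for example in the "project document" ePBLproject11.dtd (http://tecfa.unige.ch/guides/xml/examples/dtd-examples/ePBL11/ePBLproject11.dtd) that we used in project-oriented teaching.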

Some advice for designing DTDs

General advice

Don't forget elements and be liberal
  • Each element needs to be defined, but only once!
  • Only make elements mandatory if they are really needed; otherwise make them optional, e.g. element?
Plan the global structure
  • Before you start writing out DTDs, use some simple "language" to draft the structure, e.g. use a notation like:
name   ==>  family given
family ==> "text"
  • In most cases, each "object" of your "information domain" becomes an element
  • Use child elements to model components
  • Use attributes to describe properties of components
Start from the root element and work your way down
  1. Root element
  2. Child elements of root element
  3. Child elements of the other elements, etc.
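The draft notation shown above translates almost mechanically into DTD rules. In this sketch, the rule for given is an assumption, since the draft only spells out family:

```xml
<!-- name ==> family given -->
<!ELEMENT name   (family, given)>
<!-- family ==> "text" -->
<!ELEMENT family (#PCDATA)>
<!-- assumed: given also contains just text -->
<!ELEMENT given  (#PCDATA)>
```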

Attributes vs. Elements

  • There are some design rules that may help you decide whether to use an element or an attribute
  • In case of doubt, always use elements ...
Rather use child elements inside an element to represent an information block
  • if order is important (attributes can't be ordered)
  • if you plan to use the same kind of information block with different parents
  • if a future version of DTD may specify sub-components of an information block
  • if the information block represents a "thing" (an object in OO programming)
  • if the DTD is text-centric, because authors must see the contents they edit and attributes are often hidden away in XML editors; only use attributes to qualify properties like style!
Rather use attributes to represent an information block
  • if an attribute refers to another element:
<pet_of owner_name="lisa" pet_type="cat"> would refer to <animal category="cat">
  • to declare usage/type/etc. of an element:
 <address usage="prof"> ... </address>
  • if you wish to list all possible values a user can enter
  • if you want to restrict data type of the attribute value (e.g. require a single word)
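The following sketch applies both sets of guidelines to a hypothetical contact record. The element names contact, street and city and the value "private" are assumptions. The address is a "thing" with sub-components, so it is an element with child elements, while its usage is a qualifying property with a closed list of values, so it is an attribute.

```xml
<?xml version="1.0"?>
<!DOCTYPE contact [
  <!ELEMENT contact (address+)>
  <!-- Information block with ordered sub-components: child elements -->
  <!ELEMENT address (street, city)>
  <!ELEMENT street  (#PCDATA)>
  <!ELEMENT city    (#PCDATA)>
  <!-- Property with a fixed list of possible values: attribute -->
  <!ATTLIST address usage (prof|private) #REQUIRED>
]>
<contact>
  <address usage="prof">
    <street>...</street>
    <city>...</city>
  </address>
</contact>
```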