DTD tutorial

The educational technology and digital learning wiki

Introduction

This is a short tutorial about DTDs. It briefly shows how to read DTDs, then how to create them. (Actually, just a cut/paste from some slides, needs some editing - 16:51, 23 April 2007 (MEST))

DTD grammars are a set of rules that define:

  • a set of elements (tags) and their attributes that can be used to create an XML document;
  • how elements can be embedded;
  • different sorts of entities (reusable fragments, special characters).

DTDs can't define what the contents look like, i.e. character data (element contents) and most attribute values.

Specification of a markup language

  • The most important part is usually the DTD, but other constraints can be added.
  • The DTD does not identify the root element!
    • you have to tell the users what elements can be root elements
  • Since DTDs can't express data constraints, write them out in a specification document
    • e.g. "the value of the length attribute is a string composed of a number plus 'cm', 'inch' or 'em'"
 example: <size length="10cm">

Example 1: A simple DTD

<!ELEMENT page  (title, content, comment?)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT content (#PCDATA)>
<!ELEMENT comment (#PCDATA)>
  • A DTD document contains just rules .... nothing else (see later for explanations)


Using a DTD with an XML document

A. Document type declarations

  • A valid XML document includes a declaration that specifies the DTD used
  • The DTD is declared at the top of the file, after the XML declaration.
  • The XML declaration, the DTD declaration etc. are part of the prolog
  • So: the <!DOCTYPE...> declaration is part of the XML file, not of the DTD

Example:

<?xml version="1.0" encoding="ISO-8859-1" ?>
<!DOCTYPE hello SYSTEM "hello.dtd">
<hello>Here we <strong>go</strong> ... </hello>

4 ways of using a DTD

  1. No DTD (XML document will just be well-formed)
  2. DTD rules are defined inside the XML document
    • We get a "standalone" document (the XML document is self-sufficient)
  3. "Private/System" DTDs, the DTD is located on the system (own computer or the Internet)
    • ... that's what you are going to use when you write your own DTDs
<!DOCTYPE hello SYSTEM "hello.dtd">
  4. Public DTDs, we use a name for the DTD.
    • means that both your XML editor and user software know the DTD
    • strategy used for common Web DTDs like XHTML, SVG, MathML, etc.

B. Syntax of the DTD declaration in the XML document

  • A DTD declaration starts with the keyword "DOCTYPE":
  <!DOCTYPE ....  >
  • ... followed by the root element
    • Remember that DTDs don't know their root element, root is defined in the XML document !
    • Note: DTDs must define this root element just like any other element! (A DTD can define more than one element that may serve as root.)
 <!DOCTYPE hello .... >
  • ... followed by the DTD definition or a reference to a DTD file

Syntax for internal DTDs (only !)

    • DTD rules are inserted between brackets [ ... ]
   <!DOCTYPE hello [
       <!ELEMENT hello (#PCDATA)>
       ]>

Syntax to define "private" external DTDs:

    • DTD is identified by the URL after the " SYSTEM " keyword
<!DOCTYPE hello  SYSTEM "hello.dtd" >

Syntax for public DTDs:

  • after the " PUBLIC " keyword you have to specify an official name and a backup URL that a validator could use.
<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN"
 "http://my.netscape.com/publish/formats/rss-0.91.dtd"
>
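
The internal and external forms can also be combined in one declaration: the reference to the external DTD comes first and document-specific rules follow between brackets. A minimal sketch, assuming that hello.dtd declares the elements hello and strong (the entity name greeting is made up for illustration):

```xml
<?xml version="1.0" encoding="ISO-8859-1" ?>
<!-- hello.dtd is assumed to declare the elements hello and strong -->
<!DOCTYPE hello SYSTEM "hello.dtd" [
   <!-- extra rule that only applies to this document -->
   <!ENTITY greeting "Here we go">
]>
<hello>&greeting; <strong>again</strong></hello>
```

Rules in the internal part are read before those of the external file, which is why this form is mostly used for small local additions such as entities.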

C. Some examples of XML documents with DTD declarations:

Example 2: Hello XML without DTD

<?xml version="1.0" standalone="yes"?>
<hello> Hello XML and hello dear reader! </hello>

Example 4-3: Hello XML with an internal DTD

<?xml version="1.0" standalone="yes"?>
<!DOCTYPE hello [
   <!ELEMENT hello (#PCDATA)>
   ]>
<hello> Hello XML and hello dear reader! </hello>

Example 4: Hello XML with an external DTD

<?xml version="1.0" encoding="ISO-8859-1" ?>
<!DOCTYPE hello SYSTEM "hello.dtd">
<hello> This is a very simple XML document </hello>
  • That's what you should do with your own home-made DTDs

Example 4-5: XML with a public external DTD (RSS 0.91)

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN"
 "http://my.netscape.com/publish/formats/rss-0.91.dtd">
<rss version="0.91">
<channel> ...... </channel>
</rss>

4.2 Understanding DTDs by example

  • Recall that DTDs define all the elements and attributes and the way they can be combined

Example 6: Hello text with XML

A simple XML document of type <page>

<page>
 <title>Hello friend</title>
 <content>Here is some content :)</content>
 <comment>Written by DKS/Tecfa, adapted from S.M./the Cocoon samples</comment>
</page>

A DTD that would validate the document

Xml-intro-edit-6.png
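
The DTD is only available as a figure. Since the document has the same structure as the one described in Example 1, a DTD along these lines should validate it (a sketch that simply repeats the rules of Example 1, with comments added):

```xml
<!-- page must contain a title, then content, then an optional comment -->
<!ELEMENT page  (title, content, comment?)>
<!-- the three child elements only contain text -->
<!ELEMENT title (#PCDATA)>
<!ELEMENT content (#PCDATA)>
<!ELEMENT comment (#PCDATA)>
```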

Example 4-7: A recipe list in XML

  • init/simple_recipe.xml
  • Source: Introduction to XML by Jay Greenspan (now dead URL)
 <?xml version="1.0"?>
 <!DOCTYPE list SYSTEM "simple_recipe.dtd">
 <list>
 <recipe>
   <author>Carol Schmidt</author>
   <recipe_name>Chocolate Chip Bars</recipe_name>
   <meal>Dinner</meal>
   <ingredients>
     <item>2/3 C butter</item>      <item>2 C brown sugar</item>
     <item>1 tsp vanilla</item>     <item>1 3/4 C unsifted all-purpose flour</item>
     <item>1 1/2 tsp baking powder</item>
     <item>1/2 tsp salt</item>      <item>3 eggs</item>
     <item>1/2 C chopped nuts</item>
     <item>2 cups (12-oz pkg.) semi-sweet choc. chips</item>
   </ingredients>
   <directions>
Preheat oven to 350 degrees. Melt butter; combine with brown sugar and vanilla in large mixing bowl. Set aside to cool.  Combine flour, baking powder, and salt; set aside. Add eggs to cooled sugar mixture; beat well. Stir in reserved dry  ingredients, nuts, and chips.
Spread in greased 13-by-9-inch pan. Bake for 25 to 30 minutes until golden brown; cool.  Cut into squares.
   </directions>
 </recipe>
</list>

Contents of the DTD

Xml-intro-edit-7.png
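
The figure with the original DTD is not reproduced here. The following is a hypothetical reconstruction of simple_recipe.dtd, derived only from the XML file above; the real file may allow more elements or a different order:

```xml
<!-- Hypothetical simple_recipe.dtd, reconstructed from the XML file above -->
<!-- a list holds one or more recipes -->
<!ELEMENT list (recipe+)>
<!-- each recipe has these five children, in this order -->
<!ELEMENT recipe (author, recipe_name, meal, ingredients, directions)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT recipe_name (#PCDATA)>
<!ELEMENT meal (#PCDATA)>
<!-- at least one item per ingredients list -->
<!ELEMENT ingredients (item+)>
<!ELEMENT item (#PCDATA)>
<!ELEMENT directions (#PCDATA)>
```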

Example 4-8: A simple story grammar

Xml-intro-edit-8.png

Example 4-9: Lone family DTD

Xml-intro-edit-9.png

A valid XML file

<?xml version="1.0" encoding="ISO-8859-1" ?>
<!DOCTYPE family SYSTEM "family.dtd">
<family>
  <person name="Joe Miller" gender="male"
          type="father" id="N123456789"/>
  <person name="Josette Miller" gender="female"
          type="girl" id="N123456987"/>
</family>

Example 4-10: RSS

  • complex/rss-0-92.dtd
  • There are several RSS standards. RSS 0.91 is Netscape's original (still being used)
<!ELEMENT rss (channel)>
<!ATTLIST rss version CDATA #REQUIRED>
 <!-- must be "0.91" -->

 <!ELEMENT channel (title | description | link | language | item  | rating? | image? | textinput? | copyright? | pubDate? | lastBuildDate? | docs? | managingEditor? | webMaster? | skipHours? | skipDays?)*>
 <!ELEMENT title (#PCDATA)>
 <!ELEMENT description (#PCDATA)>
 <!ELEMENT link (#PCDATA)>
 <!ELEMENT image (title | url | link | width? | height? | description?)*>
 <!ELEMENT url (#PCDATA)>
 <!ELEMENT item (title | link | description)*>
 <!ELEMENT textinput (title | description | name | link)*>
 <!ELEMENT name (#PCDATA)>
 <!ELEMENT rating (#PCDATA)>
 <!ELEMENT language (#PCDATA)>
 <!ELEMENT width (#PCDATA)>
 <!ELEMENT height (#PCDATA)>
 <!ELEMENT copyright (#PCDATA)>
 <!ELEMENT pubDate (#PCDATA)>
 <!ELEMENT lastBuildDate (#PCDATA)>
 <!ELEMENT docs (#PCDATA)>
 <!ELEMENT managingEditor (#PCDATA)>
 <!ELEMENT webMaster (#PCDATA)>
 <!ELEMENT hour (#PCDATA)>
 <!ELEMENT day (#PCDATA)>
 <!ELEMENT skipHours (hour+)>
 <!ELEMENT skipDays (day+)>

Possible XML document for RSS

<?xml version="1.0" encoding="ISO-8859-1" ?>
<!DOCTYPE rss SYSTEM "rss-0.91.dtd">
<rss version="0.91">
   <channel>
     <title>Webster University</title>
     <description>Home Page of Webster University</description>
     <link>http://www.webster.edu</link>
     <item>
       <title>Webster Univ. Geneva</title>
       <description>Home page of Webster University Geneva</description>
       <link>http://www.webster.ch</link>
     </item>
     <item>
       <title>http://www.course.com/</title>
       <description>You can find Thomson text-books materials (exercise data) on this web site</description>
       <link>http://www.course.com/</link>
     </item>
   </channel>
 </rss>

Summary syntax of element definitions

  • The purpose of this table is not to teach you how to write DTDs
  • To understand how to use DTDs, you just need to know how to read a DTD

Each entry below gives the syntax element, a short explanation, an example element definition and a valid XML example.

,
    • Explanation: elements in that order
    • Element definition: <!ELEMENT Name (First, Middle, Last)>
      (element Name must contain First, Middle and Last)
    • Valid XML example:
<Name>
  <First>D.</First><Middle>K.</Middle><Last>S.</Last>
</Name>

?
    • Explanation: optional element
    • Element definition: <!ELEMENT Name (First,Middle?,Last)>
      (Middle is optional)
    • Valid XML example:
<Name><First>D.</First><Last>S.</Last></Name>

+
    • Explanation: at least one element
    • Element definition: <!ELEMENT list (movie+)>
    • Valid XML example:
<list><movie>Return of ...</movie>
      <movie>Comeback of ...</movie> </list>

*
    • Explanation: zero or more elements
    • Element definition: <!ELEMENT list (item*)>
      (almost as above, but list can be empty)

|
    • Explanation: pick one (or operator)
    • Element definition: <!ELEMENT major (economics | law)>
    • Valid XML example:
<major> <economics> </economics> </major>

()
    • Explanation: grouping construct, e.g. one can add ?, * or + to a group.
    • Element definition: <!ELEMENT text (para | list | title)*>
    • Valid XML example:
<text>
<title>Story</title><para>Once upon a time</para> <title>The awakening</title> <list> ... </list>
</text>

Defining elements

5.1 Definition of elements

Rough syntax of a DTD rule to define elements:

<!ELEMENT tag_name child_element_specification>

Child_element_specification may contain:

  • A combination of child elements according to combination rules
<!ELEMENT page  (title, content, comment?)>
  • Mixed contents, i.e. #PCDATA plus child elements (#PCDATA must come first)
<!ELEMENT para (#PCDATA | strong)*>
  • #PCDATA (Just data)
<!ELEMENT title (#PCDATA)>
  • ANY (only used during development)
<!ELEMENT para ANY>
  • EMPTY (the element has no contents)
<!ELEMENT person EMPTY>

Tag names

  • Each tag name must start with a letter or an underscore ('_')
    followed by letters, numbers or the following characters: '_' , '-', '.', ':'
BAD example (the name starts with a digit):
  <!ELEMENT 1st ...>
BAD example (the name contains a space):

 <!ELEMENT My Home ...>
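
For contrast, here are a few names that respect the rule above. The element names are made up for illustration:

```xml
<!-- starts with a letter, contains an underscore -->
<!ELEMENT first_name (#PCDATA)>
<!-- starts with an underscore -->
<!ELEMENT _note (#PCDATA)>
<!-- a hyphen instead of the space used in the bad example -->
<!ELEMENT my-home (#PCDATA)>
<!-- digits and a dot are fine after the first character -->
<!ELEMENT section.2 (#PCDATA)>
```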

5.2 Combination rules

In the entries below, A and B stand for tags. Each entry gives an explanation, a DTD example and an XML example.

A , B
    • Explanation: A followed by B
    • DTD example: <!ELEMENT person (name, email?)>
    • XML example:
<person>
  <name>Joe</name>
  <email>x@x.x</email>
</person>

A?
    • Explanation: A is optional (it can be present or absent)
    • DTD example: <!ELEMENT person (name, email?)>
    • XML example:
<person>
  <name>Joe</name>
</person>

A+
    • Explanation: at least one A
    • DTD example: <!ELEMENT person (name, email+)>
    • XML examples:
<person>
  <name>Joe</name>
  <email>x@x.x</email>
</person>
<person>
  <name>Joe</name>
  <email>x@x.x</email>
  <email>x@y.x</email>
</person>

A*
    • Explanation: zero, one or several A
    • DTD example: <!ELEMENT person (name, email*)>
    • XML example:
<person>
  <name>Joe</name>
</person>

A | B
    • Explanation: either A or B
    • DTD example: <!ELEMENT person (email | fax)>
    • XML examples:
<person>
  <email>x@x.x</email>
</person>
<person>
  <fax>123456789</fax>
</person>

(A, B)
    • Explanation: parentheses group elements, and you can apply the above combination rules to the whole group
    • DTD example: <!ELEMENT list (name, email)+>
    • XML example:
<list>
  <name>Joe</name>
  <email>x@x.x</email>
</list>
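
The combination rules can be mixed within a single rule. The following self-contained sketch uses an internal DTD; the element names addressbook and note are made up for illustration:

```xml
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE addressbook [
   <!-- one or more persons -->
   <!ELEMENT addressbook (person+)>
   <!-- a name, then any number of emails or faxes, then an optional note -->
   <!ELEMENT person (name, (email | fax)*, note?)>
   <!ELEMENT name (#PCDATA)>
   <!ELEMENT email (#PCDATA)>
   <!ELEMENT fax (#PCDATA)>
   <!ELEMENT note (#PCDATA)>
]>
<addressbook>
  <person>
    <name>Joe</name>
    <email>x@x.x</email>
    <fax>123456789</fax>
    <email>x@y.x</email>
  </person>
  <person>
    <name>Josette</name>
    <note>No contact details yet</note>
  </person>
</addressbook>
```

The group (email | fax)* lets emails and faxes appear in any order and any number, while name stays mandatory and must come first.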

Special contents

Each entry gives the special element, an explanation, a DTD example and an XML example.

#PCDATA
    • Explanation: "Parsed Character Data". Text contents of an element. It should not contain any literal <, > or & (use the predefined entities instead).
    • DTD example: <!ELEMENT email (#PCDATA)>
    • XML example:
<email>Daniel.Schneider@tecfa.unige.ch</email>

ANY
    • Explanation: allows any non-specified child elements and parsed character data (avoid this!)
    • DTD example: <!ELEMENT person ANY>
    • XML example:
<person>
  <c>text</c>
  <a>some <b>bbb</b> inside </a>
</person>

EMPTY
    • Explanation: no contents
    • DTD example: <!ELEMENT br EMPTY>
    • XML example:
<br/>

Note about Mixed Content

  • Mixed element contents contain both text and tags.
<para> here is <a href="xx">link</a>. <b>Check</b> it out </para>
  • You have to use the "|" construct for these: #PCDATA comes first, the alternatives are separated by "|" and the whole group ends with "*"
    • Good examples:
<!ELEMENT para (#PCDATA|a|ul|b|i|em)*>  
<!ELEMENT p (#PCDATA | a | abbr | acronym | br | cite | code | dfn | em | img | kbd | q | samp | span | strong | var )* >
<!ELEMENT p (#PCDATA | %font; | %phrase; | %special; | %form;)* >
    • Bad examples:
<!ELEMENT p (name, first_name, #PCDATA)*>
<!ELEMENT p ( (#PCDATA) |a|ul|b|i|em)*>
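
As a complete illustration, the para example from the start of this note can be validated with a small internal DTD. This is a sketch that only declares the two child elements actually used:

```xml
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE para [
   <!-- #PCDATA first, alternatives separated by |, group closed with * -->
   <!ELEMENT para (#PCDATA | a | b)*>
   <!ELEMENT a (#PCDATA)>
   <!ATTLIST a href CDATA #REQUIRED>
   <!ELEMENT b (#PCDATA)>
]>
<para> here is <a href="xx">link</a>. <b>Check</b> it out </para>
```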

Defining attributes

Rough syntax of Attribute rules:

<!ATTLIST element_name attr_name Attribute_type Type_Def Default >

Overview:

Attribute types
    • CDATA: "Character Data", i.e. text data
    • NMTOKEN: a single word (no spaces; letters, digits and a few characters such as '_', '-', '.' and ':')
    • ID: unique identifier of the element
    • IDREF: reference to an identifier
    • IDREFS: reference to one or more identifiers
    • (A|B|C|..): list of values (from which the user must choose)

Type definition
    • #IMPLIED: attribute is optional
    • #REQUIRED: attribute is mandatory
    • #FIXED Value: attribute has a fixed value (user can't change it)

Illustrations (each DTD rule is followed by an example XML tag):

<!ATTLIST person first_name CDATA #REQUIRED>
    • <person first_name="Joe">

<!ATTLIST person gender (male|female) #IMPLIED>
    • <person gender="male">

<!ATTLIST form method CDATA #FIXED "POST">
    • <form method="POST">

<!ATTLIST list type (bullets|ordered) "ordered">
    • <list type="bullets">

<!ATTLIST sibling type (brother|sister) #REQUIRED>
    • <sibling type="brother">

<!ATTLIST person id ID #REQUIRED>
    • <person id="N1004">

Shortcut to define multiple attributes for an element:

<!ATTLIST target_tag
    attr1_name Attribute_type Type_Def Default
    attr2_name Attribute_type Type_Def Default
    ...>

Shortcut illustrations:

<!ATTLIST person
    ident    ID                #REQUIRED
    gender   (male|female)     #IMPLIED
    nom      CDATA             #REQUIRED
    prenom   CDATA             #REQUIRED
    relation (brother|sister)  #REQUIRED >
<!ATTLIST portable
    owner    IDREF             #REQUIRED >
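
To see the shortcut form in a complete document, here is a minimal sketch with an internal DTD. The root element inventory and all attribute values are made up for illustration; the two ATTLIST rules follow the shortcut illustration above:

```xml
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE inventory [
   <!ELEMENT inventory (person | portable)*>
   <!ELEMENT person EMPTY>
   <!ATTLIST person
       ident    ID               #REQUIRED
       gender   (male|female)    #IMPLIED
       nom      CDATA            #REQUIRED
       prenom   CDATA            #REQUIRED
       relation (brother|sister) #REQUIRED >
   <!ELEMENT portable EMPTY>
   <!ATTLIST portable
       owner    IDREF            #REQUIRED >
]>
<inventory>
  <!-- gender is #IMPLIED, so it may be left out -->
  <person ident="N1004" nom="Miller" prenom="Joe" relation="brother"/>
  <!-- owner must match the ident of some person in this document -->
  <portable owner="N1004"/>
</inventory>
```

A validating parser should reject the document if owner refers to an identifier that no ID attribute in the document carries.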


Example 5-1: Lone family DTD (file family.dtd)

Xml-intro-edit-10.png

A valid XML file

<?xml version="1.0" encoding="ISO-8859-1" ?>
<!DOCTYPE family SYSTEM "family.dtd">
<family>
  <person name="Joe Miller" gender="male"
          type="father" id="N123456789"/>
  <person name="Josette Miller" gender="female"
          type="girl" id="N123456987"/>
</family>
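
The DTD itself is only available as a figure. A DTD along the following lines would accept the file above. It is a hypothetical reconstruction: in particular, the list of allowed values for type and the choice between #REQUIRED and #IMPLIED are assumptions.

```xml
<!-- Hypothetical family.dtd, reconstructed from the XML file above -->
<!-- a family holds one or more persons -->
<!ELEMENT family (person)+>
<!-- a person has no contents, only attributes -->
<!ELEMENT person EMPTY>
<!ATTLIST person
    name   CDATA                     #REQUIRED
    gender (male|female)             #IMPLIED
    type   (mother|father|boy|girl)  #REQUIRED
    id     ID                        #REQUIRED >
```

Because id is of type ID, its values must be valid XML names, which is why they start with a letter (N123456789) rather than a digit.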

5.6 General entities

Consider entities as abbreviations for some other content. An entity must be defined in the DTD and its contents are substituted when encountered in the XML file.

    • Recall that XML itself only predefines 5 entities, whereas HTML defines many more.
    • Use the &lt; &amp; &gt; &quot; &apos; entities to refer to <, &, >, " and '

Syntax of an internal entity definition: <!ENTITY entity_name "content">

Syntax of an external entity definition: <!ENTITY entity_name SYSTEM "URI">

Syntax of using an entity: &entity_name;

Illustrations of entity definitions (each DTD rule is followed by an XML example and the result):

<!ENTITY jt "Joe Test">
    • XML example: <para> &jt; is here</para>
    • Result: <para> Joe Test is here</para>

<!ENTITY space "&#160;">
    • defines a non-breaking space

<!ENTITY copyright "&#xA9;">
    • XML example: &copyright; D. Schneider
    • Result: © D. Schneider

<!ENTITY explanation SYSTEM "project1a.xml">
    • XML example: <citation> &explanation; </citation>
    • Result: <citation> contents of project1a.xml ... </citation>
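
Entity definitions can be tried out without a separate DTD file by putting them in an internal DTD. A minimal sketch that reuses the jt and copyright entities from the illustrations, together with three of the predefined entities:

```xml
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE para [
   <!ELEMENT para (#PCDATA)>
   <!-- internal entities: the replacement text is given directly -->
   <!ENTITY jt "Joe Test">
   <!ENTITY copyright "&#xA9;">
]>
<!-- &lt; &amp; &gt; are predefined and need no declaration -->
<para>&copyright; &jt; is here, and 1 &lt; 2 &amp; 3 &gt; 2</para>
```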

5.7 Parameter entities

  • Most professional DTDs use parameter entities.
  • These are used to simplify DTD writing

<!ENTITY  % entity_name "content">
<!ENTITY  % entity_name SYSTEM "URI">

Example 5-2: DTD entities to define reusable child elements

  • More complex DTDs often use the same structures all over. Instead of typing these several times for each element definition, one can use an ENTITY construction like this:
 
<!ENTITY % Content "(Para | List | Listing)*">
 

Later in the DTD we then can have element definitions like this:

 
<!ELEMENT Intro (Title, %Content; ) >
<!ELEMENT Goal (Title, %Content; ) >
 

The XML parser will then simply expand each %Content; reference and we get:

 
<!ELEMENT Intro (Title, (Para | List | Listing)*) >
<!ELEMENT Goal (Title, (Para | List | Listing)* ) >
 

Example 5-3: DTD entities to define reusable attribute definitions

  • You may use the same procedure to define "bricks" for attribute definitions
  • Entity example that defines part of an attribute definition
<!ENTITY % stamp 

'
  id ID #IMPLIED
  creation-day NMTOKEN #IMPLIED
  .......
  mod-by NMTOKEN #IMPLIED
  version NMTOKEN #IMPLIED
  status (draft|final|obsolete) #IMPLIED
  approval (ok|not-ok|so-so) #IMPLIED
  main-author CDATA #IMPLIED
'
>

ATTLIST definitions below use %stamp;

<!ELEMENT main-goal (title, content, (after-thoughts)?, (teacher-comments)?)>
<!ATTLIST main-goal %stamp;
 >
<!ELEMENT title (...)>
<!ATTLIST title %stamp;
 >


Some advice for designing DTDs

Don't forget elements and be liberal

  • Each element needs to be defined, but only once!
  • Only make elements mandatory if they are really needed, else make them optional, e.g. with element?

Plan the global structure

  • Before you start writing out DTDs, use some simple "language" to draft the structure, e.g. use a notation like:
name   ==>  family   given
family ==> "text"
  • In most cases, each "object" of your "information domain" becomes an element
  • Use child elements to model components
  • Use attributes to describe properties of components
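
The draft notation above translates almost line by line into DTD rules. A minimal sketch, assuming that given (which the draft does not detail) also holds plain text:

```xml
<!-- name ==> family given -->
<!ELEMENT name (family, given)>
<!-- family ==> "text" -->
<!ELEMENT family (#PCDATA)>
<!-- not in the draft: given is assumed to be text as well -->
<!ELEMENT given (#PCDATA)>
```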

Start from the root element and work your way down:

  1. Root element
  2. Child elements of root element
  3. Child elements of the other elements, etc.