DTD tutorial: Difference between revisions

* The most important part is usually the DTD, but other constraints can be added.
* The DTD does not identify the root element. You have to tell users which elements can be root elements.
* Since DTDs can't express data constraints, write them out in a specification document, e.g. "the value of the length attribute is a string composed of a number followed by 'cm', 'inch' or 'em'". Example:
  <size length="10cm">
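
The limitation on data constraints can be sketched as follows. Assuming a hypothetical empty ''size'' element, the DTD can only declare ''length'' as character data; the "number plus unit" rule has to live in the specification document:

<source lang="xml">
<!-- Hypothetical declarations for illustration; not taken from an existing DTD -->
<!ELEMENT size EMPTY>
<!-- CDATA accepts any string, so length="10cm" and length="banana" both validate -->
<!ATTLIST size length CDATA #REQUIRED>
</source>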


=== Example 1: A simple DTD ===

<source lang="xml">
  <!ELEMENT page  (title, content, comment?)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT content (#PCDATA)>
  <!ELEMENT comment (#PCDATA)>
</source>


* A DTD document contains just rules, nothing else (see later for explanations)
* The DTD is declared at the top of the file, after the XML declaration.
* The XML declaration, the DTD declaration, etc. are part of the prologue.
* So the <nowiki><!DOCTYPE...></nowiki> declaration is part of the XML file, ''not'' the DTD.


;Example

<source lang="xml">
  <?xml version="1.0" ?>
  <!DOCTYPE hello SYSTEM "hello.dtd">
  <hello>Here we <strong>go</strong> ... </hello>
</source>


;Four ways of using a DTD
(2) DTD rules are defined inside the XML document
* We get a "standalone" document (the XML document is self-sufficient)
(3) "Private/System" DTDs: the DTD is located on the system (your own computer or the Internet). That's what you are going to use when you write your own DTDs.
  <source lang="xml"> <!DOCTYPE hello SYSTEM "hello.dtd"></source>
(4) Public DTDs: we use a name for the DTD. This means that both your XML editor and user software know the DTD. This is the strategy used for common Web DTDs like XHTML, SVG, MathML, etc.
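
As an illustration of case (4), an XHTML 1.0 Strict document refers to its public DTD as sketched below. The public name and the URL are the ones published by the W3C; the rest of the document is left out here:

<source lang="xml">
<!-- Public identifier first, then the URL a validator can fall back on -->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
</source>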


=== Syntax of the DTD declaration in the XML document ===

; A DTD declaration starts with the keyword "DOCTYPE":

   <!''DOCTYPE'' ....  >

; ... followed by the root element
: Remember that DTDs don't know their root element; the root is defined in the XML document. Also note that the DTD must define this root element just like any other element (a DTD can have more than one possible root).

   <!DOCTYPE ''hello'' .... >

; ... followed by the DTD definition or a reference to a DTD file


;Syntax for internal DTDs (only!)
: DTD rules are inserted between brackets [ ... ]
<source lang="xml">
     <!DOCTYPE hello [
         <!ELEMENT hello (#PCDATA)>
         ]>
</source>


;Syntax to define "private" external DTDs
* The DTD is identified by the URL after the ''SYSTEM'' keyword

<source lang="xml">
  <!DOCTYPE hello SYSTEM "hello.dtd">
</source>


;Syntax for public DTDs
* After the ''PUBLIC'' keyword you have to specify an official name and a backup URL that a validator can use.

<source lang="xml">
  <!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN"
  "http://my.netscape.com/publish/formats/rss-0.91.dtd">
</source>


=== Some examples of XML documents with DTD declarations ===
;Example 2: Hello XML without DTD

<source lang="xml">
  <?xml version="1.0" standalone="yes"?>
  <hello> Hello XML and hello dear reader! </hello>
</source>


;Example 3: Hello XML with an internal DTD

<source lang="xml">
  <?xml version="1.0" standalone="yes"?>
  <!DOCTYPE hello [
    <!ELEMENT hello (#PCDATA)>
    ]>
  <hello> Hello XML and hello dear readers! </hello>
</source>


;Example 4: Hello XML with an external DTD

<source lang="xml">
  <?xml version="1.0" ?>
  <!DOCTYPE hello SYSTEM "hello.dtd">
  <hello> This is a very simple XML document </hello>
</source>


* That's what you should do with your own home-made DTDs
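
A minimal sketch of what the referenced file could contain, assuming that ''hello.dtd'' holds the same single rule as the internal DTD of Example 3 (the actual content of the file is not shown in this tutorial):

<source lang="xml">
<!-- hello.dtd (assumed content): the hello element contains text only -->
<!ELEMENT hello (#PCDATA)>
</source>

Note that the DTD file contains no DOCTYPE declaration; that line stays in the XML document.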
;Example 5: XML with a public external DTD (RSS 0.91)

  <?xml version="1.0"?>
  <!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN"
  "http://my.netscape.com/publish/formats/rss-0.91.dtd">
  <rss version="0.91">
  <channel> ...... </channel>
  </rss>


== Understanding DTDs by example ==

=== Example 6: Hello text with XML ===


;A simple XML document of type <page>

  <page>
   <title>Hello friend</title>
   <content>Here is some content :)</content>
   <comment>Written by DKS/Tecfa, adapted from S.M./the Cocoon samples</comment>
  </page>


;A DTD that would validate the document
* Source: Introduction to XML by Jay Greenspan (now dead URL)

   <?xml version="1.0"?>
   <!DOCTYPE list SYSTEM "simple_recipe.dtd">
   <list>
   <recipe>
     <author>Carol Schmidt</author>
     <recipe_name>Chocolate Chip Bars</recipe_name>
     <meal>Dinner</meal>
     <ingredients>
       <item>2/3 C butter</item>     <item>2 C brown sugar</item>
       <item>1 tsp vanilla</item>     <item>1 3/4 C unsifted all-purpose flour</item>
       <item>1 1/2 tsp baking powder</item>
       <item>1/2 tsp salt</item>     <item>3 eggs</item>
       <item>1/2 C chopped nuts</item>
       <item>2 cups (12-oz pkg.) semi-sweet choc. chips</item>
     </ingredients>
     <directions>
  Preheat oven to 350 degrees. Melt butter; combine with brown sugar and vanilla in large mixing bowl. Set aside to cool.  Combine flour, baking powder, and salt; set aside. Add eggs to cooled sugar mixture; beat well. Stir in reserved dry  ingredients, nuts, and chips.
  Spread in greased 13-by-9-inch pan. Bake for 25 to 30 minutes until golden brown; cool.  Cut into squares.
     </directions>
   </recipe>
   </list>


;Contents of the DTD


<source lang="xml">
  <?xml version="1.0" encoding="ISO-8859-1"?>
  <!-- DTD to write simple stories
       Made by Daniel K. Schneider / TECFA / University of Geneva
       ... -->
</source>
;Here is a valid skeleton
<source lang="xml">
   <?xml version="1.0" ?>
   <!DOCTYPE STORY SYSTEM "story-grammar.dtd">
   <?xml-stylesheet href="story-grammar.css" type="text/css"?>
   <!-- ... -->
</source>
;A valid XML file

  <?xml version="1.0" ?>
  <!DOCTYPE family SYSTEM "family.dtd">
  <family>
   <person name="Joe Miller" gender="male"
           type="father" id="123.456.789"/>
   <person name="Josette Miller" gender="female"
           type="girl" id="123.456.987"/>
  </family>


=== Example 10: RSS ===
* There are several RSS standards. RSS 0.91 is Netscape's original (still being used)

  <!ELEMENT rss (channel)>
  <!ATTLIST rss version CDATA #REQUIRED>
   <nowiki><!-- must be "0.91"> --></nowiki>
   <!ELEMENT channel (title | description | link | language | item+ | rating? | image? | textinput? |
               copyright? | pubDate? | lastBuildDate? | docs? | managingEditor? |
               webMaster? | skipHours? | skipDays?)*>
   <!ELEMENT title (#PCDATA)>
   <!ELEMENT description (#PCDATA)>
   <!ELEMENT link (#PCDATA)>
   <!ELEMENT image (title | url | link | width? | height? | description?)*>
   <!ELEMENT url (#PCDATA)>
   <!ELEMENT item (title | link | description)*>
   <!ELEMENT textinput (title | description | name | link)*>
   <!ELEMENT name (#PCDATA)>
   <!ELEMENT rating (#PCDATA)>
   <!ELEMENT language (#PCDATA)>
   <!ELEMENT width (#PCDATA)>
   <!ELEMENT height (#PCDATA)>
   <!ELEMENT copyright (#PCDATA)>
   <!ELEMENT pubDate (#PCDATA)>
   <!ELEMENT lastBuildDate (#PCDATA)>
   <!ELEMENT docs (#PCDATA)>
   <!ELEMENT managingEditor (#PCDATA)>
   <!ELEMENT webMaster (#PCDATA)>
   <!ELEMENT hour (#PCDATA)>
   <!ELEMENT day (#PCDATA)>
   <!ELEMENT skipHours (hour+)>
   <!ELEMENT skipDays (day+)>


; Possible XML document for RSS

  <?xml version="1.0" ?>
  <!DOCTYPE rss SYSTEM "rss-0.91.dtd">
  <rss version="0.91">
     <channel>
       <title>Webster University</title>
       <description>Home Page of Webster University</description>
       <link>http://www.webster.edu</link>
       <item>
         <title>Webster Univ. Geneva</title>
         <description>Home page of Webster University Geneva</description>
         <link>http://www.webster.ch</link>
       </item>
       <item>
         <title>http://www.course.com/</title>
         <description>You can find Thomson text-books materials (exercise data) on this web site</description>
         <link>http://www.course.com/</link>
       </item>
     </channel>
  </rss>


== Summary syntax of element definitions ==


* Element Name must contain First, Middle and Last
  <Name>
   <First>D.</First><Middle>K.</Middle><Last>S.</Last>
  </Name>
|-
| rowspan="1" colspan="1" |
* optional element
| rowspan="1" colspan="1" |
  <!ELEMENT Name (First,Middle?,Last)>
* Middle is optional
  <Name><First>D.</First><Last>S.</Last></Name>
|-
| rowspan="1" colspan="1" |
* at least one element
| rowspan="1" colspan="1" |
  <!ELEMENT list (movie+)>
  <list><movie>Return of ...</movie>
       <movie>Comeback of ...</movie> </list>
|-
| rowspan="1" colspan="1" |
* zero or more elements
| rowspan="1" colspan="1" |
  <!ELEMENT list (item*)>
* almost as above, but the list can be empty
|-
| rowspan="1" colspan="1" |
* pick one (or operator)
| rowspan="1" colspan="1" |
  <!ELEMENT major (economics | law)>
  <major> <economics> </economics> </major>
|-
| rowspan="1" colspan="1" |
* grouping construct, e.g. one can add ?, * or + to a group.
| rowspan="1" colspan="1" |
  <!ELEMENT text (para | list | title)*>
* Use any of these elements in any order (as many as you like)
  <text>
  <title>Story</title>
     <para>Once upon a time</para>
     <title>The awakening</title>
     <list> ... </list>
  </text>
|}
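
The occurrence indicators summarized above can be combined in a single DTD. The following is a sketch with made-up element names (''catalog'', ''movie'', ''subtitle'', ''actor'', ''director''), written as a standalone document with an internal DTD so that it can be pasted into a validating editor:

<source lang="xml">
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE catalog [
  <!-- + : at least one movie -->
  <!ELEMENT catalog (movie+)>
  <!-- ? : subtitle is optional; ( | )* : any number of actors and directors, in any order -->
  <!ELEMENT movie (title, subtitle?, (actor | director)*)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT subtitle (#PCDATA)>
  <!ELEMENT actor (#PCDATA)>
  <!ELEMENT director (#PCDATA)>
]>
<catalog>
  <movie>
    <title>Return of ...</title>
    <director>A. Director</director>
    <actor>An Actor</actor>
  </movie>
  <movie>
    <title>Comeback of ...</title>
    <subtitle>Part two</subtitle>
  </movie>
</catalog>
</source>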


'''Syntax of a DTD rule to define elements:'''

   <!ELEMENT tag_name (child_element_specification) >

==== Child_element_specification may contain: ====
* A combination of child elements according to combination rules

  <!ELEMENT page  (title, content, comment?)>

* Mixed contents, i.e. #PCDATA combined with child elements (#PCDATA must come first)

  <!ELEMENT para (#PCDATA | strong)*>

* <nowiki>#PCDATA (Just data)</nowiki>

  <!ELEMENT title (#PCDATA)>

* ANY (only used during development)

  <!ELEMENT para ANY>

* EMPTY (the element has no contents)

  <!ELEMENT person EMPTY>
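
To see these content kinds side by side, here is a small sketch of a standalone document. The element names ''note'', ''para'', ''strong'' and ''separator'' are invented for this illustration:

<source lang="xml">
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE note [
  <!-- child elements only -->
  <!ELEMENT note (title, para+, separator?)>
  <!-- text only -->
  <!ELEMENT title (#PCDATA)>
  <!-- mixed content: text and strong elements in any order -->
  <!ELEMENT para (#PCDATA | strong)*>
  <!ELEMENT strong (#PCDATA)>
  <!-- no content at all -->
  <!ELEMENT separator EMPTY>
]>
<note>
  <title>Reminder</title>
  <para>Validate <strong>before</strong> publishing.</para>
  <separator/>
</note>
</source>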


==== Tag names ====
* Each tag name must start with a letter or an underscore ('_'),<br /> followed by letters, digits or the following characters: '_' , '-', '.', ':'

  BAD example:
   <!ELEMENT 1st ...>

  BAD example:
   <!ELEMENT My Home ...>
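
Corrected versions of the two bad examples might look like this. This is only a sketch: the replacement names are suggestions and (#PCDATA) is an assumed content model.

<source lang="xml">
<!-- starts with a letter instead of a digit -->
<!ELEMENT first (#PCDATA)>
<!-- an underscore is also a legal first character -->
<!ELEMENT _1st (#PCDATA)>
<!-- no spaces inside a name: use '_', '-' or '.' as a separator -->
<!ELEMENT My_Home (#PCDATA)>
<!ELEMENT my-home (#PCDATA)>
</source>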


=== Combination rules ===
Elements in that order
| rowspan="1" colspan="1" |
  <!ELEMENT person (name, email?)>

  <!ELEMENT Name (First, Middle, Last)>
| rowspan="1" colspan="1" |
  <person>
   <name>Joe</name>
   <email>x@x.x</email>
  </person>

  <Name>
   <First>D.</First><Middle>K.</Middle><Last>S.</Last>
  </Name>
|-
| rowspan="1" colspan="1" |
(it can be present or absent)
| rowspan="1" colspan="1" |
  <!ELEMENT person  (name, email?)>

  <!ELEMENT Name (First,Middle?,Last)>
| rowspan="1" colspan="1" |
  <person>
  <name>Joe</name></person>

  <Name><First>D.</First><Last>S.</Last></Name>
|-
| rowspan="1" colspan="1" |
At least one A
| rowspan="1" colspan="1" |
  <!ELEMENT person (name, email+)>
| rowspan="1" colspan="1" |
  <person> <name>Joe</name>
   <email>x@x.x</email></person>
  <person> <name>Joe</name>
   <email>x@x.x</email>
   <email>x@y.x</email>
  </person>
|-
| rowspan="1" colspan="1" |
Zero, one or several A
| rowspan="1" colspan="1" |
  <!ELEMENT person  (name, email*)>
| rowspan="1" colspan="1" |
  <person>
   <name>Joe</name>
  </person>
|-
| rowspan="1" colspan="1" |
Either A or B
| rowspan="1" colspan="1" |
  <!ELEMENT person (name, (email | fax))>
| rowspan="1" colspan="1" |
  <person> <name>Joe</name>
   <email>x@x.x</email></person>
  <person> <name>Joe</name>
   <fax>123456789</fax></person>
|-
| rowspan="1" colspan="1" |
the above combination rules to the whole group
| rowspan="1" colspan="1" |
  <!ELEMENT list (name, email)+ >
| rowspan="1" colspan="1" |
  <list>
   <name>Joe</name>
   <email>x@x.x</email>
  </list>
|}
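
The rules in this table can be nested. As a sketch, assuming a hypothetical ''addressbook'' root element, the following standalone document requires a name followed by one or more contact entries, each of which is either an email or a fax:

<source lang="xml">
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE addressbook [
  <!ELEMENT addressbook (person*)>
  <!-- sequence (,) outside, choice (|) inside, + applied to the whole group -->
  <!ELEMENT person (name, (email | fax)+)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT email (#PCDATA)>
  <!ELEMENT fax (#PCDATA)>
]>
<addressbook>
  <person>
    <name>Joe</name>
    <email>x@x.x</email>
    <fax>123456789</fax>
  </person>
</addressbook>
</source>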


"Parsed Character Data"

Text contents of an element. It should not contain any <, >, &amp; etc.
| rowspan="1" colspan="1" |
<!ELEMENT email (#PCDATA)>
| rowspan="1" colspan="1" |
<email>Daniel.Schneider@tecfa.unige.ch</email>
|-
| rowspan="1" colspan="1" |
(avoid this !!!)
| rowspan="1" colspan="1" |
<!ELEMENT person ANY>
| rowspan="1" colspan="1" |
<person>

<c>text</c>

<nowiki><a>some <b>bbb</b></nowiki>

inside </a>

</person>
|-
| rowspan="1" colspan="1" |
No contents
| rowspan="1" colspan="1" |
<!ELEMENT br EMPTY>
| rowspan="1" colspan="1" |
<nowiki><br/></nowiki>
|}


* Mixed element contents contain both text and tags.

  <nowiki><para> here is <a href="xx">link</a>. <b>Check</b> it out </para></nowiki>

* You have to use the "|" construct for these
;Good examples:

  <!ELEMENT para (#PCDATA|a|ul|b|i|em)*>
  <!ELEMENT p (#PCDATA | a | abbr | acronym | br | cite | code | dfn | em | img | kbd |
                   q | samp | span | strong | var )* >
  <!ELEMENT p (#PCDATA | %font; | %phrase; | %special; | %form;)* >

;Bad examples:

  <!ELEMENT p (name, first_name, #PCDATA)*>
  <!ELEMENT p ( (#PCDATA) |a|ul|b|i|em)*>
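
Both bad examples break the rule for mixed content: the group starts with #PCDATA, the alternatives are separated by "|" only, and the group ends with "*". Assuming the intention was to allow text mixed with the listed child elements, corrected declarations might read:

<source lang="xml">
<!-- instead of (name, first_name, #PCDATA)*: no sequence operator in mixed content -->
<!ELEMENT p (#PCDATA | name | first_name)*>
<!-- instead of ( (#PCDATA) |a|ul|b|i|em)*: no extra parentheses around #PCDATA -->
<!ELEMENT p (#PCDATA | a | ul | b | i | em)*>
</source>

The two declarations are alternatives; a single DTD may declare the element ''p'' only once.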


== Defining attributes ==

=== Rough syntax of Attribute rules: ===

  <!ATTLIST element_name attr_name Attribute_type Type_Def Default >

=== Overview: ===
|-
| rowspan="1" colspan="1" |
<!ATTLIST person first_name CDATA #REQUIRED>
| rowspan="1" colspan="1" |
<person first_name="Joe">
|-
| rowspan="1" colspan="1" |
<!ATTLIST person gender (male|female) #IMPLIED>
| rowspan="1" colspan="1" |
<person gender="male">
|-
| rowspan="1" colspan="1" |
<!ATTLIST form method CDATA #FIXED "POST">
| rowspan="1" colspan="1" |
<form method="POST">
|-
| rowspan="1" colspan="1" |
<!ATTLIST list type (bullets|ordered) "ordered">
| rowspan="1" colspan="1" |
<list type="bullets">
|-
| rowspan="1" colspan="1" |
<!ATTLIST sibling type (brother|sister) #REQUIRED>
| rowspan="1" colspan="1" |
<sibling type="brother">
|-
| rowspan="1" colspan="1" |
<!ATTLIST person id ID #REQUIRED>
| rowspan="1" colspan="1" |
<person id="N1004">
|}
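
Several of these attribute rules can be tried out together. The sketch below is a standalone document; the content models for ''list'' and ''person'' and the id value ''N1005'' are assumptions made for this illustration:

<source lang="xml">
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE list [
  <!ELEMENT list (person*)>
  <!ATTLIST list type (bullets|ordered) "ordered">
  <!ELEMENT person EMPTY>
  <!ATTLIST person id ID #REQUIRED>
  <!ATTLIST person first_name CDATA #REQUIRED>
  <!ATTLIST person gender (male|female) #IMPLIED>
]>
<!-- type is omitted on list, so a validating parser supplies the default "ordered" -->
<list>
  <person id="N1004" first_name="Joe" gender="male"/>
  <!-- gender is #IMPLIED and may be left out -->
  <person id="N1005" first_name="Josette"/>
</list>
</source>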


=== Shortcut to define multiple attributes for an element: ===

  <!ATTLIST target_tag
       attr1_name AttributeType TypeDef Default
       attr2_name AttributeType TypeDef Default
       ...>

; Shortcut illustrations:

  <!ATTLIST person ident ID #REQUIRED
       gender (male|female) #IMPLIED
       nom CDATA #REQUIRED
       prenom CDATA #REQUIRED
       relation (brother|sister) #REQUIRED >

  <!ATTLIST portable owner IDREF #REQUIRED >
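
The ID and IDREF types work as a pair: an IDREF value has to match the ID value of some element in the same document. A minimal sketch, assuming a hypothetical ''inventory'' root element and a shortened attribute list for ''person'':

<source lang="xml">
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE inventory [
  <!ELEMENT inventory (person | portable)*>
  <!ELEMENT person EMPTY>
  <!ATTLIST person ident ID #REQUIRED
                   nom CDATA #REQUIRED>
  <!ELEMENT portable EMPTY>
  <!ATTLIST portable owner IDREF #REQUIRED>
]>
<inventory>
  <person ident="p1" nom="Miller"/>
  <!-- owner must match the ident value of some element in this document -->
  <portable owner="p1"/>
</inventory>
</source>

Changing ''owner'' to a value that no ''ident'' attribute carries leaves the document well-formed but makes it invalid.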


=== Example 11: Lone family DTD (file family.dtd) ===
; A valid family XML file

  <?xml version="1.0" ?>
  <!DOCTYPE family SYSTEM "family.dtd">
  <family>
   <person name="Joe Miller" gender="male"
           type="father" id="N123456789"/>
   <person name="Josette Miller" gender="female"
           type="girl" id="N123456987"/>
  </family>


== Entities ==
Consider entities as abbreviations for some other content. An entity must be defined in the DTD, and its content is substituted wherever the entity is referenced in the XML file. Recall that XML itself predefines only 5 entities, whereas HTML defines many more.

* Use the ''&amp;lt; &amp;amp; &amp;gt; &amp;quot; &amp;apos;'' entities to refer to <, &amp;, >, " and '

Syntax of an internal entity definition: <!ENTITY entity_name "content">

Syntax of an external entity definition: <!ENTITY entity_name SYSTEM "URI">

Syntax of using an entity: &amp;entity_name;


;Illustrations of entity definitions
;Illustrations of entity definitions
Line 744: Line 761:
|-
|-
| rowspan="1" colspan="1" |
| rowspan="1" colspan="1" |
&lt;!ENTITY jt "Joe Test"&gt;
<!ENTITY jt "Joe Test">
| rowspan="1" colspan="1" |
| rowspan="1" colspan="1" |
'' &lt;para&gt; &amp;jt; is here&lt;/para&gt;''
<para> &amp;jt; is here</para>
| rowspan="1" colspan="1" |
| rowspan="1" colspan="1" |
'' &lt;para&gt; Joe Test is here&lt;/para&gt;''
<para> Joe Test is here</para>
|-
|-
| rowspan="1" colspan="1" |
| rowspan="1" colspan="1" |
&lt;!ENTITY space "&amp;#160;"&gt;
<!ENTITY space "&amp;#160;">
| rowspan="1" colspan="1" |
| rowspan="1" colspan="1" |


Line 758: Line 775:
|-
|-
| rowspan="1" colspan="1" |
| rowspan="1" colspan="1" |
&lt;!ENTITY copyright "&amp;#xA9;"&gt;
<!ENTITY copyright "&amp;#xA9;">
| rowspan="1" colspan="1" |
| rowspan="1" colspan="1" |
&amp;copyright; D. Schneider
&amp;copyright; D. Schneider
Line 765: Line 782:
|-
|-
| rowspan="1" colspan="1" |
| rowspan="1" colspan="1" |
&lt;!ENTITY explanation SYSTEM "project1a.xml"&gt;
<!ENTITY explanation SYSTEM "project1a.xml">
| rowspan="1" colspan="1" |
| rowspan="1" colspan="1" |
'' &lt;citation&gt; &amp;explanation; &lt;/citation&gt;''
<citation> &amp;explanation; </citation>
| rowspan="1" colspan="1" |
| rowspan="1" colspan="1" |
&lt;citation&gt; '' contents of project1a.xml ...'' &lt;/citation&gt;
<citation> contents of project1a.xml ... </citation>
|}
|}


Line 777: Line 794:
* These are used to simplify DTD writing
* These are used to simplify DTD writing


&lt;!ENTITY '' %'' entity_name "content"&gt;<br /> &lt;!ENTITY '' %'' entity_name SYSTEM "URI"&gt;
<!ENTITY % entity_name "content"><br /> <!ENTITY % entity_name SYSTEM "URI">


=== Example 12 DTD entities to define reusable child elements ===
=== Example 12 DTD entities to define reusable child elements ===
Line 783: Line 800:
* More complex DTDs often use the same structures in many places. Instead of typing these out for each element definition, one can use an ENTITY construction like this:
* More complex DTDs often use the same structures in many places. Instead of typing these out for each element definition, one can use an ENTITY construction like this:


  &lt;!ENTITY % Content "(Para | List | Listing)*"&gt;
  <!ENTITY % Content "(Para | List | Listing)*">


Later in the DTD we then can have element definitions like this:
Later in the DTD we then can have element definitions like this:


  &lt;!ELEMENT Intro (Title, %Content; ) &gt;
  <!ELEMENT Intro (Title, %Content; ) >
  &lt;!ELEMENT Goal (Title, %Content; ) &gt;
  <!ELEMENT Goal (Title, %Content; ) >


The XML parser will then simply expand each '' %Content;'' reference and we get:
The XML parser will then simply expand each %Content; reference and we get:


  &lt;!ELEMENT Intro (Title, (Para | List | Listing)*) &gt;
  <!ELEMENT Intro (Title, (Para | List | Listing)*) >
  &lt;!ELEMENT Goal (Title, (Para | List | Listing)* ) &gt;
  <!ELEMENT Goal (Title, (Para | List | Listing)* ) >


=== Example 13 DTD entities to define reusable attribute definitions ===
=== Example 13 DTD entities to define reusable attribute definitions ===
Line 800: Line 817:
* Entity example that defines part of an attribute definition
* Entity example that defines part of an attribute definition


  &lt;!ENTITY ''% stamp ''  '
  <!ENTITY % stamp   '
   id ID #IMPLIED
   id ID #IMPLIED
   creation-day NMTOKEN #IMPLIED
   creation-day NMTOKEN #IMPLIED
Line 809: Line 826:
   approval (ok|not-ok|so-so) #IMPLIED
   approval (ok|not-ok|so-so) #IMPLIED
   main-author CDATA #IMPLIED
   main-author CDATA #IMPLIED
  ' &gt;
  ' >


ATTLIST definitions below use %stamp;
ATTLIST definitions below use %stamp;
  &lt;!ELEMENT main-goal (title, content, (after-thoughts)?, (teacher-comments)?)&gt;
  <!ELEMENT main-goal (title, content, (after-thoughts)?, (teacher-comments)?)>
  &lt;!ATTLIST main-goal ''%stamp;'' &gt;
  <!ATTLIST main-goal %stamp; >
  &lt;!ELEMENT title (...)&gt;
  <!ELEMENT title (...)>
  &lt;!ATTLIST title ''%stamp;'' &gt;
  <!ATTLIST title %stamp; >


== Some advice for designing DTDs ==
== Some advice for designing DTDs ==
Line 824: Line 841:


* Each element needs to be defined, but only once !
* Each element needs to be defined, but only once !
* Only make elements mandatory if they really are wanted, else use e.g. '' element?''
* Only make elements mandatory if they really are wanted, else use e.g. element?


; Plan the global structure
; Plan the global structure
Line 830: Line 847:
* Before you start writing out DTDs, use some simple "language" to draft the structure, e.g. use a notation like:
* Before you start writing out DTDs, use some simple "language" to draft the structure, e.g. use a notation like:


  name  ==&gt; family  given
  name  ==> family  given
  family ==&gt; "text"
  family ==> "text"


* In most cases, each "object" of your "information domain" becomes an element
* In most cases, each "object" of your "information domain" becomes an element
Line 861: Line 878:


* if an attribute refers to another element
* if an attribute refers to another element
** &lt;pet_of owner_name="lisa" pet_type="cat"&gt; would refer to &lt;animal category="cat"&gt;
** <pet_of owner_name="lisa" pet_type="cat"> would refer to <animal category="cat">
* to declare usage/type/etc. of an element:<br /> &lt;address usage="prof"&gt; ... &lt;/address&gt;
* to declare usage/type/etc. of an element:<br /> <address usage="prof"> ... </address>
* if you wish to list all possible values a user can enter
* if you wish to list all possible values a user can enter
* if you want to restrict data type of the attribute value (e.g. require a single word)
* if you want to restrict data type of the attribute value (e.g. require a single word)

Revision as of 16:01, 1 November 2010

Introduction

This is a short tutorial about DTDs. It briefly shows how to read DTDs, then how to create them.

A note on its production: This spring I am teaching a little course at Webster, Geneva. I usually do my teaching slides with Framemaker (e.g. see my French slides). Now I wanted to know how long it takes to translate them into wiki. Answer: about 90 minutes (way too long!). I first export to HTML, then apply an html2wiki filter and then edit by hand. Of course, this result also needs some editing, since slides are not meant for self-study .... - 18:46, 23 April 2007 (MEST).

DTD grammars are a set of rules that define:

  • a set of elements (tags) and their attributes that can be used to create an XML document;
  • how elements can be embedded;
  • different sorts of entities (reusable fragments, special characters).

DTDs can't define what the contents look like, i.e. character data (element contents) and most attribute values.

Specification of a markup language

  • The most important part is usually the DTD, but in addition other constraints can be added !
  • The DTD does not identify the root element ! You have to tell the users what elements can be root elements.
  • Since DTDs can't express data constraints, write them out in a specification document, e.g. "the value of the length attribute is a string composed of a number plus 'cm', 'inch' or 'em'". Example:
<size length="10cm">

Example 1: A simple DTD

 <!ELEMENT page  (title, content, comment?)>
 <!ELEMENT title (#PCDATA)>
 <!ELEMENT content (#PCDATA)>
 <!ELEMENT comment (#PCDATA)>
  • A DTD document contains just rules .... nothing else (see later for explanations)


Using a DTD with an XML document

Document type declarations

  • A valid XML document includes a declaration that specifies the DTD used
  • DTD is declared on top of the file after the XML declaration.
  • XML declarations, DTD declaration etc. are part of the prologue
  • So: The <!DOCTYPE...> declaration is part of the XML file, not the DTD ....
Example
 <?xml version="1.0" ?>
 <!DOCTYPE hello SYSTEM "hello.dtd">
 <hello>Here we <strong>go</strong> ... </hello>
Four ways of using a DTD

(1) No DTD (XML document will just be well-formed)

(2) DTD rules are defined inside the XML document

  • We get a "standalone" document (the XML document is self-sufficient)

(3) "Private/System" DTDs, the DTD is located on the system (own computer or the Internet). That's what you are going to use when you write your own DTDs.

 <!DOCTYPE hello SYSTEM "hello.dtd">

(4) Public DTDs, we use a name for the DTD. This means that both your XML editor and user software know the DTD. This is the strategy used for common Web DTDs like XHTML, SVG, MathML, etc.

Syntax of the DTD declaration in the XML document

A DTD declaration starts with the keyword "DOCTYPE"
  <!DOCTYPE ....  >
... followed by the root element
Remember that DTDs don't know their root element; the root is defined in the XML document ! Also note that the DTD must define this root element just like any other element (a DTD can contain more than one element that may serve as root).
 <!DOCTYPE hello .... >
... followed by the DTD definition or a reference to a DTD file
Syntax for internal DTDs (only !)
DTD rules are inserted between brackets [ ... ]
    <!DOCTYPE hello [
        <!ELEMENT hello (#PCDATA)>
        ]>
Syntax to define "private" external DTDs
  • DTD is identified by the URL after the " SYSTEM " keyword
 <!DOCTYPE hello SYSTEM "hello.dtd">
Syntax for public DTDs
  • after the " PUBLIC " keyword you have to specify an official name and a backup URL that a validator could use.
 <!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN"
  "http://my.netscape.com/publish/formats/rss-0.91.dtd">

Some examples of XML documents with DTD declarations

Example 2
Hello XML without DTD
 <?xml version="1.0" standalone="yes"?>
 <hello> Hello XML and hello dear reader ! </hello>
Example 3
Hello XML with an internal DTD
  <?xml version="1.0" standalone="yes"?>
  <!DOCTYPE hello [
     <!ELEMENT hello (#PCDATA)>
     ]>

  <hello> Hello XML and hello dear readers ! </hello>
Example 4
Hello XML with an external DTD
 <?xml version="1.0" ?>
 <!DOCTYPE hello SYSTEM "hello.dtd">
 <hello> This is a very simple XML document </hello>
  • That's what you should do with your own home-made DTDs
Example 5
XML with a public external DTD (RSS 0.91)
<?xml version="1.0" ?>
<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN"
 "http://my.netscape.com/publish/formats/rss-0.91.dtd">
<rss version="0.91">
<channel> ...... </channel>
</rss>

Understanding DTDs by example

  • Recall that DTDs define all the elements and attributes and the way they can be combined

Example 6: Hello text with XML

A simple XML document of type <page>
<page>
 <title>Hello friend</title>
 <content>Here is some content :)</content>
 <comment>Written by DKS/Tecfa, adapted from S.M./the Cocoon samples</comment>
</page>
A DTD that would validate the document
Xml-intro-edit-6.png
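
The DTD for this example is only shown as an image (Xml-intro-edit-6.png). Going by Example 1, a DTD along the following lines would validate the document; the file name page.dtd is an assumption made for this sketch.

```xml
<!-- page.dtd (hypothetical file name): same rules as in Example 1 -->
<!ELEMENT page    (title, content, comment?)>  <!-- comment is optional -->
<!ELEMENT title   (#PCDATA)>
<!ELEMENT content (#PCDATA)>
<!ELEMENT comment (#PCDATA)>
```

To use it, the XML file would start with a declaration such as <!DOCTYPE page SYSTEM "page.dtd">.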

Example 7: A recipe list in XML

  • Source: Introduction to XML by Jay Greenspan (now dead URL)
 <?xml version="1.0"?>
 <!DOCTYPE list SYSTEM "simple_recipe.dtd">
 <list>
 <recipe>
   <author>Carol Schmidt</author>
   <recipe_name>Chocolate Chip Bars</recipe_name>
   <meal>Dinner</meal>
   <ingredients>
     <item>2/3 C butter</item>      <item>2 C brown sugar</item>
     <item>1 tsp vanilla</item>     <item>1 3/4 C unsifted all-purpose flour</item>
     <item>1 1/2 tsp baking powder</item>
     <item>1/2 tsp salt</item>      <item>3 eggs</item>
     <item>1/2 C chopped nuts</item>
     <item>2 cups (12-oz pkg.) semi-sweet choc. chips</item>
   </ingredients>
   <directions>
Preheat oven to 350 degrees. Melt butter; combine with brown sugar and vanilla in large mixing bowl. Set aside to cool.  Combine flour, baking powder, and salt; set aside. Add eggs to cooled sugar mixture; beat well. Stir in reserved dry  ingredients, nuts, and chips.
Spread in greased 13-by-9-inch pan. Bake for 25 to 30 minutes until golden brown; cool.  Cut into squares.
   </directions>
 </recipe>
</list>
Contents of the DTD

Xml-intro-edit-7.png
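
The contents of simple_recipe.dtd are only available as an image (Xml-intro-edit-7.png). The following is a guess reconstructed from the XML document above, not the original file: it assumes that every child of recipe is mandatory and that a list holds at least one recipe.

```xml
<!-- simple_recipe.dtd: a reconstruction from the sample document, details are assumptions -->
<!ELEMENT list        (recipe+)>
<!ELEMENT recipe      (author, recipe_name, meal, ingredients, directions)>
<!ELEMENT author      (#PCDATA)>
<!ELEMENT recipe_name (#PCDATA)>
<!ELEMENT meal        (#PCDATA)>
<!ELEMENT ingredients (item+)>
<!ELEMENT item        (#PCDATA)>
<!ELEMENT directions  (#PCDATA)>
```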

Example 8: A simple story grammar

 <?xml version="1.0" encoding="UTF-8"?>
 <!-- DTD to write simple stories
      Made by Daniel K. Schneider / TECFA / University of Geneva
      VERSION 1.0
      30/10/2003 -->
 <!ELEMENT STORY (title, context, problem, goal, THREADS, moral, INFOS)>
 <!ATTLIST STORY xmlns:xlink CDATA #FIXED "http://www.w3.org/1999/xlink">
 <!ELEMENT THREADS (EPISODE+)>
 <!ELEMENT EPISODE (subgoal, ATTEMPT+, result) >
 <!ELEMENT ATTEMPT (action | EPISODE) >
 <!ELEMENT INFOS ( ( date | author | a )* ) >
 <!ELEMENT title (#PCDATA) >
 <!ELEMENT context (#PCDATA) >
 <!ELEMENT problem (#PCDATA) >
 <!ELEMENT goal (#PCDATA) >
 <!ELEMENT subgoal (#PCDATA) >
 <!ELEMENT result (#PCDATA) >
 <!ELEMENT moral (#PCDATA) >
 <!ELEMENT action (#PCDATA) >
 <!ELEMENT date (#PCDATA) >
 <!ELEMENT author (#PCDATA) >
 <!ELEMENT a (#PCDATA)>
 <!ATTLIST a
      xlink:href CDATA #REQUIRED
      xlink:type CDATA #FIXED "simple">
Here is a valid skeleton
  <?xml version="1.0" ?>
  <!DOCTYPE STORY SYSTEM "story-grammar.dtd">
  <?xml-stylesheet href="story-grammar.css" type="text/css"?>
  <STORY>
    <title>The little XMLer</title>
    <context></context>
    <problem></problem>
    <goal></goal>
    <THREADS>
      <EPISODE>
        <subgoal>I have to do it ...</subgoal>
        <ATTEMPT>
          <action></action>
        </ATTEMPT>
        <result></result>
      </EPISODE>
    </THREADS>
    <moral></moral>
    <INFOS>
    </INFOS>
  </STORY>

The picture gives some extra information

Xml-intro-edit-8.png

Example 9: Lone family DTD

Xml-intro-edit-9.png

A valid XML file
<?xml version="1.0" ?>
<!DOCTYPE family SYSTEM "family.dtd">
<family>
  <person name="Joe Miller" gender="male"
          type="father" id="123.456.789"/>
  <person name="Josette Miller" gender="female"
          type="girl" id="123.456.987"/>
</family>

Example 10: RSS

  • There are several RSS standards. RSS 0.91 is Netscape's original (still being used)
<!ELEMENT rss (channel)>
<!ATTLIST rss version CDATA #REQUIRED>
 <!ELEMENT channel (title | description | link | language | item  | rating? | image? | textinput? | 
              copyright? | pubDate? | lastBuildDate? | docs? | managingEditor? | 
              webMaster? | skipHours? | skipDays?)*>
 <!ELEMENT title (#PCDATA)>
 <!ELEMENT description (#PCDATA)>
 <!ELEMENT link (#PCDATA)>
 <!ELEMENT image (title | url | link | width? | height? | description?)*>
 <!ELEMENT url (#PCDATA)>
 <!ELEMENT item (title | link | description)*>
 <!ELEMENT textinput (title | description | name | link)*>
 <!ELEMENT name (#PCDATA)>
 <!ELEMENT rating (#PCDATA)>
 <!ELEMENT language (#PCDATA)>
 <!ELEMENT width (#PCDATA)>
 <!ELEMENT height (#PCDATA)>
 <!ELEMENT copyright (#PCDATA)>
 <!ELEMENT pubDate (#PCDATA)>
 <!ELEMENT lastBuildDate (#PCDATA)>
 <!ELEMENT docs (#PCDATA)>
 <!ELEMENT managingEditor (#PCDATA)>
 <!ELEMENT webMaster (#PCDATA)>
 <!ELEMENT hour (#PCDATA)>
 <!ELEMENT day (#PCDATA)>
 <!ELEMENT skipHours (hour )>
 <!ELEMENT skipDays (day )>
Possible XML document for RSS
<?xml version="1.0" ?>
<!DOCTYPE rss SYSTEM "rss-0.91.dtd">
<rss version="0.91">
   <channel>
     <title>Webster University</title>
     <description>Home Page of Webster University</description>
     <link>http://www.webster.edu</link>
     <item>
       <title>Webster Univ. Geneva</title>
       <description>Home page of Webster University Geneva</description>
       <link>http://www.webster.ch</link>
     </item>
     <item>
       <title>http://www.course.com/</title>
       <description>You can find Thomson text-books materials (exercise data) on this web site</description>
       <link>http://www.course.com/</link>
     </item>
   </channel>
</rss>

Summary syntax of element definitions

  • The purpose of this table is not to teach you how to write DTDs
  • To understand how to use DTDs, you just need to know how to read a DTD

syntax element

short explanation

Example element definition
Valid XML example

,

  • elements in that order
<!ELEMENT Name (First, Middle, Last)>
  • Element Name must contain First, Middle and Last
<Name>
  <First>D.</First><Middle>K.</Middle><Last>S.</Last>
</Name>

?

  • optional element
<!ELEMENT Name (First, Middle?, Last)>
  • Middle is optional

+

  • at least one element
<!ELEMENT list (movie+)>
<list><movie>Return of ...</movie>
      <movie>Comeback of ...</movie> </list>

*

  • zero or more elements
<!ELEMENT list (item*)>
  • almost as above, but list can be empty

|

  • pick one (or operator)
<!ELEMENT major (economics | law)>
<major> <economics> </economics> </major>

()

  • grouping construct, e.g. one can add ?, * or + to a group.
<!ELEMENT text (para | list | title)*>
  • Use any of these elements in any order (as many as you like)
<text>
<title>Story</title>
   <para>Once upon a time</para>
   <title>The awakening</title>
   <list> ... </list>
</text>

Definition of elements

Syntax of a DTD rule to define elements:

 <!ELEMENT tag_name (child_element_specification) >

Child_element_specification may contain:

  • A combination of child elements according to combination rules
<!ELEMENT page  (title, content, comment?)>
  • Mixed contents, i.e. child elements plus #PCDATA (#PCDATA must come first)
<!ELEMENT para (#PCDATA | strong)*>
  • #PCDATA (Just data)
<!ELEMENT title (#PCDATA)>
  • ANY (only used during development)
<!ELEMENT para ANY>
  • EMPTY (the element has no contents)
<!ELEMENT person EMPTY>

Tag names

  • Each tag name must start with a letter or an underscore ('_')
    followed by letters, numbers or the following characters: '_' , '-', '.', ':'
BAD example (the name starts with a digit):
  <!ELEMENT 1st ...>
BAD example (the name contains a space):

 <!ELEMENT My Home ...>
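
For contrast, here are a few declarations whose names do follow the rule stated above. The element names are made up for illustration; keep in mind that XML names are case-sensitive, so My-Home and my-home would be two different elements.

```xml
<!-- Names that respect the naming rule (all names are hypothetical) -->
<!ELEMENT first      (#PCDATA)>  <!-- starts with a letter -->
<!ELEMENT _draft     (#PCDATA)>  <!-- starts with an underscore -->
<!ELEMENT My-Home    (#PCDATA)>  <!-- hyphen instead of a space -->
<!ELEMENT my_home.v2 (#PCDATA)>  <!-- underscore, dot and digit after the first character -->
```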

Combination rules

A and B = tags

Explanation

DTD examples

XML examples

A , B

A followed by B

Elements in that order

<!ELEMENT person ( name ,email? )>
<!ELEMENT Name (First, Middle, Last)>
<person>
  <name>Joe</name>
  <email>x@x.x</email>
</person>
<Name>
  <First>D.</First><Middle>K.</Middle><Last>S.</Last>
</Name>

A?

A is optional,

(it can be present or absent)

<!ELEMENT person  (name,  email? )>
<!ELEMENT Name (First,Middle?,Last)>
<person>
<name>Joe</name></person>
<Name><First>D.</First><Last>S.</Last></Name>

A+

At least one A

<!ELEMENT person (name, email+)>
<person> <name>Joe</name>
  <email>x@x.x</email></person>
  <person> <name>Joe</name>
  <email>x@x.x</email>
  <email>x@y.x</email>
</person>

A*

Zero, one or several A

<!ELEMENT person  (name,  email* )>
<person>
  <name>Joe</name>
  </person>

A | B

Either A or B

<!ELEMENT person (name, (email | fax))>
<person> <name>Joe</name>
  <email>x@x.x</email></person>
  <person> <name>Joe</name>
  <fax>123456789</fax></person>

(A, B)

Parentheses group elements, and you can apply the above combination rules to the whole group

<!ELEMENT list  ( name, email )+ >
<list>
  <name>Joe</name>
  <email>x@x.x</email>
  <name>Ann</name>
  <email>a@x.x</email>
</list>
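
The combination rules can be mixed freely. The following self-contained document is a small sketch that uses sequence, choice, grouping and the ?, * and + operators together; the element names and sample data are invented for this illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE list [
  <!ELEMENT list   (person+)>                        <!-- at least one person -->
  <!ELEMENT person (name, (email | fax)*, note?)>    <!-- name, then any mix of email/fax, then an optional note -->
  <!ELEMENT name   (#PCDATA)>
  <!ELEMENT email  (#PCDATA)>
  <!ELEMENT fax    (#PCDATA)>
  <!ELEMENT note   (#PCDATA)>
]>
<list>
  <person>
    <name>Joe</name>
    <email>x@x.x</email>
    <fax>123456789</fax>
  </person>
  <person>
    <name>Ann</name>
    <note>no contact data yet</note>
  </person>
</list>
```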

Special contents

Special elements

Explanation

DTD examples

XML example

#PCDATA

"Parsed Character Data"

Text contents of an element. It should not contain any <,>,& etc.

<!ELEMENT email (#PCDATA)>

<email>Daniel.Schneider@tecfa.unige.ch</email>

ANY

Allows any non-specified child elements and parsed character data

(avoid this !!!)

<!ELEMENT person ANY>

<person>

<c>text</c>

<a>some bbb

inside </a>

</person>

EMPTY

No contents

<!ELEMENT br EMPTY>

<br/>


Note about Mixed Content

  • Mixed element contents contain both text and tags.
<para> here is <a href="xx">link</a>. Check it out </para>
  • You have to use the "|" construct for these, with #PCDATA listed first and * after the closing parenthesis
Good examples
<!ELEMENT para (#PCDATA|a|ul|b|i|em)*>  
<!ELEMENT p (#PCDATA | a | abbr | acronym | br | cite | code | dfn | em | img | kbd |
                 q | samp | span | strong | var )* >  
<!ELEMENT p (#PCDATA | %font; | %phrase; | %special; | %form;)* >
Bad examples (#PCDATA used in a "," sequence; #PCDATA wrapped in its own parentheses)
<!ELEMENT p (name, first_name, #PCDATA)*>
<!ELEMENT p ( (#PCDATA) |a|ul|b|i|em)*>
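
As a minimal sketch of a correct mixed content declaration in use, the document below declares para with #PCDATA first and validates the kind of paragraph shown at the start of this note. The em element is added here only for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE para [
  <!-- Mixed content: #PCDATA first, alternatives separated by |, * after the group -->
  <!ELEMENT para (#PCDATA | a | em)*>
  <!ELEMENT a    (#PCDATA)>
  <!ATTLIST a href CDATA #REQUIRED>
  <!ELEMENT em   (#PCDATA)>
]>
<para>Here is a <a href="xx">link</a>. Check it <em>out</em>.</para>
```

Note that a mixed content model only lists which child elements may appear; it cannot constrain their order or number.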

Defining attributes

Rough syntax of Attribute rules:

<!ATTLIST element_name attr_name Attribute_type Type_Def Default >

Overview:

Keyword

Attribute types

CDATA

"Character Data" - Text data

NMTOKEN

A single token without spaces (letters, digits and the characters '.', '-', '_', ':')

ID

Unique identifier of the element.

IDREF

Reference to an identifier.

IDREFS

Reference to one or more identifiers

(A|B|C|..)

List of values (from which the user must choose)

Type Definition

#IMPLIED

Attribute is optional

#REQUIRED

Attribute is mandatory

#FIXED Value

Attribute has a fixed value (user can't change it)

Illustrations

DTD rule

example XML

<!ATTLIST person first_name CDATA #REQUIRED>

<person first_name="Joe">

<!ATTLIST person gender (male|female) #IMPLIED>

<person gender="male">

<!ATTLIST form method CDATA #FIXED "POST">

<form method="POST">

<!ATTLIST list type (bullets|ordered) "ordered">

<list type="bullets">

<!ATTLIST sibling type (brother|sister) #REQUIRED>

<sibling type="brother">

<!ATTLIST person id ID #REQUIRED>

<person id="N1004">
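
To see several of these rules working together, here is a small self-contained sketch with an internal DTD. The group element and the sample values are assumptions made for this example; the attribute declarations are taken from the table above.

```xml
<?xml version="1.0"?>
<!DOCTYPE group [
  <!ELEMENT group  (person+, list)>
  <!ELEMENT person EMPTY>
  <!ATTLIST person id         ID                #REQUIRED>
  <!ATTLIST person first_name CDATA             #REQUIRED>
  <!ATTLIST person gender     (male|female)     #IMPLIED>
  <!ELEMENT list   EMPTY>
  <!ATTLIST list   type       (bullets|ordered) "ordered">
]>
<group>
  <person id="N1004" first_name="Joe" gender="male"/>
  <person id="N1005" first_name="Ann"/>  <!-- gender omitted: it is #IMPLIED -->
  <list/>                                <!-- type not given: the default "ordered" applies -->
</group>
```

Leaving out id or first_name, or using the same id value twice, would make the document invalid.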

Shortcut to define multiple attributes for an element:

<!ATTLIST target_tag

attr1_name Attribute_type Type_Def Default

attr2_name Attribute_type Type_Def Default

...>

Shortcut illustrations
<!ATTLIST person ident ID #REQUIRED
     gender (male|female) #IMPLIED
     nom CDATA #REQUIRED
     prenom CDATA #REQUIRED
     relation (brother|sister) #REQUIRED >

<!ATTLIST portable owner IDREF #REQUIRED >

Example 11: Lone family DTD (file family.dtd)

Xml-intro-edit-10.png

A valid family XML file
<?xml version="1.0" ?>
<!DOCTYPE family SYSTEM "family.dtd">
<family>
  <person name="Joe Miller" gender="male"
          type="father" id="N123456789"/>
  <person name="Josette Miller" gender="female"
          type="girl" id="N123456987"/>
</family>
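
The DTD itself is only available as an image (Xml-intro-edit-10.png). The sketch below shows what family.dtd might contain so that the file above validates; the list of values for type and its default are guesses, not the original rules.

```xml
<!-- family.dtd: a guess, the value lists and the default are assumptions -->
<!ELEMENT family (person)+>
<!ELEMENT person EMPTY>
<!ATTLIST person
     name   CDATA                    #REQUIRED
     gender (male|female)            #IMPLIED
     type   (mother|father|boy|girl) "mother"
     id     ID                       #REQUIRED>
```

If id is declared as ID, its value must be a valid XML name, so it cannot start with a digit. A value like N123456789 satisfies this, whereas 123.456.789 would not.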

Entities

General entities

Consider entities as abbreviations for some other content. An entity must be defined in the DTD, and its content is substituted when the entity is encountered in the XML file. Recall that XML itself predefines only five entities, whereas HTML defines many more.

  • Use the &lt; &amp; &gt; &quot; &apos; entities to refer to <, &, >, " and '

Syntax of an internal entity definition: <!ENTITY entity_name "content">

Syntax of an external entity definition: <!ENTITY entity_name SYSTEM "URI">

Syntax of using an entity: &entity_name;

Illustrations of entity definitions

DTD rule

XML example

Result

<!ENTITY jt "Joe Test">

<para> &jt; is here</para>
<para> Joe Test is here</para>

<!ENTITY space "&#160;">

<!ENTITY copyright "&#xA9;">

&copyright; D. Schneider

© D. Schneider

<!ENTITY explanation SYSTEM "project1a.xml">

<citation> &explanation; </citation>

<citation> contents of project1a.xml ... </citation>
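
Here is a minimal sketch of a complete document that defines two of the internal entities from the table in an internal DTD and then uses them. Combining both in a single para element is a choice made for this example.

```xml
<?xml version="1.0"?>
<!DOCTYPE para [
  <!ELEMENT para (#PCDATA)>
  <!ENTITY jt        "Joe Test">   <!-- abbreviation for a name -->
  <!ENTITY copyright "&#xA9;">     <!-- character reference for the copyright sign -->
]>
<para>&jt; is here. &copyright; D. Schneider</para>
```

After entity substitution, an application reading this file sees the text "Joe Test is here. © D. Schneider".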

Parameter entities

  • Most professional DTDs use parameter entities.
  • These are used to simplify DTD writing

<!ENTITY  % entity_name "content">
<!ENTITY  % entity_name SYSTEM "URI">

Example 12 DTD entities to define reusable child elements

  • More complex DTDs often use the same structures in many places. Instead of typing these out for each element definition, one can use an ENTITY construction like this:
<!ENTITY % Content "(Para | List | Listing)*">

Later in the DTD we then can have element definitions like this:

<!ELEMENT Intro (Title, %Content; ) >
<!ELEMENT Goal (Title, %Content; ) >

The XML parser will then simply expand each %Content; reference and we get:

<!ELEMENT Intro (Title, (Para | List | Listing)*) >
<!ELEMENT Goal (Title, (Para | List | Listing)* ) >
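
Put together, a complete external DTD using this technique could look like the sketch below. The file name content.dtd, the Doc root element and the definitions of Title, Para, List and Listing are assumptions added to make the example self-contained.

```xml
<!-- content.dtd (hypothetical external DTD file) -->
<!-- The parameter entity must be declared before it is referenced -->
<!ENTITY % Content "(Para | List | Listing)*">

<!ELEMENT Doc     (Intro, Goal)>
<!ELEMENT Intro   (Title, %Content;)>
<!ELEMENT Goal    (Title, %Content;)>
<!ELEMENT Title   (#PCDATA)>
<!ELEMENT Para    (#PCDATA)>
<!ELEMENT List    (Para+)>
<!ELEMENT Listing (#PCDATA)>
```

Parameter entity references inside a declaration, as in the Intro and Goal rules, are only allowed in an external DTD. In an internal DTD (between [ and ] in the DOCTYPE declaration) they may only stand where a whole declaration could stand.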

Example 13 DTD entities to define reusable attribute definitions

  • You may use the same procedure to define "bricks" for attribute definitions
  • Entity example that defines part of an attribute definition
<!ENTITY % stamp   '
  id ID #IMPLIED
  creation-day NMTOKEN #IMPLIED
  .......
  mod-by NMTOKEN #IMPLIED
  version NMTOKEN #IMPLIED
  status (draft|final|obsolete) #IMPLIED
  approval (ok|not-ok|so-so) #IMPLIED
  main-author CDATA #IMPLIED
' >

ATTLIST definitions below use %stamp;

<!ELEMENT main-goal (title, content, (after-thoughts)?, (teacher-comments)?)>
<!ATTLIST main-goal %stamp; >
<!ELEMENT title (...)>
<!ATTLIST title %stamp; >

Some advice for designing DTDs

General advice

Don't forget elements and be liberal
  • Each element needs to be defined, but only once !
  • Only make elements mandatory if they really are wanted, else use e.g. element?
Plan the global structure
  • Before you start writing out DTDs, use some simple "language" to draft the structure, e.g. use a notation like:
name   ==>  family   given
family ==> "text"
  • In most cases, each "object" of your "information domain" becomes an element
  • Use child elements to model components
  • Use attributes to describe properties of components
Start from the root element and work your way down
  1. Root element
  2. Child elements of root element
  3. Child elements of the other elements, etc.
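
As an illustration, the draft notation shown above can be turned into DTD rules almost line by line. The rule for given is an assumption, since the draft does not say what it contains.

```xml
<!-- name ==> family given -->
<!ELEMENT name   (family, given)>
<!-- family ==> "text" -->
<!ELEMENT family (#PCDATA)>
<!-- given is not detailed in the draft; plain text is assumed here -->
<!ELEMENT given  (#PCDATA)>
```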

Attributes vs. Elements

  • There are some design rules that may help you decide whether to use an element or an attribute
  • In case of doubt, always use elements ...
Rather use child elements inside an element to represent an information block
  • if order is important (attributes can't be ordered)
  • if you plan to use the same kind of information block with different parents
  • if a future version of DTD may specify sub-components of an information block
  • if the information block represents a "thing" (an object in OO programming)
  • if the DTD is text-centric, because an author must see contents she/he edits and attributes are often hidden away in XML editors; only use attributes to qualify properties like style !
Rather use attributes to represent an information block
  • if an attribute refers to another element
    • <pet_of owner_name="lisa" pet_type="cat"> would refer to <animal category="cat">
  • to declare usage/type/etc. of an element:
    <address usage="prof"> ... </address>
  • if you wish to list all possible values a user can enter
  • if you want to restrict data type of the attribute value (e.g. require a single word)
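
The following sketch applies both sets of guidelines to one small document: the parts of a name are "things" in a fixed order and become child elements, while the usage of an address is a property with a closed list of values and becomes an attribute. The value private and the sample data are assumptions made for this example.

```xml
<?xml version="1.0"?>
<!DOCTYPE person [
  <!ELEMENT person  (name, address+)>
  <!ELEMENT name    (family, given)>
  <!ELEMENT family  (#PCDATA)>
  <!ELEMENT given   (#PCDATA)>
  <!ELEMENT address (#PCDATA)>
  <!-- usage qualifies the address and only allows the listed values -->
  <!ATTLIST address usage (prof|private) #REQUIRED>
]>
<person>
  <name><family>Miller</family><given>Joe</given></name>
  <address usage="prof">Some Street 1, Geneva</address>
</person>
```

If a later version of the DTD needs sub-components such as street and city, address can be given child elements without touching the usage attribute.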