DTD tutorial
Introduction
This is a short tutorial about DTDs. It briefly shows how to read DTDs, then how to write them.
DTD grammars are a set of rules that define:
- a set of elements (tags) and their attributes that can be used to create an XML document;
- how elements can be nested;
- different sorts of entities (reusable fragments, special characters).
- DTDs can't define what the contents look like, i.e. character data (element contents) and most attribute values.
Specification of a markup language
- The most important part is usually the DTD, but other constraints can be added on top of it!
- The DTD does not identify the root element!
- You have to tell users which elements can be root elements
- Since DTDs can't express data constraints, write them out in a specification document
- e.g. "the value of the length attribute is a string composed of a number followed by 'cm', 'inch' or 'em'"
example: <size length="10cm">
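A constraint like the one on the length attribute has to be checked outside the DTD. The following is a minimal Python sketch of such a check, not part of any DTD tooling: the function names and the exact reading of "a number" (digits with an optional decimal part) are assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Assumed reading of the rule: digits, an optional decimal part, then one unit.
LENGTH_PATTERN = re.compile(r"\d+(?:\.\d+)?(?:cm|inch|em)")


def is_valid_length(value):
    """Return True if the whole string is a number followed by cm, inch or em."""
    return LENGTH_PATTERN.fullmatch(value) is not None


def invalid_lengths(xml_text):
    """Collect the length values of all <size> elements that break the rule."""
    root = ET.fromstring(xml_text)
    bad = []
    for element in root.iter("size"):
        value = element.get("length", "")
        if not is_valid_length(value):
            bad.append(value)
    return bad
```

For instance, `is_valid_length("10cm")` is accepted while `is_valid_length("10px")` is rejected, even though a DTD that declares length as CDATA would accept both.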
Example 1: A simple DTD
<!ELEMENT page (title, content, comment?)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT content (#PCDATA)>
<!ELEMENT comment (#PCDATA)>
- A DTD document contains just rules .... nothing else (see later for explanations)
Using a DTD with an XML document
A. Document type declarations
- A valid XML document includes a declaration that specifies the DTD used
- The DTD is declared at the top of the file, after the XML declaration.
- The XML declaration, the DTD declaration, etc. are part of the prolog
- So: The <!DOCTYPE...> declaration is part of the XML file, not the DTD ....
Example:
<?xml version="1.0" encoding="ISO-8859-1" ?>
<!DOCTYPE hello SYSTEM "hello.dtd">
<hello>Here we <strong>go</strong> ... </hello>
4 ways of using a DTD
- No DTD (XML document will just be well-formed)
- DTD rules are defined inside the XML document
- We get a "standalone" document (the XML document is self-sufficient)
- "Private/System" DTDs, the DTD is located on the system (own computer or the Internet)
- ... that's what you are going to use when you write your own DTDs
<!DOCTYPE hello SYSTEM "hello.dtd">
- Public DTDs, we use a name for the DTD.
- means that both your XML editor and user software know the DTD
- strategy used for common Web DTDs like XHTML, SVG, MathML, etc.
B. Syntax of the DTD declaration in the XML document
- A DTD declaration starts with the keyword "DOCTYPE":
<!DOCTYPE .... >
- ... followed by the root element
- Remember that DTDs don't know their root element; the root is defined in the XML document!
- Note: DTDs must define this root element just like any other element! (A DTD can define several elements that may serve as root.)
<!DOCTYPE hello .... >
- ... followed by the DTD definition or a reference to a DTD file
Syntax for internal DTDs (only !)
- DTD rules are inserted between brackets [ ... ]
<!DOCTYPE hello [ <!ELEMENT hello (#PCDATA)> ]>
Syntax to define "private" external DTDs:
- DTD is identified by the URL after the " SYSTEM " keyword
<!DOCTYPE hello SYSTEM "hello.dtd" >
Syntax for public DTDs:
- after the " PUBLIC " keyword you have to specify an official name and a backup URL that a validator could use.
<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN" "http://my.netscape.com/publish/formats/rss-0.91.dtd" >
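To see which parts of a DOCTYPE declaration a program actually receives, the declaration can be inspected with Python's standard `xml.dom.minidom` module. This is only an illustrative sketch: the helper name `doctype_info` is made up here, and minidom does not fetch or validate against the DTD, it merely reports what the declaration says.

```python
from xml.dom import minidom


def doctype_info(xml_text):
    """Return root name, public name and system URL from the DOCTYPE, if any."""
    document = minidom.parseString(xml_text)
    doctype = document.doctype
    if doctype is None:
        return None  # well-formed document without a DTD declaration
    return {
        "root": doctype.name,
        "public_id": doctype.publicId,
        "system_id": doctype.systemId,
    }


PRIVATE = '<!DOCTYPE hello SYSTEM "hello.dtd"><hello>Hi</hello>'
PUBLIC = (
    '<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN" '
    '"http://my.netscape.com/publish/formats/rss-0.91.dtd">'
    '<rss version="0.91"><channel/></rss>'
)
```

Calling `doctype_info(PUBLIC)` yields the root element name `rss` together with the official name and the backup URL, which mirrors the three parts of the public syntax shown above.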
C. Some examples of XML documents with DTD declarations:
Example 2: Hello XML without DTD
<?xml version="1.0" standalone="yes" ?>
<hello> Hello XML and hello dear reader! </hello>
Example 4-3: Hello XML with an internal DTD
<?xml version="1.0" standalone="yes" ?>
<!DOCTYPE hello [
  <!ELEMENT hello (#PCDATA)>
]>
<hello> Hello XML and hello dear reader! </hello>
Example 4: Hello XML with an external DTD
<?xml version="1.0" encoding="ISO-8859-1" ?>
<!DOCTYPE hello SYSTEM "hello.dtd">
<hello> This is a very simple XML document </hello>
- That's what you should do with your own home-made DTDs
Example 4-5: XML with a public external DTD (RSS 0.91)
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN" "http://my.netscape.com/publish/formats/rss-0.91.dtd">
<rss version="0.91">
  <channel> ...... </channel>
</rss>
4.2 Understanding DTDs by example
- Recall that DTDs define all the elements and attributes and the way they can be combined
Example 6: Hello text with XML
A simple XML document of type <page>
<page>
  <title>Hello friend</title>
  <content>Here is some content :)</content>
  <comment>Written by DKS/Tecfa, adapted from S.M./the Cocoon samples</comment>
</page>
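A DTD that would validate the document (the same rules as in Example 1):
<!ELEMENT page (title, content, comment?)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT content (#PCDATA)>
<!ELEMENT comment (#PCDATA)>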
Example 4-7: A recipe list in XML
- init/simple_recipe.xml
- Source: Introduction to XML by Jay Greenspan (now dead URL)
<?xml version="1.0"?>
<!DOCTYPE list SYSTEM "simple_recipe.dtd">
<list>
  <recipe>
    <author>Carol Schmidt</author>
    <recipe_name>Chocolate Chip Bars</recipe_name>
    <meal>Dinner</meal>
    <ingredients>
      <item>2/3 C butter</item>
      <item>2 C brown sugar</item>
      <item>1 tsp vanilla</item>
      <item>1 3/4 C unsifted all-purpose flour</item>
      <item>1 1/2 tsp baking powder</item>
      <item>1/2 tsp salt</item>
      <item>3 eggs</item>
      <item>1/2 C chopped nuts</item>
      <item>2 cups (12-oz pkg.) semi-sweet choc. chips</item>
    </ingredients>
    <directions>
      Preheat oven to 350 degrees. Melt butter; combine with brown sugar and vanilla in large mixing bowl. Set aside to cool. Combine flour, baking powder, and salt; set aside. Add eggs to cooled sugar mixture; beat well. Stir in reserved dry ingredients, nuts, and chips. Spread in greased 13-by-9-inch pan. Bake for 25 to 30 minutes until golden brown; cool. Cut into squares.
    </directions>
  </recipe>
</list>
Contents of the DTD
Example 4-8: A simple story grammar
Example 4-9: Lone family DTD
A valid XML file
<?xml version="1.0" encoding="ISO-8859-1" ?>
<!DOCTYPE family SYSTEM "family.dtd">
<family>
  <person name="Joe Miller" gender="male" type="father" id="N123456789"/>
  <person name="Josette Miller" gender="female" type="girl" id="N123456987"/>
</family>
Example 4-10: RSS
- complex/rss-0-92.dtd
- There are several RSS standards. RSS 0.91 is Netscape's original (still being used)
<!ELEMENT rss (channel)>
<!ATTLIST rss version CDATA #REQUIRED> <!-- must be "0.91"> -->
<!ELEMENT channel (title | description | link | language | item | rating? | image? | textinput? | copyright? | pubDate? | lastBuildDate? | docs? | managingEditor? | webMaster? | skipHours? | skipDays?)*>
<!ELEMENT title (#PCDATA)>
<!ELEMENT description (#PCDATA)>
<!ELEMENT link (#PCDATA)>
<!ELEMENT image (title | url | link | width? | height? | description?)*>
<!ELEMENT url (#PCDATA)>
<!ELEMENT item (title | link | description)*>
<!ELEMENT textinput (title | description | name | link)*>
<!ELEMENT name (#PCDATA)>
<!ELEMENT rating (#PCDATA)>
<!ELEMENT language (#PCDATA)>
<!ELEMENT width (#PCDATA)>
<!ELEMENT height (#PCDATA)>
<!ELEMENT copyright (#PCDATA)>
<!ELEMENT pubDate (#PCDATA)>
<!ELEMENT lastBuildDate (#PCDATA)>
<!ELEMENT docs (#PCDATA)>
<!ELEMENT managingEditor (#PCDATA)>
<!ELEMENT webMaster (#PCDATA)>
<!ELEMENT hour (#PCDATA)>
<!ELEMENT day (#PCDATA)>
<!ELEMENT skipHours (hour+)>
<!ELEMENT skipDays (day+)>
Possible XML document for RSS
<?xml version="1.0" encoding="ISO-8859-1" ?>
<!DOCTYPE rss SYSTEM "rss-0.91.dtd">
<rss version="0.91">
  <channel>
    <title>Webster University</title>
    <description>Home Page of Webster University</description>
    <link>http://www.webster.edu</link>
    <item>
      <title>Webster Univ. Geneva</title>
      <description>Home page of Webster University Geneva</description>
      <link>http://www.webster.ch</link>
    </item>
    <item>
      <title>http://www.course.com/</title>
      <description>You can find Thomson text-books materials (exercise data) on this web site</description>
      <link>http://www.course.com/</link>
    </item>
  </channel>
</rss>
Summary syntax of element definitions
- The purpose of this table is not to teach you how to write DTDs
- To understand how to use DTDs, you just need to know how to read a DTD
syntax element | short explanation | Example element definition | Valid XML example |
---|---|---|---|
, | Elements in that order | <!ELEMENT Name (First, Middle, Last)> | <Name> <First>D.</First><Middle>K.</Middle><Last>S.</Last> </Name> |
? | Optional element (Middle is optional here) | <!ELEMENT Name (First,Middle?,Last)> | <Name><First>D.</First><Last>S.</Last></Name> |
+ | At least one element | <!ELEMENT list (movie+)> | <list><movie>Return of ...</movie> <movie>Comeback of ...</movie> </list> |
* | Zero or more elements | <!ELEMENT list (item*)> | almost as above, but list can be empty |
\| | Either one element or the other | <!ELEMENT major (economics \| law)> | <major> <economics> </economics> </major> |
() | Grouping | <!ELEMENT text (para \| list \| title)*> | <text> <title>Story</title><para>Once upon a time</para> <title>The awakening</title> <list> ... </list> </text> |
Defining elements
5.1 Definition of elements
Rough syntax of a DTD rule to define elements:
<!ELEMENT tag_name child_element_specification>
Child_element_specification may contain:
- A combination of child elements according to combination rules
<!ELEMENT page (title, content, comment?)>
- Mixed contents, i.e. child elements plus #PCDATA or ANY
<!ELEMENT para (#PCDATA | strong)*>
- #PCDATA (Just data)
<!ELEMENT title (#PCDATA)>
- ANY (only used during development)
<!ELEMENT para ANY>
- EMPTY (the element has no contents)
<!ELEMENT person EMPTY>
Tag names
- Each tag name must start with a letter or an underscore ('_'), followed by letters, digits or the following characters: '_', '-', '.', ':'
BAD example: <!ELEMENT 1st ...>
BAD example: <!ELEMENT My Home ...>
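The naming rule above can be approximated with a regular expression. The sketch below is a simplification that only accepts ASCII letters (real XML names may also contain many non-ASCII letters), and the function name is an assumption chosen for this example.

```python
import re

# First character: letter or underscore; then letters, digits, '_', '-', '.', ':'.
# ASCII-only simplification of the XML name rule.
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_.:-]*")


def is_valid_tag_name(name):
    """Return True if the whole string follows the simplified naming rule."""
    return NAME_PATTERN.fullmatch(name) is not None
```

Both bad examples fail this check: `1st` starts with a digit and `My Home` contains a space.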
5.2 Combination rules
A and B = tags | Explanation | DTD example | XML example |
---|---|---|---|
A , B | A followed by B | <!ELEMENT person (name, email?)> | <person> <name>Joe</name> <email>x@x.x</email> </person> |
A? | A is optional (it can be present or absent) | <!ELEMENT person (name, email?)> | <person> <name>Joe</name></person> |
A+ | At least one A | <!ELEMENT person (name, email+)> | <person> <name>Joe</name> <email>x@x.x</email></person> <person> <name>Joe</name> <email>x@x.x</email> <email>x@y.x</email> </person> |
A* | Zero, one or several A | <!ELEMENT person (name, email*)> | <person> <name>Joe</name> </person> |
A \| B | Either A or B | <!ELEMENT person (email \| fax)> | <person> <email>x@x.x</email></person> <person> <fax>123456789</fax></person> |
(A, B) | Parentheses group elements, and the above combination rules can then be applied to the whole group | <!ELEMENT list (name, email)+> | <list> <name>Joe</name> <email>x@x.x</email> </list> |
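The combination rules behave much like regular expressions over the sequence of child elements. The Python sketch below makes that analogy concrete by translating a content model into a regular expression; it is a teaching aid under simplifying assumptions (element-only models, no #PCDATA, no ':' in names), not how a validating parser is implemented, and the function names are invented for this example.

```python
import re
import xml.etree.ElementTree as ET

NAME = re.compile(r"[A-Za-z_][\w.-]*")


def model_to_regex(model):
    """Turn a content model such as '(name, email?)' into a compiled regex.

    Every element name becomes the token 'name;'. The operators ?, +, *, |
    and the parentheses keep their meaning; ',' simply means "followed by".
    """
    body = NAME.sub(lambda m: "(?:" + re.escape(m.group()) + ";)", model)
    body = re.sub(r"[\s,]", "", body)  # sequence is plain concatenation
    return re.compile(body)


def children_match(model, xml_text):
    """Check the direct children of the root element against the model."""
    children = "".join(child.tag + ";" for child in ET.fromstring(xml_text))
    return model_to_regex(model).fullmatch(children) is not None
```

With this helper, `children_match("(name, email?)", "<person><name>Joe</name></person>")` succeeds, whereas the same document fails against `"(name, email+)"` because at least one email is then required.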
Special contents
Special elements | Explanation | DTD examples | XML example |
---|---|---|---|
#PCDATA | "Parsed Character Data": text contents of an element. It must not contain literal < or & characters (use entities instead). | <!ELEMENT email (#PCDATA)> | <email>Daniel.Schneider@tecfa.unige.ch</email> |
ANY | Allows any child elements, in any order, plus parsed character data (avoid this!) | <!ELEMENT person ANY> | <person> <c>text</c> <a>some <b>bbb</b> inside </a> </person> |
EMPTY | No contents | <!ELEMENT br EMPTY> | <br/> |
Note about Mixed Content
- Mixed element contents contain both text and tags.
<para> here is <a href="xx">link</a>. <b>Check</b> it out </para>
- You have to use the "|" construct for these: #PCDATA must come first and the "*" applies to the whole group
- Good examples:
<!ELEMENT para (#PCDATA|a|ul|b|i|em)*>
<!ELEMENT p (#PCDATA | a | abbr | acronym | br | cite | code | dfn | em | img | kbd |
q | samp | span | strong | var )* >
<!ELEMENT p (#PCDATA | %font; | %phrase; | %special; | %form;)* >
- Bad examples:
<!ELEMENT p (name, first_name, #PCDATA)*>
<!ELEMENT p ( (#PCDATA) |a|ul|b|i|em)*>
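To see what "both text and tags" means for a program that reads such a document, the para example above can be parsed with Python's standard `xml.etree.ElementTree`. This is a small illustrative sketch: the helper `mixed_parts` is a made-up name, and the parser used here does not validate against a DTD.

```python
import xml.etree.ElementTree as ET

PARA = '<para> here is <a href="xx">link</a>. <b>Check</b> it out </para>'


def mixed_parts(element):
    """List the text runs and child elements of an element in document order."""
    parts = []
    if element.text:
        parts.append(("text", element.text))
    for child in element:
        parts.append(("element", child.tag))
        if child.tail:  # text that follows the child inside the parent
            parts.append(("text", child.tail))
    return parts


parts = mixed_parts(ET.fromstring(PARA))
```

The resulting list alternates between text runs and the a and b elements, which is exactly the free interleaving that a declaration of the form (#PCDATA | a | b)* permits.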
Defining attributes
Rough syntax of Attribute rules:
<!ATTLIST element_name attr_name Attribute_type Type_Def Default >
Overview:
Attribute type | Explanation |
---|---|
CDATA | "Character Data": text data |
NMTOKEN | A single word (no spaces or punctuation) |
ID | Unique identifier of the element |
IDREF | Reference to an identifier |
IDREFS | Reference to one or more identifiers |
(A\|B\|C\|..) | List of values (from which the user must choose) |

Type definition | Explanation |
---|---|
#IMPLIED | Attribute is optional |
#REQUIRED | Attribute is mandatory |
#FIXED Value | Attribute has a fixed value (user can't change it) |
Illustrations:
DTD rule | example XML |
---|---|
<!ATTLIST person first_name CDATA #REQUIRED> | <person first_name="Joe"> |
<!ATTLIST person gender (male\|female) #IMPLIED> | <person gender="male"> |
<!ATTLIST form method CDATA #FIXED "POST"> | <form method="POST"> |
<!ATTLIST list type (bullets\|ordered) "ordered"> | <list type="bullets"> |
<!ATTLIST sibling type (brother\|sister) #REQUIRED> | <sibling type="brother"> |
<!ATTLIST person id ID #REQUIRED> | <person id="N1004"> |
Shortcut to define multiple attributes for an element:
<!ATTLIST target_tag
attr1_name AttributeType TypeDef Default
attr2_name AttributeType TypeDef Default
...>
Shortcut illustrations:
<!ATTLIST person
  ident ID #REQUIRED
  gender (male|female) #IMPLIED
  nom CDATA #REQUIRED
  prenom CDATA #REQUIRED
  relation (brother|sister) #REQUIRED >
<!ATTLIST portable
  owner IDREF #REQUIRED >
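A validating parser enforces two rules for the ID and IDREF types used above: every ID value is unique within the document, and every IDREF points to an existing ID. The Python sketch below imitates these two checks by hand. It assumes, as in the illustration, that `ident` is the ID attribute and `owner` the IDREF attribute; the function name and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def check_ids(xml_text, id_attr="ident", ref_attr="owner"):
    """Return (duplicate identifiers, references without a matching identifier)."""
    root = ET.fromstring(xml_text)
    ids = Counter(
        el.get(id_attr) for el in root.iter() if el.get(id_attr) is not None
    )
    duplicates = sorted(value for value, count in ids.items() if count > 1)
    dangling = sorted(
        el.get(ref_attr)
        for el in root.iter()
        if el.get(ref_attr) is not None and el.get(ref_attr) not in ids
    )
    return duplicates, dangling


# Hypothetical sample: the second portable refers to a person that does not exist.
SAMPLE = (
    '<family>'
    '<person ident="p1" nom="Miller" prenom="Joe" relation="brother"/>'
    '<person ident="p2" nom="Miller" prenom="Josette" relation="sister"/>'
    '<portable owner="p1"/>'
    '<portable owner="p9"/>'
    '</family>'
)
```

Here `check_ids(SAMPLE)` reports no duplicate identifiers and one dangling reference, `p9`.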
Example 5-1: Lone family DTD (file family.dtd)
A valid XML file
<?xml version="1.0" encoding="ISO-8859-1" ?>
<!DOCTYPE family SYSTEM "family.dtd">
<family>
<person name="Joe Miller" gender="male"
type="father" id="N123456789"/>
<person name="Josette Miller" gender="female"
type="girl" id="N123456987"/>
</family>
5.6 General entities
Consider entities as abbreviations for some other content. An entity must be defined in the DTD and its contents are substituted when encountered in the XML file.
- Recall that XML itself predefines only 5 entities, whereas HTML defines many more
- Use the &lt; &amp; &gt; &quot; &apos; entities to refer to <, &, >, " and '
Syntax of an internal entity definition: <!ENTITY entity_name "content">
Syntax of an external entity definition: <!ENTITY entity_name SYSTEM "URI">
Syntax of using an entity: &entity_name;
Illustrations of entity definitions:
DTD rule | XML example | Result |
---|---|---|
<!ENTITY jt "Joe Test"> | <para> &jt; is here</para> | <para> Joe Test is here</para> |
<!ENTITY space " "> | | |
<!ENTITY copyright "©"> | &copyright; D. Schneider | © D. Schneider |
<!ENTITY explanation SYSTEM "project1a.xml"> | <citation> &explanation; </citation> | <citation> contents of project1a.xml ... </citation> |
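The substitution of an internal entity can be observed with Python's standard `xml.etree.ElementTree`, whose underlying parser expands entities declared in an internal DTD. The following is a minimal sketch using the jt entity from the table; the helper names are made up for this example, and external entities such as project1a.xml are not covered.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """<!DOCTYPE para [
  <!ENTITY jt "Joe Test">
]>
<para>&jt; is here</para>"""


def para_text(xml_text):
    """Parse the document and return the text of its root element."""
    return ET.fromstring(xml_text).text


def parses(xml_text):
    """Return True if the document can be parsed, False on a parse error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True
```

The application only ever sees the replacement text. Without the entity definition the same reference is an error, which is why an entity must be defined before it is used.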
5.7 Parameter entities
- Most professional DTDs use parameter entities.
- These are used to simplify DTD writing
<!ENTITY % entity_name "content">
<!ENTITY % entity_name SYSTEM "URI">
Example 5-2: DTD entities to define reusable child elements
- More complex DTDs often reuse the same structures in many places. Instead of typing these out again for each element definition, one can use an ENTITY construction like this:
<!ENTITY % Content "(Para | List | Listing)*">
Later in the DTD we can then have element definitions like this:
<!ELEMENT Intro (Title, %Content; ) >
<!ELEMENT Goal (Title, %Content; ) >
The XML parser will then simply expand these %Content; references and we get:
<!ELEMENT Intro (Title, (Para | List | Listing)*) >
<!ELEMENT Goal (Title, (Para | List | Listing)* ) >
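The expansion step can be imitated with plain text substitution. The Python sketch below is a deliberately naive model of what the parser does: it only handles internal parameter entities written with double quotes, ignores nesting and external entities, and the function name is an assumption made for this example.

```python
import re

# Matches declarations of the form <!ENTITY % Name "replacement text">
ENTITY_DECL = re.compile(r'<!ENTITY\s+%\s+([\w.-]+)\s+"([^"]*)"\s*>')
ENTITY_REF = re.compile(r"%([\w.-]+);")


def expand_parameter_entities(dtd_text):
    """Replace every %Name; reference by the text declared for Name."""
    entities = dict(ENTITY_DECL.findall(dtd_text))
    body = ENTITY_DECL.sub("", dtd_text)  # drop the declarations themselves
    return ENTITY_REF.sub(lambda m: entities[m.group(1)], body).strip()


DTD = '''<!ENTITY % Content "(Para | List | Listing)*">
<!ELEMENT Intro (Title, %Content; ) >
<!ELEMENT Goal (Title, %Content; ) >'''

expanded = expand_parameter_entities(DTD)
```

After expansion, both element definitions contain the full (Para | List | Listing)* group in place of the reference. A reference to an undeclared entity raises a KeyError in this sketch.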
Example 5-3: DTD entities to define reusable attribute definitions
- You may use the same procedure to define "bricks" for attribute definitions
- Entity example that defines part of an attribute definition
<!ENTITY % stamp '
id ID #IMPLIED
creation-day NMTOKEN #IMPLIED
.......
mod-by NMTOKEN #IMPLIED
version NMTOKEN #IMPLIED
status (draft|final|obsolete) #IMPLIED
approval (ok|not-ok|so-so) #IMPLIED
main-author CDATA #IMPLIED
'
>
The ATTLIST definitions below use %stamp;
<!ELEMENT main-goal (title, content, (after-thoughts)?, (teacher-comments)?)>
<!ATTLIST main-goal %stamp; >
<!ELEMENT title (...)>
<!ATTLIST title %stamp; >
5.8 Some advice for designing DTDs
Don't forget elements and be liberal
- Each element needs to be defined, but only once!
- Make elements mandatory only if they are really needed; otherwise make them optional, e.g. element?
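Both mistakes named above, a forgotten element and an element defined twice, can be spotted with a small script before running a real validator. This Python sketch works on the DTD text with regular expressions; it is a rough helper under stated assumptions (no parameter entities, no comments containing declarations), and the function name and the sample DTD are invented for illustration.

```python
import re
from collections import Counter

ELEMENT_DECL = re.compile(r"<!ELEMENT\s+([\w.:-]+)\s+([^>]*)>")
# Element names inside a content model; skips #PCDATA and %entity; references.
NAME_IN_MODEL = re.compile(r"(?<![#%\w.:-])[A-Za-z_][\w.:-]*")


def check_declarations(dtd_text):
    """Return (elements declared more than once, elements used but never declared)."""
    declarations = ELEMENT_DECL.findall(dtd_text)
    counts = Counter(name for name, _ in declarations)
    used = set()
    for _, model in declarations:
        used.update(
            name for name in NAME_IN_MODEL.findall(model)
            if name not in ("EMPTY", "ANY")
        )
    duplicates = sorted(name for name, count in counts.items() if count > 1)
    undeclared = sorted(used - set(counts))
    return duplicates, undeclared


# Hypothetical faulty DTD: title is defined twice, comment is never defined.
FAULTY = """<!ELEMENT page (title, content, comment?)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT content (#PCDATA)>"""
```

For the faulty sample, `check_declarations(FAULTY)` names `title` as a duplicate and `comment` as undeclared.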
Plan the global structure
- Before you start writing out DTDs, use some simple "language" to draft the structure, e.g. use a notation like:
name ==> family given
family ==> "text"
- In most cases, each "object" of your "information domain" becomes an element
- Use child elements to model components
- Use attributes to describe properties of components
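A draft in the arrow notation shown above can be turned into a first DTD skeleton mechanically. The Python sketch below assumes one rule per line, treats "text" as #PCDATA and puts child elements in a fixed sequence; these conventions and the function name are assumptions for illustration, and the result still needs manual refinement (optional elements, repetitions, attributes).

```python
def draft_to_dtd(draft):
    """Convert lines like 'name ==> family given' into <!ELEMENT ...> rules."""
    rules = []
    for line in draft.strip().splitlines():
        name, _, right_side = line.partition("==>")
        name, right_side = name.strip(), right_side.strip()
        if right_side == '"text"':
            model = "(#PCDATA)"  # leaf element holding character data
        else:
            model = "(" + ", ".join(right_side.split()) + ")"
        rules.append("<!ELEMENT " + name + " " + model + ">")
    return "\n".join(rules)


DRAFT = '''name ==> family given
family ==> "text"'''

skeleton = draft_to_dtd(DRAFT)
```

For the two draft lines, the skeleton contains one rule with the sequence (family, given) and one #PCDATA rule for family. The element given still lacks a definition, which is the kind of gap the draft helps to reveal.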
Start from the root element and work your way down:
- Root element
- Child elements of root element
- Child elements of the other elements, etc.