Editing XML tutorial

The educational technology and digital learning wiki
{{web technology tutorial|beginner}}
{{under construction}}
== Introduction ==


This is a beginner's tutorial for XML editing, made from slides.
;Objectives
* Be able to read schemas and find other documentation
* Understand the necessity of using an XML editor
* Be able to edit XML without hand-editing tags, profit from DTD and Schema awareness
* Be able to check well-formedness and validate
* Be able to fix errors


;Prerequisites
* Some idea what XML is about
* XML namespaces (some, have a look at the XML namespace article)
* HTML and CSS (some)

;Next steps
* [[XQuery tutorial - basics]] (if you are interested in XML databases)
* [[PHP - MySQL - XML tutorial - basics]] (shows how to display an XML result-set retrieved from MySQL with XSLT)

'''An XML document includes:'''
: An XML declaration on top (it looks like a processing instruction) and, optionally, other processing instructions
: Declarations (in particular a document type declaration that points to a DTD)
: Marked-up contents (mandatory): elements
: Marked-up contents (optional): attributes and entities
: Comments: <code><nowiki><!-- .... --></nowiki></code>
'''XML documents are trees'''
: For a computer person, an XML document is a tree ("boxes within boxes")
: ... and inside a browser, the document is held as a tree-based data structure (the DOM)
: '''XML data as tree (CALS table example)'''
<pre>
<TABLE>
  <TBODY>
    <TR> <TD>Pierre Muller</TD> <TD>http://pm.com/</TD> </TR>
    <TR> <TD>Elisabeth Dupont</TD> <TD></TD> </TR>
  </TBODY>
</TABLE>
</pre>
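To see the "boxes within boxes" for yourself, you can let a parser build the tree and then walk it. Here is a minimal sketch in Python using the standard <code>xml.etree.ElementTree</code> module (chosen for illustration; the helper name <code>show_tree</code> is made up). It prints one indented line per element of the table above:

```python
import xml.etree.ElementTree as ET

TABLE = """<TABLE>
  <TBODY>
    <TR> <TD>Pierre Muller</TD> <TD>http://pm.com/</TD> </TR>
    <TR> <TD>Elisabeth Dupont</TD> <TD></TD> </TR>
  </TBODY>
</TABLE>"""


def show_tree(elem, depth=0):
    """Yield one line per element, indented by its depth in the tree."""
    text = (elem.text or "").strip()
    label = f"{elem.tag}: {text}" if text else elem.tag
    yield "  " * depth + label
    for child in elem:  # children are the "boxes" inside this box
        yield from show_tree(child, depth + 1)


root = ET.fromstring(TABLE)
lines = list(show_tree(root))
print("\n".join(lines))
```

Each extra indentation step is one level deeper in the tree. The empty cell is still a node; it just has no text.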
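: '''"Well-formed" and "valid" XML documents'''
: '''"Well-formed" XML documents'''
: A document should start with an XML declaration, including the version number ! (Strictly speaking, the declaration is optional in XML 1.0, but you should always include it.)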
<?xml version="1.0"?>
: You may specify the encoding (the default is UTF-8), and the file must then really be saved in that encoding !
<?xml version="1.0" encoding="ISO-8859-1"?>
: The structure must be hierarchical:
: start-tags and end-tags must match
: no cross-overs: <code><nowiki><i>...<b>...</i> .... </b></nowiki></code> is illegal
: XML is case-sensitive, e.g. "LI" is not "li"
: Empty elements must be closed too:
: e.g. write <code><nowiki><br/></nowiki></code> (or <code><nowiki><br></br></nowiki></code>); a lonely <code><nowiki><br></nowiki></code> would be illegal
: Attributes must have values, and values must be quoted:
: e.g. <code><nowiki><a href="http://scholar.google.com"></nowiki></code> or <code><nowiki><person status="employed"></nowiki></code>
: e.g. <code><nowiki><input type="radio" checked="checked"/></nowiki></code> (the HTML shortcut with a bare <code>checked</code> is not allowed)
: A single root element per document
: The root element's start-tag opens and its end-tag closes all the content
: The root element is not contained in any other element
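: Special characters (!!): <code>&lt; &amp; &gt; " '</code>
: Use <code>&amp;lt; &amp;amp; &amp;gt; &amp;quot; &amp;apos;</code> instead. <code>&lt;</code> and <code>&amp;</code> must always be escaped; the others only in some contexts (e.g. a quote inside an attribute value delimited by the same kind of quote)
: This also applies to URLs !!
'''bad:'''  <code>http://truc.unige.ch/programme?bla&machin</code>
'''good:''' <code>http://truc.unige.ch/programme?bla&amp;amp;machin</code>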
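If you generate XML from a program, it is safer to let a library do the escaping than to do it by hand. Here is a minimal sketch, assuming Python 3 and the standard <code>xml.sax.saxutils</code> helpers (chosen for illustration; they are not mentioned in the slides):

```python
from xml.sax.saxutils import escape, quoteattr

url = "http://truc.unige.ch/programme?bla&machin"

# escape() turns &, < and > into &amp;, &lt; and &gt;
escaped = escape(url)
# quoteattr() escapes the value and wraps it in quotes, ready for an attribute
link = f"<a href={quoteattr(url)}>link</a>"
# extra replacements can be passed in a dict, e.g. for double quotes
text = escape('say "hi" & <bye>', {'"': "&quot;"})

print(escaped)
print(link)
print(text)
```

The first line printed is the "good" URL from the list above.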
: '''A minimal well-formed XML document'''
<pre>
<?xml version="1.0" ?>
<page updated="jan 2007">
  <title>Hello friend</title>
  <content> Here is some content :) </content>
  <comment> Written by DKS/Tecfa </comment>
</page>
</pre>
: It has an XML declaration on top
: It has a root element (i.e. '''page''')
: Elements are nested and tags are closed
: The attribute value is quoted
: Elements may also mix text and child elements ("mixed content"), e.g.:
<pre>
<hello> Hello <important>dear</important> reader ! </hello>
</pre>
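You do not need an editor to test the rules above; any XML parser will do. Here is a minimal sketch, assuming Python 3 and its standard <code>xml.etree.ElementTree</code> parser. This parser is non-validating, so it checks well-formedness only. <code>is_well_formed</code> and the sample snippets are made up for illustration:

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts text as a well-formed XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


MINIMAL = """<?xml version="1.0" ?>
<page updated="jan 2007">
  <title>Hello friend</title>
  <content> Here is some content :) </content>
  <comment> Written by DKS/Tecfa </comment>
</page>"""

# Each sample respects or breaks one of the rules listed above.
samples = {
    "minimal page": MINIMAL,
    "mixed content": "<hello> Hello <important>dear</important> reader ! </hello>",
    "cross-over": "<p><i>a <b>b</i></b></p>",
    "case mismatch": "<LI>item</li>",
    "lonely br": "<p>line<br>next</p>",
    "self-closing br": "<p>line<br/>next</p>",
    "unquoted attribute": "<person status=employed/>",
    "raw ampersand": "<a href='prog?bla&machin'/>",
    "two root elements": "<a/><b/>",
}

for name, text in samples.items():
    print(f"{name:20} {'ok' if is_well_formed(text) else 'NOT well-formed'}")
```

The parser's error messages (available from the exception) are usually terse. A good XML editor shows the same errors in context, which makes them easier to fix.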
: '''XML names and CDATA Sections'''
: Names used for elements (and attributes) should start with a letter or an underscore and only use letters, digits, the underscore, the hyphen and the period (no other punctuation marks and no spaces) !
: Good: <driversLicenceNo> <drivers_licence_no>
: Bad: <driver’s_licence_number> <driver’s_licence_#> <drivers licence number>
: When you want to display data that includes "XMLish" things that should not be interpreted, you can use so-called CDATA sections:
<pre>
<example>
  <![CDATA[ (x < y) is an expression
    <svg xmlns="http://www.w3.org/2000/svg">
  ]]>
</example>
</pre>
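You can try both rules with a parser. Here is a small sketch in Python using only the standard library (<code>parses</code> is a made-up helper name). The good and bad names from the list above are parsed as empty elements, with a plain ASCII apostrophe in the first bad name. The CDATA example then comes back as ordinary text:

```python
import xml.etree.ElementTree as ET


def parses(text: str) -> bool:
    """Return True if text is accepted as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Letters, digits, "_", "-" and "." are fine; apostrophes, "#" and spaces are not.
good_names = ["<driversLicenceNo/>", "<drivers_licence_no/>", "<drivers-licence.no/>"]
bad_names = ["<driver's_licence_number/>", "<driver_licence_#/>", "<drivers licence number/>"]

# Inside a CDATA section, "<" and markup-like text are kept as plain characters.
doc = """<example><![CDATA[ (x < y) is an expression
  <svg xmlns="http://www.w3.org/2000/svg">
]]></example>"""
cdata_text = ET.fromstring(doc).text

print([parses(n) for n in good_names])
print([parses(n) for n in bad_names])
print(cdata_text)
```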
: '''Valid XML documents'''
'''A valid document must be:'''
: “well-formed” (see above)
: conform to a grammar, i.e. it must:
: only use tags defined by the grammar
: respect nesting, ordering and other constraints ....
'''Kinds of XML grammars'''
: '''DTD'''s are part of the XML standard
: '''XML Schema''' is a more recent W3C standard, used to express stronger constraints
: '''Relax NG''' is an OASIS standard (made by well-known XML experts who don't like XML Schema ...)
: '''Schematron''' (yet another alternative)
Daniel Schneider likes Relax NG best (it’s relatively elegant and powerful)
: '''Namespaces'''
: It is possible to use several vocabularies within a document if the markup language allows it:
: E.g. XHTML + SVG + MathML + XLink.
: To avoid naming conflicts (e.g. "title" does not mean the same thing in XHTML and SVG), element and attribute names can be prefixed with a namespace prefix.
: '''Declaring additional vocabularies'''
: The "'''xmlns:'''''name_space''" attribute introduces a new vocabulary. It declares that all elements or attributes prefixed by "''name_space'':" belong to that vocabulary
:  xmlns:''name_space''="URL_name_of_name_space"
: '''SVG within XHTML'''
<pre>
<html xmlns:svg="http://www.w3.org/2000/svg">
    <svg:rect x="50" y="50" rx="5" ry="5" width="200" height="100" ....
</pre>
: <code>xmlns:svg="..."</code> means that elements prefixed by <code>svg:</code> are part of SVG
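To see how a namespace-aware parser keeps the two "title" elements apart, here is a minimal Python sketch using the standard <code>xml.etree.ElementTree</code> module. The document is a small made-up variant of the example above: we added an SVG <code>title</code> and a wrapping <code>svg:svg</code> element, and closed the <code>rect</code>:

```python
import xml.etree.ElementTree as ET

SVG = "http://www.w3.org/2000/svg"

doc = f"""<html xmlns:svg="{SVG}">
  <title>Page title</title>
  <svg:svg>
    <svg:title>Shape title</svg:title>
    <svg:rect x="50" y="50" rx="5" ry="5" width="200" height="100"/>
  </svg:svg>
</html>"""

root = ET.fromstring(doc)
# ElementTree writes a prefixed name as "{namespace URL}local-name"
tags = [el.tag for el in root.iter()]
print(tags)

# "title" exists twice, but the namespace keeps the two apart
page_title = root.find("title").text
shape_title = root.find("svg:svg/svg:title", {"svg": SVG}).text
print(page_title, "/", shape_title)
```

The prefix used in the search (<code>svg</code> in the dictionary) only has to map to the same URL as in the document. It does not have to be spelled the same way.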
: '''XLink'''
: XLink is a language to define links (when this was written, it only worked in Firefox-based browsers)
<pre>
<RECIT xmlns:xlink="http://www.w3.org/1999/xlink">
<INFOS>
  <Date>30 octobre 2003 - </Date><Auteur>DKS - </Auteur>
  <A xlink:href="http://jigsaw.w3.org/css-validator/check/referer"
     xlink:type="simple">CSS Validator</A>
</INFOS>
</pre>
: '''Declaring the main vocabulary '''
: The main vocabulary can be introduced by an attribute like:
:  xmlns="URL_name_of_name_space"
: Note: some specifications (e.g. SVG) require a name space declaration in any case (even if you do not use any other vocabulary) !
: '''SVG example'''
<pre>
<svg xmlns="http://www.w3.org/2000/svg">
    <rect x="50" y="50" rx="5" ry="5" width="200" height="100" ....
</pre>
: '''Namespace URLs'''
: URLs that define namespaces are '''just names''': they don't need to point to a real resource
: E.g. for your own purposes you can very well make up something like:
<pre>
<account xmlns:pein="http://joe.miller.com/pein">
  <pein:name>Joe</pein:name>
</account>
</pre>
... and the URL http://joe.miller.com/pein doesn't need to exist.
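Because a namespace URL is only a name, a parser never downloads it. Only the URL identifies the vocabulary; the prefix is just a local nickname. Here is a minimal Python sketch using the standard <code>xml.etree.ElementTree</code> module that checks both points with the made-up <code>pein</code> namespace. It also shows that misspelling <code>xmlns</code> leaves the prefix undeclared:

```python
import xml.etree.ElementTree as ET

# Same made-up namespace name, two different prefixes
doc_a = '<account xmlns:pein="http://joe.miller.com/pein"><pein:name>Joe</pein:name></account>'
doc_b = '<account xmlns:p="http://joe.miller.com/pein"><p:name>Joe</p:name></account>'

tag_a = ET.fromstring(doc_a)[0].tag
tag_b = ET.fromstring(doc_b)[0].tag
print(tag_a)           # {http://joe.miller.com/pein}name
print(tag_a == tag_b)  # True: only the URL counts, not the prefix

# Misspelling "xmlns" leaves the prefixes undeclared, which is an error
try:
    ET.fromstring('<account xmls:pein="http://joe.miller.com/pein">'
                  '<pein:name>Joe</pein:name></account>')
    typo_accepted = True
except ET.ParseError:
    typo_accepted = False
print(typo_accepted)   # False
```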
: '''DTDs (Document Type Definitions)'''
'''DTD grammars are just a set of rules that define:'''
: a set of elements (tags) and their attributes that can be used;
: how elements can be embedded;
: different sorts of entities (reusable fragments, special characters).
: DTDs can’t define what the character data (element contents) and most attribute values look like.
'''Specification of a markup language'''
: The most important part is usually the DTD, but in addition other constraints can be added !
: The DTD does not identify the root element !
: you have to tell the users what elements can be root elements
: Since DTDs can’t express data constraints, you may write out additional ones in a specification document
: e.g. "the value of the length attribute is a number followed by a unit such as inch, cm or em":
<size length="10cm"/>
: '''A simple DTD'''
<!ELEMENT page  (title, content, comment?)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT content (#PCDATA)>
<!ELEMENT comment (#PCDATA)>
: A DTD file contains only rule definitions, nothing else (explained later)
: '''Using a DTD with an XML document'''
: '''Document type declarations'''
: A valid XML document includes a declaration that identifies the DTD
: So: The <!DOCTYPE...> declaration is part of the XML file, '''not''' the DTD ....
'''Example:'''
<?xml version="1.0" encoding="ISO-8859-1" ?>
<!DOCTYPE hello SYSTEM "hello.dtd">
'''4 ways of using a DTD'''
: No DTD (XML document will just be well-formed)
: DTD rules are defined inside the XML document
: We get a "standalone" document (the XML document is self-sufficient)
: "Private/System" DTDs, the DTD is located on the system (own computer or the Internet)
: ... that’s what '''you''' are going to use when you write your own DTDs
: Public DTDs: the DTD is referred to by an official name.
: This works because both your XML editor and user software already know the DTD
: strategy used for common Web DTDs like XHTML, SVG, MathML, etc.
'''Place'''
: DTD is declared on top of the file after the XML declaration.
: The XML declaration, the DTD declaration, etc. are part of the prolog
: '''Syntax of the DTD declaration in the XML document'''
: Start of a DTD declaration:
  <!DOCTYPE .... >
: The root element must be specified first
: Remember that DTDs don’t know their root element, root is defined in the XML document !
: Note: the DTD must define this root element just like any other element ! (Several elements of a DTD may be usable as the root.)
  <!DOCTYPE hello .... >
: Syntax for internal DTDs:
: DTD rules are inserted between brackets [ ... ]
<pre>
<!DOCTYPE hello [
    <!ELEMENT hello (#PCDATA)>
]>
</pre>
: Syntax to define "private" external DTDs:
: DTD is identified by the URL after the "'''SYSTEM'''" keyword
<!DOCTYPE hello SYSTEM "hello.dtd">
: Syntax for public DTDs:
: after the "'''PUBLIC'''" keyword you have to specify an official name and a backup URL that a validator could use.
<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN"
  "http://my.netscape.com/publish/formats/rss-0.91.dtd">
'''Recall'''
: The DTD file itself does not contain any DTD declaration, just rules !!
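A parser reads the document type declaration as part of the prolog. The sketch below uses Python's standard <code>xml.parsers.expat</code> module and its <code>StartDoctypeDeclHandler</code> callback to show which parts it extracts from the three syntaxes above (<code>doctype_info</code> is a made-up helper name). Used this way, expat neither downloads nor checks the DTD, so this is not validation; use your XML editor or a validating tool for that:

```python
from xml.parsers import expat


def doctype_info(text: str) -> dict:
    """Return the parts of the DOCTYPE declaration that expat reports."""
    info = {}

    def start_doctype(name, system_id, public_id, has_internal_subset):
        info.update(root=name, system=system_id, public=public_id,
                    internal=bool(has_internal_subset))

    parser = expat.ParserCreate()
    parser.StartDoctypeDeclHandler = start_doctype
    parser.Parse(text, True)
    return info


system_doc = '<!DOCTYPE hello SYSTEM "hello.dtd">\n<hello>Hello</hello>'
public_doc = ('<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN"\n'
              '  "http://my.netscape.com/publish/formats/rss-0.91.dtd">\n'
              '<rss version="0.91"><channel/></rss>')
internal_doc = '<!DOCTYPE hello [\n  <!ELEMENT hello (#PCDATA)>\n]>\n<hello>Hello</hello>'

for doc in (system_doc, public_doc, internal_doc):
    print(doctype_info(doc))
```

In all three cases the root element name comes first, as described above. Missing identifiers are reported as <code>None</code>.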
: '''Some examples of XML documents with DTD declarations:'''
: '''Hello XML without DTD'''
<?xml version="1.0" '''standalone="yes"'''?>
<hello> Hello XML and hello dear reader ! </hello>
: '''Hello XML with an internal DTD'''
<?xml version="1.0" '''standalone="yes"'''?>
'''<!DOCTYPE hello ['''
    <!ELEMENT hello (#PCDATA)>
    ''']>'''
<hello> Hello XML and hello dear reader ! </hello>
: '''Hello XML with an external DTD'''
<?xml version="1.0" encoding="ISO-8859-1" ?>
'''<!DOCTYPE hello SYSTEM "hello.dtd">'''
<hello> Hello XèMèLè and hello dear reader ! </hello>
: That's what you should use with your own home-made DTDs
: '''XML with a public external DTD (RSS 0.91)'''
<?xml version="1.0" encoding="ISO-8859-1"?>
'''<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN"'''
  '''"http://my.netscape.com/publish/formats/rss-0.91.dtd">'''
<rss version="0.91">
<channel> ...... </channel>
</rss>
: '''Understanding DTDs by example'''
: '''Hello text with XML'''
: [http://tecfa.unige.ch/guides/xml/examples/simple/ http://tecfa.unige.ch/guides/xml/examples/simple/]
'''A simple XML document of type <page>'''
<page>
  <title>Hello friend</title>
  <content>
      Here is some content :)
  </content>
  <comment>
      Written by DKS/Tecfa, adapted from S.M./the Cocoon samples
  </comment>
</page>
'''A DTD that would validate the document''' (image not available)
: '''A recipe list in XML'''
: Source: Introduction to XML by Jay Greenspan (now dead URL)
<?xml version="1.0"?>
'''<!DOCTYPE list SYSTEM "simple_recipe.dtd"> '''
<list>
  <recipe>
    <author>Carol Schmidt</author>
    <recipe_name>Chocolate Chip Bars</recipe_name>
    <meal>Dinner
      <course>Dessert</course>
    </meal>
    <ingredients>
      <item>2/3 C butter</item>      <item>2 C brown sugar</item>
      <item>1 tsp vanilla</item>    <item>1 3/4 C unsifted all-purpose flour</item>
      <item>1 1/2 tsp baking powder</item>
      <item>1/2 tsp salt</item>      <item>3 eggs</item>
      <item>1/2 C chopped nuts</item>
      <item>2 cups (12-oz pkg.) semi-sweet choc. chips</item>
    </ingredients>
    <directions>
Preheat oven to 350 degrees. Melt butter; combine with brown sugar and vanilla in large mixing bowl. Set aside to cool.  Combine flour, baking powder, and salt; set aside. Add eggs to cooled sugar mixture; beat well. Stir in reserved dry  ingredients, nuts, and chips.
Spread in greased 13-by-9-inch pan. Bake for 25 to 30 minutes until golden brown; cool.  Cut into squares.
    </directions>
  </recipe>
</list>
'''Contents of the DTD (simple_recipe.dtd)''' (image not available)
: '''A simple story grammar''' (image not available)
: '''Lone family DTD'''
'''family.dtd''' (image not available)
'''A valid XML file'''
<?xml version="1.0" encoding="ISO-8859-1" ?>
<!DOCTYPE family SYSTEM "family.dtd">
<family>
  <person name="Joe Miller" gender="male"
          type="father" id="123.456.789"/>
  <person name="Josette Miller" gender="female"
          type="girl" id="123.456.987"/>
</family>
: '''RSS '''
: There are several RSS standards. RSS 0.91 is Netscape’s original (still being used)
<pre>
<!ELEMENT rss (channel)>
<!ATTLIST rss version CDATA #REQUIRED> <!-- must be "0.91" -->
<!ELEMENT channel (title | description | link | language | item+ | rating? | image? | textinput? | copyright? | pubDate? | lastBuildDate? | docs? | managingEditor? | webMaster? | skipHours? | skipDays?)*>
<!ELEMENT title (#PCDATA)>
<!ELEMENT description (#PCDATA)>
<!ELEMENT link (#PCDATA)>
<!ELEMENT image (title | url | link | width? | height? | description?)*>
<!ELEMENT url (#PCDATA)>
<!ELEMENT item (title | link | description)*>
<!ELEMENT textinput (title | description | name | link)*>
<!ELEMENT name (#PCDATA)>
<!ELEMENT rating (#PCDATA)>
<!ELEMENT language (#PCDATA)>
<!ELEMENT width (#PCDATA)>
<!ELEMENT height (#PCDATA)>
<!ELEMENT copyright (#PCDATA)>
<!ELEMENT pubDate (#PCDATA)>
<!ELEMENT lastBuildDate (#PCDATA)>
<!ELEMENT docs (#PCDATA)>
<!ELEMENT managingEditor (#PCDATA)>
<!ELEMENT webMaster (#PCDATA)>
<!ELEMENT hour (#PCDATA)>
<!ELEMENT day (#PCDATA)>
<!ELEMENT skipHours (hour+)>
<!ELEMENT skipDays (day+)>
</pre>
'''Possible XML document for RSS'''
<?xml version="1.0" encoding="ISO-8859-1" ?>
<!DOCTYPE rss SYSTEM "rss-0.91.dtd">
<rss version="0.91">
  <channel>
    <title>Webster University</title>
    <description>Home Page of Webster University</description>
    <link>http://www.webster.edu</link>
    <item>
      <title>Webster Univ. Geneva</title>
      <description>Home page of Webster University Geneva</description>
      <link>http://www.webster.ch</link>
    </item>
    <item>
      <title>http://www.course.com/</title>
      <description>You can find Thomson text-books materials (exercise data) on this web site</description>
      <link>http://www.course.com/</link>
    </item>
  </channel>
</rss>
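An aggregator reads such a feed by walking its tree. Here is a minimal sketch of that idea in Python (standard library only), using a trimmed copy of the document above. Real feed readers do much more, for example handling other RSS versions:

```python
import xml.etree.ElementTree as ET

RSS = """<rss version="0.91">
  <channel>
    <title>Webster University</title>
    <item>
      <title>Webster Univ. Geneva</title>
      <link>http://www.webster.ch</link>
    </item>
    <item>
      <title>http://www.course.com/</title>
      <link>http://www.course.com/</link>
    </item>
  </channel>
</rss>"""

channel = ET.fromstring(RSS).find("channel")
# One (title, link) pair per <item>, in document order
items = [(item.findtext("title"), item.findtext("link")) for item in channel.findall("item")]

print(channel.findtext("title"))
for title, link in items:
    print(f"{title} -> {link}")
```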
: '''Summary syntax of element definitions'''
: We will come back to this when we learn how to write our own DTDs (don't worry too much about unexplained details ....)
{| class="prettytable"
| <center>'''syntax element'''</center>
| <center>'''short explanation'''</center>
| <center>'''Example'''</center>
|-
| <center>,</center>
| order of elements
| <code><nowiki><!ELEMENT Name (First, Middle, Last)></nowiki></code>
|-
| <center>?</center>
| optional element
| <code>MiddleName?</code>
|-
| <center>+</center>
| at least one element
| <code>movie+</code>
|-
| <center>*</center>
| zero or more elements
| <code>item*</code>
|-
| <center><nowiki>|</nowiki></center>
| pick one (or operator)
| <code><nowiki>economics | law</nowiki></code>
|-
| <center>()</center>
| grouping construct
| <code>(A,B,C)</code>
|}
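To make the table concrete, note that an element-only content model behaves much like a regular expression over the names of the child elements. The sketch below (Python, standard library only; <code>model_to_regex</code> and <code>children_match</code> are made-up helper names) translates a model into such a regex. It is a teaching aid under simplifying assumptions: it ignores #PCDATA, EMPTY and ANY, as well as the extra restrictions that real DTD validators apply:

```python
import re

# Names, or one of the DTD operators , | ? * + ( )
TOKEN = re.compile(r"[A-Za-z_][\w.\-]*|[(),|?*+]")


def model_to_regex(model: str) -> re.Pattern:
    """Translate an element-only content model, e.g. "(title, content, comment?)",
    into a regex matching child element names, each followed by one space."""
    parts = []
    for tok in TOKEN.findall(model):
        if tok == "(":
            parts.append("(?:")
        elif tok == ",":
            continue  # a sequence is plain concatenation
        elif tok in (")", "|", "?", "*", "+"):
            parts.append(tok)  # same meaning in DTDs and in regular expressions
        else:
            parts.append(f"(?:{re.escape(tok)} )")
    return re.compile("".join(parts))


def children_match(model: str, children: list[str]) -> bool:
    """True if this sequence of child element names is allowed by the model."""
    return model_to_regex(model).fullmatch("".join(c + " " for c in children)) is not None


print(children_match("(title, content, comment?)", ["title", "content"]))  # True
print(children_match("(title, content, comment?)", ["content", "title"]))  # False
```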
: '''Entities'''
: Most professional DTDs use entities.
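: Entities are named symbols: wherever an entity is referenced, its content is substituted for the reference ...
: 2 kinds: XML entities (general entities, used in documents, e.g. <code>&amp;lt;</code>) and DTD entities (parameter entities, used inside DTDs)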
: '''DTD entities'''
: Some more complex DTDs use the same structures over and over. Instead of typing these several times, one can use an ENTITY construction like this:
<!ENTITY % Content "(Para | List | Listing)*">
Later in the DTD we can then write element definitions like this:
<!ELEMENT Intro (Title, %Content; ) >
<!ELEMENT Goal (Title, %Content; ) >
The parser will then simply expand these into:
<!ELEMENT Intro (Title, (Para | List | Listing)*) >
<!ELEMENT Goal (Title, (Para | List | Listing)* ) >
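The expansion works like a textual "search and replace" on <code>%name;</code> references. Here is a minimal Python sketch of that idea (standard library only; <code>ENTITIES</code> and <code>expand</code> are made-up names). It is an illustration, not how a real DTD parser is built. XML 1.0 says the replacement text of a parameter entity is padded with one space on each side, which is why the output contains double spaces. References nested inside replacement texts are not handled here:

```python
import re

# Parameter entities declared in the DTD: name -> replacement text
ENTITIES = {"Content": "(Para | List | Listing)*"}

PE_REF = re.compile(r"%([A-Za-z_][\w.\-]*);")


def expand(declaration: str, entities: dict[str, str]) -> str:
    """Replace every %name; reference with its replacement text, padded with spaces."""
    return PE_REF.sub(lambda m: " " + entities[m.group(1)] + " ", declaration)


for decl in ("<!ELEMENT Intro (Title, %Content; ) >",
             "<!ELEMENT Goal (Title, %Content; ) >"):
    print(expand(decl, ENTITIES))
```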
... think of these entities as shortcuts.
: '''Choosing and using an XML Editor'''
: There are lots of XML editors and there is no easy choice !
: Depending on your needs you may choose a different editor:
: To edit strongly structured data (i.e. data-centric XML) a sort of "tree" or "boxed" view is practical
: To edit text-centric data (e.g. an article) you want either a word-processor-like tool or a structure editor.
: Really good XML editors cost a lot ...
'''Here is my own little comparison of XML editors:'''
: [http://edutechwiki.unige.ch/en/XML_editor http://edutechwiki.unige.ch/en/XML_editor]
: '''Minimal things your XML editor should be able to do'''
: Check for XML well-formedness
: Check for validity against several kinds of XML grammars (DTD, Relax NG, XML Schema)
: Highlight errors (of all sorts)
: Suggest available XML tags (in a given context). Also clearly show which ones are mandatory and which ones are optional, and display them in the right order.
: Allow the user to move/split/join elements in a more or less ergonomic way (although admittedly these operations need some training)
: Include support for XSLT and XQuery (however, if you have installation skills, you can easily compensate for missing support by installing a processor like Saxon)
: '''Additional criteria depending on the kind of XML:'''
'''For data-centric XML:'''
: Allow viewing and editing of XML documents in a tree view or boxed view (or both together)
: Provide a context-dependent choice of XML tags and attributes (DTD/XSD awareness)
'''For text-centric XML:'''
: Allow editing of XML documents in a structure view
: Allow editing of XML documents in a somewhat WYSIWYG view. Such a view can be based on an associated CSS (most common solution) or XSL-FO (I am dreaming here) or use some proprietary format (which is not very practical for casual users!). Also allow users to switch tags or element boundary markers on and off.
: Provide a context-dependent choice of XML tags and attributes (DTD/XSD awareness). The user should be able to right-click within the XML text and not in some distant tree representation.
: Automatically insert all mandatory sub-elements when an element is created.
: Automatically complete XML Tags when working without a DTD or other schema.
: Indent properly (and assist users to indent single lines as well as the whole document)
: '''Suggested free editors'''
: '''Exchanger XML Lite V3.2:'''
: [http://www.freexmleditor.com/ http://www.freexmleditor.com/]
: I suggest trying this editor first. Try the other one if you are unhappy with it or if you plan to edit "data-centric" XML documents.
'''Hints for editing'''
: To insert an element or attribute:
: In the contents window press Ctrl-T to insert an element.
: Pressing "<" in the editing window gives more options and you can do it in any place.
: To insert an attribute, position the cursor after the element name and press the space bar
: Alternatively (and better if you don't know your DTD): Select the Helper pane to the left. Then (in the editing window) click on the element tag you wish to edit or put your cursor in a location between child elements. The helper pane will then display the structure of the current parent element and list available elements on which you can click to insert.
: '''XMLmind Standard Edition:'''
: [http://www.xmlmind.com/xmleditor/download.shtml http://www.xmlmind.com/xmleditor/download.shtml]
'''Hints for editing'''
: Element manipulation is done through the "tree view"
: After selecting an element
: you can insert elements either by selecting (tiny) before/after/within buttons in the top right elements pane
: or use shortcuts: (ctrl-h = insert before, ctrl-i = insert within, ctrl-j = insert after). Same principle for the attributes pane.
: '''Alternatives'''
: Firstly, any XML editor is difficult to learn (because XML editing is not so easy). So make an effort to learn the interface, e.g. read the help !
: Programmers also may consider using a programmer’s editor. However make sure:
: that there is an XML plugin
: that the editor is "DTD aware" (can show elements to insert in a given context)
: that it can validate.
... otherwise forget it !!
: '''About Java'''
: Most XML editors are written in Java and rely on the Java Runtime Environment (JRE).
: Both websites give you a choice: download an editor with or without Java. If you don't have Java installed on your own PC, I suggest getting it '''first''' from:
[http://www.java.com/ http://www.java.com/] ... and then always downloading the "no java vm" versions of the editors.
: To test if you have Java, open a command terminal and type <code>java -version</code>.
: To open a command terminal (Windows): Start Menu -> Run, then type "cmd".
: '''Next steps'''
: '''Reading'''
: These slides may not be enough to understand everything, so please read:
Carey (pp. 1-21, pp. 28-39)
Optional: 1 or 2 case problems
: '''Next modules'''
'''Module 2'''
: Display XML data with CSS in a web browser
'''Module 3'''
: How to write a DTD
: '''Homework: mini-project 1'''
Due: Monday March 29 9:00h
: '''Task'''
'''Edit an XML document with the suggested DTDs below'''
: Respect the semantics of the elements and the attributes
: Validate your document
: Try to use as many different elements as you can (if appropriate)
: Follow additional directions for each suggested DTD
: Add comments in the DTD or the XML file
You can choose among the DTDs available at:
: '''http://tecfa.unige.ch/guides/xml/examples/dtd-examples/'''
: '''Some DTDs commented'''
{| class="prettytable"
| <center>'''DTD (difficulty)'''</center>
| <center>'''Purpose'''</center>
| <center>'''file name'''</center>
| <center>'''Additional directions'''</center>
|-
| Recipe DTD
(easy)
| Write simple recipes
| recipe.dtd
| Use all tags. Write at least one recipe. Make sure that there is enough information to really use it.
|-
| Recipe Markup Language
(medium)
| Write complex recipes
| recipeml.dtd
| As above, but only use appropriate tags. Hint: find the website of its creator
|-
| RSS 0.92
(medium)
| News syndication (usually machine generated)
| rss-0-92.dtd
| Use enough tags to display this in an aggregator. Enter at least 4 URLs. Hint: look at an RSS news feed first !
|-
| Simple Docbook
(hard)
| Write "real" articles
| sdocbook.dtd
| Do not use all tags, only the needed ones. Copy/paste from a text you already have.
|-
| StepbyStep
(medium)
| Write "how-to" instructions
| stepbystep03.dtd
| Make up a good "how-to problem". Only use tags you need.
|-
| Story grammar
(medium)
| Write simple fairy tales
| story-grammar.dtd
| Write a nice fairy tale. Doesn’t need to be your own.
|}
: '''Approximate evaluation grid'''
{| class="prettytable"
| <center>'''Minimal work required:'''</center>
| <center>'''Probable grade'''</center>
|-
| Well-formed (but not valid) document using the DTD's elements
| <center>D</center>
|-
| Valid (minimalistic) document
| <center>C</center>
|-
| Valid document with an interesting content
| <center>B</center>
|-
| <center>'''Extra features:'''</center>
| <center>'''Probable bonus'''</center>
<center>'''(depends on quality)'''</center>
|-
| Inserted useful comments <code><nowiki><!-- ... --></nowiki></code> in the XML and/or the DTD
| <center>+</center>
|-
| You produce some interesting content
| <center>+ .. ++</center>
|-
| Respect of the DTD's semantics
| <center>+</center>
|-
| Write a 1-2 page report that discusses the architecture of the DTD and your opinion of it, e.g.
: describe architecture of the DTD (without going into detailed description of every element !)
: discuss what you would like to improve, what you liked/disliked, your difficulties, etc.
| <center>+ .. +++</center>
|}
Examples:
: To get a B+ / A-: Firstly produce a valid document, then (a) either write a nice report or (b) document XML/DTD code and produce a nicely filled-in XML document
: To get an A: do all of the above very well
: '''Submission format and procedure.'''
: Electronic copies: '''https://worldclassroom.webster.edu/'''. Please make sure to name files according to the following rules:
{| class="prettytable"
| <center>'''File name'''</center>
| <center>'''Example'''</center>
| <center>'''when ?'''</center>
|-
| your_name.xml
| vasta.xml
| This is a mandatory file !
|-
| your_name-xxx.dtd
| vasta-sdocbook.dtd
| only if you add comments to the DTD
|-
| your_name.<nowiki>{doc|pdf|html}</nowiki>
| vasta.pdf
| only if you decide to write a report
|}


[[Category: XML]]

Revision as of 17:43, 26 October 2010

This article or section is currently under construction

In principle, someone is working on it and there should be a better version in a not so distant future.
If you want to modify this page, please discuss it with the person working on it (see the "history")

Introduction

This is a beginners tutorial for XML editing made from slides

Objectives
  • Be able to read schemas and find other documentation
  • Understand the necessity of using an XML editor
  • Be able to edit XML without hand-editing tags, profit from DTD and Schema awareness
  • Be able to check well-formedness and validate
  • Be able to fix errors
Prerequisites
  • Some idea what XML is about
  • XML namespaces (some, have a look at the XML namespace article)
  • HTML and CSS (some)
Next steps

An XML document includes:

Processing instructions (at least an XML declaration on top !)
declarations (in particular a DTD)
marked up contents (mandatory): elements
marked up contents (optionally): attributes and entities
comments: '

XML documents are trees

For a computer person, a XML document is a tree (“boxes within boxes”)
... and inside a browser (i.e. the DOM) the document is a tree-based data structure
XML data as tree (CALS table example)
<TBODY>
  </TBODY> 
Pierre Muller http://pm.com/
Elisabeth Dupont
[[Image:]]
“well-formed” and "valid XML documents"
“Well-formed” XML documents
A document must start with an XML declaration (including version number !)
<?xml version="1.0"?>
You may specify encoding (default is utf-8) and you have to stick to an encoding !
<?xml version="1.0" encoding="ISO-8859-1"?> 
Structure must be hierarchical:
start-tags and end-tags must match
no cross-overs: ...... ....
case sensitivity, e.g. "LI" is not "li"
"EMPTY" tags must use "self-closing" syntax:
e.g.

should be written as
, a lonely "
" would be illegal
Attributes must have values and values are quoted:
e.g. <a href="http://scholar.google.com"> or <person status="employed">
e.g. <input type="radio" checked="checked">
A single root element per document
Root element opens and closes content
The root element should not appear in any other element
Special characters (!!) : <, &, >,", ’
Use < & > &aquot; ' instead of <, &, >,", ’
Applies also to URLs !!
bad:  http://truc.unige.ch/programme?bla&machin
good: http://truc.unige.ch/programme?bla&machin
A minimal well-formed XML document
<?xml version="1.0" ?>
<page updated="jan 2007">
 <title>Hello friend</title>
 <content> Here is some content :) </content> 
 <comment> Written by DKS/Tecfa </comment>
</page>
<hello> Hello <important>dear</important> reader ! </hello>
It has an XML declaration on top
It has a root element (i.e. page)
Elements are nested and tags are closed
Attribute has quoted value
XML names and CDATA Sections
Names used for elements should start with a letter and only use letters, numbers, the underscore, the hyphen and the period (no other punctuation marks) !
Good: <driversLicenceNo> <drivers_licence_no>
Bad: <driver’s_licence_number> <driver’s_licence_#> <drivers licence number>
When you want to display data that includes "XMLish" things that should not be interpreted you can use so called CDATA Sections:
<example> 
 <!CDATA[  (x < y) is an expression
   <svg xmls="http://www.w3.org/2000/svg">
]]> </example>
Valid XML documents

Un valid document must be:

“well-formed” (see above)
conform to a grammar, .e.g.
only use tags defined by the grammar
respect nesting, ordering and other constraints ....

Kinds of XML grammars

DTDs are part of the XML standard
XML Schema is a more recent W3C standard, used to express stronger constraints
Relax NG is a OASIS standard (made by well known XML experts and who don’t like XML Schema ...)
Schematron (yet another alternative)

Daniel Schneider likes Relax NG best (it’s relatively elegant and powerful)

Name spaces
It is possible to use several vocabularies within a document if the markup language says so:
E.g. XHtml + SVG + MathML + XLink.
In order to avoid naming conflicts (e.g. "title" does not means the same thing in XHTML and SVG), one can prefix element and attribute names with a name space.
Declaring additional vocabularies
The "xmlns:name_space" attribute allows to introduce a new vocabulary. It tells that all elements or attributes prefixed by "name_space" belong to a different vocabulary
xmlns:name_space="URL_name_of_name_space"
SVG within XHTML
<html xmlns:svg="http://www.w3.org/2000/svg">
   <svg:rect x="50" y="50" rx="5" ry="5" width="200" height="100" ....
xmlns:svg = "..." means that svg: prefixed elements are part of SVG
Xlink
XLink is a language to define links (only works with Firefox-based browsers)
<RECIT xmlns:xlink="http://www.w3.org/1999/xlink">
<INFOS>
  <Date>30 octobre 2003 - </Date><Auteur>DKS - </Auteur>
  <A xlink:href="http://jigsaw.w3.org/css-validator/check/referer"
     xlink:type="simple">CSS Validator</A>
 </INFOS>
Declaring the main vocabulary
The main vocabulary can be introduced by an attribute like:
xmlns="URL_name_of_name_space"
Note: some specifications (e.g. SVG) require a name space declaration in any case (even if you do not use any other vocabulary) !
SVG example
<svg xmlns="http://www.w3.org/2000/svg">
   <rect x="50" y="50" rx="5" ry="5" width="200" height="100" ....
Namespace URLs
URLs that define namespaces are just names, there doesn’t need to be a real link
E.g. for your own puporses you can very well make up something like:
<account xmls:pein = "http://joe.miller.com/pein">
  <pein:name>Joe</pein:name>
</account>

... and the URL http://joe.miller.com/pein doesn’t need to exist.

DTDs (Document Type Definitions)

DTD grammars are just a set of rules that define:

a set of elements (tags) and their attributes that can be used;
how elements can be embedded;
different sorts of entities (reusable fragments, special characters).
DTDs can’t define what the character data (element contents) and most attribute values look like.

Specification of a markup language

The most important part is usually the DTD, but in addition other constraints can be added !
The DTD does not identify the root element !
you have to tell the users what elements can be root elements
Since DTDs can’t express data constraints, you may write out additional ones in a specification document
e.g. "the value of length attribute is a string composed of a number one of "inch", "em"
<size length="10cm">
A simple DTD
<!ELEMENT page  (title, content, comsment?)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT content (#PCDATA)>
<!ELEMENT comment (#PCDATA)>
A DTD document contains just definition of rules .... nothing else (see later for explanations)
Using a DTD with an XML document
Document type declarations
A valid XML document includes a declaration that identifies the DTD
So: The <!DOCTYPE...> declaration is part of the XML file, not the DTD ....

Example:

<?xml version="1.0" encoding="ISO-8859-1" ?>
<!DOCTYPE hello SYSTEM "hello.dtd">

4 ways of using a DTD

No DTD (XML document will just be well-formed)
DTD rules are defined inside the XML document
We get a "standalone" document (the XML document is self-sufficient)
"Private/System" DTDs, the DTD is located on the system (own computer or the Internet)
... that’s what you are going to use when you write your own DTDs
Public DTDs: the DTD is identified by an official name.
This means that both your XML editor and the user's software know the DTD
strategy used for common Web DTDs like XHTML, SVG, MathML, etc.

Place

The DTD is declared at the top of the file, after the XML declaration.
The XML declaration, the DTD declaration, etc. are part of the prologue.
Syntax of the DTD declaration in the XML document
Start of a DTD declaration:
  <!DOCTYPE ....  >
The root element must be specified first
Remember that DTDs don’t know their root element, root is defined in the XML document !
Note: the DTD must define this root element just like any other element ! (a DTD may define several elements that can serve as root)
 <!DOCTYPE hello .... >
Syntax for internal DTDs (only !)
DTD rules are inserted between brackets [ ... ]
   <!DOCTYPE hello [
       <!ELEMENT hello (#PCDATA)>
                   ]>
Syntax to define "private" external DTDs:
DTD is identified by the URL after the "SYSTEM" keyword
<!DOCTYPE hello SYSTEM "hello.dtd">
Syntax for public DTDs:
after the "PUBLIC" keyword you have to specify an official name and a backup URL that a validator could use.
<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN"
 "http://my.netscape.com/publish/formats/rss-0.91.dtd">

Recall

The DTD file itself does not contain any DTD declaration, just rules !!
Some examples of XML documents with DTD declarations:
Hello XML without DTD
<?xml version="1.0" standalone="yes"?>
<hello> Hello XML and hello dear reader ! </hello>
Hello XML with an internal DTD
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE hello [
   <!ELEMENT hello (#PCDATA)>
   ]>
<hello> Hello XML and hello dear reader ! </hello>
Hello XML with an external DTD
<?xml version="1.0" encoding="ISO-8859-1" ?>
<!DOCTYPE hello SYSTEM "hello.dtd">
<hello> Hello XèMèLè and hello dear reader ! </hello>
That’s what you should use with your own home-made DTDs
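The contents of `hello.dtd` are not shown above. A DTD file contains rules only, so a minimal version matching the internal DTD example could be the single rule below. This is a sketch; the comment line is optional.

```xml
<!-- hello.dtd: rules only, no <!DOCTYPE ...> declaration -->
<!ELEMENT hello (#PCDATA)>
```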
XML with a public external DTD (RSS 0.91)
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN"
 "http://my.netscape.com/publish/formats/rss-0.91.dtd">
<rss version="0.91">
 <channel> ...... </channel>
</rss>
Understanding DTDs by example
Hello text with XML
http://tecfa.unige.ch/guides/xml/examples/simple/

A simple XML document of type <page>

<page>
 <title>Hello friend</title>
 <content>
      Here is some content :)
 </content> 
 <comment>
      Written by DKS/Tecfa, adapted from S.M./the Cocoon samples
 </comment>
</page>

A DTD that would validate the document
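The simple `page` DTD introduced at the start of this section fits this document. Below, it is embedded as an internal DTD so the whole thing fits in one file that you can paste into an XML editor and validate. The `standalone="yes"` declaration is an addition made for this self-contained version.

```xml
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE page [
  <!-- a page has a title, then content, then an optional comment -->
  <!ELEMENT page (title, content, comment?)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT content (#PCDATA)>
  <!ELEMENT comment (#PCDATA)>
]>
<page>
 <title>Hello friend</title>
 <content>
      Here is some content :)
 </content>
 <comment>
      Written by DKS/Tecfa, adapted from S.M./the Cocoon samples
 </comment>
</page>
```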

A recipe list in XML
Source: Introduction to XML by Jay Greenspan (now dead URL)
<?xml version="1.0"?>
<!DOCTYPE list SYSTEM "simple_recipe.dtd"> 
<list>
 <recipe>
   <author>Carol Schmidt</author>
   <recipe_name>Chocolate Chip Bars</recipe_name>
   <meal>Dinner
     <course>Dessert</course>
   </meal>
   <ingredients>
     <item>2/3 C butter</item>      <item>2 C brown sugar</item>
     <item>1 tsp vanilla</item>     <item>1 3/4 C unsifted all-purpose flour</item>
     <item>1 1/2 tsp baking powder</item>
     <item>1/2 tsp salt</item>      <item>3 eggs</item>
     <item>1/2 C chopped nuts</item>
     <item>2 cups (12-oz pkg.) semi-sweet choc. chips</item>
   </ingredients>
   <directions>
Preheat oven to 350 degrees. Melt butter; combine with brown sugar and vanilla in large mixing bowl. Set aside to cool.  Combine flour, baking powder, and salt; set aside. Add eggs to cooled sugar mixture; beat well. Stir in reserved dry  ingredients, nuts, and chips.
Spread in greased 13-by-9-inch pan. Bake for 25 to 30 minutes until golden brown; cool.  Cut into squares.
   </directions>
 </recipe>
</list>

Contents of the DTD (simple_recipe.dtd)

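The original contents of `simple_recipe.dtd` are not reproduced here. The following is a hedged reconstruction, not the original file: a DTD along these lines would validate the recipe list above. The content models are inferred from that document only.

```xml
<!-- A possible simple_recipe.dtd (reconstruction, not the original file) -->
<!ELEMENT list (recipe+)>
<!ELEMENT recipe (author, recipe_name, meal, ingredients, directions)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT recipe_name (#PCDATA)>
<!-- mixed content: text such as "Dinner" mixed with any number of course elements -->
<!ELEMENT meal (#PCDATA | course)*>
<!ELEMENT course (#PCDATA)>
<!ELEMENT ingredients (item+)>
<!ELEMENT item (#PCDATA)>
<!ELEMENT directions (#PCDATA)>
```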


A simple story grammar
Lone family DTD

family.dtd

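The rules of `family.dtd` are not shown here. The sketch below would accept the valid XML file that follows, but several choices are assumptions. The (male|female) list is one assumption. Declaring `type` as optional free text is another. Declaring `id` as CDATA rather than ID is deliberate: ID values must be XML names, and names cannot start with a digit, so "123.456.789" would be rejected as an ID.

```xml
<!-- A possible family.dtd (reconstruction, not the original file) -->
<!ELEMENT family (person+)>
<!-- person carries all its information in attributes -->
<!ELEMENT person EMPTY>
<!ATTLIST person
  name    CDATA          #REQUIRED
  gender  (male|female)  #REQUIRED
  type    CDATA          #IMPLIED
  id      CDATA          #REQUIRED>
```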

A valid XML file

<?xml version="1.0" encoding="ISO-8859-1" ?>
<!DOCTYPE family SYSTEM "family.dtd">
<family>
  <person name="Joe Miller" gender="male" 
          type="father" id="123.456.789"/>
  <person name="Josette Miller" gender="female" 
          type="girl" id="123.456.987"/>
</family>
RSS
There are several RSS standards. RSS 0.91 is Netscape’s original (still being used)
<!ELEMENT rss (channel)>
<!ATTLIST rss version CDATA #REQUIRED> 
<!ELEMENT channel (title | description | link | language | item+ | rating? | image? | textinput? | copyright? | pubDate? | lastBuildDate? | docs? | managingEditor? | webMaster? | skipHours? | skipDays?)*>
<!ELEMENT title (#PCDATA)>
<!ELEMENT description (#PCDATA)>
<!ELEMENT link (#PCDATA)>
<!ELEMENT image (title | url | link | width? | height? | description?)*>
<!ELEMENT url (#PCDATA)>
<!ELEMENT item (title | link | description)*>
<!ELEMENT textinput (title | description | name | link)*>
<!ELEMENT name (#PCDATA)>
<!ELEMENT rating (#PCDATA)>
<!ELEMENT language (#PCDATA)>
<!ELEMENT width (#PCDATA)>
<!ELEMENT height (#PCDATA)>
<!ELEMENT copyright (#PCDATA)>
<!ELEMENT pubDate (#PCDATA)>
<!ELEMENT lastBuildDate (#PCDATA)>
<!ELEMENT docs (#PCDATA)>
<!ELEMENT managingEditor (#PCDATA)>
<!ELEMENT webMaster (#PCDATA)>
<!ELEMENT hour (#PCDATA)>
<!ELEMENT day (#PCDATA)>
<!ELEMENT skipHours (hour+)>
<!ELEMENT skipDays (day+)>

Possible XML document for RSS

<?xml version="1.0" encoding="ISO-8859-1" ?>
<!DOCTYPE rss SYSTEM "rss-0.91.dtd">
<rss version="0.91">
  <channel>
    <title>Webster University</title>
    <description>Home Page of Webster University</description>
    <link>http://www.webster.edu</link>
    <item>
      <title>Webster Univ. Geneva</title>
      <description>Home page of Webster University Geneva</description>
      <link>http://www.webster.ch</link>
    </item>
    <item>
      <title>http://www.course.com/</title>
      <description>You can find Thomson text-books materials (exercise data) on this web site</description>
      <link>http://www.course.com/</link>
    </item>
  </channel>
</rss>
Summary syntax of element definitions
We will come back to this when we learn how to write our own DTDs (don’t worry too much about unexplained details ....)


Syntax element   Short explanation          Example
,                order of elements          <!ELEMENT Name (First, Middle, Last)>
?                optional element           MiddleName?
+                at least one element       movie+
*                zero or more elements      item*
|                pick one (or operator)     law | ...
()               grouping construct         (A,B,C)
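The operators in the table above can be combined in a single DTD. Here is a small self-contained sketch. The element names `collection`, `title`, `dvd`, `stream` and `note` are hypothetical and chosen only to exercise each operator.

```xml
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE collection [
  <!-- ",": a Name, then one or more movies (+), then zero or more notes (*) -->
  <!ELEMENT collection (Name, movie+, note*)>
  <!-- "?": the Middle name is optional -->
  <!ELEMENT Name (First, Middle?, Last)>
  <!ELEMENT First (#PCDATA)>
  <!ELEMENT Middle (#PCDATA)>
  <!ELEMENT Last (#PCDATA)>
  <!-- "()" groups a choice ("|"): a title, then either dvd or stream -->
  <!ELEMENT movie (title, (dvd | stream))>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT dvd EMPTY>
  <!ELEMENT stream EMPTY>
  <!ELEMENT note (#PCDATA)>
]>
<collection>
  <Name><First>Joe</First><Last>Miller</Last></Name>
  <movie><title>Metropolis</title><dvd/></movie>
  <movie><title>Nosferatu</title><stream/></movie>
</collection>
```

This document is valid even though it has no Middle element and no note element, because those parts are marked "?" and "*".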
Entities
Most professional DTDs use entities.
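Entities are just symbols that stand for some information; wherever the symbol is used, it is replaced by that information.
2 kinds: XML entities (used in the XML document) and DTD entities (also called parameter entities, used inside the DTD itself)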
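An XML entity is declared in the DTD and used in the document as `&name;`. Here is a minimal self-contained sketch; the entity name `author` and the `note` element are hypothetical.

```xml
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE note [
  <!ELEMENT note (#PCDATA)>
  <!-- an XML entity: declared in the DTD, used in the document -->
  <!ENTITY author "DKS/Tecfa">
]>
<note>Written by &author;</note>
```

A parser replaces `&author;` with its text, so applications see "Written by DKS/Tecfa".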
DTD entities
Some more complex DTDs use the same structures all over. Instead of typing them several times, one can use an ENTITY construction like this:
<!ENTITY % Content "(Para | List | Listing)*">

Later in the DTD we can then have element definitions like this:


<!ELEMENT Intro (Title, %Content; ) >
<!ELEMENT Goal (Title, %Content; ) >

The computer will then simply translate these into:


<!ELEMENT Intro (Title, (Para | List | Listing)*) >
<!ELEMENT Goal (Title, (Para | List | Listing)* ) >

... think of these entities as shortcuts.
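Here is the same idea as a complete DTD file. The file name `article.dtd` and the elements `Article`, `Title`, `Para`, `List` and `Listing` are hypothetical. The rules must live in an external DTD file: in an internal DTD (between [ ... ]), the XML specification does not allow parameter entity references inside a declaration.

```xml
<!-- article.dtd (hypothetical file name) -->
<!-- a parameter entity must be declared before it is used -->
<!ENTITY % Content "(Para | List | Listing)*">
<!ELEMENT Article (Intro, Goal)>
<!ELEMENT Intro (Title, %Content; )>
<!ELEMENT Goal (Title, %Content; )>
<!ELEMENT Title (#PCDATA)>
<!ELEMENT Para (#PCDATA)>
<!ELEMENT List (Para+)>
<!ELEMENT Listing (#PCDATA)>
```

A document then uses it with `<!DOCTYPE Article SYSTEM "article.dtd">`, like the private DTDs discussed earlier.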

Choosing and using an XML Editor
There are lots of XML editors and there is no easy choice !
Depending on your needs you may choose a different editor:
To edit strongly structured data (i.e. data-centric XML) a sort of "tree" or "boxed" view is practical
To edit text-centric data (e.g. an article) you want either a word-processor-like tool or a structure editor.
Really good XML editors cost a lot ...

Here is my own little comparison of XML editors:

http://edutechwiki.unige.ch/en/XML_editor
Minimal things your XML editor should be able to do
Check for XML well-formedness
Check for validity against several kinds of XML grammars (DTD, Relax NG, XML Schema)
Highlight errors (of all sorts)
Suggest available XML tags (in a given context). Also clearly show which ones are mandatory and which ones are optional, and display them in the right order.
Allow the user to move/split/join elements in a more or less ergonomic way (although these operations admittedly need some training)
Include support for XSLT and XQuery (however, if you have installation skills, you can easily compensate for a lack of support by installing a processor like Saxon)
Additional criteria depending on the kind of XML:

For data-centric XML:

Allow viewing and editing of XML documents in a tree view or boxed view (or both together)
Provide a context-dependent choice of XML tags and attributes (DTD/XSD awareness)

For text-centric XML:

Allow editing of XML documents in a structure view
Allow editing of XML documents in a somewhat WYSIWYG view. Such a view can be based on an associated CSS (most common solution) or XSL-FO (I am dreaming here) or use some proprietary format (which is not very practical for casual users!). Also allow users to switch on/off tags or element boundary markers.
Provide a context-dependent choice of XML tags and attributes (DTD/XSD awareness). The user should be able to right-click within the XML text and not in some distant tree representation.
Automatically insert all mandatory sub-elements when an element is created.
Automatically complete XML tags when working without a DTD or other schema.
Indent properly (and assist users to indent single lines as well as the whole document)
Suggested free editors
Exchanger XML Lite V3.2:
http://www.freexmleditor.com/
I suggest trying this editor first; try the other one if you are unhappy with it or if you plan to edit "data-centric" XML documents.

Hints for editing

To insert an element or attribute:
In the contents window press Ctrl-T to insert an element.
Pressing "<" in the editing window gives more options, and works anywhere in the document.
To insert an attribute, position the cursor after the element name and press the space bar.
Alternatively (and better if you don't know your DTD): Select the Helper pane to the left. Then (in the editing window) click on the element tag you wish to edit or put your cursor in a location between child elements. The helper pane will then display the structure of the current parent element and list available elements on which you can click to insert.
XMLmind Standard Edition:
http://www.xmlmind.com/xmleditor/download.shtml

Hints for editing

Element manipulation is done through the "tree view"
After selecting an element
you can insert elements either by selecting (tiny) before/after/within buttons in the top right elements pane
or use shortcuts: (ctrl-h = insert before, ctrl-i = insert within, ctrl-j = insert after). Same principle for the attributes pane.
Alternatives
Firstly, any XML editor is difficult to learn (because XML editing is not so easy). So make an effort to learn the interface, e.g. read the help !
Programmers may also consider using a programmer’s editor. However, make sure:
that there is an XML plugin
that the editor is "DTD aware" (can show elements to insert in a given context)
that it can validate.

... otherwise forget it !!

About Java
Most XML editors are written in Java and rely on the Java Runtime Environment (JRE).
Both websites give you a choice: download an editor with or without Java. If you don't have Java installed on your own PC, I suggest getting it first from:

http://www.java.com/ ... and always download the "no java vm" versions

To test whether you have Java, open a command terminal and type "java -version".
To open a command terminal: Start Menu -> Run, then type "cmd".
Next steps
Reading
These slides may not be enough to understand, so please read:

Carey (pp. 1-21, pp. 28-39)

Optional: 1 or 2 case problems

Next modules

Module 2

Display XML data with CSS in a web browser

Module 3

How to write a DTD
Homework: mini-project 1

Due: Monday March 29 9:00h

Task

Edit an XML document with the suggested DTDs below

Respect the semantics of the elements and the attributes
Validate your document
Try to use as many different elements as you can (if appropriate)
Follow additional directions for each suggested DTD
Add comments in the DTD or the XML file

You can choose among the DTDs available at:

http://tecfa.unige.ch/guides/xml/examples/dtd-examples/
Some DTDs, with comments


DTD (difficulty): purpose; file name. Additional directions

Recipe DTD (easy): write simple recipes; recipe.dtd. Use all tags. Write at least one recipe. Make sure that there is enough information to really use it.
Recipe Markup Language (medium): write complex recipes; recipeml.dtd. As above, but only use appropriate tags. Hint: find the website of its creator.
RSS 0.92 (medium): news syndication (usually machine generated); rss-0-92.dtd. Use enough tags to display this in an aggregator. Enter at least 4 URLs. Hint: look at an RSS news feed first !
Simple Docbook (hard): write "real" articles; sdocbook.dtd. Do not use all tags, only the needed ones. Copy/paste from a text you already have.
StepbyStep (medium): write "how-to" instructions; stepbystep03.dtd. Make up a good "how-to problem". Only use tags you need.
Story grammar (medium): write simple fairy tales; story-grammar.dtd. Write a nice fairy tale. It doesn’t need to be your own.
Approximate evaluation grid


Minimal work required (probable grade):
Well-formed (but not valid) document using the DTD’s elements: D
Valid (minimalistic) document: C
Valid document with interesting content: B

Extra features (probable bonus, depends on quality):
Inserted useful comments in the XML and/or the DTD: +
You produce some interesting content: + .. ++
Respecting the DTD’s semantics: +
A 1-2 page report that discusses the architecture of the DTD and your opinion of it: + .. +++, e.g.
describe the architecture of the DTD (without going into a detailed description of every element !)
discuss what you would like to improve, what you liked/disliked, your difficulties, etc.

Examples:

To get a B+ / A-: Firstly produce a valid document, then (a) either write a nice report or (b) document XML/DTD code and produce a nicely filled-in XML document
To get an A: do all of the above very well
Submission format and procedure.
Electronic copies: https://worldclassroom.webster.edu/
Please make sure to name files according to the following rules:

File name              Example              When?
your_name.xml          vasta.xml            This is a mandatory file !
xxx_your_name.dtd      vasta-sdocbook.dtd   only if you add comments to the DTD
your_name.{pdf|html}   vasta.pdf            only if you decide to write a report