XML Schema tutorial - Basics
The following text was made from slides and should be expanded some day - Daniel K. Schneider 23:10, 5 December 2010 (CET)
Introduction
This is a beginner's tutorial for XML Schema (often called XSD in reference to the file name extension *.xsd), made from slides.
- Objectives
- Understand the purpose of XSD
- Be able to cope with XSD editing
- Translate DTDs to XSD with a conversion tool
- Modify data types of a given XSD
- Write very simple XSD grammars
- Prerequisites
- Editing XML (being able to use a simple DTD). Catch up with the Editing XML tutorial if needed
- Be somewhat familiar with DTDs (see the DTD tutorial)
- XML namespaces (some; have a look at the XML namespace article. At the very least you should know why the XSD prefix could be "xs" or "xsd" or "banana"...)
- HTML and CSS (some)
- Next steps
- ...
- Warning
XSD is a rather complex schema definition language. For any given problem there are usually several good solutions.
These slides have been prepared with the help of
- The W3C XML Schema primer: http://www.w3.org/TR/xmlschema-0/
- Roger Costello's extensive XML Schema tutorial: http://www.xfront.com/
Introduction
We may distinguish between several kinds of XML grammars
- A grammar-based schema specifies what elements may be used in an XML document, the order of the elements, the number of occurrences of each element, and finally the content and datatype of each element and attribute.
- An assertion-based schema makes assertions about the relationships that must hold between the elements and attributes in an XML instance document.
Comparison between grammar-based schemas
Feature | DTD | XML Schema (XSD) | Relax NG |
Adoption | widespread | Data-centric applications like web services | R&D mostly |
Complexity of structure | Medium | Powerful (e.g. sets, element occurrence constraints) | Powerful |
Data types | Little (10, mostly attribute values) | Powerful (44 + your own derived data types) | Powerful (same as XSD) |
Overall complexity | low | high | medium |
XML-based formalism | no | yes | yes (also a short notation) |
Association with XML document | DOCTYPE declaration | Namespace declaration | No standard solution |
Browser support | IE (not Firefox) | no | no |
File suffix | *.dtd | *.xsd | *.rng / *.rnc |
Entities | yes | no (use xinclude instead) | no |
- XML Schemas were created to define more precise grammars than is possible with DTDs; in particular, one can define data types and more sophisticated element structures
- DTD supports 10 datatypes; XML Schema supports 44+ datatypes
- Relax NG was a reaction by people who didn't like this new format. It is about as powerful as XSD but not as complicated
Resources
- XML Schema (also called XSD or simply Schema) is difficult
- A good way to learn XSD is to translate your own DTDs with a tool and then study the code
- See also chapter 3, From DTDs to XSDs
W3C websites:
url: http://www.w3.org/XML/Schema (W3C Overview Page)
url: http://www.w3.org/TR/xmlschema-0/ The W3C XML Schema primer
Specifications:
url: http://www.w3.org/TR/xmlschema-1/ XML Schema Part 1: Structures Second Edition 2004
url: http://www.w3.org/TR/xmlschema-2/ XML Schema Part 2: Datatypes Second Edition 2004
Tools:
Exchanger XML Editor can handle XML Schema
- Support for XSD editing
- Validation of XSD file
- Validation of XML against XSD
- DTD/XSD/Relax NG translation
XSD bare bones
The structure and namespace of an XSD file
- Like any XML file, an XSD file should start with an XML declaration
- The root of an XSD is <schema> ... </schema>
- Attributes of schema are used to declare certain things (see later)
- XSD makes use of namespaces since we have to make a distinction between code that belongs to XSD and code that refers to the defined elements and attributes (same principle as in XSLT).
- Complex XSD files refer to more than one "Schema" namespace (see later)
Namespaces and namespace prefixes
Since XSD is XML, one must be able to distinguish XSD elements from the elements of the language you are defining.
- You either can define a prefix for the XSD elements or one for your own XML elements. See solution 1 and 2 below
- You then can decide whether your XML elements are namespaced or not
Solution 1: Give a namespace prefix to the XSD code
- We define the xs: prefix for the XSD namespace
- Doesn't matter what prefix we use (usually xs: but often xsd:)
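- There is no targetNamespace attribute, so the elements of your target XML files belong to no namespace (elementFormDefault="qualified" only takes effect once a target namespace is declared)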
Example: XSD definition for a simple recipe (ignore the details, and just look at the namespace declaration and prefix)
<?xml version="1.0" encoding="UTF-8"?>
<!-- Simple recipe Schema -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
elementFormDefault="qualified">
<xs:element name="list">
<xs:complexType>
<xs:sequence>
<xs:element maxOccurs="unbounded" ref="recipe"/>
</xs:sequence>
</xs:complexType>
</xs:element>
.....
</xs:schema>
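That the prefix itself does not matter can be checked with any namespace-aware parser. The following is a minimal sketch in Python (standard library only); the two one-line schema skeletons and the helper name root_tag are made up for illustration. Both prefixes resolve to the same qualified element name.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Two illustrative schema skeletons that differ only in the prefix chosen.
WITH_XS = '<xs:schema xmlns:xs="%s"/>' % XSD_NS
WITH_BANANA = '<banana:schema xmlns:banana="%s"/>' % XSD_NS

def root_tag(xml_text):
    """Return the root element name in ElementTree's {namespace}local form."""
    return ET.fromstring(xml_text).tag

print(root_tag(WITH_XS))
print(root_tag(WITH_BANANA))
```

What identifies an element as belonging to XSD is the namespace URL, not the prefix bound to it.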
Solution 2: Give a namespace to target code and prefix it
The following solution is less often used.
- We use a prefixed namespace for our XML elements
- The XML Schema namespace becomes default namespace, i.e. XSD elements will not be prefixed as shown in the next example.
Example: XSD definition for a simple recipe
<schema
xmlns='http://www.w3.org/2001/XMLSchema'
targetNamespace='http://yourdomain.org/namespace/'
xmlns:t='http://yourdomain.org/namespace/'>
<element name='list'>
<complexType>
<sequence>
<element ref='t:recipe' maxOccurs='unbounded'/>
</sequence>
</complexType>
</element>
.....
</schema>
Association of an XSD with an XML file - validation
An XML document described by an XSD is called an instance document.
As with DTDs, you do not need to create an association in the XML file in order to validate it: one can "manually" validate an XML file against an XSD, and most XML editors will allow you to do so.
- In XML Exchanger, simply click the validate icon, then select the XSD file when asked....
Association of XSD with XML, Solution 1
- You must declare the XMLSchema-instance namespace. It is a little extra XML language that allows you to link XSDs to XML files.
- The xsi:noNamespaceSchemaLocation attribute defines the URL of your XSD
- Warning: Make sure you get its spelling and case right!
Example:
- XML: recipe-no-ns.xml
<?xml version="1.0" encoding="ISO-8859-1" ?>
<list
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="recipe-no-ns.xsd">
<recipe> ....
</list>
XSD file: recipe-no-ns.xsd
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
elementFormDefault="qualified">
<xs:element name="list">
Association of XSD with XML, Solution 2
This solution is more popular for various reasons (e.g. most XML languages require a namespace declaration anyhow).
1. Both the XML and the XSD file must contain a namespace declaration for your domain
2. The XML file must contain in addition:
- a namespace declaration for XMLSchema-instance
- an xsi:schemaLocation attribute that tells where to find the XSDs. This attribute can hold as many "namespace URL" pairs as you like, separated by whitespace
Example: XML for a simple recipe with an associated XSD
<?xml version="1.0" encoding="ISO-8859-1" ?>
<list
xmlns="http://myrecipes.org/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://myrecipes.org/ recipe.xsd"
>
<recipe>
<meta> .....</meta>
......
</recipe>
</list>
If you wish to reuse this code fragment for your own XML, you must make two changes in the code above:
- Define a namespace for your own tags, e.g.
- xmlns="http://your_domain/something/"
- Tell where to find the XSD file (the first part must be exactly the same namespace), e.g.
- xsi:schemaLocation="http://your_domain/something/ some-schema.xsd"
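The pairing of namespace and schema URL can be made visible with a small script. This is only a sketch in Python (standard library), assuming an instance document modelled on the recipe example above; the helper name schema_locations is hypothetical. It reads xsi:schemaLocation, splits it into pairs, and checks that the namespace of the root element has an entry.

```python
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

# Illustrative instance document, modelled on the recipe example above.
INSTANCE = """<list
   xmlns="http://myrecipes.org/"
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xsi:schemaLocation="http://myrecipes.org/ recipe.xsd"/>"""

def schema_locations(xml_text):
    """Map each namespace named in xsi:schemaLocation to its schema URL."""
    root = ET.fromstring(xml_text)
    tokens = root.get("{%s}schemaLocation" % XSI_NS, "").split()
    # The attribute value is a whitespace-separated list: ns1 url1 ns2 url2 ...
    return dict(zip(tokens[0::2], tokens[1::2]))

locations = schema_locations(INSTANCE)
# ElementTree writes the root tag as {namespace}local; keep the namespace part.
root_namespace = ET.fromstring(INSTANCE).tag[1:].split("}")[0]
print(locations)
print(root_namespace in locations)
```

If the namespace in xmlns and the first half of the pair differ by even one character, a validator will not find a schema for your elements.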
Example XSD file:
<?xml version="1.0" encoding="UTF-8"?>
<!-- Simple recipe Schema -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://myrecipes.org/"
xmlns="http://myrecipes.org/"
elementFormDefault="qualified">
....
</xs:schema>
- This XSD defines a default namespace (no prefixes) for your tags
- Again, in your XML, you should replace http://myrecipes.org/ with a URL of your own, preferably a URL over which you have control, e.g. a blog or a home page.
Example: IMS Content Packaging 1.1.4 and IMS/LOM Metadata
This XML file uses two XML vocabularies: imscp and imsmd
<manifest
xmlns="http://www.imsglobal.org/xsd/imscp_v1p1"
xmlns:imsmd="http://www.imsglobal.org/xsd/imsmd_v1p2"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
identifier="MANIFEST-1"
xsi:schemaLocation=
"http://www.imsglobal.org/xsd/imscp_v1p1 imscp_v1p1.xsd
http://www.imsglobal.org/xsd/imsmd_v1p2 imsmd_v1p2p2.xsd">
<metadata>
<imsmd:lom> ...... </imsmd:lom>
</metadata>
<organizations default="learning_sequence_1">
.....
- imscp_v1p1 is the default namespace (no prefix)
- imsmd_v1p2 is the namespace for metadata.
Extract of imscp_v1p1.xsd
<xsd:schema
xmlns = "http://www.imsglobal.org/xsd/imscp_v1p1"
targetNamespace = "http://www.imsglobal.org/xsd/imscp_v1p1"
xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd = "http://www.w3.org/2001/XMLSchema"
version = "IMS CP 1.1.4" elementFormDefault = "qualified">
Element definitions
Recall that XML structure has to do with nesting elements, so we first need to learn how to define elements.
Elements are defined with xs:element:
<xs:element>
Example of a simple element without children and attributes:
<xs:element name="author" type="xs:string"/>
Its DTD equivalent would be:
<!ELEMENT author (#PCDATA)>
Element children can be defined in two ways: (1) with a complexType child element or (2) with a type attribute. Let's examine both ways:
The Russian puppet model: <xs:complexType> (1)
complexType can be a child element of xs:element and we will define children and/or attributes inside this element.
<xs:element name="recipe">
<xs:complexType>
<xs:sequence>
<xs:element ref="meta"/>
<xs:element minOccurs="0" ref="recipe_author"/>
<xs:element ref="recipe_name"/>
<xs:element ref="ingredients"/>
<xs:element ref="directions"/>
</xs:sequence>
</xs:complexType>
</xs:element>
The Salami model: <xs:complexType> (2)
Alternatively, one can declare a complex type by itself and then "use it" in an element declaration.
Example XSD: recipe2.xsd
- Referring to a complex type:
<xs:element name="recipe" type="recipe_contents" />
- Defining the complex type:
<xs:complexType name="recipe_contents">
<xs:sequence>
<xs:element ref="meta"/>
<xs:element minOccurs="0" ref="recipe_author"/>
<xs:element ref="recipe_name"/>
<xs:element ref="meal"/>
<xs:element ref="ingredients"/>
<xs:element ref="directions"/>
</xs:sequence>
</xs:complexType>
Data types
Simple data types let you define what kind of data elements and attributes can contain.
Examples:
Simple type | Examples | Comments |
string | Confirm this is electric | A text string |
base64Binary | GpM7 | Base64 encoded binary data |
hexBinary | 0FB7 | HEX encoded binary data |
integer | ...-1, 0, 1, ... | |
positiveInteger | 1, 2, ... | |
negativeInteger | ... -2, -1 | |
nonNegativeInteger | 0, 1, 2, ... | |
long | -9223372036854775808, ... -1, 0, 1, ... 9223372036854775807 | |
decimal | -1.23, 0, 123.4, 1000.00 | |
float | -INF, -1E4, -0, 0, 12.78E-2, 12, INF, NaN | |
boolean | true, false, 1, 0 | |
duration | P1Y2M3DT10H30M12.3S | 1 year, 2 months, 3 days, 10 hours, 30 minutes, and 12.3 seconds |
dateTime | 1999-05-31T13:20:00.000-05:00 | May 31st 1999 at 1.20pm Eastern Standard Time |
date | 1999-05-31 | |
time | 13:20:00.000, 13:20:00.000-05:00 | |
gYear | 1999 | |
Name | shipTo | XML 1.0 Name type |
QName | po:USAddress | XML Namespace QName |
anyURI | http://www.example.com/ | |
language | en-GB, en-US, fr | valid values for xml:lang as defined in XML 1.0 |
In addition one can define list types, union types and complex types
Simple user-defined types
Example: A list of numbers
XSD:
<xsd:element name="listOfMyInt" type="listOfMyIntType"/>
<xsd:simpleType name="listOfMyIntType">
<xsd:list itemType="xsd:integer"/>
</xsd:simpleType>
XML:
<listOfMyInt>20003 15037 95977 95945</listOfMyInt>
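To make clear what a validator does with such a list type, here is a rough Python sketch (not a real XSD validator): the content is split on whitespace and every item has to be a valid xsd:integer. The function name parse_integer_list is invented for this illustration.

```python
import re

# Lexical form of xsd:integer: optional sign followed by decimal digits.
INTEGER = re.compile(r"[+-]?[0-9]+")

def parse_integer_list(text):
    """Split a list-typed value on whitespace and check every item."""
    items = text.split()
    bad = [item for item in items if not INTEGER.fullmatch(item)]
    if bad:
        raise ValueError("not xsd:integer values: %r" % bad)
    return [int(item) for item in items]

print(parse_integer_list("20003 15037 95977 95945"))
```

A value such as "20003 1.5" would be rejected, because 1.5 is not an integer.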
Example: Restricted lists of words to choose from (in two variants)
XSD:
<!-- A modular solution -->
<xsd:element name="theory" type="list_theories"/>
<xsd:simpleType name="list_theories">
<xsd:restriction base="xsd:string">
<xsd:enumeration value="constructivism"/>
<xsd:enumeration value="behaviorism"/>
<xsd:enumeration value="cognitivism"/>
</xsd:restriction>
</xsd:simpleType>
<!-- A Russian puppet solution -->
<xsd:element name="Country">
<xsd:simpleType>
<xsd:restriction base="xsd:string">
<xsd:enumeration value="FR" />
<xsd:enumeration value="DE" />
<xsd:enumeration value="ES" />
<xsd:enumeration value="UK" />
<xsd:enumeration value="CH" />
</xsd:restriction>
</xsd:simpleType>
</xsd:element>
XML:
<theory>constructivism</theory>
<Country>CH</Country>
Example: Restrictions of a single number
XSD (using Russian puppet style):
<xs:element name="age">
<xs:simpleType>
<xs:restriction base="xs:integer">
<xs:minInclusive value="0"/>
<xs:maxInclusive value="120"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
XML:
<age>100</age>
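The effect of the two facets can be imitated in a few lines of Python. This is only an approximation for illustration, not how a validator is implemented; the function name valid_age and its parameter names are assumptions.

```python
import re

def valid_age(text, min_inclusive=0, max_inclusive=120):
    """Mimic the restriction above: an integer between the two bounds."""
    value = text.strip()  # surrounding whitespace is ignored for integers
    if not re.fullmatch(r"[+-]?[0-9]+", value):
        return False  # not an integer at all
    return min_inclusive <= int(value) <= max_inclusive

print(valid_age("100"))
print(valid_age("121"))
```

Both bounds are inclusive, so 0 and 120 are accepted while -1 and 121 are not.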
Organization of elements
- XSD allows for quite sophisticated occurrence constraints, i.e. how child elements can be used within an element. Here we only cover a few basic design patterns
Salami vs. Russian puppet style
As already mentioned, it is usually best to define all elements in a flat list and then refer to these when you define how child elements are to be inserted.
Defining elements within elements (not so good)
<xs:element name="meta">
<xs:complexType>
<xs:sequence>
<xs:element name="author" type="xs:string"/>
<xs:element name="version" type="xs:string"/>
<xs:element name="date" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
Defining child elements with a reference (generally a better solution)
<xs:element name="meta">
<xs:complexType>
<xs:sequence>
<xs:element ref="author"/>
.....
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="author" type="xs:string"/>
....
Sequences
- The number of times a child element can occur is defined with the minOccurs and maxOccurs attributes. Both default to 1.
Example: A list of ordered child elements
<xs:element name="meta">
<xs:complexType>
<xs:sequence>
<xs:element ref="author"/>
<xs:element ref="date"/>
<xs:element ref="version"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="version" type="xs:string"/>
<xs:element name="date" type="xs:string"/>
<xs:element name="author" type="xs:string"/>
Example: A list with one or more recipe child elements
<xs:element name="list">
<xs:complexType>
<xs:sequence>
<xs:element maxOccurs="unbounded" ref="recipe"/>
</xs:sequence>
</xs:complexType>
</xs:element>
Example: A list with an optional email element - repeatable
<xs:element name="person">
<xs:complexType>
<xs:sequence>
<xs:element ref="name"/>
<xs:element minOccurs="0" maxOccurs="unbounded" ref="email"/>
<xs:element ref="link"/>
</xs:sequence>
<xs:attributeGroup ref="attlist.person"/>
</xs:complexType>
</xs:element>
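The occurrence rules used in these examples can be summarised in a short Python sketch. The function occurs_ok is hypothetical and only models the counting rule; it assumes that a missing minOccurs or maxOccurs counts as 1.

```python
def occurs_ok(count, min_occurs=1, max_occurs=1):
    """Check an element count against minOccurs/maxOccurs.

    Both attributes default to 1; max_occurs may be the string "unbounded".
    """
    if count < min_occurs:
        return False
    return max_occurs == "unbounded" or count <= max_occurs

# The person example: name (exactly one), email (zero or more), link (exactly one)
print(occurs_ok(1))                                        # one name
print(occurs_ok(0, min_occurs=0, max_occurs="unbounded"))  # no email
print(occurs_ok(2))                                        # two links
```

Under these assumptions a person without any email is fine, but a person with two link elements is not.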
Choice
Example: Optional repeatable child elements
XSD:
<xs:element name="INFOS">
<xs:complexType>
<xs:choice minOccurs="0" maxOccurs="unbounded">
<xs:element ref="date"/>
<xs:element ref="author"/>
<xs:element ref="a"/>
</xs:choice>
</xs:complexType>
</xs:element>
Example: Either - or child elements
XSD:
<xs:element name="ATTEMPT">
<xs:complexType>
<xs:choice>
<xs:element ref="action"/>
<xs:element ref="EPISODE"/>
</xs:choice>
</xs:complexType>
</xs:element>
Mixed contents (tags and text)
XSD:
<xs:element name="para">
<xs:complexType mixed="true">
<xs:sequence>
<xs:element minOccurs="0" maxOccurs="unbounded" ref="strong"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="strong" type="xs:string"/>
XML:
<para> XML is <strong>so</strong> cool ! </para>
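To see what "mixed" means for a program that reads such a file, the following Python sketch (standard library ElementTree, used here purely as an illustration) parses the para example and prints the text pieces that surround the child element.

```python
import xml.etree.ElementTree as ET

para = ET.fromstring("<para> XML is <strong>so</strong> cool ! </para>")
strong = para.find("strong")

print(repr(para.text))    # text before the first child element
print(repr(strong.text))  # text inside <strong>
print(repr(strong.tail))  # text after </strong>, still inside <para>
```

Without mixed="true" in the schema, the text before and after the strong element would not be allowed.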
Empty elements
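- Simply define an element and do not define any child elements. Note that an element of type xs:string, as in the example below, may still contain text; a strictly empty element is declared with a complexType that has no content.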
<xs:element name="author" type="xs:string"/>
Of course this also applies to complex elements:
Attributes
To declare attributes, you must define complexTypes, since simple elements cannot have attributes.
We will not cover all possibilities here, but just demonstrate with examples
A typical attribute definition looks like this:
<xs:element name="Name">
<xs:complexType>
<xs:attribute name="lang" type="xs:string" use="required"/>
</xs:complexType>
</xs:element>
The use attribute can be either optional, prohibited or required. The default is "optional".
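The above code declares an empty element with an attribute (it is a shorthand for a complexContent restriction of xs:anyType). To allow text content in addition to the attribute, extend a simple type instead:
<xs:element name="Name">
<xs:complexType>
<xs:simpleContent>
<xs:extension base="xs:string">
<xs:attribute name="lang" type="xs:string" use="required"/>
</xs:extension>
</xs:simpleContent>
</xs:complexType>
</xs:element>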
XML example
<Name lang="English"/>
Attribute groups
- More complex attributes are better declared with attribute groups
- Attribute groups are reusable, i.e. the equivalent of DTD's parameter entities.
Example: Attribute groups
url: family.xsd
<xs:element name="person">
<xs:complexType>
<xs:attributeGroup ref="attlist.person"/>
</xs:complexType>
</xs:element>
The element definition above refers to a named attribute group (defined below)
<xs:attributeGroup name="attlist.person">
<xs:attribute name="name" use="required"/>
<xs:attribute name="gender">
<xs:simpleType>
<xs:restriction base="xs:token">
<xs:enumeration value="male"/>
<xs:enumeration value="female"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
<xs:attribute name="type" default="mother">
<xs:simpleType>
<xs:restriction base="xs:token">
<xs:enumeration value="mother"/>
<xs:enumeration value="father"/>
<xs:enumeration value="boy"/>
<xs:enumeration value="girl"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
<xs:attribute name="id" use="required" type="xs:ID"/>
</xs:attributeGroup>
Valid XML fragment:
url: family.xml
<family>
<person name="Joe Miller" gender="male" type="father" id="I123456789"/>
<person name="Josette Miller" type="girl" id="I123456987"/>
</family>
Value constraints
- One can put constraints on the values that a user can enter in several ways
Example: Restrict values for an age element
<xs:element name="age">
<xs:simpleType>
<xs:restriction base="xs:integer">
<xs:minInclusive value="0"/>
<xs:maxInclusive value="120"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
A quite powerful method is to use regular expressions with the pattern facet. An XSD pattern must match the whole value, so no anchors are needed, and word boundaries such as \b are not supported.
<xs:element name="Email">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:pattern value="[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
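A pattern of this kind can be tried out quickly outside of XSD. The sketch below uses Python's re module as a stand-in; it assumes an email pattern that accepts upper and lower case letters, and it uses fullmatch because an XSD pattern always has to match the entire value. The name looks_like_email is made up, and Python's regular expression dialect is not identical to the XSD one.

```python
import re

# An XSD pattern must match the whole value, hence fullmatch rather than search.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}")

def looks_like_email(text):
    """Return True if the whole string matches the simple email pattern."""
    return EMAIL.fullmatch(text) is not None

print(looks_like_email("daniel@example.org"))
print(looks_like_email("write to daniel@example.org"))
```

The pattern is deliberately simple: it also rejects valid addresses whose last part is longer than four letters.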
Some Design patterns
(needs to be expanded over the years .... - Daniel K. Schneider 18:02, 9 December 2010 (CET))
In the meantime see also:
- Introducing Design Patterns in XML Schemas
- Global versus Local (recommended reading!)
Mixed contents with typed elements inside
<?xml version="1.0" encoding="UTF-8"?>
<!-- A mixed type -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns="http://mymix.org/"
targetNamespace="http://mymix.org/" elementFormDefault="qualified">
<xs:element name="list">
<xs:complexType>
<xs:sequence>
<xs:element ref="TextAndNumbers" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="TextAndNumbers" type="TextNumberMix"/>
<xs:complexType name="TextNumberMix">
<xs:complexContent mixed="true">
<xs:restriction base="xs:anyType">
<xs:sequence>
<xs:element name="number1" type="xs:integer"/>
<xs:element name="number2" type="xs:integer"/>
<xs:element name="number3" type="xs:integer"/>
</xs:sequence>
</xs:restriction>
</xs:complexContent>
</xs:complexType>
</xs:schema>
XML file
<?xml version="1.0"?>
<list xmlns="http://mymix.org/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://mymix.org/ mixed-text-with-numbers.xsd">
<TextAndNumbers>
I am <number1>44</number1> years old and I like <number2>4</number2> times the number <number3>11</number3>
</TextAndNumbers>
<TextAndNumbers>
He is <number1>10</number1> meters tall.
And he weighs <number2>1000</number2> kilos.
You can earn <number3>10</number3> cents if you figure out who he is.
</TextAndNumbers>
</list>
From DTDs to XSDs
Below we shall present a few typical translation patterns
Most decent XML editors have a built-in translator that will do most of the work. However, generated XSD code is not necessarily the prettiest ...
- e.g. in Exchanger XML Editor: Use Menu Schema -> Convert Schema
Encoding elements
In the examples below we use a namespace prefix for the XML and none for the Schema. Therefore an *.xsd file would look like this:
<schema xmlns="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2001/XMLSchema file:/usr/local/xngr/types/XML%20Schema/Validation/XMLSchema.xsd"
xmlns:t="http://testing.org/"
targetNamespace="http://testing.org/" >
<element name="ROOT">
<complexType>
<sequence>
<element ref="t:A"/>
<element ref="t:B"/>
</sequence>
</complexType>
</element>
<element name="A" type="string"/>
<element name="B" type="string"/>
</schema>
Below we show some DTD to XSD examples taken from http://www.w3.org/2000/04/schema_hack/
<!ELEMENT ROOT (A,B) >
|
<element name="ROOT">
<complexType>
<sequence>
<element ref="t:A"/>
<element ref="t:B"/>
</sequence>
</complexType>
</element>
|
<!ELEMENT ROOT (A|B) >
|
<element name="ROOT">
<complexType>
<choice>
<element ref="t:A"/>
<element ref="t:B"/>
</choice>
</complexType>
</element>
|
<!ELEMENT ROOT (A|(B,C)) >
|
<element name="ROOT">
<complexType>
<choice>
<element ref="t:A"/>
<sequence>
<element ref="t:B"/>
<element ref="t:C"/>
</sequence>
</choice>
</complexType>
</element>
|
<!ELEMENT ROOT (A?,B+,C*) >
|
<element name="ROOT">
<complexType>
<sequence>
<element ref="t:A" minOccurs="0"/>
<element ref="t:B" maxOccurs="unbounded"/>
<element ref="t:C" minOccurs="0" maxOccurs="unbounded"/>
</sequence>
</complexType>
</element>
|
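The last translation pattern follows a fixed rule that can be written down as a small lookup table. The Python sketch below is only an illustration of that rule; the function name dtd_suffix_to_occurs is invented, and a real converter does much more than this.

```python
def dtd_suffix_to_occurs(suffix):
    """Translate a DTD occurrence indicator into minOccurs/maxOccurs values."""
    table = {
        "":  {"minOccurs": "1", "maxOccurs": "1"},          # exactly once
        "?": {"minOccurs": "0", "maxOccurs": "1"},          # optional
        "+": {"minOccurs": "1", "maxOccurs": "unbounded"},  # one or more
        "*": {"minOccurs": "0", "maxOccurs": "unbounded"},  # zero or more
    }
    return table[suffix]

# (A?,B+,C*) from the example above
for name, suffix in [("A", "?"), ("B", "+"), ("C", "*")]:
    print(name, dtd_suffix_to_occurs(suffix))
```

Since both attributes default to 1, a converter may leave out the values that equal 1, as in the XSD above.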
Attribute definitions
<!ATTLIST ROOT a CDATA #REQUIRED>
|
<element name="ROOT">
<complexType>
<attribute name="a" type="string" use="required"/>
</complexType>
</element>
|
<!ATTLIST ROOT a CDATA #IMPLIED>
|
<element name="ROOT">
<complexType>
<attribute name="a" type="string" use="optional"/>
</complexType>
</element>
|
<!ATTLIST ROOT a (x|y|z) #REQUIRED>
|
<element name="ROOT">
<complexType>
<attribute name="a" use="required">
<simpleType>
<restriction base="string">
<enumeration value="x"/>
<enumeration value="y"/>
<enumeration value="z"/>
</restriction>
</simpleType>
</attribute>
</complexType>
</element>
|
<!ATTLIST ROOT a CDATA #FIXED "x">
|
<element name="ROOT">
<complexType>
<attribute name="a" type="string" fixed="x"/>
</complexType>
</element>
|
Links
- XML Schema Part 0: Primer Second Edition (W3C)
- XML Schemas: Best Practices (last updated 2006, when retrieved 18:58, 9 December 2010 (CET))
- Global versus Local (A Collectively Developed Set of Schema Design Guidelines)
- XML Schema Tutorial (W3Schools)