XML: Difference between revisions

 
(69 intermediate revisions by 5 users not shown)
Line 1: Line 1:
{{Stub}}
{{Incomplete}}
{{Comment | This page is just a beginning. Several sub-topics will be factored out to other pages.}}
{{web technology tutorial|overview}}


== Definition ==
== Introduction ==


* '''XML''' means "Extensible Markup Language". It allows defining all sorts of languages that describe information contents (e.g. web pages, vector graphics, programming languages). In technical terms, such languages are called ''XML applications'' or ''XML vocabularies''.
This article provides a short and rather non-technical overview of XML. See also the [[:category:XML|XML category]] for all XML-related topics (many) or follow up links in this overview.


* XML is designed as a machine-readable, self-describing, text-editable persistent store for data. XML is a formalism or a '''meta-language''' (not to be confused with HTML, a language to describe the structure of Web pages).
<div class="tut_goals">


== History ==
'''Learning goals'''


* XML is a subset of SGML (Standard Generalized Markup Language). SGML has been used to define HTML, whereas [[XHTML]] is defined with XML (this is why unclosed empty tags are no longer allowed in XHTML).
* Understand the role of XML in IT
* Be able to identify major roles and XML languages made for the Web
 
'''Prerequisites'''
 
* none
 
'''Next steps'''
 
* [[Tour de XML]] or equivalent (having seen some real world applications would be good for motivation)
* [[XML principles]]
* [[Editing XML tutorial]]
* [[DTD tutorial]]
* [[XML Schema tutorial - Basics]]
 
</div>
 
=== Definition of XML ===
 
* '''XML''' means "Extensible Markup Language". XML is a formalism that allows defining all sorts of languages that describe a wide range of "information contents" (e.g. web pages, vector graphics, programming languages). In technical terms, such languages are called ''XML applications'' or ''XML vocabularies''.
 
* XML is designed as a machine-readable, self-describing, text-editable persistent store for data, but it can also be read (somewhat) by humans. XML is a formalism or a '''meta-language'''. Such a meta-language is not to be confused with HTML, a language to describe the structure of Web pages. [[XHTML]], for example, is one of the thousands of existing XML applications.
 
See also: [[Editing XML tutorial]]
 
=== History ===
 
* XML is a subset of the '''Standard Generalized Markup Language''' (SGML). SGML has been used to define HTML, whereas [[XHTML]] is defined with XML (this is why unclosed empty tags are no longer allowed in XHTML). [[HTML5]], on the other hand, is based neither on SGML nor on XML.


* XML was formally defined in 1998 as [http://www.w3.org/TR/xml/ W3C's XML Recommendation 1.0]
* XML was formally defined in 1998 as [http://www.w3.org/TR/xml/ W3C's XML Recommendation 1.0]


* Since then, hundreds of XML languages have been defined and a few dozen are popular and in production. Ken Sall's famous [http://kensall.com/ Big Picture] only lists some, e.g. it leaves out all the [[IMS]] e-learning standards.
* Since then, hundreds of XML languages have been defined and a few dozen are popular and in production. Ken Sall's famous [http://kensall.com/ Big Picture] only lists some, e.g. none of the many [[IMS]] e-learning standards are mentioned.


== The XML planet ==
== XML and web standards ==


One may look at XML from different angles.
Currently, there are hundreds of more or less popular XML languages. Within the narrower area of web standards there are fewer, and we shall briefly introduce the most important ones that non-programmers like content developers or web designers should know about.


=== XML for better Web contents ===
=== XML for richer Web contents ===


Look at this nice picture drawn by [[User:DSchneider|DSchneider]] (needs translation and updating). It shows how future web documents will be composed:
Initially, XML was expected to redefine the way contents are delivered. It then turned out that XHTML was (almost) never used as XML, e.g. in the form of HTML combined with other XML contents. This "XML" vision of HTML still exists in the minds of some people, but the death of XHTML 2 put a provisional end to it. The current mainstream, represented by [[HTML5]], is a computer application-centered model, i.e. HTML is seen as a delivery platform for interactive contents and not as a document format.


[[Image: xml-document-french.png|frame|none|HTML vs. XML Web contents]]
The picture below shows the idea that web documents could be composed of several components: in the case of HTML, there is HTML + CSS; in the case of HTML5, there is HTML + built-in SVG and MathML + CSS. In the case of XHTML 1 or XHTML 5, a document can include any other XML language, provided that these are identified by so-called namespaces. Although it is no longer popular, we also included SGML in the picture, since it is the "mother" of all tag-based markup languages.


=== XML as the foundation for the future semantic Web ===
[[Image: xml-document.png|thumb|724px|none|HTML vs. XML Web contents ... made many years ago ([[User:DSchneider|DKS]])]]


* Essentially the [[RDF]] framework.
Just to make sure: the death of XHTML by no means implies that XML is not being used on the Internet. It is just dead as a web page format. Other formats like SVG (vector graphics), MathML (mathematical formulas) and RSS (content syndication) are very much in use today and will be so in the future.
 
=== XML as the foundation for the future semantic Web===
 
The semantic web is essentially defined by the [[RDF]] framework. While RDF itself is used in some areas (e.g. [[Metadata]] formalisms), the global [[semantic web]] project seems to have somewhat stalled, except for occasional flare-ups. Web 2.0 was supposed to be semantic, but it turned out quite the opposite, i.e. it is based on simple micro-formats. Then the label became "web 3". Then the anti-semantic HTML 5 initiative became dominant, and the "semantic web" remains a "smaller island" of interest and applications.
* Topic Maps (ISO standard) are used to organize collections of resources in the form of a semantic network (so you do not find just the trees, but there is a "map" of the forest).
* RDF is a language used for describing relationships between objects and it can be used for adding "metadata" describing the content of a resource.

* OWL ("Web Ontology Language", created with RDF) is a formalism that allows the description of the relationships between things. There is a conceptual link with syndication of news and the social web.

* During the last 15 years, the Internet has undergone a profound change regarding the organization of its "information spaces".


=== XML for machine to machine talk ===
=== XML for machine to machine talk ===


* Web Services
There exist several protocols for machine-to-machine interaction like SOAP and XML-RPC. See the [[web service]] article for more details.


=== XML as formalism to define other information structures ===
In addition we can identify:
* Specialized search engines that extract contents from various XML documents;
* Formats like [[RSS]] or [[FOAF]] that are meant to help organizing networked information spaces, like content syndication;
** [http://edutechwiki.unige.ch/fr/RSS RSS] (in its various forms) allows automatic exchange of "titles" and "summaries" between portals and weblogs.
** FOAF (and other formats) are used to define profiles of individuals which are then used to organize online social networks.


=== XML as formalism to define "grammars" ===


In a more general perspective, XML is currently one of the most popular standards to define various kinds of data structures. One could define three kinds:
# XML accessories (e.g. XML Schema)
#* Extend the capabilities specified in XML
#* Intended for wide, general use
# XML transducers (e.g. [[XSLT]])
#* Convert XML input data into output
#* Associated with a processing model
# XML applications in the narrow sense (e.g. XHTML)
#* Define grammars, constraints for a class of XML data
#* Intended for a specific application area as diverse as, for example, word processors, e-learning, banking, multimedia, translation. Well-known examples are Microsoft Office contents (e.g. .docx files) or Adobe Flash *.fla files. These files are in fact zip files composed of a series of XML documents. Unzip one of these and you can see for yourself.


== Some technical XML concepts ==
== Some technical XML concepts ==


An XML document can refer to a physical file, a database entry, a datastream (any appropriate "text" that is delimited).
An XML document can refer to a physical file, a database entry, a datastream. In other words, technically speaking an XML document is any sort of delimited "text" defined as a string and that has XML markup inside.


=== Wellformedness ===
=== Wellformedness ===


An XML document is well formed if and only if
An XML document is well formed if and only if:


; There is an appropriate XML declaration at the beginning
; There is an appropriate XML declaration at the beginning
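* The document starts with an XML declaration that includes a version number (currently 1.0). Strictly speaking, XML 1.0 makes this declaration optional, but including it is recommended.
* The document starts with an XML declaration that includes a version number (currently 1.0). Strictly speaking, XML 1.0 makes this declaration optional, but including it is recommended.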
<?xml version="1.0"?>
<source lang="XML"><?xml version="1.0"?></source>
:This declaration can also contain encoding information. By default, the encoding is UTF-8:
:This declaration can also contain [[encoding]] information. By default, the encoding is UTF-8:
<?xml version="1.0" encoding="ISO-8859-1"?>
<source lang="XML"> <?xml version="1.0" encoding="UTF-8"?></source>


;XML documents are hierarchical
;XML documents are hierarchical, i.e. each element must be inside another one (except the first one, the so-called root element).
* begin-tags and end-tags that match
* begin-tags and end-tags that match
* No tags crossing like
* No tags crossing like
<nowiki> <i>...<b>...</i> .... </b> </nowiki>
<source lang="XML"><i>...<b>...</i> .... </b></source>
* There must be a single root
* There must be a single root
** It can only appear once and can not be used within other elements
** It can only appear once and can not be used within other elements
Line 62: Line 114:
* XML is case sensitive, "LI" is '''not''' "li" for example
* XML is case sensitive, "LI" is '''not''' "li" for example
* "Empty" tags must be self closing, e.g.
* "Empty" tags must be self closing, e.g.
</br>
<source lang="XML"><br /></source>
* Attribute values are quoted
* Attribute values are quoted
<a href= " http://tecfa.unige.ch:8080/xml.html " >)
<source lang="XML"> <a href="http://tecfa.unige.ch:8080/xml.html"></source>
* Special characters: <, &, >, ", '
* Special characters: <, &, >, " and '. Use the following entities instead (this also applies inside URLs!):
** Use &lt; &amp; &gt; &quot; &apos; instead of <, &, >, ", '
<source lang="XML">&lt; &amp; &gt; &quot; &apos;</source>
** Including URLs!


=== Valid ===
=== Valid ===
Line 74: Line 125:


The most popular ones are in this order:
The most popular ones are in this order:
* DTD
* [[DTD]]s (Document Type Definitions)
* [[XML Schema]]
* [[XML Schema]]
* [[Relax NG]]
* [[Relax NG]]


XML applications in addition to DTDs may include other constraints. Some XML applications may include languages that are not XML-based (e.g. CSS or XPath).
XML applications in addition to DTDs may include other constraints. Some XML applications may include languages that are not XML-based (e.g. CSS or XPath).
The most popular grammars are DTDs. Below we just include a picture of a small grammar (read the details in the [[DTD tutorial]]).
[[Image:xml-simple-dtd.png|thumb|750px|none|Simple DTD]]


=== Text-centric vs. data-centric XML ===
=== Text-centric vs. data-centric XML ===
Line 84: Line 139:
Data-centric XML as opposed to the text-centric XML refers to XML whose primary audience is not a human reader, but a computer program which will process the information, respond to it, store data items in a database, and so on.
Data-centric XML as opposed to the text-centric XML refers to XML whose primary audience is not a human reader, but a computer program which will process the information, respond to it, store data items in a database, and so on.


== XML Applications ==
== A list of XML applications ==
 
{{comment | Again, this is just a stub, a lot is missing .... }}


=== W3C applications ===
See also [[Tour de XML]], a selection of links demonstrating various uses of XML.


This is just a popular subset ....
=== Accessories ===


* XSL/FO (application XML): XML Style language
Extend the power of XML
** http://www.w3.org/TR/xsl/
 
* XSLT (application XML): XML Transformation language
** http://www.w3.org/TR/xslt


* [[XLink]]: Hypertext links
* [[XLink]]: Hypertext links
Line 104: Line 153:
** http://www.w3.org/TR/xptr/
** http://www.w3.org/TR/xptr/


* XPath (XML fragments identification)
* [[XPath]] (XML fragments identification)
** http://www.w3.org/TR/xpath
** http://www.w3.org/TR/xpath
** (used by XSLT, XInclude, XLink, XQuery, XPointer etc.)
** (used by XSLT, XInclude, XLink, XQuery, XPointer etc.)
** See the [[XPath tutorial - basics]]
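
As a rough illustration of how XPath identifies fragments of an XML document, here is a minimal Python sketch. It relies on the standard library module <code>xml.etree.ElementTree</code>, which supports only a limited subset of XPath. The sample document, its element names and the helper <code>titles_in_language</code> are invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
DOC = """<library>
  <book lang="en"><title>Title A</title></book>
  <book lang="fr"><title>Titre B</title></book>
  <book lang="en"><title>Title C</title></book>
</library>"""


def titles_in_language(xml_text, lang):
    """Return the titles of all <book> elements whose lang attribute matches."""
    root = ET.fromstring(xml_text)
    # The path selects every <title> child of a <book> with the given lang.
    path = f".//book[@lang='{lang}']/title"
    return [title.text for title in root.findall(path)]


print(titles_in_language(DOC, "en"))
```

A full XPath 1.0 engine (as used by XSLT or XQuery processors) offers many more axes and functions than this subset.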
=== Transducers ===
Various style sheet and query languages
* XSL/FO (application XML): XML Style language
** http://www.w3.org/TR/xsl/
* [[XSLT]](application XML): XML Transformation language
** http://www.w3.org/TR/xslt
** See the [[XSLT Tutorial - Basics]]
* [[XQuery]]: XML Query Language
** http://www.w3.org/TR/xquery/
** See the [[XQuery tutorial - basics]]
=== Metadata and semantic web ===


* RDF applications (a whole lot of languages for the semantic web)
* RDF applications (a whole lot of languages for the semantic web)
Line 113: Line 180:
* PICS 2.0: Platform for Internet Content Selection
* PICS 2.0: Platform for Internet Content Selection
** http://www.w3.org/PICS/
** http://www.w3.org/PICS/
* SMIL: Synchronized Multimedia Integration Language
** http://www.w3.org/AudioVideo/


* P3P: Platform for Privacy Preferences
* P3P: Platform for Privacy Preferences
** http://www.w3.org/P3P/
** http://www.w3.org/P3P/


* SVG: Scalable Vector Graphics
=== Graphics and multimedia ===
 
* [[SMIL]]: Synchronized Multimedia Integration Language
** http://www.w3.org/AudioVideo/
 
* [[SVG]]: Scalable Vector Graphics
** http://www.w3.org/TR/SVG/
** http://www.w3.org/TR/SVG/


* MathML: Mathematical Markup Language
* [[MathML]]: Mathematical Markup Language
** http://www.w3.org/Math/
** http://www.w3.org/Math/


* XHTML (several variants for new generation HTML)
=== Contents ===
 
* [[XHTML|XHTML:]] this now less popular variant of HTML is an XML application
** http://www.w3.org/TR/html/
** http://www.w3.org/TR/html/


=== Non W3C Document standards ===
* [http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=docbook Docbook]: the most popular standard for writing large documents.
 
* [[DITA]]. A more flexible module-based approach to documents, originally made by IBM.
 
* [[ePub]], a popular open [[e-book]] standard.
 
These document standards (as well as others) can intervene at all stages of the document production/delivery pipeline. XML in the documentation world appears as:
=== Application development ===
 
* [[MXML]] is an XML-based user interface markup language used in [[Adobe Flex]], a software development kit to create Flash-based Internet and desk-top applications.
* [[XML User Interface Language]] (XUL) is the XML user interface markup language developed by the Mozilla project. It operates in cross-platform applications such as Firefox.
 
==XML and documentation languages ==
 
Any XML document can be put directly on the web together with a CSS stylesheet or an XSLT transformation. Specialized formats like SVG (vector graphics), X3D (3D vector graphics) and MathML (formulas) can be added in XML-compatible browsers. Larger documents are often produced with specialized vocabularies such as DITA or DocBook. Contents can be written either with an [[XML editor]] or an XML-aware word processor. Such documents can then be either directly "saved as" or sent through various custom output filters.
[[File:_xml-document-pipeline.png|none|thumb|700x700px| XML document production pipelines ]]
 
=== Presenting XML documents with style sheets===
 
Today one can directly display information encoded in XML (of any grammar) in a browser, by using a style-sheet.
'''Style sheets allow one to:'''
* Prepare / arrange content for a "presentation".
* Define the “layout” (format, formatting) of a text written in XML.
* Alter contents, e.g. by adding a table of contents or page headers
 
'''The utility of style-sheets is therefore'''
* Separation of content and presentation
* Simplification of work (one style-sheet for many documents, single source publishing, etc.)


* [http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=docbook Docbook]. Most popular standard for writing large documents.
==== XSL - The Extensible Stylesheet Language====


* [http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=dita DITA]. A more flexible module-based approach to documents, originally made by IBM.
XSL refers to two languages recommended by W3C
* http://www.w3.org/TR/xsl


=== In education ===
'''XSL refers either to XSLT or to XSL/FO, which provide two principal functions:'''

(1) XSLT is a transformation language for XML elements.

For example, XSLT allows the creation of a table of contents or the translation of XML to HTML.

* http://www.w3.org/TR/xslt

(2) XSL/FO is a formatting language that allows the creation of high-quality print documents.
 
* http://www.w3.org/TR/xsl/
 
'''Formatting with XSL-FO'''
* sophisticated formatting, also by inheritance, position etc..
* generation of text and graphics
* possibility to define macros
* everything that can be found in [http://edutechwiki.unige.ch/fr/CSS CSS], and more…
 
====[http://edutechwiki.unige.ch/fr/CSS CSS](Cascading Style Sheets)====
 
CSS can also be used to style XML contents. However, since its transformation capabilities are rather poor, the XML should already include all the data to be published.
 
* XML support is included since CSS2.
 
 
===XLink - Towards a better hypertext?===
 
XLink allows inserting links into an XML document, where a link expresses a relationship between two or more objects. Although XLink is a W3C Recommendation, there is no complete implementation for the moment. However, subsets of XLink are used in various other XML languages.
 
* a subset is used in SVG, X3D etc..
* http://www.w3.org/TR/xlink
 
'''XLink is based on other standards (that are also shared with XSLT)'''
* XPointer = how to identify an XML fragment (used by links). XPointer is based on XPath
* XPath = how to identify a path towards a resource
:* http://www.w3.org/TR/xptr
:* http://www.w3.org/TR/xpath
 
'''Principal characteristics of XLink'''
* Multi-directional links, links towards multiple destinations
* Occlusions, inclusions, content replacements in a document
 
'''Where does this standard come from?'''
* HTML
:* Anchors: href (attribute A), src (attribute of IMG and NOTE) ...
:* Targets: name attributes (A), id (attribute in HTML 4.x)
* mainly: SGML HyTime and TEI Extended Pointers (extension to HyTime) languages.
 
== XML in education ==
 
One has to make a distinction between languages specifically developed for the education sector (see below) and all the rest of XML technology, most of which can be useful to education.


* See the various [[Educational modeling language]]s, in particular the ones produced by [[IMS]]
* See the various [[Educational modeling language]]s, in particular the ones produced by [[IMS]]
Line 143: Line 292:
** [[Learning Object Metadata Standard]]
** [[Learning Object Metadata Standard]]


* In addition, there are languages to sell education, to exchange student data, curricula data etc.
* In addition, there are languages for selling education and for exchanging student data, curricula data etc.


== XML Software ==
== XML Software ==
Line 164: Line 313:


* [http://xmlsoft.org/xmllint.html xmllint], a command line tool which is distributed as part of the [http://xmlsoft.org/ libxml2 C parser] developed for the Gnome project. This means that it ships with most Linux installations, but there are also distributions for Windows and other OSs.
* [http://xmlsoft.org/xmllint.html xmllint], a command line tool which is distributed as part of the [http://xmlsoft.org/ libxml2 C parser] developed for the Gnome project. This means that it ships with most Linux installations, but there are also distributions for Windows and other OSs.
* [http://www.isogen.com/downloads/cool_tools/xml_tester.jsp xmlTester.jar]. This tool is based on the Xerces parser.
* You can find several tools on [http://sourceforge.net/directory/freshness:recently-updated/?q=xml%20validation source forge]
* [http://xmlnanny.com/ XML Nanny]. XML Nanny is a Free Mac OS X developer tool that provides an Aqua interface for checking XHTML and XML documents for Well-Formedness and Validity either locally or across the network. (Tiger OS X 10.4) [sept 2005]


; On-line validation
; On-line validation


Note: You may need to change DTD's local system identifier. These programs must be able to get the DTD. I rather suggest installing a local program on your machine (like xmllint or xmlTester).
Note: You may need to change [[DTD]]'s or Schema's local system identifier. These programs must be able to get the DTD. I rather suggest installing a local program on your machine (like xmllint or xmlTester).


* [http://www.stg.brown.edu/service/xmlvalid/ STG XML Validation Form], courtesy of Scholarly Technology Group, Brown University
* [http://www.stg.brown.edu/service/xmlvalid/ STG XML Validation Form], courtesy of Scholarly Technology Group, Brown University
Line 180: Line 328:


* [http://feeds.archive.org/validator/ FEED Validator]. Validates various RSS Formats plus [http://www.intertwingly.net/wiki/pie/ PIE]
* [http://feeds.archive.org/validator/ FEED Validator]. Validates various RSS Formats plus [http://www.intertwingly.net/wiki/pie/ PIE]
=== Online tools ===
Some websites offer functionality to perform simple XML tasks like formatting, diffing, transforming, validating and querying XML.
<table border="1">
<tr>
<td>'''Website'''</td>
<td>'''Features'''</td>
</tr>
<tr>
<td>http://www.shancarter.com/data_converter/</td>
<td>Conversion from Excel and csv to XML</td>
</tr>
<tr>
<td>http://www.shell-tools.net/index.php?op=xml_format</td>
<td>Format and validation (dtd and xsd)</td>
</tr>
<tr>
<td>http://tools.decisionsoft.com/xmldiff.html</td>
<td>Diff (compare XML files)</td>
</tr>
<tr>
<td>http://tools.decisionsoft.com/schemaValidate/</td>
<td>Validation (XSD)</td>
</tr>
<tr>
<td>http://chris.photobooks.com/xml/</td>
<td>Format, transformation (XSLT) and query (Xpath)
</td>
</tr>
<tr>
<td>http://www.xmltools.dk/</td>
<td>Query (Xpath)</td>
</tr>
<tr>
<td>http://xslt.online-toolz.com/tools/xslt-transformation.php</td>
<td>Format, transformation (XSLT) and Validation (XSD)</td>
</tr>
<tr>
<td>http://www.w3schools.com/xsl/tryxslt.asp?xmlfile=cdcatalog&xsltfile=cdcatalog</td>
<td>Transformation (XSLT)</td>
</tr>
<tr>
<td>http://www.qutoric.com/xslt/analyser/xpathtool.html</td>
<td>Query (Xpath)</td>
</tr>
</table>


== Links ==
== Links ==
=== Overviews ===
* [http://www.w3.org/XML/1999/XML-in-10-points XML in 10 points] (W3C, 2001)
* [http://users.jyu.fi/~airi/xmlfamily.html XML Family of Languages], Overview and Classification of W3C Specifications, by Airi Salminen, 07 September 2012


=== Tutorials ===
=== Tutorials ===
(this section needs to be expanded some day)


* See [[:Category: XML|XML]]-related entries in this wiki. There are some tutorials.
* [http://www.scriptorium.com/structure.pdf whitepaper on structured authoring], by Sarah O’Keefe.
* [http://en.wikibooks.org/wiki/XML XML:Managing Data Exchange]. This Wikibook project introduces XML from a data exchange perspective.
* [http://en.wikibooks.org/wiki/XML XML:Managing Data Exchange]. This Wikibook project introduces XML from a data exchange perspective.


Line 191: Line 395:
* [http://www.ibiblio.org/xml/ Cafe con Leche XML News and Resources] (Best resource to keep in touch with XML-related news)
* [http://www.ibiblio.org/xml/ Cafe con Leche XML News and Resources] (Best resource to keep in touch with XML-related news)


* [http://tecfa.unige.ch/guides/xml/pointers.html TECFA's XML Page] ([[User:DSchneider|DSchneider]]'s "old" XML pointers page).
=== Lists of XML applications (schemas) ===
 
* [http://schemas.liquid-technologies.com/ XML Standards Library] by Liquid technologies. Good list, indexed by category. Added 15:54, 28 July 2010 (UTC).
* [http://en.wikipedia.org/wiki/List_of_XML_markup_languages List of XML markup languages] Good list, but as of 17:11, 9 June 2010 (UTC) by no means complete, e.g. most e-learning [[standard]]s are missing.
* [http://www.oasis-open.org/ OASIS] (an XML standardization organization) publishes [http://xml.coverpages.org/ Coverpages] (newsletter and XML resource). It includes [http://xml.coverpages.org/xmlApplications.html XML Applications and Initiatives] (an ugly unsorted but interesting list).
* [http://xml.gov/ XML.gov] is a US government web site that coordinates XML initiatives. Helps to find various standards they are interested in, e.g. a [http://xml.gov/efforts.asp list of efforts].


== References ==
== References ==
Line 197: Line 406:
* Elliotte Rusty Harold, (2004). XML in a Nutshell, O'Reilly, [http://www.ibiblio.org/xml/books/xian3/ Abstract/TOC] ISBN 0-596-00764-7 (Best buy according to [[User:DSchneider|DSchneider]]).
* Elliotte Rusty Harold, (2004). XML in a Nutshell, O'Reilly, [http://www.ibiblio.org/xml/books/xian3/ Abstract/TOC] ISBN 0-596-00764-7 (Best buy according to [[User:DSchneider|DSchneider]]).


[[Category: Technologies]]
 
[[Category: XML]]
[[Category: XML]][[Category:web standards]]
[[fr:XML]]

Latest revision as of 12:36, 14 January 2020

Introduction

This article provides a short and rather non-technical overview of XML. See also the XML category for all XML-related topics (many) or follow up links in this overview.

Learning goals

  • Understand the role of XML in IT
  • Be able to identify major roles and XML languages made for the Web

Prerequisites

  • none

Next steps

Definition of XML

  • XML means "Extensible Markup Language". XML is a formalism that allows defining all sorts of languages that describe a wide range of "information contents" (e.g. web pages, vector graphics, programming languages). In technical terms, such languages are called XML applications or XML vocabularies.
  • XML is designed as a machine-readable, self-describing, text-editable persistent store for data, but it can also be read (somewhat) by humans. XML is a formalism or a meta-language. Such a meta-language is not to be confused with HTML, a language to describe the structure of Web pages. XHTML, for example, is one of the thousands of existing XML applications.

See also: Editing XML tutorial

History

  • XML is a subset of the Standard Generalized Markup Language (SGML). SGML has been used to define HTML, whereas XHTML is defined with XML (this is why unclosed empty tags are no longer allowed in XHTML). HTML5, on the other hand, is based neither on SGML nor on XML.
  • Since then, hundreds of XML languages have been defined and a few dozen are popular and in production. Ken Sall's famous Big Picture only lists some, e.g. none of the many IMS e-learning standards are mentioned.

XML and web standards

Currently, there are hundreds of more or less popular XML languages. Within the narrower area of web standards there are fewer, and we shall briefly introduce the most important ones that non-programmers like content developers or web designers should know about.

XML for richer Web contents

Initially, XML was expected to redefine the way contents are delivered. It then turned out that XHTML was (almost) never used as XML, e.g. in the form of HTML combined with other XML contents. This "XML" vision of HTML still exists in the minds of some people, but the death of XHTML 2 put a provisional end to it. The current mainstream, represented by HTML5, is a computer application-centered model, i.e. HTML is seen as a delivery platform for interactive contents and not as a document format.

The picture below shows the idea that web documents could be composed of several components: in the case of HTML, there is HTML + CSS; in the case of HTML5, there is HTML + built-in SVG and MathML + CSS. In the case of XHTML 1 or XHTML 5, a document can include any other XML language, provided that these are identified by so-called namespaces. Although it is no longer popular, we also included SGML in the picture, since it is the "mother" of all tag-based markup languages.

HTML vs. XML Web contents ... made many years ago (DKS)

Just to make sure: the death of XHTML by no means implies that XML is not being used on the Internet. It is just dead as a web page format. Other formats like SVG (vector graphics), MathML (mathematical formulas) and RSS (content syndication) are very much in use today and will be so in the future.
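
To make the namespace idea a bit more concrete, the following Python sketch parses a small XHTML document that embeds an SVG fragment and then picks out the elements belonging to the SVG vocabulary. The two namespace URIs are the standard ones for XHTML and SVG. The sample document and the helper vocabulary_of are assumptions made up for this illustration.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"

# Hypothetical compound document: XHTML with an embedded SVG drawing.
DOC = f"""<html xmlns="{XHTML_NS}">
  <body>
    <p>A circle:</p>
    <svg xmlns="{SVG_NS}" width="100" height="100">
      <circle cx="50" cy="50" r="40"/>
    </svg>
  </body>
</html>"""


def vocabulary_of(element):
    """Return the namespace URI of an element, or '' if it has none."""
    tag = element.tag  # ElementTree stores tags as "{namespace}localname"
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


root = ET.fromstring(DOC)
svg_elements = [el for el in root.iter() if vocabulary_of(el) == SVG_NS]
svg_names = [el.tag.split("}", 1)[1] for el in svg_elements]
print(svg_names)
```

The namespace is what lets a processor tell which vocabulary an element belongs to, whatever local names the vocabularies happen to use.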

XML as the foundation for the future semantic Web

The semantic web is essentially defined by the RDF framework. While RDF itself is used in some areas (e.g. Metadata formalisms), the global semantic web project seems to have somewhat stalled, except for occasional flare-ups. Web 2.0 was supposed to be semantic, but it turned out quite the opposite, i.e. it is based on simple micro-formats. Then the label became "web 3". Then the anti-semantic HTML 5 initiative became dominant, and the "semantic web" remains a "smaller island" of interest and applications.

  • Topic Maps (ISO standard) are used to organize collections of resources in the form of a semantic network (so you do not find just the trees, but there is a "map" of the forest).
  • RDF is a language used for describing relationships between objects and it can be used for adding "metadata" describing the content of a resource.
  • OWL ("Web Ontology Language", created with RDF) is a formalism that allows the description of the relationships between things. There is a conceptual link with syndication of news and the social web.
  • During the last 15 years, the Internet has undergone a profound change regarding the organization of its "information spaces".

XML for machine to machine talk

There exist several protocols for machine-to-machine interaction like SOAP and XML-RPC. See the web service article for more details.
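
To give an idea of what such machine-to-machine messages look like, here is a small Python sketch that uses the standard library module xmlrpc.client to build an XML-RPC request and decode it again, without contacting any server. The method name "add" and its two arguments are hypothetical and chosen only for this example.

```python
import xmlrpc.client

# Encode a call to a hypothetical remote method "add" with two arguments.
request_xml = xmlrpc.client.dumps((2, 3), methodname="add")
print(request_xml)  # the XML text that would be sent over HTTP

# Decode the same message, as a receiving program would do.
params, method = xmlrpc.client.loads(request_xml)
print(method, params)
```

The point is that both sides only exchange XML text; each program maps it to its own native data types.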

In addition we can identify:

  • Specialized search engines that extract contents from various XML documents;
  • Formats like RSS or FOAF that are meant to help organizing networked information spaces, like content syndication;
    • RSS (in its various forms) allows automatic exchange of "titles" and "summaries" between portals and weblogs.
    • FOAF (and other formats) are used to define profiles of individuals which are then used to organize online social networks.
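
As a rough sketch of how a program can pick up "titles" and "summaries" from a feed, the Python example below reads a tiny RSS 2.0 document with the standard library. The feed content is invented for this illustration, and the helper name headlines is an assumption, not part of any RSS tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed, invented for this illustration.
FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example weblog</title>
    <link>http://example.org/</link>
    <description>Hypothetical feed</description>
    <item>
      <title>First post</title>
      <description>Summary of the first post</description>
    </item>
    <item>
      <title>Second post</title>
      <description>Summary of the second post</description>
    </item>
  </channel>
</rss>"""


def headlines(feed_text):
    """Return (title, summary) pairs for every <item> in the feed."""
    root = ET.fromstring(feed_text)
    return [
        (item.findtext("title"), item.findtext("description"))
        for item in root.iter("item")
    ]


for title, summary in headlines(FEED):
    print(f"{title}: {summary}")
```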

XML as formalism to define "grammars"

In a more general perspective, XML is currently one of the most popular standards to define various kinds of data structures. One could define three kinds:

  1. XML accessories (e.g. XML Schema)
    • Extend the capabilities specified in XML
    • Intended for wide, general use
  2. XML transducers (e.g. XSLT)
    • Convert XML input data into output
    • Associated with a processing model
  3. XML applications in the narrow sense (e.g. XHTML)
    • Define grammars, constraints for a class of XML data
    • Intended for a specific application area as diverse as, for example, word processors, e-learning, banking, multimedia, translation. Well-known examples are Microsoft Office contents (e.g. .docx files) or Adobe Flash *.fla files. These files are in fact zip files composed of a series of XML documents. Unzip one of these and you can see for yourself.
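
The "zip file full of XML documents" idea can be sketched in Python as follows. Since no real office file is available here, the example builds a small in-memory zip archive that only loosely mimics such a package; the member names content/document.xml and content/styles.xml are hypothetical and do not reflect the actual layout of .docx or .fla files.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def build_fake_package():
    """Build an in-memory zip that loosely mimics an XML-based office file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("content/document.xml", "<document><p>Hello</p></document>")
        package.writestr("content/styles.xml", "<styles/>")
    buffer.seek(0)
    return buffer


def xml_members(fileobj):
    """Map each .xml member name to its root tag (raises ParseError if malformed)."""
    roots = {}
    with zipfile.ZipFile(fileobj) as package:
        for name in package.namelist():
            if name.endswith(".xml"):
                roots[name] = ET.fromstring(package.read(name)).tag
    return roots


print(xml_members(build_fake_package()))
```

Passing the path of a real file instead of the in-memory buffer should work the same way, as long as the file really is a zip archive.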

Some technical XML concepts

An XML document can refer to a physical file, a database entry, a datastream. In other words, technically speaking an XML document is any sort of delimited "text" defined as a string and that has XML markup inside.

Wellformedness

An XML document is well formed if and only if:

There is an appropriate XML declaration at the beginning
  • The document should start with an XML declaration that includes a version number (currently 1.0). Strictly speaking, XML 1.0 makes this declaration optional, but including it is recommended.
<?xml version="1.0"?>
This declaration can also contain encoding information (the default encoding is UTF-8):
 <?xml version="1.0" encoding="UTF-8"?>
XML documents are hierarchical, i.e. each element must be inside another one (except the first one, the so-called root element).
  • begin-tags and end-tags that match
  • No tags crossing like
<i>...<b>...</i> .... </b>
  • There must be a single root element
    • It can only appear once and can not be used within other elements
Other features
  • XML is case sensitive, "LI" is not "li" for example
  • "Empty" tags must be self closing, e.g.
<br />
  • Attribute values are quoted
 <a href="http://tecfa.unige.ch:8080/xml.html">
  • Special characters: <, &, >, " and '. Use the following entity references instead (and that includes URLs!):
&lt; &amp; &gt; &quot; &apos;
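
The rules above can be tried out with any XML parser, since a conforming parser must reject text that is not well formed. The following minimal sketch uses Python's standard `xml.etree.ElementTree` parser and `xml.sax.saxutils.escape`; the helper names `is_well_formed` and `escape_for_xml` and all sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_well_formed(text):
    """Return True if the parser accepts the text, False if it reports a syntax error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def escape_for_xml(value):
    """Replace &, <, > and both kinds of quotes by their entity references."""
    return escape(value, {'"': "&quot;", "'": "&apos;"})


samples = {
    "matching tags": "<p><i>ok</i></p>",
    "crossing tags": "<p><i>..<b>..</i>..</b></p>",
    "two roots": "<a/><b/>",
    "case mismatch": "<LI>item</li>",
    "unquoted attribute": "<a href=xml.html>x</a>",
    "bare ampersand": "<a href='page?x=1&y=2'/>",
    "escaped ampersand": "<a href='page?x=1&amp;y=2'/>",
}

for label, text in samples.items():
    print(label, "->", is_well_formed(text))

# Escaping a URL before inserting it into an attribute value keeps the result well formed.
url = "http://example.org/page?x=1&y=2"
link = '<a href="%s">link</a>' % escape_for_xml(url)
print(link, "->", is_well_formed(link))
```

Note that this only tests well-formedness. It says nothing about whether a document is valid with respect to a grammar.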

Valid

An XML document is said to be valid if it conforms to some kind of grammar, also called a schema. An XML grammar formally describes an XML application (or vocabulary or language).

The most popular ones are in this order:

XML applications in addition to DTDs may include other constraints. Some XML applications may include languages that are not XML-based (e.g. CSS or XPath).

The most popular grammars are DTDs. Below we just include a picture of a little grammar (read the details in the DTD tutorial).

Simple DTD

Text-centric vs. data-centric XML

Data-centric XML, as opposed to text-centric XML, refers to XML whose primary audience is not a human reader but a computer program, which will process the information, respond to it, store data items in a database, and so on.
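
The difference shows in how a program typically consumes the two kinds of documents. In this sketch, both vocabularies (`students`/`student` and `para`/`emph`/`abbr`) are invented for illustration: the data-centric document is turned into records, whereas the text-centric one has mixed content (text interleaved with elements) and is read as running text.

```python
import xml.etree.ElementTree as ET

# Data-centric: regular records, no text between the elements that matters.
DATA_CENTRIC = """<students>
  <student id="s1"><name>Ada</name><grade>5.5</grade></student>
  <student id="s2"><name>Linus</name><grade>4.0</grade></student>
</students>"""

# Text-centric: a sentence for human readers with inline markup (mixed content).
TEXT_CENTRIC = (
    "<para>XML is a <emph>meta-language</emph>, "
    "not to be confused with <abbr>HTML</abbr>.</para>"
)


def load_records(xml_text):
    """Turn each <student> element into a dictionary, e.g. for storing in a database."""
    records = []
    for student in ET.fromstring(xml_text).iter("student"):
        records.append({
            "id": student.get("id"),
            "name": student.findtext("name"),
            "grade": float(student.findtext("grade")),
        })
    return records


def plain_text(xml_text):
    """Concatenate all text of a mixed-content element in document order."""
    return "".join(ET.fromstring(xml_text).itertext())


print(load_records(DATA_CENTRIC))
print(plain_text(TEXT_CENTRIC))
```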

A list of XML applications

See also Tour de XML, a selection of links demonstrating various uses of XML.

Accessories

Extend the power of XML

Transducers

Various style sheet and query languages

Metadata and semantic web

Graphics and multimedia

Contents

  • DocBook: the most popular standard for writing large documents.
  • DITA. A more flexible module-based approach to documents, originally made by IBM.

These document standards (as well as others) can intervene at all stages of the document production/delivery pipeline. XML in the documentation world appears as:

Application development

  • MXML is an XML-based user interface markup language used in Adobe Flex, a software development kit to create Flash-based Internet and desktop applications.
  • XML User Interface Language (XUL) is the XML user interface markup language developed by the Mozilla project. It operates in cross-platform applications such as Firefox.

XML and documentation languages

Any XML document can be put directly on the web together with a CSS stylesheet or an XSLT transformation. Specialized formats like SVG (vector graphics), X3D (3D vector graphics), and MathML (formulas) can be added to XML-compatible browsers. Larger documents are often produced with specialized vocabularies such as DITA or DocBook. Contents can be written either with an XML editor or an XML-aware word processor. Such documents can then be either directly "saved as" or sent through various custom output filters.

XML document production pipelines

Presenting XML documents with style sheets

Today one can directly display information encoded in XML (of any grammar) in a browser by using a style-sheet. Style-sheets make it possible to:

  • Prepare / arrange content for a "presentation".
  • Define the “layout” (format, formatting) of a text written in XML.
  • Alter contents, e.g. by adding a table of contents or page headers

The utility of style-sheets is therefore

  • Separation of content and presentation
  • Simplification of work (one style-sheet for many documents, single source publishing, etc.)

XSL - The Extensible Stylesheet Language

XSL refers to two languages recommended by the W3C: XSLT and XSL/FO. They provide two principal functions:

(1) XSLT is a transformation language for XML elements.

For example, XSLT allows the creation of a table of contents or the translation of XML to HTML.

(2) XSL/FO is a formatting language that allows the creation of high-quality print documents.
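
The kind of transformation mentioned under (1), translating XML to HTML and adding a table of contents, can be imitated in a few lines of ordinary code. The sketch below is not XSLT: it is a hand-written Python stand-in that only illustrates what an XSLT style-sheet would do declaratively. The source vocabulary (`article`, `section`, `title`, `para`) and the function name `to_html` are invented.

```python
import xml.etree.ElementTree as ET

SOURCE = """<article>
  <section><title>Definition</title><para>XML is a meta-language.</para></section>
  <section><title>History</title><para>XML is a subset of SGML.</para></section>
</article>"""


def to_html(xml_text):
    """Build an HTML tree from the source tree: first a table of contents, then the sections."""
    article = ET.fromstring(xml_text)
    html = ET.Element("html")
    body = ET.SubElement(html, "body")

    # Table of contents: one list item per section title.
    toc = ET.SubElement(body, "ul")
    for section in article.findall("section"):
        ET.SubElement(toc, "li").text = section.findtext("title")

    # Content: each <title> becomes <h2>, each <para> becomes <p>.
    for section in article.findall("section"):
        ET.SubElement(body, "h2").text = section.findtext("title")
        for para in section.findall("para"):
            ET.SubElement(body, "p").text = para.text

    return ET.tostring(html, encoding="unicode")


print(to_html(SOURCE))
```

In real XSLT the same mapping would be written as template rules that match `section`, `title` and `para` elements, and an XSLT processor would apply them.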

Formatting with XSL-FO

  • sophisticated formatting, also by inheritance, position, etc.
  • generation of text and graphics
  • possibility to define macros
  • everything that can be found in CSS, and more…

CSS (Cascading Style Sheets)

CSS can also be used to style XML contents. However, since its transformation capabilities are rather poor, the XML should already include all the data to be published.

  • XML support is included since CSS2.


XLink - Towards a better hypertext?

XLink allows inserting a link in an XML document, where a link expresses a relationship between two or more objects. XLink is a W3C Recommendation, but there is no complete implementation for the moment. However, subsets of XLink are used in various other XML languages.

XLink is based on other standards (which are also shared with XSLT):

  • XPointer = how to identify an XML fragment (used by links). XPointer is based on XPath
  • XPath = how to identify a path towards a resource
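
To illustrate what "a path towards a resource" means, here is a small sketch using the limited XPath subset supported by Python's standard `xml.etree.ElementTree` module (full XPath offers much more). The `book`/`chapter`/`section` document is made up.

```python
import xml.etree.ElementTree as ET

DOC = ET.fromstring("""<book>
  <chapter id="c1"><title>Introduction</title></chapter>
  <chapter id="c2"><title>Wellformedness</title>
    <section id="s1"><title>Root element</title></section>
  </chapter>
</book>""")

# Child steps: titles that sit directly under a chapter of the root element.
chapter_titles = [t.text for t in DOC.findall("./chapter/title")]

# Descendant step: every title anywhere in the document, in document order.
all_titles = [t.text for t in DOC.findall(".//title")]

# Predicate on an attribute: the one chapter whose id is "c2".
second_chapter = DOC.find(".//chapter[@id='c2']")

print(chapter_titles)
print(all_titles)
print(second_chapter.findtext("title"))
```

XPointer and XSLT rely on the same kind of path expressions to select the fragments they link to or transform.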

Principal characteristics of XLink

  • Multi-directional links, links towards multiple destinations
  • Occlusions, inclusions, content replacements in a document

Where does this standard come from?

  • HTML
  • Anchors: href (attribute A), src (attribute of IMG and NOTE) ...
  • Targets: name attributes (A), id (attribute in HTML 4.x)
  • mainly: SGML HyTime and TEI Extended Pointers (extension to HyTime) languages.

XML in education

One has to make a distinction between languages specifically developed for the education sector (see below) and all the rest of XML technology, most of which can be useful to education.

  • In addition, there are languages for selling education, to exchange student data, curricula data etc.

XML Software

(longer entries have their own page)

XML creation

XML databases

Validation

Off-line validation
  • Most decent XML editors do offer validation functionality. However, some free XML editors do not. Some (like Xemacs) only offer limited verification.
  • xmllint, a command line tool which is distributed as part of the libxml2 C parser developed for the Gnome project. This means that it ships with most Linux installations, but there are also distributions for Windows and other OSs.
  • You can find several tools on SourceForge
On-line validation

Note: You may need to change DTD's or Schema's local system identifier. These programs must be able to get the DTD. I rather suggest installing a local program on your machine (like xmllint or xmlTester).

On-line validation for specific XML applications
  • W3C HTML Validation Service: this validator doesn't work with your own DTDs. Its primary function is to validate W3C vocabularies (HTML, XHTML, SVG, MathML, ...).

Online tools

Some websites offer functionality to perform simple XML tasks like formatting, diffing, transforming, validating, and querying XML.

Website | Features
http://www.shancarter.com/data_converter/ | Conversion from Excel and CSV to XML
http://www.shell-tools.net/index.php?op=xml_format | Format and validation (DTD and XSD)
http://tools.decisionsoft.com/xmldiff.html | Diff (compare XML files)
http://tools.decisionsoft.com/schemaValidate/ | Validation (XSD)
http://chris.photobooks.com/xml/ | Format, transformation (XSLT) and query (XPath)
http://www.xmltools.dk/ | Query (XPath)
http://xslt.online-toolz.com/tools/xslt-transformation.php | Format, transformation (XSLT) and validation (XSD)
http://www.w3schools.com/xsl/tryxslt.asp?xmlfile=cdcatalog&xsltfile=cdcatalog | Transformation (XSLT)
http://www.qutoric.com/xslt/analyser/xpathtool.html | Query (XPath)

Links

Overviews

Tutorials

(this section needs to be expanded some day)

News

Lists of XML applications (schemas)

References

  • Elliotte Rusty Harold, (2004). XML in a Nutshell, O'Reilly, Abstract/TOC ISBN 0-596-00764-7 (Best buy according to DSchneider).