XML
Introduction
This page provides a short, rather non-technical overview of XML. See also the XML category for all (many) XML-related topics, or follow the links in this overview.
Learning goals
- Understand the role of XML in IT
- Be able to identify major roles and XML languages made for the Web
Prerequisites
- none
Next steps
- Tour de XML or equivalent (having seen some real world applications would be good for motivation)
- XML principles
- Editing XML tutorial
- DTD tutorial
Definition
- XML means "Extensible Markup Language". XML is a formalism that allows one to define all sorts of languages that describe a wide range of "information contents" (e.g. web pages, vector graphics, programming languages). In technical terms, such languages are called XML applications or XML vocabularies.
- XML is designed as a machine-readable, self-describing, text-editable persistent store for data, but it can also be read (somewhat) by humans. XML is a formalism or meta-language. Such a meta-language is not to be confused with HTML, a language to describe the structure of Web pages. XHTML, for example, is one of the thousands of XML applications.
See also: Editing XML tutorial
History
- XML is a subset of the Standard Generalized Markup Language (SGML). SGML has been used to define HTML, whereas XHTML is defined with XML (this is why empty tags may no longer be left unclosed in XHTML). HTML5, on the other hand, is based neither on SGML nor on XML.
- XML was formally defined in 1998 as W3C's XML 1.0 Recommendation.
- Since then, hundreds of XML languages have been defined, and a few dozen are popular and in production. Ken Sall's famous Big Picture lists only some of them, e.g. none of the many IMS e-learning standards are mentioned.
XML and web standards
Currently, there are hundreds of more or less popular XML languages. Within the narrower area of web standards there are fewer, and we shall briefly introduce the most important ones that non-programmers such as content developers or web designers should know about.
XML for richer Web contents
Initially, XML was expected to redefine the way contents are delivered. Since XHTML was hardly ever served as XML and XHTML 2 was abandoned, this vision still exists, but it no longer represents the mainstream view, which has adopted the application-centered HTML5 model instead.
The picture below shows the idea that web documents are composed of several components. In the case of HTML, there is HTML + CSS; in the case of HTML5, there is HTML + built-in SVG and MathML + CSS. In the case of XHTML 1 or XHTML 5, a document can include any other XML language, provided that each one is identified by a so-called namespace. Although it is no longer popular, we also included SGML in the picture, since it is the "mother" of all tag-based markup languages.
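To make the role of namespaces more concrete, here is a minimal sketch in Python, using only the standard library. The small XHTML page with an embedded SVG drawing and the helper name `count_by_namespace` are made up for this illustration; only the two namespace URIs are the official ones.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML page that embeds an SVG drawing.
DOC = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <p>A circle:</p>
    <svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
      <circle cx="50" cy="50" r="40"/>
    </svg>
  </body>
</html>"""

XHTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"


def count_by_namespace(xml_text):
    """Count the elements of a document per namespace URI."""
    counts = {}
    for elem in ET.fromstring(xml_text).iter():
        # ElementTree stores qualified names as "{namespace-uri}local-name".
        ns = elem.tag[1:].split("}")[0] if elem.tag.startswith("{") else ""
        counts[ns] = counts.get(ns, 0) + 1
    return counts


counts = count_by_namespace(DOC)
print(counts[XHTML_NS], "XHTML elements,", counts[SVG_NS], "SVG elements")
```

The namespace URI is what tells a processor which vocabulary an element belongs to, so the same document can mix several XML languages without name clashes.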
XML as the foundation for the future semantic Web
- Essentially the RDF framework. While RDF itself is used in some areas (e.g. metadata formalisms), the global semantic web project seems to be somewhat stalled, except for occasional flares. Web 2.0 was supposed to be semantic, but it became quite the opposite, i.e. it is based on simple microformats. Then it became Web 3. Then the anti-semantic HTML5 initiative became dominant, and the semantic web remains a tiny island.
XML for machine to machine talk
- Several protocols for machine-to-machine interaction, like SOAP and XML-RPC. See the web service article.
- Specialized search engines
- Special formats like RSS or FOAF are meant to help organize networked information spaces.
- ...
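As a small illustration of machine-to-machine talk, the following Python sketch uses the standard `xmlrpc.client` module to show what an XML-RPC request looks like on the wire. The procedure name `add` and its arguments are hypothetical; no server is contacted.

```python
import xmlrpc.client

# Encode a call to a hypothetical remote procedure "add" with two integers.
request_xml = xmlrpc.client.dumps((2, 3), methodname="add")
print(request_xml)

# The receiving side decodes the same text back into native values.
params, method = xmlrpc.client.loads(request_xml)
print(method, params)
```

The point is that both sides only exchange XML text: the sender serializes native values into markup, and the receiver parses the markup back into values.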
XML as formalism to define "grammars"
- This more formal view simply states that XML is currently the most popular standard for defining various kinds of data structures. One could distinguish three kinds:
- XML accessories (e.g. XML Schema)
- Extend the capabilities specified in XML
- Intended for wide, general use
- XML transducers (e.g. XSLT)
- Convert XML input data into output
- Associated with a processing model
- XML applications in the narrow sense (e.g. XHTML)
- Define grammars, constraints for a class of XML data
- Intended for a specific application area, as diverse as e-learning, banking, multimedia, translation, ...
Some technical XML concepts
An XML document can be a physical file, a database entry, or a data stream (any appropriately delimited "text").
Well-formedness
An XML document is well formed if and only if
- There is an appropriate XML declaration at the beginning (strictly speaking, XML 1.0 recommends the declaration but does not require it)
- The document starts with an XML declaration that includes a version number (currently 1.0).
<?xml version="1.0"?>
- This declaration can also contain encoding information (by default, the encoding is UTF-8):
<?xml version="1.0" encoding="ISO-8859-1"?>
- XML documents are hierarchical, i.e. each element must be inside another one (except the first one, the so-called root element).
- Begin-tags and end-tags must match
- No crossing tags like
<i>...<b>...</i> .... </b>
- There must be a single root
- It can appear only once and cannot be used within other elements
- Other features
- XML is case sensitive, "LI" is not "li" for example
- "Empty" tags must be self closing, e.g.
<br />
- Attribute values are quoted
<a href="http://tecfa.unige.ch:8080/xml.html">
- Special characters: <, &, >, " and '. Use the following entity references instead (this also applies inside URLs!):
&lt; &amp; &gt; &quot; &apos;
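The rules above can be tried out with any XML parser. Here is a minimal sketch in Python, using the standard library parser; the helper name `is_well_formed` and the sample fragments are made up for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return (True, None) if xml_text parses, else (False, error message)."""
    try:
        ET.fromstring(xml_text)
        return True, None
    except ET.ParseError as err:
        return False, str(err)


# Hypothetical fragments, one per rule discussed above.
samples = {
    "nested correctly": "<p><i>a <b>b</b></i></p>",
    "crossing tags": "<p><i>a <b>b</i></b></p>",
    "two roots": "<p>a</p><p>b</p>",
    "case mismatch": "<LI>item</li>",
    "unquoted attribute": "<a href=page.html>x</a>",
    "unescaped ampersand": "<a href='x?a=1&b=2'>x</a>",
    "escaped ampersand": "<a href='x?a=1&amp;b=2'>x</a>",
    "empty element": "<p>line<br /></p>",
}

for label, text in samples.items():
    ok, error = is_well_formed(text)
    status = "well formed" if ok else f"NOT well formed ({error})"
    print(f"{label}: {status}")
```

None of the samples carries an XML declaration; the parser used here accepts that. A parser only checks well-formedness in this mode, it does not check whether the element names make sense for a given vocabulary.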
Valid
An XML document is said to be valid if it conforms to some kind of grammar, also called a schema. An XML grammar formally describes an XML application (or vocabulary or language).
The most popular ones are, in this order:
- DTDs (Document Type Definitions)
- XML Schema
- Relax NG
XML applications may include other constraints in addition to DTDs. Some XML applications may include languages that are not XML-based (e.g. CSS or XPath).
The most popular grammars are DTDs. Below we just include a picture of a little grammar (read the details in the DTD tutorial).
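To illustrate the difference between "well formed" and "valid", here is a deliberately simplified Python sketch. It is not a DTD or XML Schema validator: the `GRAMMAR` dictionary, the `recipe` vocabulary and the `violations` function are hypothetical stand-ins that only check which child elements are allowed where.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy "grammar": for each element name, the allowed child names.
# A real DTD or XML Schema also constrains order, repetition, attributes and text.
GRAMMAR = {
    "recipe": {"title", "ingredient", "step"},
    "title": set(),
    "ingredient": set(),
    "step": set(),
}


def violations(xml_text, grammar=GRAMMAR, root_name="recipe"):
    """Return messages describing where the document departs from the grammar."""
    root = ET.fromstring(xml_text)  # raises ParseError if not even well formed
    problems = []
    if root.tag != root_name:
        problems.append(f"root element is <{root.tag}>, expected <{root_name}>")
    for parent in root.iter():
        if parent.tag not in grammar:
            problems.append(f"unknown element <{parent.tag}>")
            continue
        for child in parent:
            if child.tag not in grammar[parent.tag]:
                problems.append(f"<{child.tag}> not allowed inside <{parent.tag}>")
    return problems


good = "<recipe><title>Tea</title><ingredient>water</ingredient><step>Boil</step></recipe>"
bad = "<recipe><title>Tea</title><author>me</author></recipe>"

print(violations(good))
print(violations(bad))
```

Both sample documents are well formed, but only the first one respects the toy grammar. For real work, one would rely on a validating tool such as the ones listed in the XML Software section.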
Text-centric vs. data-centric XML
Data-centric XML as opposed to the text-centric XML refers to XML whose primary audience is not a human reader, but a computer program which will process the information, respond to it, store data items in a database, and so on.
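The contrast can be sketched with two tiny documents in Python. Both documents and the helper names `load_students` and `plain_text` are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical data-centric document: regular records meant for a program.
DATA = """<students>
  <student id="s1"><name>Ada</name><credits>12</credits></student>
  <student id="s2"><name>Linus</name><credits>9</credits></student>
</students>"""

# Hypothetical text-centric document: running prose with inline markup.
TEXT = "<para>XML is a <emph>meta-language</emph>, not a page format.</para>"


def load_students(xml_text):
    """Turn each <student> record into a plain dictionary."""
    return [
        {
            "id": s.get("id"),
            "name": s.findtext("name"),
            "credits": int(s.findtext("credits")),
        }
        for s in ET.fromstring(xml_text).findall("student")
    ]


def plain_text(xml_text):
    """Recover the readable sentence from mixed content."""
    return "".join(ET.fromstring(xml_text).itertext())


print(load_students(DATA))
print(plain_text(TEXT))
```

In the data-centric case the markup maps almost directly onto records and fields; in the text-centric case the markup is sprinkled inside sentences that a human is expected to read.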
XML Applications
See also Tour de XML, a selection of links demonstrating various uses of XML.
Accessories
Extend the power of XML
- XLink: Hypertext links
- XPointer (resource pointers)
- XPath (XML fragment identification)
- http://www.w3.org/TR/xpath
- (used by XSLT, XInclude, XLink, XQuery, XPointer etc.)
- See the XPath tutorial - basics
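As a quick taste of XPath, the following Python sketch uses the limited XPath subset supported by the standard library's ElementTree module (full XPath needs a dedicated processor). The catalogue and its titles are invented for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalogue used only for this illustration.
CATALOG = ET.fromstring("""<catalog>
  <book lang="en"><title>Alpha</title><year>2004</year></book>
  <book lang="fr"><title>Beta</title><year>2002</year></book>
  <book lang="en"><title>Gamma</title><year>2003</year></book>
</catalog>""")

# All titles, anywhere below the root.
all_titles = [t.text for t in CATALOG.findall(".//title")]

# Titles of books whose lang attribute equals "en".
english_titles = [t.text for t in CATALOG.findall("./book[@lang='en']/title")]

# Title of the book whose <year> child contains the text 2002.
title_2002 = CATALOG.find("./book[year='2002']/title").text

print(all_titles)
print(english_titles)
print(title_2002)
```

Each expression is a path through the element tree, optionally filtered by a predicate in square brackets; this is the same idea that XSLT, XQuery and the other languages listed above build on.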
Transducers
Various style sheet and query languages
- XSL/FO (XML application): XML style language
- XSLT (XML application): XML transformation language
- XQuery: XML Query Language
Metadata and semantic web
- RDF applications (a whole lot of languages for the semantic web)
- PICS 2.0: Platform for Internet Content Selection
- P3P: Platform for Privacy Preferences
Graphics and multimedia
- SMIL: Synchronized Multimedia Integration Language
- SVG: Scalable Vector Graphics
- MathML: Mathematical Markup Language
Contents
- XHTML (several variants for new generation HTML)
- Docbook. Most popular standard for writing large documents.
- DITA. A more flexible module-based approach to documents, originally made by IBM.
These document standards (as well as others) can intervene at all stages of the document production/delivery pipeline. XML in the documentation world appears as:
- XHTML: HTML rewritten in XML
- Any XML document can directly be put on the web together with a CSS stylesheet or an XSLT transformation
- Specialized formats like SVG (vector graphics), X3D (3d vector graphics), MathML (formulas) can be added to XML-compatible browsers
- Larger documents are often produced with specialized vocabularies such as DITA or DocBook. Contents can be written either with an XML editor or an XML-aware word processor. Such documents can then be either directly "saved as" or sent through various custom output filters.
Application development
- MXML is an XML-based user interface markup language used in Adobe Flex, a software development kit to create Flash-based Internet and desktop applications.
- XML User Interface Language (XUL) is the XML user interface markup language developed by the Mozilla project. It operates in cross-platform applications such as Firefox.
XML in education
One has to make a distinction between languages specifically developed for the education sector (see below) and all the rest of XML technology, most of which can be useful to education.
- See the various Educational modeling languages, in particular the ones produced by IMS
- Content packaging and storage, e.g.
- In addition, there are languages for selling education, to exchange student data, curricula data etc.
XML Software
(longer entries have their own page)
XML creation
- See XML editor
XML databases
- See XML database
Validation
- Off-line validation
- Most decent XML editors offer validation functionality. However, some free XML editors do not. Some (like XEmacs) only offer limited verification.
- xmllint, a command line tool which is distributed as part of the libxml2 C parser developed for the Gnome project. This means that it ships with most Linux installations, but there are also distributions for Windows and other OSs.
- xmlTester.jar. This tool is based on the Xerces parser.
- XML Nanny. XML Nanny is a Free Mac OS X developer tool that provides an Aqua interface for checking XHTML and XML documents for Well-Formedness and Validity either locally or across the network. (Tiger OS X 10.4) [sept 2005]
- On-line validation
Note: You may need to change the local system identifier of your DTD or schema, since these programs must be able to fetch the DTD. I suggest installing a local program on your machine instead (like xmllint or xmlTester).
- STG XML Validation Form, courtesy of Scholarly Technology Group, Brown University
- XML well-formedness checker and validator, Richard Tobin, University of Edinburgh (RXP parser)
- XML.com's (simple well-formedness)
- On-line validation for specific XML applications
- W3C HTML Validation Service. This validator doesn't work with your own DTDs. Its primary function is to validate W3C vocabularies (HTML, XHTML, SVG, MathML, ...)
- FEED Validator. Validates various RSS Formats plus PIE
Online tools
Some websites offer functionality to perform simple XML tasks such as formatting, diffing, transforming, validating and querying XML.
Website | Features |
http://www.shancarter.com/data_converter/ | Conversion from Excel and CSV to XML |
http://www.shell-tools.net/index.php?op=xml_format | Format and validation (DTD and XSD) |
http://tools.decisionsoft.com/xmldiff.html | Diff (compare XML files) |
http://tools.decisionsoft.com/schemaValidate/ | Validation (XSD) |
http://chris.photobooks.com/xml/ | Format, transformation (XSLT) and query (XPath) |
http://www.xmltools.dk/ | Query (XPath) |
http://xslt.online-toolz.com/tools/xslt-transformation.php | Format, transformation (XSLT) and validation (XSD) |
http://www.w3schools.com/xsl/tryxslt.asp?xmlfile=cdcatalog&xsltfile=cdcatalog | Transformation (XSLT) |
http://www.qutoric.com/xslt/analyser/xpathtool.html | Query (XPath) |
Links
Overviews
- XML in 10 points (W3C, 2001)
Tutorials
(this section needs to be expanded some day)
- See XML-related entries in this wiki. There are some tutorials.
- Whitepaper on structured authoring, by Sarah O’Keefe.
- XML: Managing Data Exchange. This Wikibook project introduces XML from a data exchange perspective.
News
- Cafe con Leche XML News and Resources (Best resource to keep in touch with XML-related news)
Lists of XML applications (schemas)
- XML Standards Library by Liquid Technologies. Good list, indexed by category. Added 15:54, 28 July 2010 (UTC).
- List of XML markup languages Good list, but as of 17:11, 9 June 2010 (UTC) by no means complete, e.g. most e-learning standards are missing.
- OASIS (an XML standardization organization) publishes Coverpages (newsletter and XML resource). It includes XML Applications and Initiatives (an ugly unsorted but interesting list).
- XML.gov is a US government web site that coordinates XML initiatives. Helps to find various standards they are interested in, e.g. a list of efforts.
References
- Elliotte Rusty Harold, (2004). XML in a Nutshell, O'Reilly, Abstract/TOC ISBN 0-596-00764-7 (Best buy according to DSchneider).