XSLT for compound documents tutorial

== Introduction ==
In principle, it should be easy to transform so-called compound documents with XSLT. In practice it is not, because (a) documentation can only be found by searching specialized XML web sites and (b) there are some very tricky issues. For now, this article just includes an example that demonstrates the principle, together with working live files.


<div class="tut_goals">
; Learning goals
* Learn how to create XSLT that can handle XML documents that combine several vocabularies, for example combined XHTML + RDF/Dublin Core + your own XML documents
; Prerequisites
* [[XML]]
; Level and target population
* Intermediate XML/XSLT users
; Remarks
* Client-side transformation only works with more recent versions of Internet Explorer, i.e. version 9 or later. It should also work with older Firefox versions. Not tested with Safari.
</div>


[[XSLT]] can handle XML documents that include more than one namespace.

Principles:
* Declare '''all''' the namespaces found in the XML document at the top of the XSLT stylesheet.
* If you produce XHTML, you must declare the XHTML namespace twice: as the default namespace (for the elements you output) '''and''' with a prefix (for the XPath expressions in the XSLT rules).
* Each element name in an XPath expression must use a prefix, and that includes the XHTML ones!

Warning:
* The namespace URIs must be '''identical''' in the XML and the XSLT. One spelling mistake and nothing will match.
* Really, '''every''' element name in an XPath expression must have a prefix, e.g. in "match" and "select" attributes. Attribute names need a prefix only if the attribute itself is in a namespace (e.g. <code>rdf:about</code>); plain attributes such as <code>href</code> are in no namespace. See the example below.
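
The effect of a missing prefix can be sketched with two template rules. This is a minimal illustration and not part of the live example; the prefix <code>h</code> is an arbitrary choice, since only the namespace URI it is bound to matters.

<source lang="XML">
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:h="http://www.w3.org/1999/xhtml"
    xmlns="http://www.w3.org/1999/xhtml">

  <!-- Never fires for XHTML input: in XPath 1.0 an unprefixed name
       means "element p in no namespace", even though the stylesheet
       declares a default namespace. -->
  <xsl:template match="p">
    <p>not reached</p>
  </xsl:template>

  <!-- Fires: h:p expands to the name p in the XHTML namespace,
       which is the name of <p> in the source document. -->
  <xsl:template match="h:p">
    <p><xsl:value-of select="."/></p>
  </xsl:template>

</xsl:stylesheet>
</source>

The default namespace declaration only affects the literal result elements (here the <code>p</code> written to the output); it has no influence on how the "match" and "select" expressions are interpreted.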
 
The XSLT engine of your browser may not be able to handle namespaces. In that case, transform in your XML editor or use a server-side solution. In PHP, for example, it can be done like this:
<source lang="php">
<?php
# Made by DKS in 2005, still works in 2015. Replace "YOUR" with your file names.
error_reporting(E_ALL);

$xml_file = 'YOUR.xml';
$xsl_file = 'YOUR.xsl';

// load the XML file (and test first whether it exists);
// double quotes are needed so that the file name appears in the message
if (!file_exists($xml_file)) exit("Failed to open $xml_file");
$dom_object = new DOMDocument();
$dom_object->load($xml_file);

// create a DOM object for the XSLT stylesheet and configure the transformer
if (!file_exists($xsl_file)) exit("Failed to open $xsl_file");
$xsl_obj = new DOMDocument();
$xsl_obj->load($xsl_file);

$proc = new XSLTProcessor();
$proc->importStylesheet($xsl_obj); // attach the XSLT rules
$html_fragment = $proc->transformToXml($dom_object); // false on failure
if ($html_fragment === false) exit('Transformation failed');
print($html_fragment);
</source>
 
== Examples ==
 
=== XHTML with RDF, Dublin core and our own soup ===
 
Tested in April 2013 with IE9, Chrome and Firefox 20 under Windows 7. In principle, this should work with all modern browsers.

Live files:
* [http://tecfa.unige.ch/guides/xml/examples/xslt-compound/compound-cd-list.xml compound-cd-list.xml]
* [http://tecfa.unige.ch/guides/xml/examples/xslt-compound/compound-cd-list.xsl compound-cd-list.xsl]

XML input:
<source lang="XML">
<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

<?xml-stylesheet href="compound-cd-list.xsl" version="1.0" type="text/xsl"?>

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <title>Compound XML document demo</title>
  </head>
  <body>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:Description rdf:about="http://edutechwiki.unige.ch/XSLT_for_compound_documents_tutorial">
        <dc:title>XSLT for compound documents demo</dc:title>
        <dc:creator>DKS</dc:creator>
        <dc:format>XHTML + private XML + DC</dc:format>
        <dc:rights>Free as in free beer</dc:rights>
      </dc:Description>
    </rdf:RDF>
    <p>This is an XML document with a compound vocabulary,
       i.e. XHTML + CD-list + Dublin Core metadata.
       Read the <a href="http://edutechwiki.unige.ch/en/XSLT_for_compound_documents_tutorial">XSLT for compound documents tutorial</a></p>
    <p>Stuff below belongs to a "my" namespace. Stuff at the bottom are "dc" contents. Output is ugly, no time for styling - DKS/4/2013.</p>
    <hr/>
    <my:cd-list xmlns:my="http://edutechwiki.unige.ch/XML">
      <my:title>My (reduced) Hard Bop list</my:title>
      <my:cd>
        <my:artist>John Coltrane</my:artist>
        <my:title>Blue Train</my:title>
        <my:genre>Jazz</my:genre>
        <my:description>From Wikipedia: ...... </my:description>
        <my:track-list>
          <my:track no="1">
            <my:title>Blue Train</my:title>
            <my:artist>John Coltrane</my:artist>
            <my:genre>Blues</my:genre>
          </my:track>
          <my:track no="2">
            <my:title>Moment's Notice</my:title>
            <my:artist>John Coltrane</my:artist>
            <my:genre>Hard Bop</my:genre>
          </my:track>
        </my:track-list>
      </my:cd>
      <my:cd>
        <my:artist>Art Blakey</my:artist>
        <my:title>Moanin'</my:title>
      </my:cd>
    </my:cd-list>
  </body>
</html>
</source>
 
XSLT file:
 
<source lang="XML">
<?xml version="1.0"?>

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:h="http://www.w3.org/1999/xhtml"
  xmlns="http://www.w3.org/1999/xhtml"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:my="http://edutechwiki.unige.ch/XML"
  version="1.0">

  <xsl:output method="xml"
    doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"
    doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN" indent="yes"/>

  <xsl:template match="h:html">
    <html>
      <head>
        <title>
          <xsl:value-of select="h:head/h:title"/>
        </title>
      </head>
      <body bgcolor="#FFFFFF">
        <xsl:apply-templates select="h:body"/>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="h:body">
    <!-- First all the HTML child elements -->
    <xsl:apply-templates select="h:*"/>
    <!-- Then the CD list -->
    <xsl:apply-templates select="my:cd-list"/>
    <!-- Metadata at the end, skip the RDF part -->
    <xsl:apply-templates select="rdf:RDF/dc:Description"/>
  </xsl:template>

  <!-- CD list contents -->
  <xsl:template match="my:cd-list">
    <h1><xsl:value-of select="my:title"/></h1>
    <xsl:apply-templates select="my:cd"/>
  </xsl:template>

  <xsl:template match="my:cd">
    <h3><xsl:value-of select="my:artist"/>:
        <xsl:value-of select="my:title"/> -
        <xsl:value-of select="my:genre"/>
    </h3>
    <p><xsl:value-of select="my:description"/></p>
    <xsl:apply-templates select="my:track-list"/>
  </xsl:template>

  <xsl:template match="my:track-list">
    <ol>
      <xsl:apply-templates select="my:track"/>
    </ol>
  </xsl:template>

  <xsl:template match="my:track">
    <li>
      <xsl:value-of select="my:title"/> -
      <xsl:value-of select="my:artist"/> -
      <xsl:value-of select="my:genre"/>
    </li>
  </xsl:template>

  <!-- Metadata -->
  <xsl:template match="rdf:RDF/dc:Description">
    <hr/>
    <p style="font-size:60%;">Meta data:
      Title: <xsl:value-of select="dc:title"/> -
      Creator: <xsl:value-of select="dc:creator"/> -
      Format: <xsl:value-of select="dc:format"/> -
      Copyright: <xsl:value-of select="dc:rights"/>
    </p>
  </xsl:template>

  <!-- HTML tags and contents are just copied -->
  <xsl:template match="h:*">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
</source>
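
Because every literal result element in the stylesheet above has all of the stylesheet's namespace declarations in scope, the serialized <code>html</code> element will usually carry <code>xmlns:h</code>, <code>xmlns:rdf</code>, <code>xmlns:dc</code> and <code>xmlns:my</code> declarations that the output never uses. One possible way to suppress them is the standard XSLT 1.0 attribute <code>exclude-result-prefixes</code>. The reduced stylesheet below is only a sketch under that assumption; it is not taken from the tested live files, and its two template rules are simplified stand-ins for the full set.

<source lang="XML">
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:h="http://www.w3.org/1999/xhtml"
    xmlns="http://www.w3.org/1999/xhtml"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:my="http://edutechwiki.unige.ch/XML"
    exclude-result-prefixes="h rdf dc my">

  <xsl:output method="xml" indent="yes"/>

  <!-- The prefixes are still needed for matching the input ... -->
  <xsl:template match="h:html">
    <html>
      <head>
        <title><xsl:value-of select="h:head/h:title"/></title>
      </head>
      <body>
        <xsl:apply-templates select="h:body/my:cd-list/my:cd"/>
      </body>
    </html>
  </xsl:template>

  <!-- ... but their declarations are no longer copied to the output -->
  <xsl:template match="my:cd">
    <p><xsl:value-of select="my:artist"/>: <xsl:value-of select="my:title"/></p>
  </xsl:template>

</xsl:stylesheet>
</source>

The XHTML namespace itself should still be declared on the output, since the result elements belong to it; only the unused prefixed declarations are dropped.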
 
== Using XSLT with documents that have a single namespace ==

The same principle applies: you must use prefixes in your XSLT code, even if the source document only declares a default namespace. See also [[XML Schema tutorial - Basics]]

XSLT stylesheet (excerpt):
<syntaxhighlight lang="XML">
<!-- No default namespace is declared here: with method="html" the
     literal result elements (html, head, body ...) must stay in no
     namespace, otherwise they would end up in the OMS namespace. -->
<xsl:stylesheet
    xmlns:oms="http://www.ibm.com/software/analytics/spss/xml/oms"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <xsl:output method="html" indent="yes"/>

  <xsl:template match="oms:outputTree">
    <html>
      <head>
        <title>SPSS Codebook</title>
      </head>
      <body bgcolor="#ffffff">
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>
.....
  <xsl:template match="oms:pivotTable/oms:dimension//oms:category[@text='Measurement']/oms:dimension/oms:category/oms:cell[@text='Nominal']">
    .........
    <xsl:apply-templates/>
  </xsl:template>
 
</syntaxhighlight>
 
XML input (excerpt of a codebook file generated by SPSS):
<syntaxhighlight lang="XML">
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="spss-codebook.xsl"?>
 
<outputTree
xmlns="http://www.ibm.com/software/analytics/spss/xml/oms"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.ibm.com/software/analytics/spss/xml/oms http://www.ibm.com/software/analytics/spss/xml/oms/spss-output-1.8.xsd">
 
 
  <command command="Codebook" displayOutlineValues="label" displayOutlineVariables="label" displayTableValues="label" displayTableVariables="label" lang="en" text="Codebook">
    <pivotTable subType="Variable Information" text="id">
      <dimension axis="row" text="Attributes">
        <group text="Standard Attributes">
          <category text="Position">
            <dimension axis="column" text="Values">
              <category text="Value">
                <cell number="1" text="1"/>
              </category>
            </dimension>
          </category>
 
.......
</syntaxhighlight>
 
A live example should be here: http://tecfa.unige.ch/proj/ccl/ILICS
 
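Alternatively, you could test element names with XPath functions instead of using prefixes.

The following two expressions are meant to retrieve the same elements. The second one has not been tested, and it is not strictly equivalent: it is an absolute path that spells out every step and leaves out the <code>[@text='Measurement']</code> test. Also note that <code>name()</code> returns the name as written in the source document, including any prefix, so <code>local-name()</code> is the more robust choice.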
<syntaxhighlight lang="XML">
oms:pivotTable/oms:dimension//oms:category[@text='Measurement']/oms:dimension/oms:category/oms:cell[@text='Nominal']
 
/*[name()='outputTree']/*[name()='command']/*[name()='pivotTable']/*[name()='dimension']/*[name()='group']/*[name()='category']/*[name()='dimension']/*[name()='category']/*[name()='cell'][@text='Nominal']
</syntaxhighlight>
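
If prefixes are to be avoided in a pattern, a somewhat safer variant of the function-based approach tests both the local name and the namespace URI. The following stylesheet is an untested sketch for the SPSS codebook above; the output markup (<code>span class="nominal"</code>) is a made-up placeholder, and unlike the first expression the pattern matches every such cell, not only those below the 'Measurement' category.

<syntaxhighlight lang="XML">
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Matches each "cell" element in the OMS namespace whose text
       attribute is 'Nominal', without declaring a prefix for it.
       The @text attribute is in no namespace, so it needs no test. -->
  <xsl:template match="*[local-name()='cell'
      and namespace-uri()='http://www.ibm.com/software/analytics/spss/xml/oms']
      [@text='Nominal']">
    <span class="nominal"><xsl:value-of select="@text"/></span>
  </xsl:template>

</xsl:stylesheet>
</syntaxhighlight>

Such patterns quickly become hard to read, which is why declaring a prefix for the namespace, as in the stylesheet excerpt above, is usually the more maintainable option.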


== Links ==
; XSLT and namespaces
* [http://www.jenitennison.com/xslt/namespaces.xml Handling namespaces] (Jeni Tennison; her site is a good resource for all XSLT problems).
* [http://www.ibm.com/developerworks/library/x-xsltmistakes/ Avoid common XSLT mistakes] by Jirka Kosek, Dec 2008 (retrieved 4/2013)
; RDF and Dublin Core (older version)
* http://dublincore.org/documents/dcmes-xml/
* http://www.w3schools.com/rdf/rdf_dublin.asp


[[Category: XML]]

Latest revision as of 13:42, 4 October 2015
