XSLT for compound documents tutorial

The educational technology and digital learning wiki
Revision as of 13:42, 4 October 2015 by Daniel K. Schneider

Draft

Introduction

In principle, it should be easy to transform so-called compound documents with XSLT. In practice it is not, because (a) documentation can only be found by searching specialized XML web sites and (b) there are some very tricky issues. For now, this article only includes an example that demonstrates the principle, together with a working "live" version.

Learning goals
  • Learn how to create XSLT that can handle XML documents that combine several vocabularies, for example combined XHTML + RDF/dc + your own XML documents
Prerequisites
Level and target population
  • Intermediate XML/XSLT users
Remarks
  • Only works with more recent versions of Internet Explorer, i.e. version 9 or later. Should work well with older Firefox versions. Not tested with Safari.

XSLT can handle XML documents that include more than one namespace.

Principles:

  • Declare all the namespaces found in the XML document at the top of the XSLT stylesheet
  • If you produce XHTML from an input that also contains XHTML, you must declare the XHTML namespace twice: as default namespace (for the elements you write to the output) and with a prefix (for the XPath expressions in the XSLT rules)
  • Every element name in an XPath expression must use a prefix, and that includes the XHTML ones!

Warning:

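  • The namespace URIs (URNs or URLs) must be identical, character by character, in the XML and in the XSLT. One spelling mistake and nothing will match
  • Really, every element name in an XPath expression must have a prefix, e.g. in "match" and "select" attributes. Attribute names only need a prefix if the attribute itself is in a namespace (e.g. rdf:about). See the example below.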
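
The rules above can be illustrated with a minimal stylesheet sketch. It is not part of the original example: the three template rules and the class value are made up for illustration, and only the namespace URIs are the real ones. It assumes an input document whose elements are in the XHTML namespace.

```xml
<?xml version="1.0"?>
<!-- Illustrative sketch only (hypothetical rules, real namespace URIs) -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:h="http://www.w3.org/1999/xhtml"
    xmlns="http://www.w3.org/1999/xhtml">

  <!-- Never fires on XHTML input: in XPath 1.0 an unprefixed name means
       "element p in no namespace"; the default namespace declared on
       xsl:stylesheet is ignored inside match and select expressions. -->
  <xsl:template match="p">
    <p>unprefixed match, not reached for XHTML input</p>
  </xsl:template>

  <!-- Fires: h:p expands to the same namespace URI + local name as the
       p elements of the input. The literal <p> written here ends up in
       the XHTML namespace because of the default namespace declaration. -->
  <xsl:template match="h:p">
    <p class="matched"><xsl:apply-templates/></p>
  </xsl:template>

  <!-- The href attribute is in no namespace, so @href takes no prefix. -->
  <xsl:template match="h:a">
    <a href="{@href}"><xsl:value-of select="."/></a>
  </xsl:template>

</xsl:stylesheet>
```

If a rule silently does nothing, comparing the prefix declarations of the stylesheet with the namespace declarations of the input is usually the first thing to check.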

The XSLT engine of your browser may not be able to handle namespaces. In that case, transform in your XML editor or use a server-side solution. E.g. in PHP, do it like this:

<?php
// Made by DKS in 2005, still works in 2015. Replace "YOUR" with your file names.
// Requires the PHP "dom" and "xsl" extensions.
error_reporting(E_ALL);

$xml_file = 'YOUR.xml';
$xsl_file = 'YOUR.xsl';

// load the XML file (after testing that it exists)
// note the double quotes: single quotes would print the variable name literally
if (!file_exists($xml_file)) exit("Failed to open $xml_file");
$dom_object = new DOMDocument();
$dom_object->load($xml_file);

// create a DOM object for the XSLT stylesheet
if (!file_exists($xsl_file)) exit("Failed to open $xsl_file");
$xsl_obj = new DOMDocument();
$xsl_obj->load($xsl_file);

// configure the transformer and run the transformation
$proc = new XSLTProcessor();
$proc->importStylesheet($xsl_obj); // attach the XSL rules
$html_fragment = $proc->transformToXml($dom_object);
print($html_fragment);

Examples

XHTML with RDF, Dublin core and our own soup

Tested in April 2013 with IE9, Chrome and Firefox 20 under Windows 7. In principle, this should work with all modern browsers.

Live files:

XML input:

<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

<?xml-stylesheet href="compound-cd-list.xsl" version="1.0" type="text/xsl"?>

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
	<head>
		<title>Compound XML document demo</title>
	</head>
	<body>
		<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" 
			     xmlns:dc="http://purl.org/dc/elements/1.1/">
			<dc:Description rdf:about="http://edutechwiki.unige.ch/XSLT_for_compound_documents_tutorial">
				<dc:title>XSLT for compound documents demo</dc:title>
				<dc:creator>DKS</dc:creator>
				<dc:format>XHTML + private XML + DC</dc:format>
				<dc:rights>Free as in free beer</dc:rights>
			</dc:Description>
		</rdf:RDF>
		<p>This is an XML document with a compound vocabulary, 
		   i.e. XHTML + CD-list + Dublin Core metadata.
		   Read the <a href="http://edutechwiki.unige.ch/en/XSLT_for_compound_documents_tutorial">XSLT for compound documents tutorial</a></p>
		<p>Stuff below belongs to a "my" namespace. Stuff at the bottom are "dc" contents. Output is ugly, no time for styling - DKS/4/2013.</p>
		<hr/>
		<my:cd-list xmlns:my="http://edutechwiki.unige.ch/XML">
			<my:title>My (reduced) Hard Bop list</my:title>
			<my:cd>
				<my:artist>John Coltrane</my:artist>
				<my:title>Blue Train</my:title>
				<my:genre>Jazz</my:genre>
				<my:description>From Wikipedia: ...... </my:description>
				<my:track-list>
					<my:track no="1">
						<my:title>Blue Train</my:title>
						<my:artist>John Coltrane</my:artist>
						<my:genre>Blues</my:genre>
					</my:track>
					<my:track no="2">
						<my:title>Moment's Notice</my:title>
						<my:artist>John Coltrane</my:artist>
						<my:genre>Hard Bop</my:genre>
					</my:track>
				</my:track-list>
			</my:cd>
			<my:cd>
				<my:artist>Art Blakey</my:artist>
				<my:title>Moanin'</my:title>
			</my:cd>
		</my:cd-list>
	</body>
</html>

XSLT file:

<?xml version="1.0"?>

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
	xmlns:h="http://www.w3.org/1999/xhtml" 
	xmlns="http://www.w3.org/1999/xhtml" 
	xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" 
	xmlns:dc="http://purl.org/dc/elements/1.1/" 
	xmlns:my="http://edutechwiki.unige.ch/XML" 
	version="1.0">
	
	<xsl:output method="xml" 
		doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" 
		doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN" indent="yes"/>

	<xsl:template match="h:html">
		<html>
			<head>
				<title>
					<xsl:value-of select="h:head/h:title"/>
				</title>
			</head>
			<body bgcolor="#FFFFFF">
				<xsl:apply-templates select="h:body"/>
			</body>
		</html>
	</xsl:template>
	
	<xsl:template match="h:body">
		<!-- first, all XHTML elements -->
		<xsl:apply-templates select="h:*"/>
		<!-- then the CD list -->
		<xsl:apply-templates select="my:cd-list"/>
		<!-- metadata at the end; skip the RDF wrapper element -->
		<xsl:apply-templates select="rdf:RDF/dc:Description"/>
	</xsl:template>
	
	<!-- CD list contents -->
	<xsl:template match="my:cd-list">
		<h1><xsl:value-of select="my:title"/></h1>
		<xsl:apply-templates select="my:cd"/>
	</xsl:template>
	
	<xsl:template match="my:cd">
		<h3><xsl:value-of select="my:artist"/>: 
		    <xsl:value-of select="my:title"/> - 
		    <xsl:value-of select="my:genre"/>
		</h3>
		<p><xsl:value-of select="my:description"/></p>
		<xsl:apply-templates select="my:track-list"/>
	</xsl:template>
	
	<xsl:template match="my:track-list">
		<ol>
			<xsl:apply-templates select="my:track"/>
		</ol>
	</xsl:template>

	<xsl:template match="my:track">
		<li>
			<xsl:value-of select="my:title"/> -
			<xsl:value-of select="my:artist"/> -
			<xsl:value-of select="my:genre"/>
		</li>
	</xsl:template>
	
	
	<!-- metadata -->
	<xsl:template match="rdf:RDF/dc:Description">
	    <hr/>
	     <p style="font-size:60%;">Meta data:
	     Title:<xsl:value-of select="dc:title"/> -
	     Creator: <xsl:value-of select="dc:creator"/> -
	     Format: <xsl:value-of select="dc:format"/> -
	     Copyright: <xsl:value-of select="dc:rights"/>
	     </p>
	</xsl:template>	
	
  <!-- HTML tags and contents are just copied -->
	
  <xsl:template match="h:*">
     <xsl:copy>
         <xsl:copy-of select="@*"/>
       <xsl:apply-templates/>
     </xsl:copy>
   </xsl:template>
   
</xsl:stylesheet>
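
One side effect of declaring every input namespace on xsl:stylesheet is that the literal result elements (here the html element) carry all of these declarations into the output. A possible refinement, not used in the stylesheet above, is the standard exclude-result-prefixes attribute. The following reduced sketch assumes the same input file; its single template is illustrative only and does not replace the rules above.

```xml
<?xml version="1.0"?>
<!-- Sketch: same namespace declarations as above, plus exclude-result-prefixes -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:h="http://www.w3.org/1999/xhtml"
    xmlns="http://www.w3.org/1999/xhtml"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:my="http://edutechwiki.unige.ch/XML"
    exclude-result-prefixes="h rdf dc my">

  <!-- Without exclude-result-prefixes, the <html> element written below
       would be serialized with xmlns:h, xmlns:rdf, xmlns:dc and xmlns:my
       declarations that the output does not need. -->
  <xsl:template match="/h:html">
    <html>
      <head>
        <title><xsl:value-of select="h:head/h:title"/></title>
      </head>
      <body>
        <!-- Every step of the path carries the prefix of its own namespace -->
        <h1><xsl:value-of select="h:body/my:cd-list/my:title"/></h1>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```

The output elements stay in the XHTML namespace, since that is the namespace of the literal elements themselves; only the unused prefix declarations are dropped.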

Using XSLT with documents that have a single namespace

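The same principle applies when the whole document sits in one (default) namespace: you must still declare that namespace with a prefix and use the prefix in your XSLT code. See also XML Schema tutorial - Basics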

XSL

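<!-- The OMS namespace is declared with a prefix only. Declaring it as default
     namespace too would put the literal <html> output elements into the OMS namespace. -->
<xsl:stylesheet 
    xmlns:oms="http://www.ibm.com/software/analytics/spss/xml/oms"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">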
  
  <xsl:output method="html" indent="yes"/>

  <xsl:template match="oms:outputTree">
    <html>
      <head> 
	<title>SPSS Codebook</title> 
      </head>
      <body bgcolor="#ffffff">
	<xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>
.....
  <xsl:template match="oms:pivotTable/oms:dimension//oms:category[@text='Measurement']/oms:dimension/oms:category/oms:cell[@text='Nominal']">
    .........
    <xsl:apply-templates/>
  </xsl:template>

XML (a codebook file in XML generated by SPSS)

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="spss-codebook.xsl"?>

<outputTree 
 xmlns="http://www.ibm.com/software/analytics/spss/xml/oms"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:schemaLocation="http://www.ibm.com/software/analytics/spss/xml/oms http://www.ibm.com/software/analytics/spss/xml/oms/spss-output-1.8.xsd">


  <command command="Codebook" displayOutlineValues="label" displayOutlineVariables="label" displayTableValues="label" displayTableVariables="label" lang="en" text="Codebook">
    <pivotTable subType="Variable Information" text="id">
      <dimension axis="row" text="Attributes">
	<group text="Standard Attributes">
	  <category text="Position">
	    <dimension axis="column" text="Values">
	      <category text="Value">
		<cell number="1" text="1"/>
	      </category>
	    </dimension>
	  </category>

.......

A live example should be here: http://tecfa.unige.ch/proj/ccl/ILICS

Alternatively, you could avoid prefixes by testing element names with XPath functions such as name().

The following two expressions select roughly the same elements. The second one was not tested by the author; it spells out the full path from the root and omits the @text='Measurement' test.

oms:pivotTable/oms:dimension//oms:category[@text='Measurement']/oms:dimension/oms:category/oms:cell[@text='Nominal']

/*[name()='outputTree']/*[name()='command']/*[name()='pivotTable']/*[name()='dimension']/*[name()='group']/*[name()='category']/*[name()='dimension']/*[name()='category']/*[name()='cell'][@text='Nominal']
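
Note that name() returns the element name as written in the source, including any prefix, so an expression based on it only works as long as the input uses no prefix for the OMS namespace. A slightly more robust variant tests local-name(), optionally combined with namespace-uri(). The stylesheet below is a hypothetical sketch that has not been run against the full SPSS file; the output message is made up.

```xml
<?xml version="1.0"?>
<!-- Sketch: matching by local name, so no prefix declaration is needed -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Override the built-in rule that copies all text nodes to the output -->
  <xsl:template match="text()"/>

  <!-- Matches every element whose local name is "cell" and whose namespace
       URI is the OMS one, whatever prefix (or none) the input uses. -->
  <xsl:template match="*[local-name()='cell'
                         and namespace-uri()='http://www.ibm.com/software/analytics/spss/xml/oms']
                        [@text='Nominal']">
    <xsl:text>Nominal cell found&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

Prefixed names remain easier to read; the function-based form is mostly useful when the namespace of the input is not known in advance.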

Links

XSLT and namespaces
RDF and Dublin Core (older version)