XPath tutorial - basics

The educational technology and digital learning wiki
Revision as of 15:23, 4 August 2009 by WikiSysop


Introduction

This is an introductory XPath tutorial. Cut/paste from slides with a few fixes. Needs more work ... - Daniel K. Schneider

Prerequisites

  • Editing XML (being able to use a simple DTD)
  • Introductory XSLT (xsl:template, xsl:apply-templates and xsl:value-of)
  • Know about the role of XPath with respect to XSLT

Objectives

  • Better understand XPath expressions
  • Learn some XSLT programming constructions (conditions and loops)
  • Be able to cope with most XML to HTML transformations

Disclaimer

  • There may be typos (sorry) and mistakes (sorry again)
  • Please also consult a textbook!

Introduction to XML Path Language

Definition and history

  • XPath is a language for addressing parts of an XML document
  • In support of this primary purpose, it also provides basic facilities for manipulation of strings, numbers and booleans.
  • XPath uses a compact non-XML syntax (to facilitate use of XPath within URIs and XML attribute values).
  • XPath gets its name from its use of a path notation as in URLs for navigating through the hierarchical structure of an XML document.
  • XPath was defined at the same time as XSLT (November 1999)
  • Initially, it was developed to support XSLT and XPointer (XML Pointer Language used for XLink, XInclude, etc.)

XSLT, XQuery and XPath

  • Each time a given XSLT or XQuery instruction needs to address (refer to) parts of an XML document we use XPath expressions.
  • XPath expressions can also contain functions, simple math and boolean expressions

With XSLT, XPath expressions are typically used in match, select and test attributes:

XPath expressions in an XSLT template

The XPath Syntax and the document model

XPath Syntax

XPath expressions (at least the simple ones) look a bit like file paths. E.g.

/section/title

means: find all title elements that are children of the section root element

Primary (relatively simple) XPath expressions

Result of an XPath expression
  • Can be various things, e.g. a set of nodes (possibly containing a single node), a number, a string or a boolean
  • Most often the result is a set of nodes
There are two notations for location paths
  1. abbreviated (fewer options available)
    • e.g. para is identical to child::para
  2. unabbreviated (not presented in this tutorial!)
    • e.g. child::para selects the para element children of the current context node.
The formal specification of XPath
  • is fairly complex: its grammar has about 39 rules and is difficult to understand
  • Some expressions shown here are beyond the scope of this tutorial, don't panic!

The document model of XPath

  • XPath sees an XML document as a tree structure
  • Each piece of information (XML elements, attributes, text, etc.) is called a node. This is fairly similar to the W3C DOM model that an XML or XSLT processor would use.
Nodes that XPath can see
  • root node
    • ATTENTION: The root node is not the XML root element. The root element is a child of the root node, and so are, for example, processing instructions like a stylesheet declaration.
  • Elements, attributes and text
  • Special nodes like comments, processing instructions and namespace nodes.
Nodes XPath can't see
  • XPath looks at the parsed document, therefore it can't see entity references or the document type declaration.
The XML context
  • What a given XPath expression means is always defined by its context, i.e. the current (context) node in the XML tree

Element Location Paths

We present a few expressions for locating nodes. This list is not complete, and we use the abbreviated syntax.

Document root node: returns the root node of the document (which is not the XML root element!)
/
Direct child element

XML_element_name

Direct child of the root node

/XML_element_name

Child of a child

XML_element_name/XML_element_name

Descendant of the root

//XML_element_name

Descendant of a node

XML_element_name//XML_element_name

Parent of a node

..

A far cousin of a node

../../XML_element_name/XML_element_name/XML_element_name
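You can also try the location paths above outside XSLT. The following is a minimal sketch that assumes Python's standard-library module xml.etree.ElementTree. ElementTree implements only a small subset of XPath. It has no separate root node and refuses absolute paths on an element, so the paths are written relative to the project element, and .//title stands in for //title. The XML is a trimmed excerpt of xpath-jungle.xml. The helper texts() is only an illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed excerpt of xpath-jungle.xml
XML = """<project>
  <title>The Xpath project</title>
  <participants>
    <participant><FirstName>Daniel</FirstName></participant>
    <participant><FirstName>Jonathan</FirstName></participant>
  </participants>
  <problems>
    <problem><title>Initial problem</title></problem>
    <problem><title>Next problem</title></problem>
  </problems>
</project>"""

# fromstring() returns the <project> root *element*; ElementTree has no XPath root node
project = ET.fromstring(XML)

def texts(elements):
    """Hypothetical helper: text content of each selected element, in document order."""
    return [e.text for e in elements]

# title: direct child element of the context node (<project>)
child_titles = texts(project.findall("title"))
# problems/problem/title: child of a child of a child
problem_titles = texts(project.findall("problems/problem/title"))
# .//title: every title descendant, ElementTree's stand-in for //title
all_titles = texts(project.findall(".//title"))
# participants/participant/..: the parent of the selected nodes (returned once)
parents = [e.tag for e in project.findall("participants/participant/..")]
# a "far cousin": go up twice, then down another branch
cousins = texts(project.findall("participants/participant/../../problems/problem/title"))

print(child_titles, problem_titles, all_titles, parents, cousins)
```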

Example: Extracting titles from an XML file with XSLT

An XML document (file xpath-jungle.xml, used throughout the rest of this tutorial)

<?xml version="1.0"?>
<project>
 <title>The Xpath project</title>
 <participants>
  <participant>
    <FirstName>Daniel</FirstName>
    <qualification>8</qualification>
    <description>Daniel will be the tutor</description>
    <FoodPref picture="dolores_001.jpg">Sea Food</FoodPref>
  </participant>
  <participant>
    <FirstName>Jonathan</FirstName>
    <qualification>5</qualification>
    <FoodPref picture="dolores_002.jpg">Asian</FoodPref>
  </participant>
  <participant>
   <FirstName>Bernadette</FirstName>
   <qualification>8</qualification>
    <description>Bernadette is an arts major</description>
  </participant>
  <participant>
   <FirstName>Nathalie</FirstName>
   <qualification>2</qualification>
  </participant>
 </participants>
 <problems>
  <problem>
    <title>Initial problem</title>
    <description>We have to learn something about Location Path</description>
    <difficulty level="5">This problem should not be too hard</difficulty>
  </problem>
  <solutions>
   <item val="low">Buy a XSLT book</item>
   <item val="low">Find an XSLT website</item>
   <item val="high">Register for a XSLT course and do exercices</item>
  </solutions>
   <problem>
    <title>Next problem</title>
    <description>We have to learn something about predicates</description>
    <difficulty level="6">This problem is a bit more difficult</difficulty>
  </problem>
  <solutions>
   <item val="low">Buy a XSLT book</item>
   <item val="medium">Read the specification and do some exercises</item>
   <item val="high">Register for a XPath course and do exercices</item>
  </solutions>
 </problems>
</project>
Task

We would like to get a simple list of problem titles

Solution

XSLT template (file: xpath-jungle-1.xsl)

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <xsl:output method="html"/>

  <xsl:template match="/project">
    <html>
      <body bgcolor="#FFFFFF">
        <h1><xsl:value-of select="title" /></h1>
        Here are the titles of our problems: <ul>
        <xsl:apply-templates select="problems/problem" /> 
      </ul>
      </body>
    </html>
  </xsl:template>

<xsl:template match="problems/problem">
  <li><xsl:value-of select="title" /></li>
</xsl:template>

</xsl:stylesheet>
(1) XSLT template for the project element
  • The XPath of the "match" means: applies to the project element node, a child of the root node
  • The execution context of this template is therefore the element "project"
  • xsl:apply-templates will select a rule for the "problem" elements found with problems/problem.
(2) XSLT template for the problem element
  • The second rule is triggered by the first rule, because the problems/problem nodes selected there match its pattern (problem elements that are children of a problems element)
(3) Result HTML
<html>
   <body bgcolor="#FFFFFF">
      <h1>The Xpath project</h1>
      Here are the titles of our problems:
      <ul>
         <li>Initial problem</li>
         <li>Next problem</li>
      </ul>
   </body>
</html>

Attribute Location Paths

To find an attribute of the current context element

@attribute_name

Example:

@val
Find attributes of an element in a longer location path starting from root

/element_name/element_name/@attribute_name

Example:

/project/problems/solutions/item/@val
Find attributes in the whole document

//@attribute_name
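ElementTree (assumed here again as a stand-in for a full XPath engine) cannot select attribute nodes. Its paths only return elements, and you read attribute values with Element.get() or the attrib dictionary. This sketch emulates the three attribute paths above on an excerpt of xpath-jungle.xml.

```python
import xml.etree.ElementTree as ET

# Excerpt of xpath-jungle.xml: one participant and the first <solutions> block
XML = """<project>
  <participants>
    <participant><FoodPref picture="dolores_001.jpg">Sea Food</FoodPref></participant>
  </participants>
  <problems>
    <solutions>
      <item val="low">Buy a XSLT book</item>
      <item val="low">Find an XSLT website</item>
      <item val="high">Register for a XSLT course and do exercices</item>
    </solutions>
  </problems>
</project>"""

project = ET.fromstring(XML)

# @val with an <item> as context node: read the attribute of the context element
first_item = project.find("problems/solutions/item")
context_val = first_item.get("val")

# /project/problems/solutions/item/@val: select the elements, then read their attribute
all_vals = [item.get("val") for item in project.findall("problems/solutions/item")]

# //@picture: every picture attribute anywhere below (and including) <project>
pictures = [e.get("picture") for e in project.iter() if "picture" in e.attrib]

print(context_val, all_vals, pictures)
```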

Example: Make an HTML img tag from an attribute

XML fragment

Same as above

Task

Display a list of first names plus their food preferences

XSLT (File xpath-jungle-2.xsl)
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <xsl:output method="html"/>

  <xsl:template match="/">
    <html>
      <body bgcolor="#FFFFFF">
        <h1>What do we know about our participants ?</h1>
        Here are some food preferences: <ul>
        <xsl:apply-templates select=".//participant" /> 
      </ul>
      </body>
    </html>
  </xsl:template>

<xsl:template match="participant">
  <li><xsl:value-of select="FirstName"/>
  <xsl:apply-templates select="FoodPref"/>
  </li>
</xsl:template>

<xsl:template match="FoodPref">
  prefers <xsl:value-of select="."/>. 
  <img src="{@picture}"/> <br clear="all"/>
</xsl:template>

</xsl:stylesheet>

  • The second rule displays the names of participants and launches a template for FoodPref
  • Note: Not all participants have a FoodPref element. If it is absent, select="FoodPref" finds nothing and nothing is output.
  • The third rule (FoodPref) displays the text (contents) of FoodPref and then makes an HTML img tag, using the attribute value template {@picture} to copy the picture attribute into src


Parts of the result
  <h1>What do we know about our participants ?</h1>
      Here are some food preferences:
      <ul>
         <li>Daniel prefers Sea Food.
            <img src="dolores_001.jpg"><br clear="all"></li>
         <li>Jonathan
            prefers Asian.
            <img src="dolores_002.jpg"><br clear="all"></li>
         <li>Bernadette</li>
         <li>Nathalie</li>
      </ul>

Location wildcards

  • Sometimes (but not often!), it is useful to work with wildcards
  • Only one template rule is applied per node. Rules with wildcards have a lower default priority than rules that name an element, and the built-in rules lose to any rule of your own. This is why "your rules" are applied instead of the system defaults.
Find all child element nodes

*

Find all child nodes (including comments, etc.)

node()

Find all attributes of the context element

@*

Find all child text nodes

text()
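A rough ElementTree counterpart of these wildcards is sketched below. It is an illustration under assumptions, not XPath itself. ElementTree's default parser drops comments and keeps text in .text/.tail instead of separate text nodes, so node() and text() have no exact equivalent. The comment in the sample was added for illustration.

```python
import xml.etree.ElementTree as ET

# One participant from xpath-jungle.xml, plus a comment added for illustration
XML = ('<participant><FirstName>Daniel</FirstName><!-- the tutor -->'
       '<FoodPref picture="dolores_001.jpg">Sea Food</FoodPref></participant>')

# The default parser discards the comment, so node() cannot be mirrored here
participant = ET.fromstring(XML)

# *: all child elements
child_tags = [child.tag for child in participant.findall("*")]

# @*: all attributes of an element (ElementTree stores them in a dict, not as nodes)
food_attributes = dict(participant.find("FoodPref").attrib)

# */text(): no text nodes in ElementTree; an element's leading text is in .text
child_texts = [child.text for child in participant.findall("*")]

print(child_tags, food_attributes, child_texts)
```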

In XSLT there are built-in default rules that rely on wildcards

This rule applies to the root node and all elements

<xsl:template match="*|/">
  <xsl:apply-templates/>
</xsl:template>

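Text and attribute values are just copied (attributes only when a template selects them explicitly, since xsl:apply-templates without select only processes child nodes)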

<xsl:template match="text()|@*">
  <xsl:value-of select="."/>
</xsl:template>
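Together, these two built-in rules simply output all the text of a document. The recursive function below is a hypothetical Python emulation of that behaviour on an ElementTree element (builtin_rules is not a library function). It skips attributes because xsl:apply-templates without select only visits child nodes. The sample document is made up.

```python
import xml.etree.ElementTree as ET

def builtin_rules(element):
    """Hypothetical emulation of XSLT's built-in template rules.

    match="*|/"       -> process the children (text and elements, never attributes)
    match="text()|@*" -> copy the text to the output
    """
    output = [element.text or ""]              # text before the first child element
    for child in element:
        output.append(builtin_rules(child))    # <xsl:apply-templates/> on the child element
        output.append(child.tail or "")        # text that follows the child element
    return "".join(output)

doc = ET.fromstring('<problem level="5"><title>Initial problem</title>: <b>hard</b>?</problem>')
result = builtin_rules(doc)
print(result)  # the level attribute does not appear in the output
```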

XPaths with predicates

  • A predicate is an expression that can be true or false
  • It is appended within [...] to a location step and filters (refines) its results
  • More than one predicate can be appended to and within (!) a location path
  • Expressions can contain mathematical or boolean operators
Find element number N in a list

XML_element_name [ N ]

/project/participants/participant[2]
/project/participants/participant[2]/FirstName
Find elements that have a given attribute

XML_element_name [ @attribute_name ]

Find elements that have a given element as child

XML_element_name [ XML_element_name ]

//participant[FoodPref]
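ElementTree supports these three kinds of predicates (position, attribute presence, child presence), so you can check them directly. Here is a small sketch on the participants of xpath-jungle.xml, assuming Python's xml.etree.ElementTree:

```python
import xml.etree.ElementTree as ET

# Participants of xpath-jungle.xml (qualifications and descriptions left out)
XML = """<project>
  <participants>
    <participant><FirstName>Daniel</FirstName><FoodPref picture="dolores_001.jpg">Sea Food</FoodPref></participant>
    <participant><FirstName>Jonathan</FirstName><FoodPref picture="dolores_002.jpg">Asian</FoodPref></participant>
    <participant><FirstName>Bernadette</FirstName></participant>
    <participant><FirstName>Nathalie</FirstName></participant>
  </participants>
</project>"""

project = ET.fromstring(XML)

# participant[2]/FirstName: positions start at 1, as in XPath
second = project.find("participants/participant[2]/FirstName").text

# participant[FoodPref]: participants that have a FoodPref child
with_food = [p.findtext("FirstName") for p in project.findall(".//participant[FoodPref]")]

# FoodPref[@picture]: elements that carry a given attribute
with_picture = [f.text for f in project.findall(".//FoodPref[@picture]")]

print(second, with_food, with_picture)
```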

Mathematical expressions
  • Use the standard operators, except that division is written div instead of / (which already separates location steps)
+ - * div mod
  • mod (the remainder of a division) is interesting if you want to display a long list in table format
5 mod 2 returns 1, and so do 7 mod 2 and 3 mod 2
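The table idea can be sketched in Python as follows. xpath_mod and table_rows are hypothetical helper names. The sketch uses math.fmod because, like XPath's mod, it truncates, so the result keeps the sign of the dividend (Python's own % does not). The row test (position - 1) mod n = 0 is equivalent to the common position() mod n = 1 for n > 1, and it also works with a single column.

```python
import math

def xpath_mod(a, b):
    """XPath 1.0 'mod': remainder of a truncating division, like math.fmod.

    The result has the sign of the dividend, unlike Python's % for negative numbers.
    """
    return math.fmod(a, b)

def table_rows(items, per_row):
    """Group items into rows, starting a new row when (position - 1) mod per_row = 0."""
    rows = []
    for position, item in enumerate(items, start=1):   # 1-based, like position()
        if xpath_mod(position - 1, per_row) == 0:
            rows.append([])
        rows[-1].append(item)
    return rows

names = ["Daniel", "Jonathan", "Bernadette", "Nathalie"]
print(table_rows(names, 3))
```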
Boolean operators (comparison, and, or)
  • List of operators, from highest to lowest precedence
<=, <, >=, >
=, !=
and
or
  • Inside XSLT attributes, < must be escaped as &lt;


Examples
  • Return all exercise titles with a note greater than 5.

//exercise[note>5]/title

  • Find elements that have a given attribute with a given value

XML_element_name [ @attribute_name = 'value']

//solutions/item[@val="low"]
  • Example XSLT template that will match all item elements with val="low".
<xsl:template match="//item[@val='low']">
   <xsl:value-of select="." />
</xsl:template>
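ElementTree handles attribute = value predicates such as [@val='low'] natively, but it has no numeric comparison operators, so this sketch evaluates the comparison in Python. Because xpath-jungle.xml has no exercise elements, the //exercise[note>5]/title example is adapted, as an assumption, to participants with a qualification above 5.

```python
import xml.etree.ElementTree as ET

# Excerpt of xpath-jungle.xml
XML = """<project>
  <participants>
    <participant><FirstName>Daniel</FirstName><qualification>8</qualification></participant>
    <participant><FirstName>Jonathan</FirstName><qualification>5</qualification></participant>
    <participant><FirstName>Bernadette</FirstName><qualification>8</qualification></participant>
    <participant><FirstName>Nathalie</FirstName><qualification>2</qualification></participant>
  </participants>
  <problems>
    <solutions>
      <item val="low">Buy a XSLT book</item>
      <item val="low">Find an XSLT website</item>
      <item val="high">Register for a XSLT course and do exercices</item>
    </solutions>
  </problems>
</project>"""

project = ET.fromstring(XML)

# //solutions/item[@val="low"]: supported directly by ElementTree
low_items = [i.text for i in project.findall(".//solutions/item[@val='low']")]

# //participant[qualification>5]/FirstName: no numeric comparisons in ElementTree,
# so the predicate is evaluated in Python (XPath would convert the text with number())
qualified = [p.findtext("FirstName")
             for p in project.iter("participant")
             if float(p.findtext("qualification")) > 5]

print(low_items, qualified)
```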
Note
Usually expressions also contain functions
  • Return the last five elements of a list
author[(last() - 4 <= position()) and (position() <= last())]
  • Return all participant nodes whose FirstName is longer than 7 characters:
//participant[string-length(FirstName) >= 8]

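You can mirror the logic of these two function-based predicates in plain Python. This is a sketch: last_n is a hypothetical helper and the author list is made-up data. last() corresponds to the length of the node list, and position() corresponds to a 1-based index.

```python
def last_n(nodes, n):
    """Keep nodes whose 1-based position satisfies last() - (n - 1) <= position() <= last()."""
    last = len(nodes)                              # last() = size of the context node list
    return [node for position, node in enumerate(nodes, start=1)
            if last - (n - 1) <= position <= last]

authors = ["A", "B", "C", "D", "E", "F", "G"]      # made-up list of author names
first_names = ["Daniel", "Jonathan", "Bernadette", "Nathalie"]

# author[(last() - 4 <= position()) and (position() <= last())]
last_five = last_n(authors, 5)
# //participant[string-length(FirstName) >= 8]
long_names = [name for name in first_names if len(name) >= 8]

print(last_five, long_names)
```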
Example: Retrieve selected elements

The XSLT stylesheet (file xpath-jungle-3.xsl)
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <xsl:output method="html"/>

  <xsl:template match="/">
    <html>
      <body bgcolor="#FFFFFF">
        <h1>Retrieve selected elements</h1>
        Here is the name of participant two:
        <ul><xsl:apply-templates select="/project/participants/participant[2]/FirstName"/></ul>
        Here are all participant's firstnames that have a food preference:
        <ul><xsl:apply-templates select="//participant[FoodPref]/FirstName"/></ul>
        Here are all items that have a value of "high"
        <ul><xsl:apply-templates select="//solutions/item[@val='high']"/></ul>
      </body>
    </html>
  </xsl:template>

<!-- FirstName and item elements both become list items -->
<xsl:template match="FirstName|item">
  <li><xsl:value-of select="."/></li>
</xsl:template>

</xsl:stylesheet>
HTML result
<html>
   <body bgcolor="#FFFFFF">
      <h1>Retrieve selected elements</h1>
      Here is the name of participant two:
      <ul>
         <li>Jonathan</li>
      </ul>
      Here are all participant's firstnames that have a food preference:
      <ul>
         <li>Daniel</li>
         <li>Jonathan</li>
      </ul>
      Here are all items that have a value of "high"
      <ul>
         <li>Register for a XSLT course and do exercices</li>
         <li>Register for a XPath course and do exercices</li>
      </ul>
   </body>
</html>

XPath functions

  • XPath defines a certain number of functions
    • You can recognize a function because it has "()".
  • Functions are programming constructs that return various kinds of information, e.g.
    • true / false
    • a number
    • a string
    • a list of nodes
  • They are not always easy to understand...
  • There are restrictions on how you can use functions (stick to examples or the reference)
last()

last() returns the number of nodes in the current context (the context size)

position()

position() returns the position (starting at 1) of the context node within the current context, e.g. among the nodes selected by xsl:apply-templates

count(node-set)

count() returns the number of nodes in a node-set (usually selected with an XPath expression).

starts-with(string, string)

returns true if the first string starts with the second string

//participant[starts-with(FirstName,'Berna')]
contains(string, string)

returns true if the first string contains the second string

//participant[contains(FirstName,'nat')]
string-length(string)

returns the length of a string

number(string)

converts a string into a number (NaN if the string is not a number)

sum(node-set)

computes the sum of a given set of nodes.
The string value of each node is first converted with number()

round(number)

rounds a number to the nearest integer, e.g. 1.4 becomes 1 and 1.7 becomes 2 (2.5 becomes 3)

translate(string1, string2, string3)

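returns string1 with each character that occurs in string2 replaced by the character at the same position in string3; characters of string2 without a counterpart in string3 are removed. E.g. translate('bar','abc','ABC') returns 'BAr'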
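Python has direct counterparts for starts-with() and contains(), but translate() and round() need a little care. The helpers below are a hypothetical sketch of the XPath 1.0 behaviour described above. round() is restricted to finite numbers (NaN and infinity handling is omitted).

```python
import math

def xpath_translate(s, from_chars, to_chars):
    """Sketch of XPath 1.0 translate(): map each character of from_chars to the character
    at the same position in to_chars; characters without a counterpart are removed."""
    table = {}
    for i, ch in enumerate(from_chars):
        if ord(ch) in table:              # only the first occurrence of a character counts
            continue
        table[ord(ch)] = to_chars[i] if i < len(to_chars) else None   # None deletes it
    return s.translate(table)

def xpath_round(x):
    """Sketch of XPath 1.0 round() for finite numbers: ties go towards positive infinity."""
    below = math.floor(x)
    return below + 1 if x - below >= 0.5 else below

names = ["Daniel", "Jonathan", "Bernadette", "Nathalie"]
starts = [n for n in names if n.startswith("Berna")]    # starts-with(FirstName, 'Berna')
containing = [n for n in names if "nat" in n]           # contains(FirstName, 'nat')

print(xpath_translate("bar", "abc", "ABC"), xpath_round(2.5), starts, containing)
```

Python's built-in round() rounds halves to even (round(2.5) is 2), which is why xpath_round() is written by hand.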

Example: Computation of an average

  • We would like to compute the average of the participants' qualifications
<participant><FirstName>Daniel</FirstName>
             <qualification>8</qualification>
   </participant>
The XSLT stylesheet (file xpath-jungle-4.xsl)
  • We compute the sum of a node-set and then divide by the number of nodes
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <xsl:output method="html"/>

  <xsl:template match="/">
    <html>
      <body bgcolor="#FFFFFF">
        <h1>Qualification level of participants</h1>
        Average is
        <xsl:value-of select="sum(.//participant/qualification) div count(.//participant/qualification)"/>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
HTML result
<html>
   <body bgcolor="#FFFFFF">
      <h1>Qualification level of participants</h1>
      Average is
      5.75
   </body>
</html>
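You can sketch the same computation in Python with ElementTree. xpath_number() and xpath_sum() are hypothetical helpers that approximate number() and sum(). number() only accepts an optional minus sign, digits and an optional decimal point (no exponent, no plus sign), and it yields NaN otherwise. With the qualifications of xpath-jungle.xml the average is 5.75, as in the HTML result above.

```python
import math
import re
import xml.etree.ElementTree as ET

# Participants of xpath-jungle.xml, qualifications only
XML = """<participants>
  <participant><FirstName>Daniel</FirstName><qualification>8</qualification></participant>
  <participant><FirstName>Jonathan</FirstName><qualification>5</qualification></participant>
  <participant><FirstName>Bernadette</FirstName><qualification>8</qualification></participant>
  <participant><FirstName>Nathalie</FirstName><qualification>2</qualification></participant>
</participants>"""

# XPath 1.0 number syntax: optional whitespace, optional minus, digits with optional decimal point
NUMBER_RE = re.compile(r"\s*-?(\d+(\.\d*)?|\.\d+)\s*")

def xpath_number(text):
    """Sketch of number(): the numeric value of a string, NaN if it is not a valid XPath number."""
    return float(text) if NUMBER_RE.fullmatch(text) else math.nan

def xpath_sum(elements):
    """Sketch of sum(): add up number() of each node's string value."""
    return sum(xpath_number("".join(e.itertext())) for e in elements)

root = ET.fromstring(XML)
qualifications = root.findall(".//qualification")
average = xpath_sum(qualifications) / len(qualifications)   # sum(...) div count(...)
print(average)
```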

Example: Find first names containing 'nat'

The XSLT stylesheet (file xpath-jungle-5.xsl)
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <xsl:output method="html"/>

  <xsl:template match="/">
    <html>
      <body bgcolor="#FFFFFF">
        <h1>Do we have a "nat" ?</h1>
        First Names that contain "nat":
        <ul><xsl:apply-templates select=".//participant[contains(FirstName,'nat')]"/></ul>
        First Names that contain "nat" and "Nat":
        <ul><xsl:apply-templates select=".//participant[contains(translate(FirstName,'N','n'),'nat')]"/></ul>
      </body>
    </html>
  </xsl:template>

<xsl:template match="participant">
  <li><xsl:value-of select="FirstName"/></li>
</xsl:template>

</xsl:stylesheet>
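The second test works because translate() turns every N into n before contains() looks for 'nat'. The sketch below mirrors both tests in Python with str.translate. It also adds a common XPath 1.0 idiom, folding the whole A to Z alphabet, which handles any capital letter (ASCII only).

```python
first_names = ["Daniel", "Jonathan", "Bernadette", "Nathalie"]

# contains(FirstName, 'nat'): case-sensitive
exact = [n for n in first_names if "nat" in n]

# contains(translate(FirstName, 'N', 'n'), 'nat'): only the capital N is folded
n_folded = [n for n in first_names if "nat" in n.translate(str.maketrans("N", "n"))]

# contains(translate(FirstName, 'ABC...Z', 'abc...z'), 'nat'): whole ASCII alphabet folded
TO_LOWER = str.maketrans("ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz")
folded = [n for n in first_names if "nat" in n.translate(TO_LOWER)]

print(exact, n_folded, folded)
```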

Union of XPaths

  • A union (|) combines several XPath expressions: all the nodes selected by any of them are returned.
  • A typical example is the built-in default rule, which matches either the root node ("/") or any element ("*"):
<xsl:template match="*|/">
  <xsl:apply-templates/>
</xsl:template>
  • Often this is used to simplify apply-templates or even templates themselves. E.g. the following rule applies to both "description" and "para" elements.
<xsl:template match="para|description">
  <p><xsl:apply-templates/></p>
</xsl:template>
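ElementTree paths cannot contain |, but a union is easy to emulate: collect the nodes of every path, keep each node once, and return them in document order. union() below is a hypothetical helper, and the doc/para/description sample is made up.

```python
import xml.etree.ElementTree as ET

def union(context, *paths):
    """Hypothetical emulation of the XPath | operator: every node selected by any
    of the paths, once each, in document order."""
    selected = set()
    for path in paths:
        selected.update(context.findall(path))       # Elements hash by identity
    return [e for e in context.iter() if e in selected]

doc = ET.fromstring(
    "<doc><para>one</para><description>two</description><para>three</para></doc>")

# para|description, written with descendant paths
both = [e.text for e in union(doc, ".//para", ".//description")]
# overlapping paths: each para is still returned only once
no_duplicates = [e.text for e in union(doc, ".//para", "para")]

print(both, no_duplicates)
```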

List of commonly used XPath expressions

Syntax element  Type of path                           Example path        Example matches
name            child element name                     project             <project> ...... </project>
/               child / child                          project/title       <project> <title> ... </title>
/               root node
//              descendant                             project//title      <project><problem> <title>....</title>
                                                       //title             <root>... <title>..</title> (any place)
*               "wildcard"                             */title             <bla> <title>..</title> and <bli> <title>...</title>
|               "or" operator                          title|head          <title>...</title> or <head> ...</head>
*|/|@*          root node, any element, any attribute
.               current element                        .
..              parent element                         ../problem          <project>
@attr           attribute name                         @id                 <xyz id="test">...</xyz>
element/@attr   attribute of child                     project/@id         <project id="test" ...> ... </project>
@attr='value'   value of attribute                     list[@type='ol']    <list type="ol"> ...... </list>
position()      position within the context            position()
last()          number of elements within a context    last()
                                                       position()!=last()
