XPath tutorial - basics
Revision as of 15:23, 4 August 2009
Introduction
This is an introductory XPath tutorial. Cut/paste from slides with a few fixes. Needs more work ... - Daniel K. Schneider
Prerequisites
- Editing XML (being able to use a simple DTD)
- Introductory XSLT (xsl:template, xsl:apply-templates and xsl:value-of)
- Know about the role of XPath with respect to XSLT
Objectives
- Better understand XPath expressions
- Learn some XSLT programming constructions (conditions and loops)
- Be able to cope with most XML to HTML transformations
Disclaimer
- There may be typos (sorry) and mistakes (sorry again)
- Please also consult a textbook !
Introduction to XML Path Language
Definition and history
- XPath is a language for addressing parts of an XML document
- In support of this primary purpose, it also provides basic facilities for manipulation of strings, numbers and booleans.
- XPath uses a compact non-XML syntax (to facilitate use of XPath within URIs and XML attribute values).
- XPath gets its name from its use of a path notation as in URLs for navigating through the hierarchical structure of an XML document.
- XPath was defined at the same time as XSLT (November 1999)
- Initially, it was developed to support XSLT and XPointer (XML Pointer Language used for XLink, XInclude, etc.)
- Specifications
- XPath 1.0 http://www.w3.org/TR/xpath (nov 1999)
- Used by XSLT 1.0
- XPath 2.0 http://www.w3.org/TR/xpath20/ (Jan 2007)
- XPath 2.0 Functions and Operators http://www.w3.org/TR/xquery-operators/
- XPath 2.0 is a superset of XPath 1.0
- Used by XSLT 2.0 and XQuery ... and other specifications
XSLT, XQuery and XPath
- Each time a given XSLT or XQuery instruction needs to address (refer to) parts of an XML document we use XPath expressions.
- XPath expressions also can contain functions, simple math and boolean expressions
With XSLT, XPath expressions are typically used in match, select and test attributes.
The XPath Syntax and the document model
XPath syntax
XPath expressions (at least the simple ones) look a bit like file paths. E.g.
/section/title
means: find all title elements that are direct children of the section root element
- Primary (relatively simple) XPath expressions
- Result of an XPath expression
- Can be various things, e.g. sets of nodes, a single node, a number, etc
- Most often the result is a set of nodes
- There are two notations for location paths
- abbreviated (fewer options available)
- e.g. para is identical to child::para
- unabbreviated (not presented in this tutorial)
- e.g. "child::para" selects the para element children of the current context node.
- The formal specification of an XML Path
- is very complex, i.e. has about 39 clauses and is very difficult to understand
- Some expressions shown here are beyond the scope of this tutorial, don't panic!
The document model of XPath
- XPath sees an XML document as a tree structure
- Each piece of information (XML elements, attributes, text, etc.) is called a node. This is fairly similar to the W3C DOM model an XML or XSLT processor would use.
- Nodes that XPath can see
- root node
- ATTENTION: The root node is not the XML root element. It stands for the whole document; its children are the root element plus any top-level processing instructions (e.g. a stylesheet declaration) and comments.
- Elements and attributes
- Special nodes like comments, processing instructions, namespace declarations.
- Nodes XPath can't see
- XPath looks at the parsed document, therefore it can't see entity references or the document type declaration.
- The XML context
- The meaning of a given XPath expression always depends on the XML context, i.e. the current node in the XML tree
Element Location Paths
We present a few expressions for locating nodes. This is not complete and we will use abbreviated syntax.
- Document root node - returns the document root (which is not the same as the XML root element!)
/
- Direct child element
XML_element_name
- Direct child of the root node
/XML_element_name
- Child of a child
XML_element_name/XML_element_name
- Descendant of the root
//XML_element_name
- Descendant of a node
XML_element_name//XML_element_name
- Parent of a node
..
- A far cousin of a node
../../XML_element_name/XML_element_name/XML_element_name
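The abbreviated paths listed above can also be tried outside XSLT. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module. That module implements only a subset of XPath 1.0 and evaluates paths relative to an element, so the absolute path /project/... is written relative to the project element here. The small document and all variable names are made up for illustration.

```python
import xml.etree.ElementTree as ET

XML = """<project>
  <title>The Xpath project</title>
  <problems>
    <problem><title>Initial problem</title></problem>
    <problem><title>Next problem</title></problem>
  </problems>
</project>"""

# The <project> root element is the context node for every path below.
project = ET.fromstring(XML)

# Direct child element: title
child_titles = [t.text for t in project.findall("title")]

# Child of a child: problems/problem
problems = project.findall("problems/problem")

# Descendant of a node: .//title (any depth below the context node)
all_titles = [t.text for t in project.findall(".//title")]

# Parent of a node: the ".." step after selecting every problem
parents = [p.tag for p in project.findall(".//problem/..")]

print(child_titles, len(problems), all_titles, parents)
```

Note how title alone finds only the project title, while .//title also finds the titles nested inside problem elements.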
Example: Extracting titles from an XML file with XSLT
- An XML document (file xpath-jungle.xml, used throughout the rest of this tutorial)
<?xml version="1.0"?>
<project>
  <title>The Xpath project</title>
  <participants>
    <participant>
      <FirstName>Daniel</FirstName>
      <qualification>8</qualification>
      <description>Daniel will be the tutor</description>
      <FoodPref picture="dolores_001.jpg">Sea Food</FoodPref>
    </participant>
    <participant>
      <FirstName>Jonathan</FirstName>
      <qualification>5</qualification>
      <FoodPref picture="dolores_002.jpg">Asian</FoodPref>
    </participant>
    <participant>
      <FirstName>Bernadette</FirstName>
      <qualification>8</qualification>
      <description>Bernadette is an arts major</description>
    </participant>
    <participant>
      <FirstName>Nathalie</FirstName>
      <qualification>2</qualification>
    </participant>
  </participants>
  <problems>
    <problem>
      <title>Initial problem</title>
      <description>We have to learn something about Location Path</description>
      <difficulty level="5">This problem should not be too hard</difficulty>
    </problem>
    <solutions>
      <item val="low">Buy a XSLT book</item>
      <item val="low">Find an XSLT website</item>
      <item val="high">Register for a XSLT course and do exercices</item>
    </solutions>
    <problem>
      <title>Next problem</title>
      <description>We have to learn something about predicates</description>
      <difficulty level="6">This problem is a bit more difficult</difficulty>
    </problem>
    <solutions>
      <item val="low">Buy a XSLT book</item>
      <item val="medium">Read the specification and do some exercises</item>
      <item val="high">Register for a XPath course and do exercices</item>
    </solutions>
  </problems>
</project>
- Task
We would like to get a simple list of problem titles
- Solution
XSLT template (file: xpath-jungle-1.xsl)
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="html"/>
  <xsl:template match="/project">
    <html>
      <body bgcolor="#FFFFFF">
        <h1><xsl:value-of select="title" /></h1>
        Here are the titles of our problems:
        <ul>
          <xsl:apply-templates select="problems/problem" />
        </ul>
      </body>
    </html>
  </xsl:template>
  <xsl:template match="problems/problem">
    <li><xsl:value-of select="title" /></li>
  </xsl:template>
</xsl:stylesheet>
- (1) XSLT template for the root element
- The XPath of the "match" attribute (/project) means: the rule applies to the project element node, a child of the root node
- The execution context of this template is therefore the element "project"
- xsl:apply-templates selects the "problem" descendants (problems/problem) and looks for a rule that matches them.
- (2) XSLT template for the problem element
- The second rule will be triggered by the first rule, because problems/problem is indeed a descendant of the project element
- (3) Result HTML
<html>
  <body bgcolor="#FFFFFF">
    <h1>The Xpath project</h1>
    Here are the titles of our problems:
    <ul>
      <li>Initial problem</li>
      <li>Next problem</li>
    </ul>
  </body>
</html>
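For comparison, the same two location paths (title and problems/problem) can be evaluated from a general-purpose language. This is a rough sketch, assuming Python's standard xml.etree.ElementTree. The function name problem_titles_html and the shortened sample document are hypothetical, and the HTML is assembled by hand rather than by an XSLT processor.

```python
import xml.etree.ElementTree as ET
from html import escape

XML = """<project>
  <title>The Xpath project</title>
  <problems>
    <problem><title>Initial problem</title></problem>
    <solutions><item val="low">Buy a XSLT book</item></solutions>
    <problem><title>Next problem</title></problem>
  </problems>
</project>"""


def problem_titles_html(xml_text):
    """Return an HTML fragment with the project title and the problem titles."""
    project = ET.fromstring(xml_text)  # context node: the project element
    heading = escape(project.findtext("title"))
    # problems/problem plays the role of the select in xsl:apply-templates
    items = [
        "<li>" + escape(problem.findtext("title")) + "</li>"
        for problem in project.findall("problems/problem")
    ]
    return "<h1>" + heading + "</h1><ul>" + "".join(items) + "</ul>"


html_fragment = problem_titles_html(XML)
print(html_fragment)
```

The item elements inside solutions are never visited, because the path problems/problem only selects problem children of problems.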
Attribute Location Paths
- To find an attribute of the current context node
@attribute_name
Example:
@val
- Find attributes of an element in a longer location path starting from root
/element_name/element_name/@attribute_name
Example:
/project/problems/solutions/item/@val
- Find attributes in the whole document
//@attribute_name
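The attribute paths above can be approximated as follows. This is only a sketch, assuming Python's standard xml.etree.ElementTree, whose XPath subset cannot return attribute nodes. The final @attribute step is therefore emulated with Element.get() and Element.attrib. The shortened document and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

XML = """<project>
  <problems>
    <problem>
      <difficulty level="5">This problem should not be too hard</difficulty>
    </problem>
    <solutions>
      <item val="low">Buy a XSLT book</item>
      <item val="low">Find an XSLT website</item>
      <item val="high">Register for a XSLT course and do exercices</item>
    </solutions>
  </problems>
</project>"""

project = ET.fromstring(XML)

# /project/problems/solutions/item/@val
vals = [item.get("val") for item in project.findall("problems/solutions/item")]

# //@val : every val attribute anywhere in the document
all_vals = [e.get("val") for e in project.iter() if "val" in e.attrib]

# //@* : every attribute, reported as (element name, attribute name, value)
all_attrs = [
    (e.tag, name, value)
    for e in project.iter()
    for name, value in e.attrib.items()
]

print(vals, all_vals, all_attrs)
```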
Example: Make an HTML img link from an attribute
- XML fragment
Same as above
- Task
Display a list of First Names plus their food preferences
- XSLT (File xpath-jungle-2.xsl)
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="html"/>
  <xsl:template match="/">
    <html>
      <body bgcolor="#FFFFFF">
        <h1>What do we know about our participants ?</h1>
        Here are some food preferences:
        <ul>
          <xsl:apply-templates select=".//participant" />
        </ul>
      </body>
    </html>
  </xsl:template>
  <xsl:template match="participant">
    <li><xsl:value-of select="FirstName"/>
      <xsl:apply-templates select="FoodPref"/>
    </li>
  </xsl:template>
  <xsl:template match="FoodPref">
    prefers <xsl:value-of select="."/>.
    <img src="{@picture}"/> <br clear="all"/>
  </xsl:template>
</xsl:stylesheet>
- The second rule will display names of participants and launch a template for FoodPref
- Note: Not all participants have a FoodPref element. If it is absent it will just be ignored.
- The third rule (FoodPref) displays the text (contents) of FoodPref and then makes an HTML img tag
- Parts of the result
<h1>What do we know about our participants ?</h1>
Here are some food preferences:
<ul>
  <li>Daniel prefers Sea Food. <img src="dolores_001.jpg"><br clear="all"></li>
  <li>Jonathan prefers Asian. <img src="dolores_002.jpg"><br clear="all"></li>
  <li>Bernadette</li>
  <li>Nathalie</li>
</ul>
Location wildcards
- Sometimes (but not often!), it is useful to work with wildcards
- Keep in mind that only one rule is applied per element. Rules with wildcards have lower priority, which is why "your rules" are applied before the system defaults.
- Find all child nodes of type XML element
*
- Find all child nodes (including comments, etc.)
node()
- Find all element attributes
@*
- Find all text nodes
text()
XSLT has built-in default rules that rely on wildcards.
This rule applies to the document root and all other elements
<xsl:template match="*|/">
  <xsl:apply-templates/>
</xsl:template>
Text and attribute values are just copied
<xsl:template match="text()|@*">
  <xsl:value-of select="."/>
</xsl:template>
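To get a feeling for the wildcards and for what the built-in rules produce, here is a small sketch assuming Python's standard xml.etree.ElementTree. The module supports the * wildcard but not node() or text(), so the effect of the built-in rules (walk all elements, copy all text) is approximated with itertext(). The fragment and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

XML = """<problem>
  <title>Initial problem</title>
  <difficulty level="5">This problem should not be too hard</difficulty>
</problem>"""

problem = ET.fromstring(XML)

# * : all child elements of the context node
child_elements = [child.tag for child in problem.findall("*")]

# difficulty/@* : all attributes of the difficulty child
difficulty_attrs = dict(problem.find("difficulty").attrib)

# Rough equivalent of running a stylesheet that has no rules of its own:
# the built-in rules descend through every element and copy every text node,
# including the whitespace between the tags.
default_output = "".join(problem.itertext())

print(child_elements, difficulty_attrs, repr(default_output))
```

Attribute values do not show up in default_output: the built-in xsl:apply-templates only visits child nodes, and attributes are not children.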
XPaths with predicates
- A predicate is an expression that can be true or false
- It is appended within [...] to a given location path and will refine results
- More than one predicate can be appended to and within (!) a location path
- Expressions can contain mathematical or boolean operators
- Find element number N in a list
XML_element_name [ N ]
/project/participants/participant[2]
/project/participants/participant[2]/FirstName
- Find elements that have a given attribute
XML_element_name [ @attribute_name ]
- Find elements that have a given element as child
XML_element_name [ XML_element_name ]
//participant[FoodPref]
- Mathematical expressions
- Use the standard operators, except div instead of / (the slash is already used as path separator)
+ - * div mod
- mod is interesting if you want to display a long list in table format
"5 mod 2" returns 1, and so do "7 mod 2" and "3 mod 2"
- Boolean operators (comparison, and, or)
- List of operators (according to precedence)
<=, <, >=, >
=, !=
and, or
- Examples
- Return all exercise titles with a grade bigger than 5.
//exercise[note>5]/title
- Find elements that have a given attribute with a given value
XML_element_name [ @attribute_name = 'value']
//solutions/item[@val="low"]
- Example XSLT template that will match all item elements with val="low".
<xsl:template match="//item[@val='low']">
  <xsl:value-of select="." />
</xsl:template>
- Note
- Usually expressions also contain functions
- Return last five elements of a list
author[((last() - 4) <= position()) and (position() <= last())]
- Return all participant nodes whose FirstName is longer than 7 characters (element names are case-sensitive):
//participant[string-length(FirstName)>=8]
Example: Retrieve selected elements
- The XSLT stylesheet (file xpath-jungle-3.xsl)
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="html"/>
  <!-- Listing written to be consistent with the HTML result shown below -->
  <xsl:template match="/">
    <html>
      <body bgcolor="#FFFFFF">
        <h1>Retrieve selected elements</h1>
        Here is the name of participant two:
        <ul>
          <li><xsl:value-of select=".//participant[2]/FirstName"/></li>
        </ul>
        Here are all participant's firstnames that have a food preference:
        <ul>
          <xsl:apply-templates select=".//participant[FoodPref]"/>
        </ul>
        Here are all items that have a value of "high"
        <ul>
          <xsl:apply-templates select=".//item[@val='high']"/>
        </ul>
      </body>
    </html>
  </xsl:template>
  <xsl:template match="participant">
    <li><xsl:value-of select="FirstName"/></li>
  </xsl:template>
  <xsl:template match="item">
    <li><xsl:value-of select="."/></li>
  </xsl:template>
</xsl:stylesheet>
- HTML result
<html>
  <body bgcolor="#FFFFFF">
    <h1>Retrieve selected elements</h1>
    Here is the name of participant two:
    <ul>
      <li>Jonathan</li>
    </ul>
    Here are all participant's firstnames that have a food preference:
    <ul>
      <li>Daniel</li>
      <li>Jonathan</li>
    </ul>
    Here are all items that have a value of "high"
    <ul>
      <li>Register for a XSLT course and do exercices</li>
      <li>Register for a XPath course and do exercices</li>
    </ul>
  </body>
</html>
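The three selections visible in the HTML result above (participant two, participants with a food preference, items with the value "high") correspond to the predicates [2], [FoodPref] and [@val='high']. The sketch below evaluates them with Python's standard xml.etree.ElementTree, which is an assumption of this example and supports these simple predicates but no XPath functions other than last(). Predicates based on string-length() or mod are therefore emulated in plain Python. The shortened document and variable names are illustrative.

```python
import xml.etree.ElementTree as ET

XML = """<project>
  <participants>
    <participant><FirstName>Daniel</FirstName>
      <FoodPref picture="dolores_001.jpg">Sea Food</FoodPref></participant>
    <participant><FirstName>Jonathan</FirstName>
      <FoodPref picture="dolores_002.jpg">Asian</FoodPref></participant>
    <participant><FirstName>Bernadette</FirstName></participant>
    <participant><FirstName>Nathalie</FirstName></participant>
  </participants>
  <solutions>
    <item val="low">Buy a XSLT book</item>
    <item val="high">Register for a XSLT course and do exercices</item>
  </solutions>
</project>"""

project = ET.fromstring(XML)

# participant[2]/FirstName : element number N (counting starts at 1)
second = project.findtext("participants/participant[2]/FirstName")

# participant[last()] : the last element of the list
last = project.findtext("participants/participant[last()]/FirstName")

# participant[FoodPref] : elements that have a given child element
with_food = [p.findtext("FirstName")
             for p in project.findall(".//participant[FoodPref]")]

# item[@val='high'] : elements with a given attribute value
high_items = [i.text for i in project.findall(".//item[@val='high']")]

# Emulated: //participant[string-length(FirstName) >= 8]
names = [p.findtext("FirstName") for p in project.findall(".//participant")]
long_names = [n for n in names if len(n) >= 8]

# Emulated: participant[position() mod 2 = 1], e.g. to start a new table row
odd_rows = [n for pos, n in enumerate(names, start=1) if pos % 2 == 1]

print(second, last, with_food, high_items, long_names, odd_rows)
```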
XPath functions
- XPath defines a certain number of functions
- You can recognize a function because it has "()".
- Functions are programming constructs that return various kinds of information, e.g.
- true / false
- a number
- a string
- a list of nodes
- Functions are not always easy to understand ....
- There are restrictions on how you can use functions (stick to examples or the reference)
- last()
last() gives the number of nodes within a context
- position()
position() returns the position of a node within the current context, e.g. among the children of its parent (counting starts at 1)
- count(node-set)
count gives the number of nodes in a node set (usually found with an XPath).
- starts-with(string, string)
returns true if the first string starts with the second string
//participant[starts-with(FirstName,'Berna')]
- contains(string, string)
returns TRUE if the second string is part of the first
//participant[contains(FirstName,'nat')]
- string-length(string)
returns the length of a string
- number(string)
transforms a string into a number
- sum(node-set)
computes the sum of a given set of nodes.
If necessary, does string conversion with number()
- round(number)
rounds a number to the nearest integer, e.g. 1.4 becomes 1 and 1.7 becomes 2
- translate(string1, string2, string3)
returns string1 with each character that occurs in string2 replaced by the character at the same position in string3
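The behaviour of these functions can be illustrated with plain Python equivalents. This is a sketch under the assumption that ordinary Python list and string operations are an acceptable stand-in: Python's standard xml.etree.ElementTree does not evaluate XPath functions itself. The helpers xpath_round and xpath_translate are hypothetical names written for this example. They follow the XPath 1.0 rules that round() resolves ties toward positive infinity and that translate() removes characters of string2 that have no counterpart in string3.

```python
import math
import xml.etree.ElementTree as ET

XML = """<participants>
  <participant><FirstName>Daniel</FirstName><qualification>8</qualification></participant>
  <participant><FirstName>Jonathan</FirstName><qualification>5</qualification></participant>
  <participant><FirstName>Bernadette</FirstName><qualification>8</qualification></participant>
  <participant><FirstName>Nathalie</FirstName><qualification>2</qualification></participant>
</participants>"""


def xpath_round(x):
    """round(): nearest integer, ties go toward positive infinity."""
    return math.floor(x + 0.5)


def xpath_translate(text, src, dst):
    """translate(): map each character of src to the character at the same
    position in dst; characters of src without a counterpart are removed."""
    table = {}
    for i, ch in enumerate(src):
        # setdefault keeps the first mapping if a character repeats in src
        table.setdefault(ord(ch), dst[i] if i < len(dst) else None)
    return text.translate(table)


nodes = ET.fromstring(XML).findall("participant")
names = [p.findtext("FirstName") for p in nodes]

count = len(nodes)                                     # count(participant)
last_two = names[-2:]                                  # participant[position() > last() - 2]
starts = [n for n in names if n.startswith("Berna")]   # starts-with(FirstName, 'Berna')
has_nat = [n for n in names if "nat" in n]             # contains(FirstName, 'nat')
lengths = [len(n) for n in names]                      # string-length(FirstName)
total = sum(float(p.findtext("qualification")) for p in nodes)  # sum(), with number()

print(count, last_two, starts, has_nat, lengths, total)
print(xpath_round(1.4), xpath_round(1.7), xpath_round(2.5))
print(xpath_translate("Nathalie", "N", "n"))
```

Python's own round() is not used here because it rounds ties to the nearest even number, so round(2.5) gives 2 in Python, whereas XPath's round(2.5) gives 3.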
Example: Computation of an average
- We would like to compute the average of the participants' qualifications
<participant>
  <FirstName>Daniel</FirstName>
  <qualification>8</qualification>
</participant>
- The XSLT stylesheet (file xpath-jungle-4.xsl)
- We compute the sum of a node-set and then divide by the number of nodes
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="html"/>
  <xsl:template match="/">
    <html>
      <body bgcolor="#FFFFFF">
        <h1>Qualification level of participants</h1>
        Average is
        <xsl:value-of select="sum(.//participant/qualification) div count(.//participant/qualification)"/>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
- HTML result
<html>
  <body bgcolor="#FFFFFF">
    <h1>Qualification level of participants</h1>
    Average is 5.75
  </body>
</html>
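The arithmetic behind this result is (8 + 5 + 8 + 2) div 4 = 5.75. As a cross-check, here is a minimal sketch assuming Python's standard xml.etree.ElementTree. The path .//participant/qualification is the one used in the stylesheet, while sum() and count() are replaced by their Python counterparts. The shortened document is illustrative.

```python
import xml.etree.ElementTree as ET

XML = """<participants>
  <participant><FirstName>Daniel</FirstName><qualification>8</qualification></participant>
  <participant><FirstName>Jonathan</FirstName><qualification>5</qualification></participant>
  <participant><FirstName>Bernadette</FirstName><qualification>8</qualification></participant>
  <participant><FirstName>Nathalie</FirstName><qualification>2</qualification></participant>
</participants>"""

root = ET.fromstring(XML)

# Node-set selected by .//participant/qualification, converted like number()
qualifications = [float(q.text) for q in root.findall(".//participant/qualification")]

# sum(node-set) div count(node-set)
average = sum(qualifications) / len(qualifications)
print(average)
```

One difference worth knowing: with an empty node-set, XPath evaluates 0 div 0 to NaN, whereas this Python version raises ZeroDivisionError.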
Example: Find first names containing 'nat'
- The XSLT stylesheet (file xpath-jungle-5.xsl)
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="html"/>
  <xsl:template match="/">
    <html>
      <body bgcolor="#FFFFFF">
        <h1>Do we have a "nat" ?</h1>
        First Names that contain "nat":
        <ul><xsl:apply-templates select=".//participant[contains(FirstName,'nat')]"/></ul>
        First Names that contain "nat" and "Nat":
        <ul><xsl:apply-templates select=".//participant[contains(translate(FirstName,'N','n'),'nat')]"/></ul>
      </body>
    </html>
  </xsl:template>
  <xsl:template match="participant">
    <li><xsl:value-of select="FirstName"/></li>
  </xsl:template>
</xsl:stylesheet>
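To see which names each of the two predicates selects, the following sketch emulates contains() and translate() in Python. It assumes the standard xml.etree.ElementTree module for parsing and uses str.replace as a stand-in for translate(FirstName,'N','n'), which is equivalent only because a single character is being mapped. The shortened document and variable names are illustrative.

```python
import xml.etree.ElementTree as ET

XML = """<participants>
  <participant><FirstName>Daniel</FirstName></participant>
  <participant><FirstName>Jonathan</FirstName></participant>
  <participant><FirstName>Bernadette</FirstName></participant>
  <participant><FirstName>Nathalie</FirstName></participant>
</participants>"""

root = ET.fromstring(XML)
names = [p.findtext("FirstName") for p in root.findall(".//participant")]

# .//participant[contains(FirstName,'nat')] : case-sensitive test
contains_nat = [n for n in names if "nat" in n]

# .//participant[contains(translate(FirstName,'N','n'),'nat')]
contains_nat_or_Nat = [n for n in names if "nat" in n.replace("N", "n")]

print(contains_nat)
print(contains_nat_or_Nat)
```

The first list holds only Jonathan, since XPath string comparison is case-sensitive. The second also picks up Nathalie once the capital N has been mapped to n.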
Union of XPaths
- Union XPaths combine more than one XPath (and all the resulting nodes are returned).
- A typical example is the default rule, whose template matches either the root node ("/") or just any element ("*"):
<xsl:template match="*|/">
  <xsl:apply-templates/>
</xsl:template>
- Often this is used to simplify apply-templates or even templates themselves. E.g. the following rule applies to both "description" and "para" elements.
<xsl:template match="para|description">
  <p><xsl:apply-templates/></p>
</xsl:template>
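A union returns each selected node once, in document order, whichever side of the | found it. Python's standard xml.etree.ElementTree (assumed here) has no | operator in its XPath subset, so the sketch below emulates the select expression para|description by filtering the children of the context node. The fragment, including the para element, is made up for illustration.

```python
import xml.etree.ElementTree as ET

XML = """<problem>
  <title>Initial problem</title>
  <description>We have to learn something about Location Path</description>
  <para>An extra paragraph</para>
</problem>"""

problem = ET.fromstring(XML)  # context node

# para|description : child elements named para or description.
# Iterating over an element yields its children in document order.
wanted = {"para", "description"}
union = [child for child in problem if child.tag in wanted]

# What the rule above would output: each match wrapped in <p>...</p>
paragraphs = ["<p>" + child.text + "</p>" for child in union]
print(paragraphs)
```

The order of the result follows the document (description before para), not the order in which the two paths are written.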
List of commonly used XPath expressions
| Syntax | (Type of path) | Example path | Example matches |
|---|---|---|---|
| name | child element name | project | <project> ...... </project> |
| / | child / child | project/title | <project> <title> ... </title> |
| / | (root element) | | |
| // | descendant | project//title | <project><problem> <title>....</title> |
| | | //title | <root>... <title>..</title> (any place) |
| * | "wildcard" | */title | <bla> <title>..</title> and <bli> <title>...</title> |
| \| | "or" operator | title\|head | <title>...</title> or <head> ...</head> |
| | | *\|/\|@* | All elements: root, children and attributes |
| . | current element | . | |
| ../ | parent element | ../problem | <project> |
| @attr | attribute name | @id | <xyz id="test">...</xyz> |
| element/@attr | attribute of child | project/@id | <project id="test" ...> ... </project> |
| @attr='value' | value of attribute | list[@type='ol'] | <list type="ol"> ...... </list> |
| position() | position of element | position() | |
| last() | number of elements within a context | last(), position()!=last() | |
Links
Introductory tutorials
- Xpath (Wikipedia)
- Zvon tutorial (lots of examples)
- XPath for .NET Developers by Darshan Singh
Other
- XPath Visualizer: a Windows program you can install to practice. Alternatively, just use an XML editor with XPath support.
- Liquid XML has an XPath builder