XML database: Difference between revisions

The educational technology and digital learning wiki
 
 
(23 intermediate revisions by 2 users not shown)
Line 1: Line 1:
{{incomplete}}
== Definition ==


In this article, we are primarily interested in '''Native XML Databases''', {{quotation | Databases that store XML in "native" form, generally as some variant of the DOM mapped to an underlying data store. This includes the category formerly known as persistent DOM (PDOM) implementations. For data- and document-centric applications.}} ([http://www.rpbourret.com/xml/XMLDatabaseProds.htm R. Bourret], retrieved 17:59, 3 November 2006 (MET))
 
See also [[XQuery]] (the XML Query language) and [[Database]] (overview of various kinds of databases).


== When do we need an XML database ==


Ronald Bourret (2006) summarizes the main scenarios for using XML with databases: {{quotationbox |
To store and retrieve the data in data-centric documents, what kind of software you need will depend on how well structured your data is. For highly structured data, such as the white pages in a telephone book, you will need an XML-enabled database that is tuned for data storage, such as a relational or object-oriented database, and some sort of data transfer software. This may be built in to the database (in which case the database is said to be XML-enabled) or might be third-party software, such as middleware or an XML server. If your data is semi-structured, such as the yellow pages in a telephone book or health data, you have two choices. You can try to fit your data into a well-structured database, such as a relational database, or you can store it in a native XML database, which is designed to handle semi-structured data.}} ([http://www.rpbourret.com/xml/XMLDatabaseProds.htm R. Bourret], retrieved 17:59, 3 November 2006 (MET))


== Standards ==


 
* [http://www.w3.org/TR/xquery/ XQuery 1.0: An XML Query Language] (the central document). XQuery can only query (not update). See the [[XQuery]] article.
* [http://www.w3.org/TR/xquery/ XQuery 1.0: An XML Query Language] (the central document). XQuery can only query (not update).
* There are some other parts, e.g. [http://www.w3.org/TR/xquery-use-cases/ XML Query Use Cases], [http://www.w3.org/TR/xmlquery-req XML Query Requirements] and [http://www.w3.org/TR/query-datamodel/ XML Query Data Model] (W3C Working Draft). XQuery is a full programming language and comes with two syntaxes: one human-readable and one in XML (XQueryX).
* [http://www.w3.org/TR/xqupdate/ xqupdate - XQuery Update Facility]. This will replace XUpdate (a very informal "standard").
* [http://www.w3.org/TR/xpath20/ XPath 2.0] is used by both XSLT 2.0 and XQuery 1.0. XQuery 1.0 is an extension of XPath 2.0, and XSLT 2.0 and XQuery 1.0 share the same data model.
Line 19: Line 21:
== Software ==


* [http://xml.apache.org/xindice/ Xindice] is a native XML database written in Java. It supports XPath queries and XML:DB XUpdate, and has an XML-RPC API plugin. It grew out of dbXML, a native XML database (JDK 1.3, LGPL licence). See also the [http://wiki.apache.org/xindice/ Xindice Wiki] at Apache for more information, e.g. a pointer to [http://wiki.apache.org/xindice/XindicePHPAdmin XIndicePHPAdmin]. Other PHP libraries exist, but most projects seem to be inactive. (This project seems to have stalled since September 2004; see dbXML or eXist.)
=== Native XML databases ===
* [http://www.dbxml.com/ dbXML], GPL license. dbXML is a Native XML Database (NXD). NXDs are databases that store XML using an internalized format for faster overall processing. dbXML was developed using the Java 2 Standard Edition version 1.4.
(just some popular free ones)
* [http://exist.sourceforge.net/ eXist] is an XML database and retrieval engine, an alternative to the popular Xindice. [1.0b2 in 6/2005, tested @Tecfa but not in production: faster (for lots of small documents) than Xindice, with other features that I like too: XQuery, XUpdate, XML:DB API, XML-RPC support, REST support, WebDAV, PHP API (needs downloading from CVS), good Cocoon integration.] A side note for qualitative analysis people: see [http://netapps.muohio.edu/blogs/darcusb/darcusb/archives/2004/08/14/xml-dbs-as-research-tools XML DBs as Research Tools] (by darcusblog)
 
* [http://www.informatik.fh-wiesbaden.de/~turau/DB2XML/index.html DB2XML: A tool for transforming relational databases into XML documents] As Servlet or standalone. Xindice is a fork of this (don't know if db2xml development continues).
According to [http://www.rpbourret.com/xml/ProdsNative.htm R. Bourret], retrieved 13:38, 24 November 2007 (MET), native XML databases differ from XML-enabled databases in three main ways:
* Native XML databases can preserve physical structure (entity usage, CDATA sections, etc.) as well as comments, PIs, DTDs, etc.
* Native XML databases can store XML documents without knowing their schema (DTD), assuming one even exists.
* The only interface to the data in native XML databases is XML and related technologies (such as XQuery, XPath, the DOM) or an XML-specific API, such as the XML:DB API. XML-enabled databases, on the other hand, offer direct access to the data, such as through ODBC.
 
'''List of some XML databases''':
 
* [http://exist.sourceforge.net/ eXist] is a popular and easy-to-use XML database and retrieval engine. Features: XQuery, XUpdate, XML:DB API, XML-RPC support, [[REST]] support, WebDAV, PHP API, etc. This is the only one that [[User:Daniel K. Schneider|DKS]] tested, some time ago, and it worked perfectly. Newer versions probably add many new features.
** A side note for qualitative analysis people: see [http://netapps.muohio.edu/blogs/darcusb/darcusb/archives/2004/08/14/xml-dbs-as-research-tools XML DBs as Research Tools] (by darcusblog)
 
* [http://xml.apache.org/xindice/ Xindice] is a native XML database written in Java. It supports XPath queries and XML:DB XUpdate, and has an XML-RPC API plugin. It grew out of dbXML, a native XML database (JDK 1.3, LGPL licence). See also the [http://wiki.apache.org/xindice/ Xindice Wiki] at Apache for more information.
 
* [http://basex.org/ BaseX]: processing and visualizing XML data with a native XML database. A Java-based [[XQuery]] processor that creates XML databases (somewhere in your home directory) and allows some data visualization with treemaps and scatterplots. (April 2009)
 
* [http://www.modis.ispras.ru/sedna Sedna] is an open source XML database management system. It is an XML-native system developed from scratch in C/C++ and Scheme. Sedna is a full-featured database system that supports queries, updates, ACID transactions, security, etc. Sedna is designed to be fast, reliable and easy to use for production-grade applications. Apache license (free).
 
* dbXML was a Native XML Database (NXD) developed using the Java 2 Standard Edition version 1.4. The project is dead; it later became Xindice and others.
 
* [http://www.oracle.com/database/berkeley-db/xml/index.html Oracle Berkeley DB XML]. Open source, embeddable XML database with XQuery-based access to documents stored in containers and indexed based on their content. Oracle Berkeley DB XML is built on top of Oracle Berkeley DB. ('''broken link''')
 
* [http://monetdb.cwi.nl/projects/monetdb/XQuery/index.html MonetDB database system with XQuery front-end]. Supports XQuery and XUpdate.
 
* [http://www.marklogic.com/ MarkLogic Server]. Proprietary NoSQL (and XML?) database.


== Discussion ==


[[User:DSchneider|DSchneider]] believes that native XML databases such as eXist "will take off" in our field as soon as there are sufficient PHP APIs. eXist is a server, but there is also a portal written in Java (which makes this technology not very accessible to the "bricoleur" world of education). The advantage of an XML database is that you can just "stick in" XML contents and then retrieve them with XQuery expressions. Adding new information structures may require rewriting the XQuery interfaces to optimize retrieval, but otherwise flexibility comes at a much lower price.
[[User:DSchneider|DSchneider]] believes that native XML databases like eXist "will take off" in our field as soon as there are sufficient PHP APIs or maybe some XML-based authoring frameworks. For example, the [[eXist XML database]] is both a database server and a portal based on Java/Cocoon technology. This technology is not very accessible to the "bricoleur" world of education.
 
The advantage of an XML database is that you can just "stick in" XML contents and then retrieve them with XQuery expressions. Native XML databases are most commonly used to store document-centric documents, and that is something we may see a lot in education (e.g. descriptions of pedagogical scenarios, lesson plans, contents other than [[IMS Content Packaging]]). Adding new information structures may require writing interfaces to XQuery, but this kind of flexibility comes at a much lower price. Most of these databases now support the standardized XQuery Update. The lack of an update standard was a problem that is now almost solved (21:26, 1 November 2007 (MET)).
 
See some discussion in the [[DITA]] article and also El-Seoud et al. (2007)
 
== Links ==
 
* (more needed here)
 
* [http://www.rpbourret.com/xml/XMLDatabaseProds.htm XML Database Products] by R. Bourret (best resource)
 
* Yuli Vasiliev (2007), PHP Oracle Web Development: Data processing, Security, Caching, XML, Web Services, and Ajax (Paperback). ISBN-10: 1847193633. Book excerpts are available at [http://search.internet.com/query.php?site=webreference&IC_QueryText=xml+enabled+applications&Search=Search webreference.com] ("XML enabled applications").
 
* Elliotte Rusty Harold (2007), Native XML Databases, eXQuisite or eXcruciating?, New York PHP Users Group, Presentation slides, [http://www.cafeconleche.org/slides/nyphp/xquery/ HTML pages] - [http://www.cafeconleche.org/slides/nyphp/xquery/Native_XML_Databases.html HTML 1 page]
 
* [http://monetdb.cwi.nl/XQuery/Benchmark/XMark/ XQuery performance on the XMark benchmark], Several DBs compared by MonetDB
 
* [http://dream.berkeley.edu/~cwhitney/xml_db/ Comparison and Review: eXist, Xindice, Berkeley DB XML] (2006).
 
== References ==
 
* El-Seoud, Samir Abou; Hosam El-Sofany, Fayed Ghaleb, Sameh Daoud, Jihad AL Ja'am, Ahmad Hasna (2007). XML and Databases for E-Learning Applications, International Journal of Emerging Technologies in Learning (iJET), Vol 2, No 4 [http://www.online-journals.org/index.php/i-jet/article/view/190 Abstract/PDF]


[[Category: XML]]
[[Category: Technologies]]
 
[[Category:databases]]

Latest revision as of 11:58, 12 February 2013

Definition

In this article, we are primarily interested in Native XML Databases, “Databases that store XML in "native" form, generally as some variant of the DOM mapped to an underlying data store. This includes the category formerly known as persistent DOM (PDOM) implementations. For data- and document-centric applications.” (R. Bourret, retrieved 17:59, 3 November 2006 (MET))

See also XQuery (the XML Query language) and Database (overview of various kinds of databases).

When do we need an XML database

Ronald Bourret (2006) summarizes the main scenarios for using XML with databases:

To store and retrieve the data in data-centric documents, what kind of software you need will depend on how well structured your data is. For highly structured data, such as the white pages in a telephone book, you will need an XML-enabled database that is tuned for data storage, such as a relational or object-oriented database, and some sort of data transfer software. This may be built in to the database (in which case the database is said to be XML-enabled) or might be third-party software, such as middleware or an XML server. If your data is semi-structured, such as the yellow pages in a telephone book or health data, you have two choices. You can try to fit your data into a well-structured database, such as a relational database, or you can store it in a native XML database, which is designed to handle semi-structured data.

(R. Bourret, retrieved 17:59, 3 November 2006 (MET))
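The "highly structured" case in the quotation can be illustrated with a minimal sketch. It assumes a hypothetical white-pages document in which every entry has the same two fields; the element names, the table layout and the helper functions are invented for this illustration and are not taken from Bourret or from any product. Python's standard sqlite3 module stands in for the relational database, and two small functions play the role of the "data transfer software".

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical data-centric document: every entry has the same fields.
WHITE_PAGES = """
<directory>
  <entry><name>Ada Lovelace</name><phone>555-0100</phone></entry>
  <entry><name>Alan Turing</name><phone>555-0101</phone></entry>
</directory>
"""

def load_entries(xml_text, conn):
    """Map each <entry> element to one row of a relational table."""
    conn.execute("CREATE TABLE IF NOT EXISTS entry (name TEXT, phone TEXT)")
    root = ET.fromstring(xml_text)
    rows = [(e.findtext("name"), e.findtext("phone")) for e in root.iter("entry")]
    conn.executemany("INSERT INTO entry VALUES (?, ?)", rows)
    return len(rows)

def export_entries(conn):
    """Rebuild an XML document from the rows (the transfer in the other direction)."""
    root = ET.Element("directory")
    for name, phone in conn.execute("SELECT name, phone FROM entry ORDER BY name"):
        entry = ET.SubElement(root, "entry")
        ET.SubElement(entry, "name").text = name
        ET.SubElement(entry, "phone").text = phone
    return ET.tostring(root, encoding="unicode")

conn = sqlite3.connect(":memory:")
count = load_entries(WHITE_PAGES, conn)
exported = export_entries(conn)
```

This works well only because the data is regular. With semi-structured data (optional or nested fields that vary from entry to entry), the fixed two-column table would no longer fit, which is the situation where the quotation suggests a native XML database.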

Standards

Software

Native XML databases

(just some popular free ones)

According to R. Bourret, retrieved 13:38, 24 November 2007 (MET), native XML databases differ from XML-enabled databases in three main ways:

  • Native XML databases can preserve physical structure (entity usage, CDATA sections, etc.) as well as comments, PIs, DTDs, etc.
  • Native XML databases can store XML documents without knowing their schema (DTD), assuming one even exists.
  • The only interface to the data in native XML databases is XML and related technologies (such as XQuery, XPath, the DOM) or an XML-specific API, such as the XML:DB API. XML-enabled databases, on the other hand, offer direct access to the data, such as through ODBC.
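The first difference in the list above (preserving comments and processing instructions) can be sketched with Python's standard library alone. This is an illustration under assumptions, not how any particular database works: the sample document is invented, ElementTree's default parser stands in for a data-centric mapping that keeps only elements and text, and minidom stands in for a DOM-like store that keeps the other node types.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Hypothetical document containing a processing instruction and a comment.
DOC = '<lesson><?render mode="print"?><!-- draft --><title>Fractions</title></lesson>'

def shred(xml_text):
    """Data-centric view: keep only element names and their text.

    ElementTree's default parser discards comments and processing
    instructions, so they cannot be recovered from this result.
    """
    root = ET.fromstring(xml_text)
    return [(el.tag, (el.text or "").strip()) for el in root.iter()]

def round_trip(xml_text):
    """Document-centric view: a DOM tree keeps comments and PIs as nodes."""
    dom = minidom.parseString(xml_text)
    return dom.documentElement.toxml()

rows = shred(DOC)
restored = round_trip(DOC)
```

Here `rows` contains only the `lesson` and `title` elements, while `restored` still contains the comment and the processing instruction.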

List of some XML databases:

  • eXist is a popular and easy-to-use XML database and retrieval engine. Features: XQuery, XUpdate, XML:DB API, XML-RPC support, REST support, WebDAV, PHP API, etc. This is the only one that DKS tested, some time ago, and it worked perfectly. Newer versions probably add many new features.
  • Xindice is a native XML database written in Java. It supports XPath queries and XML:DB XUpdate, and has an XML-RPC API plugin. It grew out of dbXML, a native XML database (JDK 1.3, LGPL licence). See also the Xindice Wiki at Apache for more information.
  • BaseX: processing and visualizing XML data with a native XML database. A Java-based XQuery processor that creates XML databases (somewhere in your home directory) and allows some data visualization with treemaps and scatterplots. (April 2009)
  • Sedna is an open source XML database management system. It is an XML-native system developed from scratch in C/C++ and Scheme. Sedna is a full-featured database system that supports queries, updates, ACID transactions, security, etc. Sedna is designed to be fast, reliable and easy to use for production-grade applications. Apache license (free).
  • dbXML was a Native XML Database (NXD) developed using the Java 2 Standard Edition version 1.4. The project is dead; it later became Xindice and others.
  • Oracle Berkeley DB XML. Open source, embeddable XML database with XQuery-based access to documents stored in containers and indexed based on their content. Oracle Berkeley DB XML is built on top of Oracle Berkeley DB. (broken link)

Discussion

DSchneider believes that native XML databases like eXist "will take off" in our field as soon as there are sufficient PHP APIs or maybe some XML-based authoring frameworks. For example, the eXist XML database is both a database server and a portal based on Java/Cocoon technology. This technology is not very accessible to the "bricoleur" world of education.

The advantage of an XML database is that you can just "stick in" XML contents and then retrieve them with XQuery expressions. Native XML databases are most commonly used to store document-centric documents, and that is something we may see a lot in education (e.g. descriptions of pedagogical scenarios, lesson plans, contents other than IMS Content Packaging). Adding new information structures may require writing interfaces to XQuery, but this kind of flexibility comes at a much lower price. Most of these databases now support the standardized XQuery Update. The lack of an update standard was a problem that is now almost solved (21:26, 1 November 2007 (MET)).
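The "stick in, then retrieve" idea can be sketched as follows. This is only a toy: `TinyCollection` is a hypothetical in-memory class invented for this example, not the API of eXist or any other product, and it queries with the limited XPath subset of Python's ElementTree rather than with XQuery. The two sample documents are also invented.

```python
import xml.etree.ElementTree as ET

class TinyCollection:
    """Hypothetical in-memory stand-in for an XML database collection."""

    def __init__(self):
        self._docs = {}

    def store(self, name, xml_text):
        # No schema is declared up front; any well-formed document is accepted.
        self._docs[name] = ET.fromstring(xml_text)

    def query(self, path):
        """Return (document name, element text) pairs for every match."""
        hits = []
        for name, root in self._docs.items():
            for el in root.findall(path):
                hits.append((name, el.text))
        return hits

db = TinyCollection()
# Two documents with different structures live in the same collection.
db.store("plan1.xml", "<lesson><title>Fractions</title><duration>45</duration></lesson>")
db.store("scenario1.xml", "<scenario><title>Debate</title><phase><title>Preparation</title></phase></scenario>")

all_titles = db.query(".//title")   # titles at any depth
top_titles = db.query("title")      # titles directly under each root element
```

The point of the sketch is that a new kind of document (here the scenario with nested phases) can be added without changing a table definition; only the query expressions need to know about the new structure.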

See some discussion in the DITA article and also El-Seoud et al. (2007)

Links

  • (more needed here)
  • Yuli Vasiliev (2007), PHP Oracle Web Development: Data processing, Security, Caching, XML, Web Services, and Ajax (Paperback). ISBN-10: 1847193633. Book excerpts are available at webreference.com ("XML enabled applications").
  • Elliotte Rusty Harold (2007), Native XML Databases, eXQuisite or eXcruciating?, New York PHP Users Group, Presentation slides, HTML pages - HTML 1 page

References

  • El-Seoud, Samir Abou; Hosam El-Sofany, Fayed Ghaleb, Sameh Daoud, Jihad AL Ja'am, Ahmad Hasna (2007). XML and Databases for E-Learning Applications, International Journal of Emerging Technologies in Learning (iJET), Vol 2, No 4 Abstract/PDF