EPub

The educational technology and digital learning wiki
 


{{quotation|".epub" is the file extension of an XML format for reflowable digital books and publications. ".epub" is composed of three open standards, the Open Publication Structure (OPS), Open Packaging Format (OPF) and Open Container Format (OCF), produced by the IDPF.}} ([http://www.openebook.org/ IDPF], retrieved 22:38, 26 February 2009 (UTC))
Notice: Contents of this page should be updated to Epub 3.x. Some software links also may be missing - [[User:Daniel K. Schneider|Daniel K. Schneider]] ([[User talk:Daniel K. Schneider|talk]]) 22:31, 27 September 2015 (CEST)
See also:
* [[e-book]] (overview article)
* [[e-book reader]] (short hardware overview)
* [[e-book conversion with Calibre]]


== Software and Formats ==
ePub can be authored and read with an increasing set of software. Since it is an open standard, it does have support from various vendors and publishers (see e.g. [http://www.informationweek.com/blog/main/archives/2009/02/tim_oreilly_unp.html Tim O'Reilly Unplugged: The Kindle 2 And Transforming Industries]).


== Formats ==
Versions:
* Epub 2 (2007)
* Epub 2.01 (2010)
* Epub 3.0 (2011, recommendation)
* Epub 3.0.1 (2014, specification)
:* [http://www.idpf.org/epub/30/spec/epub30-overview.html EPUB 3 Overview]
 
This article mostly describes Epub 2.0.


== The Epub 2.x Formats ==
 
ePub formats are defined with [[Relax NG]] but rely on other standards too. Epub 2.x was finalised in 2010 and should work with any reader (even an "old" one). Since 2011, the recommended standard has been ePub 3.x.


=== Overview ===


The [http://www.openebook.org/specs.htm ePub 2.0 Specification] comes in three standards that cover two parts:
; (1) The open publication structure (OPS) - 09/11/07
* The '''Open Publication Structure 2.0 (OPS)''' is an XML-based standard for authoring digital publications. Contents can be marked up either with a subset of XHTML or Daisy DTBook.


ePub contents may be DRM controlled, but must not ...
ePub 2.01 uses:
* Open Publication Structure (OPS) 2.0.1
* Open Packaging Format (OPF) 2.0.1
* Open Container Format (OCF) 2.0.1


=== The container and packaging ===
:: META-INF folder
::: container.xml  
:: OPS folder
::: fonts folder (includes ttf fonts used)
::: content1.xhtml


OPS Style sheets are CSS2 styles in the XML tradition, i.e. selectors and attribute names are case sensitive. Again, like for XHTML, there are some restrictions.
== Epub 3.0x ==
Epub 3.0 was introduced in 2011, and version 3.0.1 was published in 2014.
According to the [http://www.idpf.org/epub/30/spec/epub30-changes.html official manual], EPUB 3's base content format is now based on the XML serialization of HTML5 (XHTML5) [<abbr>ContentDocs30</abbr>], whereas EPUB2 supported two basic content types: a profile of XHTML 1.1 and DTBook [<abbr>OPS2</abbr>] (a semantically-enhanced markup focused on accessibility concerns) [...] the EPUB 3 XHTML Content Document definition includes both extensions to and restrictions on its HTML5 base.
In summary:
* Epub 3 contents are a subset of HTML5 that respects XML syntax (XHTML5), e.g. all tags must be closed.
* Epub 3.0 allows for some interactivity using a special element for triggers (epub:trigger) and a JavaScript API.
* Styling is done with a subset of CSS3 (mostly CSS2)
* Rich media (e.g. video) are supported


== Software ==
=== Authoring software ===


{{quotation|Web-standard formats such as XHTML make up the core of OPS. OPS files can be created using a wide variety of Web and eBook publishing tools. They can also be created using XML editors such as Altova’s XML Spy. Adobe InDesign CS3 supports the direct generation of OCF-packaged OPS content as an export function. OCF files can also easily be created using standard ZIP applications like WINZIP}} ([https://www.idpf.org/forums/viewtopic.php?t=22 IDPF FAQ], retrieved 22:38, 26 February 2009 (UTC)).
(1) One solution for authoring new e-pub contents is using an XHTML authoring tool. To create the epub archive, there are two options so far:
* Build the epub archive manually which is not too difficult for a technical person.
* Use conversion software (see below), some of which is good. E.g. Calibre can handle multi-file documents in various input formats and it also can split a well structured XHTML file with an XPath expression into several sub-files.
 
(2) Another solution is to use an Epub editor (e.g. Sigil, introduced below).
 
(3) Many authors probably use a combination: produce contents with any sort of XHTML editor and then use an authoring tool to fine-tune the contents.
 
; '''Sigil'''
: [http://sigil-ebook.com/ Sigil] seems to be the best free authoring software so far.
* Download: http://sigil-ebook.com/get/
* Source code: https://github.com/Sigil-Ebook/Sigil


;'''BlueGriffon EPUB Edition'''
: [http://www.bluegriffon-epubedition.com/BGEE.html Product home page]
* Commercial (195 Euros)
* Epub3 compatible
* Not tested so far ...


;'''Jutoh'''
: [http://www.jutoh.com/ Product home page]. Jutoh is both a converter and an editor.
* Commercial (30 Euros)

; '''eScape'''
* [http://www.infogridpacific.com/igp/AZARDI/eScape-ODT2ePub/ eScape] is an Open Office Writer to ePub creator that is free '''for non-commercial use'''. According to the [http://www.infogridpacific.com/igp/AZARDI/eScape-ODT2ePub/ home page], production is easy:
# Use the supplied Open Office Template (OTT) to Structure-Style™ your text files in the familiar Open Office Writer environment.
# Open eScape, browse to your ODT, cover image and select a CSS.
# Set a few reader-friendly options and click "Create ePub".
(Not tested so far).


=== Conversion software ===
* PDF input leads to really ugly output
* Web pages like this mediawiki page translate badly, e.g. bullets go away.
; Atlantis Word Processor
: [http://www.AtlantisWordProcessor.com Atlantis Word Processor] converts any document to EPUB. Supports multilevel TOCs, font embedding, and batch conversion.
: Source formats: RTF, DOC, DOCX, ODT, TXT.
: Platform: Windows.


; Calibre
: Output formats: EPUB, LRF, MOBI
: Easy to install under Ubuntu (tested with 8.04 Hardy Heron and 9.04 Jaunty). You can do it with one command line.
* Read [[e-book conversion with Calibre]] for a short how-to.
; Writer2ePub
* [http://extensions.services.openoffice.org/en/project/Writer2ePub Writer2ePub] is an OpenOffice.org extension that creates an ePub file from any document openable by the OOo Word Processor.
* Quote: {{quotation|W2E is an ePub creator. Simply write your document with the OOo Word Processor and W2E will make an ePub file using the best traditional typographic rules, by applying a predefined style sheet (CSS). If you need a good ePub document and you can use the OOo Writer Word Processor, W2E is your tool.}}


; eCub
: [http://www.juliansmart.com/ecub eCub] is a simple-to-use EPUB and MobiPocket e-book creator.
: Seems to be a dead project (as of 2014/2015)
: Cross-platform: Win/Mac/Linux/etc.
: Converter can make a book out of several HTML files and add Author, Title picture, etc.
: Input formats: txt and XHTML
: Output formats: EPUB and MobiPocket
: Extra features: Can convert book content to audio MP3 and WAV.
: Easy to install under Ubuntu (tested with 8.04 Hardy Heron and 9.04 Jaunty). All you need to do is to click on the debian distribution link and install.


; On-line conversion tools:
; Wiki 2 epub
* We did not find any software, but as we describe below, wiki to xhtml to epub somewhat works.
* There is an on-demand Epub converter for MediaWikis. However it only works with some versions: [http://www.mediawiki.org/wiki/Extension_talk:EPubExport Extension talk:EPubExport]


=== Integrated online environments ===


* [http://www.zinepal.com/ Zinepal] allows users to create magazines in various formats (including Epub). Sign-up is required.


=== Validation ===


* [http://www.bluegriffon-epubedition.com/BGEV.html BlueGriffon EPUB3 Validator] (Firefox extension)


* [https://github.com/IDPF/epubcheck EpubCheck] is a Java-based command-line tool to validate EPUB files. It can also be used as a programming library.

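EpubCheck is the tool to use for real validation. For a quick first look at the container itself, a few lines of Python can test the OCF rules that most often break hand-made archives: the mimetype entry must be the first one in the zip and must be stored uncompressed, and META-INF/container.xml must exist. The sketch below uses only Python's standard zipfile module; the function name ocf_problems is made up for this example, and it does not replace EpubCheck.

```python
import io
import zipfile
from typing import Union


def ocf_problems(source: Union[str, bytes]) -> list[str]:
    """Return basic OCF container problems of an .epub given as a path or raw bytes."""
    fileobj = io.BytesIO(source) if isinstance(source, bytes) else source
    problems = []
    with zipfile.ZipFile(fileobj) as zf:
        infos = zf.infolist()
        # The mimetype entry must come first and be stored (not compressed).
        if not infos or infos[0].filename != "mimetype":
            problems.append("mimetype is not the first entry")
        elif infos[0].compress_type != zipfile.ZIP_STORED:
            problems.append("mimetype is compressed")
        elif zf.read("mimetype") != b"application/epub+zip":
            problems.append("mimetype has the wrong content")
        if "META-INF/container.xml" not in zf.namelist():
            problems.append("META-INF/container.xml is missing")
        # testzip() returns the name of the first member with a bad CRC, or None.
        bad_member = zf.testzip()
        if bad_member is not None:
            problems.append(f"corrupt member: {bad_member}")
    return problems
```

E.g. ocf_problems("edutechwiki_epub.epub") should return an empty list if the container is packaged correctly; content and OPF errors are left to EpubCheck.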

=== Reader Software ===


There exist several ePub capable readers. Here are just some of these:
; Firefox
: [https://addons.mozilla.org/en-US/firefox/addon/epubreader/ EPUBReader]
: Good quality reader and one of the easiest (however it seems to need a fair amount of CPU power)


; FBReader


; Calibre
: Calibre is a converting tool (see above), but also includes a (OK, but not too good) reader.
 
; Kobo
: Kobo is an ecosystem like Amazon: it has a store, your online library, an (optional) e-book reader, etc. Ebooks are in Epub format and can be read on any device using the Kobo e-book reader software, e.g. a laptop computer or an Android tablet/phone. The software can automatically synchronize with your online library, but you also can take just one book, e.g. on your mobile phone when you are stuck waiting somewhere ...
: You also can download the e-pub from the online library. I recommend this if you buy expensive books. The file has an Adobe DRM on it and must be opened with Adobe Digital Editions. If you want to read it on Linux, you must crack the file.
: Read [http://ebookfriendly.com/kobo-tips-tricks/ 10 simple Kobo tips and tricks]


; Openberg
; Stanza
: [http://www.lexcycle.com/ Lexcycle Stanza] (favorite free iPhone/iPod reader). Win/Mac/i*.
: [http://en.wikipedia.org/wiki/Lexcycle_Stanza Wikipedia Entry]  


=== Hardware ===


See [[e-book]].
=== DRM removal ===
Read [http://helpx.adobe.com/digital-editions.html Digital Editions Help] first, there may be a solution without removing the DRM that fits your needs.
While I understand that publishers want to sell books (as opposed to seeing just a few people downloading and sharing them), I still argue against DRM. I would not oppose watermarking, e.g. with a (verified) real name and email address.
* Publishing houses like O'Reilly don't use DRM and still seem to sell books (I, for example, bought a whole lot). Often you can get them half price, e.g. about $20 a piece.
* DRM doesn't allow you to read books from your Linux systems.
* Books with DRM are really difficult to install on mobile phones. In theory there is a way, but I did not manage to do it on a Galaxy S3.
* As a researcher, I want to be able to copy/paste sentences, but I cannot copy/paste from Digital Editions.
* Some publishing houses sell e-books with dreadful formatting that would need some fixing (ok I wouldn't take time to do that but others might). Most often maps and other detailed graphics are unreadable (maybe the quality is better in the assets directory)
* As a citizen, I do not want Adobe to register what I read.
- [[User:Daniel K. Schneider|Daniel K. Schneider]] ([[User talk:Daniel K. Schneider|talk]]) 17:43, 30 June 2014 (CEST)
Therefore, removing the DRM is just fine (and legal in most countries), unless you redistribute cracked books (which I don't do)
Solutions (careful! Some of the software may be dangerous):
* Check Alf's Calibre plugin, read [https://apprenticealf.wordpress.com/2011/01/13/ebooks-formats-drm-and-you-%E2%80%94-a-guide-for-the-perplexed/ Ebook Formats, DRM and You — A Guide for the Perplexed] and [https://apprenticealf.wordpress.com/2012/09/10/drm-removal-tools-for-ebooks/ DRM Removal Tools for eBooks] and finally [https://apprenticealf.wordpress.com/2012/09/10/calibre-plugins-the-simplest-option-for-removing-most-ebook-drm/ DeDRM plugin for calibre: the simplest option for removing DRM from most ebooks] by Apprentice Alf.
* Under Linux you have to run Wine, then use the above, e.g. read [http://dikkiisdiatribe.blogspot.ch/2013/01/alternate-method-linux-calibre-e-books.html Alternate method - Linux, Calibre, e-books (epub) and DRM] and [http://dikkiisdiatribe.blogspot.com.au/2013/01/how-to-get-round-drm-issues-with-e.html How to get round DRM issues with e-books in Linux (epub)]
* [http://www.epubconverter.com/ Epub Converter] sells an [http://www.epubconverter.com/epub-drm-removal/ EPUB DRM Removal - Download] tool (not tested)
* [http://www.epubsoft.com/drm-removal.html EPubSoft] PDF / EPub DRM Removal (not tested)
== Conversion testing ==
=== A (preliminary) conversion test of a huge HTML file ===
- [[User:Daniel K. Schneider|Daniel K. Schneider]] 19:17, 22 April 2009. This test is old, therefore this section needs to be updated!
I tried to convert mediawiki contents, i.e. what could be called a [[wiki book]].
* I took a larger list of flash tutorials from this wiki (bad idea since they include huge pictures) and generated a single (ugly) html page with the pdfbook generator (800MB for the HTML only).
* This ''some_flash_tutorials.html'' file then had to be repaired with tidy. I also converted it to real XHTML.
<pre>tidy -o flash_tutorials.xhtml -asxhtml some_flash_tutorials.html</pre>
* I opened the XHTML file in a browser and then "saved as complete web page" in a directory. This is important in order to have a local copy of all images.
* I then imported the XHTML File to Calibre and converted to epub.
** I changed the XPath expression for chapter detection to "//h1" since each wiki page starts with h1.
The result was sort of OK: an 8MB file of "442 pages".
* Some links ("a" tags) were wrong, i.e. extended over several pages. They were OK in the original XHTML file. The conversion software cannot handle something like this and loses a closing "a" somewhere.
<pre>
<div class="thumb tleft">
  <div class="thumbinner" style="width: 182px;">
  <a href="http://edutechwiki.unige.ch/en/Image:Flash-cs3-tools-panel-items.png"
    class="image" title="Items of the Flash CS3 tools panel">
  <img height="480" border="0" width="180"
      alt="Items of the Flash CS3 tools panel"
      src="flash_tutorials_files/180px-Flash-cs3-tools-panel-items.png"
      class="thumbimage"/></a>
  <div class="thumbcaption">
  <div class="magnify">
  <a href="http://edutechwiki.unige.ch/en/Image:Flash-cs3-tools-panel-items.png"
    class="internal" title="Enlarge"/></div>
Items of the Flash CS3 tools panel</div>
</div>
</div>
</pre>
* There were too many wiki links (normal since my wiki pages are linked).
Basically, I'd have to clean up the XHTML to get a better result, i.e. remove some wiki things that are really not needed.
I then also tested eCub.
* Copy the xhtml file plus the image directory into the project directory.
* In Options, tick '''Portable mode'''. This will include images.
The result was also an 8.7MB file. This software is a bit easier to use and results were slightly better, but the links problem was the same. However, it can't split a file into chapters, i.e. create a table of contents for a single big HTML file. Therefore, one has to import XHTML files one by one in order to get a chapter structure.
Conclusion: It is possible to create rather large e-books from mediawiki pages. But for quality results there is manual work to be done (or filtering script writing).
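The "filtering script" mentioned above could start like the following Python sketch. It was not used in the test; it assumes a well-formed XHTML file (e.g. after running tidy) and simply (1) removes elements with the MediaWiki class magnify, which hold the empty "Enlarge" link seen in the excerpt above, while keeping the caption text that follows them, and (2) starts a new chapter document at each h1, like the "//h1" setting used with Calibre. All helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
H = "{" + XHTML_NS + "}"
# Serialize XHTML elements without an "ns0:" prefix.
ET.register_namespace("", XHTML_NS)


def drop_by_class(root: ET.Element, css_class: str) -> None:
    """Remove every element carrying css_class, but keep the text that follows it."""
    # ElementTree has no parent pointers, so scan each parent's children.
    for parent in list(root.iter()):
        previous = None  # last child that was kept
        for child in list(parent):
            if css_class in (child.get("class") or "").split():
                if child.tail:  # e.g. the caption text after the magnify div
                    if previous is not None:
                        previous.tail = (previous.tail or "") + child.tail
                    else:
                        parent.text = (parent.text or "") + child.tail
                parent.remove(child)
            else:
                previous = child


def to_document(elements: list[ET.Element]) -> str:
    """Wrap one chapter's elements in a minimal XHTML document."""
    first = elements[0]
    title = "".join(first.itertext()) if first.tag == H + "h1" else "Untitled"
    html = ET.Element(H + "html")
    head = ET.SubElement(html, H + "head")
    ET.SubElement(head, H + "title").text = title
    body = ET.SubElement(html, H + "body")
    body.extend(elements)
    return ET.tostring(html, encoding="unicode")


def split_chapters(xhtml: str) -> list[str]:
    """Drop wiki debris and start a new chapter document at every <h1>."""
    body = ET.fromstring(xhtml).find(H + "body")
    drop_by_class(body, "magnify")
    groups: list[list[ET.Element]] = []
    for el in list(body):
        if el.tag == H + "h1" or not groups:
            groups.append([])
        groups[-1].append(el)
    return [to_document(group) for group in groups]
```

Each returned string could then be saved as its own .xhtml file and listed in the manifest and spine, which gives a chapter structure without importing the files one by one.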
== In education ==
* There is increasing interest in using EPUB (version 3) to create electronic textbooks, e.g. see the IMS [http://www.imsglobal.org/edupub/index.html EduPub] initiative.


== Links ==
; General
* [http://wiki.mobileread.com/wiki/EPUB EPUB]  (mobileread.com wiki)
* [https://en.wikipedia.org/wiki/EPUB ePub] (Wikipedia)


; Official
* [http://www.openebook.org/ International Digital Publishing Forum] (idpf, where the standards are defined)
; Software
* [http://epubtest.com/resources.php epubtest] has a good list of software, including commercial and free authoring/conversion tools
* http://android.stackexchange.com/questions/21093/is-there-any-android-epub-reader-supporting-epub3-with-audio-video-javascript


; Contents
* [http://www.epubbooks.com/ Where to download ePub Books online]. Several publishers are listed, but also free contents.
* [http://www.feedbooks.com/ Feedbooks] (free contents)
; Tutorials and technical tips
* [http://www.teleread.org/2010/01/03/the-abcs-of-format-conversion-for-the-kindle-sony-and-nook-plus-some-calibre-tips/ The ABCs of e-book format conversion: Easy Calibre tips for the Kindle, Sony and Nook] By John Schember
* [http://www.ibm.com/developerworks/web/library/x-richlayoutepub/index.html Create rich-layout publications in EPUB 3 with HTML5, CSS3, and MathML], by Liza Daly, July 2012.


[[Category: XML]]
[[Category: Standards]]
[[Category: Document standards]]
[[Category: E-book]]

Latest revision as of 21:05, 26 March 2017

Definition

ePub is a popular open e-book standard.

“".epub" is the file extension of an XML format for reflowable digital books and publications. ".epub" is composed of three open standards, the Open Publication Structure (OPS), Open Packaging Format (OPF) and Open Container Format (OCF), produced by the IDPF.” (IDPF, retrieved 22:38, 26 February 2009 (UTC))

Notice: Contents of this page should be updated to Epub 3.x. Some software links also may be missing - Daniel K. Schneider (talk) 22:31, 27 September 2015 (CEST)

See also:

Software and Formats

ePub can be authored and read with an increasing set of software. Since it is an open standard, it does have support from various vendors and publishers (see e.g. Tim O'Reilly Unplugged: The Kindle 2 And Transforming Industries).

Versions:

  • Epub 2 (2007)
  • Epub 2.01 (2010)
  • Epub 3.0 (2011, recommendation)
  • Epub 3.0.1 (2014, specification)

This article mostly describes Epub 2.0.

The Epub 2.x Formats

ePub formats are defined with Relax NG but rely on other standards too. Epub 2.x was finalised in 2010 and should work with any reader (even an "old" one). Since 2011, the recommended standard has been ePub 3.x.

Overview

The ePub 2.0 Specification comes in three standards that cover two parts:

(1) The open publication structure (OPS) - 09/11/07
  • The Open Publication Structure 2.0 (OPS) is an XML-based standard for authoring digital publications. Contents can be marked up either with a subset of XHTML or Daisy DTBook.
  • The Open Packaging Format (OPF) 2.0 describes the structure of an .epub in XML
(2) The open container format - 10/27/06
  • The OEBPS Container Format 1.0 (OCF) is a zip-based standard used to encapsulate publication components for transport and delivery.

ePub contents may be DRM controlled, but must not ...

ePub 2.01 uses:

  • Open Publication Structure (OPS) 2.0.1
  • Open Packaging Format (OPF) 2.0.1
  • Open Container Format (OCF) 2.0.1

The container and packaging

The *.epub zip file by example:

If we create an e-pub version of this page we get a file called xxx.epub. This *.epub file is an OCF zip file. Here is the structure:

edutechwiki_epub.epub (the zip file)
  • META-INF folder
      • container.xml
  • OPS folder
      • fonts folder (includes ttf fonts used)
      • content1.xhtml
      • content.opf
      • style.css
      • image_name1
      • image_name2
      • ....
  • mimetype

This packaging structure follows a philosophy quite similar to that of the IMS Content Packaging standard: a zip file includes a central XML file (content.opf) that defines the organization (the "spine") and the metadata. The zip then includes all the assets needed for rendering.

The MIME type, i.e. the content of the mimetype file, is application/epub+zip.
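To build such an archive manually, the only tricky part is the mimetype file: it must be the first entry in the zip and must be stored uncompressed, so that reading systems can identify the file type. Here is a minimal Python sketch using the standard zipfile module. It assumes the OPS folder layout shown above; build_epub is a hypothetical helper, and you still have to supply a valid content.opf, NCX table of contents and content files yourself.

```python
import io
import zipfile

# container.xml pointing at OPS/content.opf, as in the layout above.
CONTAINER_XML = """<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def build_epub(files: dict[str, str]) -> bytes:
    """Zip `files` (archive path -> text) into an OCF container and return its bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # mimetype: first entry, stored uncompressed, no trailing newline.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML,
                    compress_type=zipfile.ZIP_DEFLATED)
        # Everything else may be compressed.
        for path, text in files.items():
            zf.writestr(path, text, compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()
```

For example, writing build_epub({"OPS/content.opf": opf_text, "OPS/content1.xhtml": chapter_text}) to a file named book.epub produces the archive, where opf_text and chapter_text are strings you prepared beforehand.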

Let us now describe some of its files:

The Open Packaging Format (OPF)

File content.opf describes and organizes the various content elements of the epub package. It also provides metadata about the publication, fallback mechanisms when unsupported extensions are used, and a table of contents.

For example, the content.opf generated for this page by an automatic online converter looks like this:

<?xml version='1.0' encoding='UTF-8'?>
<package xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="bookid">
  <metadata>
    <dc:title>EPub</dc:title>
    <dc:identifier id="bookid">web2fb2_200904221954_3347363837</dc:identifier>
    <dc:language>En</dc:language>
    <dc:creator>Daniel K. Schneider</dc:creator>
    <dc:type>reference</dc:type>
  </metadata>
  <manifest>
    <item id="css" href="style.css" media-type="text/css"/>
    <item id="content1" href="content1.xhtml" media-type="application/xhtml+xml"/>
    <item id="i0ced13f269" href="i0ced13f269" media-type="image/png"/>
    <item id="ib166f0f69c" href="ib166f0f69c" media-type="image/png"/>
    <item id="i7fa52f212a" href="i7fa52f212a" media-type="image/png"/>
    <item id="i33954a4ae2" href="i33954a4ae2" media-type="image/png"/>
    <item id="i8be224f209" href="i8be224f209" media-type="image/png"/>
  </manifest>
  <spine>
    <itemref idref="content1"/>
  </spine>
</package>
(1) The manifest

As in IMS Content Packaging, this manifest must list all files that are part of the publication, in any order. According to the specification, it must have a structure like the one below, i.e. each item must have an id, an href pointing to the resource, and a media-type. In addition, one can define fallback items.

 <manifest>
        <item id="intro" href="introduction.html"
                media-type="application/xhtml+xml" />
        <item id="c1" href="chapter-1.html"
                media-type="application/xhtml+xml" />
        <item id="c2" href="chapter-2.html"
                media-type="application/xhtml+xml" />
        <item id="toc" href="contents.xml"
                media-type="application/xhtml+xml"
                fallback="fall1"  />
        <item id="oview" href="arch.png"
                media-type="image/png" />
        <item id="fall1" fallback="fall2"
                href="SomeDoc.pdf"
                media-type="application/pdf" />
 </manifest>
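These manifest rules are easy to check mechanically. The following Python sketch (standard library only; check_manifest is a made-up name) verifies that each item has an id, an href and a media-type, and that every fallback attribute points to the id of another item. It does not check the media types themselves or whether the referenced files exist.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix so the check works with or without xmlns."""
    return tag.rsplit("}", 1)[-1]


def check_manifest(manifest_xml: str) -> list[str]:
    """Return human-readable problems found in a <manifest> element."""
    root = ET.fromstring(manifest_xml)
    items = [el for el in root.iter() if local_name(el.tag) == "item"]
    ids = {el.get("id") for el in items}
    problems = []
    for el in items:
        # Each item needs an id, an href to the resource and a media-type.
        for attr in ("id", "href", "media-type"):
            if not el.get(attr):
                problems.append(f"item {el.get('id')!r} lacks {attr}")
        # A fallback must point to the id of another manifest item.
        fallback = el.get("fallback")
        if fallback is not None and fallback not in ids:
            problems.append(f"item {el.get('id')!r}: fallback {fallback!r} not in manifest")
    return problems
```

Applied to the manifest excerpt above, it reports only fall1, because the item fall2 it falls back to is not part of the excerpt.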
(2) The spine section

“Following manifest, there must be one and only one spine element, which contains one or more itemref elements. Each itemref references an OPS Content Document designated in the manifest. The order of the itemref elements organizes the associated OPS Content Documents into the linear reading order of the publication.” (Open Packaging Format (OPF) 2.0 specification). This spine (one could translate this to "parts") can include three different kinds of files:

  • XHTML
  • XML (yours, i.e. what they call out-of-line XML)
  • DTBook

The itemref elements of the spine refer to resources defined in the manifest, and the spine's toc attribute may point to a table of contents (an NCX file listed in the manifest). A simple example looks like this:

<manifest>
     <item id="intro"
           href="intro.html"
           media-type="application/xhtml+xml" />
     <item id="chap1"
           href="chap1.html"
           media-type="application/xhtml+xml" />
     <item id="chap2"
           href="chap2.dtb"
           media-type="application/x-dtbook+xml" />
     <item id="chap3"
           href="chap3.html"
           media-type="application/xhtml+xml" />
     <item id="f1"
           href="fig1.jpg"
           media-type="image/jpeg" />

     <!--  ...... other multimedia assets here .... -->

     <item id="toc_item"
           href="toc.ncx"
           media-type="application/x-dtbncx+xml" />
</manifest>

<spine toc="toc_item">
     <itemref idref="intro" />
     <itemref idref="chap1" />
     <itemref idref="chap2" />
     <itemref idref="chap3" />
</spine>
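The linear reading order can be read back from a package with a few lines of Python. This sketch assumes a complete content.opf in the standard OPF namespace (http://www.idpf.org/2007/opf, as in the converter example above) and uses only xml.etree.ElementTree; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

OPF = "{http://www.idpf.org/2007/opf}"


def _manifest_hrefs(pkg: ET.Element) -> dict[str, str]:
    """Map manifest item ids to their hrefs."""
    return {item.get("id"): item.get("href")
            for item in pkg.find(OPF + "manifest").findall(OPF + "item")}


def reading_order(opf_xml: str) -> list[str]:
    """Return the hrefs of the spine itemrefs, i.e. the linear reading order."""
    pkg = ET.fromstring(opf_xml)
    hrefs = _manifest_hrefs(pkg)
    return [hrefs[ref.get("idref")]
            for ref in pkg.find(OPF + "spine").findall(OPF + "itemref")]


def toc_href(opf_xml: str) -> Optional[str]:
    """Return the href of the NCX named by <spine toc="...">, if any."""
    pkg = ET.fromstring(opf_xml)
    toc_id = pkg.find(OPF + "spine").get("toc")
    return _manifest_hrefs(pkg).get(toc_id) if toc_id else None
```

Wrapped in a namespaced package element, the example above would give intro.html, chap1.html, chap2.dtb, chap3.html as reading order and toc.ncx as table of contents.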
(3) The metadata section

The metadata are defined using Dublin Core plus possible user-defined tags. Three of them are mandatory: title, identifier and language.
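A minimal Python check for these mandatory elements could look as follows. It assumes the usual OPF and Dublin Core namespaces seen in the converter example above, and it also checks that the package's unique-identifier attribute names the id of a dc:identifier, like bookid in that example. missing_metadata is a hypothetical name.

```python
import xml.etree.ElementTree as ET

OPF = "{http://www.idpf.org/2007/opf}"
DC = "{http://purl.org/dc/elements/1.1/}"


def missing_metadata(opf_xml: str) -> list[str]:
    """List mandatory metadata that is absent from a content.opf."""
    pkg = ET.fromstring(opf_xml)
    meta = pkg.find(OPF + "metadata")
    # dc:title, dc:identifier and dc:language are required.
    missing = [name for name in ("title", "identifier", "language")
               if meta is None or meta.find(DC + name) is None]
    # unique-identifier on <package> must name the id of a dc:identifier.
    ids = [] if meta is None else [el.get("id") for el in meta.findall(DC + "identifier")]
    if pkg.get("unique-identifier") not in ids:
        missing.append("unique-identifier target")
    return missing
```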

Other remarks

The XHTML files can include various formats, e.g. binary pictures, SVG and inline XML. All these formats can be styled with a subset of CSS2.

This Open Packaging Format (OPF) 2.0 v0.9871.0 is defined as a relax NG schema.

META-INF files in the epub package

All valid OCF Containers must include a directory called META-INF at the root level of the container file system. This directory contains the files specified below that describe the contents, metadata, signatures, encryption, rights and other information about the contained publication. (OCF 1.0 specification, retrieved 19:17, 22 April 2009 (UTC)).

In a simple case, the container.xml file just says where to find the content.opf file. In our example it looks like this:

<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>

Another example, taken from the OCF 1.0 specification, shows that one could include an alternative rendition, e.g. a PDF file:

<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/My Crazy Life.opf"
     media-type="application/oebps-package+xml" />
    <rootfile full-path="PDF/My Crazy Life.pdf"
     media-type="application/pdf" />
  </rootfiles>
</container>
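Reading systems start from container.xml. The following Python sketch (standard library only; the function names are made up) returns the path of the OPF package, skipping alternative renditions such as the PDF above by looking at the media-type.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import Union

CONTAINER = "{urn:oasis:names:tc:opendocument:xmlns:container}"
OPF_MEDIA_TYPE = "application/oebps-package+xml"


def opf_path(container_xml: str) -> str:
    """Return the full-path of the first OPF rootfile in container.xml."""
    root = ET.fromstring(container_xml)
    for rootfile in root.iter(CONTAINER + "rootfile"):
        # Skip alternative renditions such as application/pdf.
        if rootfile.get("media-type") == OPF_MEDIA_TYPE:
            return rootfile.get("full-path")
    raise ValueError("no OPF rootfile found in container.xml")


def opf_path_from_epub(epub: Union[str, bytes]) -> str:
    """Open an .epub (file name or raw bytes) and locate its OPF package file."""
    source = io.BytesIO(epub) if isinstance(epub, bytes) else epub
    with zipfile.ZipFile(source) as zf:
        return opf_path(zf.read("META-INF/container.xml").decode("utf-8"))
```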

In addition to container.xml, there can be five other files:

  • manifest.xml
  • metadata.xml
  • signatures.xml
  • encryption.xml
  • rights.xml

The formal specification of these files in META-INF is done with a little Relax NG schema:

  • There is one rule for container (see the examples above).
  • One rule for signatures, which includes the xmldsig-core-schema.rng schema.
  • One rule for encryption, which also refers to external *.rng schemas.

The XHTML modules used in OPS

OPS uses a set of XHTML modules with some additional restrictions. In other words, an OPS document is always XHTML compatible, but not the other way round.


XHTML 1.1 module name: elements (non-normative)

  • Structure: body, head, html, title
  • Text: abbr, acronym, address, blockquote, br, cite, code, dfn, div, em, h1, h2, h3, h4, h5, h6, kbd, p, pre, q, samp, span, strong, var
  • Hypertext: a
  • List: dl, dt, dd, ol, ul, li
  • Object: object, param
  • Presentation: b, big, hr, i, small, sub, sup, tt
  • Edit: del, ins
  • Bidirectional Text: bdo
  • Table: caption, col, colgroup, table, tbody, td, tfoot, th, thead, tr
  • Image: img
  • Client-Side Image Map: area, map
  • Meta-Information: meta
  • Style Sheet: style
  • Style Attribute (deprecated): style attribute
  • Link: link
  • Base: base
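The module list can double as a quick check of a content document. The following sketch assumes well-formed XHTML in the standard XHTML namespace and uses only the element names from the list above; it ignores attributes and the further OPS restrictions, so it is a rough filter rather than a validator (`disallowed_elements` is a hypothetical name):

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Element names copied from the XHTML 1.1 module list above
OPS_ELEMENTS = set("""
body head html title
abbr acronym address blockquote br cite code dfn div em h1 h2 h3 h4 h5 h6
kbd p pre q samp span strong var
a dl dt dd ol ul li object param
b big hr i small sub sup tt del ins bdo
caption col colgroup table tbody td tfoot th thead tr
img area map meta style link base
""".split())


def disallowed_elements(xhtml: str) -> set[str]:
    """Return XHTML element names that are not in the OPS module list.

    Elements in other namespaces (e.g. inline SVG) are skipped.
    """
    prefix = f"{{{XHTML_NS}}}"
    found = set()
    for el in ET.fromstring(xhtml).iter():
        if el.tag.startswith(prefix):
            local = el.tag[len(prefix):]
            if local not in OPS_ELEMENTS:
                found.add(local)
    return found
```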

Remark: EPub can also use DTBook (the DAISY/NISO standard) for markup.

SVG

Readers must support SVG 1.1. SVG animation and scripting features are not supported and must not be used by publication authors; a Reading System should not render such content. CSS styling of SVG must be fully supported.

SVG content can be referenced from XHTML img and object elements, but also embedded within XHTML (probably in the standard way, using namespaces).
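A rough way to check a document for SVG features that must not be used is to look for the SVG `script` element and the SVG 1.1 animation elements (`animate`, `animateColor`, `animateMotion`, `animateTransform`, `set`). This sketch works for standalone SVG files and for SVG embedded in XHTML with the SVG namespace; it does not catch event-handler attributes such as `onclick`, and `forbidden_svg_elements` is a hypothetical name:

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# SVG 1.1 scripting and animation elements that OPS content must not use
FORBIDDEN = {"script", "animate", "animateColor", "animateMotion",
             "animateTransform", "set"}


def forbidden_svg_elements(doc: str) -> list[str]:
    """Return forbidden SVG element names in document order.

    Works on a standalone SVG file or on XHTML with inline SVG, as long
    as the SVG elements are in the SVG namespace.
    """
    prefix = f"{{{SVG_NS}}}"
    found = []
    for el in ET.fromstring(doc).iter():
        if el.tag.startswith(prefix):
            local = el.tag[len(prefix):]
            if local in FORBIDDEN:
                found.append(local)
    return found
```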

XML

You may also use your own XML, both inline within XHTML and out-of-line as separate documents. Both can have fallback options (to be used when the content can't be rendered by a reading system).
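In the OPF 2.0 manifest, fallbacks are expressed with the `fallback` attribute of an `item`, which names the id of another manifest item. A reading system follows that chain until it reaches something it can render. A sketch of that lookup, assuming the manifest has already been parsed into a dictionary; the set of supported media types, the function name and the example ids are hypothetical:

```python
from typing import Optional

# Hypothetical: media types this particular reading system can render
SUPPORTED = {"application/xhtml+xml", "image/png", "image/jpeg"}

Manifest = dict[str, tuple[str, Optional[str]]]


def resolve_fallback(manifest: Manifest, item_id: str) -> Optional[str]:
    """Follow fallback links from item_id to the first renderable item.

    manifest maps an item id to (media_type, fallback_id or None).
    Returns None if the chain ends, hits an unknown id, or loops.
    """
    seen: set[str] = set()
    current: Optional[str] = item_id
    while current is not None and current not in seen:
        seen.add(current)
        entry = manifest.get(current)
        if entry is None:
            return None  # dangling fallback reference
        media_type, fallback = entry
        if media_type in SUPPORTED:
            return current
        current = fallback
    return None
```

E.g. for an item in a custom XML vocabulary whose fallback is an SVG image, which in turn falls back to a PNG, a reader that only handles PNG ends up with the PNG item.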

OPS style Sheets

OPS style sheets are CSS2 style sheets in the XML tradition, i.e. selectors and attribute names are case-sensitive. As with XHTML, there are some restrictions.

Epub 3.0x

Epub 3.0 was introduced in 2011, and in 2014 the maintenance release 3.0.1 was published.

According to the official documentation, "EPUB 3's base content format is now based on the XML serialization of HTML5 (XHTML5) [ContentDocs30], whereas EPUB2 supported two basic content types: a profile of XHTML 1.1 and DTBook [OPS2] (a semantically-enhanced markup focused on accessibility concerns) [...] the EPUB 3 XHTML Content Document definition includes both extensions to and restrictions on its HTML5 base."

In summary:

  • Epub content documents use a subset of HTML5 that follows XML syntax (e.g. all tags must be closed).
  • Epub 3.0 allows for some interactivity using a special attribute for triggers and a JS API
  • Styling is done with a subset of CSS3 (mostly CSS2)
  • Rich media (e.g. video) are supported
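Both EPUB 2 and EPUB 3 package documents use the same OPF namespace; the `version` attribute of the root `package` element tells them apart ("2.0" vs "3.0"). A minimal sketch for reading it, assuming the OPF file content has already been extracted from the archive (`epub_major_version` is a hypothetical name):

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"


def epub_major_version(opf_xml: str) -> int:
    """Return 2 or 3 from the version attribute of the OPF <package>."""
    root = ET.fromstring(opf_xml)
    if root.tag != f"{{{OPF_NS}}}package":
        raise ValueError("not an OPF package document")
    version = root.get("version", "")
    major = version.split(".", 1)[0]
    if major not in ("2", "3"):
        raise ValueError(f"unexpected package version {version!r}")
    return int(major)
```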

Software

Authoring software

(1) One solution for authoring new ePub content is to use an XHTML authoring tool. To create the epub archive, there are two options so far:

  • Build the epub archive manually which is not too difficult for a technical person.
  • Use conversion software (see below), some of which is good. E.g. Calibre can handle multi-file documents in various input formats and it also can split a well structured XHTML file with an XPath expression into several sub-files.
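Building the archive by hand mostly means respecting a few OCF packaging rules: the `mimetype` file must be the first entry in the ZIP, stored uncompressed, and contain exactly `application/epub+zip`; META-INF/container.xml must point to the package file. A minimal sketch with Python's standard zipfile module, assuming you already have a valid OPF file and content documents under OEBPS/ (the function name `build_epub` is hypothetical):

```python
import io
import zipfile

CONTAINER_XML = """<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def build_epub(files: dict[str, bytes]) -> bytes:
    """Zip an OPS file set into an EPUB container and return its bytes.

    files maps archive paths (e.g. "OEBPS/ch1.xhtml") to their content;
    it must include OEBPS/content.opf, since container.xml points there.
    """
    if "OEBPS/content.opf" not in files:
        raise ValueError("OEBPS/content.opf is required")
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # The mimetype entry must come first and must not be compressed
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML,
                    compress_type=zipfile.ZIP_DEFLATED)
        for path, data in files.items():
            zf.writestr(path, data, compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()
```

Writing the result with `open("book.epub", "wb").write(build_epub(...))` gives a file that can then be checked with a validator.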

(2) Another solution is to use an Epub editor (e.g. Sigil, introduced below).

(3) Many authors probably use a combination: produce content with any sort of XHTML editor and then use an authoring tool to fine-tune it.

Sigil
Sigil seems to be the best free authoring software so far.
BlueGriffon EPUB Edition
Product home page
  • Commercial (195 Euros)
  • Epub3 compatible
  • Not tested so far ...
Jutoh
Product home page. Jutoh is both a converter and an editor.
  • Commercial (30 Euros)

Conversion software

There exist several tools that can convert from one format to another. Results vary according to the input; e.g. PDF is more difficult to convert than XHTML:

  • PDF input leads to really ugly output
  • Web pages like this mediawiki page translate badly, e.g. bullets go away.
Atlantis Word Processor
Atlantis Word Processor converts any document to EPUB. Supports multilevel TOCs, font embedding, and batch conversion.
Source formats: RTF, DOC, DOCX, ODT, TXT.
Platform: Windows.
Calibre
Calibre is a one stop solution to all your e-book needs. It is free, open source and cross-platform in design and works well on Linux, OS X and Windows.
List of general features: Library Management - Format conversion (all major ebook formats can be converted from) - Syncing to ebook reader devices - Fetching news from the web and converting it into ebook form - Viewing many different ebook formats - Giving you access to your book collection over the internet using just a browser.
Converter: can create chapters out of a single text using XPath expressions, add a title image, title, author, date, etc.
Source formats: LIT, MOBI, EPUB, HTML, PRC, RTF, TXT, PDF. Some convert better than others, e.g. PDF pictures don't translate.
Output formats: EPUB, LRF, MOBI
Easy to install under Ubuntu (tested with 8.04 Hardy Heron and 9.04 Jaunty). You can do it with one command line.
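The idea behind splitting one large XHTML file into chapters can be sketched as follows: start a new chunk at every top-level h1 in the body. This is not Calibre's implementation; it is a simplified illustration (hypothetical function `split_at_h1`) that assumes well-formed XHTML in the XHTML namespace and, unlike a general XPath expression such as "//h1", only looks at direct children of body:

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
H1_TAG = f"{{{XHTML_NS}}}h1"


def split_at_h1(xhtml: str) -> list[list[ET.Element]]:
    """Group the top-level children of <body> into chapters.

    A new chapter starts at every top-level <h1>; anything before the
    first <h1> forms a leading chunk. Loose text directly inside <body>
    (body.text) is ignored in this sketch.
    """
    body = ET.fromstring(xhtml).find(f"{{{XHTML_NS}}}body")
    if body is None:
        raise ValueError("no XHTML <body> element found")
    chapters: list[list[ET.Element]] = []
    for child in body:
        if child.tag == H1_TAG or not chapters:
            chapters.append([])  # open a new chapter
        chapters[-1].append(child)
    return chapters
```

Each chunk could then be written to its own XHTML file and listed in the OPF spine, which is what gives the e-book its chapter structure.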
Writer2ePub
  • Writer2ePub is an OpenOffice.org extension that creates an ePub file from any document openable by the OOo Word Processor.
  • Quote: “W2E is an ePub creator. Simply write your document with the OOo Word Processor and W2E will make an ePub file using the best traditional typographic rules, by applying a predefined style sheet (CSS). If you need a good ePub document and you can use the OOo Writer Word Processor, W2E is your tool.”
eCub
Seems to be a dead project (as of 2014/2015)
On-line conversion tools
  • Web2FB2 is a web page to FB2 and EPUB converter (tested April 2009).
    • The result wasn't convincing regarding conversion of a fairly ugly XHTML file (I tried with this page) - Daniel K. Schneider 19:17, 22 April 2009 (UTC).
    • Tip: use the advanced options to set both title and author. The generated file will be named authorname_first_name_title.epub.
Wiki 2 epub
  • We did not find any software, but as we describe below, wiki to xhtml to epub somewhat works.
  • There is an on-demand Epub converter for MediaWikis. However it only works with some versions: Extension talk:EPubExport

Integrated online environments

  • Zinepal lets you create magazines in various formats (including EPUB). Sign-up is required.

Validation

  • EpubCheck is a Java-based command-line tool to validate EPUB files. It can also be used as a programming library.
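EpubCheck does far more, but one common packaging mistake can be caught with a few lines. The sketch below (hypothetical function `mimetype_problems`) only checks the rules for the `mimetype` entry from the OCF specification (first entry, uncompressed, exact content) and is no substitute for running EpubCheck:

```python
import io
import zipfile

EPUB_MIMETYPE = b"application/epub+zip"


def mimetype_problems(epub_bytes: bytes) -> list[str]:
    """Return a list of problems with the 'mimetype' entry (empty if none)."""
    problems = []
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as zf:
        infos = zf.infolist()
        if not infos or infos[0].filename != "mimetype":
            problems.append("first entry is not 'mimetype'")
            return problems
        first = infos[0]
        if first.compress_type != zipfile.ZIP_STORED:
            problems.append("'mimetype' is compressed")
        if zf.read(first) != EPUB_MIMETYPE:
            problems.append("'mimetype' content is wrong")
    return problems
```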

Reader Software

Most readers support several e-book formats and several support ePub. See also: EPUB at mobileread.com and Wikipedia's Comparison of e-book formats.

There exist several ePub capable readers. Here are just some of these:

Firefox
EPUBReader
Good quality reader and one of the easiest (however it seems to need a fair amount of CPU power)
FBReader
FBReader e-book reader for Unix/Windows computers. It supports several formats.
Wikipedia entry
Unix/Windows (It's included in Ubuntu's Synaptic)
Features: Several formats (partially or fully). Several reading options.
Calibre
Calibre is a conversion tool (see above), but it also includes a (OK, but not very good) reader.
Kobo
Kobo is an ecosystem like Amazon's: it has a store, your online library, an optional e-book reader device, etc. E-books are in Epub format and can be read on any device using the Kobo e-book reader software, e.g. a laptop computer or an Android tablet/phone. The software can automatically synchronize with your online library, but you can also take just one book along, e.g. on your mobile phone for when you are stuck waiting somewhere ...
You can also download the e-pub from the online library. I recommend this if you buy expensive books. The file has Adobe DRM on it and must be opened with Adobe Digital Editions. If you want to read it on Linux, you must crack the file.
Read 10 simple Kobo tips and tricks
Openberg
OpenBerg was an open-source initiative to write a system (reader, authoring, etc.). The reader was a Firefox extension. Now dead?
Wikipedia entry
Adobe Digital editions
Adobe Digital Editions is free software with a built-in DRM mechanism. If you buy content and register with Adobe, you can use the content on a limited set of other computers (needs some clarification).
Wikipedia entry
Stanza
Lexcycle Stanza (favorite free iPhone/iPod reader). Win/Mac/i*.
Wikipedia Entry

Hardware

E-books make most sense when read on specialized hardware. Several brands can read ePub documents, e.g.

Sony Reader (Mobileread wiki)
E.g. the Sony PRS 505 has 64MB RAM and 256MB of flash memory, a 6" screen, 800 x 600px, 167 PPI, 8 grey levels, USB 2.0, and about 2-3 weeks of battery life (7500 page turns).
E.g. the Sony PRS 700BC (Oct 2008) measures 127.6 x 174.3 x 9.7 mm and has a touch screen (167 ppi, 800x600px).
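The quoted 167 PPI follows from the 800 x 600 px resolution and the 6" screen size, assuming (as is usual for e-readers) that the size refers to the screen diagonal: divide the diagonal in pixels by the diagonal in inches.

```latex
\text{PPI} = \frac{\sqrt{800^2 + 600^2}}{6} = \frac{\sqrt{1\,000\,000}}{6} = \frac{1000}{6} \approx 166.7 \approx 167
```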
JetBook
153 x 109 x 10mm: 640 x 480 VGA greyscale
Hanlin EBooks (Mobileread wiki).
The same hardware is sold under different brand names and with different firmware.

See e-book.

DRM removal

Read Digital Editions Help first, there may be a solution without removing the DRM that fits your needs.

While I understand that publishers want to sell books (rather than see just a few people download and share them), I still argue against DRM. I would not oppose watermarking, e.g. with a (verified) real name and email address.

  • Publishing houses like O'Reilly don't use DRM and still seem to sell books (I, for example, bought a whole lot). Often you can get them at half price, e.g. about $20 apiece.
  • DRM doesn't allow you to read books from your Linux systems.
  • Books with DRM are really difficult to install on mobile phones. In theory there is a way, but I did not manage it on a Galaxy S3.
  • As a researcher, I want to be able to copy/paste sentences, but I cannot copy/paste from Digital Editions.
  • Some publishing houses sell e-books with dreadful formatting that would need some fixing (OK, I wouldn't take the time to do that, but others might). Most often maps and other detailed graphics are unreadable (maybe the quality is better in the assets directory).
  • As a citizen, I do not want Adobe to register what I read.

- Daniel K. Schneider (talk) 17:43, 30 June 2014 (CEST)

Therefore, removing the DRM is just fine (and legal in most countries), unless you redistribute cracked books (which I don't do)

Solutions (careful! Some of this software may be dangerous)

EPUB DRM Removal - Download tool (not tested)

  • EPubSoft PDF / EPub DRM Removal (not tested)

Conversion testing

A (preliminary) conversion test of a huge HTML file

- Daniel K. Schneider 19:17, 22 April 2009. This section therefore needs to be updated!

I tried to convert mediawiki contents, i.e. what could be called a wiki book.

  • I took a larger list of flash tutorials from this wiki (bad idea since they include huge pictures) and generated a single (ugly) html page with the pdfbook generator (800MB for the HTML only).
  • This some_flash_tutorials.html file then had to be repaired with tidy. I also converted it to real XHTML.
tidy -o flash_tutorials.xhtml -asxhtml some_flash_tutorials.html
  • I opened the XHTML file in a browser and then "saved as complete web page" in a directory. This is important in order to have a local copy of all images.
  • I then imported the XHTML File to Calibre and converted to epub.
    • I changed the XPath expression for chapter detection to "//h1" since each wiki page starts with h1.

The result was sort of ok, a "442 pages" 8MB file

  • Some links ("a" tags) were wrong, i.e. extended over several pages. They were OK in the original XHTML file. The conversion software cannot handle something like the following and loses a closing "a" somewhere.
 <div class="thumb tleft">
  <div class="thumbinner" style="width: 182px;">
  <a href="http://edutechwiki.unige.ch/en/Image:Flash-cs3-tools-panel-items.png"
     class="image" title="Items of the Flash CS3 tools panel">
  <img height="480" border="0" width="180" 
       alt="Items of the Flash CS3 tools panel"
       src="flash_tutorials_files/180px-Flash-cs3-tools-panel-items.png"
       class="thumbimage"/></a>
  <div class="thumbcaption">
  <div class="magnify">
  <a href="http://edutechwiki.unige.ch/en/Image:Flash-cs3-tools-panel-items.png"
     class="internal" title="Enlarge"/></div>
 Items of the Flash CS3 tools panel</div>
 </div>
</div>
  • There were too many wiki links (normal since my wiki pages are linked).

Basically, I'd have to clean up the XHTML to get a better result, i.e. remove some wiki things that are really not needed.
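A plausible cause of the runaway links (an assumption, not verified against the converter) is the self-closed, empty link `<a ... title="Enlarge"/>` in the snippet above: an HTML parser reads it as an opening `<a>` tag that is never closed. A small filtering script could simply drop such empty links before conversion. A sketch using ElementTree on well-formed XHTML; `drop_empty_links` and `is_empty_link` are hypothetical helpers:

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
A_TAG = f"{{{XHTML_NS}}}a"


def is_empty_link(el: ET.Element) -> bool:
    """True for an <a href=...> with no child elements and no visible text."""
    return (el.tag == A_TAG and el.get("href") is not None
            and len(el) == 0 and not (el.text or "").strip())


def drop_empty_links(root: ET.Element) -> int:
    """Remove empty links in place and return how many were removed.

    The text following a removed link (its tail) is preserved by appending
    it to the previous remaining sibling, or to the parent's text.
    Named anchors without href (link targets) are left alone.
    """
    removed = 0
    for parent in list(root.iter()):
        last_kept = None  # previous sibling that stays in the tree
        for child in list(parent):
            if not is_empty_link(child):
                last_kept = child
                continue
            tail = child.tail or ""
            if last_kept is not None:
                last_kept.tail = (last_kept.tail or "") + tail
            else:
                parent.text = (parent.text or "") + tail
            parent.remove(child)
            removed += 1
    return removed
```

Before writing the cleaned tree back to a file, calling `ET.register_namespace("", XHTML_NS)` keeps the default XHTML namespace in the output instead of an `ns0:` prefix.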

I then also tested eCub.

  • Copy the xhtml file plus the image directory into the project directory.
  • In Options, tick Portable mode. This will include images

The result was also an 8.7MB file. This software is a bit easier to use and the results were slightly better, but the links problem was the same. However, it can't split a file into chapters, i.e. create a table of contents for a single big HTML file. Therefore, one has to import XHTML files one by one in order to get a chapter structure.

Conclusion: It is possible to create rather large e-books from mediawiki pages. But for quality results there is manual work to be done (or filtering script writing).


In education

  • There is increasing interest in using EPUB (version 3) to create electronic textbooks, e.g. see the IMS EduPub initiative.

Links

General
  • EPUB (mobileread.com wiki)
  • ePub (Wikipedia)
Official
Software
Contents
Tutorials and technical tips