EPub: Difference between revisions
Revision as of 20:17, 22 April 2009
Definition
ePub is a popular open e-book standard.
“".epub" is the file extension of an XML format for reflowable digital books and publications. ".epub" is composed of three open standards, the Open Publication Structure (OPS), Open Packaging Format (OPF) and Open Container Format (OCF), produced by the IDPF.” (IDPF, retrieved 22:38, 26 February 2009 (UTC))
Software and Formats
ePub can be authored and read with a growing range of software. As an open standard, it is supported by various vendors and publishers (see e.g. Tim O'Reilly Unplugged: The Kindle 2 And Transforming Industries).
Formats
ePub formats are defined with Relax NG but rely on other standards too.
Overview
The ePub specification comes in three parts:
- (1) The Open Publication Structure (OPS) - 09/11/07
- The Open Publication Structure 2.0 (OPS) is an XML-based standard for authoring digital publications. Contents can be marked up either with a subset of XHTML or with DAISY DTBook.
- (2) The Open Packaging Format (OPF)
- The Open Packaging Format (OPF) 2.0 describes the structure of an .epub file in XML.
- (3) The Open Container Format (OCF) - 10/27/06
- The OPS Container Format 1.0 (OCF) is a zip-based standard used to encapsulate publication components for transport and delivery.
ePub contents may be DRM controlled, but must not ...
The container and packaging
The *.epub zip file by example:
If we create an ePub version of this page, we get a file called xxx.epub. This *.epub file is an OCF zip file with the following structure:
- edutechwiki_epub.epub (the zip file)
  - META-INF folder
    - container.xml
  - OEBPS folder
    - fonts folder (includes ttf fonts used)
    - content1.xhtml
    - content.opf
    - style.css
    - image_name1
    - image_name2
    - ....
  - mimetype
This packaging structure follows a philosophy quite similar to that of the IMS Content Packaging standard: a zip file contains a central XML file (content.opf) that defines the organization (the "spine") and the metadata, plus all the assets needed for rendering.
The MIME type, i.e. the content of the mimetype file, is application/epub+zip.
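The packaging described above can be sketched in a few lines of Python. This is a minimal sketch, not a complete ePub writer: the function name `build_epub` and the `files` dictionary are hypothetical, and it assumes (as the OCF specification is generally understood to require) that the mimetype entry comes first in the archive and is stored uncompressed.

```python
import zipfile

EPUB_MIMETYPE = "application/epub+zip"


def build_epub(target, files):
    """Write an OCF zip archive to `target` (a path or a file-like object).

    `files` maps archive names such as "META-INF/container.xml" to their
    content (str or bytes). The mimetype entry is written first and is
    left uncompressed (assumption stated in the text above).
    """
    with zipfile.ZipFile(target, "w") as archive:
        archive.writestr("mimetype", EPUB_MIMETYPE,
                         compress_type=zipfile.ZIP_STORED)
        for name, data in files.items():
            archive.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)
```

A call such as `build_epub("edutechwiki_epub.epub", {"META-INF/container.xml": ..., "OEBPS/content.opf": ...})` would then produce the layout listed above, provided the dictionary holds the real file contents.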
Let us now describe some of its files:
The Open Packaging Format (OPF)
File content.opf describes and organizes the various content elements of the epub package. It also provides metadata about the publication, fallback mechanisms when unsupported extensions are used, and a table of contents.
For example, the content.opf file produced by an automatic online converter for this page looks like this:
<?xml version='1.0' encoding='UTF-8'?>
<package xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="bookid">
<metadata>
<dc:title>EPub</dc:title>
<dc:identifier id="bookid">web2fb2_200904221954_3347363837</dc:identifier>
<dc:language>En</dc:language>
<dc:creator>Daniel K. Schneider</dc:creator>
<dc:type>reference</dc:type>
</metadata>
<manifest>
<item id="css" href="style.css" media-type="text/css"/>
<item id="content1" href="content1.xhtml" media-type="application/xhtml+xml"/>
<item id="i0ced13f269" href="i0ced13f269" media-type="image/png"/>
<item id="ib166f0f69c" href="ib166f0f69c" media-type="image/png"/>
<item id="i7fa52f212a" href="i7fa52f212a" media-type="image/png"/>
<item id="i33954a4ae2" href="i33954a4ae2" media-type="image/png"/>
<item id="i8be224f209" href="i8be224f209" media-type="image/png"/>
</manifest>
<spine>
<itemref idref="content1"/>
</spine>
</package>
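To see how these three blocks fit together, one can read such a file with Python's standard XML library. The following is only a sketch: the function name `read_package` and the shortened `SAMPLE_OPF` string are made up for illustration, while the two namespace URIs are taken from the example above.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as they appear in the content.opf example above.
NS = {
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Shortened, hypothetical sample in the same shape as the example above.
SAMPLE_OPF = """
<package xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns="http://www.idpf.org/2007/opf" version="2.0"
         unique-identifier="bookid">
  <metadata>
    <dc:title>EPub</dc:title>
    <dc:identifier id="bookid">example-id</dc:identifier>
    <dc:language>en</dc:language>
  </metadata>
  <manifest>
    <item id="css" href="style.css" media-type="text/css"/>
    <item id="content1" href="content1.xhtml"
          media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="content1"/>
  </spine>
</package>
"""


def read_package(opf_text):
    """Return (title, manifest, spine) from the text of an OPF file.

    manifest maps each item id to (href, media-type);
    spine is the list of idref values in reading order.
    """
    root = ET.fromstring(opf_text)
    title = root.findtext("opf:metadata/dc:title", namespaces=NS)
    manifest = {
        item.get("id"): (item.get("href"), item.get("media-type"))
        for item in root.findall("opf:manifest/opf:item", NS)
    }
    spine = [ref.get("idref")
             for ref in root.findall("opf:spine/opf:itemref", NS)]
    return title, manifest, spine
```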
- (1) The manifest
The manifest (as in IMS Content Packaging) must list all files that are part of the publication, in any order. According to the specification, each item must have an id, an href pointing to a resource, and a media-type. In addition, one can define fallback items. The structure looks like this:
<manifest>
<item id="intro" href="introduction.html"
media-type="application/xhtml+xml" />
<item id="c1" href="chapter-1.html"
media-type="application/xhtml+xml" />
<item id="c2" href="chapter-2.html"
media-type="application/xhtml+xml" />
<item id="toc" href="contents.xml"
media-type="application/xhtml+xml"
fallback="fall1" />
<item id="oview" href="arch.png"
media-type="image/png" />
<item id="fall1" fallback="fall2"
href="SomeDoc.pdf"
media-type="application/pdf" />
</manifest>
- (2) The spine section
“Following manifest, there must be one and only one spine element, which contains one or more itemref elements. Each itemref references an OPS Content Document designated in the manifest. The order of the itemref elements organizes the associated OPS Content Documents into the linear reading order of the publication.” (Open Packaging Format (OPF) 2.0 specification). This spine (one could translate this to "parts") can include three different kinds of files:
- XHTML
- XML (yours, i.e. what they call out-of-line XML)
- DTBook
The itemref elements of the spine refer to resources defined in the manifest, and the toc attribute of the spine may point to a table of contents. A simple example looks like this:
<manifest>
<item id="intro"
href="intro.html"
media-type="application/xhtml+xml" />
<item id="chap1"
href="chap1.html"
media-type="application/xhtml+xml" />
<item id="chap2"
href="chap2.dtb"
media-type="application/x-dtbook+xml" />
<item id="chap3"
href="chap3.html"
media-type="application/xhtml+xml" />
<item id="f1"
href="fig1.jpg"
media-type="image/jpeg" />
<!-- ...... other multimedia assets here .... -->
<item id="toc_item"
href="toc.ncx"
media-type="application/x-dtbncx+xml" />
</manifest>
<spine toc="toc_item">
<itemref idref="intro" />
<itemref idref="chap1" />
<itemref idref="chap2" />
<itemref idref="chap3" />
</spine>
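Since every itemref (and the toc attribute) must point to an id declared in the manifest, this relation is easy to check by program. The sketch below assumes the manifest and spine sit inside a package element in the OPF namespace, as in the content.opf example further up; the function name `check_spine` is hypothetical.

```python
import xml.etree.ElementTree as ET

OPF = "{http://www.idpf.org/2007/opf}"


def check_spine(opf_text):
    """Return a list of problems found in the spine of an OPF document."""
    root = ET.fromstring(opf_text)
    manifest_ids = {item.get("id") for item in root.iter(OPF + "item")}
    spines = root.findall(OPF + "spine")
    if len(spines) != 1:
        return ["expected exactly one spine element, found %d" % len(spines)]
    spine = spines[0]
    problems = []
    toc = spine.get("toc")
    if toc is not None and toc not in manifest_ids:
        problems.append("toc refers to unknown id: " + toc)
    itemrefs = spine.findall(OPF + "itemref")
    if not itemrefs:
        problems.append("spine contains no itemref")
    for ref in itemrefs:
        if ref.get("idref") not in manifest_ids:
            problems.append("itemref refers to unknown id: %s"
                            % ref.get("idref"))
    return problems
```

An empty list means that the spine is consistent with the manifest; it says nothing about whether the referenced files actually exist in the archive.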
- (3) The metadata section
The metadata are defined using Dublin Core plus possible user-defined tags. Some of these metadata are mandatory, namely title, identifier and language.
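A quick test for the three mandatory elements could look as follows. This is a sketch under the assumption that the elements use the Dublin Core namespace shown in the content.opf example above; the function name `missing_metadata` is made up.

```python
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"
REQUIRED = ("title", "identifier", "language")


def missing_metadata(opf_text):
    """Return the names of mandatory Dublin Core elements that are
    absent or empty in the given OPF document."""
    root = ET.fromstring(opf_text)
    missing = []
    for name in REQUIRED:
        element = root.find(".//" + DC + name)
        if element is None or not (element.text or "").strip():
            missing.append(name)
    return missing
```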
- Other remarks
The XHTML files can include various formats, e.g. binary pictures, SVG and in-line XML. All these formats can be styled with a subset of CSS2.
This Open Packaging Format (OPF) 2.0 v0.9871.0 is defined as a Relax NG schema.
META-INF files in the epub package
All valid OCF Containers must include a directory called META-INF at the root level of the container file system. This directory contains the files specified below that describe the contents, metadata, signatures, encryption, rights and other information about the contained publication. (OCF 1.0 specification, retrieved 19:17, 22 April 2009 (UTC)).
In the simplest case, the container.xml file just states where to find the content.opf file. In our simple example it looks like this:
<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
<rootfiles>
<rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
</rootfiles>
</container>
Another example, taken from the OCF 1.0 specification, shows that one could also include an alternative rendition such as a PDF file:
<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
<rootfiles>
<rootfile full-path="OEBPS/My Crazy Life.opf"
media-type="application/oebps-package+xml" />
<rootfile full-path="PDF/My Crazy Life.pdf"
media-type="application/pdf" />
</rootfiles>
</container>
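A reader has to start from container.xml to locate the package file. The lookup can be sketched as follows; the function names are hypothetical, while the namespace URI and the media type are the ones used in the two examples above.

```python
import xml.etree.ElementTree as ET

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_MEDIA_TYPE = "application/oebps-package+xml"


def list_rootfiles(container_xml):
    """Return (full-path, media-type) pairs in document order."""
    root = ET.fromstring(container_xml)
    return [
        (rootfile.get("full-path"), rootfile.get("media-type"))
        for rootfile in root.findall("c:rootfiles/c:rootfile", CONTAINER_NS)
    ]


def find_opf_path(container_xml):
    """Return the path of the first OPF package file, or None."""
    for path, media_type in list_rootfiles(container_xml):
        if media_type == OPF_MEDIA_TYPE:
            return path
    return None
```

For the second example above, `find_opf_path` would return the path of the .opf file and skip the PDF rendition.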
In addition to container.xml, there can be five other files:
- manifest.xml
- metadata.xml
- signatures.xml
- encryption.xml
- rights.xml
The formal specification of these META-INF files is given as a small Relax NG schema:
- There is one rule for container (see the examples above).
- One rule for signatures, which includes the xmldsig-core-schema.rng schema.
- One rule for encryption, which also refers to external *.rng schemas.
The XHTML modules used in OPS
OPS uses a set of XHTML modules with some additional restrictions. I.e. OPS is always XHTML compatible, but not the other way round.
XHTML 1.1 Module Name | Elements (non-normative) |
---|---|
Structure | body, head, html, title |
Text | abbr, acronym, address, blockquote, br, cite, code, dfn, div, em, h1, h2, h3, h4, h5, h6, kbd, p, pre, q, samp, span, strong, var |
Hypertext | a |
List | dl, dt, dd, ol, ul, li |
Object | object, param |
Presentation | b, big, hr, i, small, sub, sup, tt |
Edit | del, ins |
Bidirectional Text | bdo |
Table | caption, col, colgroup, table, tbody, td, tfoot, th, thead, tr |
Image | img |
Client-Side Image Map | area, map |
Meta-Information | meta |
Style Sheet | style |
Style Attribute (deprecated) | style attribute |
Link | link |
Base | base |
Remark: ePub can also use DTBook (a DAISY/NISO standard) for markup.
SVG
Readers must support SVG 1.1. SVG animation and scripting features are not supported and must not be used by publication authors; a Reading System should not render such content. CSS styling of SVG must be fully supported.
SVG content can be referenced from XHTML img and object elements, but can also be embedded within XHTML (probably in the standard way with namespaces).
XML
You also may use your own XML both inline within XHTML and out-of-line as documents. Both can have fallback options (to be used when the contents can't be rendered by a client).
OPS style Sheets
OPS Style sheets are CSS2 styles in the XML tradition, i.e. selectors and attribute names are case sensitive. Again, like for XHTML, there are some restrictions.
Software
Authoring software
“Web-standard formats such as XHTML make up the core of OPS. OPS files can be created using a wide variety of Web and eBook publishing tools. They can also be created using XML editors such as Altova’s XML Spy. Adobe InDesign CS3 supports the direct generation of OCF-packaged OPS content as an export function. OCF files can also easily be created using standard ZIP applications like WINZIP” (IDPF FAQ, retrieved 22:38, 26 February 2009 (UTC)).
We didn't find any convincing open source authoring software. One can write in XHTML and then use a converter, but the text should be split into several smaller XHTML files, since epub targets small devices. There are two ways to create the epub archive:
- Build the epub archive manually, which is not difficult for a technical person.
- Use better conversion software. E.g. Calibre can handle multi-file documents and can also split a well-structured XHTML file with an XPath expression.
- A (preliminary) conversion test of a huge HTML file
(Daniel K. Schneider 19:17, 22 April 2009 (UTC))
I tried a conversion test with Calibre for mediawiki contents. I took a larger list of flash tutorials from this wiki (bad idea since they include huge pictures). Then I generated a single (ugly) html page with the pdfbook generator (800MB for the HTML only).
This some_flash_tutorials.html file then had to be repaired with tidy. I also made it XHTML with a .xhtml extension.
tidy -o flash_tutorials.xhtml -asxhtml some_flash_tutorials.html
I then imported the result into Calibre and converted it to epub. I changed the XPath expression for chapter detection to "//h1".
The result was sort of OK. But images haven't been included (i.e. they show, but are pulled from the web site). Also, some anchors were wrong, i.e. they extended over several pages. Basically, I'd have to clean up the XHTML to get a better result...
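The idea behind chapter detection with "//h1" can be illustrated with a small sketch. It is not Calibre's implementation: the function name `split_at_h1` is made up, and unlike a real "//h1" match it only looks at h1 elements that are direct children of body. It also assumes a well-formed XHTML file without undeclared entities, such as the output of tidy may or may not be.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"


def split_at_h1(xhtml_text):
    """Group the children of <body> into chapters.

    A new chapter starts at every h1 that is a direct child of body;
    anything before the first h1 forms a chapter of its own.
    Returns a list of lists of elements.
    """
    body = ET.fromstring(xhtml_text).find(XHTML + "body")
    if body is None:
        return []
    chapters = []
    for child in body:
        if child.tag == XHTML + "h1" or not chapters:
            chapters.append([])
        chapters[-1].append(child)
    return chapters
```

Each group could then be written to its own small XHTML file and listed in the manifest and the spine.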
- eScape
- eScape is an Open Office Writer to ePub creator that is free for non-commercial use. According to the home page, production is easy:
- Use the supplied Open Office Template (OTT) to Structure-Style™ your text files in the familiar Open Office Writer environment.
- Open eScape, browse to your ODT, cover image and select a CSS.
- Set a few reader friendly options and click "Create ePub".
(Not tested so far).
Conversion software
There exist several tools that can convert from one format to another. Mileage varies according to input. For example, PDF is more difficult to convert than XHTML:
- PDF input leads to really ugly output
- Web pages like this mediawiki page translate badly, e.g. bullets go away.
- Calibre
- Calibre is a one stop solution to all your e-book needs. It is free, open source and cross-platform in design and works well on Linux, OS X and Windows.
- List of features: Library Management - Format conversion (all major ebook formats can be converted from) - Syncing to ebook reader devices - Fetching news from the web and converting it into ebook form - Viewing many different ebook formats - Giving you access to your book collection over the internet using just a browser
- Source formats: LIT, MOBI, EPUB, HTML, PRC, RTF, TXT, PDF. Some convert better than others, e.g. PDF pictures don't translate.
- Output formats: EPUB, LRF, MOBI
- Easy to install under Ubuntu (tested with 8.04 Hardy Heron)
- On-line conversion tools
- Web2FB2 is a web page to FB2 and EPUB converter (tested April 2009).
- The result wasn't convincing for a fairly ugly XHTML file (I tried with this page) - Daniel K. Schneider 19:17, 22 April 2009 (UTC).
- Tip: make use of the advanced options to set both title and author. The generated file will be named authorname_first_name_title.epub.
Reader Software
Most readers support several e-book formats and several support ePub. See also: EPUB at mobileread.com and Wikipedia's Comparison of e-book formats.
There exist several readers. Here are just some of these:
- FBReader
- FBReader is an e-book reader for Unix/Windows computers. It supports several formats.
- Wikipedia entry
- Unix/Windows (It's included in Ubuntu's Synaptic)
- Features: several formats (partially or fully supported), several reading options.
- Calibre
- Multi-purpose tool, see above
- Openberg
- OpenBerg was an open source initiative to write a complete system (reader, authoring etc.). The reader was a Firefox extension. Now dead?
- Wikipedia entry
- Adobe Digital editions
- Adobe Digital Editions is free software with a built-in DRM mechanism. If you buy contents and register with Adobe, you can use the contents on a limited set of other computers (needs some clarification).
- Wikipedia entry
- Stanza
- Lexcycle Stanza (favorite free iPhone/iPod reader). Win/Mac/i*.
- Wikipedia Entry
Links
- General
- EPUB (mobileread.com wiki)
- Official
- International Digital Publishing Forum (idpf, where the standards are defined)
- Contents
- Where to download ePub Books online. Several publishers are listed, but also free contents.
- Feedbooks (free contents)