EPub: Difference between revisions

The educational technology and digital learning wiki

Revision as of 17:58, 22 April 2009

Draft

Definition

ePub is a popular open e-book standard.

“".epub" is the file extension of an XML format for reflowable digital books and publications. ".epub" is composed of three open standards, the Open Publication Structure (OPS), Open Packaging Format (OPF) and Open Container Format (OCF), produced by the IDPF.” (IDPF, retrieved 22:38, 26 February 2009 (UTC))

Software and Formats

ePub can be authored and read with a growing range of software. Because it is an open standard, it is supported by various vendors and publishers (see e.g. Tim O'Reilly Unplugged: The Kindle 2 And Transforming Industries).

Formats

Overview

The ePub specification comes in three parts, grouped here by release date:

(1) The Open Publication Structure (OPS) and the Open Packaging Format (OPF) - 09/11/07
  • The Open Publication Structure 2.0 (OPS) is an XML-based standard for authoring digital publications. Contents can be marked up with either a subset of XHTML or DAISY DTBook.
  • The Open Packaging Format (OPF) 2.0 describes the structure of an .epub in XML.
(2) The Open Container Format (OCF) - 10/27/06
  • The OPS Container Format 1.0 (OCF) is a zip-based standard used to encapsulate publication components for transport and delivery.

ePub contents may be DRM controlled, but must not ...

The container and packaging

The *.epub zip file by example:

If we create an ePub version of this page, we get a file called xxx.epub. This *.epub file is an OCF zip file. Here is its structure:

edutechwiki_epub.epub (the zip file)
  META-INF folder
    container.xml
  OEBPS folder
    fonts folder (includes ttf fonts used)
    content1.xhtml
    content.opf
    style.css
    image_name1
    image_name2
    ....
  mimetype

The Open Packaging Format (OPF)

The file content.opf describes the various content elements. For example, the file produced by an automatic online converter for this page looks like this:

<?xml version='1.0' encoding='UTF-8'?>
<package xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="bookid">
  <metadata>
    <dc:title>EPub</dc:title>
    <dc:identifier id="bookid">web2fb2_200904221954_3347363837</dc:identifier>
    <dc:language>En</dc:language>
    <dc:creator>Daniel K. Schneider</dc:creator>
    <dc:type>reference</dc:type>
  </metadata>
  <manifest>
    <item id="css" href="style.css" media-type="text/css"/>
    <item id="content1" href="content1.xhtml" media-type="application/xhtml+xml"/>
    <item id="i0ced13f269" href="i0ced13f269" media-type="image/png"/>
    <item id="ib166f0f69c" href="ib166f0f69c" media-type="image/png"/>
    <item id="i7fa52f212a" href="i7fa52f212a" media-type="image/png"/>
    <item id="i33954a4ae2" href="i33954a4ae2" media-type="image/png"/>
    <item id="i8be224f209" href="i8be224f209" media-type="image/png"/>
  </manifest>
  <spine>
    <itemref idref="content1"/>
  </spine>
</package>
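
As a rough sketch of how a reading system might interpret such a package file, the following Python snippet resolves the spine entries against the manifest to obtain the files in reading order. The function name reading_order and the shortened sample (only two manifest items kept) are illustrative assumptions, not part of the specification or of any particular reader.

```python
import xml.etree.ElementTree as ET

# Namespaces used by the content.opf example above.
OPF_NS = {
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Shortened copy of the content.opf shown above (image items left out).
SAMPLE_OPF = """<?xml version='1.0' encoding='UTF-8'?>
<package xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="bookid">
  <metadata>
    <dc:title>EPub</dc:title>
    <dc:identifier id="bookid">web2fb2_200904221954_3347363837</dc:identifier>
    <dc:language>En</dc:language>
    <dc:creator>Daniel K. Schneider</dc:creator>
  </metadata>
  <manifest>
    <item id="css" href="style.css" media-type="text/css"/>
    <item id="content1" href="content1.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="content1"/>
  </spine>
</package>"""


def reading_order(opf_text):
    """Return (title, hrefs), with hrefs listed in spine order (hypothetical helper)."""
    root = ET.fromstring(opf_text.encode("utf-8"))
    title = root.findtext("opf:metadata/dc:title", namespaces=OPF_NS)
    # The manifest maps item ids to file names relative to content.opf.
    manifest = {
        item.get("id"): item.get("href")
        for item in root.findall("opf:manifest/opf:item", OPF_NS)
    }
    # The spine refers to manifest items by id and fixes the reading order.
    spine_ids = [
        ref.get("idref")
        for ref in root.findall("opf:spine/opf:itemref", OPF_NS)
    ]
    return title, [manifest[item_id] for item_id in spine_ids]


title, hrefs = reading_order(SAMPLE_OPF)
print(title, hrefs)
```

Note that the style sheet is listed in the manifest but not in the spine: the manifest inventories every file, while the spine only orders the content documents.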

The MIME type, i.e. the content of the mimetype file, is application/epub+zip.

The container.xml file for this simple example includes:

<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
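
The container layout described so far can be sketched with Python's standard zipfile module. The helper names build_epub and find_package_file are hypothetical, and the placeholder package content is not a valid OPF file. Writing the mimetype entry first and uncompressed is an assumption taken from the OCF specification rather than from the text above.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}

# Same container.xml as in the example above.
CONTAINER_XML = """<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""


def build_epub(files):
    """Pack files (archive path -> bytes) into an in-memory .epub archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # Assumption (OCF spec, not stated in this page): mimetype is the
        # first entry and is stored without compression.
        archive.writestr("mimetype", "application/epub+zip",
                         compress_type=zipfile.ZIP_STORED)
        archive.writestr("META-INF/container.xml", CONTAINER_XML)
        for path, data in files.items():
            archive.writestr(path, data)
    return buffer.getvalue()


def find_package_file(epub_bytes):
    """Return (mimetype, path of the package file named in container.xml)."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as archive:
        mimetype = archive.read("mimetype").decode("ascii").strip()
        container = ET.fromstring(archive.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        return mimetype, rootfile.get("full-path")


# Placeholder package content, only here to make the archive complete.
epub = build_epub({"OEBPS/content.opf": b"<package/>"})
print(find_package_file(epub))
```

A reader would start the same way: check the mimetype file, open META-INF/container.xml, and follow the full-path attribute to the package file.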

This packaging structure follows a philosophy quite similar to that of the IMS Content Packaging standard: a zip file contains a central XML file (content.opf) that defines the organization (the "spine") and the metadata (using Dublin Core). The zip file also contains all the assets needed for rendering.

The spine (one could translate this as "parts") can include three different kinds of files:

  • XHTML
  • XML (your own, i.e. what the specification calls out-of-line XML)
  • DTBook

The XHTML file can embed various formats, e.g. binary pictures, SVG and in-line XML. All these formats can be styled with a subset of CSS2.

The XHTML modules used in OPS

OPS uses a set of XHTML modules with some additional restrictions. That is, OPS content is always XHTML-compatible, but not the other way round.


   XHTML 1.1 Module Name          Elements (non-normative)
   ---------------------          ------------------------
   Structure                      body, head, html, title
   Text                           abbr, acronym, address, blockquote, br, cite, code, dfn, div, em, h1, h2, h3, h4, h5, h6, kbd, p, pre, q, samp, span, strong, var
   Hypertext                      a
   List                           dl, dt, dd, ol, ul, li
   Object                         object, param
   Presentation                   b, big, hr, i, small, sub, sup, tt
   Edit                           del, ins
   Bidirectional Text             bdo
   Table                          caption, col, colgroup, table, tbody, td, tfoot, th, thead, tr
   Image                          img
   Client-Side Image Map          area, map
   Meta-Information               meta
   Style Sheet                    style
   Style Attribute (deprecated)   style attribute
   Link                           link
   Base                           base
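
As a rough illustration of "XHTML, but with restrictions", the sketch below flags XHTML elements that do not appear in the module table above. The element list is copied from that non-normative table, the function name unlisted_elements is hypothetical, and the check ignores attributes and content models, so it is only a first approximation and not a validator.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Element names copied from the (non-normative) module table above.
OPS_ELEMENTS = set("""
    body head html title
    abbr acronym address blockquote br cite code dfn div em
    h1 h2 h3 h4 h5 h6 kbd p pre q samp span strong var
    a dl dt dd ol ul li object param
    b big hr i small sub sup tt del ins bdo
    caption col colgroup table tbody td tfoot th thead tr
    img area map meta style link base
""".split())


def unlisted_elements(xhtml_text):
    """Return the sorted names of XHTML elements that the table does not list."""
    root = ET.fromstring(xhtml_text)
    found = set()
    for element in root.iter():
        if not isinstance(element.tag, str):
            continue  # skip comments and processing instructions
        namespace, _, local = element.tag.rpartition("}")
        # Only judge elements in the XHTML namespace; inline SVG or
        # custom XML islands live in other namespaces and are skipped.
        if namespace == "{" + XHTML_NS and local not in OPS_ELEMENTS:
            found.add(local)
    return sorted(found)


SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Demo</title><script type="text/javascript"></script></head>
  <body><p>Hello <em>world</em></p><form action="#"></form></body>
</html>"""

print(unlisted_elements(SAMPLE))
```

For this sample the check reports form and script, two XHTML elements whose modules are absent from the table.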

Remark: EPub can also use DTBook (a DAISY/NISO standard) for markup.

SVG

Readers must support SVG 1.1. SVG animation and scripting features are not supported and must not be used by publication authors; a Reading System should not render such content. CSS styling of SVG must be fully supported.

SVG content can be referenced from XHTML img and object elements, but it can also be embedded within XHTML (probably in the standard way with namespaces).
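
A minimal sketch of how an author could screen an SVG file for the scripting and animation features mentioned above. The set of element names is an assumption based on SVG 1.1 (script plus the five animation elements), the function name forbidden_svg_elements is hypothetical, and event-handler attributes such as onclick are not checked.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Assumed list: the SVG 1.1 script element and animation elements.
FORBIDDEN = {"script", "animate", "set", "animateMotion",
             "animateColor", "animateTransform"}


def forbidden_svg_elements(svg_text):
    """Return forbidden SVG element names in document order."""
    root = ET.fromstring(svg_text)
    prefix = "{" + SVG_NS + "}"
    hits = []
    for element in root.iter():
        if not isinstance(element.tag, str):
            continue  # skip comments and processing instructions
        if element.tag.startswith(prefix):
            local = element.tag[len(prefix):]
            if local in FORBIDDEN:
                hits.append(local)
    return hits


SAMPLE_SVG = """<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
  <circle cx="50" cy="50" r="40">
    <animate attributeName="r" from="40" to="10" dur="2s"/>
  </circle>
  <script type="text/ecmascript">/* not allowed in OPS */</script>
</svg>"""

print(forbidden_svg_elements(SAMPLE_SVG))
```

Because the check matches on the SVG namespace, it works the same way for a standalone SVG file and for SVG embedded in an XHTML document.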

XML

You may also use your own XML, both inline within XHTML and out-of-line as separate documents. Both can have fallback options (to be used when a client cannot render the contents).

OPS style sheets

OPS style sheets are CSS2 style sheets in the XML tradition, i.e. selectors and attribute names are case-sensitive. As with XHTML, there are some restrictions.

Authoring software

“Web-standard formats such as XHTML make up the core of OPS. OPS files can be created using a wide variety of Web and eBook publishing tools. They can also be created using XML editors such as Altova’s XML Spy. Adobe InDesign CS3 supports the direct generation of OCF-packaged OPS content as an export function. OCF files can also easily be created using standard ZIP applications like WINZIP” (IDPF FAQ, retrieved 22:38, 26 February 2009 (UTC)).

Conversion software

Several tools can convert from one format to another. Mileage varies according to the input: PDF, for example, is more difficult to convert than XHTML.

  • PDF input leads to really ugly output.
  • Web pages like this MediaWiki page translate badly, e.g. bullets disappear.
Calibre
  • Calibre is a one-stop solution for all your e-book needs. It is free, open source and cross-platform, and works well on Linux, OS X and Windows.
  • List of features: library management; format conversion (from all major e-book formats); syncing to e-book reader devices; fetching news from the web and converting it into e-book form; viewing many different e-book formats; access to your book collection over the internet using just a browser.
  • Source formats: LIT, MOBI, EPUB, HTML, PRC, RTF, TXT, PDF. Some convert better than others, e.g. pictures in PDF files do not carry over.
  • Output formats: EPUB, LRF, MOBI
  • Easy to install under Ubuntu (tested with 8.04 Hardy Heron)
On-line conversion tools
  • Web2FB2 is a web page to FB2 and EPUB converter (tested April 2009). Tip: use the advanced options to set both title and author. The created file will be named authorname_first_name_title.epub.

Reader Software

Most readers support several e-book formats and several support ePub. See also: EPUB at mobileread.com and Wikipedia's Comparison of e-book formats.

Several readers exist. Here are just a few of them:

FBReader
  • FBReader is an e-book reader for Unix/Windows computers. It supports several formats.
  • Wikipedia entry
  • Unix/Windows (available through Ubuntu's Synaptic package manager)
  • Features: several formats (partially or fully supported), several reading options.
Calibre
  • Multi-purpose tool, see above
OpenBerg
  • OpenBerg was an open-source initiative to write a complete system (reader, authoring, etc.). The reader was a Firefox extension. Now dead?
  • Wikipedia entry
Adobe Digital Editions
  • Adobe Digital Editions is free software with a built-in DRM mechanism. If you buy contents and register with Adobe, you can use the contents on a limited set of other computers (needs some clarification).
  • Wikipedia entry
Stanza
  • Lexcycle Stanza (favorite free iPhone/iPod reader). Win/Mac/i*.
  • Wikipedia entry

Links

General
  • EPUB (mobileread.com wiki)
Official