ePub

From MobileRead

ePub is an open format defined by the Open eBook Forum of the International Digital Publishing Forum (IDPF). It is based on XHTML and XML, with optional CSS style sheets. Its predecessor was the OEB standard. The specifications are available at the IDPF web site. This page covers ePub version 2.0.1. For version 3 see ePub 3.

Definition

Quoted from the IDPF web site:

"'.epub' is the file extension of an XML format for reflowable digital books and publications. '.epub' is composed of three open standards, the Open Publication Structure (OPS), Open Packaging Format (OPF) and Open Container Format (OCF), produced by the IDPF. '.epub' allows publishers to produce and send a single digital publication file through distribution and offers consumers interoperability between software/hardware for unencrypted reflowable digital books and other publications. The Open eBook Publication Structure or 'OEB', originally produced in 1999, is the precursor to OPS."

Usage

ePub is intended to serve both as a source file format and as an end-user format. For this reason the files are collected into a container for easy dissemination and use. This container is a zip file whose extension has been renamed to .epub. It has one special requirement: it must include an uncompressed mimetype file, while the rest of the data in the file may be compressed. An ePub reader should be capable of reading the content in its compressed form.

Mime Type: Multipurpose Internet Mail Extensions (RFC 2045). “MIME media types” provide a standard methodology for specifying the content type of objects.

Specifications

The IDPF specification page contains the specifications for this format. In particular, check the version 2.0.1 OPS and OPF specifications and the version 1.01 OCF specifications. The informational documents are also quite useful in understanding the standard's intent and content.

OCF

A typical OCF is a zip file that might look like:

mimetype
META-INF/
  container.xml
  [manifest.xml]
  [metadata.xml]
  [signatures.xml]
  [encryption.xml]
  [rights.xml]
OEBPS/
  Great Expectations.opf
  cover.html
  chapters/
     chapter01.html
     chapter02.html
     … other HTML files for the remaining chapters …

mimetype

The first file in the ZIP Container MUST be a file by the ASCII name of ‘mimetype’ which holds the MIME type for the ZIP Container (i.e., “application/epub+zip” as a 20 character ASCII string; no padding, CR/LF, white-space or case change). The file MUST NOT be compressed nor encrypted and there MUST NOT be an extra field in its ZIP header.
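
The rule above can be sketched in Python with the standard zipfile module. This is a minimal sketch, not a complete packager: the function name write_epub and the shape of its files argument are assumptions made for illustration. It writes mimetype first and stored (uncompressed), then deflates everything else.

```python
import io
import zipfile

MIMETYPE = "application/epub+zip"  # the 20 character ASCII string quoted above


def write_epub(target, files):
    """Write `files` (archive path -> bytes) into an ePub container.

    `write_epub` is a hypothetical helper name; `target` may be a file
    name or a seekable binary file object.
    """
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # mimetype must be the first entry, stored without compression.
        # A fresh ZipInfo carries no extra field.
        info = zipfile.ZipInfo("mimetype")
        info.compress_type = zipfile.ZIP_STORED
        zf.writestr(info, MIMETYPE.encode("ascii"))
        # Every other entry uses the archive default (deflate).
        for path, data in files.items():
            zf.writestr(path, data)


if __name__ == "__main__":
    buffer = io.BytesIO()
    write_epub(buffer, {"META-INF/container.xml": b"<container/>"})
    with zipfile.ZipFile(buffer) as zf:
        print(zf.namelist())
```

Because the first local file header is 30 bytes long and the name mimetype is 8 bytes, the MIME string ends up at byte offset 38 of the file, which is how many tools recognize an ePub without unpacking it.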

container.xml

The container.xml is a required file with a required name. It must be in the META-INF folder. All other folders are optional and can be any name the user chooses. The container.xml file shows the filename and location of the OPF file.
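
As an illustration of how a reader might use container.xml, the sketch below parses a sample file and returns the path of the OPF file. The sample content reuses the OPF name from the OCF listing above, and the helper name find_opf_path is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

# Namespace used by the OCF container.xml file.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_MEDIA_TYPE = "application/oebps-package+xml"

# Illustrative sample only; the path matches the OCF listing above.
SAMPLE_CONTAINER = """<?xml version="1.0"?>
<container version="1.0"
           xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/Great Expectations.opf"
              media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def find_opf_path(container_xml):
    """Return the full-path of the first OPF rootfile, or None.

    `find_opf_path` is a hypothetical helper name.
    """
    root = ET.fromstring(container_xml)
    for rootfile in root.findall("c:rootfiles/c:rootfile", CONTAINER_NS):
        if rootfile.get("media-type") == OPF_MEDIA_TYPE:
            return rootfile.get("full-path")
    return None


if __name__ == "__main__":
    print(find_opf_path(SAMPLE_CONTAINER))
```

The returned path is relative to the root of the container, not to the META-INF folder.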

OPF

The Open Packaging Format (OPF) Specification defines the mechanism by which the various components of an OPS publication are tied together and provides additional structure and semantics to the electronic publication.

An example OPF package file:

 <package version="2.0" xmlns="http://www.idpf.org/2007/opf"
         unique-identifier="BookId">
     <metadata xmlns:dc="http://purl.org/dc/elements/1.1/"
                xmlns:opf="http://www.idpf.org/2007/opf">
           <dc:title>Alice in Wonderland</dc:title>
           <dc:language>en</dc:language>
           <dc:identifier id="BookId" opf:scheme="ISBN">
            123456789X
           </dc:identifier>
           <dc:creator opf:role="aut">Lewis Carroll</dc:creator>
     </metadata>
     <manifest>
        <item id="intro" href="introduction.html"
                media-type="application/xhtml+xml" />
        <item id="c1" href="chapter-1.html"
                media-type="application/xhtml+xml" />
        <item id="c2" href="chapter-2.html"
                media-type="application/xhtml+xml" />
        <item id="toc" href="contents.xml"
                media-type="application/xhtml+xml" />
        <item id="oview" href="arch.png"
                media-type="image/png" />
        <item id="ncx" href="toc.ncx"
                media-type="application/x-dtbncx+xml" />
     </manifest>
     <spine toc="ncx">
        <itemref idref="intro" />
        <itemref idref="toc" />
        <itemref idref="c1" />
        <itemref idref="c2" />
        <itemref idref="oview" linear="no" />
     </spine>
 </package>
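
To show how the manifest and spine fit together, here is a small sketch that resolves each spine itemref to the href of its manifest item, which gives the reading order. The trimmed sample and the function name reading_order are assumptions made for illustration, and the sketch assumes every idref has a matching manifest item.

```python
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

# Trimmed-down version of the example above, for illustration only.
SAMPLE_OPF = """<package version="2.0" xmlns="http://www.idpf.org/2007/opf"
         unique-identifier="BookId">
  <manifest>
    <item id="intro" href="introduction.html"
          media-type="application/xhtml+xml"/>
    <item id="c1" href="chapter-1.html"
          media-type="application/xhtml+xml"/>
    <item id="oview" href="arch.png" media-type="image/png"/>
  </manifest>
  <spine>
    <itemref idref="intro"/>
    <itemref idref="c1"/>
    <itemref idref="oview" linear="no"/>
  </spine>
</package>"""


def reading_order(opf_xml):
    """Return a list of (href, is_linear) tuples in spine order.

    `reading_order` is a hypothetical helper name. A dangling idref
    raises KeyError in this sketch.
    """
    root = ET.fromstring(opf_xml)
    # Map manifest ids to the files they point at.
    hrefs = {item.get("id"): item.get("href")
             for item in root.findall("opf:manifest/opf:item", OPF_NS)}
    order = []
    for ref in root.findall("opf:spine/opf:itemref", OPF_NS):
        # linear defaults to "yes" when the attribute is absent.
        is_linear = ref.get("linear", "yes") != "no"
        order.append((hrefs[ref.get("idref")], is_linear))
    return order


if __name__ == "__main__":
    for href, is_linear in reading_order(SAMPLE_OPF):
        print(href, "linear" if is_linear else "auxiliary")
```

The hrefs are relative to the location of the OPF file itself, so a reader has to join them with the OPF file's folder before looking them up in the container.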

OPS

The Open Publication Structure (OPS) Specification describes a standard for representing the content of electronic publications.

XHTML

XHTML is an application of XML, so each document should begin with an XML declaration such as:

<?xml version="1.0" encoding="ISO-8859-1"?>

Here the encoding attribute declares the character set used in the book. If no encoding is declared, the default is Unicode UTF-8. UTF-16 must also be supported, but the font set need not contain every glyph.

A conforming OPS document must support the following XHTML constructions.

| XHTML 1.1 Module Name | Elements (non-normative) | Notes |
|---|---|---|
| Structure | body, head, html, title | The default rendering for body is consistent with the CSS property page-break-before having been set to right (which behaves like always on one-page Reading Systems), but may be overridden by an appropriate style sheet declaration. |
| Text | abbr, acronym, address, blockquote, br, cite, code, dfn, div, em, h1, h2, h3, h4, h5, h6, kbd, p, pre, q, samp, span, strong, var | The optional attribute cite may be used in blockquote, q, del and ins to provide a URI citation for the element contents. Reading Systems are not required to process or use the referenced URI resource, whether or not the resource is listed in the Manifest. |
| Hypertext | a | Reading Systems may use or render a URI referenced physical resource not listed in the Manifest (i.e., it is not a component of the Publication), but they are not required to do so. |
| List | dl, dt, dd, ol, ul, li | |
| Object | object, param | The object element is the preferred method for generic object inclusion. When adding objects whose data media type is not drawn from the OPS Core Media Type list or which reference an object implementation using the classid attribute, the object element must specify fallback information for the object, such as another object, an img element, or descriptive text. |
| Presentation | b, big, hr, i, small, sub, sup, tt | |
| Edit | del, ins | |
| Bidirectional Text | bdo | |
| Table | caption, col, colgroup, table, tbody, td, tfoot, th, thead, tr | |
| Image | img | The inline element img should only be used to refer to images with OPS Core Media Types of GIF (http://www.w3.org/Graphics/GIF/spec-gif89a.txt), PNG (RFC 2083), JPG/JFIF (http://www.w3.org/Graphics/JPEG) or SVG (http://www.w3.org/TR/SVG11/). The required URI attribute, src, is used to reference the image resource, which must be listed in the Manifest. The required alt attribute should contain a brief and informative textual description of the image. This text may be used by Reading Systems as an alternative to, or in addition to, displaying the image. The text is also an acceptable fallback for an img with src referencing a non-OPS Core Media Type for which no viable fallback was found in the manifest. |
| Client-Side Image Map | area, map | |
| Meta-Information | meta | |
| Style Sheet | style | The type attribute of the style element is required and must be given the value of text/css or the deprecated text/x-oeb1-css. |
| Style Attribute (deprecated) | style attribute | |
| Link | link | The link element allows for the specification of various relationships with other documents. Reading Systems must recognize external style sheet references specified via the href attribute and the associated rel attribute (for the values rel="stylesheet" and rel="alternate stylesheet"). |
| Base | base | The root of an ePUB file is the top of the file hierarchy inside the container. |
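
The element list in the table lends itself to a simple check. The sketch below reports XHTML elements in a document that do not appear in the table. It is only a rough aid under stated assumptions, not a validator: the function name unsupported_elements is made up for this example, attributes and nesting rules are not checked, and elements in other namespaces (such as SVG) are ignored.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Element names copied from the module table above.
OPS_XHTML_ELEMENTS = set("""
    body head html title
    abbr acronym address blockquote br cite code dfn div em
    h1 h2 h3 h4 h5 h6 kbd p pre q samp span strong var
    a
    dl dt dd ol ul li
    object param
    b big hr i small sub sup tt
    del ins
    bdo
    caption col colgroup table tbody td tfoot th thead tr
    img
    area map
    meta style link base
""".split())


def unsupported_elements(xhtml):
    """Return a sorted list of XHTML element names not in the table.

    `unsupported_elements` is a hypothetical helper name.
    """
    prefix = "{" + XHTML_NS + "}"
    found = set()
    for element in ET.fromstring(xhtml).iter():
        # ElementTree spells namespaced tags as "{namespace}name".
        if element.tag.startswith(prefix):
            name = element.tag[len(prefix):]
            if name not in OPS_XHTML_ELEMENTS:
                found.add(name)
    return sorted(found)


if __name__ == "__main__":
    sample = (
        '<html xmlns="http://www.w3.org/1999/xhtml">'
        "<head><title>Test</title></head>"
        "<body><center><font>Hello</font></center><p>World</p></body>"
        "</html>"
    )
    print(unsupported_elements(sample))
```

Note that the standard library parser only knows the predefined XML entities, so a document using named entities such as &nbsp; would need them resolved before this check.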

DRM

The ePUB standard does not endorse any particular DRM scheme but allows for the creation of DRM. The most popular DRM scheme at this time is the one made and used as part of Adobe Digital Editions. This scheme has been licensed by Overdrive for Library use. Many other publishers also use this scheme. The DRM is applied to individual files within the ePub container. Other DRM systems include an offshoot from Barnes and Noble that is supported by the Adobe DRM server and the Apple FairPlay DRM scheme used on iPad and iPhone devices running iBooks.

It is also possible to protect the embedded fonts in an ePUB file by applying DRM directly to the font files inside the container. In addition, there are schemes to obfuscate embedded font files. The standard defines one method, but Adobe has defined a slightly different method, which seems to dominate at this point.

Relationships

Relationship to NVDL

This specification uses the NVDL language (see http://standards.iso.org/ittf/PubliclyAvailableStandards/c038615_ISO_IEC_19757-4_2006(E).zip) as a means to unambiguously define the interaction between the various schemas used in this specification. NVDL allows for interaction and validation between various XML schema languages. See Appendix A for a normative NVDL definition of OPS.

This specification does not require the use of NVDL tools to validate OPS documents, although such tools are available and may be used for validation.

Relationship to XHTML and DTBook

This specification recognizes the importance of current software tools, legacy data, publication practices, and market conditions, and has therefore incorporated certain XHTML 1.1 Document Type Modules and DTBook as Preferred Vocabularies. This approach allows content providers to exploit current XHTML and DTBook content, tools, and expertise.

To minimize the implementation burden on Reading System implementers (who may be working with devices that have power and display constraints), the Preferred Vocabularies do not include all XHTML 1.1 elements and attributes. Further, the modules selected from the XHTML 1.1 specification were chosen to be consistent with current directions in XHTML.

Any construct deprecated in XHTML 1.1 is either deprecated or omitted from this specification; CSS-based equivalents are provided in most such cases. Style sheet constructs are also used for new presentational functionality beyond that provided in XHTML.

Relationship to CSS

This specification defines a style language based on CSS 2 (see http://www.w3.org/TR/CSS2/). The style sheet MIME type text/x-oeb1-css has been deprecated in favor of text/css. Note that not all of CSS 2 is supported, and there are additional extensions prefixed with oeb- (oeb-page-head, oeb-page-foot, and oeb-column-number).

Relationship to XML

OPS is based on XML because of its generality and simplicity, and because XML documents are likely to adapt well to future technologies and uses. XML also provides well-defined rules for the syntax of documents, which decreases the cost to implementers and reduces incompatibility across systems. Further, XML is extensible: it is not tied to any particular type of document or set of element types, it supports internationalization, and it encourages document markup that can represent a document’s internal parts more directly, making them amenable to automated formatting and other types of computer processing.

Relationship to XML Namespaces

Reading Systems must process XML namespaces according to the XML Namespaces Recommendation at http://www.w3.org/TR/xml-names11/. For example, a namespace declaration looks like:

xmlns:ops="http://www.idpf.org/2007/ops"

Relationship to Dublin Core

Dublin Core is the defined standard for all metadata used in an ePub document. Only the identifier, title, and language elements are required, but other entries are encouraged. In particular, the author (dc:creator) should be entered if known.
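
A check for the required Dublin Core entries might look like the following sketch. The function name missing_metadata and the sample package are assumptions made for illustration. The sketch only tests that each required element is present with non-empty text.

```python
import xml.etree.ElementTree as ET

NS = {
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}
# The three entries named as required above.
REQUIRED = ("identifier", "title", "language")

# Illustrative sample that deliberately omits dc:identifier.
SAMPLE_OPF = """<package version="2.0" xmlns="http://www.idpf.org/2007/opf"
         unique-identifier="BookId">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>Alice in Wonderland</dc:title>
    <dc:language>en</dc:language>
    <dc:creator>Lewis Carroll</dc:creator>
  </metadata>
</package>"""


def missing_metadata(opf_xml):
    """Return the required Dublin Core names that are absent or empty.

    `missing_metadata` is a hypothetical helper name.
    """
    metadata = ET.fromstring(opf_xml).find("opf:metadata", NS)
    missing = []
    for name in REQUIRED:
        element = None
        if metadata is not None:
            # Search the whole metadata subtree for the dc element.
            element = metadata.find(".//dc:" + name, NS)
        if element is None or not (element.text or "").strip():
            missing.append(name)
    return missing


if __name__ == "__main__":
    print(missing_metadata(SAMPLE_OPF))
```

A fuller check would also confirm that the package's unique-identifier attribute matches the id of a dc:identifier element, as in the OPF example earlier on this page.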

Readers

Software

Hardware

ePub Creation and Manipulation software

Creation and Editing tools

Conversion tools

Checking tools

Utilities

Tips

Tutorials

ePub 3

Here are some quick notes on EPUB version 3. This wiki also has a separate page with more details.

The specification is made up of HTML5, CSS, SVG, images, multi-media support, MathML, SMIL and more.

For more information
