ePub

From MobileRead

ePub is an open format defined by the International Digital Publishing Forum (IDPF), formerly the Open eBook Forum. It is based on XHTML and XML, with optional CSS style sheets. Its predecessor was the OEB (Open eBook) standard. The specifications can be found on the IDPF web site. This page covers ePub version 2.01; for version 3, see ePub 3. See also Fixed layout ePub.


Definition

Quoted from the IDPF web site:

"'.epub' is the file extension of an XML format for reflowable digital books and publications. '.epub' is composed of three open standards, the Open Publication Structure (OPS), Open Packaging Format (OPF) and Open Container Format (OCF), produced by the IDPF. '.epub' allows publishers to produce and send a single digital publication file through distribution and offers consumers interoperability between software/hardware for unencrypted reflowable digital books and other publications. The Open eBook Publication Structure or 'OEB', originally produced in 1999, is the precursor to OPS."

Usage

ePub is intended to serve both as a source file format and as an end-user format. For this reason the files are collected into a single container for easy distribution and use. The container is a ZIP file whose extension has been changed to .epub. It has one special requirement: it must include an uncompressed file named mimetype, while the rest of the data in the file may be compressed. An ePub reader should be able to read the content in its compressed form.

Mime Type: Multipurpose Internet Mail Extensions (RFC 2045). “MIME media types” provide a standard methodology for specifying the content type of objects.
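The container rules above can be sketched with Python's standard zipfile module. This is a minimal sketch, not a complete packager: the function name pack_epub and the assumption that src_dir is an unpacked ePub folder laid out like the OCF example below are illustrative. It writes mimetype first and uncompressed, then adds every other file with Deflate compression.

```python
import os
import zipfile


def pack_epub(src_dir: str, out_path: str) -> None:
    """Zip an unpacked ePub folder (hypothetical layout) into out_path."""
    with zipfile.ZipFile(out_path, "w") as zf:
        # The mimetype entry must come first and must be stored uncompressed.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        # Everything else is added afterwards with Deflate compression.
        for root, dirs, files in os.walk(src_dir):
            dirs.sort()  # walk subfolders in a deterministic order
            for name in sorted(files):
                full = os.path.join(root, name)
                arcname = os.path.relpath(full, src_dir).replace(os.sep, "/")
                if arcname == "mimetype":
                    continue  # already written above
                zf.write(full, arcname, compress_type=zipfile.ZIP_DEFLATED)
```

Writing mimetype from a string, rather than copying the file on disk, guarantees the exact content with no trailing newline.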

Specifications

The IDPF specification page contains the specifications for this format. In particular check the version 2.01 OPS and OPF specifications and the version 1.01 OCF specifications. The informational documents are also quite useful in understanding the standard's intent and content.

OCF

A typical OCF container is a ZIP file whose contents might look like this (names in brackets are optional):

mimetype
META-INF/
  container.xml
  [manifest.xml]
  [metadata.xml]
  [signatures.xml]
  [encryption.xml]
  [rights.xml]
OEBPS/
  Great Expectations.opf
  cover.html
  chapters/
     chapter01.html
     chapter02.html
     … other HTML files for the remaining chapters …

mimetype

The first file in the ZIP Container MUST be a file by the ASCII name of ‘mimetype’ which holds the MIME type for the ZIP Container (i.e., “application/epub+zip” as a 20 character ASCII string; no padding, CR/LF, white-space or case change). The file MUST NOT be compressed nor encrypted and there MUST NOT be an extra field in its ZIP header.
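The mimetype rules can be checked mechanically. The following is a hedged sketch using Python's zipfile module; the function name check_mimetype and the problem messages are hypothetical. One assumption to note: zipfile exposes the central-directory copy of each ZIP header, so this sketch inspects that copy rather than the local header.

```python
import zipfile

EPUB_MIMETYPE = b"application/epub+zip"


def check_mimetype(epub) -> list[str]:
    """Return problems with the 'mimetype' entry; an empty list means OK.

    `epub` may be a path or a binary file object.
    """
    with zipfile.ZipFile(epub) as zf:
        infos = zf.infolist()
        if not infos or infos[0].filename != "mimetype":
            return ["first entry is not 'mimetype'"]
        first = infos[0]
        problems = []
        if first.compress_type != zipfile.ZIP_STORED:
            problems.append("mimetype is compressed")
        if first.flag_bits & 0x1:  # bit 0 of the general-purpose flag = encrypted
            problems.append("mimetype is encrypted")
        elif zf.read(first) != EPUB_MIMETYPE:
            # Exactly 20 ASCII bytes: no padding, CR/LF or case change.
            problems.append("mimetype content is not exactly application/epub+zip")
        if first.extra:
            problems.append("mimetype has an extra field")
        return problems
```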

container.xml

container.xml is a required file with a fixed name, and it must be in the META-INF folder. Apart from META-INF, all other folders are optional and can have any name the author chooses. container.xml gives the path and filename of the OPF file.
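A reader finds the OPF by parsing container.xml. The sketch below assumes the standard OCF container namespace and the OPF media type application/oebps-package+xml; SAMPLE_CONTAINER_XML matches the OCF tree above, and the function name find_opf_path is hypothetical.

```python
import xml.etree.ElementTree as ET
import zipfile

CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"
OPF_MEDIA_TYPE = "application/oebps-package+xml"

# A minimal container.xml for the example OCF layout shown earlier.
SAMPLE_CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/Great Expectations.opf"
              media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def find_opf_path(epub) -> str:
    """Return the OPF path named in META-INF/container.xml (path or file object)."""
    with zipfile.ZipFile(epub) as zf:
        root = ET.fromstring(zf.read("META-INF/container.xml"))
    for rootfile in root.iter(f"{{{CONTAINER_NS}}}rootfile"):
        if rootfile.get("media-type") == OPF_MEDIA_TYPE:
            return rootfile.get("full-path")
    raise ValueError("no OPF rootfile listed in container.xml")
```

The full-path value is relative to the root of the container, not to META-INF.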

OPF

The Open Packaging Format (OPF) Specification defines the mechanism by which the various components of an OPS publication are tied together, and provides additional structure and semantics to the electronic publication.

Specifically, OPF:

  • Describes and references all components of the electronic publication (e.g. markup files, images, navigation structures).
  • Provides publication-level metadata, which should be expressed as Dublin Core elements.
  • Specifies the linear reading-order of the publication.
  • Provides fallback information to use when unsupported extensions to OPS are employed.
  • Provides a mechanism to specify a declarative table of contents (the NCX).
  • May provide pointers to additional optional elements such as embedded fonts.
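The NCX mentioned in the list above is a separate XML file, listed in the manifest with media type application/x-dtbncx+xml and named by the spine's toc attribute. The sketch below builds a flat, one-level NCX with ElementTree. It assumes the NCX 2005-1 namespace and the usual dtb:* head entries; build_ncx and the navPoint-N id scheme are hypothetical choices, and nested chapters are not handled.

```python
import xml.etree.ElementTree as ET

NCX_NS = "http://www.daisy.org/z3986/2005/ncx/"


def _q(tag: str) -> str:
    """Qualify a tag name with the NCX namespace."""
    return f"{{{NCX_NS}}}{tag}"


def build_ncx(uid: str, title: str, chapters: list[tuple[str, str]]) -> bytes:
    """Build a one-level NCX from (label, href) pairs, in reading order."""
    ET.register_namespace("", NCX_NS)  # serialize NCX as the default namespace
    ncx = ET.Element(_q("ncx"), version="2005-1")
    head = ET.SubElement(ncx, _q("head"))
    for name, content in [("dtb:uid", uid), ("dtb:depth", "1"),
                          ("dtb:totalPageCount", "0"), ("dtb:maxPageNumber", "0")]:
        ET.SubElement(head, _q("meta"), name=name, content=content)
    doc_title = ET.SubElement(ncx, _q("docTitle"))
    ET.SubElement(doc_title, _q("text")).text = title
    nav_map = ET.SubElement(ncx, _q("navMap"))
    for order, (label, href) in enumerate(chapters, start=1):
        point = ET.SubElement(nav_map, _q("navPoint"),
                              id=f"navPoint-{order}", playOrder=str(order))
        nav_label = ET.SubElement(point, _q("navLabel"))
        ET.SubElement(nav_label, _q("text")).text = label
        ET.SubElement(point, _q("content"), src=href)
    return ET.tostring(ncx, encoding="UTF-8", xml_declaration=True)
```

By convention, uid should be the same value as the book's unique identifier in the OPF.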

An example:

 <package version="2.0" xmlns="http://www.idpf.org/2007/opf"
         unique-identifier="BookId">
     <metadata xmlns:dc="http://purl.org/dc/elements/1.1/"
                xmlns:opf="http://www.idpf.org/2007/opf">
           <dc:title>Alice in Wonderland</dc:title>
           <dc:language>en</dc:language>
           <dc:identifier id="BookId" opf:scheme="ISBN">123456789X</dc:identifier>
           <dc:creator opf:role="aut">Lewis Carroll</dc:creator>
     </metadata>
     <manifest>
        <item id="intro" href="introduction.html"
                media-type="application/xhtml+xml" />
        <item id="c1" href="chapter-1.html"
                media-type="application/xhtml+xml" />
        <item id="c2" href="chapter-2.html"
                media-type="application/xhtml+xml" />
        <item id="toc" href="contents.xml"
                media-type="application/xhtml+xml" />
        <item id="oview" href="arch.png"
                media-type="image/png" />
        <item id="ncx" href="toc.ncx"
                media-type="application/x-dtbncx+xml" />
     </manifest>
     <spine toc="ncx">
        <itemref idref="intro" />
        <itemref idref="toc" />
        <itemref idref="c1" />
        <itemref idref="c2" />
        <itemref idref="oview" linear="no" />
     </spine>
 </package>
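The spine is what gives the linear reading order. As a hedged illustration, the sketch below resolves each itemref to its manifest href, skipping linear="no" items unless asked, and looks up the NCX named by the spine's toc attribute. The function names are hypothetical; an idref missing from the manifest raises KeyError, which is arguably the right failure for a broken OPF.

```python
import xml.etree.ElementTree as ET
from typing import Optional

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def reading_order(opf_xml: bytes, include_nonlinear: bool = False) -> list[str]:
    """Return manifest hrefs in spine order."""
    root = ET.fromstring(opf_xml)
    hrefs = {item.get("id"): item.get("href")
             for item in root.iterfind("opf:manifest/opf:item", OPF_NS)}
    order = []
    for itemref in root.iterfind("opf:spine/opf:itemref", OPF_NS):
        if itemref.get("linear", "yes") == "no" and not include_nonlinear:
            continue  # auxiliary content outside the primary reading order
        order.append(hrefs[itemref.get("idref")])  # KeyError if not in manifest
    return order


def ncx_href(opf_xml: bytes) -> Optional[str]:
    """Return the href of the NCX named by <spine toc="...">, or None."""
    root = ET.fromstring(opf_xml)
    spine = root.find("opf:spine", OPF_NS)
    toc_id = spine.get("toc") if spine is not None else None
    if toc_id is None:
        return None
    for item in root.iterfind("opf:manifest/opf:item", OPF_NS):
        if (item.get("id") == toc_id
                and item.get("media-type") == "application/x-dtbncx+xml"):
            return item.get("href")
    return None  # toc points at no NCX manifest item
```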

Unique ID

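A unique identifier is required in the OPF: the unique-identifier attribute of the package element must name the id of a dc:identifier element (BookId in the example above). One way to get a suitable value is an online UUID generator, which gives you an id when you visit the page: http://www.famkruithof.net/uuid/uuidgen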
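A UUID can also be generated locally. This sketch uses Python's standard uuid module; the helper names, the default id BookId, and the opf:scheme="UUID" value are assumptions chosen to match the OPF example above.

```python
import uuid


def new_book_id() -> str:
    """Return a random (version 4) UUID in URN form, e.g. 'urn:uuid:...'."""
    return uuid.uuid4().urn


def identifier_element(book_id: str, element_id: str = "BookId") -> str:
    """Format a dc:identifier; element_id must match the package's unique-identifier."""
    return f'<dc:identifier id="{element_id}" opf:scheme="UUID">{book_id}</dc:identifier>'
```

Generate the identifier once and keep it for the life of the book; regenerating it on every build would make each build look like a different publication.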

OPS

The Open Publication Structure (OPS) Specification describes a standard for representing the content of electronic publications.

Specifically:

  • The specification is intended to give content providers (e.g. publishers, authors, and others who have content to be displayed) and publication tool providers minimal and common guidelines that ensure fidelity, accuracy, accessibility, and adequate presentation of electronic content over various Reading Systems.
  • The specification seeks to reflect established content format standards.
  • The goal of this specification is to define a standard means of content description for use by purveyors of electronic books (publishers, agents, authors et al.), allowing such content to be provided to multiple Reading Systems and to ensure maximum presentational equivalence across Reading Systems.

XHTML

An XHTML document is XML, so it should begin with an XML declaration such as:

<?xml version="1.0" encoding="UTF-8"?>

The encoding entry names the character encoding used in the file. The default is Unicode UTF-8, and UTF-16 must also be supported, although the font set need not contain every glyph. For ePub, use UTF-8 or UTF-16 rather than a legacy encoding such as ISO-8859-1.
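As a sketch of a minimal chapter file, the function below emits a UTF-8 XHTML 1.1 document that links an external style sheet with rel="stylesheet" (see the Link entry below). The function name and the default style sheet name style.css are hypothetical; text is escaped with html.escape so characters such as & and < stay well-formed.

```python
from html import escape


def xhtml_chapter(title: str, paragraphs: list[str], css_href: str = "style.css") -> str:
    """Return a minimal XHTML 1.1 chapter; write it to disk as UTF-8."""
    body = "\n".join(f"    <p>{escape(p)}</p>" for p in paragraphs)
    # The XML declaration must be the very first thing in the file.
    return f"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>{escape(title)}</title>
    <link rel="stylesheet" type="text/css" href="{escape(css_href)}"/>
  </head>
  <body>
{body}
  </body>
</html>
"""
```

When saving, open the file with encoding="utf-8" so the bytes on disk agree with the declaration.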

The following XHTML 1.1 modules make up the XHTML part of the OPS Preferred Vocabulary; conforming Reading Systems must support them.

Each entry gives the XHTML 1.1 module name, its elements, and (non-normative) notes:

  • Structure (body, head, html, title): the default rendering for body is consistent with the CSS property page-break-before having been set to right (which behaves like always on one-page Reading Systems), but may be overridden by an appropriate style sheet declaration.
  • Text (abbr, acronym, address, blockquote, br, cite, code, dfn, div, em, h1, h2, h3, h4, h5, h6, kbd, p, pre, q, samp, span, strong, var): the optional attribute cite may be used in blockquote, q, del and ins to provide a URI citation for the element contents. Reading Systems are not required to process or use the referenced URI resource, whether or not the resource is listed in the Manifest.
  • Hypertext (a): Reading Systems may use or render a URI-referenced physical resource not listed in the Manifest (i.e., one that is not a component of the Publication), but they are not required to do so.
  • List (dl, dt, dd, ol, ul, li)
  • Object (object, param): the object element is the preferred method for generic object inclusion. When adding objects whose data media type is not drawn from the OPS Core Media Type list or which reference an object implementation using the classid attribute, the object element must specify fallback information for the object, such as another object, an img element, or descriptive text.
  • Presentation (b, big, hr, i, small, sub, sup, tt)
  • Edit (del, ins)
  • Bidirectional Text (bdo)
  • Table (caption, col, colgroup, table, tbody, td, tfoot, th, thead, tr)
  • Image (img): the inline element img should only be used to refer to images with OPS Core Media Types of GIF (http://www.w3.org/Graphics/GIF/spec-gif89a.txt), PNG (RFC 2083), JPG/JFIF (http://www.w3.org/Graphics/JPEG) or SVG (http://www.w3.org/TR/SVG11/). The required URI attribute, src, is used to reference the image resource, which must be listed in the Manifest.
    The required alt attribute should contain a brief and informative textual description of the image. This text may be used by Reading Systems as an alternative to, or in addition to, displaying the image. The text is also an acceptable fallback for an img with src referencing a non-OPS Core Media Type for which no viable fallback was found in the manifest.
  • Client-Side Image Map (area, map)
  • Meta-Information (meta)
  • Style Sheet (style): the type attribute of the style element is required and must be given the value of text/css or the deprecated text/x-oeb1-css.
  • Style Attribute, deprecated (style attribute)
  • Link (link): the link element allows for the specification of various relationships with other documents. Reading Systems must recognize external style sheet references specified via the href attribute and the associated rel attribute (for the values rel="stylesheet" and rel="alternate stylesheet").
  • Base (base): the root of an ePUB file is the top of the file hierarchy inside the container.

DRM

The ePUB standard does not endorse any particular DRM scheme, but it allows DRM to be applied. The most popular DRM scheme at this time is the one made by Adobe and used as part of Adobe Digital Editions. OverDrive has licensed this scheme for library use, and many publishers also use it. The DRM is applied to individual files within the ePub container. Other DRM systems include a Barnes & Noble offshoot that is supported by the Adobe DRM server, and Apple's FairPlay scheme used by iBooks on the iPad and iPhone.

It is also possible to put DRM on embedded fonts in an ePUB file by applying the DRM directly to the font files inside the container. In addition, there are schemes for obfuscating embedded fonts. The standard defines one method; Adobe has defined a slightly different method, which seems to dominate at this point. Obfuscation XORs (exclusive-or) the start of each embedded font file with a key derived from the book's unique identifier, so that the font cannot be extracted and used by itself. This is done to meet the copyright requirements imposed by the font designer.
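The standard (IDPF) method can be sketched as follows, assuming the font-mangling algorithm as described in OCF 2.0.1 (algorithm URI http://www.idpf.org/2008/embedding): the key is the SHA-1 digest of the unique identifier with spaces, tabs, CR and LF removed, and the first 1040 bytes of the font are XORed with that 20-byte key, repeated. Adobe's variant derives its key differently and covers only the first 1024 bytes; it is not shown. The function names are hypothetical, and a real book must also declare each obfuscated font in META-INF/encryption.xml.

```python
import hashlib

IDPF_OBFUSCATED_BYTES = 1040  # only the start of the font file is mangled


def idpf_obfuscation_key(unique_id: str) -> bytes:
    """SHA-1 of the unique identifier with U+0020, U+0009, U+000D, U+000A removed."""
    cleaned = "".join(ch for ch in unique_id if ch not in " \t\r\n")
    return hashlib.sha1(cleaned.encode("utf-8")).digest()  # 20 bytes


def idpf_obfuscate(font_data: bytes, unique_id: str) -> bytes:
    """XOR the first 1040 bytes with the key; applying it twice restores the font."""
    key = idpf_obfuscation_key(unique_id)
    head = bytes(b ^ key[i % len(key)]
                 for i, b in enumerate(font_data[:IDPF_OBFUSCATED_BYTES]))
    return head + font_data[IDPF_OBFUSCATED_BYTES:]
```

Because XOR is its own inverse, the same function de-obfuscates; this is why obfuscation deters casual extraction but is not encryption.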

Relationships

The subsections below are adapted from the OPS specification, so "this specification" refers to OPS.

Relationship to NVDL

This specification uses the NVDL language (see http://standards.iso.org/ittf/PubliclyAvailableStandards/c038615_ISO_IEC_19757-4_2006(E).zip) as a means to unambiguously define the interaction between the various schemas used in this specification. NVDL allows for interaction and validation between various XML schema languages. See Appendix A for a normative NVDL definition of OPS.

This specification does not require the use of NVDL tools to validate OPS documents, although such tools are available and may be used for validation.

Relationship to XHTML and DTBook

This specification recognizes the importance of current software tools, legacy data, publication practices, and market conditions, and has therefore incorporated certain XHTML 1.1 Document Type Modules and DTBook as Preferred Vocabularies. This approach allows content providers to exploit current XHTML and DTBook content, tools, and expertise.

To minimize the implementation burden on Reading System implementers (who may be working with devices that have power and display constraints), the Preferred Vocabularies do not include all XHTML 1.1 elements and attributes. Further, the modules selected from the XHTML 1.1 specification were chosen to be consistent with current directions in XHTML.

Any construct deprecated in XHTML 1.1 is either deprecated or omitted from this specification; CSS-based equivalents are provided in most such cases. Style sheet constructs are also used for new presentational functionality beyond that provided in XHTML.

Relationship to CSS

This specification defines a style language based on CSS 2 (see http://www.w3.org/TR/CSS2/). The style sheet MIME type text/x-oeb1-css has been deprecated in favor of text/css. Note that not all of CSS 2 is supported, and there are additional extensions prefixed with oeb- (oeb-page-head, oeb-page-foot, and oeb-column-number); however, many ePub readers do not support these features anyway.

Relationship to XML

OPS is based on XML because of its generality and simplicity, and because XML documents are likely to adapt well to future technologies and uses. XML also provides well-defined rules for the syntax of documents, which decreases the cost to implementers and reduces incompatibility across systems. Further, XML is extensible: it is not tied to any particular type of document or set of element types, it supports internationalization, and it encourages document markup that can represent a document’s internal parts more directly, making them amenable to automated formatting and other types of computer processing.

  • Reading Systems must be XML processors as defined in XML 1.1. All OPS Content Documents must be valid XML documents according to their respective schemas.

Relationship to XML Namespaces

Reading Systems must process XML namespaces according to the XML Namespaces Recommendation at http://www.w3.org/TR/xml-names11/. For example:

xmlns:ops="http://www.idpf.org/2007/ops"

Relationship to Dublin Core

Dublin Core is the defined standard for all metadata used in an ePub document. Only the identifier, title, and language are required, but other entries are encouraged; in particular, the author should be entered if known.
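A minimal sketch of a metadata block with the three required Dublin Core entries plus an optional author, in the same shape as the OPF example above. The function name and the default id BookId are assumptions; values are escaped with xml.sax.saxutils so titles containing & or < remain well-formed.

```python
from typing import Optional
from xml.sax.saxutils import escape, quoteattr


def metadata_block(identifier: str, title: str, language: str,
                   author: Optional[str] = None, id_attr: str = "BookId") -> str:
    """Return an OPF <metadata> element with the required Dublin Core entries."""
    lines = [
        '<metadata xmlns:dc="http://purl.org/dc/elements/1.1/"',
        '          xmlns:opf="http://www.idpf.org/2007/opf">',
        f"  <dc:title>{escape(title)}</dc:title>",
        f"  <dc:language>{escape(language)}</dc:language>",
        # id_attr must match unique-identifier on the <package> element.
        f"  <dc:identifier id={quoteattr(id_attr)}>{escape(identifier)}</dc:identifier>",
    ]
    if author:  # optional, but recommended when known
        lines.append(f'  <dc:creator opf:role="aut">{escape(author)}</dc:creator>')
    lines.append("</metadata>")
    return "\n".join(lines)
```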

Readers

Software

Hardware

ePub Creation and Manipulation software

Creation tools

These tools can create (export) an ePub file. Editing is accomplished by editing the source and creating a new ePub.

  • Adobe InDesign
  • Apple Pages - Exports ePub files. Apple's Pages '09 Word Processor is included as part of the iWork '09 package. Mac OS X only. Must get latest software updates to enable ePub export.
  • Atlantis Word Processor - can convert any TXT/RTF/DOC/DOCX document to ePub; supports multilevel TOCs, font embedding, and batch conversion.
  • DAISY Pipeline - creation and checking tools.
  • eCub - a simple to use EPUB and MobiPocket ebook creator.
  • ePubSTAR - converts from Word, TXT, CHM to ePub 2 or ePub 3
  • ePUB Tools - A collection of open source tools used to create and check ePUB.
  • ePub Zip for Mac OS X - Drag & drop creation of an ePub file from a folder for Mac OS X.
  • EScape - An add-on for Open Office (ODT), not for commercial use.
  • Jutoh is an inexpensive WYSIWYG ebook editor for Windows, Mac OS X, Linux, FreeBSD and Solaris x86.
  • OpenBerg Rector.
  • Writer2ePub is a macro for Open Office Writer to directly save ePUB files.
  • Scrivener - Writer's tool. Has configurable ePUB exporter. Originally Mac OS X only. Now Windows is also supported. Linux version is beta.
  • RoboHelp - from Adobe for Windows

Editing Tools

The following tools can open and edit an existing ePub file. They can also be used as a creation tool.

Conversion tools

  • Calibre Click the "hammer" icon on the toolbar and set the default output format to EPUB.
  • ePuper - A German freeware tool for converting online editions of newspapers to ePUB.
  • eBook Converter - convert PDF to ePUB, ePUB to PDF, ePUB to Mobi, Mobi to ePUB, etc.
  • Convert uploads to ePUB at Feedbooks.com. You don't have to hit the "publish" button. Just edit your text, and download the ePub preview.
  • Python converter posted by MishaS - His latest version of oeb2epub is at his site.
  • Stanza - converter for PC and Mac, typically strips formatting prior to conversion.
  • Web2FB2 is a web site that will convert a URL to FB2 and ePUB format.
  • Comics2Reader - Software to convert a set of images into an ePub file and various other file formats.
  • Wikipedia4epub - A tool that downloads Wikipedia articles and creates an ePUB.
  • EPUB to Kindle converter - A tool to convert EPUB files to Amazon Kindle format.
  • Word to ePub - Commercial multi-functional software that converts between nine popular office file formats: PDF, DOC, DOCX, Image, HTML, ODT, RTF, TXT, and ePub.
  • NEWSTOEBOOK.COM - A web-based tool to convert RSS/Atom feeds (including GoogleReader subscriptions) to books in ePUB and MOBI formats.
  • Text2Epub for OS X - Commercial Mac OS X application for converting plain text files to ePUB

Checking tools

Utilities

  • Info ZIP and 7zip can be used to pack and unpack the archive. 7zip can also open the file and will work with .epub without renaming the extension.
  • EPubHub - A tool to work with ePUB containers and modify the contents directly.
  • ePUB Fixer - quickly repair TOC and other modifications to an ePUB file.
  • Metadata editor - specific for ePUB files.
  • ClipboardFusion - supercharge your clipboard. Allows editing and modifying the clipboard data before pasting it.
  • MobileRead forum - The ePub forum contributors add specialized ePub tools from time to time.

General Tips

  • It is possible to make an eBook that conforms to the standard by placing the entire book in one XHTML file, but performance will suffer. Many readers load a whole XHTML file into memory at once, so for best performance a standard-size book should be divided into several files, usually one per chapter.
  • Some mobile devices cannot handle large ePUB files. Generally this is caused by an XHTML file that is too large. If the ePub can be unpacked, the large XHTML file can usually be split into multiple files. For best performance, individual files should typically not exceed about 300K.
  • The ePub format supports a proper table of contents through an NCX file (commonly named toc.ncx). Not all reader applications currently support it. The NCX format is documented in the DAISY (NISO Z39.86) standard, which also defines DTBook.
  • Make sure all tags are complete (no dangling tags). HTML Tidy does a great job here.
  • Get rid of as many tables as you can! Files converted from CHM often put the entire content of a page in one table, which causes lots of problems.
  • "normal" tables tend to get truncated in the reader due to being too wide. Convert these tables to some intelligent lists with <hr/>'s around them if you can.
  • For Apple's iBooks to identify and use a cover image, you must add metadata to the OPF file identifying the cover image: <meta name="cover" content="[cover image id]" /> where [cover image id] is the id given to the cover image in the manifest section of the OPF file.
  • See Ebook Covers for tips on making a cover for your ePub eBook.
  • See example ePub for tips on ePub usage for specific features.
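The 300K guideline from the tips above is easy to check without unpacking the book, because zipfile reports each entry's uncompressed size. This is a sketch under two assumptions: "300K" is taken as 300 × 1024 bytes of uncompressed content, and content files are recognized by their .html, .htm or .xhtml extension. The function name is hypothetical.

```python
import zipfile

SIZE_LIMIT = 300 * 1024  # assumed reading of the "about 300K" guideline


def oversized_content(epub, limit: int = SIZE_LIMIT) -> list[tuple[str, int]]:
    """List (name, uncompressed size) for XHTML/HTML entries larger than limit."""
    with zipfile.ZipFile(epub) as zf:
        return [
            (info.filename, info.file_size)  # file_size is the uncompressed size
            for info in zf.infolist()
            if info.filename.lower().endswith((".html", ".htm", ".xhtml"))
            and info.file_size > limit
        ]
```

Any file this reports is a candidate for splitting, for example at chapter or section boundaries.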

CSS Tips

General

  • Play with the CSS to get the colors cleaned up. A lot of the "color" gets translated to light gray. Best just to change everything to black or white that you can.
  • Avoid placing any text inside destination anchors (<a name="...">). Some readers incorrectly render these as links (underlined and blue).
  • <pre> blocks of code can run off the right side of the page. Use CSS to shrink their font size or to wrap the lines (white-space: pre-wrap). At worst, reformat the blocks to keep them within 70 characters per line at 6pt.
  • Some readers get very unhappy when embedded fonts have multiple local names.
  • When creating drop caps, do not use padding-bottom to set the vertical positioning; many readers compute the vertical position incorrectly if padding is nonzero. Instead use margins to set the vertical positioning of drop caps, which seems to be more compatible.
  • When working with drop caps, always explicitly set the line height to 1.2em (or more), both for the body text and for the drop cap block. Some readers set lower bounds on line height, and some fonts have an intrinsic line height of less than 1.2em. Those readers may force the line height up to 1.2em on your behalf, making the drop cap character appear too high or too low.

See CSS HowTo for useful examples of CSS.

iBooks

  • In iBooks, if you want your font choices to be honored properly, add a com.apple.ibooks.display-options.xml file to the book. See the iBooks page for details.
  • The iBooks reader prevents the line-height problem described under General by setting -webkit-line-box-contain: block glyphs replaced;. This may cause unexpected behavior if your drop cap sticks up above the top of a paragraph. If your drop caps appear correctly in ADE and browsers but misbehave in iBooks, try adding -webkit-line-box-contain: block inline replaced !important; to the style for the html tag.
  • Avoid using fractional font sizes in SVG. iBooks may badly misposition the text if you do.
  • If you use a matrix transform on an SVG <text> element, avoid using multiple SVG <tspan> elements within that <text> element. If you do, you will often see part of the second <tspan> element's contents placed atop the first <tspan> element's contents. Instead, create a second <text> element for the second <tspan> element.

ADE

  • ADE may ignore the entire CSS file if there are any errors in it, so make sure it passes a CSS check, especially if you see CSS problems only in ADE-based applications. ADE also ignores the entire CSS file if it contains features that it does not understand (post-CSS2). To avoid losing styles, always put @media and @page rules into a separate CSS file, so that when ADE ignores that file, it doesn't ignore your other CSS along with it. The same applies to the IE-specific filter property.
  • ADE and Nook (based on ADE) do not support font-variant: small-caps. You can work around this by using a separate small caps font in which the lowercase glyphs are replaced by small caps glyphs.
  • If you are working around this ADE bug by using a separate small caps font instead of a normal font with an smcp OpenType feature, be sure to add font-variant: small-caps not only in any CSS styles that request the font, but also in the @font-face rule for that style. If you forget to add it in the @font-face rule, some readers (Sony Reader in particular) will fall back to the next font that has either an smcp feature or a separate small-cap variant style.
  • ADE and Nook (based on ADE) do not support the use of the :before pseudo-element with the content: property (unless this has been fixed recently).
  • Don't count on CSS precedence working correctly. Readers based on Adobe Digital Editions sometimes fail to treat classes in a selector as having a higher precedence than containing elements. For example, in spec-compliant readers and browsers, if you have a rule on "div div p + p", a contrary rule on "p.classname" should override it, because a single class outweighs any number of element names in CSS specificity. In ADE, however, the precedence calculation is broken, and the "div div p + p" rule gets precedence.
  • In some iOS readers based on Adobe Digital Editions (seen on Sony Reader and Bluefire), if a paragraph containing a drop cap falls at the top of the screen (common in landscape mode), the top may be cut off. The reason for this is that these readers collapse the paragraph margins into the page margins (ignoring the margins for the drop cap elements inside them) while simultaneously clipping any content that falls beyond the page margins. To fix this, replace the top margin of these paragraphs with padding (which cannot be collapsed into a margin).
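The advice to move @media and @page rules into their own file can be automated for simple style sheets. This is a rough sketch, not a CSS parser: it assumes valid CSS with no braces or semicolons inside strings, strips comments first, and routes top-level @media and @page blocks to a second output. The function name is hypothetical, and the IE-specific filter property is not handled.

```python
import re


def split_css_for_ade(css: str) -> tuple[str, str]:
    """Split a style sheet into (main rules, @media/@page rules)."""
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.S)  # drop comments
    main, separate = [], []
    depth, start = 0, 0
    for i, ch in enumerate(css):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:  # end of a top-level rule or @-block
                rule = css[start:i + 1].strip()
                is_special = rule.lower().startswith(("@media", "@page"))
                (separate if is_special else main).append(rule)
                start = i + 1
        elif ch == ";" and depth == 0:  # a statement such as @charset or @import
            main.append(css[start:i + 1].strip())
            start = i + 1
    if css[start:].strip():
        main.append(css[start:].strip())
    return "\n\n".join(main), "\n\n".join(separate)
```

Link both resulting files from each XHTML document; if ADE rejects the second file, the first still applies.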

nook

  • Nook does not like to center heading tags (h1, h2, and so on). If you need to center content, use div.
  • Nested block and inline-block elements are problematic on some readers. In particular, Nook on iOS appears to ignore (treat as zero) the margins of inline-block elements when drawing the contents of any block elements that appear inside them. Thus, if you are doing drop caps, you must not use nested block elements for positioning purposes if you care about supporting Nook on iOS.
  • Some readers incorrectly calculate block element height when computed in ems. In particular, Nook on iOS tends to undersize its boxes. This can result in drop caps that overlap text. When setting the height CSS property for drop caps, find the smallest value and the largest value that result in correct behavior on a proper web browser (e.g. WebKit or Firefox), then choose a value that is somewhere near the middle of this range. This bug has been reported to B&N.
  • In Nook on iOS, with publisher defaults disabled, white text becomes black (even when it's on a black background). This bug has been reported to B&N.
  • In Nook on iOS, line spacing can be altered by the user. Unfortunately, Nook fails to multiply the new line spacing by the line-height multiplier specified in the CSS, so drop caps are correct only at the smallest setting. No workaround is possible. This bug has been reported to B&N.
  • Nook's non-publisher-defaults mode appears to determine whether to apply a font based on whether the immediately enclosing tag has a specified font, ignoring fonts on elements farther up the tree. (Either that or they specify default font settings for every element; I'm not sure which.) This can cause problems if you're using the <span> hack to support early versions of iBooks. For example:
       <!-- This is styled -->
       <p class="styledwithfont">Foo</p>
       
       <!-- This is not -->
       <p class="styledwithfont"><span>Foo</span></p>
  • By adding an extra span as shown above and combining it with a transparent font on the paragraph, you can make text that is invisible in publisher defaults mode, but shows up in Nook's default font hack mode.
  • Because of this behavior, if you must override the user-chosen font for a stretch of text when in the non-publisher-defaults mode, you must do so on the immediately enclosing element—that is, at the paragraph level, and again on any tags that might be included in those paragraphs—with as much specificity as possible, and certainly more than just single-element-name specificity.

Kobo

  • In Kobo reader (at least on iOS), if you set the left and right margins of the body tag to zero, you will see part of page 2 on page 1, and so on, and you will be unable to reach the last page of a chapter.

Apabi Reader

  • In Apabi Reader, a white, transparent border becomes black.

Stanza

  • In Stanza on iOS, text-align: center is ignored, but text-align: center !important is respected correctly. If centering is critical to your content's styling, always add !important. (Stanza for iOS appears to be discontinued as of August 2013.)

Tutorials

ePub 3

Here are some quick links on EPUB version 3. We also have a page in this wiki with more details.

The specification is made up of HTML5, CSS, SVG, images, multi-media support, MathML, SMIL and more.

For more information
