E-book formats

From MobileRead
There are many file formats used for eBooks. Usually, but not always, the file extension matches the name of the format. In some cases the extension may have two forms, where one form is limited to 3 characters (consistent with early Windows requirements).

eBook Formats

This section attempts to define and identify all (or most) of the eBook formats. With the great proliferation of etext formats, a person who wants to read an eBook can easily become confused. The most important formats are always the ones that work on the device or devices you own, but if you have a choice, favor the formats with the most eBook dealers or the most eBooks available. Today there are essentially two families of eBook format: the various formats provided by Amazon, and ePub versions 2 and 3. A third popular format is PDF from Adobe, but it tends not to work as well on portable mobile devices because of their smaller screens; PDF is better suited to computers since it usually expects full paper-size pages. Check Popular Formats Statistics for a list of popular formats as determined by the number of views made in this wiki.

As many reading devices settle on a short list of formats, ePUB and PDF have emerged as the leading pair, with Adobe DRM support for both. Amazon does not support ePUB and insists on AZW and its variants. AZW is no longer a single format but a whole series of different formats (see the next section). You will also often see TXT, which carries only minimal formatting, and HTML, where eBook readers often ignore any complicated formatting. More general-purpose portable devices such as tablets have loadable applications for these and other formats.

Main formats

These are the formats most commonly available commercially.

  • AZW - An Amazon proprietary format. This is usually the MOBI format with or without DRM. The DRM is unique to the Amazon Kindle. Files with this extension can be any of the Kindle formats.
  • AZW1 - An Amazon proprietary format. It is the TPZ format always with a custom DRM.
  • AZW3 - See KF8.
  • AZW4 - An Amazon proprietary format. It is the PDF format in a PDB wrapper, and usually (always?) with DRM.
  • EPUB - An open format defined by the Open eBook Forum of the International Digital Publishing Forum (IDPF). It is based on XHTML, XML and CSS2. It is an evolving standard; current specifications are found at the IDPF web site. Adobe, Barnes & Noble and Apple all have their own (incompatible) DRM systems for this format. There is now a new version of this format called ePub 3, but it is not yet in wide use.
  • KF8 - (Also called AZW3) It is basically ePub compiled with the PDB wrapper and with Amazon DRM. This format is supported by all Amazon readers from the Kindle Keyboard 3 onwards.
  • KFX - A semi-compiled format from Amazon designed to give better typography on Kindle devices, comes with a new DRM system.
  • MOBI - MobiPocket format, usable with MobiPocket's own reading software on almost any PDA or smartphone. Mobipocket's Windows PC software can convert .chm, .doc, .html, .ocf, .pdf, .rtf, and .txt files to this format. The Kindle uses this format as well.
  • PDB - Palm Database File. Can hold several different e-book formats targeting Palm-enabled devices. It is commonly used for PalmDOC (AportisDoc) e-books and eReader formats, among many others.
  • PDF - Portable Document Format created by Adobe for their Acrobat products. It is the de facto standard for document interchange. Software support exists for almost every computer platform and handheld device. Some devices have problems with PDF since most available content is scaled for either A4 or letter format, neither of which is easily readable when reduced to fit on a small screen. Some readers, including the Sony PRS505, can reflow some PDF documents to accommodate the small screen. Some eBook readers, including the iRex iLiad, have a pan-and-zoom feature that aids readability but exacts a price in ergonomics.
  • PRC - Palm Resource File. Often holds a Mobipocket eBook but occasionally holds an eReader or AportisDoc eBook.
  • TPZ - Topaz file extension used on the Amazon Kindle. Topaz is a collection of glyphs arranged on pages, along with an unproofed OCR text version. It is an Amazon proprietary format used to make older books available quickly, since conversion from scans of a book's pages is essentially automatic. Despite being built from scans, it reflows very well.
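
Because an extension such as AZW or PRC can hide several different formats, it can help to look at the file content instead. The following is a minimal Python sketch, not a complete detector. It relies on signatures that are not stated in this article and are assumed here: PDF files begin with `%PDF-`, an EPUB is a ZIP archive whose `mimetype` entry reads `application/epub+zip`, and Palm database files (PDB/PRC, including MOBI) carry an 8-byte type/creator code at offset 60 (`BOOKMOBI` for Mobipocket, `TEXtREAd` for PalmDOC). The function name `sniff_ebook_format` is hypothetical.

```python
import io
import zipfile


def sniff_ebook_format(data: bytes) -> str:
    """Guess an eBook container from a file's bytes (hypothetical helper)."""
    if data.startswith(b"%PDF-"):
        return "PDF"
    if data.startswith(b"PK\x03\x04"):
        # ZIP container; an EPUB stores its media type in an entry named "mimetype".
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as archive:
                if archive.read("mimetype").strip() == b"application/epub+zip":
                    return "EPUB"
        except (zipfile.BadZipFile, KeyError):
            pass
        return "ZIP"
    # Palm database header: 4-byte type followed by 4-byte creator at offset 60.
    palm_id = data[60:68]
    if palm_id == b"BOOKMOBI":
        return "MOBI"
    if palm_id == b"TEXtREAd":
        return "PalmDOC"
    return "unknown"
```

A caller would pass the whole file, for example `sniff_ebook_format(open(path, "rb").read())`. DRM-protected and Topaz files are not distinguished by this sketch.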

Other formats

Some of the formats in the list below are only available for a few or even one type of device. Some are more standardized. Be sure your device will read the format you choose.

  • ABW - File format used by AbiWord. If compressed it is ZABW.
  • ACSM - File format used to provide DRM on Adobe eBooks PDF and ePUB.
  • AEH - File format used by eBooksWriter.
  • AZK - Kindle format specifically for iOS devices.
  • BBeB - The Sony proprietary format. Stands for Broadband EBook, also known as LRF because of the file extension.
  • CBR/CBZ - Compressed container for images. The R means it is RAR compressed while the Z means it is ZIP compressed. The internal format can be any of several image formats, and CBR/CBZ readers display the images as the pages of a book. The name means Comic Book Reader, but the format is used for any book whose content is mostly pictures.
  • CHM - Compressed HTML, often used for Windows help files. It has become very popular for distribution of texts and other support materials over the Web.
  • DJVU - Format by Lizardtech that is increasingly used for scientific publications. Its main characteristic is a compression ratio about 10x better than .pdf at the same quality. Nothing beats it at the moment for b&w text and pictures.
  • DNL - A digital webbook format used by Desktop Author.
  • DOC - This format could be a document in Microsoft Word format, which uses a .doc extension, or in PalmDOC.
  • DOCX - This is the latest Office Open XML format used in Word 2007 and later. Older versions of Word can be upgraded to support this format.
  • DTB - Digital Talking Books are books for blind, visually impaired, physically handicapped, learning-disabled, or otherwise print-disabled readers. The DTBook establishes specifications for digital talking books (DTBs) as part of DAISY.
  • EBA - proprietary eBook format. Specifically supported by eBook Readers for the Chinese language.
  • EBAML - Same as EBA 2.0. Used on the Dr. Yi Reader and other products from China.
  • -ER.PDB - A Palm database format for the eReader reader. The ER is used to distinguish this format.
  • EXE - The eBook is actually a PC executable. Basically the eBook includes the program to display it.
  • FB2 - FictionBook format, based on XML and viewable by various e-book software solutions for Windows, Linux, PocketPC and Palm OS. Used by the HaaliReader, FBReader and the PalmFiction.
  • FB3 - Overhaul of FictionBook format. Based on XML. FB3 can be used for almost any type of content as compared to FB2 that can be used only for fiction.
  • FUB - Franklin eBook format.
  • GPF - Ganaxa Publishing Format, allowing hot spots and embedded rich media content.
  • GPX - A protected Ganaxa document.
  • HTML - Hyper Text Markup Language is the backbone of the World Wide Web. Many texts are distributed in this format. In addition, some eBook readers support Cascading Style Sheets (CSS) that are basically a master style guide for HTML pages.
  • HTXT - These appear to be encrypted TXT files that are DRM locked to a particular reader. They are unique to Hanvon devices. An earlier version was called MTXT.
  • IMP - an eBook format used by eBook Technologies ETI-1 (REB 1200/Softbook Reader) or ETI-2 (EBookwise-1150/Gemstar 1150). Some programs convert to it. It is considered a terminal format but see: IMP software.
  • -IS.PDB - A Palm database format for the ISilo reader. The IS is used to distinguish this format.
  • KFX - Amazon's latest format, only available on some Kindles.
  • KML - HieBook eBook format.
  • LIT - Microsoft's native format for its Microsoft Reader.
  • LRF - also: BBeB book. Sony's proprietary format. Supported by the Sony Librie and Sony Reader.
  • LRS - also: BBeB Xylog XML. Source format for BBeB books, which is compiled into LRF for reading on the device.
  • LRX - A protected BBeB document. The Sony Librie and Sony Reader use mutually incompatible formats.
  • MART - This is a proprietary format used only on the Martview web site for distributing books consisting of images.
  • NP - The newspaper download format used by NewspaperDirect for their PressDisplay product.
  • OEB - Open eBook format. An ebook format used by EBookwise-1150, MobiPocket, and Microsoft Reader. This standard is an older version of ePUB.
  • ODT - This is an open standard, document format used in OpenOffice.org, Star Office and many other word processors.
  • OSIS - This is an XML Schema definition for Bibles and other Biblical research texts.
  • PKG - A format used on the Apple Newton.
  • PNPd - Format used by the eReader program. This is a popular format (also known as PML).
  • PS - Postscript is supported on a few reading programs but is intended for sending information to a printer.
  • RB - Rocket eBook format made for the Rocket eBook device and the Gemstar RCA REB 1100.
  • RTF - Rich Text Format is a document interchange format supported by some e-book readers and by many word processing applications, including MS Word and OpenOffice. It is the preferred format for many users who create their own content for the Sony Reader.
  • SGF - Native format for Sigil, a direct editor for ePUB. (no longer used)
  • STK - STAReBOOK's proprietary format.
  • TCR - eBook for EPOC.
  • TeBR - Tiny Ebook Reader format also used by Fictionwise.
  • TXT - Text is the base type, with no formatting applied other than space, paragraph, end of line, new line, and tab. It is usable in many e-book devices.
  • TR - Tome Raider file. Tome Raider is an eBook format that features support for very large books such as reference books, encyclopedias, and dictionaries. Their latest format is called TR3.
  • VBK - eBook format from VitalSource that features graphics support. These are usually textbooks.
  • XDXF - Dictionary exchange format based on XML.
  • XEB - format used by Apbi eBooks in Chinese primarily.
  • XHTML - specialized version of HTML designed to conform to XML rules of construction. It is the standard format for epub data.
  • XML - general purpose markup language for exchange of data. In the context of eBooks it is generally confined to XHTML and RSS feeds although some other formats have been defined.
  • WOLF - Proprietary format used by HanLin eBook in their V2B, V3, and V8 eBook readers. Usually a .wol extension is used. Also used by JCNIP on their Dr. Yi ebook reader.
  • zTXT - format used by the WeaselReader on Palm devices. Has a .pdb extension.
  • ZNO - proprietary format for Zinio electronic subscription magazines. These magazines include multimedia material like photos and videos. It is rumored that the format is based on DJVU.
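
The CBZ entry above describes a ZIP archive of images shown as pages. A short Python sketch of how a reader might collect those pages follows. It assumes pages are ordered by file name, which is a common convention rather than something this article specifies, and the function name `cbz_pages` and the file names are made up. CBR is not covered because RAR needs a third-party library.

```python
import io
import zipfile

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".bmp")


def cbz_pages(path_or_file):
    """Return the image entries of a CBZ archive, sorted by file name."""
    with zipfile.ZipFile(path_or_file) as archive:
        names = [n for n in archive.namelist() if n.lower().endswith(IMAGE_EXTENSIONS)]
    return sorted(names)


# Demo with an in-memory archive; a real call would pass a path such as "book.cbz".
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo:
    demo.writestr("002.png", b"")
    demo.writestr("001.jpg", b"")
    demo.writestr("notes.txt", b"not a page")
print(cbz_pages(buffer))  # ['001.jpg', '002.png']
```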

Other File Types

A number of e-book readers and PDAs play music (usually in MP3) and display graphics (usually in JPG) even outside of eBooks.

  • LRC - An annotation format originally intended for lyrics. It can also be used for read-along eBooks. It is a text file that can be synced with audio or video files with an appropriate program.
  • Some eBook readers also provide support for reading typical business office files such as RTF, DOC/DOCX, XLS/XLSX, PPT/PPTX.
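
To illustrate how an LRC file can be synced with audio, here is a minimal Python sketch of a parser. It assumes the commonly seen line layout `[mm:ss.xx]text`, which this article does not spell out. The name `parse_lrc` is hypothetical, and lines carrying several timestamps are not handled.

```python
import re

# One timestamp in minutes and seconds, followed by the text to show at that time.
LRC_LINE = re.compile(r"\[(\d+):(\d{2}(?:\.\d+)?)\](.*)")


def parse_lrc(text):
    """Return (seconds, text) pairs sorted by time; other lines are ignored."""
    cues = []
    for line in text.splitlines():
        match = LRC_LINE.match(line.strip())
        if match:
            minutes, seconds, words = match.groups()
            cues.append((int(minutes) * 60 + float(seconds), words.strip()))
    return sorted(cues)
```

A player would then show the text of the latest cue whose time is not after the current playback position.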

Audio Formats

See also: Sound - These are for Music and audio books. Some are specific to speech.

  • AA - Audible.com Audio proprietary format with four different levels of DRM.
  • AAC - Advanced Audio Codec is more of a container than a format as within an AAC the music can be encoded in multiple ways from iTunes M4P all the way to a lossless compression.
  • AAX - Enhanced audio from Audible.com. It is embedded with other features, such as images, graphs, maps, or links.
  • MP3 - the currently most popular music compression format. It is widely used throughout the Internet and plays on almost every portable music player. This format is also used for some audio books.
  • WMA - Windows Media Audio is an audio format developed by Microsoft to compete with MP3.
  • OGG - Free, open standard container for Vorbis audio compression codec files, as well as for Free Lossless Audio Codec (FLAC), Speex speech compression codec, and Theora lossy video codec.

Graphic Formats

See also: Graphics

  • BMP - BitMaP image file is an uncompressed graphics format developed by Microsoft.
  • GIF - Graphics Interchange Format was developed in 1987 by CompuServe and is a lossless graphics format designed for the reproduction of line drawings rather than photographs. Widely used on the Internet for logotypes and drawings. It uses the LZW lossless compression scheme, which was patented; the patents had run out by 2004. PNG was developed to replace GIF and has no patent issues. Main drawbacks are color support (max. 256 colors) and only a 1-bit alpha channel (transparency bit).
  • JPG - (or JPEG) stands for Joint Photographic Experts Group and is a lossy compressed graphics format designed to support photographs rather than line art. It was developed in 1992 and issued as the ISO 10918-1 standard in 1994. The quality depends directly on the amount of compression employed. Widely used on the Internet and by most digital camera manufacturers. A newer format is called JPEG 2000.
  • PNG - Portable Network Graphics format is a bitmapped graphic format that employs a lossless compression system. Designed to improve upon and replace GIF files, PNG does not require a patent license. Main drawback is the complexity of its color model.
  • SVG - a vector graphics format that is supported by ePUB.
  • SWF - Shockwave Flash is currently the dominant format for displaying "animated" vector graphics on the Web.
  • TIF - (or TIFF) Tagged Image File Format is a container that can hold images in a wide variety of bitmapped or even vector formats. They can also be compressed or uncompressed. If compressed they can use RLE, JPG, LZW, Zip or potentially other formats. This standard is owned by Adobe. Main drawback is that it is so versatile that saying that TIF is a supported format may mean nothing since there are really many TIFF formats.
  • IW44 - A simplified subset of DJVU.
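
Since the extension alone does not always say what an image file contains, a reader can check the first bytes of the file. The Python sketch below uses well-known file signatures that are assumed here rather than taken from this article, and the name `sniff_image_format` is hypothetical. It only identifies the container; as noted for TIF, that does not guarantee a device can decode what is inside.

```python
# (signature, format name) pairs checked against the start of the file.
IMAGE_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"\xff\xd8\xff", "JPG"),
    (b"BM", "BMP"),
    (b"II*\x00", "TIF"),  # little-endian TIFF
    (b"MM\x00*", "TIF"),  # big-endian TIFF
]


def sniff_image_format(header: bytes) -> str:
    """Return the format name matching the leading bytes, or 'unknown'."""
    for signature, name in IMAGE_SIGNATURES:
        if header.startswith(signature):
            return name
    return "unknown"
```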

Compression Formats

These are lossless compression formats that reduce the amount of space required to store a document. Text, unlike music, can be compressed a great deal; sometimes the compression can be as much as 90%. These formats are considered to be containers in that they can hold multiple files. In some cases the ability to hold multiple files is more important than the actual compression.

Don't confuse compression formats with eBook formats. Although listed by some eBook readers as a supported format, these readers have only the ability to extract the compressed file and to get to the file or files inside. The reader must still support the actual underlying eBook format. Also, some eBook formats already include compression.

  • RAR - a file compression system providing one of the most compact resultant files currently available in wide distribution. The premier tool for RAR is WinRAR but 7ZIP works as well.
  • ZIP - the most universal of the compression tools. Slightly less efficient than RAR files, ZIP files have been around longer and enjoy more support.
  • LHA - a Japanese developed compressed archive file format. A Microsoft Compressed (LZH) Folder Add-on is included with the Japanese version of Windows to use this format.
  • GZIP - A compression format (.gz) that was developed by the GNU team. It is designed to be compressed or decompressed on the fly and only supports one file. Often the file is a tar (.tar) format, which is a container (archive) format. When used together the file extension is usually .tgz or .tar.gz.
  • BZIP2 - compresses files using the Burrows-Wheeler block sorting text compression algorithm and Huffman coding. Compression is generally considerably better than that achieved by more conventional LZ77 and LZ78-based compressors, and approaches the performance of the PPM (prediction by partial matching) family of statistical compressors. The file extension is generally .bz2
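
The claim that text compresses well can be checked with Python's standard library. This is only a sketch: `zlib` implements the DEFLATE method used by ZIP and GZIP, `bz2` implements BZIP2, and RAR and LHA are left out because the standard library does not support them. The function name `compression_savings` is hypothetical, and the sample text is deliberately repetitive, so it compresses far better than a real book would.

```python
import bz2
import zlib


def compression_savings(data: bytes) -> dict:
    """Return the fraction of space saved by each codec for the given data."""
    codecs = {
        "deflate (ZIP/GZIP)": zlib.compress,
        "bzip2": bz2.compress,
    }
    return {name: 1 - len(compress(data)) / len(data) for name, compress in codecs.items()}


sample = ("It was the best of times, it was the worst of times. " * 200).encode("utf-8")
for name, saved in compression_savings(sample).items():
    print(f"{name}: {saved:.0%} smaller")

# Lossless: decompressing returns exactly the original bytes.
assert zlib.decompress(zlib.compress(sample)) == sample
assert bz2.decompress(bz2.compress(sample)) == sample
```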

Supported Format Matrix

Devices

Brandname | Model | Manufacturer | Display Type
HanLin eBook | V8 | Jinke | e-ink
HanLin eBook | V3 | Jinke | e-ink
HanLin eBook | V2 | Jinke | e-ink
Sony Portable Reader | PRS-500 PRS-505 PRS-700 | Sony | e-ink
iRex | iLiad BE DR | iRex | e-ink
Amazon Kindle | D00111 | Hon Hai Precision | e-ink
STAReBOOK | STK-101 | eREAD | e-ink
Bookeen | Gen 3 | eREAD | e-ink
eBookwise 1150 | 1150 | ETI | Backlit Gray LCD
REB 1200 | 1200 | ETI | Backlit Color LCD

Supported formats (entries appear in the device order above; devices with no entry for a format are skipped)

eBooks
AZW | Yes
BBeB | Yes
CHM | Yes | Yes
DOC | Yes | Yes | Note 1 | Note 2 | Note 1 | Note 1
ePUB | Yes | Note 4
FB2 | Yes
HTML | Yes | Yes | Yes | Yes | Yes | Note 2 | Note 2
IMP | Yes | Yes
MOBI | Yes | Yes | Yes | Yes
PalmDOC | Yes | Yes
PRC | Yes | Yes | Yes | Yes
PDF | Yes | Yes | Yes | Yes | Note 2 | Yes | Yes
RB | Note 2 | Note 2
RTF | Yes | Yes | Yes | Note 2 | Note 2
STK | Yes
TXT | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Note 2 | Note 2
Wolf | Yes | Yes | Yes
XEB | Yes | Note 5

Music
AAC | Yes
MP3 | Yes | Yes | Yes | Yes | Yes | Yes | Yes
AA | Yes

Graphics
BMP | Yes | Yes | Note 3 | Note 3
DJVU | Yes
GIF | Yes | Yes | Yes | Note 3 | Note 3
JPG | Yes | Yes | Yes | Yes | Yes | Yes | Note 3 | Note 3
PNG | Yes | Yes | Yes | Yes | Note 3 | Note 3
TIF | Yes

Compressed
RAR | Yes | Yes
ZIP | Yes | Yes

  • Note 1 - Requires a manufacturer supplied conversion program and Word.
  • Note 2 - Requires a manufacturer supplied conversion program.
  • Note 3 - Only supported inside of a document.
  • Note 4 - Only supported on PRS-505, PRS-700, PRS-600, PRS-300
  • Note 5 - Chinese version of iLiad only
