MOBI
From MobileRead
MOBI is the format used by the MobiPocket Reader. It may have a .mobi extension or a .prc extension, and the user can change the extension to either of the accepted forms. In either case the file may be DRM-protected or DRM-free. The .prc extension is used because PalmOS does not support any file extensions except .prc and .pdb. Note that Mobipocket prohibits its DRM format from being used on dedicated eBook readers that support other DRM formats.
Description
MOBI format was originally an extension of the PalmDOC format that added certain HTML-like tags to the data. Many MOBI-formatted documents still use this form. However, there is also a high-compression version of the file format that compresses data further in a proprietary manner. Some third-party programs can read eBooks in the original MOBI format, but only a few can read eBooks in the newer compressed form. The higher compression mode uses a Huffman coding scheme that has been called the Huff/cdic algorithm. For a description in Python, see huffdic.py, available as part of the Calibre project.
From time to time features have been added to the format, so new files may cause problems if you try to read them with an older reader. Currently the source files follow the guidelines in the Open eBook format.
Note that AZW for the Amazon Kindle is the same format as MOBI except that it uses a slightly different DRM scheme.
Format
Like PalmDOC, the Mobipocket file format is that of a standard Palm Database Format file. The header of that format includes the name of the database (usually the book title and sometimes a portion of the author's name), which is up to 31 bytes of data. The files are identified by a Creator ID of MOBI and a Type of BOOK.
Mobipocket have some minimal file format info, mainly about the html encoding they use in the text of the book, at http://www.mobipocket.com/dev/article.asp?BaseFolder=prcgen
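As a rough illustration of the identification step described above, the following Python sketch reads the Palm Database header and checks the Type and Creator ID. The byte offsets used (type at 60, creator at 64, record count at 76, 8-byte record entries starting at 78) are not given on this page; they are taken from the standard Palm Database layout and should be treated as assumptions. The function names are hypothetical.

```python
import struct

def read_pdb_header(data: bytes) -> dict:
    """Parse the fixed part of a Palm Database header plus its record offsets."""
    if len(data) < 78:
        raise ValueError("too short to be a Palm database")
    # Database name: NUL-padded field at the start of the file (assumed 32 bytes).
    name = data[0:32].split(b"\x00", 1)[0]
    # Type and Creator ID (assumed offsets 60 and 64).
    db_type, creator = data[60:64], data[64:68]
    (num_records,) = struct.unpack_from(">H", data, 76)
    # Each record entry is assumed to be 8 bytes: a 4-byte file offset,
    # 1 attribute byte and a 3-byte unique ID. Only the offset is kept.
    offsets = [struct.unpack_from(">I", data, 78 + 8 * i)[0]
               for i in range(num_records)]
    return {"name": name, "type": db_type, "creator": creator, "offsets": offsets}

def is_mobipocket(data: bytes) -> bool:
    """True if the database is identified as Type BOOK, Creator MOBI."""
    header = read_pdb_header(data)
    return header["type"] == b"BOOK" and header["creator"] == b"MOBI"
```

Record 0, which holds the headers described in the following sections, would then start at `offsets[0]` and run up to `offsets[1]`.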
PalmDOC Header
The first record in the Palm Database Format gives more information about the Mobipocket file. The first 16 bytes are almost identical to the first sixteen bytes of a PalmDOC format file.
| bytes | content | comments |
|---|---|---|
| 2 | Compression | 1 = no compression, 2 = PalmDOC compression, 17480 = HUFF/CDIC compression |
| 2 | Unused | Always zero |
| 4 | text length | Uncompressed length of the entire text of the book |
| 2 | record count | Number of PDB records used for the text of the book. |
| 2 | record size | Maximum size of each record containing text, always 4096 |
| 4 | Current Position | Current reading position, as an offset into the uncompressed text |
There are two differences from a Palm DOC file. There's an additional compression type (17480), and the Current Position bytes are used for a different purpose:
| bytes | content | comments |
|---|---|---|
| 2 | Encryption Type | 0 = no encryption, 1 = Old Mobipocket Encryption, 2 = Mobipocket Encryption |
| 2 | Unknown | Usually zero |
The old Mobipocket Encryption scheme only allows the file to be registered with one PID, unlike the current encryption scheme that allows multiple PIDs to be used in a single file. Unless specifically mentioned, all the encryption information on this page refers to the current scheme.
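The 16 bytes described above can be unpacked in one step. This is a minimal sketch that assumes the fields are big-endian (as is usual for Palm databases) and that, in a Mobipocket file, the last four bytes hold the encryption type and the unknown field rather than a reading position. The class and function names are hypothetical.

```python
import struct
from typing import NamedTuple

class PalmDocHeader(NamedTuple):
    compression: int      # 1, 2 or 17480
    unused: int           # always zero
    text_length: int      # uncompressed length of the whole text
    record_count: int     # number of text records
    record_size: int      # maximum text record size, always 4096
    encryption_type: int  # 0, 1 or 2
    unknown: int          # usually zero

COMPRESSION_NAMES = {1: "none", 2: "PalmDOC", 17480: "HUFF/CDIC"}
ENCRYPTION_NAMES = {0: "none", 1: "old Mobipocket", 2: "Mobipocket"}

def parse_palmdoc_header(record0: bytes) -> PalmDocHeader:
    """Unpack the first 16 bytes of record 0 (2+2+4+2+2+2+2 bytes)."""
    return PalmDocHeader(*struct.unpack_from(">HHIHHHH", record0, 0))
```

Incidentally, 17480 is 0x4448, which is the pair of ASCII characters "DH".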
MOBI Header
Most Mobipocket files also have a MOBI header in record 0 that follows these 16 bytes, and newer formats also have an EXTH header following the MOBI header, again all in record 0 of the PDB file format.
The MOBI header is of variable length and is not documented. Some fields have been tentatively identified as follows:
| offset | bytes | content | comments |
|---|---|---|---|
| 16 | 4 | identifier | the characters M O B I |
| 20 | 4 | header length | the length of the MOBI header, including the previous 4 bytes |
| 24 | 4 | Mobi type | The kind of Mobipocket file this is: 2 = Mobipocket Book, 3 = PalmDoc Book, 4 = Audio, 257 = News, 258 = News_Feed, 259 = News_Magazine, 513 = PICS, 514 = WORD, 515 = XLS, 516 = PPT, 517 = TEXT, 518 = HTML |
| 28 | 4 | text Encoding | 1252 = CP1252 (WinLatin1); 65001 = UTF-8 |
| 32 | 4 | Unique-ID | Some kind of unique ID number (random?) |
| 36 | 4 | Generator version | Potentially the version of the Mobipocket-generation tool. Always >= the value of the "format version" field and <= the version of mobigen used to produce the file. |
| 40 | 40 | Reserved | all 0xFF. In case of a dictionary, or some newer file formats, a few bytes are used from this range of 40 0xFFs |
| 80 | 4 | First Non-book index? | First record number (starting with 0) that's not the book's text |
| 84 | 4 | Full Name Offset | Offset in record 0 (not from start of file) of the full name of the book |
| 88 | 4 | Full Name Length | Length in bytes of the full name of the book |
| 92 | 4 | Language | Book language code. Low byte is main language 09= English, next byte is dialect, 08 = British, 04 = US |
| 96 | 4 | Input Language | Input language for a dictionary |
| 100 | 4 | Output Language | Output language for a dictionary |
| 104 | 4 | Format version | Potentially the version of the Mobipocket format used in this file. Always >= 1 and <= the value of the "generator version" field. |
| 108 | 4 | First Image index? | First record number (starting with 0) that contains an image. Image records should be sequential. |
| 112 | 16 | ? | sixteen bytes, often zeros |
| 128 | 4 | EXTH flags | bitfield. if bit 6, 0x40 is set, then there's an EXTH record |
| 132 | 36 | ? | 36 unknown bytes, if the MOBI header is long enough |
| 168 | 4 | DRM Offset | Offset to DRM key info in DRMed files. 0xFFFFFFFF if no DRM |
| 172 | 4 | DRM Count | Number of entries in DRM info. |
| 176 | 4 | DRM Size | Number of bytes in DRM info. |
| 180 | 4 | DRM Flags | Some flags concerning the DRM info. |
| 184 | ? | ? | Bytes to the end of the MOBI header, including the following if the header length >= 232 (248 from start of record). |
| 242 | 2 | Extra Data Flags | A set of binary flags, some of which indicate extra data at the end of each text block. This only seems to be valid for Mobipocket format version 6 (and higher?) |
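To show how the offsets in this table might be used, here is a hedged Python sketch that reads a handful of the tentatively identified fields from record 0. It assumes big-endian 32-bit fields, treats any field that lies beyond the declared header length as absent, and falls back to CP1252 for unrecognised encodings (a guess, not documented behaviour). The function name and the returned dictionary keys are hypothetical.

```python
import struct

def parse_mobi_header(record0: bytes) -> dict:
    """Read a few of the tentatively identified MOBI header fields."""
    if record0[16:20] != b"MOBI":
        raise ValueError("record 0 has no MOBI header")
    (header_length,) = struct.unpack_from(">I", record0, 20)
    header_end = 16 + header_length  # the length is counted from the identifier

    def u32(offset):
        # Fields beyond the end of a short header are reported as None.
        if offset + 4 > header_end:
            return None
        return struct.unpack_from(">I", record0, offset)[0]

    encoding = u32(28)
    # Fallback codec for unknown values is an assumption.
    codec = {1252: "cp1252", 65001: "utf-8"}.get(encoding, "cp1252")
    name_offset, name_length = u32(84), u32(88)
    full_name = None
    if name_offset is not None and name_length is not None:
        # The offset is relative to the start of record 0, not the file.
        raw = record0[name_offset:name_offset + name_length]
        full_name = raw.decode(codec, errors="replace")
    language, exth_flags = u32(92), u32(128)
    return {
        "header_length": header_length,
        "mobi_type": u32(24),
        "encoding": encoding,
        "language": None if language is None else language & 0xFF,
        "dialect": None if language is None else (language >> 8) & 0xFF,
        "has_exth": bool(exth_flags is not None and exth_flags & 0x40),
        "exth_offset": header_end,  # where an EXTH header would begin
        "full_name": full_name,
    }
```

Because the header is undocumented and of variable length, a real reader would probably want more validation than this sketch performs.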
EXTH Header
If the MOBI header indicates that there's an EXTH header, it follows immediately after the MOBI header. Since the MOBI header is of variable length, the EXTH header isn't at any fixed offset in record 0. Note that some readers will ignore any EXTH header info if the Mobipocket version number specified in the MOBI header is 2 or less (perhaps 3 or less).
The EXTH header is also undocumented, so some of this is guesswork.
| bytes | content | comments |
|---|---|---|
| 4 | identifier | the characters E X T H |
| 4 | header length | the length of the EXTH header, including the previous 4 bytes |
| 4 | record count | The number of records in the EXTH header. The rest of the EXTH header consists of repeated EXTH records to the end of the EXTH length. |
| EXTH record start | Repeat until done. | |
| 4 | record type | EXTH record type. Just a number identifying what's stored in the record |
| 4 | record length | Length of EXTH record = L, including the 8 bytes in the type and length fields |
| L-8 | record data | Data. |
| EXTH record end | Repeat until done. | |
There are lots of different EXTH record types. The ones found so far in Mobipocket files are listed here, with possible meanings. Hopefully the table will be filled in as more information comes to light.
| record type | usual length | name | comments |
|---|---|---|---|
| 1 | | drm_server_id | |
| 2 | | drm_commerce_id | |
| 3 | | drm_ebookbase_book_id | |
| 100 | | author | |
| 101 | | publisher | |
| 102 | | imprint | |
| 103 | | description | |
| 104 | | isbn | |
| 105 | | subject | |
| 106 | | publishingdate | |
| 107 | | review | |
| 108 | | contributor | |
| 109 | | rights | |
| 110 | | subjectcode | |
| 111 | | type | |
| 112 | | source | |
| 113 | | asin | |
| 114 | | versionnumber | |
| 115 | | sample | |
| 116 | | startreading | |
| 118 | | retail price (as text) | |
| 119 | | retail price currency (as text) | |
| 201 | | coveroffset | |
| 202 | | thumboffset | |
| 203 | | hasfakecover | |
| 204 | | Unknown | |
| 205 | | Unknown | |
| 206 | | Unknown | |
| 207 | | Unknown | |
| 208 | | Unknown | |
| 300 | | Unknown | |
| 401 | | clippinglimit | |
| 402 | | publisherlimit | |
| 403 | | Unknown | |
| 404 | | ttsflag | |
| 501 | 4 | cdetype | PDOC - Personal Doc; EBOK - ebook; |
| 502 | | lastupdatetime | |
| 503 | | updatedtitle | |
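The EXTH layout described above lends itself to a simple loop. The sketch below is one possible reading of it, assuming big-endian fields and assuming the caller already knows where the EXTH header starts (immediately after the MOBI header). It returns raw bytes for each record because how each record type should be interpreted is largely guesswork. The function name and the small name table are hypothetical conveniences.

```python
import struct

# A few record types from the table above; purely for display purposes.
EXTH_NAMES = {100: "author", 101: "publisher", 103: "description",
              104: "isbn", 105: "subject", 106: "publishingdate",
              113: "asin", 503: "updatedtitle"}

def parse_exth(record0: bytes, exth_offset: int) -> list:
    """Return (record type, raw data) pairs from the EXTH header."""
    if record0[exth_offset:exth_offset + 4] != b"EXTH":
        raise ValueError("no EXTH header at this offset")
    header_length, record_count = struct.unpack_from(">II", record0, exth_offset + 4)
    end = exth_offset + header_length
    pos = exth_offset + 12  # first record follows identifier, length and count
    records = []
    for _ in range(record_count):
        if pos + 8 > end:
            break  # header is shorter than the record count implies
        rec_type, rec_length = struct.unpack_from(">II", record0, pos)
        if rec_length < 8 or pos + rec_length > end:
            break  # malformed record; stop rather than guess
        # The record length includes the 8 bytes of type and length.
        records.append((rec_type, record0[pos + 8:pos + rec_length]))
        pos += rec_length
    return records
```

A caller could then print, for example, `EXTH_NAMES.get(rec_type, "unknown")` next to each payload.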
Finally, at the end of record 0 of the PDB file, we usually find the full name of the book, the offset and length of which are given in the MOBI header.
Variable-width integers
Some parts of the Mobipocket format encode data as variable-width integers. These integers are stored big-endian, with 7 bits of the value per byte in bits 1-7. They may be either forward-encoded, in which case only the least significant (last) byte has bit 8 set, or backward-encoded, in which case only the most significant (first) byte has bit 8 set. For example, the number 0x11111 would be represented forward-encoded as:
0x04 0x22 0x91
And backward-encoded as:
0x84 0x22 0x11
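The two encodings can be sketched in Python as follows. The function names are hypothetical; the logic follows the description above, with bit 8 (0x80) marking the last byte in the forward form and the first byte in the backward form. A forward-encoded integer is read left to right until the marker is seen, while a backward-encoded integer is read right to left from a known end position.

```python
from typing import Optional, Tuple

def encode_varint(value: int, forward: bool = True) -> bytes:
    """Encode a non-negative integer as 7-bit groups, most significant first."""
    if value < 0:
        raise ValueError("value must be non-negative")
    groups = []
    while True:
        groups.append(value & 0x7F)
        value >>= 7
        if value == 0:
            break
    groups.reverse()  # big-endian order
    if forward:
        groups[-1] |= 0x80  # marker on the least significant (last) byte
    else:
        groups[0] |= 0x80   # marker on the most significant (first) byte
    return bytes(groups)

def decode_forward(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode a forward-encoded integer at pos; returns (value, bytes used)."""
    value = 0
    used = 0
    for byte in data[pos:]:
        value = (value << 7) | (byte & 0x7F)
        used += 1
        if byte & 0x80:
            return value, used
    raise ValueError("no terminating byte found")

def decode_backward(data: bytes, end: Optional[int] = None) -> Tuple[int, int]:
    """Decode a backward-encoded integer whose last byte is data[end - 1]."""
    if end is None:
        end = len(data)
    value = 0
    shift = 0
    used = 0
    for index in range(end - 1, -1, -1):
        byte = data[index]
        value |= (byte & 0x7F) << shift
        shift += 7
        used += 1
        if byte & 0x80:
            return value, used
    raise ValueError("no leading marker byte found")
```

For instance, `encode_varint(0x11111)` gives the bytes 0x04 0x22 0x91 and `encode_varint(0x11111, forward=False)` gives 0x84 0x22 0x11, matching the example above.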
Trailing entries
The Extra Data Flags field of the MOBI header indicates which, if any, trailing entries are appended to the end of each text record. Each set bit in the field indicates a trailing entry. The entries appear to occur in bit order; e.g., trailing entry 1 immediately follows the text content and entry 16 occurs at the very end of the record. The effect and exact details of most of these entries are unknown. The trailing entries indicated by bits 2-16 appear to follow a common format. That format is:
<data><size>
Where <size> is the size of the entire trailing entry (including the size of <size>) as a backward-encoded Mobipocket variable-width integer.
Only a few bits have been identified:
| bit | Data at end of records |
|---|---|
| 0x0001 | Multi-byte character overlaps |
| 0x0002 | Some data to help with indexing |
| 0x0004 | Some data about uncrossable breaks |
Multibyte character overlap
When bit 1 of the Extra Data Flags field is set, each record is followed by a trailing entry containing any extra bytes necessary to complete a multibyte character which crosses the record boundary. The bytes do not participate in compression, regardless of which compression scheme is used for the file. The overlapping bytes then re-appear as normal content at the beginning of the following record. The trailing entry ends with a byte containing a count of the overlapping bytes plus additional flags.
| offset | bytes | content | comments |
|---|---|---|---|
| 0 | 0-3 | N terminal bytes of a multibyte character | |
| N | 1 | Size & flags | bits 1-2 encode N, use of bits 3-8 is unknown |
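Putting the trailing-entry descriptions above together, a reader needs to work out how many bytes to strip from the end of each text record before decompressing it. The following Python sketch is one interpretation, not a confirmed algorithm: it peels the entries for bits 2-16 from the end of the record using their backward-encoded sizes, then handles the multibyte overlap entry (bit 1), whose final byte carries N in its low two bits. The function names are hypothetical.

```python
def trailing_data_size(record: bytes, extra_flags: int) -> int:
    """Total number of trailing-entry bytes at the end of one text record."""

    def entry_size(end: int) -> int:
        # Read a backward-encoded variable-width integer ending at record[end - 1].
        size, shift = 0, 0
        while end > 0:
            end -= 1
            byte = record[end]
            size |= (byte & 0x7F) << shift
            shift += 7
            if byte & 0x80:  # the first (most significant) byte carries the marker
                break
        return size

    total = 0
    # Entries for higher bits sit closer to the end of the record,
    # so work from bit 16 (0x8000) down to bit 2 (0x0002).
    for shift in range(15, 0, -1):
        if extra_flags & (1 << shift):
            total += entry_size(len(record) - total)
    # Bit 1 (0x0001): N overlap bytes followed by one size-and-flags byte.
    if extra_flags & 0x0001 and len(record) > total:
        total += (record[len(record) - total - 1] & 0x03) + 1
    return total

def strip_trailing_entries(record: bytes, extra_flags: int) -> bytes:
    """Return only the text content of a record."""
    return record[:len(record) - trailing_data_size(record, extra_flags)]
```

Since the size of each entry for bits 2-16 includes the size field itself, summing the sizes is enough; the data inside those entries is not interpreted here.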
PalmDOC Compression
PalmDOC uses LZ77 compression techniques. DOC files can contain only compressed text. The format does not allow for any text formatting. This keeps files small, in keeping with the Palm philosophy. However, extensions to the format can use tags, such as HTML or PML, to include formatting within text. These extensions to PalmDoc are not interchangeable and are the basis for most eBook Reader formats on Palm devices.
LZ77 algorithms achieve compression by replacing portions of the data with references to matching data that has already passed through both encoder and decoder. A match is encoded by a pair of numbers called a length-distance pair, which is equivalent to the statement "each of the next length characters is equal to the character exactly distance characters behind it in the uncompressed stream." (The "distance" is sometimes called the "offset" instead.)
In the PalmDoc format, a length-distance pair is always encoded by a two-byte sequence. Of the 16 bits that make up these two bytes, 11 bits go to encoding the distance, 3 go to encoding the length, and the remaining two are used to make sure the decoder can identify the first byte as the beginning of such a two-byte sequence. The exact algorithm needed to decode the compressed text can be found on the PalmDOC page.
PalmDOC data is always divided into 4096 byte blocks and the blocks are acted upon independently.
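For orientation, here is a compact Python sketch of a decoder for one compressed block. The two-byte length-distance layout (2 marker bits, 11 distance bits, 3 length bits) follows the description above. The remaining details are not stated on this page and come from the commonly described PalmDOC scheme, so treat them as assumptions: bytes 0x01-0x08 introduce a run of that many literal bytes, 0x00 and 0x09-0x7F are literals, 0xC0-0xFF expand to a space plus the byte with its top bit cleared, and the stored length is the real length minus 3.

```python
def palmdoc_decompress(data: bytes) -> bytes:
    """Decompress a single PalmDOC-compressed record (sketch)."""
    out = bytearray()
    i = 0
    n = len(data)
    while i < n:
        c = data[i]
        i += 1
        if 1 <= c <= 8:
            # Copy the next c bytes unchanged.
            out += data[i:i + c]
            i += c
        elif c < 0x80:
            # 0x00 and 0x09-0x7F stand for themselves.
            out.append(c)
        elif c >= 0xC0:
            # A space followed by the character with the top bit cleared.
            out.append(0x20)
            out.append(c ^ 0x80)
        else:
            # 0x80-0xBF: first byte of a two-byte length-distance pair.
            if i >= n:
                raise ValueError("truncated length-distance pair")
            pair = ((c << 8) | data[i]) & 0x3FFF  # drop the 2 marker bits
            i += 1
            distance = pair >> 3           # 11 bits
            length = (pair & 0x07) + 3     # 3 bits, stored as length - 3
            if distance == 0 or distance > len(out):
                raise ValueError("invalid distance in compressed data")
            # Copy byte by byte so that overlapping matches work.
            for _ in range(length):
                out.append(out[-distance])
    return bytes(out)
```

Because blocks are compressed independently, each text record can be passed to this function on its own, after any trailing entries have been removed.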
PalmDOC does have support for bookmarks. These pointers are named and refer to an offset location in a file. If the file is edited these locations may no longer refer to the correct locations. Some reading programs allow the user to enter or edit these bookmarks while others treat them as a TOC. Some reading programs may ignore them entirely. They are stored at the end of the file itself so the full file needs to be scanned when loaded to find them.
MBP
This is the extension used on a side (auxiliary) file for MOBI-formatted eBooks. It stores metadata used by the library software as well as user-entered data such as bookmarks, annotations, and the last read position. This file is created automatically by the reader program when the eBook is first opened and has a .mbp extension. The library management software in MobiPocket uses this file to get information displayed in the library window, such as title and author, so that it does not have to open the larger eBook file.
There is an ongoing effort to describe the binary MBP file format (see this site).
eBook Creation
There are several ways to create eBooks in the MOBI format. The rules for the format of the source files needed to create eBooks in MOBI are spelled out in documents on the MobiPocket web site. The recommended tool, MobiPocket Creator, is available as a download from the web site.
EBooks can also be converted from other forms using the Windows version of the MobiPocket Reader. Once converted the file can be used on any device supported by MobiPocket Reader.
Guidelines
In order to better support the features of the MobiPocket Reader there are some guidelines that need to be followed when creating a book in this format.
- Do not specify a default font family, font size or other font attributes such as weight or color. This is a choice the person reading the eBook should be able to make. Font sizes and attributes can be specified for special headings and other specific items. Use only generic font families.
- Do not impose justification for standard text. It may be needed for captions and other special text.
- Do not use tables for anything except table data. Nested tables are not supported.
- Do not use blank lines to try and force page changes. Use the <mbp:pagebreak/> tag.
- Do not use multiple books for different devices. Instead use advanced features such as multi resolution images and platform specific frames.
Adapting images to various PDA screen resolutions
The IMG tag in Mobipocket publications supports up to three source attributes for various resolutions: src, losrc and hisrc. This makes it possible to optimize the same ebook for various devices. The image to be displayed is dynamically selected by the Reader according to the resolution of the screen on the actual device:
| attribute | screen resolution | target devices |
|---|---|---|
| losrc | <= 239 pixels | Low-resolution 160x160 Palm devices (PalmVx, Treo 600, Zire); smartphones (Nokia 3650, Sony Ericsson P800/900, Microsoft smartphones) |
| src | >= 240 pixels (handhelds) | Pocket PC, high-resolution Palm devices (Sony Clie, Tungsten, Zire 71) |
| hisrc | >= 480 pixels | Any desktop or tablet PC |
Example:
<img hisrc="cover480x640.gif" src="cover220x300.gif" losrc="cover140x140.gif"/>
Note also that there is a 63KB internal limit for images (a restriction of the Mobipocket .PRC format). GIFs have to be smaller than 63KB. You can use GIF optimization programs such as Ulead Smart Saver to get GIFs below 63KB. (If images are bigger than 63KB, MobiGEN automatically resizes them to fit the limit, but you might not like the result.) JPEG images are instead saved at a lower quality setting to bring the file size down without reducing the pixel dimensions.
Format limitations
There are many limitations in the MOBI format. A few are listed here.
- Blocks of text can never have a greater than normal margin on their right side.
- Left margins can only be specified in 1em increments. Text can only have a hanging indent if it has no left margin.
- Text cannot flow around images taller than one line of text.
- Image sizes cannot be scaled with font size.
- In some, but not all, Mobipocket renderers, the left margin of a block of text is recalculated for each line, based on the font size in effect at the preceding line break.
- Many measures, such as the indent of a hanging indent, cannot be specified in ems.
- Individual items of text cannot be displayed in a monospace font.
- Tables display wildly differently on different Mobipocket renderers, especially tables which cross more than one screen.
- Nested tables are not supported at all.
- In addition you only get the full range of Mobipocket's formatting capabilities if you have markup written to use Mobipocket's non-standard, extended, and under-documented implementation of HTML 3.2. See: File tag reference on the mobipocket web site.
MOBI DRM
Mobi DRM can optionally be applied to this file format. The standard scheme, supported by Mobipocket and Overdrive servers, is based on an ID (the PID) derived from the reading device or program. This PID must be known to the server when an eBook is purchased; it is embedded in the file, which is then locked to the device. The licensing scheme does permit multiple devices (usually up to 4) to be supported. In this case the server needs to know the device IDs of all the devices. If you add a device you must tell the server and redownload the eBook to be able to read it on the new device. Normally there is no charge to add a device or for redownloading the eBook. If the dealer goes out of business you may not be able to add a device, since there would be no way to redownload the file.
A second, simpler scheme, only requires knowledge of the account login name and password used to purchase the eBook. Once this data is entered the eBook can be read. Entering this data is only required once per device. This is a new scheme and some readers may not have support for this method.
A third method used on some eBooks is a generic MOBI key. The file is encrypted, but only with the generic MOBI key (not a PID-specific key). This means it can be read by any MobiPocket Reader software, on any device, but not by any non-MobiPocket software.
The DRM applies only to the eBook itself and not to the metadata. A library routine can read the metadata without having to unlock the eBook. Some programs have been devised to even be able to change this information without touching the DRM portion of the file.
MOBI eBook Readers and converters
In addition to the MobiPocket-supplied Readers there are also third-party readers and converters. These include:
- Stanza
- Calibre
- FBReader
- Book Designer
- MBP_reader (program that can extract MBP notes to text files).
MOBI eBook Hardware Readers
- Bookeen Cybook Gen3
- Hanlin V3 / Bebook / EZ Reader
- iRex iLiad
- iRex Digital Reader
- Amazon Kindle Readers
Not all eBook readers that support Mobi format have the same features. Check Mobi Comparison for details on what is actually supported.

