MOBI
MOBI is the format developed for the MobiPocket Reader. Amazon currently uses it, with a slightly different DRM scheme, under the name AZW. Amazon uses this extension for files created by KindleGen even though such files actually contain both a MOBI version (sometimes called KF7) and a KF8 version of the book.
[edit] Overview
MOBI is the format used by the MobiPocket Reader and Amazon Kindle readers. Files may have either a .mobi or a .prc extension, and the user can change the extension to either accepted form. In either case the file may be DRM-protected or DRM-free. The .prc extension is used because PalmOS doesn't support any file extensions except .prc and .pdb. Note that Mobipocket prohibits the use of its DRM format on dedicated eBook readers that support other DRM formats. Mobi source files are based on the Open eBook (OEB) standard.
[edit] Description
The MOBI format was originally an extension of the PalmDOC format that added certain HTML-like tags to the data (see EBook HTML). Many MOBI documents still use this form. However, there is also a high-compression version of the format that compresses data further in a proprietary manner. Several third-party programs can read eBooks in the original MOBI format, but only a few can read eBooks in the newer compressed form. The higher compression mode uses a Huffman coding scheme that has been called the Huff/cdic algorithm. For a description in Python, see huffcdic.py, available as part of the Calibre project.
From time to time features have been added to the format, so newer files may not read correctly in an older reader. Currently the source files follow the guidelines of the Open eBook format.
Note that AZW for the Amazon Kindle is the same format as MOBI except that it uses a different DRM scheme. Amazon owns MobiPocket. The format description below applies to both file types.
[edit] Format
Like PalmDOC, the Mobipocket file format is that of a standard Palm Database Format file. The header of that format includes the name of the database (usually the book title and sometimes a portion of the author's name), which is up to 31 bytes of data. Files are identified by a Creator ID of MOBI and a Type of BOOK.
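As a rough illustration of the container described above, the following sketch reads the database name, the Type and Creator codes, and the record offsets. The byte offsets (60, 64, 76, 78) and the 8-byte record-list entries are taken from the generic Palm Database layout rather than from this page, and the function names are invented for this example, so treat it as an assumption-laden sketch.

```python
import struct

def read_pdb_header(data: bytes) -> dict:
    """Parse the name, type/creator codes and record offsets of a PDB file."""
    # 32-byte name field, NUL-terminated (so at most 31 bytes of name).
    name = data[0:32].split(b"\x00", 1)[0]
    # Type and Creator are 4-character codes (assumed offsets 60 and 64).
    db_type, creator = struct.unpack_from(">4s4s", data, 60)
    # 16-bit big-endian record count (assumed offset 76).
    (num_records,) = struct.unpack_from(">H", data, 76)
    # Record list starts at 78; each entry is 8 bytes and begins with
    # the 4-byte offset of the record from the start of the file.
    offsets = [
        struct.unpack_from(">I", data, 78 + 8 * i)[0]
        for i in range(num_records)
    ]
    return {"name": name, "type": db_type, "creator": creator,
            "record_offsets": offsets}

def is_mobi(header: dict) -> bool:
    """True if the header carries the BOOK/MOBI identification."""
    return header["type"] == b"BOOK" and header["creator"] == b"MOBI"

def get_record(data: bytes, offsets: list, index: int) -> bytes:
    """Return the bytes of record `index`; the last record runs to end of file."""
    start = offsets[index]
    end = offsets[index + 1] if index + 1 < len(offsets) else len(data)
    return data[start:end]
```

With these helpers, record 0 (which holds the headers described in the following sections) would be `get_record(data, header["record_offsets"], 0)`.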
Mobipocket have some minimal file format info, mainly about the HTML encoding they use in the text of the book, at http://www.mobipocket.com/dev/article.asp?BaseFolder=prcgen (replaced with archive copy). Also see EBook HTML for Mobi7 version of HTML.
[edit] PalmDOC Header
The first record in the Palm Database Format gives more information about the Mobipocket file. The first 16 bytes are almost identical to the first sixteen bytes of a PalmDOC format file.
offset | bytes | content | comments |
---|---|---|---|
0 | 2 | Compression | 1 = no compression, 2 = PalmDOC compression, 17480 = HUFF/CDIC compression |
2 | 2 | Unused | Always zero |
4 | 4 | text length | Uncompressed length of the entire text of the book |
8 | 2 | record count | Number of PDB records used for the text of the book. |
10 | 2 | record size | Maximum size of each record containing text, always 4096 |
12 | 4 | Current Position | Current reading position, as an offset into the uncompressed text |
There are two differences from a Palm DOC file. There's an additional compression type (17480), and the Current Position bytes are used for a different purpose:
offset | bytes | content | comments |
---|---|---|---|
12 | 2 | Encryption Type | 0 = no encryption, 1 = Old Mobipocket Encryption, 2 = Mobipocket Encryption |
14 | 2 | Unknown | Usually zero |
The old Mobipocket Encryption scheme only allows the file to be registered with one PID, unlike the current encryption scheme that allows multiple PIDs to be used in a single file. Unless specifically mentioned, all the encryption information on this page refers to the current scheme.
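The 16-byte layout in the two tables above can be unpacked in one step. This is a minimal sketch, assuming big-endian fields (as elsewhere in the Palm formats); the class and function names are made up for illustration.

```python
import struct
from typing import NamedTuple

class PalmDocHeader(NamedTuple):
    compression: int      # 1, 2 or 17480
    text_length: int      # uncompressed length of the whole text
    record_count: int     # number of text records
    record_size: int      # maximum text record size (4096)
    encryption_type: int  # 0, 1 or 2

COMPRESSION_NAMES = {1: "none", 2: "PalmDOC", 17480: "HUFF/CDIC"}

def parse_palmdoc_header(record0: bytes) -> PalmDocHeader:
    """Unpack the first 16 bytes of record 0 as laid out for Mobipocket."""
    (compression, _unused, text_length, record_count,
     record_size, encryption_type, _unknown) = struct.unpack_from(
        ">HHIHHHH", record0, 0)
    return PalmDocHeader(compression, text_length, record_count,
                         record_size, encryption_type)
```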
[edit] MOBI Header
Most Mobipocket files also have a MOBI header in record 0 that follows these 16 bytes, and newer formats also have an EXTH header following the MOBI header, again in record 0 of the PDB file.
The MOBI header is of variable length and is not documented. Some fields have been tentatively identified as follows:
offset | hex | bytes | content | comments |
---|---|---|---|---|
16 | 0x10 | 4 | identifier | the characters M O B I |
20 | 0x14 | 4 | header length | the length of the MOBI header, including the previous 4 bytes |
24 | 0x18 | 4 | Mobi type | The kind of Mobipocket file this is: 2 = Mobipocket Book; 3 = PalmDoc Book; 4 = Audio; 232 = mobipocket? generated by kindlegen1.2; 248 = KF8: generated by kindlegen2; 257 = News; 258 = News_Feed; 259 = News_Magazine; 513 = PICS; 514 = WORD; 515 = XLS; 516 = PPT; 517 = TEXT; 518 = HTML |
28 | 0x1c | 4 | text Encoding | 1252 = CP1252 (WinLatin1); 65001 = UTF-8 |
32 | 0x20 | 4 | Unique-ID | Some kind of unique ID number (random?) |
36 | 0x24 | 4 | File version | Version of the Mobipocket format used in this file. |
40 | 0x28 | 4 | Orthographic index | Section number of orthographic meta index. 0xFFFFFFFF if index is not available. |
44 | 0x2c | 4 | Inflection index | Section number of inflection meta index. 0xFFFFFFFF if index is not available. |
48 | 0x30 | 4 | Index names | 0xFFFFFFFF if index is not available. |
52 | 0x34 | 4 | Index keys | 0xFFFFFFFF if index is not available. |
56 | 0x38 | 4 | Extra index 0 | Section number of extra 0 meta index. 0xFFFFFFFF if index is not available. |
60 | 0x3c | 4 | Extra index 1 | Section number of extra 1 meta index. 0xFFFFFFFF if index is not available. |
64 | 0x40 | 4 | Extra index 2 | Section number of extra 2 meta index. 0xFFFFFFFF if index is not available. |
68 | 0x44 | 4 | Extra index 3 | Section number of extra 3 meta index. 0xFFFFFFFF if index is not available. |
72 | 0x48 | 4 | Extra index 4 | Section number of extra 4 meta index. 0xFFFFFFFF if index is not available. |
76 | 0x4c | 4 | Extra index 5 | Section number of extra 5 meta index. 0xFFFFFFFF if index is not available. |
80 | 0x50 | 4 | First Non-book index? | First record number (starting with 0) that's not the book's text |
84 | 0x54 | 4 | Full Name Offset | Offset in record 0 (not from start of file) of the full name of the book |
88 | 0x58 | 4 | Full Name Length | Length in bytes of the full name of the book |
92 | 0x5c | 4 | Locale | Book locale code. Low byte is main language 09= English, next byte is dialect, 08 = British, 04 = US. Thus US English is 1033, UK English is 2057. |
96 | 0x60 | 4 | Input Language | Input language for a dictionary |
100 | 0x64 | 4 | Output Language | Output language for a dictionary |
104 | 0x68 | 4 | Min version | Minimum mobipocket version support needed to read this file. |
108 | 0x6c | 4 | First Image index | First record number (starting with 0) that contains an image. Image records should be sequential. |
112 | 0x70 | 4 | Huffman Record Offset | The record number of the first huffman compression record. |
116 | 0x74 | 4 | Huffman Record Count | The number of huffman compression records. |
120 | 0x78 | 4 | Huffman Table Offset | |
124 | 0x7c | 4 | Huffman Table Length | |
128 | 0x80 | 4 | EXTH flags | bitfield. if bit 6 (0x40) is set, then there's an EXTH record |
132 | 0x84 | 32 | ? | 32 unknown bytes, if MOBI is long enough |
164 | 0xa4 | 4 | Unknown | Use 0xFFFFFFFF |
168 | 0xa8 | 4 | DRM Offset | Offset to DRM key info in DRMed files. 0xFFFFFFFF if no DRM |
172 | 0xac | 4 | DRM Count | Number of entries in DRM info. 0xFFFFFFFF if no DRM |
176 | 0xb0 | 4 | DRM Size | Number of bytes in DRM info. |
180 | 0xb4 | 4 | DRM Flags | Some flags concerning the DRM info. |
184 | 0xb8 | 8 | Unknown | Bytes to the end of the MOBI header, including the following if the header length >= 228 (244 from start of record). Use 0x0000000000000000. |
192 | 0xc0 | 2 | First content record number | Number of first text record. Normally 1. |
194 | 0xc2 | 2 | Last content record number | Number of last image record or number of last text record if it contains no images. Includes Image, DATP, HUFF, DRM. |
196 | 0xc4 | 4 | Unknown | Use 0x00000001. |
200 | 0xc8 | 4 | FCIS record number | |
204 | 0xcc | 4 | Unknown (FCIS record count?) | Use 0x00000001. |
208 | 0xd0 | 4 | FLIS record number | |
212 | 0xd4 | 4 | Unknown (FLIS record count?) | Use 0x00000001. |
216 | 0xd8 | 8 | Unknown | Use 0x0000000000000000. |
224 | 0xe0 | 4 | Unknown | Use 0xFFFFFFFF. |
228 | 0xe4 | 4 | First Compilation data section count | Use 0x00000000. |
232 | 0xe8 | 4 | Number of Compilation data sections | Use 0xFFFFFFFF. |
236 | 0xec | 4 | Unknown | Use 0xFFFFFFFF. |
240 | 0xf0 | 4 | Extra Record Data Flags | A set of binary flags, some of which indicate extra data at the end of each text block. This only seems to be valid for Mobipocket format version 5 and 6 (and higher?), when the header length is 228 (0xE4) or 232 (0xE8). Setting bit 2 (0x2) disables <guide><reference type="start"> functionality. |
244 | 0xf4 | 4 | INDX Record Offset | (If not 0xFFFFFFFF)The record number of the first INDX record created from an ncx file. |
248 | 0xf8 | 4 | Unknown | 0xFFFFFFFF. In newer MOBI files the MOBI header length is 256; skip over this field to reach the EXTH header. |
252 | 0xfc | 4 | Unknown | 0xFFFFFFFF. In newer MOBI files the MOBI header length is 256; skip over this field to reach the EXTH header. |
256 | 0x100 | 4 | Unknown | 0xFFFFFFFF. In newer MOBI files the MOBI header length is 256; skip over this field to reach the EXTH header. |
260 | 0x104 | 4 | Unknown | 0xFFFFFFFF. In newer MOBI files the MOBI header length is 256; skip over this field to reach the EXTH header. |
264 | 0x108 | 4 | Unknown | 0xFFFFFFFF. In newer MOBI files the MOBI header length is 256; skip over this field to reach the EXTH header. |
268 | 0x10c | 4 | Unknown | 0. In newer MOBI files the MOBI header length is 256. Because the MOBI header starts at offset 16, after the PalmDOC header, it ends at offset 272, so this field at offset 268 is its last one before the EXTH header. |
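A handful of the fields tabulated above are enough to locate the book title and the EXTH header. The sketch below reads only those fields, using the offsets from the table; since the header is undocumented, the field meanings are as tentative as the table itself, and the function name and the returned dictionary keys are illustrative choices.

```python
import struct

def parse_mobi_header(record0: bytes) -> dict:
    """Read selected MOBI header fields from record 0 (offsets per the table)."""
    if record0[16:20] != b"MOBI":
        raise ValueError("no MOBI header in record 0")
    header_length, mobi_type, encoding, unique_id, version = (
        struct.unpack_from(">5I", record0, 20))
    if 16 + header_length < 132:
        raise ValueError("MOBI header too short for this sketch")
    name_offset, name_length, locale = struct.unpack_from(">3I", record0, 84)
    (exth_flags,) = struct.unpack_from(">I", record0, 128)
    return {
        "header_length": header_length,
        "mobi_type": mobi_type,
        "encoding": encoding,          # 1252 or 65001
        "file_version": version,
        # Full name offset is relative to the start of record 0.
        "full_name": record0[name_offset:name_offset + name_length],
        "language": locale & 0xFF,         # low byte, e.g. 9 = English
        "dialect": (locale >> 8) & 0xFF,   # next byte, e.g. 4 = US
        "has_exth": bool(exth_flags & 0x40),
        # The EXTH header, if present, follows the MOBI header directly.
        "exth_offset": 16 + header_length,
    }
```

For locale 1033 (0x0409) this yields language 9 and dialect 4, matching the US English example in the table.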
[edit] EXTH Header
If the MOBI header indicates that there's an EXTH header, it follows immediately after the MOBI header. Since the MOBI header is of variable length, this isn't at any fixed offset in record 0. Note that some readers will ignore any EXTH header info if the mobipocket version number specified in the MOBI header is 2 or less (perhaps 3 or less).
The EXTH header is also undocumented, so some of this is guesswork.
bytes | content | comments |
---|---|---|
4 | identifier | the characters E X T H |
4 | header length | the length of the EXTH header, including the previous 4 bytes - but not including the final padding. |
4 | record Count | The number of records in the EXTH header. the rest of the EXTH header consists of repeated EXTH records to the end of the EXTH length. |
EXTH record start | Repeat until done. | |
4 | record type | Exth Record type. Just a number identifying what's stored in the record |
4 | record length | length of EXTH record = L , including the 8 bytes in the type and length fields |
L-8 | record data | Data. |
EXTH record end | Repeat until done. | |
p | padding | Null bytes to pad the EXTH header to a multiple of four bytes (none if the header is already a multiple of four). This padding is not included in the EXTH header length. |
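The record loop described in the table above can be sketched as follows. The function name is hypothetical, and the caller is assumed to already know where the EXTH header starts (immediately after the MOBI header).

```python
import struct

def parse_exth(record0: bytes, offset: int) -> list:
    """Return a list of (record_type, data) tuples from an EXTH header.

    `offset` is the position of the 'EXTH' identifier within record 0.
    """
    if record0[offset:offset + 4] != b"EXTH":
        raise ValueError("EXTH identifier not found")
    header_length, record_count = struct.unpack_from(">II", record0, offset + 4)
    pos = offset + 12
    records = []
    for _ in range(record_count):
        rec_type, rec_length = struct.unpack_from(">II", record0, pos)
        if rec_length < 8:
            raise ValueError("EXTH record length smaller than its own header")
        # Record length includes the 8 bytes of the type and length fields.
        records.append((rec_type, record0[pos + 8:pos + rec_length]))
        pos += rec_length
    return records
```

Record data is returned as raw bytes because its interpretation (text, 4-byte integer, and so on) depends on the record type.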
There are lots of different EXTH Records types. Ones found so far in Mobipocket files are listed here, with possible meanings. Hopefully the table will be filled in as more information comes to light.
record type | usual length | name | comments | opf meta tag |
---|---|---|---|---|
1 | | drm_server_id | | |
2 | | drm_commerce_id | | |
3 | | drm_ebookbase_book_id | | |
100 | | author | | <dc:Creator> |
101 | | publisher | | <dc:Publisher> |
102 | | imprint | | <Imprint> |
103 | | description | | <dc:Description> |
104 | | isbn | | <dc:Identifier scheme='ISBN'> |
105 | | subject | Could appear multiple times | <dc:Subject> |
106 | | publishingdate | | <dc:Date> |
107 | | review | | <Review> |
108 | | contributor | | <dc:Contributor> |
109 | | rights | | <dc:Rights> |
110 | | subjectcode | | <dc:Subject BASICCode="subjectcode"> |
111 | | type | | <dc:Type> |
112 | | source | | <dc:Source> |
113 | | asin | Kindle Paperwhite labels books with "Personal" if they don't have this record. | |
114 | | versionnumber | | |
115 | 4 | sample | 0x0001 if the book content is only a sample of the full book | |
116 | | startreading | Position (4-byte offset) in file at which to open when first opened | |
117 | 3 | adult | Mobipocket Creator adds this if Adult only is checked on its GUI; contents: "yes" | <Adult> |
118 | | retail price | As text, e.g. "4.99" | <SRP> |
119 | | retail price currency | As text, e.g. "USD" | <SRP Currency="currency"> |
121 | 4 | KF8 BOUNDARY Offset | | |
122 | | fixed-layout | "true" | |
123 | | book-type | "comic" | |
124 | | orientation-lock | "none", "portrait", "landscape" | |
125 | 4 | count of resources | | |
126 | | original-resolution | "1072x1448" | |
127 | | zero-gutter | "true" | |
128 | | zero-margin | "true" | |
129 | | Metadata Resource URI | | |
131 | 4 | Unknown | | |
132 | | Unknown | "true" | |
200 | 3 | Dictionary short name | As text | <DictionaryVeryShortName> |
201 | 4 | coveroffset | Add to first image field in Mobi Header to find PDB record containing the cover image | <EmbeddedCover> |
202 | 4 | thumboffset | Add to first image field in Mobi Header to find PDB record containing the thumbnail cover image | |
203 | | hasfakecover | | |
204 | 4 | Creator Software | Known values: 1=mobigen, 2=Mobipocket Creator, 200=kindlegen (Windows), 201=kindlegen (Linux), 202=kindlegen (Mac). Warning: Calibre creates fake creator entries, pretending to be a Linux kindlegen 1.2 (201, 1, 2, 33307) for normal ebooks and a non-public Linux kindlegen 2.0 (201, 2, 0, 101) for periodicals. | |
205 | 4 | Creator Major Version | | |
206 | 4 | Creator Minor Version | | |
207 | 4 | Creator Build Number | | |
208 | | watermark | | |
209 | | tamper proof keys | Used by the Kindle (and Android app) for generating book-specific PIDs. | |
300 | | fontsignature | | |
401 | 1 | clippinglimit | Integer percentage of the text allowed to be clipped. Usually 10. | |
402 | | publisherlimit | | |
403 | | Unknown | | |
404 | 1 | ttsflag | 1 - Text to Speech disabled; 0 - Text to Speech enabled | |
405 | 1 | Unknown (Rent/Borrow flag?) | 1 in this field seems to indicate a rental book | |
406 | 8 | Rent/Borrow Expiration Date | If this field is removed from a rental, the book says it expired in 1969 | |
407 | 8 | Unknown | | |
450 | 4 | Unknown | | |
451 | 4 | Unknown | | |
452 | 4 | Unknown | | |
453 | 4 | Unknown | | |
501 | 4 | cdetype | PDOC - Personal Doc; EBOK - ebook; EBSP - ebook sample | |
502 | | lastupdatetime | | |
503 | | updatedtitle | | |
504 | | asin | I found a copy of ASIN in this record. | |
524 | | language | | <dc:language> |
525 | | writingmode | I found horizontal-lr in this record. | |
535 | | Creator Build Number | I found 1019-d6e4792 in this record, which is a build number of Kindlegen 2.7 | |
536 | | Unknown | | |
542 | 4 | Unknown | Some Unix timestamp. | |
547 | | InMemory | String 'I\x00n\x00M\x00e\x00m\x00o\x00r\x00y\x00' found in this record, for KindleGen V2.9 build 1029-0897292 | |
[edit] Remainder of Record 0
At the end of Record 0 of the PDB file format, we usually get the full file name, the offset of which is given in the MOBI header.
There might be data of unknown use between the end of the EXTH records and the name.
The name is followed by two null bytes, and then padded with null bytes to a four-byte boundary. For example, if the name is 16 bytes long, with two null bytes, that makes 18 bytes, and it then gets another two null bytes added to make it up to 20 bytes in total. However, the length stored in the header is only 16. If the name was 19 bytes, it would be followed by two null bytes to make it up to 21 bytes, and then padded with three more null bytes to make it up to 24 bytes.
The name and padding is followed by more data of unknown use, usually null bytes, to the end of section 0.
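The padding rule for the full name can be written as a short calculation. This is only a sketch of the arithmetic in the worked example above; the function names are invented here.

```python
def padded_name_size(name_length: int) -> int:
    """Bytes occupied by the name, its two NUL bytes and the alignment padding."""
    with_terminator = name_length + 2
    # Round up to the next multiple of four.
    return (with_terminator + 3) // 4 * 4

def encode_full_name(name: bytes) -> bytes:
    """Name followed by NUL bytes up to the padded size."""
    return name.ljust(padded_name_size(len(name)), b"\x00")
```

A 16-byte name occupies 20 bytes and a 19-byte name occupies 24 bytes, as in the examples above; the length stored in the MOBI header remains the unpadded name length.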
[edit] Index meta record
The first record of an index contains the meta data of the index.
offset | hex | bytes | content | comments |
---|---|---|---|---|
0 | 0x00 | 4 | Identifier | the characters I N D X |
4 | 0x04 | 4 | header length | the length of the INDX header, including the previous 4 bytes |
8 | 0x08 | 4 | index type | the type of the index. Known values: 0 - normal index, 2 - inflections |
12 | 0x0c | 4 | ? | ? |
16 | 0x10 | 4 | ? | ? |
20 | 0x14 | 4 | idxt start | the offset to the IDXT section |
24 | 0x18 | 4 | index count | the number of index records |
28 | 0x1c | 4 | index encoding | 1252 = CP1252 (WinLatin1); 65001 = UTF-8 |
32 | 0x20 | 4 | index language | the language code of the index |
36 | 0x24 | 4 | total index count | the number of index entries |
40 | 0x28 | 4 | ordt start | the offset to the ORDT section |
44 | 0x2c | 4 | ligt start | the offset to the LIGT section |
48 | 0x30 | 4 | ? | ? |
52 | 0x34 | 4 | ? | ? |
The remaining INDX header values are unknown.
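The known part of the INDX header consists of consecutive 4-byte fields, so it can be read into a dictionary in one call. The field names below are informal labels chosen for this sketch (the two unidentified fields are simply named after their hex offsets), and big-endian byte order is assumed.

```python
import struct

INDX_FIELDS = (
    "header_length", "index_type", "unknown_0c", "unknown_10",
    "idxt_start", "index_count", "index_encoding", "index_language",
    "total_index_count", "ordt_start", "ligt_start",
)

def parse_indx_header(record: bytes) -> dict:
    """Read the eleven 4-byte fields at offsets 4..44 of an INDX record."""
    if record[0:4] != b"INDX":
        raise ValueError("not an INDX record")
    values = struct.unpack_from(">11I", record, 4)
    return dict(zip(INDX_FIELDS, values))
```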
[edit] TAGX section
The TAGX section follows the INDX header and is essential for decoding the index values: it defines how many control bytes an entry contains, which bits correspond to which tag, and how many values a tag requires (most tags need one value, but some have two, maybe more).
offset | hex | bytes | content | comments |
---|---|---|---|---|
0 | 0x00 | 4 | Identifier | the characters T A G X |
4 | 0x04 | 4 | header length | the length of the TAGX header, including the previous 4 bytes |
8 | 0x08 | 4 | control byte count | the number of control bytes |
12 | 0x0c | n | tag table | the tag table entries (n = header length - 12, must be multiple of 4 bytes) |
Each tag table entry is 4 bytes long. The first byte is the tag, the second byte the number of values, the third byte the bit mask, and the fourth byte indicates the end of the control byte. If the fourth byte is 0x01, all other bytes of the entry are zero.
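Reading the tag table amounts to slicing it into 4-byte entries. The sketch below follows the layout given above; the function name and the tuple ordering are choices made for this example.

```python
import struct

def parse_tagx(data: bytes, offset: int):
    """Return (control_byte_count, entries) for the TAGX section at `offset`.

    Each entry is a tuple (tag, number_of_values, bit_mask, end_flag).
    """
    if data[offset:offset + 4] != b"TAGX":
        raise ValueError("TAGX identifier not found")
    header_length, control_byte_count = struct.unpack_from(">II", data, offset + 4)
    table = data[offset + 12:offset + header_length]
    if len(table) % 4:
        raise ValueError("tag table is not a multiple of 4 bytes")
    entries = [tuple(table[i:i + 4]) for i in range(0, len(table), 4)]
    return control_byte_count, entries
```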
[edit] Variable-width integers
Some parts of the Mobipocket format encode data as variable-width integers. These integers are represented big-endian with 7 bits per byte in bits 1-7. They may be either forward-encoded, in which case only the LSB has bit 8 set, or backward-encoded, in which case only the MSB has bit 8 set. For example, the number 0x11111 would be represented forward-encoded as:
0x04 0x22 0x91
And backward-encoded as:
0x84 0x22 0x11
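The two encodings can be sketched in Python as follows. The function names are illustrative; "bit 8" in the text above corresponds to the mask 0x80 here.

```python
def encode_varint(value: int, forward: bool = True) -> bytes:
    """Encode a non-negative integer as a Mobipocket variable-width integer."""
    groups = [value & 0x7F]
    value >>= 7
    while value:
        groups.append(value & 0x7F)
        value >>= 7
    groups.reverse()             # most significant 7-bit group first
    if forward:
        groups[-1] |= 0x80       # flag on the last (least significant) byte
    else:
        groups[0] |= 0x80        # flag on the first (most significant) byte
    return bytes(groups)

def decode_forward(data: bytes, pos: int = 0):
    """Read forward from `pos`; return (value, position after the integer)."""
    value = 0
    while True:
        byte = data[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if byte & 0x80:
            return value, pos

def decode_backward(data: bytes):
    """Read backward from the end of `data`; return (value, bytes consumed)."""
    value = 0
    shift = 0
    pos = len(data)
    while pos > 0:
        pos -= 1
        byte = data[pos]
        value |= (byte & 0x7F) << shift
        shift += 7
        if byte & 0x80:
            return value, len(data) - pos
    raise ValueError("no byte with the high bit set was found")
```

Backward encoding is convenient for values stored at the end of a record, because a reader can start at the last byte and walk toward the start until it meets the flagged byte.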
[edit] Trailing entries
The Extra Data Flags field of the MOBI header indicates which, if any, trailing entries are appended to the end of each text record. Each set bit in the field indicates a trailing entry. The entries appear to occur in bit order; e.g., trailing entry 1 immediately follows the text content and entry 16 occurs at the very end of the record. The effect and exact details of most of these entries are unknown. The trailing entries indicated by bits 2-16 appear to follow a common format. That format is:
<data><size>
Where <size> is the size of the entire trailing entry (including the size of <size>) as a backward-encoded Mobipocket variable-width integer.
Only a few bits have been identified:
bit | Data at end of records |
---|---|
0x0001 | Multi-byte character overlaps |
0x0002 | Some data to help with indexing |
0x0004 | Some data about uncrossable breaks |
[edit] Multibyte character overlap
When bit 1 of the Extra Data Flags field is set, each record is followed by a trailing entry containing any extra bytes necessary to complete a multibyte character that crosses the record boundary. These bytes do not participate in compression, regardless of which compression scheme is used for the file. However, unlike the other trailing data bytes, the multibyte overlap bytes (including the count byte) are included in any encryption. The overlapping bytes then reappear as normal content at the beginning of the following record. The trailing entry ends with a byte containing a count of the overlapping bytes plus additional flags.
offset | bytes | content | comments |
---|---|---|---|
0 | 0-3 | N terminal bytes of a multibyte character | |
N | 1 | Size & flags | bits 1-2 encode N, use of bits 3-8 is unknown |
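Putting the trailing-entry rules together, a reader has to peel the entries off the end of each text record before decompressing it. The following is a sketch under the assumptions stated in this section (entries appear in bit order, entries for bits 2-16 end with a backward-encoded size, and the multibyte entry ends with a byte whose low two bits give N); it also assumes the record has already been decrypted. The function names are hypothetical.

```python
def _size_at_end(data: bytes) -> int:
    """Decode the backward-encoded variable-width integer ending `data`."""
    value = 0
    shift = 0
    for byte in reversed(data):
        value |= (byte & 0x7F) << shift
        shift += 7
        if byte & 0x80:
            return value
    raise ValueError("unterminated variable-width integer")

def strip_trailing_entries(record: bytes, extra_flags: int) -> bytes:
    """Return the text portion of a record, without its trailing entries."""
    end = len(record)
    # Entries follow the text in bit order, so remove the highest bit first.
    for bit in range(15, 0, -1):            # masks 0x8000 down to 0x0002
        if extra_flags & (1 << bit):
            size = _size_at_end(record[:end])
            if size == 0 or size > end:
                raise ValueError("implausible trailing entry size")
            end -= size                      # size includes the size bytes
    if extra_flags & 0x0001:
        # Multibyte overlap: N bytes plus one size/flags byte.
        n = record[end - 1] & 0x03
        end -= n + 1
    return record[:end]
```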
[edit] PalmDOC Compression
PalmDOC uses LZ77 compression techniques; an implementation for PalmDOC can be found on GitHub. DOC files can contain only compressed text. The format does not allow for any text formatting. This keeps files small, in keeping with the Palm philosophy. However, extensions to the format can use tags, such as HTML or PML, to include formatting within text. These extensions to PalmDoc are not interchangeable and are the basis for most eBook Reader formats on Palm devices.
LZ77 algorithms achieve compression by replacing portions of the data with references to matching data that has already passed through both encoder and decoder. A match is encoded by a pair of numbers called a length-distance pair, which is equivalent to the statement "each of the next length characters is equal to the character exactly distance characters behind it in the uncompressed stream." (The "distance" is sometimes called the "offset" instead.)
In the PalmDoc format, a length-distance pair is always encoded by a two-byte sequence. Of the 16 bits that make up these two bytes, 11 bits go to encoding the distance, 3 go to encoding the length, and the remaining two are used to make sure the decoder can identify the first byte as the beginning of such a two-byte sequence. The exact algorithm needed to decode the compressed text can be found on the PalmDOC page.
PalmDOC data is always divided into 4096 byte blocks (uncompressed size) and the blocks are acted upon independently; no information from previous or later blocks is needed when a block is being compressed or decompressed.
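A decoder for one PalmDOC-compressed block might look like the sketch below. Only the 11-bit distance and 3-bit length split comes from the description above; the byte ranges for literals, the "space plus character" codes, and the +3 length bias are taken from the commonly described PalmDOC scheme and should be checked against the PalmDOC page before being relied on.

```python
def palmdoc_decompress(data: bytes) -> bytes:
    """Decompress a single PalmDOC-compressed record (sketch)."""
    out = bytearray()
    i = 0
    while i < len(data):
        byte = data[i]
        i += 1
        if 0x01 <= byte <= 0x08:
            # Copy the next `byte` bytes unchanged.
            out += data[i:i + byte]
            i += byte
        elif byte < 0x80:
            # 0x00 and 0x09-0x7F are literal bytes.
            out.append(byte)
        elif byte >= 0xC0:
            # A space followed by the byte with its top bit cleared.
            out.append(0x20)
            out.append(byte ^ 0x80)
        else:
            # 0x80-0xBF starts a two-byte length-distance pair.
            if i >= len(data):
                raise ValueError("truncated length-distance pair")
            pair = (byte << 8) | data[i]
            i += 1
            distance = (pair >> 3) & 0x07FF   # 11 bits
            length = (pair & 0x07) + 3        # 3 bits, biased by 3
            if distance == 0 or distance > len(out):
                raise ValueError("distance points before start of output")
            # Copy byte by byte so overlapping references repeat correctly.
            for _ in range(length):
                out.append(out[-distance])
    return bytes(out)
```

Because blocks are independent, each text record can be passed to this function on its own and the results concatenated.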
PalmDOC does have support for bookmarks. These pointers are named and refer to an offset location in a file. If the file is edited these locations may no longer refer to the correct locations. Some reading programs allow the user to enter or edit these bookmarks while others treat them as a TOC. Some reading programs may ignore them entirely. They are stored at the end of the file itself so the full file needs to be scanned when loaded to find them.
[edit] Image Records
If the file contains images, they follow the text blocks, with each image using a single block. The 4096-byte record size in the PalmDoc header applies only to text records; image records may be larger.
[edit] Magic Records
In some cases, MobiPocket Creator adds a 2-zero-byte record after the text records in a file. This record is not included in the "record count" of text records in the PalmDoc header, and is also not used as the "first non-book index" in the MOBI header. (If the 2-zero-byte record is present, the index of the following block is used as the "first non-book index".)
MobiPocket Creator also ends files with three records: 'FLIS', 'FCIS', and 'end-of-file', in that order. The 'FLIS' and 'FCIS' records do not seem to be necessary for MobiPocket Reader or the Amazon Kindle 2 to read the file. The 'end-of-file' record might be necessary.
[edit] FLIS Record
The FLIS record appears to have a fixed value. The meaning of the values is not known.
offset | bytes | content | comments |
---|---|---|---|
0 | 4 | identifier | the characters F L I S (0x46 0x4c 0x49 0x53) |
4 | 4 | ? | fixed value: 8 |
8 | 2 | ? | fixed value: 65 |
10 | 2 | ? | fixed value: 0 |
12 | 4 | ? | fixed value: 0 |
16 | 4 | ? | fixed value: -1 (0xFFFFFFFF) |
20 | 2 | ? | fixed value: 1 |
22 | 2 | ? | fixed value: 3 |
24 | 4 | ? | fixed value: 3 |
28 | 4 | ? | fixed value: 1 |
32 | 4 | ? | fixed value: -1 (0xFFFFFFFF) |
[edit] FCIS Record
The FCIS record appears to have mostly fixed values.
offset | bytes | content | comments |
---|---|---|---|
0 | 4 | identifier | the characters F C I S (0x46 0x43 0x49 0x53) |
4 | 4 | ? | fixed value: 20 |
8 | 4 | ? | fixed value: 16 |
12 | 4 | ? | fixed value: 1 |
16 | 4 | ? | fixed value: 0 |
20 | 4 | ? | text length (the same value as "text length" in the PalmDoc header) |
24 | 4 | ? | fixed value: 0 |
28 | 4 | ? | fixed value: 32 |
32 | 4 | ? | fixed value: 8 |
36 | 2 | ? | fixed value: 1 |
38 | 2 | ? | fixed value: 1 |
40 | 4 | ? | fixed value: 0 |
[edit] End-of-file Record
The end-of-file record is a fixed 4-byte record. While the last two bytes appear to be a CRLF marker, the meaning of the first two bytes is unknown.
offset | bytes | content | comments |
---|---|---|---|
0 | 1 | ? | fixed value: 233 (0xe9) |
1 | 1 | ? | fixed value: 142 (0x8e) |
2 | 1 | ? | fixed value: 13 (0x0d) |
3 | 1 | ? | fixed value: 10 (0x0a) |
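Since the FLIS, FCIS and end-of-file records consist almost entirely of fixed values, a writer can generate them from the three tables above. This sketch assumes big-endian byte order for the multi-byte fields, which the tables do not state explicitly; the function and constant names are illustrative.

```python
import struct

def build_flis() -> bytes:
    """36-byte FLIS record with the fixed values from the FLIS table."""
    return struct.pack(">4sIHHIIHHIII",
                       b"FLIS", 8, 65, 0, 0, 0xFFFFFFFF, 1, 3, 3, 1, 0xFFFFFFFF)

def build_fcis(text_length: int) -> bytes:
    """44-byte FCIS record; only the text length varies."""
    return struct.pack(">4sIIIIIIIIHHI",
                       b"FCIS", 20, 16, 1, 0, text_length, 0, 32, 8, 1, 1, 0)

# Fixed 4-byte end-of-file record.
EOF_RECORD = bytes([0xE9, 0x8E, 0x0D, 0x0A])
```

The `text_length` argument is the same value as "text length" in the PalmDoc header.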
[edit] Compilation Records
KindleGen stores records containing the compilation source (KindleGen 1.2-2.5), or the compilation source and the compiler output (KindleGen 2.7 and later), just before the End-of-file record (KindleGen 1.2-2.2) or just before the BOUNDARY record (KindleGen 2.3 and later).
MOBI files created with Mobipocket Creator, Amazon's Personal Document Service, or Kindle Direct Publishing (formerly Amazon DTP) don't include a SRCS record. In the past, kindlegen had an undocumented option to suppress this record, but the option was removed in 2010.
A SRCS record is a record whose content is a zip archive of all source files (i.e., .opf, .ncx, .htm, .jpg, ...) given to the kindlegen command, which places the archive in the generated MOBI file. The record begins with the "SRCS" signature and looks as follows:
offset | bytes | content | comments |
---|---|---|---|
0 | 4 | identifier | "SRCS" (0x53 0x52 0x43 0x53) |
4 | 4 | ? | fixed value(?): 0x00000010 |
8 | 4 | ? | fixed value(?): 0x0000002f |
12 | 4 | ? | fixed value(?): 0x00000001 |
16 | variable | zip | The zip archive continues to the end of this record |
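Given the layout above, the embedded sources can be opened with a standard zip reader by skipping the 16-byte SRCS header. This is a minimal sketch; the function name is made up, and it assumes the archive really does start at offset 16 and run to the end of the record.

```python
import io
import zipfile

def open_srcs_archive(record: bytes) -> zipfile.ZipFile:
    """Open the zip archive stored in a SRCS record."""
    if record[:4] != b"SRCS":
        raise ValueError("not a SRCS record")
    # Skip the identifier and the three 4-byte fields of unknown meaning.
    return zipfile.ZipFile(io.BytesIO(record[16:]))
```

For example, `open_srcs_archive(record).namelist()` would list the source files that were given to kindlegen.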
A CMET record is a record whose content is the output of the compilation operation, and perhaps extra info. The record begins with the "CMET" signature and looks as follows:
offset | bytes | content | comments |
---|---|---|---|
0 | 4 | identifier | "CMET" (0x43 0x4D 0x45 0x54) |
4 | 4 | ? | fixed value(?): 0x0000000C |
8 | 4 | text length | (big endian) |
12 | variable | text | compilation output text, line endings are CRLF |
variable | variable | ? | unknown data to the end of the record |
[edit] Media Records (AUDI/VIDE)
kindlegen supports embedded audio and video for some Kindle platforms. Each media file is stored in a separate AUDI (audio) or VIDE (video) record.
A media record looks as follows:
offset | bytes | content | comments |
---|---|---|---|
0 | 4 | identifier | "AUDI" (0x41 0x55 0x44 0x49) or "VIDE" (0x56 0x49 0x44 0x45) |
4 | 4 | ? | unknown value |
8 | 4 | ? | unknown value |
12 | variable | media | The media data continues to the end of this record |
[edit] MBP
This is the extension used on a side file (auxiliary) for MOBI formatted eBooks. It is used to store metadata used by the library software and also to store user entered data like bookmarks, annotations, last read position. This file is created automatically by the reader program when the eBook is first opened and has a .mbp extension. The Library management software in MobiPocket uses this file to get information displayed in the library window such as title, author, and description so that it won't have to open the larger eBook file.
There is an ongoing effort to describe the binary MBP file format (see this site). There is also an MBP reader program that will extract notes from an .mbp file.
[edit] eBook Creation
There are several ways to create eBooks in the MOBI format. The rules for the format of the source files needed to create MOBI eBooks are spelled out in documents on the MobiPocket web site. The recommended tool, MobiPocket Creator, is available as a download from the web site.
EBooks can also be converted from other forms using the Windows version of the MobiPocket Reader. Once converted the file can be used on any device supported by MobiPocket Reader.
[edit] Guidelines
In order to better support the features of the MobiPocket Reader there are some guidelines that need to be followed when creating a book in this format.
- Do not specify a default font family, font size or other font attributes such as weight or color. This is a choice the person reading the eBook should be able to make. Font sizes and attributes can be specified for special headings and other specific items. Use only generic font families.
- Do not impose justification for standard text. It may be needed for captions and other special text.
- Do not use tables for anything except table data. Nested tables are not supported.
- Do not use blank lines to try and force page changes. Use the <mbp:pagebreak/> tag.
- Do not use multiple books for different devices. Instead use advanced features such as multi resolution images and platform specific frames.
[edit] Adapting images to various PDA screen resolutions
Note that the following section only applies to the original mobi format and is not used by Amazon AZW files.
The IMG tag in Mobipocket publications supports up to three source attributes for various resolutions: src, losrc and hisrc. This makes it possible to optimize the same ebook for various devices. The image to be displayed is dynamically selected by the Reader according to the resolution of the screen on the actual device:
attribute | screen resolution | typical devices |
---|---|---|
losrc | <= 239 pixels | Low rez 160x160 Palm devices (PalmVx, Treo 600, Zire), Smartphones (Nokia 3650, Sony Ericsson P800/900, Microsoft smartphones) |
src | >= 240 pixels (handhelds) | Pocket PC, Hi rez Palm devices (Sony Clie, Tungsten, Zire 71) |
hisrc | >= 480 pixels | any desktop or tablet PC |
Example:
<img hisrc="cover480x640.gif" src="cover220x300.gif" losrc="cover140x140.gif"/>
Note also that there is a 63KB internal limit on images (a restriction of the Mobipocket .PRC format). GIFs have to be smaller than 63KB. You can use GIF optimization programs such as Ulead Smart Saver to get GIFs below this size. (Images bigger than 63KB are automatically reduced to fit the limit by MobiGEN, but you might not like the result.) JPEG images are recompressed at a lower quality setting to bring the file size down without reducing the pixel dimensions.
[edit] HTML and CSS Tips for MOBI creation
- Kindle and Kindle DX do not handle the soft hyphen HTML entity correctly. Use the <shy/> tag instead.
- Grey text is displayed as white on some devices. To avoid this problem, add:
    @media amzn-mobi and (monochrome) {
      .mygreytextclass {
        color: black;
      }
    }
- The kindlegen tool ignores padding-left. If necessary, you can work around this by adding an element inside that element and setting its left margin.
- The kindlegen tool's CSS parser is sometimes buggy. As a result, if you have:
    div.foo p {
      ...
    }
  the kindlegen tool often incorrectly interprets it as:
    div.foo, div.foo p {
      ...
    }
- In situations where you want to apply the style to only the inner tag, you must add a custom class to the paragraphs inside the outer tag and use that CSS selector by itself.
For additional tips specific to developing content for KF8-capable readers, see the KF8 CSS Tips.
[edit] Format limitations
There are many limitations in the MOBI format. A few are listed here.
- Blocks of text can never have a greater than normal margin on their right side.
- Left margins can only be specified in 1em increments. Text can only have a hanging indent if it has no left margin. More recent Kindle renderers have increased the left margin increment to roughly 2em.
- Text cannot flow around images taller than one line of text.
- Image sizes cannot be scaled with font size.
- In some, but not all, Mobipocket renderers, the left margin of a line of text depends on the font size that was in effect at the preceding line break.
- Many measures, such as the indent of a hanging indent, cannot be specified in ems.
- Individual items of text cannot be displayed in a monospace font.
- Tables display wildly differently on different Mobipocket renderers, especially tables which cross more than one screen.
- Nested tables are not supported at all.
- In addition, you only get the full range of Mobipocket's formatting capabilities if your markup is written for Mobipocket's non-standard, extended, and under-documented implementation of HTML 3.2. See: File tag reference on the Mobipocket web site.
MOBI DRM
Mobi DRM can optionally be applied to this file format. The standard scheme, supported by Mobipocket and Overdrive servers, is based on an ID (the PID) derived from the reading device or program. The server must know this PID when an eBook is purchased; it is embedded in the file, which locks the eBook to that device. The licensing scheme permits multiple devices (usually up to 4). In that case the server needs to know the device ID of every device. If you add a device, you must tell the server and redownload the eBook before you can read it on the new device. Normally there is no charge to add a device or to redownload the eBook. If the dealer goes out of business, you may not be able to add a device, since there would be no way to redownload the file.
A second, simpler scheme requires only the account login name and password used to purchase the eBook. Once this data is entered, the eBook can be read. The data needs to be entered only once per device. This is a newer scheme, and some readers may not support it.
A third method, used on some eBooks, encrypts the file with a generic MOBI key rather than a PID-specific key. Such an eBook can be read by any MobiPocket Reader software on any device, but not by non-MobiPocket software.
The DRM applies only to the eBook content itself and not to the metadata. A library routine can read the metadata without having to unlock the eBook. Some programs can even change this information without touching the DRM-protected portion of the file.
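Because the headers are stored unencrypted, a short routine can report the title and the encryption type without unlocking anything. The sketch below is illustrative only: the function name read_mobi_summary is hypothetical, and the byte offsets are assumptions based on the commonly documented Palm database, PalmDOC, and MOBI header layouts (encryption type at offset 12 of record 0, with 0 meaning no encryption; text encoding at 28; full name offset and length at 84 and 88). Verify them against the format description before relying on them.

```python
import struct


def read_mobi_summary(data):
    """Return the title and encryption type from the unencrypted MOBI headers.

    All offsets are assumptions taken from commonly documented header layouts.
    """
    # Palm database header: type and creator at bytes 60-67, record count at 76.
    if data[60:68] != b"BOOKMOBI":
        raise ValueError("not a MOBI file")
    (num_records,) = struct.unpack_from(">H", data, 76)
    if num_records == 0:
        raise ValueError("file contains no records")
    # Record info entries start at byte 78; the first 4 bytes are the offset.
    (rec0,) = struct.unpack_from(">I", data, 78)
    # PalmDOC header: 16-bit encryption type at offset 12 of record 0.
    (encryption,) = struct.unpack_from(">H", data, rec0 + 12)
    if data[rec0 + 16:rec0 + 20] != b"MOBI":
        raise ValueError("MOBI header not found")
    # Text encoding: 65001 is UTF-8, otherwise assume Windows-1252.
    (encoding,) = struct.unpack_from(">I", data, rec0 + 28)
    codec = "utf-8" if encoding == 65001 else "cp1252"
    # Full title: offset (relative to record 0) and length at 84 and 88.
    title_off, title_len = struct.unpack_from(">II", data, rec0 + 84)
    raw_title = data[rec0 + title_off:rec0 + title_off + title_len]
    return {"title": raw_title.decode(codec, "replace"), "encryption": encryption}
```

To use it, read a file in binary mode and pass the bytes in, for example read_mobi_summary(open(path, "rb").read()), where path is whatever MOBI file you want to inspect. A non-zero encryption value suggests the text records are encrypted, while the title is still readable.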
MOBI eBook Readers and converters
In addition to the Readers supplied by MobiPocket, there are also third-party readers and converters. These include:
- Calibre
- Stanza
- FBReader
- Book Designer
- BookMedia
- STDU Viewer
- Sumatra PDF
- MBP_reader (program that can extract MBP notes to text files).
- Kindle for PC or Mac
- EPUB to Kindle converter
- PDF/ePUB to Kindle Tool
- Kindle Book Development Tool
- KindleUnpack (previously called MobiUnpack) - unpacks a MOBI file into its original source form. Also called a MOBI decoder.
- PDF to ePUB/Mobi Converter
- KindleGen - Official Amazon tool to convert ePub to Mobi (AZW) or otherwise generate Mobi format.
MOBI eBook Hardware Readers
- Bookeen Cybook Gen3
- Bookeen Cybook Opus
- Hanlin V3 / Bebook / EZ Reader
- iRex iLiad
- iRex Digital Reader
- Amazon Kindle Readers
- Onyx BOOX readers
Not all eBook readers that support Mobi format have the same features. Check Mobi Comparison for details on what is actually supported.
Create a MOBI file from an ePub file
Here is one method to create a mobi file from an ePub file.
1. Use only h1, h2, and h3 headers for the entries you want in the TOC (the TOC is built from these headers in step 9; alternatively, use step 10).
2. Build your entire ePub in Sigil, importing your HTML files as you go with the "add existing item" option.
3. Finish the ePub, but do not add the cover page.
4. Download and install MobiPocket Creator.
5. Unzip the ePub.
6. Double-click the OPF file.
7. The book opens in MobiPocket Creator.
8. Drag and drop your cover into MobiPocket Creator.
9. Use MobiPocket Creator to make an html.toc from headers 1-2-3 only, or
10. Alternatively, point MobiPocket Creator to an existing html.toc by editing the Guide Properties section. (Note: the toc.ncx will already be in the appropriate folder inside the "My Publications" folder of your MobiPocket Creator directory.)
11. Click "Build."
12. You now have a fully functional PRC file.
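The "unzip the ePub" and "open the OPF" steps can also be scripted. An ePub is a ZIP archive whose META-INF/container.xml names the OPF file, so the sketch below extracts the archive and returns the OPF path. It is a minimal illustration, not part of MobiPocket Creator; the function name extract_epub is hypothetical.

```python
import xml.etree.ElementTree as ET
import zipfile

# XML namespace used by META-INF/container.xml in an ePub.
CONTAINER_NS = "{urn:oasis:names:tc:opendocument:xmlns:container}"


def extract_epub(epub_path, dest_dir):
    """Unzip an ePub into dest_dir and return the archive-relative OPF path."""
    with zipfile.ZipFile(epub_path) as archive:
        # container.xml points at the package (OPF) file via a rootfile element.
        container = ET.fromstring(archive.read("META-INF/container.xml"))
        rootfile = container.find(
            f"{CONTAINER_NS}rootfiles/{CONTAINER_NS}rootfile"
        )
        if rootfile is None:
            raise ValueError("container.xml does not list a rootfile")
        archive.extractall(dest_dir)
        return rootfile.get("full-path")
```

The returned path is relative to dest_dir; joining the two gives the OPF file that would be opened in MobiPocket Creator.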
For more information
- Mobipocket Creator - free download, also see MobiPocket Creator
- Mobipocket Development center - creation documentation
- Content generation - see paragraph formatting for CSS-like features, although there is no CSS in MOBI.
- Amazon KindleGen - the successor to mobiGen; it also works fine for MOBI books.
- KindleGen - our wiki page on using KindleGen.
- MobileRead forum - Mobi unpack, take apart a mobi file.
- Java Mobi Metadata Editor - edit, add, and remove EXTH tags in mobi files.