APNX

From MobileRead

APNX is a file format introduced for the Amazon Kindle 3. It stores page numbers that correspond to a paper version of the book, which makes it possible to reference page numbers in, for example, an academic document.

Overview

APNX provides a method to achieve an exact match between the eBook pages and a specific edition of the paper book. In some cases a program can also generate approximate matches automatically. Once an APNX file has been generated, there is no way to tell whether it is an exact match or only an approximate one.

The format

The file format has been decoded by MobileRead user User_none. The file begins with a header similar to the one below.

{"contentGuid":"8d3d16e0","asin":"B002RHGYOA","cdeType":"EBOK","fileRevisionId":"1296868639127"} ) = {"asin":"1906694184","pageMap":"(4,a,1)"}

Following this header is a list of 4-byte big-endian integers in increasing order. In this example there are 573 of them. The first 3 are all 0, which suggests they are padding. The book has 570 pages in total (as shown on the Kindle itself). Each value in the APNX file is a file location that marks the beginning of a new page.
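As a minimal sketch of how such a page table could be read, the Python snippet below decodes a run of 4-byte big-endian integers into a list of file locations. It assumes the header has already been skipped (locating the end of the header is not covered here) and that the values are unsigned; the function name decode_page_offsets and the sample data are made up for illustration and do not come from a real book.

```python
import struct


def decode_page_offsets(table):
    """Decode a run of 4-byte big-endian integers into a list of ints.

    `table` is assumed to hold only the page table, i.e. the bytes that
    follow the APNX header.
    """
    if len(table) % 4 != 0:
        raise ValueError("page table length must be a multiple of 4 bytes")
    # ">I" means big-endian unsigned 32-bit; "unsigned" is an assumption.
    return [value for (value,) in struct.iter_unpack(">I", table)]


# Synthetic sample: three zero entries followed by two increasing locations.
sample = struct.pack(">5I", 0, 0, 0, 120, 480)
offsets = decode_page_offsets(sample)
print(offsets)
```

Each decoded value would then be the location at which the corresponding page begins.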


A more in-depth and up-to-date look at user-none's work on the APNX format is available in Calibre's docs.

Kindle publishing

KindleGen version 1.2 does not generate an APNX file directly; it creates a PAGE section in the MobiPocket file, which is then stripped and converted to an APNX file by Amazon's publishing service. The KindleGen input can use either an NCX pageList or a page-map XML file.

Kindle Previewer, as of version 1.5, does not display page numbers. In addition, Kindle for PC and Kindle for Mac, while they do display page numbers using .apnx files, are unable to read them from a MobiPocket file generated by KindleGen. This makes testing the page number feature somewhat problematic for publishers.

An example

An actual EPUB page-map.xml:

<page-map xmlns="http://www.idpf.org/2007/opf">
<page name="i"  href="chapter_01.html#page_i"/>
<page name="ii"  href="chapter_01.html#page_ii"/>
<page name="1"  href="chapter_01.html#page_1"/>
<page name="2"  href="chapter_01.html#page_2"/>
<page name="3"  href="chapter_01.html#page_3"/>
<page name="4"  href="chapter_01.html#page_4"/>
<page name="5"  href="chapter_01.html#page_5"/>
<page name="A-1"  href="chapter_01.html#page_A1"/>
<page name="A-2"  href="chapter_01.html#page_A2"/>
<page name="I-1" href="chapter_01.html#page_I1"/>
</page-map>

KindleGen stores the PAGE map information at the front of the SRCS section for both the Mobi 7 and Mobi 8 parts. Below is the PAGE information from the Mobi 8 (KF8) part:

PAGE^@^@^@^H^@^A^@^A^@^@^@*^@^@^@^^{
  "fileRevisionId" : "1"
}
^@^A^@n^@
^@^P{
  "description" : "PageMap from source by kindlegen",
  "pageMap" : "(1,r,1),(3,a,1),(8,c,A-1|A-2|I-1)"
}
^C\236^F^L^H{
\257^L\304^Oi^Q\327^S]^V^_^X\201

Here is the hex representation of this PAGE information from the Mobi 8 part:

87654321  0011 2233 4455 6677 8899 aabb ccdd eeff  0123456789abcdef                                                             
00000000: 5041 4745 0000 0008 0001 0001 0000 002a  PAGE...........*
00000010: 0000 001e 7b0a 2020 2022 6669 6c65 5265  ....{.   "fileRe
00000020: 7669 7369 6f6e 4964 2220 3a20 2231 220a  visionId" : "1".
00000030: 7d0a 0001 006e 000a 0010 7b0a 2020 2022  }....n....{.   "
00000040: 6465 7363 7269 7074 696f 6e22 203a 2022  description" : "
00000050: 5061 6765 4d61 7020 6672 6f6d 2073 6f75  PageMap from sou
00000060: 7263 6520 6279 206b 696e 646c 6567 656e  rce by kindlegen
00000070: 222c 0a20 2020 2270 6167 654d 6170 2220  ",.   "pageMap"
00000080: 3a20 2228 312c 722c 3129 2c28 332c 612c  : "(1,r,1),(3,a,
00000090: 3129 2c28 382c 632c 412d 317c 412d 327c  1),(8,c,A-1|A-2|
000000a0: 492d 3129 220a 7d0a 039e 060c 087b 0aaf  I-1)".}......{..
000000b0: 0cc4 0f69 11d7 135d 161f 1881            ...i...]....

Analysis

00000000 - 0000000f  Section header PAGE
00000010 - 00000011  0
00000012 - 00000013  30: Length of rev string in bytes (Big Endian Half Word)
{
  "fileRevisionId" : "1"
}
00000032 - 00000033  1:  Always 1?
00000034 - 00000035  110: Length of PageMap in bytes (Big Endian Half Word)
00000036 - 00000037  10: Number of Page names (Big Endian Half Word)
00000038 - 00000039  16: Number of bits used in offsets to page href destination
         - typically this is 32 (0x20), but this example was small enough
           that KindleGen used only 16-bit offsets
0000003A - 000000A7  PageMap showing a tuple for each numbering scheme used in the document

The pageMap value has the following format:

(entry_number, numbering_scheme, values)
where:
  • entry_number is the entry in page-map.xml at which the scheme starts (counting from 1)
  • numbering_scheme is c - character, r - roman, a - arabic
  • values is the starting page number for the "r" and "a" schemes; otherwise it is a pipe-separated ("|") list of page names
{
  "description" : "PageMap from source by kindlegen",
  "pageMap" : "(1,r,1),(3,a,1),(8,c,A-1|A-2|I-1)"
}
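To illustrate how the pageMap tuples relate to the page names, here is a rough Python sketch that expands a pageMap string into one label per entry. It rests on assumptions that are not confirmed by the analysis above: that a numbering scheme continues until the entry number of the next tuple, that roman numerals are lowercase (as in the example page-map.xml), and that page names contain no ")" character. The function names to_roman and expand_page_map are hypothetical.

```python
import re


def to_roman(n):
    """Convert a positive integer to a lowercase roman numeral."""
    numerals = [(1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
                (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
                (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i")]
    out = []
    for value, symbol in numerals:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


def expand_page_map(page_map, page_count):
    """Return a list of page_count labels derived from a pageMap string."""
    found = re.findall(r"\((\d+),([rac]),([^)]*)\)", page_map)
    tuples = [(int(start), scheme, values) for start, scheme, values in found]
    # Entries before the first tuple stay unlabeled (empty string).
    labels = [""] * page_count
    for i, (start, scheme, values) in enumerate(tuples):
        # Assumption: a scheme runs up to the entry before the next tuple.
        end = tuples[i + 1][0] - 1 if i + 1 < len(tuples) else page_count
        names = values.split("|") if scheme == "c" else []
        for entry in range(start, end + 1):
            k = entry - start
            if scheme == "c":
                labels[entry - 1] = names[k] if k < len(names) else ""
            else:
                number = int(values) + k
                labels[entry - 1] = to_roman(number) if scheme == "r" else str(number)
    return labels


labels = expand_page_map("(1,r,1),(3,a,1),(8,c,A-1|A-2|I-1)", 10)
print(labels)
```

Under the same assumptions, a pageMap of "(4,a,1)" with 573 entries would leave the first three entries unlabeled and number the remaining 570 from 1.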

000000A8 - 000000BB  Table of 16-bit offsets (see above for bit widths) into the assembled text (big-endian half words for 16 bits, or big-endian words for 32 bits)

0x039e - offset in bytes to page i anchor
0x060c - offset in bytes to page ii anchor
0x087b - offset in bytes to page 1 anchor
0x0aaf - offset in bytes to page 2 anchor
0x0cc4 - offset in bytes to page 3 anchor
0x0f69 - offset in bytes to page 4 anchor
0x11d7 - offset in bytes to page 5 anchor
0x135d - offset in bytes to page A-1 anchor
0x161f - offset in bytes to page A-2 anchor
0x1881 - offset in bytes to page I-1 anchor
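The layout described in this analysis can be checked with a short Python sketch that walks the PAGE section from the hex dump above. It follows the field positions listed here and nothing more: the meaning of the 16-byte section header and of the "always 1?" field is not interpreted, and the function name parse_page_section is made up for illustration.

```python
import json
import struct

# Bytes copied from the hex dump of the Mobi 8 PAGE information.
PAGE_HEX = """
5041 4745 0000 0008 0001 0001 0000 002a
0000 001e 7b0a 2020 2022 6669 6c65 5265
7669 7369 6f6e 4964 2220 3a20 2231 220a
7d0a 0001 006e 000a 0010 7b0a 2020 2022
6465 7363 7269 7074 696f 6e22 203a 2022
5061 6765 4d61 7020 6672 6f6d 2073 6f75
7263 6520 6279 206b 696e 646c 6567 656e
222c 0a20 2020 2270 6167 654d 6170 2220
3a20 2228 312c 722c 3129 2c28 332c 612c
3129 2c28 382c 632c 412d 317c 412d 327c
492d 3129 220a 7d0a 039e 060c 087b 0aaf
0cc4 0f69 11d7 135d 161f 1881
"""


def parse_page_section(data):
    """Split a PAGE section into revision JSON, pageMap JSON, bit width, offsets."""
    if data[:4] != b"PAGE":
        raise ValueError("not a PAGE section")
    # Length of the revision string: big-endian half word at 0x12.
    (rev_len,) = struct.unpack_from(">H", data, 0x12)
    pos = 0x14
    revision = json.loads(data[pos:pos + rev_len].decode("utf-8"))
    pos += rev_len
    # Four half words: unknown (1), pageMap length, page count, offset bit width.
    _unknown, map_len, page_count, bit_width = struct.unpack_from(">4H", data, pos)
    pos += 8
    page_map = json.loads(data[pos:pos + map_len].decode("utf-8"))
    pos += map_len
    # Offsets are 16-bit or 32-bit big-endian values, depending on bit_width.
    code = {16: "H", 32: "I"}[bit_width]
    offsets = list(struct.unpack_from(">%d%s" % (page_count, code), data, pos))
    return revision, page_map, bit_width, offsets


revision, page_map, bit_width, offsets = parse_page_section(bytes.fromhex(PAGE_HEX))
print(revision["fileRevisionId"], page_map["pageMap"], bit_width)
print([hex(o) for o in offsets])
```

Reading the bit width before the offset table keeps the same code usable for sections where KindleGen chose 32-bit offsets.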
