AZW4

From MobileRead

AZW4 is a proprietary Amazon format for textbooks. It consists of a PDF file in a PDB wrapper, usually (perhaps always) protected with DRM. The wrapper uses the same DRM scheme as Kindle's Mobipocket-format ebooks.


Overview

This is a new Amazon format, aimed specifically at textbooks. So far it is supported only by the Kindle for PC and Kindle for Mac applications. Support is expected to be rolled out to the other Kindle reading apps and to the Kindle DX at some point.

Amazon calls this format "Print Replica", meaning that each page is a full image of the printed page and the text does not reflow. Of course, this has always been a feature of PDF files in full-page mode. Amazon says: "Kindle Print Replica books have most of the same features as PDF formatted books, including advanced zoom and pan functions. They also have unique Kindle features including annotations, highlights, and the ability to sync your last page read across multiple Kindle applications."

Amazon identifies these eTextbooks with the term "Print Replica" in the title.

Description

For details of the PDB wrapper, see the MOBI format description, including the MOBI Header and the EXTH records. In AZW4, however, the text records do not contain (optionally compressed) Mobipocket-style HTML; instead they contain one or possibly more PDF files, apparently uncompressed, wrapped with some extra information. Only one sample has been analysed so far, so some of the following is guesswork.
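Before the %MOP data can be read, the PDB wrapper has to be split into its records. The sketch below does this using the standard Palm Database layout (a 78-byte header whose big-endian 16-bit record count sits at offset 76, followed by 8-byte record-list entries) and then reads the text record count from the PalmDOC header at the start of record 0. It assumes the file is DRM-free and that the %MOP data sits in the text records; with only one sample analysed, whether the data occupies a single record or spans several concatenated text records is not confirmed. The function names are hypothetical.

```python
import struct


def split_pdb_records(data: bytes) -> list[bytes]:
    """Split a Palm Database (PDB) file into its raw records."""
    # numRecords: big-endian 16-bit value at offset 76 of the 78-byte PDB header.
    (num_records,) = struct.unpack_from(">H", data, 76)
    # Each record-list entry is 8 bytes: 4-byte offset, 1-byte attributes, 3-byte unique ID.
    offsets = [struct.unpack_from(">L", data, 78 + 8 * i)[0] for i in range(num_records)]
    # A record runs from its own offset up to the next record's offset (or end of file).
    ends = offsets[1:] + [len(data)]
    return [data[start:end] for start, end in zip(offsets, ends)]


def text_records(data: bytes) -> list[bytes]:
    """Return text records 1..N, where N comes from the PalmDOC header in record 0."""
    records = split_pdb_records(data)
    # The PalmDOC header at the start of record 0 stores the text record count at offset 8.
    (count,) = struct.unpack_from(">H", records[0], 8)
    return records[1 : 1 + count]
```

On a DRM-protected file the text records are encrypted, so nothing useful will be found in them without first removing the DRM.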

Offset  Hex   Bytes  Content        Comments
0       0x00  4      Identifier     The characters %MOP
4       0x04  4      Table Count    Number of tables that follow in the record; each table includes one PDF file
8       0x08  4      SectionCount   Number of sections in the first table
Section index start (repeated for each section):
n       n     4      SectionOffset  Offset from the very start of this record to the start of this section
n+4     n+4   4      SectionLength  Length of this section
Section index end
m       m     x      Data           All the data: PDF and other info
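The table above can be read with a small parser. This sketch assumes all fields are big-endian (as in the surrounding PDB and MOBI headers; the table does not say), and, for the multi-table case, assumes that all the SectionCount fields come first, followed by all the section indexes. That is only one of the two possible layouts; with a single table both layouts coincide. The `Section` type and `parse_mop_header` name are hypothetical.

```python
import struct
from typing import NamedTuple


class Section(NamedTuple):
    offset: int  # measured from the very start of the record
    length: int


def parse_mop_header(record: bytes) -> list[list[Section]]:
    """Return one list of sections per table from a %MOP record (assumed layout)."""
    if record[:4] != b"%MOP":
        raise ValueError("record does not start with %MOP")
    (table_count,) = struct.unpack_from(">L", record, 4)
    # Assumption: one 4-byte SectionCount per table, all stored from offset 8 onwards.
    counts = struct.unpack_from(f">{table_count}L", record, 8)
    pos = 8 + 4 * table_count  # first section index entry
    tables = []
    for count in counts:
        sections = []
        for _ in range(count):
            # Each index entry: 4-byte SectionOffset followed by 4-byte SectionLength.
            offset, length = struct.unpack_from(">LL", record, pos)
            sections.append(Section(offset, length))
            pos += 8
        tables.append(sections)
    return tables
```

For a one-table record the first index entry therefore starts at offset 12 (0x0C), immediately after SectionCount.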

The first section in each table probably holds the PDF; the meaning of the other sections is currently unknown. It is also not known whether a record can really contain more than one table. If it can, the layout might be either all the section counts at the start of the record, followed by all the section indexes for all the tables and then the data for all the tables, or each table's section count, indexes and data kept together. In any case, it is possible to extract the PDF from the document.
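Extracting the PDF from a single-table record comes down to reading the first index entry and slicing out that section. The sketch below assumes one table, big-endian fields, and that the PDF is the first section, as guessed above; it refuses the section unless it begins with the standard `%PDF-` file header, since that guess has only been checked against one sample. The function name is hypothetical.

```python
import struct


def extract_first_pdf(record: bytes) -> bytes:
    """Return the first section of a single-table %MOP record, checked to be a PDF."""
    if record[:4] != b"%MOP":
        raise ValueError("not a %MOP record")
    # Table Count at offset 4, SectionCount of the (only) table at offset 8.
    table_count, section_count = struct.unpack_from(">LL", record, 4)
    if table_count != 1 or section_count < 1:
        raise ValueError("this sketch handles only one table with at least one section")
    # The first section index entry follows the single SectionCount field.
    offset, length = struct.unpack_from(">LL", record, 12)
    pdf = record[offset : offset + length]
    # Every PDF file starts with a "%PDF-" header line.
    if not pdf.startswith(b"%PDF-"):
        raise ValueError("first section does not look like a PDF")
    return pdf
```

Usage, with a hypothetical output name: `open("book.pdf", "wb").write(extract_first_pdf(record))`. This only works on DRM-free files.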

History of Availability

The format was quietly made available by Amazon shortly after the release of Kindle for Mac/PC 1.7 in late August 2011.

The first online mention seems to be a message at MobileRead on 24th August 2011. A thread about the new format was started in the news section on 27th August 2011, confirming that the format internally contains a standard PDF file. The first tool able to extract the PDF from DRM-free versions of the format appeared at MobileRead on Thursday, 1st September 2011.
