KindleUnpack

From MobileRead

KindleUnpack is based on the reverse-engineered MOBI database described in the article on the MOBI format.

Overview

KindleUnpack (originally MobiUnpack, also known as Mobi Decode) is a Python script that extracts the source files of a MOBI or AZW (Amazon Kindle) eBook from the final compiled database format. The filenames given to the extracted files are not necessarily those originally used to create the database, because that information is not preserved in the database. Even so, an unpacked set of files can generally be used to recreate the same database with the standard MOBI or Kindle generating tools.

For KF8 files and combined Mobipocket and KF8 files built by KindleGen, it can also produce separated Mobipocket and KF8 files, as well as the original source files if those have been included in the eBook. Additionally, for KF8 files it can produce an 'ePub', though if the HTML isn't compliant with ePub standards, the 'ePub' won't be either.

For Amazon's .AZW4 files, it will extract the PDF that's been wrapped up in Amazon's .AZW4 file format.

A Calibre plugin version of the scripts is available in the official thread.

Note: KindleUnpack requires Python 2.7.x or Python 3.4 or later.

Big picture

MOBI-formatted eBooks were designed to be compatible with the Palm Database format (PDB). This basic structure has been retained in the present-day format used on the Amazon Kindle, though some of the data and metadata records have changed and the original MOBI DRM has been modified. MOBI itself also changed over time: originally the compression mode was the standard Palm DOC compression, but a second method based on HUFF/CDIC was developed to provide a higher level of compression.
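As a rough illustration of the Palm Database container described above, the following Python sketch reads a few header fields from a file image held in memory. It is not taken from KindleUnpack: the function name read_pdb_summary is hypothetical, and the offsets are assumptions based on the commonly documented PDB layout (a 78-byte header with the type and creator codes at offset 60 and the record count at offset 76) and on the compression codes given in the MOBI format article (1 for none, 2 for Palm DOC, 17480 for HUFF/CDIC).

```python
import struct

PDB_HEADER_SIZE = 78  # assumed fixed size of the Palm Database header

# Assumed compression codes stored in the first two bytes of record 0.
COMPRESSION_NAMES = {1: "none", 2: "PalmDOC", 17480: "HUFF/CDIC"}


def read_pdb_summary(data):
    """Return a few basic facts about a Palm Database image (bytes)."""
    if len(data) < PDB_HEADER_SIZE:
        raise ValueError("too short to be a Palm Database")

    # Database name: up to 32 bytes, NUL-terminated.
    name = data[:32].split(b"\x00", 1)[0].decode("latin-1")

    # Type and creator codes; MOBI books are expected to carry b"BOOKMOBI".
    ident = data[60:68]

    # Big-endian 16-bit record count at the end of the header.
    (num_records,) = struct.unpack_from(">H", data, 76)

    # Each record-info entry is 8 bytes; the first 4 are the record offset.
    offsets = [
        struct.unpack_from(">L", data, PDB_HEADER_SIZE + 8 * i)[0]
        for i in range(num_records)
    ]

    compression = None
    if offsets:
        (code,) = struct.unpack_from(">H", data, offsets[0])
        compression = COMPRESSION_NAMES.get(code, "unknown (%d)" % code)

    return {
        "name": name,
        "ident": ident,
        "records": num_records,
        "compression": compression,
    }
```

To try it on a real file, read the whole file in binary mode and pass the bytes to read_pdb_summary. A file whose ident is not b"BOOKMOBI" is probably not a MOBI-family eBook.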

Amazon uses the basic MOBI structure for AZW, TPZ, AZW4, and the newest KF8 format. There may be multiple formats inside the same file, and Amazon compilers may additionally add the original source files to the database. This can cause an eBook to be quite large. Older software is designed to read the first database entry and ignore any subsequent entries, though newer software can recognize and decode the best format available in the archive.

The program

The KindleUnpack scripts may undergo many revisions as more is learned about the internal format. This is all done through reverse engineering, as neither Amazon nor Mobipocket has published details of the internal format. To run KindleUnpack, a user needs Python 2.7.x or Python 3.4 or later installed on their machine. It can then be used as a command-line program or as a GUI application.

Note that the filenames used inside ePub or MOBI source files are not preserved when a KF8 or MOBI eBook is compiled, so KindleUnpack must generate new names for these files. The file structure may also differ from the original source, as it too is not retained by the MOBI format. Some data from the CSS and OPF files may also be lost. However, the unpacked collection of files can still be regenerated into a well-formed ePub or recompiled into a KF8 file.

AZW4 files wrap PDF data in a similar way, and that data can also be extracted into a PDF file.

Note: This program does not work on DRM'd files.
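A quick way to see whether a file is likely to be affected is to look at the encryption field in record 0. The sketch below is only an illustration under assumptions, not KindleUnpack's own check: it assumes the layout described in the MOBI format article, where a big-endian 16-bit encryption type sits at offset 12 of record 0 and a value of 0 means no encryption. The function names are hypothetical.

```python
import struct


def encryption_type(record0):
    """Return the assumed 16-bit encryption field at offset 12 of record 0."""
    if len(record0) < 14:
        raise ValueError("record 0 is too short to hold a PalmDOC header")
    (value,) = struct.unpack_from(">H", record0, 12)
    return value


def looks_drm_protected(record0):
    """True when the encryption field is non-zero (assumed to mean DRM)."""
    return encryption_type(record0) != 0
```

Here record0 is the raw bytes of the first database record. A non-zero result suggests the file would need to be DRM-free before it can be unpacked.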

Using the program

If you are on Windows and would prefer to use the GUI rather than the command line, you need to fully install the free community edition of ActiveState ActivePython 2.7.x or Python 3.4.

  1. Download the latest KindleUnpack from the thread
  2. Unzip it (right-click and "Extract All" in Windows)
  3. Inside the newly extracted KindleUnpack folder, double-click KindleUnpack.pyw
  4. In the window that pops up:
  • Hit the first Browse... button and select your input MOBI eBook
  • Hit the second Browse... button and select a destination folder for the unpacked files
  • If you want to split combination MOBIs, examine the raw markup language, or turn on verbose debugging, check the appropriate boxes
  • Hit the "Start" button – the unpacking will start and progress messages and any errors will be indicated in the scrollable Log window. If you run into problems, this Log output may be useful in finding and fixing the issue.

Then look in your destination folder for a mobi7 folder and inside of that you can find the HTML file, images directory, toc.ncx, and content.opf that were processed and stored inside your MOBI file. You can edit the HTML any way you like and then use KindleGen on the content.opf file to recreate your modified MOBI eBook.
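To confirm that an unpack produced the pieces listed above, a short script can inspect the destination folder. This is a minimal sketch, not part of KindleUnpack: the function name summarize_mobi7 is hypothetical, and it assumes only what is stated above, namely that a mobi7 folder holds the HTML file, an images directory, toc.ncx, and content.opf. Exact file and folder names may differ between versions, so the sketch matches on extensions and reports any subfolders it finds.

```python
import os


def summarize_mobi7(dest_folder):
    """List the unpacked pieces found in <dest_folder>/mobi7, or None if absent."""
    mobi7 = os.path.join(dest_folder, "mobi7")
    if not os.path.isdir(mobi7):
        return None

    entries = sorted(os.listdir(mobi7))

    def with_ext(*exts):
        # Case-insensitive match on file extension.
        return [n for n in entries if n.lower().endswith(exts)]

    return {
        "opf": with_ext(".opf"),
        "ncx": with_ext(".ncx"),
        "html": with_ext(".html", ".htm"),
        # The images directory should appear here; its exact name is not assumed.
        "subfolders": [n for n in entries if os.path.isdir(os.path.join(mobi7, n))],
    }
```

If the "opf" list is empty, there is nothing to hand to KindleGen, and the Log window output from the unpack is the first place to look.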

This script can also be used as a Calibre plugin.

Changes

  • Version 45 of the program works with both older MOBI and newer KF8 MOBI formats, and includes a graphical user interface.
  • Version 47 adds support for obfuscated fonts and OpenType fonts, plus a number of other bug fixes.
  • Version 75 adds more dictionary support.
  • Version 81 adds Python 3 support.
