From MobileRead

Kindle unpack is based on the reverse-engineered MOBI database structure described in the article on the MOBI format.


Overview

Kindle unpack (originally called Mobi unpack, and also known as Mobi decode) is a Python script that recreates the source files of a MOBI or AZW (Amazon Kindle) eBook from the compiled database. The filenames in the unpacked source are not necessarily those originally used to create the database, because that information is not preserved in the database. Even so, an unpacked set of files should be usable to recreate the same database with standard Mobipocket or Kindle generating tools.

For KF8 files and combined Mobipocket and KF8 files built by KindleGen, it can also produce separate Mobipocket and KF8 files, as well as the original source files if those are included in the eBook. In addition, for KF8 files it can produce an 'ePub', although if the HTML does not comply with ePub standards, the 'ePub' will not either.

For Amazon's .azw4 files, it extracts the PDF that is wrapped inside the .azw4 file format.

A Calibre plugin version of the scripts is available in this thread.

Note: This script only works with Python 2.x.

Big Picture

A Mobi-formatted eBook was originally designed to work as a Palm DataBase (PDB) file. This basic structure has been retained up to the present-day files used on the Amazon Kindle. Some of the metadata and data structures have changed, and the DRM method has been modified from the MOBI version. Mobi itself has also changed over time. Originally the compression mode was the standard Palm (PalmDOC) compression, but a second method based on Huff-cdic was developed to provide a higher level of compression.
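The Palm database layout described above can be illustrated with a short sketch. It is not taken from the script itself and is written for current Python 3; the function names `parse_pdb` and `compression_of` are hypothetical. It assumes the commonly documented layout: a 78-byte header with the type and creator at bytes 60 to 67 and the record count at bytes 76 to 77, followed by 8-byte record entries, with the compression code in the first two bytes of record 0.

```python
import struct

# Compression codes found in the first two bytes of record 0 (the PalmDOC header).
COMPRESSION_NAMES = {1: "none", 2: "PalmDOC", 17480: "Huff/CDIC"}


def parse_pdb(data):
    """Split an in-memory Palm database into (ident, list of raw records)."""
    # Bytes 60-67 hold the 4-byte type and 4-byte creator, e.g. b"BOOKMOBI".
    ident = data[60:68]
    # Bytes 76-77 hold the big-endian number of records.
    (count,) = struct.unpack_from(">H", data, 76)
    # Each 8-byte record entry begins with the 4-byte offset of that record.
    offsets = [struct.unpack_from(">L", data, 78 + 8 * i)[0] for i in range(count)]
    offsets.append(len(data))  # the last record runs to the end of the file
    records = [data[offsets[i]:offsets[i + 1]] for i in range(count)]
    return ident, records


def compression_of(record0):
    """Name the compression mode declared in record 0."""
    (code,) = struct.unpack_from(">H", record0, 0)
    return COMPRESSION_NAMES.get(code, "unknown (%d)" % code)
```

For a real file, `data` would come from `open(path, "rb").read()`; an unencrypted MOBI book is expected to report `b"BOOKMOBI"` as its identifier.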

Amazon uses the basic Mobi structure for AZW, TPZ, AZW4, and the newer KF8 format. A single file may contain multiple formats, and Amazon's compilers will also add the original source file, usually an ePub, to the database, which can make an eBook quite large. Older software is designed to read the first database entry and ignore any subsequent entries. Newer software can recognize and decode the best format available in the archive.
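As a rough illustration of how software might tell the formats apart, the sketch below reads the version field of the MOBI header. It assumes the reverse-engineered layout in which record 0 starts with a 16-byte PalmDOC header, the MOBI header follows with the identifier `MOBI`, and a 4-byte version sits at offset 36 of the record, with version 8 indicating KF8. The helper names are hypothetical and the code is written for Python 3.

```python
import struct


def mobi_version(record0):
    """Return the MOBI header version stored in record 0, or None if absent."""
    # The MOBI header follows the 16-byte PalmDOC header and starts with b"MOBI".
    if len(record0) < 40 or record0[16:20] != b"MOBI":
        return None
    # The 4-byte big-endian file version sits at offset 36 of the record.
    (version,) = struct.unpack_from(">L", record0, 36)
    return version


def describe(version):
    """Turn a version number into a short human-readable label."""
    if version is None:
        return "no MOBI header"
    if version == 8:
        return "KF8"
    return "Mobipocket (version %d)" % version
```

In a combined file, the first header would be expected to report an older Mobipocket version, with a second header of version 8 appearing later in the database; locating that second header is not covered by this sketch.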

The program

Mobi_unpack is revised frequently as more is learned about the internal format. This is all reverse engineering, as neither Amazon nor Mobipocket publishes the internal format. To run Mobi_unpack you need a copy of Python installed on your machine, and it should be a pre-3.0 version. The program can be driven from the command line, although a GUI is also available.

Note that the filenames used inside the original ePub or MOBI source files are not retained when a KF8 or MOBI file is compiled, so mobi_unpack must generate new names for these files. The structure will also differ from the original ePub, as it too was not retained in the KF8. Some other data from the CSS and OPF may also have been lost; however, the unpacked collection of files can be regenerated into a well-formed ePub or recompiled into a KF8 file.
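To make the idea of generated names concrete, here is a minimal sketch of one possible naming scheme. It is an illustration only, not the scheme the script necessarily uses: the `image00001.jpg` pattern, the `dat` fallback extension, and both function names are assumptions. The file type is guessed from well-known magic bytes, since the original name and extension are not available. The code is written for Python 3.

```python
def sniff_extension(data):
    """Guess a file extension from the leading magic bytes of a record."""
    if data.startswith(b"\xff\xd8\xff"):
        return "jpg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    return "dat"  # unknown content keeps a neutral extension


def generated_names(records, prefix="image"):
    """Assign sequential names, because the original filenames are not stored."""
    return [
        "%s%05d.%s" % (prefix, number, sniff_extension(record))
        for number, record in enumerate(records, start=1)
    ]
```

Numbering the files by their position in the database keeps the names stable between runs, so references rewritten in the HTML and OPF continue to match the extracted files.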

AZW4 files apply a similar wrapper to PDF data, which can likewise be extracted into a PDF file.

Note: This program does not work on DRM'd files.

Using the program

If you are on Windows and would prefer to use the GUI rather than the command line, you need to fully install the free community edition of ActiveState ActivePython 2.7.X.

  1. Download the attached
  2. Unzip it (right-click and "Extract All" in Windows)
  3. Inside the newly extracted Mobi_Unpack_v0.39 folder double-click Mobi_Unpack.pyw
  4. In the window that pops up:
  • Hit the first Browse... button and select your input mobi ebook file
  • Hit the second Browse... button and select a destination folder for the unpacked files
  • If you want to split combination mobis, examine the raw markup language, or turn on verbose debugging, check the appropriate boxes
  • Hit the "Start" button - The unpacking will start and progress messages and any errors will be indicated in the scrollable Log window. If you run into problems, this Log output may be useful in finding and fixing the issue.

Then look in your destination folder for a mobi7 folder. Inside it you will find the HTML file, the images directory, toc.ncx, and content.opf that were processed and stored inside your mobi. You can edit the HTML any way you like and then run KindleGen on the content.opf file to recreate your modified mobi eBook.
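The rebuild step can be sketched as follows. The helper names are hypothetical, and the sketch assumes that the `kindlegen` executable is on your PATH and that it accepts an OPF file plus an optional `-o` output filename. It only locates the OPF file and assembles the command, and it is written for Python 3.

```python
import os


def find_opf(dest_dir, name="content.opf"):
    """Return the path of the first content.opf found under dest_dir, or None."""
    for root, dirs, files in os.walk(dest_dir):
        dirs.sort()  # visit subfolders in a predictable order
        if name in files:
            return os.path.join(root, name)
    return None


def kindlegen_command(opf_path, output_name=None):
    """Build the argument list for running KindleGen on an OPF file."""
    command = ["kindlegen", opf_path]
    if output_name:
        # -o sets the name of the generated file
        command += ["-o", output_name]
    return command
```

The resulting list can be passed to `subprocess.run` once the edited files are ready.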

There is a plugin for Calibre for this program.

Changes

  • Version 45 works with both older mobi and newer KF8 mobi formats. It includes a Graphical User Interface frontend.
  • Version 47 adds support for obfuscated fonts and OpenType fonts, plus a number of other bug fixes.
  • Version 75 adds more dictionary support.

For more information
