PalmDOC

From MobileRead

Revision as of 05:07, 21 April 2009 by 93.44.100.72 (Talk)

PalmDOC is often known in the Palm community as the DOC format, short for document, a term that can be confusing. DOC identifies Microsoft Word documents, which use a .doc extension, and it also identifies an eBook format originally used on PalmOS devices by Aportis and called variously DOC, Aportis DOC, and most recently PalmDoc or PalmDOC. Aportis is no longer in business, but the format remains very popular for eBooks.

PalmDOC

The default text file database format on Palm devices is the "DOC" file. It starts as a plain ASCII text file that uses line ends to mark the end of each paragraph, and it is compressed with a variant of LZ77 (sometimes loosely described as RLE) to save space in RAM. Available DOC readers decompress the file on the fly as they display it. The compression method is unbalanced: decompression is much faster than compression. Most DOC files are therefore created on desktops and synced to the target device, as compressing on the Palm is slow. Compression reduces file size by approximately 40%.

PalmDOC readers are expected to reflow the text to the edges of the screen display. Very little formatting is available in PalmDOC files, but they can have bookmarks. Bookmark support is often used to provide a semblance of a Table of Contents, and the bookmark data is saved at the end of the file. Not all readers support this bookmark capability. For more information on Palm Database files and other Palm eBook formats see PDB.

History

In 1996, Rick Bram developed a method to compress text files for the Palm OS. He called the format "Palm Doc". In 1997, Aportis Technologies Corporation bought the rights to PalmDoc and renamed it AportisDoc. As of December 31, 2002, Aportis had ceased operations. The format is very popular, and the original name is once again the current name for the format. PalmDOC files typically have a .pdb extension but are occasionally found with a .prc extension.

Format

The format is that of a standard Palm Database Format file. The header of that format includes the name of the database (usually the book title and sometimes a portion of the author's name), which is up to 31 bytes of data. This string of characters is terminated with a 0 byte in the C style. The files are identified by a Creator ID of REAd and a Type of TEXt.
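As an illustration, the identification described above could be checked with a few lines of Python. This is a minimal sketch, not a reference implementation: the function names are hypothetical, and the 78-byte header length and the offsets of the Type (60) and Creator ID (64) fields are assumptions taken from the usual description of the Palm Database header, not from this page.

```python
import struct
from typing import Tuple


def read_pdb_identity(header: bytes) -> Tuple[str, str, str]:
    """Return (name, type, creator) from the start of a PDB file."""
    if len(header) < 78:  # assumed size of the fixed PDB header
        raise ValueError("need at least 78 bytes of PDB header")
    # The name field holds up to 31 bytes of text followed by a 0 terminator.
    name = header[0:32].split(b"\x00", 1)[0].decode("latin-1")
    # Assumed offsets: Type at byte 60, Creator ID at byte 64, 4 bytes each.
    db_type, creator = struct.unpack_from(">4s4s", header, 60)
    return name, db_type.decode("latin-1"), creator.decode("latin-1")


def looks_like_palmdoc(header: bytes) -> bool:
    """True when the header carries Type TEXt and Creator ID REAd."""
    _, db_type, creator = read_pdb_identity(header)
    return db_type == "TEXt" and creator == "REAd"
```

In use, the first 78 bytes of a .pdb file would be read in binary mode and passed to `looks_like_palmdoc`.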

The first record in the Palm Database Format gives more information about the PalmDOC file and contains 16 bytes:

bytes  content           comments
2      Compression       1 = no compression, 2 = PalmDOC compression (see below)
2      Unused            Always zero
4      Text length       Uncompressed length of the entire text of the book
2      Record count      Number of PDB records used for the text of the book
2      Record size       Maximum size of each record containing text, always 4096
4      Current position  Current reading position, as an offset into the uncompressed text
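The 16-byte layout above maps directly onto a fixed-size binary structure. The following is a minimal sketch of how it might be unpacked in Python; the class and function names are hypothetical, and big-endian byte order is an assumption (it matches the Motorola 68k processors of the original Palm devices but is not stated on this page).

```python
import struct
from typing import NamedTuple


class PalmDocHeader(NamedTuple):
    compression: int       # 1 = no compression, 2 = PalmDOC compression
    unused: int            # always zero
    text_length: int       # uncompressed length of the entire text
    record_count: int      # number of PDB records holding text
    record_size: int       # maximum size of each text record, always 4096
    current_position: int  # reading position in the uncompressed text


def parse_palmdoc_header(record0: bytes) -> PalmDocHeader:
    """Unpack the first 16 bytes of record 0 (big-endian assumed)."""
    if len(record0) < 16:
        raise ValueError("record 0 must contain at least 16 bytes")
    # H = 2-byte unsigned, I = 4-byte unsigned: 2 + 2 + 4 + 2 + 2 + 4 = 16.
    return PalmDocHeader(*struct.unpack(">HHIHHI", record0[:16]))
```

A reader could use `compression` to decide whether the text records need decompressing and `record_count` to know how many records to read.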

PalmDOC uses LZ77 compression techniques. DOC files can contain only text, normally compressed; the format does not allow for any text formatting. This keeps files small, in keeping with the Palm philosophy. However, extensions to the format can use tags, such as HTML or PML, to include formatting within the text. These extensions to PalmDoc are not interchangeable and are the basis for most eBook Reader formats on Palm devices.

LZ77 algorithms achieve compression by replacing portions of the data with references to matching data that has already passed through both encoder and decoder. A match is encoded by a pair of numbers called a length-distance pair, which is equivalent to the statement "each of the next length characters is equal to the character exactly distance characters behind it in the uncompressed stream." (The "distance" is sometimes called the "offset" instead.)
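The quoted statement can be turned into code almost word for word. The sketch below is a generic illustration of how a decoder might apply one length-distance pair; it is not specific to PalmDOC, and the function name is hypothetical. Copying one byte at a time matters, because the length may exceed the distance, in which case the match overlaps the bytes being written.

```python
def copy_match(output: bytearray, distance: int, length: int) -> None:
    """Append `length` bytes, each equal to the byte `distance` behind it."""
    if not 0 < distance <= len(output):
        raise ValueError("distance must point inside the data decoded so far")
    for _ in range(length):
        # output grows on every pass, so -distance always refers to the byte
        # exactly `distance` positions behind the one about to be written.
        output.append(output[-distance])
```

For example, starting from the three bytes `abc`, a pair with distance 3 and length 6 repeats the pattern twice and yields `abcabcabc`.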

In the PalmDoc format, a length-distance pair is always encoded by a two-byte sequence. Of the 16 bits that make up these two bytes, 11 bits go to encoding the distance, 3 go to encoding the length, and the remaining two are used to make sure the decoder can identify the first byte as the beginning of such a two-byte sequence.
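A decoder could split such a two-byte sequence as sketched below. The bit counts come from the paragraph above; the exact arrangement is an assumption based on the commonly described layout, namely that the two marker bits are the top bits of the first byte with the value binary 10, that the distance occupies the next 11 bits, that the length occupies the low 3 bits, and that the stored length is biased by 3 (so lengths run from 3 to 10). The function name is hypothetical.

```python
from typing import Tuple


def decode_pair(byte1: int, byte2: int) -> Tuple[int, int]:
    """Split a two-byte sequence into (distance, length).

    Assumed layout, most significant bit first:
    2 marker bits (binary 10), 11 distance bits, 3 length bits.
    """
    if byte1 & 0xC0 != 0x80:
        raise ValueError("first byte does not start a length-distance pair")
    word = (byte1 << 8) | byte2
    distance = (word >> 3) & 0x07FF  # drop the length bits, keep 11 bits
    length = (word & 0x07) + 3       # assumed bias: stored value plus 3
    return distance, length
```

Under these assumptions the bytes 0x80 0x1B decode to a distance of 3 and a length of 6, and the largest encodable pair is a distance of 2047 with a length of 10.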

PalmDOC data is always divided into 4096-byte blocks, and each block is processed independently.
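The blocking step might look like the sketch below, which assumes the 4096-byte limit applies to the uncompressed text and that each resulting block then becomes one text record. The function name is hypothetical, and the compression of each block is left out.

```python
from typing import List


def split_into_blocks(text: bytes, block_size: int = 4096) -> List[bytes]:
    """Cut the text into consecutive blocks of at most block_size bytes."""
    return [text[i:i + block_size] for i in range(0, len(text), block_size)]
```

A 10000-byte text would give three blocks of 4096, 4096, and 1808 bytes. Because the blocks are independent, a reader can decompress only the record that contains the current reading position.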

Bookmarks

PalmDOC supports bookmarks. These are named pointers that refer to an offset in the file. If the file is edited, these offsets may no longer refer to the correct locations. Some reading programs allow the user to enter or edit these bookmarks, while others treat them as a TOC. Some reading programs may ignore them entirely. They are stored at the end of the file itself, so the full file needs to be scanned when loaded to find them. On a standard Palm platform the number of entries is limited to 15, because the category feature is used as a drop-down to provide access to the bookmarks.

Reading PalmDOC

The PalmDOC format can be read by a wide variety of programs. Almost every reader that is supported on Palm devices can read PalmDOC files. In addition, there are programs for most other operating systems that can read these files. There are thousands of files available in this format, but generally not commercial ones. Its lack of support for DRM and for any kind of formatting generally relegates it to casual reading of public domain eBooks. See eBook Software for a list.
