PalmDOC
PalmDOC is often known in the Palm community as the DOC format, short for document, a term that can be confusing. DOC identifies Microsoft Word documents, which use a .doc extension, and it also identifies an eBook format originally used on PalmOS devices by Aportis and called variously DOC, Aportis DOC, and most recently PalmDoc or PalmDOC. Aportis is no longer in business, but the format it introduced remains very popular for eBooks. PalmDOC is one of several PalmOS document formats.
PalmDOC
The default text file database format on Palm devices is the "DOC" file. It starts as a plain ASCII text file that uses line ends to mark the end of paragraphs, and it is converted using a simple LZ77-based compression scheme (described under Format) to save space in RAM. Available DOC readers decompress the file on the fly as they display it. The compression method is unbalanced: a file can be decompressed much faster than it can be compressed. Most DOC files are created on desktops and synced to the target device, as compressing on the Palm is slow. Compression results in an approximate 40% reduction in file size.
PalmDOC files expect to reflow the document to the edges of the screen display. There is very little formatting available in PalmDOC files but they can have bookmarks. Bookmark support is often used to provide a semblance of a Table of Contents and the data is saved at the end of the file. Not all readers support this bookmark capability. For more information on Palm Database files and other Palm eBook formats see PDB.
History
In 1996, Rick Bram developed a method to compress text files for the Palm OS. He called the format "Palm Doc". In 1997, Aportis Technologies Corporation bought the rights to PalmDoc and renamed it AportisDoc. As of December 31, 2002, Aportis had ceased operations. The format remains very popular, and the original name is once again the current name for the format. PalmDOC files typically have a .pdb extension but are occasionally found with a .prc extension.
Format
The format is that of a standard Palm Database Format file. The header of that format includes the name of the database (usually the book title and sometimes a portion of the author's name), which is up to 31 bytes of data. This string of characters is terminated with a 0 byte in the C style. The files are identified by a Type of TEXt and a Creator ID of REAd.
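As a rough illustration, the identification check described above could be sketched in Python as follows. The byte offsets (a 32-byte name field at offset 0, Type at offset 60, Creator ID at offset 64, 78 bytes of header in total) are assumptions taken from the general Palm Database layout rather than from this page, and the function names are hypothetical.

```python
def read_pdb_identity(header: bytes):
    """Return (name, type, creator) from the start of a PDB file.

    Assumed layout (not stated on this page): 32-byte name field at
    offset 0, 4-byte Type at offset 60, 4-byte Creator ID at offset 64.
    """
    if len(header) < 78:
        raise ValueError("expected at least 78 bytes of PDB header")
    # The name is NUL-terminated in the C style, so at most 31 bytes are used.
    raw_name = header[0:32].split(b"\x00", 1)[0]
    db_type = header[60:64]
    creator = header[64:68]
    # Latin-1 never fails to decode; the real character set may differ.
    return raw_name.decode("latin-1"), db_type, creator


def is_palmdoc(header: bytes) -> bool:
    """True if the Type/Creator pair matches PalmDOC (TEXt/REAd)."""
    _, db_type, creator = read_pdb_identity(header)
    return db_type == b"TEXt" and creator == b"REAd"
```

In practice the first 78 bytes of the .pdb file would be read with `open(path, "rb").read(78)` and passed to `is_palmdoc`.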
The first record in the Palm Database Format gives more information about the PalmDOC file, and contains 16 bytes:
bytes | content | comments |
---|---|---|
2 | Compression | 1 = no compression, 2 = PalmDOC compression (see below) |
2 | Unused | Always zero |
4 | Text length | Uncompressed length of the entire text of the book |
2 | Record count | Number of PDB records used for the text of the book |
2 | Record size | Maximum size of each record containing text, always 4096 |
4 | Current Position | Current reading position, as an offset into the uncompressed text |
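The 16-byte layout in the table maps directly onto a fixed-size structure. The following is a minimal sketch of reading it, assuming big-endian byte order (usual for Palm databases, but not stated on this page); the class and function names are hypothetical.

```python
import struct
from typing import NamedTuple


class PalmDocHeader(NamedTuple):
    compression: int       # 1 = none, 2 = PalmDOC compression
    text_length: int       # uncompressed length of the whole text
    record_count: int      # number of PDB records holding text
    record_size: int       # maximum text record size, normally 4096
    current_position: int  # reading position in the uncompressed text


def parse_record0(data: bytes) -> PalmDocHeader:
    """Parse the first 16 bytes of record 0 (big-endian is assumed)."""
    if len(data) < 16:
        raise ValueError("record 0 must hold at least 16 bytes")
    # Field widths from the table: 2 + 2 + 4 + 2 + 2 + 4 = 16 bytes.
    compression, _unused, text_length, record_count, record_size, position = (
        struct.unpack(">HHIHHI", data[:16])
    )
    return PalmDocHeader(compression, text_length, record_count,
                         record_size, position)
```

A reader would typically check `compression` first to decide whether the text records need decompressing at all.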
PalmDOC uses LZ77 compression techniques. DOC files contain only text, optionally compressed; the format itself does not allow for any text formatting. This keeps files small, in keeping with the Palm philosophy. However, extensions to the format can use tags, such as HTML or PML, to include formatting within the text. These extensions to PalmDoc are not interchangeable and are the basis for most eBook Reader formats on Palm devices.
LZ77 compression
LZ77 algorithms achieve compression by replacing portions of the data with references to matching data that has already passed through both encoder and decoder. A match is encoded by a pair of numbers called a length-distance pair, which is equivalent to the statement "each of the next length characters is equal to the character exactly distance characters behind it in the uncompressed stream." (The "distance" is sometimes called the "offset" instead.)
In the PalmDoc format, a length-distance pair is always encoded by a two-byte sequence. Of the 16 bits that make up these two bytes, 11 bits go to encoding the distance, 3 go to encoding the length, and the remaining two are used to make sure the decoder can identify the first byte as the beginning of such a two-byte sequence.
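The bit layout of such a two-byte sequence can be sketched as below. This assumes the marker bits are the two leftmost bits ('10'), that the 11 distance bits sit above the 3 length bits, and that the stored length is the copy length minus 3, as the decoding rules in the next section describe. The function names are hypothetical.

```python
def pack_pair(distance: int, length: int) -> tuple:
    """Encode a length-distance pair as two bytes (assumed bit layout)."""
    if not 1 <= distance <= 2047:
        raise ValueError("distance must fit in 11 bits and be non-zero")
    if not 3 <= length <= 10:
        raise ValueError("length must be between 3 and 10")
    # 2 marker bits '10', then 11 bits of distance, then 3 bits of length - 3.
    word = 0x8000 | (distance << 3) | (length - 3)
    return word >> 8, word & 0xFF


def unpack_pair(first: int, second: int) -> tuple:
    """Decode two bytes back into (distance, length)."""
    if not 0x80 <= first <= 0xBF:
        raise ValueError("first byte does not start a length-distance pair")
    word = ((first << 8) | second) & 0x3FFF  # drop the '10' marker bits
    return word >> 3, (word & 0x07) + 3
```

For example, a copy of 3 bytes from 1 byte back packs to the bytes 0x80 0x08, and the largest pair (distance 2047, length 10) packs to 0xBF 0xFF.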
PalmDoc byte pair compression
PalmDoc combines LZ77 with a simple kind of byte pair compression.
With this algorithm, PalmDoc data is decoded as follows:
Read a byte from the compressed stream. If the byte is:
- 0x00: "1 literal" copy that byte unmodified to the decompressed stream.
- 0x09 to 0x7f: "1 literal" copy that byte unmodified to the decompressed stream.
- 0x01 to 0x08: "literals": the byte is interpreted as a count from 1 to 8, and that many literals are copied unmodified from the compressed stream to the decompressed stream.
- 0x80 to 0xbf: "length, distance" pair: the 2 leftmost bits of this byte ('10') are discarded, and the following 6 bits are combined with the 8 bits of the next byte to make a 14 bit "distance, length" item. Those 14 bits are broken into 11 bits of distance backwards from the current location in the uncompressed text, and 3 bits of length to copy from that point (copying n+3 bytes, 3 to 10 bytes).
- 0xc0 to 0xff: "byte pair": this byte is decoded into 2 characters: a space character, and a character formed by XORing this byte with 0x80.
Repeat from the beginning until there are no more bytes in the compressed data.
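The decoding rules above can be sketched in Python roughly as follows. This is a minimal illustration rather than a reference implementation: it decodes a single compressed block, assumes the distance occupies the upper 11 of the 14 bits, and uses a hypothetical function name.

```python
def decompress_palmdoc(data: bytes) -> bytes:
    """Decode one PalmDoc-compressed block following the rules above."""
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        i += 1
        if b == 0x00 or 0x09 <= b <= 0x7F:
            # Single literal, copied unmodified.
            out.append(b)
        elif 0x01 <= b <= 0x08:
            # The byte is a count of literals that follow.
            out += data[i:i + b]
            i += b
        elif 0x80 <= b <= 0xBF:
            # Length-distance pair spread over this byte and the next one.
            if i >= n:
                raise ValueError("truncated length-distance pair")
            word = ((b << 8) | data[i]) & 0x3FFF
            i += 1
            distance = word >> 3          # assumed: upper 11 bits
            length = (word & 0x07) + 3    # 3 to 10 bytes
            if distance == 0 or distance > len(out):
                raise ValueError("distance points outside decoded text")
            # Copy one byte at a time: the source may overlap the output.
            for _ in range(length):
                out.append(out[-distance])
        else:
            # 0xC0 to 0xFF: a space followed by the byte XORed with 0x80.
            out.append(0x20)
            out.append(b ^ 0x80)
    return bytes(out)
```

The byte-at-a-time copy matters when the length exceeds the distance, since the copy then reads bytes it has only just written.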
PalmDOC text is always divided into 4096-byte blocks, and each block is compressed and decompressed independently.
Occasionally, to obfuscate the data, some PalmDOC creation software will take the final, compressed data, and XOR each byte with 0xA5. So, if you're trying to decode a .pdb and the data looks all wrong, try doing an XOR with 0xA5 on all the data, then applying the rules listed above.
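Undoing that obfuscation is a one-line transformation. A minimal sketch, with a hypothetical function name:

```python
def xor_a5(data: bytes) -> bytes:
    """XOR every byte with 0xA5; applying it twice restores the input."""
    return bytes(b ^ 0xA5 for b in data)
```

Because XOR is its own inverse, the same function both obfuscates and de-obfuscates. The result would then be decoded with the normal PalmDOC rules.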
Bookmarks
PalmDOC does have support for bookmarks. These pointers are named and refer to an offset location in a file. If the file is edited, these offsets may no longer refer to the correct locations. Some reading programs allow the user to enter or edit these bookmarks while others treat them as a TOC. Some reading programs may ignore them entirely. They are stored at the end of the file itself, so the full file needs to be scanned when loaded to find them. On a standard Palm platform the number of entries is limited to 15, because the category feature is used as a drop-down to provide access to the bookmarks.
Reading PalmDOC
PalmDOC format can be read by a wide variety of programs. Almost every reader that is supported on Palm devices can read PalmDOC files. In addition, there are programs for most other operating systems that can also read these files. There are thousands of files available in this format, but generally not commercial files. Its inability to support DRM or any kind of formatting generally relegates it to casual reading of public domain eBooks. See eBook Software for a list. Sometimes a reader or program that is designed to read MOBI will also read PalmDOC, as MOBI is really a superset of this format.
Dedicated Readers that can read this format include several models of Hanlin eBook Readers.