DOC
From MobileRead
DOC, short for document, is an ambiguous term. It identifies Microsoft Word documents, which use a .doc extension, and it also identifies an eBook format originally used on PalmOS devices by Aportis and called variously DOC, Aportis DOC, and most recently PalmDoc or PalmDOC. Aportis is no longer in business, but the format it introduced remains a very popular format for eBooks.
Microsoft Word DOC files
When DOC is used to refer to Microsoft Word documents, it can refer to any of several versions of .doc files, from the original DOS files to Word 2003 files. (The native format for Word 2007 is Office Open XML, which uses the .docx extension.) There is no guarantee that a product that claims to read DOC files will be able to read any particular version of .doc file or support all of the features of that version.
To aid in the exchange of .doc files between programs, every Word version since Word 97 has been able to save its files in a Word 97 format. Saving a Word file in this format will lose any features that were added to the file format after Word 97.
Word .doc files are used as a source format for some eBook files. In addition, there are converters that will convert .doc files directly into a particular eBook format. Many of these converters require that a copy of Word be present on the system to help with the conversion.
Some users might wonder why .doc isn't used more often as an eBook format. Reading a document in Word is not a very satisfying experience: it is easy to change the document accidentally, and scrolling is not the best way to read a book. Some versions of Word have a reading view (called Reading Layout in Word 2003) that can provide a reasonable reading experience for these documents.
Many other word processing programs can read and write .doc compatible files, but they may not support every .doc feature. This can cause problems when trying to read a .doc file with one of these programs. In addition, the eBook translation programs will not look for one of these programs to aid in the translation. The main workaround is to use RTF files as source documents. This format is designed to allow the exchange of documents between different programs and operating systems. All versions of Word can save files in RTF format.
Word is part of the Microsoft Office suite, so .doc files are also classed as Microsoft Office files.
PalmDOC
The default text file database format on Palm devices is the "DOC" file. It starts as a plain ASCII text file that uses line ends to mark the end of each paragraph, and it is then compressed with an LZ77-based scheme (sometimes loosely described as RLE) to save space in RAM. Available DOC readers decompress the file on the fly as they display it. The compression method is asymmetric: a file can be decompressed much faster than it can be compressed. Most DOC files are created on desktops and synced to the target device, as compressing on the Palm is slow. Compression results in an approximate 40% reduction in file size.
PalmDOC readers are expected to reflow the text to the edges of the screen display. There is very little formatting available in PalmDOC files, but they can have bookmarks. Bookmark data is saved at the end of the file and is often used to provide a semblance of a table of contents. Not all readers support this bookmark capability. For more information on Palm Database files and other Palm eBook formats see PDB.
History
In 1996, Rick Bram developed a method to compress text files for the Palm OS. He called the format "Palm Doc". In 1997, Aportis Technologies Corporation bought the rights to PalmDoc and renamed it AportisDoc. Aportis ceased operations as of December 31, 2002. The format remains very popular, and the original name is once again the current name for the format. PalmDOC files will typically have a .pdb extension but will occasionally be found with a .prc extension.
Format
The initial header for the file is the standard Palm Database Format header, which includes the name of the database (usually the book title and sometimes a portion of the author's name) in up to 31 bytes of data. The files are identified by a Creator ID of REAd and a Type of TEXt.
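As an illustration of the identification described above, the following is a minimal sketch of reading the name, Type and Creator ID from a Palm database header. The byte offsets (a 32-byte name field at offset 0, Type at offset 60, Creator ID at offset 64, 78 bytes in total) follow the commonly documented Palm Database layout and are assumptions not stated in this article; the function names are hypothetical.

```python
import struct

PDB_HEADER_SIZE = 78  # assumed size of the fixed Palm Database header


def read_pdb_identity(header: bytes):
    """Return (name, type, creator) from the start of a Palm database file."""
    if len(header) < PDB_HEADER_SIZE:
        raise ValueError("a Palm database header needs at least 78 bytes")
    # The name field is 32 bytes: up to 31 bytes of data plus a NUL terminator.
    name = header[0:32].split(b"\x00", 1)[0].decode("latin-1")
    # Type and Creator ID are two consecutive 4-byte codes (assumed offset 60).
    db_type, creator = struct.unpack_from(">4s4s", header, 60)
    return name, db_type.decode("latin-1"), creator.decode("latin-1")


def looks_like_palmdoc(header: bytes) -> bool:
    """True when the header carries the PalmDOC Type and Creator ID."""
    _, db_type, creator = read_pdb_identity(header)
    return db_type == "TEXt" and creator == "REAd"
```

A typical use would be to read the first 78 bytes of a .pdb file and pass them to `looks_like_palmdoc` before attempting to decode the text records.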
PalmDOC uses LZ77 compression techniques. DOC files can contain only compressed text, and the format does not allow for any text formatting. This keeps files small, in line with the Palm philosophy. However, extensions to the format can use tags, such as HTML or PML, to include formatting within the text. These extensions to PalmDoc are not interchangeable with one another and are the basis for most eBook reader formats on Palm devices.
LZ77 algorithms achieve compression by replacing portions of the data with references to matching data that has already passed through both encoder and decoder. A match is encoded by a pair of numbers called a length-distance pair, which is equivalent to the statement "each of the next length characters is equal to the character exactly distance characters behind it in the uncompressed stream." (The "distance" is sometimes called the "offset" instead.)
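The quoted rule can be sketched in a few lines. This is a generic illustration of how a decoder applies one length-distance pair, not PalmDOC-specific code, and the function name `copy_match` is hypothetical. Copying one byte at a time matters: when the length exceeds the distance, the copy reads bytes that it has only just written.

```python
def copy_match(output: bytearray, distance: int, length: int) -> None:
    """Append `length` bytes, each equal to the byte `distance` positions back."""
    if not 1 <= distance <= len(output):
        raise ValueError("distance must point inside the data decoded so far")
    for _ in range(length):
        # Index -distance is re-evaluated after every append, so the source
        # position advances together with the end of the output.
        output.append(output[-distance])


decoded = bytearray(b"abc")
copy_match(decoded, distance=3, length=6)
print(decoded.decode("ascii"))  # abcabcabc
```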
In the PalmDoc format, a length-distance pair is always encoded by a two-byte sequence. Of the 16 bits that make up these two bytes, 11 bits go to encoding the distance, 3 go to encoding the length, and the remaining two are used to make sure the decoder can identify the first byte as the beginning of such a two-byte sequence.
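The bit layout above can be unpacked as in the following sketch. Two details are assumptions taken from commonly published descriptions of the format rather than from the text above: the two marker bits are `10` (so the first byte lies in the range 0x80 to 0xBF), and the stored 3-bit length is biased by 3, giving match lengths of 3 to 10. The function name `decode_pair` is hypothetical.

```python
def decode_pair(first: int, second: int):
    """Split a two-byte PalmDoc sequence into (distance, length)."""
    # Assumed marker: the top two bits of the first byte are binary 10.
    if not 0x80 <= first <= 0xBF:
        raise ValueError("first byte does not start a length-distance pair")
    word = (first << 8) | second
    distance = (word >> 3) & 0x7FF  # the 11 bits that follow the marker
    length = (word & 0x07) + 3      # low 3 bits, with the assumed bias of 3
    return distance, length
```

Under these assumptions, the bytes 0x80 0x1B decode to a distance of 3 and a length of 6.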
PalmDOC text is always divided into 4096-byte blocks, and each block is compressed and decompressed independently of the others.
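The block division can be sketched as follows. Only the 4096-byte block size comes from the text; the function name `split_blocks` is hypothetical, and the sketch shows the splitting step only, not the compression of each block.

```python
BLOCK_SIZE = 4096  # uncompressed bytes per block


def split_blocks(text: bytes, block_size: int = BLOCK_SIZE):
    """Cut the uncompressed text into consecutive blocks; the last may be short."""
    return [text[i:i + block_size] for i in range(0, len(text), block_size)]
```

Because each block stands alone, a reader can jump to any position in a book by decoding only the block that contains it.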
Reading PalmDOC
The PalmDOC format can be read by a wide variety of programs. Almost every reader that is supported on Palm devices can read PalmDOC files. In addition, there are programs for most other operating systems that can also read these files. There are thousands of files available in this format, but generally not commercial files. Because the format supports neither DRM nor any kind of formatting, it is generally relegated to casual reading of public domain eBooks.

