Making eReader files using OpenOffice and DropBook
Original by Chris Meadows (Robotech_Master)
Overview
This page describes how to use a macro that can be installed in the OpenOffice Writer application. Once it is installed, you will find the odt2pml toolbar, which can be used to convert an ODT file to PML format. Once this is done, the macro will use DropBook (a free conversion tool) to create an eReader-compatible file.
Introduction
eReader has long been one of my favorite e-book readers for any platform, and it particularly shines on the iPhone. It has a number of advantages: it is free, it has a better user interface and offers a much more polished reading experience than either Stanza or Bookshelf (although it does not read as many formats as either), and it is integrated with the eReader and Fictionwise stores far better than Stanza.
But it has another advantage, at least to me: it is remarkably easy to mark up and compile books in PML, eReader’s markup language. (PML originally stood for Peanut Markup Language, having been created back when eReader was still Peanut Press. They were fortunate when they were bought by Palm and changed their name to Palm Digital Media, since they could retrofit the acronym to Palm Markup Language...but now that they’re eReader, the “p” has been more or less orphaned.)
You can create PML books simply by placing the right codes in the right places in the document, then dragging and dropping. You do not have to worry about puzzling out Mobipocket’s obscure table-of-contents builder, or trying to find a program to build ePub. You just do a bit of formatting in the text file, drag it and drop it onto DropBook, and it’s done.
Thus, I have marked up a number of books in PML, both for public distribution and for my own private use. Since eReader has recently amended their EULAs to allow for royalty-free business use of the eReader format, this seems like a good time to go over my personal Peanut mark-up conversion process.
I expect there may be easier ways to do some of the things I do, as I am somewhat self-taught in this process. I used to do it entirely by hand, much like hand-editing HTML bookmark lists. (Fortunately, I have since found an OpenOffice script to automate part of the process.) I will welcome suggestions for improvement in the comments.
Note that, though Fictionwise and eReader sell an “Ebook Studio” program, you really don’t need anything more than Palm’s free DropBook software and a little patience.
Install DropBook
The first thing you should do is download and install the DropBook software (Windows, Mac) from eReader. DropBook is the book conversion and compilation utility. All you need do is drag the PML file onto its window, then “drop” it.
The conversion macro that I use (see below) will try to launch DropBook automatically. For it to be able to do that, you should be sure you have it installed first.
May the Source Be With You
The first thing to consider in making an eReader book is what condition your source file is in. Is it a text file, HTML, RTF?
It used to be necessary to do some pretty complicated search-and-replace procedures to change a file from whatever format it was in (by replacing HTML <i> notations with PML \i for instance) but thankfully technology has improved since then.
There are now macros for MS Word and OpenOffice Writer that can take a source file from the word processor and output a raw PML file. I use the OpenOffice macro, odt2pml, which seems to be kept more current than any version of the Word macro I can find. Even if you are importing plain ASCII text files, such as you might find in Project Gutenberg, using the OpenOffice macros can still be quite helpful.
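For comparison, the kind of manual substitution mentioned above (swapping HTML <i> tags for the PML \i code) can be sketched in a few lines of Python. This is only an illustration of the old hand-conversion approach, not what the odt2pml macro does internally; the function name is made up for the example.

```python
import re


def html_italics_to_pml(text: str) -> str:
    """Replace HTML <i> and </i> tags with the PML italic toggle \\i.

    PML uses the same code to switch italics on and off, so both the
    opening and the closing tag map to the same replacement.
    """
    # A function is used as the replacement so that the backslash in
    # "\i" is inserted literally instead of being read as an escape.
    return re.sub(r"</?i>", lambda match: r"\i", text, flags=re.IGNORECASE)


if __name__ == "__main__":
    print(html_italics_to_pml("It was a <i>very</i> long night."))
```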
Begin
The first step, then, is to load the source file into OpenOffice. If multiple versions of the source are available, using a version that contains the entire book within a single file is preferable. Otherwise, you will have to insert or merge all the files together into a single long file before you can begin.
Once the file is loaded, save it as an ODT file (OpenOffice Writer’s default format). The file must be saved in that format for the macro to be able to process it. Then find the odt2pml toolbar (which you should have if you have installed the macro).
odt2pml
The odt2pml toolbar has four buttons on it:
- Format Ascii Text: This removes excess paragraph breaks if you are reformatting a text file.
- Extract Pictures: This extracts any pictures from the file (such as cover images) and places them in a properly-named subfolder, linked to your document.
- Apply Header Styles: This searches for headers that start with a particular word, such as “CHAPTER,” and applies a given level of header style to them. For most books, you should only ever need to use header level 1.
The above three buttons must be used before the fourth one:
- Export to eReader: This runs the main macro and generates the PML and PDB files.
The PML file is the raw markup, the “source” file. The PDB is the compiled form that is loaded into your reader.
Once you have exported the PML, your work may only just be beginning. Depending on how clean your source file was, there may be a number of errors that need cleaning up.
Peanut Markup Language
For the next stage of editing, I like to use emacs, or an emacs-like text editor (for Windows, I recommend NotGNU) because they allow you to search-and-replace on carriage returns. (This does, of course, require some knowledge of emacs’s somewhat obscure command syntax, which I have picked up over time. Unfortunately, I can’t help with that other than to direct you to a cheat sheet.) Some people may have created macros or scripts to do this sort of thing, but I have come to rely on simple searching and replacing.
For this stage of editing, you will need to bookmark these pages and keep them around for reference: the Peanut Markup Language formatting reference, and PML special characters reference. (Oddly enough, the characters reference does not appear to be linked from any of the other help pages anymore; I had to find it through a search engine.) They will be handy in figuring out your manual markup.
As you will see from looking at those references, PML formatting is handled by a sort of pidgin HTML code, and is actually reminiscent of old pre-WYSIWYG word processors such as WordStar or early WordPerfect. You place a code like \i at the beginning of text that is to be italicized, and then another \i at the end to turn it off. There is an example of formatted text included with DropBook that demonstrates some of the more common markup.
In this stage of editing, we will be correcting the markup in the macro-generated document.
Chapter Headings
The first step is to take a look at the PDB file that OpenOffice has generated. To save time in syncing, I generally use the desktop eReader client for this. The first thing I do is look at the table of contents.
In creating chapter titles, odt2pml uses the document’s heading styles. If there is more than one level of headings, there will be more than one level of chapter headers in the table of contents. These may or may not be appropriate. Happily, there is an easy way to get rid of extras.
The chapter titles are defined using \X1, \X2, \X3, etc. in descending levels of nesting. To get rid of anything below the main chapter titles, just search and replace \X2, \X3, etc. with nothing.
One decision made by the author of the macro that I don’t agree with is to use the \X1 header for basic chapters instead of \x. This ends up leaving all the chapter titles indented a notch in the table of contents—\X1 is really meant for chapter sub-headings that should not be considered a whole new chapter. I generally use search-and-replace to change these into \x, at the same time removing the pagebreak that \X1 requires but \x does not.
For instance, in a book where the chapter titles are centered, I search for \p<carriage return>\c\X1 and replace it with \c\x. (To produce a carriage return in the search box in emacs, you press Control-Q Control-J.) Then I replace all remaining \X1s with \x.
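For anyone who would rather script these replacements than do them in an editor, the steps above can be sketched in Python. This is a rough sketch, assuming the PML file has already been read into a string and that chapter titles are centered as in the example; the function name is invented for illustration.

```python
import re


def promote_chapter_headings(pml: str) -> str:
    """Turn \\X1 chapter titles into \\x titles and drop deeper levels."""
    # Remove every \X2, \X3, ... code (opening and closing alike), so
    # sub-headings stay in the text but leave the table of contents.
    pml = re.sub(r"\\X[2-9]", "", pml)

    # Replace "\p<line break>\c\X1" with "\c\x", dropping the explicit
    # page break. "\r?\n" accepts both Unix and Windows line endings.
    pml = re.sub(r"\\p\r?\n\\c\\X1", lambda match: r"\c\x", pml)

    # Whatever \X1 codes are left (mostly the closing ones) become \x.
    return pml.replace(r"\X1", r"\x")


if __name__ == "__main__":
    sample = "\\p\n\\c\\X1Chapter One\\X1\\c\n\nSome text."
    print(promote_chapter_headings(sample))
```

It is worth running this on a copy of the file and checking the table of contents in the desktop reader afterwards.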
Tabs
If your document originally had tab-indented paragraphs, then it will have preserved the tab indents in the PML file by starting each line with \t \t (with one or more spaces between the \ts). You may wish to keep those if it is your preference.
I much prefer to have untabbed but separated paragraphs on a screen the size of a PDA’s or iPhone’s (especially since the OpenOffice macro already inserts a blank line between each paragraph anyway), so I simply remove the tab codes by search-and-replacing them with nothing.
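The same removal can be sketched as a regular-expression replacement in Python. This assumes the indent codes look exactly as described above (a \t, one or more spaces, and another \t at the start of a line); if your file differs, adjust the pattern. The function name is made up for the example.

```python
import re


def strip_tab_indents(pml: str) -> str:
    """Remove a leading '\\t <spaces> \\t' indent code from every line."""
    # re.MULTILINE makes ^ match at the start of each line, not just
    # at the start of the whole file.
    return re.sub(r"^\\t +\\t", "", pml, flags=re.MULTILINE)


if __name__ == "__main__":
    print(strip_tab_indents("\\t \\tFirst paragraph.\n\n\\t   \\tSecond paragraph."))
```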
Quotation Marks and Apostrophes
One small thing that I like to change has to do with quotation marks. A lot of source documents will only feature the straight up-and-down quotation marks and apostrophes—not the “curly” smartquotes that you see in professionally-published books and magazines.
I think it adds a lot to a book to go in and change those to the curly kind. But they have to be changed to the right curly-kind: the ones that point up at the beginning of a statement, and the ones that point down at the end. The codes you need to know for this are:
- \a145 ‘
- \a146 ’
- \a147 “
- \a148 ”
I will often do the single-quotation marks first, because there are usually fewer of them. In books that don’t use apostrophes to render dialect, this is usually pretty simple: I just search on <space><apostrophe> and <quotation mark><apostrophe>, because those are generally the only cases where an apostrophe needs to become an \a145 left-single-quote.
In non-dialect books, there are usually so few of these that I can get away with just searching on them manually, and replacing the ones that need to be replaced. This is important because words like “’til” and “’bout” need to start with right single-quotes.
In books with lots of dialect, I will generally start searching manually, but do mass S&Rs on given words as I find them. “’til” gets S&R’d to “\a146til” and so on, until I run out of dialect words. Then I search and replace every remaining apostrophe that follows a space or a quotation mark with a left one, and every other apostrophe with a right one.
After that, double-quotation-marks are easy. Replace any <carriage-return><quotation mark> and <space><quotation mark> instances with left quotation marks (\a147), and then all the rest with right quotation marks (\a148).
Invariably, some quotation marks will be missed—I’ll get a word wrong, or there will be strange cases of dialogue being preceded by an emdash. But I’ll catch this when I proofread through the book when I’m “finished.”
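The order of replacements described above can be sketched in Python as follows. This is only an approximation of the manual process: the function name and the dialect_words parameter are made up for the example, and it will miss the same odd cases (such as dialogue after an emdash) that manual search-and-replace does.

```python
import re

LEFT_SINGLE, RIGHT_SINGLE = r"\a145", r"\a146"
LEFT_DOUBLE, RIGHT_DOUBLE = r"\a147", r"\a148"


def curl_quotes(pml: str, dialect_words=()) -> str:
    """Replace straight quotes with PML curly-quote codes."""
    # 1. Dialect words such as 'til start with a RIGHT single quote,
    #    so handle them before the general left-quote rule.
    for word in dialect_words:
        if word.startswith("'"):
            pml = pml.replace(word, RIGHT_SINGLE + word[1:])

    # 2. An apostrophe after a space or a straight double quote opens
    #    a quotation, so it becomes a left single quote.
    pml = re.sub("(?<=[ \"])'", lambda match: LEFT_SINGLE, pml)

    # 3. Every other apostrophe becomes a right single quote.
    pml = pml.replace("'", RIGHT_SINGLE)

    # 4. A double quote at the start of a line or after a space opens
    #    a quotation.
    pml = re.sub(r'(?:^|(?<= ))"', lambda match: LEFT_DOUBLE, pml,
                 flags=re.MULTILINE)

    # 5. All remaining double quotes close one.
    return pml.replace('"', RIGHT_DOUBLE)


if __name__ == "__main__":
    line = "\"I'll wait 'til he says 'hello' to me,\" she said."
    print(curl_quotes(line, dialect_words=["'til"]))
```

Single quotes are handled first because step 2 relies on the straight double quotes still being in place. Note that the invisible \v header at the top of the file also contains straight double quotes, so a script like this should be run on the body text only.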
Emdashes, Ellipses, Non-Breaking Spaces, & Miscellaneous
The OpenOffice macro should generally replace emdashes and ellipses with their special-character code equivalents (\a151 and \a133 respectively). But if it did not, or if the source document has two hyphens instead of an emdash, it might not hurt to do a quick search-and-replace to take care of it.
Also, in some cases a document may use four periods, or an ellipsis followed by a period, or put spaces around the ellipsis. These should be replaced with just an ellipsis and no spaces.
Depending on the quality of the source, there may also be some non-breaking spaces embedded in the document, to make things look peculiar. These will have been rendered as \a160 by the macro; search on that and see if they need to be replaced with regular spaces, or just removed altogether.
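These clean-ups can also be sketched as a small Python function. It follows the rules above quite literally (including removing every space next to an ellipsis), so treat it as a starting point rather than a finished tool; the function name and the nbsp_replacement parameter are invented for the example.

```python
import re


def normalize_punctuation(pml: str, nbsp_replacement: str = " ") -> str:
    """Tidy emdashes, ellipses and non-breaking spaces in PML text."""
    # Two hyphens stand in for an emdash in many plain-text sources.
    pml = pml.replace("--", r"\a151")

    # Three or four literal periods become a single ellipsis code.
    pml = re.sub(r"\.{3,4}", lambda match: r"\a133", pml)

    # An ellipsis code followed by a period loses the extra period.
    pml = pml.replace(r"\a133.", r"\a133")

    # Remove any spaces on either side of an ellipsis code.
    pml = re.sub(r" *\\a133 *", lambda match: r"\a133", pml)

    # Non-breaking spaces become regular spaces by default; pass
    # nbsp_replacement="" to drop them altogether.
    return pml.replace(r"\a160", nbsp_replacement)


if __name__ == "__main__":
    print(normalize_punctuation("Wait -- no .... well\\a133. fine\\a160here"))
```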
There may be other miscellaneous issues of formatting depending on how “clean” the source documents were. Subheaders may have been bolded in some places but italicized in others, and so on. If you spot one such header, you can search on the markup code used for it to try to find others. But you will probably end up spotting most of these glitches during the proofreading process.
Invisible Publication Information Header
At the top of the document will be an invisible header, denoted by a \v at the start and the end of the field. Within this header will be a TITLE field: TITLE="<name of your document>". Make sure that the document name is spelled right, as this is what will appear in the book list when your book is loaded into eReader.
You may also want to add AUTHOR, PUBLISHER, and COPYRIGHT headers, so that information will be included in the final book. For AUTHOR, it is all right to put “first-name last-name” or “last-name, first-name”. For books with multiple authors, put “last-name, first-name and first-name2 last-name2” (e.g. “Lee, Sharon and Steve Miller”).
If this information is not included in the headers, it will not show up in the book list. If even the TITLE line is removed, you will be prompted with a pop-up asking you to enter the information every time you recompile the book. This is annoying, so be sure you put all this information in now.
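As a rough illustration, a header of this kind could be assembled with a small Python helper. The layout shown here (all fields on one line, separated by spaces, each value in straight double quotes) is an assumption; compare it against the header the macro generated for your own file before relying on it. The function and parameter names are made up for the example.

```python
def build_pml_header(title, author=None, publisher=None, copyright_notice=None):
    """Build an invisible \\v ... \\v publication header for a PML file."""
    fields = [
        ("TITLE", title),
        ("AUTHOR", author),
        ("PUBLISHER", publisher),
        ("COPYRIGHT", copyright_notice),
    ]
    # Only fields that were actually supplied are written out.
    body = " ".join(f'{name}="{value}"' for name, value in fields if value)
    return "\\v" + body + "\\v"


if __name__ == "__main__":
    # "My Book" is a placeholder title.
    print(build_pml_header("My Book", author="Lee, Sharon and Steve Miller"))
```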
Compiling
To compile the book, simply drag the text file icon from the folder in which it resides over to the “DropBook” screen, and drop it where it says “Drop Files Here.” Assuming there are no coding errors, a PDB file will pop right out. If there are errors, look at the line numbers and try to find out what they are and fix them. If you will be recompiling the book multiple times (and you probably will be), make sure to place a check in the “Overwrite existing file” box.
When recompiling a book, make sure that the book file is not currently open in the desktop eReader; if it is, DropBook will not overwrite the file.
Proofreading
Once the book is compiled, load it into your reader (be it desktop or on your PDA) and read it all the way through. Keep the PML file handy in a text editor, so that you can make corrections whenever you run across a mistake—and you probably will run across plenty of them.
Every so often, recompile and reload the book so you can glance back and make sure your errors were fixed properly, and that you didn’t accidentally introduce a worse error as part of one of your fixes.
By the time you are done with this stage, you should have a properly-formatted eReader book! You can sync it to any device that has an eReader reader, or upload it to the Personal Content section of your eReader.com, Fictionwise.com, or Stanza.Fictionwise.com stores to download to your iPhone.
(Stanza readers should note that a number of formatting commands in PML, such as pagebreak and center alignment, will not be interpreted by Stanza. Your eReader book will still look best on eReader.)
For more information
- To discuss this topic, go here.
- Download for ODT2PML