PDF

From MobileRead

PDF stands for Portable Document Format and was created in 1993 by Adobe Systems for the interchange of documents. It was initially designed as a print format similar to PostScript, and even today it is often used to exchange data that will be printed. Because it was designed as a print format, it specifies the size of the paper needed to reproduce (render) the original. This article focuses on the use of PDF files for mobile eBook reading.

Content

A PDF file may contain several types of data. These include Text, Raster Graphics, Vector Graphics (line art comparable to SVG), Fonts (Glyphs), and metadata. Not all PDF files contain all of these types of data, and some PDF files are not what they seem. For example, a PDF file might look like it has text in it while in fact it is displaying an image (picture) that contains text. Text displayed in this fashion behaves differently from regular text when the file is manipulated by the display software: it typically cannot be searched, selected, or reflowed.
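As a rough illustration of how a file that looks like text may contain only images, the sketch below scans the raw bytes of a PDF for the dictionary markers that usually accompany fonts and image objects. The function name `guess_content_types` is made up for this example, and the check is only a heuristic: newer files may keep these dictionaries inside compressed object streams, where a plain byte search cannot see them.

```python
import re

def guess_content_types(pdf_bytes: bytes) -> dict:
    """Heuristic: report whether raw PDF bytes mention fonts or image objects.

    Hypothetical helper for illustration only. It looks for the name tokens
    /Font and /Subtype /Image in the uncompressed parts of the file.
    """
    has_font = re.search(rb"/Font\b", pdf_bytes) is not None
    has_image = re.search(rb"/Subtype\s*/Image\b", pdf_bytes) is not None
    return {
        "fonts": has_font,
        "images": has_image,
        # Images but no fonts suggests a scanned, image-only document.
        "probably_scanned": has_image and not has_font,
    }

# A fragment resembling an image object dictionary (made-up sample data).
sample = b"<< /Type /XObject /Subtype /Image /Width 1200 /Height 1600 >>"
print(guess_content_types(sample))
```

A result with images but no fonts is a hint, not proof, that the visible "text" is really a picture.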

A PDF may contain tables but there is usually no intelligence to the table construction. The table is made of text placed at specific positions on the page plus a set of graphic lines. It looks like a table to the user, but the file holds no record of rows and columns, so it is generally not possible to extract the table as structured data. A table may also exist as an image in the file.

A PDF, by its very nature, contains pages of information that are to be displayed or printed. The size and rotation of these pages are determined when the file is built, although they can change from page to page. Some viewing software can zoom in or out on a page to make it appear larger or smaller. This sets PDF files apart from many other eBook formats, most of which provide data that is not constrained to a page boundary or required to conform to placement on a fixed size page.
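To make the fixed page size concrete, here is a minimal sketch that converts a page box into displayed inches. It assumes the default PDF unit of 1/72 inch and a page box given as four numbers in the style of a /MediaBox entry; the function name and the sample values are illustrative, not taken from any particular file.

```python
POINTS_PER_INCH = 72  # default PDF user-space unit is 1/72 inch

def displayed_size_inches(media_box, rotate=0):
    """Return (width, height) in inches for a page as displayed.

    media_box is [left, bottom, right, top] in points, in the style of a
    page's /MediaBox entry; rotate is a /Rotate value (a multiple of 90).
    """
    left, bottom, right, top = media_box
    width = (right - left) / POINTS_PER_INCH
    height = (top - bottom) / POINTS_PER_INCH
    if rotate % 180 == 90:
        # A quarter turn swaps the displayed width and height.
        width, height = height, width
    return width, height

print(displayed_size_inches([0, 0, 612, 792]))      # (8.5, 11.0)
print(displayed_size_inches([0, 0, 612, 792], 90))  # (11.0, 8.5)
```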

Fonts and Text

Text in a PDF is referenced to a particular font and font size. This reference may be to fonts that are embedded in the file itself or to external fonts that are expected to be available to the rendering software. If the fonts are not available the output may not render properly. Embedding a font increases the size of the file but guarantees that the characters will be available, which can be especially important for unusual character sets. To limit the size increase, sometimes only the subset of a font's characters actually used in the document is included. Adobe Type 1 (PostScript), TrueType (TTF), and Type 3 fonts (whose glyphs are drawn with PDF graphics operators and are often bitmaps) can all be embedded.
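Subset fonts can usually be recognized by name. By convention in the PDF specification, the name of a subset font starts with a tag of six uppercase letters followed by a plus sign. The sketch below checks for that tag; the function name and the sample font names are made up for illustration.

```python
import re

# Six uppercase letters and "+" mark a subset font, e.g. "ABCDEF+Name".
SUBSET_TAG = re.compile(r"^[A-Z]{6}\+(.+)$")

def describe_font(base_font: str):
    """Return (font name without tag, is_subset) for a font name string."""
    match = SUBSET_TAG.match(base_font)
    if match:
        return (match.group(1), True)
    return (base_font, False)

print(describe_font("ABCDEF+TimesNewRoman"))  # ('TimesNewRoman', True)
print(describe_font("Helvetica"))             # ('Helvetica', False)
```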

Images

Images (Graphics) may be either raster or vector based. A raster image (also known as bitmapped) is often created by scanning a picture. Even a digital camera scans the image to create the file containing the picture. The resolution of the image (pixel width and height) is determined when the file is created, and may be higher than needed for normal viewing so that zooming in retains fidelity. The rendering software scales the picture based on information in the file. Normally zooming in beyond 100% is done by replicating pixels, which gives the image a blocky appearance. Zooming out can sometimes drop narrow lines from the image.
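The two raster effects described above can be sketched with a tiny image held as a list of rows. This is a simplified model, assuming the crudest possible scaling (pixel replication when enlarging, dropping pixels when shrinking); real viewers may use smoother filters.

```python
def zoom_in(pixels, factor):
    """Enlarge a raster image by repeating each pixel factor x factor times."""
    out = []
    for row in pixels:
        wide = [p for p in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out

def zoom_out(pixels, factor):
    """Shrink by keeping every factor-th pixel; thin lines can vanish."""
    return [row[::factor] for row in pixels[::factor]]

# A 4x4 image with a one-pixel-wide vertical line in the second column.
image = [
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
]
print(zoom_in(image, 2)[0])  # [0, 0, 1, 1, 0, 0, 0, 0]
print(zoom_out(image, 2))    # [[0, 0], [0, 0]]
```

After zooming in, the line is two pixels wide and blocky; after zooming out, the line has disappeared entirely because its column was one of those dropped.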

Vector based images are built with lines and mathematical curves. These types of images can be zoomed without losing the quality of the image.

As already mentioned some documents are built totally from scanned images using the PDF format as a container for the images. There may not be any content text inside the file.

Tags

Text may or may not be tagged in a PDF file. Tags are metadata that provide intelligence about the text itself, such as its reading order and structure. They allow the rendering software to move or resize the content intelligently without losing any of it.

Reflow

Being able to rearrange the text is called reflowing the document. It permits a PDF designed for a full sized piece of paper to be easily read on a small device such as a PDA or eBook Reader. Tags are best put in when the document is created, but some tools can add tags after the fact. Some editing tools that create PDF files only when saving or printing the document cannot create tag data, and these files will not support reflow.

Just because the tags are present does not mean that the rendering program (reader) is able to use them. In fact most non-Adobe readers cannot reflow documents. They depend, instead, on zooming and panning the document. Zooming out a document to make the whole page fit on the screen is likely to make the text too small to read easily.
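The arithmetic behind the "too small to read" problem is simple. The sketch below assumes a US Letter page (8.5 inches wide) squeezed onto a screen 3.6 inches wide, roughly the width of a 6 inch reader held in portrait; both numbers are assumptions chosen for illustration.

```python
def fit_width_scale(page_width_in, screen_width_in):
    """Scale factor that makes the full page width fit the screen width."""
    return screen_width_in / page_width_in

def effective_point_size(font_pt, scale):
    """Apparent size of text after the page has been scaled."""
    return font_pt * scale

scale = fit_width_scale(8.5, 3.6)
print(round(scale, 3))                            # 0.424
print(round(effective_point_size(12, scale), 1))  # 5.1
```

Under these assumptions, 12 point body text appears at about 5 points, which is why zooming and panning, or reflow, becomes necessary.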

Reflow is related to zooming in the sense that the text is resized to the original requested text size in the document even when the page is smaller. As a matter of fact the reflow option on the PC version of Adobe Reader is part of the zoom menu item. Even if a document is reflowed the zoom feature can still be used to vary the text size.

One good use of reflow is to display a multicolumn document in one column which can make it much easier to read on electronic devices where the full page cannot be seen at once.
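A toy version of turning two columns into one can be written with Python's standard `textwrap` module. This is only a sketch: it assumes the reading order of the columns is already known (in a real PDF that is what tags provide), and the sample text and function name are invented for the example.

```python
import textwrap

def reflow(columns, width):
    """Join column text in reading order and re-wrap it to a new width."""
    words = []
    for column in columns:      # left column first, then right
        for line in column:
            words.extend(line.split())
    return textwrap.wrap(" ".join(words), width=width)

left = ["Reflow lets a page", "designed for paper"]
right = ["be read on a", "small screen."]
for line in reflow([left, right], width=24):
    print(line)
```

Without the reading order, a program reading straight across the page would interleave the first line of each column and produce nonsense, which is why untagged files reflow poorly or not at all.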

Other features

A PDF file can contain hot links to other places in the file or links to objects outside the file. In addition a PDF can contain a table of contents (TOC) or index with links to places or data in the file. Not all readers support these, and not all editors are capable of adding this data to the file in the first place. Some documents support comments that can be added by reviewers.

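A PDF can also be an archive document (PDF/A) for preserving the content. These kinds of documents always have embedded fonts and restrict some features, such as encryption and LZW compression, so that the file remains readable in the long term.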

PDF Creation

PDF is an ISO standard (ISO 32000) and its makeup is publicly documented, so there are many tools that can create PDF files.

  • Adobe Acrobat, Adobe InDesign, Adobe PageMaker, and Adobe FrameMaker can make intelligent PDF files with TOCs and tagging.
  • PDF Printers - many third party programs work by intercepting the print command and creating a PDF image instead of actual Printer output. These PDF files cannot have much inherent intelligence about the content.
  • Neevia makes a product called docuprinter LT that can use Macros in Word and other Microsoft Office files to generate a PDF with TOC and links. It also has variable compression ability.
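Because the format is documented, even a short script can produce a valid file. The following is a minimal sketch, not a replacement for any of the tools above: it writes a one-page, untagged PDF with a single line of text in the standard Helvetica font. The function name, the default page size (377 x 477 points, roughly 5.24" x 6.63"), and the text position are assumptions made for the example, and no string escaping or compression is attempted.

```python
def build_minimal_pdf(text, width_pt=377, height_pt=477):
    """Return the bytes of a one-page PDF showing `text` in Helvetica.

    Hypothetical helper; `text` must be plain ASCII without parentheses
    or backslashes, since no string escaping is done here.
    """
    content = f"BT /F1 12 Tf 36 {height_pt - 48} Td ({text}) Tj ET".encode("ascii")
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        (f"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 {width_pt} {height_pt}] "
         "/Contents 4 0 R /Resources << /Font << /F1 5 0 R >> >> >>").encode("ascii"),
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))        # byte offset where "N 0 obj" starts
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    xref_start = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"      # each xref entry is exactly 20 bytes
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_start
    return bytes(out)

pdf = build_minimal_pdf("Hello, reader")
print(pdf[:8])  # b'%PDF-1.4'
# To try it in a viewer: open("hello.pdf", "wb").write(pdf)
```

A file built this way has no tags, TOC, or links, which mirrors the point above about print-style PDF output carrying little inherent intelligence about the content.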

PDF Viewers

There are many PDF viewers available. Adobe makes free viewers called Adobe Reader for many platforms and many 3rd party programs exist. All of the versions from Adobe, other than Palm, can read PDF files directly without requiring conversion. A new third party application called PalmPDF is available to read PDF files directly on Palm OS 5 units. It even supports reflow.

Adobe Digital Editions is specifically targeted for eBook Reading. It can read ePUB and PDF documents.

Most 3rd party rendering programs do not support reflow and may not even have TOC capabilities. Some also have very limited zoom and do not support panning.

Tips

The most important tip for eBook users is that you can create custom sized pages for PDF use when you build your own PDF files. Generally this custom sized paper should be 5.24" x 6.63" (6.69?) for good results on a 6" reader (800x600 pixel). You can build this once as a template for your printer and then reference it when needed. Margins can be set to whatever you like.
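As a quick check on custom page sizes, the sketch below computes the page height that would give a page exactly the same shape as a screen, and compares the shape of the 5.24" x 6.63" page with an 800x600 screen held in portrait. The function name is made up for this example, and the calculation only looks at aspect ratio; it does not account for margins or for any screen area the reader software keeps for itself.

```python
def page_height_for_screen(page_width_in, screen_w_px, screen_h_px):
    """Height that gives the page the same aspect ratio as the screen."""
    return page_width_in * screen_h_px / screen_w_px

# In portrait, an 800x600 screen is 600 pixels wide by 800 pixels tall.
print(round(page_height_for_screen(5.24, 600, 800), 2))  # 6.99

# Width-to-height ratio of the suggested page versus the screen.
print(round(5.24 / 6.63, 2), round(600 / 800, 2))        # 0.79 0.75
```

By this measure the suggested page is slightly shorter than a strict 3:4 page of the same width would be, so treat the numbers as a starting point and adjust them for your own device.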

Microsoft ActiveSync will add tags needed for reflow to a PDF file that does not have them while transferring it to your PDA. Once added these tags are permanent. Adobe Acrobat can also add tagging to a PDF file after the fact.

Limitations

While PDF is a very popular format for sharing files on computer systems it does have limitations when used with a portable eBook Reader. Some of these are inherent in the format and some are because the rendering software does not support all of the features available in PDF.

  • A PDF file will generally be much larger than a file in many other eBook formats. This can cause problems in how many eBooks you can have on your device, and it can cause the rendering software to behave sluggishly or even not work on some files. A document in PDF can be significantly different in size depending on how it was created. Some of the issues are:
    • Graphic images mimicking text are much larger.
    • Graphics can be much larger (higher resolution) than is needed.
    • Compression can be adjusted when the file is built to provide for typical viewing or professional typesetting.
    • Editing of the PDF can leave artifacts in the file.
    • Embedding fonts will increase the file size.
  • The complexity of the supported capabilities can be beyond the ability of some rendering programs. This can be very confusing to the user, since some PDF files work fine while others will not display properly or may not even load. Some of the reasons include:
    • PDF supports a wide variety of graphics encodings and some rendering software may not support all of them. JPEG2000 is one common suspect.
    • A PDF file can be built from multiple PDF files. This ability to append and insert files can cause data to not be linear in a file and metadata to be scattered all over the file. This can require loading an entire document into memory to find and display the information properly.
    • The fonts used by the rendering program may not match the source file.
    • PDF file formats have gone through many revisions and some readers may not handle all versions.
  • Page numbers referenced by the program may not match the printed document. This is usually a result of front matter in the document being identified with roman numeral numbered pages while the main document starts the numbering again or uses a chapter based numbering system.
  • Outside of PDAs there are no eBook readers that support reflow.
  • PDF files are sometimes read with a different program on the reader causing the user interface to be different and have different features. This can make it inconsistent with user expectations.
  • Readers will typically squeeze the PDF page to fit the reader screen size. This can make many documents impossible to read due to extremely small print.
  • Multicolumn PDF files can be difficult to read because, when the reader splits the page, you need to back up to the top of the page to start the next column.
  • Images may be distorted or unreadable due to resizing in the rendering software. Good software should provide the ability to separately zoom a particular image.
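
The page number mismatch in the list above can be illustrated with a small mapping from the physical page position, which is what the reader software counts, to the label printed on the page. The sketch assumes a hypothetical book whose first four pages are front matter numbered with roman numerals; the function names are invented for the example. The PDF format does have an optional page label feature that can store this kind of mapping, but not every file or viewer uses it.

```python
def to_roman(n):
    """Lowercase roman numeral for a positive integer."""
    numerals = [
        (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
        (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
        (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
    ]
    out = ""
    for value, symbol in numerals:
        while n >= value:
            out += symbol
            n -= value
    return out

def page_label(index, front_matter_pages):
    """Printed label for a 0-based physical page index."""
    if index < front_matter_pages:
        return to_roman(index + 1)
    # Numbering restarts at 1 after the front matter.
    return str(index - front_matter_pages + 1)

print([page_label(i, 4) for i in range(7)])
# ['i', 'ii', 'iii', 'iv', '1', '2', '3']
```

Here the page the software calls page 5 is the one printed as page 1, which is the source of the confusion.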