Gutenberg

From MobileRead

Project Gutenberg was the first producer of free electronic books (eBooks). Its goal is to release every public domain book in existence. Nearly all of its eBooks are released in .txt format, which can be read with a simple text editor. (Some books are also released in other formats.)

Overview

Project Gutenberg began in 1971 when Michael Hart was given an operator's account with $100,000,000 of computer time in it by the operators of the Xerox Sigma V mainframe at the Materials Research Lab at the University of Illinois. (Actually there was a surplus of time available and Michael knew several of the operators.) An hour and 47 minutes later, he announced that the greatest value created by computers would not be computing, but would be the storage, retrieval, and searching of what was stored in our libraries.

He then proceeded to type in the "Declaration of Independence" and tried to send it to everyone on the network. Initially all of the eBooks in Project Gutenberg were typed in, but today they are generally entered via OCR and then corrected. This has led to far fewer errors. All of the books except very specialized ones are available in TXT format. Many are also available in other, more specialized eBook formats.

Problems and solutions

The problem with .txt eBooks is that they do not lend themselves to elaborate or easy-to-read formatting. They often have fixed-length lines that do not wrap well on the Pocket PC (PPC) or other small screens; if the reading software does not re-wrap the lines, the text must be scrolled sideways. In addition, there is no graphics support, no font size control, and no choice of character set. It is for these reasons that Project Gutenberg has released many eBooks in more advanced formats.
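The wrapping problem can be worked around by re-flowing the text to a narrower width before reading it. The following is a minimal sketch in Python, not a tool mentioned on this page: the function name `rewrap` and the width of 30 columns are illustrative assumptions, and it assumes that paragraphs are separated by blank lines.

```python
import textwrap


def rewrap(text, width=40):
    """Re-flow hard-wrapped paragraphs to at most `width` columns.

    Assumes paragraphs are separated by blank lines.
    """
    rewrapped = []
    for para in text.split("\n\n"):
        # Collapse the original hard line breaks into single spaces.
        joined = " ".join(para.split())
        # Break the paragraph again at the requested width.
        rewrapped.append(textwrap.fill(joined, width=width))
    return "\n\n".join(rewrapped)


sample = (
    "It was the best of times, it was the worst of times, it was\n"
    "the age of wisdom, it was the age of foolishness.\n"
    "\n"
    "A second paragraph."
)
print(rewrap(sample, width=30))
```

Text that relies on its line breaks, such as poetry or tables, would be damaged by this kind of re-flowing and needs to be handled separately.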

On a Pocket PC, .txt files can be read easily in Pocket Word. However, an editor is probably not the best tool for reading books. It is typically not oriented toward reading a page at a time and does not support features such as bookmarking your progress. It is also easy to accidentally modify a book you are reading in an editor; setting the file to read-only prevents accidental changes from being saved. Very few, if any, serious eBook readers consider .txt their preferred reading format.
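On a desktop computer, the read-only setting can be applied from a script before the file is copied to the device. This is only a sketch under assumptions: the helper name `make_read_only` and the file name `book.txt` are hypothetical, and whether the read-only flag survives the transfer depends on the tool used to copy the file.

```python
import os
import stat
import tempfile


def make_read_only(path):
    """Clear every write-permission bit on `path`, leaving other bits alone."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))


# Demonstration on a throwaway file standing in for a downloaded book.
with tempfile.TemporaryDirectory() as tmp:
    book = os.path.join(tmp, "book.txt")
    with open(book, "w") as f:
        f.write("Call me Ishmael.\n")
    make_read_only(book)
    # Show the resulting permission bits.
    print(oct(stat.S_IMODE(os.stat(book).st_mode)))
```

On Windows, `os.chmod` can only toggle the read-only attribute, which is the effect wanted here.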

Programs to help convert etext files

E-Book Tidy is a useful conversion program for translating to and from Palm Doc and other formats. It is particularly useful for converting Gutenberg text files. It can also be used as a PC reader for Palm Doc files. It will convert Word and RTF files as well, but only fairly simple ones, and it can be used as one step in an HTML conversion.

Book Designer can be used to convert Gutenberg .txt files and many other formats.

GuteBook is a script that can automate fixing some Gutenberg problems and create nicely formatted eBooks.

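Here is a Perl script to fix broken lines in a paragraph. A line that ends in punctuation keeps its line break; any other line is joined to the one that follows it: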

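#!/usr/bin/perl
use strict;
use warnings;

# Join lines that were broken mid-sentence; keep the break after punctuation.
die "USAGE\n$0 filein fileout\n\n" if @ARGV != 2;
open(my $in,  '<', $ARGV[0]) or die "Cannot read $ARGV[0]: $!\n";
open(my $out, '>', $ARGV[1]) or die "Cannot write $ARGV[1]: $!\n";
while (my $line = <$in>) {
    $line =~ s/[\r\n]+\z//;    # strip the line ending (also handles DOS files)
    if ($line eq '' or $line =~ /[.:,;"!?')-]$/) {
        print $out "$line\n";  # blank line or end of a clause: keep the break
    }
    else {
        print $out "$line ";   # broken mid-sentence: join with the next line
    }
}
close($in);
close($out) or die "Cannot close $ARGV[1]: $!\n";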

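Here is a Python script to fix broken lines in a paragraph, similar to the Perl script above. It asks for the input file with a file dialog, treats blank lines as paragraph breaks, and writes the result to a file whose name starts with out_ in the current directory: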

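#!/usr/bin/env python3
import os
from tkinter import Tk, filedialog

# Ask for the input file with the standard Tk "open file" dialog.
rootwin = Tk()
rootwin.withdraw()
path = filedialog.askopenfilename()
if path:
    with open(path) as src:
        text = src.read()
    # Protect paragraph breaks (blank lines), join the remaining line breaks
    # with spaces, then restore one newline per paragraph.
    text = text.replace('\n\n', '#uNiQuE#').replace('\n', ' ').replace('#uNiQuE#', '\n')
    # Write the result to out_<name> in the current directory.
    with open('out_' + os.path.basename(path), 'w') as dst:
        dst.write(text + '\n')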

Conventions for TXT

The conventions for TXT used in Project Gutenberg have changed over the years. The most current version was formulated in 2004. It dictates that 7-bit ASCII text be supported unless it is impossible to do so. An 8-bit version (extended text) can also be submitted, using ISO-8859-1, when accented characters are needed. The standard calls for:

For more information

Gutenberg sites

Copyright laws change from country to country. Be sure the download is legal in your country.
