OCR villains

From MobileRead

This page lists typical OCR errors found when proofreading a book. Some can be caught by a spell checker and a few more by a grammar checker, but the rest need a keen eye. For some patterns you can search the document and replace the occurrences that don't belong.

Numbers, symbols and letters

0 <--> O {zero <--> Uppercase o}

1 l I i ! <--> each other
{digit One, lowercase L, uppercase i, lowercase i, exclamation mark}

2 <--> Z
5 <--> S
6 <--> uppercase G
7 <--> ? {question mark}
7 and / = I {uppercase I in italic}

] = J
square bracket = uppercase J
]ane = Jane
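
Digit and bracket mix-ups like the ones above tend to leave tokens that mix letters with digits, or that start with a square bracket. Here is a minimal Python sketch that lists such tokens for review. The function name and the pattern are illustrative assumptions, not part of any proofing tool, and legitimate tokens such as "2nd" will be flagged too.

```python
import re

# Assumed heuristic: a token is suspicious if it contains both a letter and
# a digit, or if it is a "]" immediately followed by letters (e.g. "]ane").
MIXED = re.compile(r"\b(?=\w*[A-Za-z])(?=\w*\d)\w+\b|\][A-Za-z]+")

def find_mixed_tokens(text):
    """Return suspicious tokens in reading order; nothing is changed."""
    return MIXED.findall(text)

sample = "]ane saw 1ittle B0b at 7 o'clock in 1984."
print(find_mixed_tokens(sample))
```

Pure numbers such as 7 and 1984 are left alone, so a lone digit standing in for a letter still needs a human eye.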

Letters

e <--> c
are <--> arc

cl <--> d
clock <--> dock
close <--> dose

f ligatures confusion
ff, fi, fl, ffi

h <--> b
back <--> hack
harrow <--> barrow

H = ll
weH = well

H or h = li
Hbrary = library
hke = like

hn = lm
ahnost = almost

j <--> J {lowercase <--> uppercase J }
jane = Jane
Jury = jury

rn <--> m
Mom <--> Morn
stem <--> stern
camest = earnest {this one also has the e <--> c confusion}
modem = modern
comer = corner

ri <--> n
arid <--> and

r = f
ringers = fingers

m <--> in
stein <--> stem
rmg = ring
inoth = moth

im <--> un
unport = import
imdone = undone

n <--> u
bnt = but
teut = tent
uest = nest

ii = u
iinder = under

B <--> R {uppercase}
DEABEST = DEAREST
Robby <--> Bobby

F <--> P {uppercase}
Full <--> Pull

ih = th
feaiher = feather

di = th {weird, but it happens a lot}
die = the

tii = th
tiie = the

tli = th
tlie = the

Tm = "I'm {also happens when there is no leading quote}
T = I {uppercase i}

U = ll, li, il {double lowercase L, or lowercase L next to i}
WeU = Well
Ufe = life
untU = until

vv = w
vvhen = when

\V = W

y <--> v
yery = very
verv = very
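
Many of the letter confusions above can be turned into a lookup: undo one confusion in a token and see whether the result is a known word. The sketch below assumes you have a word list of your own (the tiny `vocabulary` here is a stand-in), and the names `PAIRS` and `candidates` are hypothetical. It only suggests. Real words such as "die" or "modem" must still be judged in context.

```python
# (OCR output, intended text) pairs copied from the list above; not exhaustive.
PAIRS = [
    ("rn", "m"), ("m", "rn"),
    ("H", "ll"), ("U", "ll"),
    ("hn", "lm"),
    ("tli", "th"), ("tii", "th"), ("ih", "th"),
    ("vv", "w"), ("ii", "u"),
]

def candidates(token, vocabulary):
    """Return known words reachable from token by undoing one confusion."""
    found = set()
    for wrong, right in PAIRS:
        start = token.find(wrong)
        while start != -1:
            fixed = token[:start] + right + token[start + len(wrong):]
            if fixed in vocabulary:
                found.add(fixed)
            start = token.find(wrong, start + 1)
    return sorted(found)

# Stand-in word list; a real run would load a full dictionary.
vocabulary = {"the", "modern", "almost", "well", "when", "under"}
for token in ["tlie", "modem", "ahnost", "weH"]:
    print(token, "->", candidates(token, vocabulary))
```

Errors that combine two confusions, such as camest for earnest, need two passes and are not found by this single-substitution sketch.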

Punctuation errors

/' = ," or ." {or the same with a single quote}

* = quote mark
** *' '*

'' = " {two single quotes, should be a double quote}

Space following opening quote mark
Space preceding closing quote or punctuation mark.
He did this ; then he did that ; then he said : “ You aren’t ready ! ”
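
Stray spaces like these are regular enough for a search and replace. A minimal Python sketch, assuming English spacing conventions (no space before ; : ! ? , . or a closing quote, none after an opening quote); the function name `tighten` is made up for the example. Spaced ellipses ( . . . ) would be collapsed too, so check for those first.

```python
import re

# \u201c and \u201d are the curly opening and closing double quotes.
SPACE_BEFORE = re.compile(" +([;:!?,.\u201d])")
SPACE_AFTER_OPEN = re.compile("(\u201c) +")

def tighten(text):
    """Remove spaces before closing punctuation and after an opening quote."""
    text = SPACE_BEFORE.sub(r"\1", text)
    return SPACE_AFTER_OPEN.sub(r"\1", text)

line = "He did this ; then he did that ; then he said : \u201c You aren\u2019t ready ! \u201d"
print(tighten(line))
```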

Apostrophe goes missing, stranding the last letter
I m = I’m, don t = don’t, Bob s = Bob’s
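
Stranded letters can be listed with a regular expression so that each one is reviewed before it is changed. The sketch below is an assumption about which endings matter (s, t, m, d, ll, re, ve), and `find_stranded` is a hypothetical name. It will also report harmless text such as "the letter s".

```python
import re

# A word, one space, then a lone contraction ending.
STRANDED = re.compile(r"\b([A-Za-z]+) (s|t|m|d|ll|re|ve)\b")

def find_stranded(text):
    """Return (found, suggested) pairs; the text itself is not modified."""
    return [
        (m.group(0), f"{m.group(1)}\u2019{m.group(2)}")  # \u2019 is the curly apostrophe
        for m in STRANDED.finditer(text)
    ]

for found, suggested in find_stranded("I m sure you don t know Bob s dog."):
    print(found, "->", suggested)
```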

The following often occur after a "Smarten Punctuation" action:

Backward quote marks:
” close quote at start of paragraph
“ open quote at end of paragraph

Reversed single and double quotes in nested quotations:
“And I said to him, ‘Quit that!”’
‘“O what a tangled web we weave,’” she said.

’ A right single quote should replace a "straight" apostrophe, not a ‘ left single quote.
The wrong one often appears at the start of a word:
‘em should be ’em, ‘tis should be ’tis
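
Because the affected words form a short list, the left-quote slip can be patched with a targeted replace. A minimal Python sketch: the word list holds the two examples above plus "twas", which is an addition for illustration, and `fix_leading_apostrophes` is a made-up name. Extend the list to suit the book.

```python
import re

# Left single quote (\u2018) directly before a known elided word.
ELIDED = re.compile("\u2018(em|tis|twas)\\b", re.IGNORECASE)

def fix_leading_apostrophes(text):
    """Swap the left quote for a right single quote (\u2019) before listed words."""
    return ELIDED.sub(lambda m: "\u2019" + m.group(1), text)

print(fix_leading_apostrophes("Tell \u2018em \u2018tis late. \u2018Go,\u2019 he said."))
```

A genuine opening quote, as in ‘Go,’ is left untouched because "Go" is not on the list.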

- Hyphenation problems. The source hyphenates words that break at the end of a line,
  but the hyphen is left in when the document reflows. (A search can usually find these.)
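
One way to run that search: look for a hyphen inside or just after a word, join the two halves, and keep the hit only if the joined form is a known word. This Python sketch assumes you supply the word list (the two-word `vocabulary` is a stand-in) and uses the hypothetical name `find_leftover_hyphens`.

```python
import re

# Letters, a hyphen, an optional single space, then lowercase letters.
HYPHENATED = re.compile(r"\b([A-Za-z]+)-\s?([a-z]+)\b")

def find_leftover_hyphens(text, vocabulary):
    """Return (found, joined) pairs where the joined form is a known word."""
    hits = []
    for m in HYPHENATED.finditer(text):
        joined = m.group(1) + m.group(2)
        if joined.lower() in vocabulary:
            hits.append((m.group(0), joined))
    return hits

vocabulary = {"document", "reflowed"}  # stand-in for a real dictionary
text = "The docu-ment was re- flowed by a well-known tool."
print(find_leftover_hyphens(text, vocabulary))
```

Genuine compounds such as well-known are skipped as long as the joined form is not in the word list.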

) with a space in front. Sometimes ( will have a space after it. Search for these.
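
The search can report line numbers so the hits are easy to find in an editor. A small Python sketch; `lines_with_paren_spacing` is an illustrative name and the pattern is an assumption about what counts as a stray space.

```python
import re

# "(" followed by whitespace, or whitespace followed by ")".
PAREN_SPACE = re.compile(r"\(\s|\s\)")

def lines_with_paren_spacing(text):
    """Return (line_number, line) for every line containing a stray space."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if PAREN_SPACE.search(line)
    ]

sample = "He left ( quietly).\nNothing (wrong) here.\nShe agreed (mostly )."
for number, line in lines_with_paren_spacing(sample):
    print(number, line)
```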

For more information
