Metadata

From MobileRead

Metadata is data about data: if an eBook is considered to be the data, then the information describing the book is its metadata. It is important that this information be stored separately and labeled with keywords so that library software can use it to describe the eBook and retailers can use it to identify the contents.

Overview

Most library management programs extract this data from the original eBook file or container so that it can be referenced separately. In some cases they get the data from other sources instead of the file itself.

This is the equivalent of a catalog card in a public library's card catalog.

Metadata should include at least:

  • TITLE for the title of the book
  • AUTHOR for the author's name
  • PUBLISHER for the publisher's name
  • COPYRIGHT for copyright information or the publication date
  • EISBN or ISBN for the book identifier

It may also include many other things, such as a cover image, a description (subject) of the book, the genre, and the language of the book. See Open eBook for an example of a package file containing metadata.

HTML metadata

The <head> section of an HTML document can be the source of metadata for some applications. Here is a sample of some of the data that may appear.

  • <meta name="gemstar-legacy" content="publisher 2.0" />
  • <title>The Title</title>
  • <meta name="Author" content="Authors Name" />
  • <meta name="Description" content="Mystery, Suspense, History, Gothic, Literature, Books, Arts" />
  • <meta http-equiv="Content-Type" content="text/html; charset=windows-1252" />
  • <meta name="GENERATOR" content="rbmake v0.99d using the rbmake library v0.99d" />
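As a rough illustration of how an application might read these fields, the sketch below collects the <title> text and the name/content pairs of <meta> tags using Python's standard html.parser module. The class and function names (HeadMetadataParser, extract_head_metadata) are hypothetical, and lowercasing the meta names is an assumption made here for convenience, not something the eBook formats require.

```python
from html.parser import HTMLParser


class HeadMetadataParser(HTMLParser):
    """Collect <title> text and <meta name=... content=...> pairs."""

    def __init__(self):
        super().__init__()
        self.metadata = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = attrs.get("name")
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and name:
            # Assumption: treat meta names case-insensitively by lowercasing.
            self.metadata[name.lower()] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.metadata["title"] = self.metadata.get("title", "") + data


def extract_head_metadata(html_text):
    """Return a dict of metadata found in the given HTML text."""
    parser = HeadMetadataParser()
    parser.feed(html_text)
    parser.close()
    return parser.metadata


sample = """<html><head>
<title>The Title</title>
<meta name="Author" content="Authors Name" />
<meta name="Description" content="Mystery, Suspense" />
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252" />
</head><body></body></html>"""

print(extract_head_metadata(sample))
```

Tags that carry http-equiv instead of name, such as the Content-Type line, are skipped here because they describe the file encoding rather than the book.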

Metadata on files

Operating systems that recognize an eBook file type often provide a feature for adding metadata to the file. For example, on Windows you can right-click the file in a browser window and select Properties. A General tab and a Summary tab will appear. Select the Summary tab to fill in the data:

  • Title
  • Subject
  • Author
  • Category (Genre)
  • Keywords (used for searches)
  • Comments
  • Source (only in advanced view)
  • Revision Number (only in advanced view)

In some operating systems (Mac OS X, Linux) this information can be used to correct a title or author's name in the metadata contained in the file itself, but Windows does not store it in the file. The data will show up when the mouse cursor is placed over the icon on systems that support the hover feature.

PDF files also show their metadata in a tab of the Properties dialog, but you may need to open the file itself to change the data. This data is not synchronized with the Summary tab on Windows systems, so the two can differ.

Culture data

Culture data is an important component of metadata about an eBook. The most obvious item is the language in which the eBook is written, which implies the character set. Other culture items are usually subdivisions of the language. Culture data includes, but is not limited to, the sorting order of the alphabet, the conventions used in writing dates, and the formatting of numbers.

The culture names follow the RFC 1766 standard in the format "<languagecode2>-<country/regioncode2>", where <languagecode2> is a lowercase two-letter code derived from ISO 639-1 and <country/regioncode2> is an uppercase two-letter code derived from ISO 3166. For example, U.S. English is "en-US". In cases where a two-letter language code is not available, the three-letter code derived from ISO 639-2 is used.
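The naming pattern described above can be sketched as a small parser. This is only a shape check under the stated format (a lowercase two- or three-letter language code, optionally followed by a hyphen and an uppercase two-letter region code); it does not verify that the codes are actually registered in ISO 639 or ISO 3166. The function name parse_culture_name is hypothetical.

```python
import re

# Lowercase two-letter (ISO 639-1) or three-letter (ISO 639-2) language code,
# optionally followed by "-" and an uppercase two-letter ISO 3166 region code.
CULTURE_NAME = re.compile(r"(?P<language>[a-z]{2,3})(?:-(?P<region>[A-Z]{2}))?")


def parse_culture_name(name):
    """Return (language, region); region is None when it is omitted."""
    match = CULTURE_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not a culture name of the form ll-RR: {name!r}")
    return match.group("language"), match.group("region")


print(parse_culture_name("en-US"))  # ('en', 'US')
print(parse_culture_name("en"))     # ('en', None)
```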

ePUB metadata

The latest and most complete eBook publishing standard is called ePUB. ePUB rules allow users to add publication information according to DCMI terms, following the rules of Dublin Core (DC). Order is not significant, and duplicate tags are allowed. A language attribute can be added where needed. The standard defines an assortment of metadata, including the following:

Required terms:

  • title
  • language - use an RFC 3066 language code
  • identifier - use a string that is likely to be unique; a URI or ISBN would be a good choice. An attribute can be used to specify the type of identifier, such as "ISBN" or "DOI". Multiple identifiers are allowed. Possible choices include:
    • DEWEY Dewey Decimal System
    • DOI Digital Object Identifier
    • ISBN International Standard Book Number
    • ISSN International Standard Serial Number
    • LCC Library of Congress Classification
    • LCCN Library of Congress Control Number (also known as "Library of Congress Card Number")
    • OSIS Open Scriptural Information Standard
    • SICI Serial Item and Contribution Identifier
    • URI Uniform Resource Identifier
    • URL Uniform Resource Locator
    • URN Uniform Resource Name

Optional terms:

While optional, these terms should be present if known. The standard permits repeating a term. For example, if there are two authors, there would be a separate creator entry for each.

  • creator - This is the principal author of the work. One author per entry.
    • creator and contributor can have an opf:role attribute that defines the role; see http://www.loc.gov/marc/relators/ for values. These include author, illustrator, editor (for compilations), etc.
  • contributor - This identifies other persons who had a less important role, such as writing the prelude or illustrating where images play a minor role in the publication.
  • publisher
  • subject - no standard form but it could use the Library of Congress Subject Heading System for example.
  • description - This field normally contains the description that is used to describe the book contents on a retail site or press release.
  • date - The format is based on ISO 8601. In particular, dates without times are represented in the form YYYY[-MM[-DD]]: a required 4-digit year, an optional 2-digit month, and if the month is given, an optional 2-digit day of month.
    • The date can have an opf:event attribute; the standard does not define its permitted values.
  • type - general categories, functions, genres
  • format - mimetype
  • source - Information regarding a prior resource from which the publication was derived.
  • relation - auxiliary resource and its relationship to the publication.
  • coverage - The extent or scope of the publication’s content.
  • rights - copyright, Creative Commons (CC), public domain, etc.

See Section 2.2 of the ePUB specification for more information on this topic.
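The date form YYYY[-MM[-DD]] described in the list above can be checked with a short routine. The following is a minimal sketch that covers only dates without times; the function name parse_opf_date is hypothetical, and using datetime.date to reject impossible values such as month 13 is a choice made here, not something the standard prescribes.

```python
import re
from datetime import date

# A required 4-digit year, an optional 2-digit month, and, only if the
# month is present, an optional 2-digit day.
OPF_DATE = re.compile(r"([0-9]{4})(?:-([0-9]{2})(?:-([0-9]{2}))?)?")


def parse_opf_date(text):
    """Return (year, month, day); month and day are None when omitted."""
    match = OPF_DATE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"expected YYYY[-MM[-DD]], got {text!r}")
    year, month, day = (
        int(part) if part is not None else None for part in match.groups()
    )
    # Range check: datetime.date raises ValueError for values such as
    # month 13 or 30 February. Omitted parts default to 1 for this check only.
    date(year, month if month is not None else 1, day if day is not None else 1)
    return year, month, day


print(parse_opf_date("2007"))        # (2007, None, None)
print(parse_opf_date("2007-09"))     # (2007, 9, None)
print(parse_opf_date("2007-09-11"))  # (2007, 9, 11)
```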

Example

An example would be:

<package version="2.0" xmlns="http://www.idpf.org/2007/opf"
         unique-identifier="BookId">
    <metadata xmlns:dc="http://purl.org/dc/elements/1.1/"
              xmlns:opf="http://www.idpf.org/2007/opf">
        <dc:title>Alice in Wonderland</dc:title>
        <dc:language>en</dc:language>
        <dc:identifier id="BookId" opf:scheme="ISBN">123456789X</dc:identifier>
        <dc:creator opf:role="aut">Lewis Carroll</dc:creator>
    </metadata>
    ...
</package>
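To show how library software might read such a package, here is a minimal sketch using Python's standard xml.etree.ElementTree module. It gathers every dc: element inside <metadata>, keeps any opf: attributes (such as scheme or role), and reports which of the three required terms are missing. The function names read_dc_metadata and missing_required_terms are hypothetical, and the sample string repeats the package example above with the elided part left out.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
OPF = "http://www.idpf.org/2007/opf"
REQUIRED_TERMS = ("title", "language", "identifier")


def read_dc_metadata(package_xml):
    """Return {term: [(text, opf_attributes), ...]} for each dc: element."""
    root = ET.fromstring(package_xml)
    metadata = root.find(f"{{{OPF}}}metadata")
    if metadata is None:
        raise ValueError("package has no <metadata> element")
    terms = {}
    for element in metadata:
        # ElementTree spells namespaced names as "{namespace-uri}local".
        if not element.tag.startswith(f"{{{DC}}}"):
            continue
        term = element.tag[len(DC) + 2:]
        opf_attrs = {
            key[len(OPF) + 2:]: value
            for key, value in element.attrib.items()
            if key.startswith(f"{{{OPF}}}")
        }
        text = (element.text or "").strip()
        # Terms may repeat (for example, two creators), so keep a list.
        terms.setdefault(term, []).append((text, opf_attrs))
    return terms


def missing_required_terms(terms):
    """List the required terms that are absent or have only empty values."""
    return [
        name for name in REQUIRED_TERMS
        if not any(text for text, _ in terms.get(name, []))
    ]


sample_package = """<package version="2.0" xmlns="http://www.idpf.org/2007/opf"
         unique-identifier="BookId">
    <metadata xmlns:dc="http://purl.org/dc/elements/1.1/"
              xmlns:opf="http://www.idpf.org/2007/opf">
        <dc:title>Alice in Wonderland</dc:title>
        <dc:language>en</dc:language>
        <dc:identifier id="BookId" opf:scheme="ISBN">
         123456789X
        </dc:identifier>
        <dc:creator opf:role="aut">Lewis Carroll</dc:creator>
    </metadata>
</package>"""

terms = read_dc_metadata(sample_package)
print(terms)
print(missing_required_terms(terms))  # []
```

Storing each term as a list reflects the rule that order is not significant and duplicate tags are allowed.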
