XML

From MobileRead

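XML (Extensible Markup Language) is a general-purpose markup language for exchanging information. It is extensible in that users can define their own tags. The tags a document may use, and how they may be combined, can be spelled out in a Document Type Definition (DTD), which the document points to with a DOCTYPE declaration near the top of the file. A DTD is optional: a parser can still read a document without one, provided the document is well formed.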
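
As a minimal sketch of user-defined tags and a DTD declaration, the Python example below uses the standard library's `xml.etree.ElementTree` module. The tag names (`library`, `book`, `title`) and the `id` attribute are hypothetical, and the DTD is written inline inside the DOCTYPE declaration (an internal subset) instead of being referenced as a separate file, so the snippet is self-contained. ElementTree reads the DOCTYPE but does not validate the document against it.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary: <library>, <book> and <title> are made-up tags.
# The DOCTYPE declaration names the document type; here the DTD rules are
# given inline rather than pointing to an external .dtd file.
DOC = """<?xml version="1.0"?>
<!DOCTYPE library [
  <!ELEMENT library (book*)>
  <!ELEMENT book (title)>
  <!ATTLIST book id CDATA #REQUIRED>
  <!ELEMENT title (#PCDATA)>
]>
<library>
  <book id="b1"><title>First Title</title></book>
  <book id="b2"><title>Second Title</title></book>
</library>"""


def list_books(xml_text: str) -> list[tuple[str, str]]:
    """Return (id, title) pairs for each <book>, in document order.

    ElementTree is a non-validating parser: it accepts the DOCTYPE
    but does not check the document against the DTD rules.
    """
    root = ET.fromstring(xml_text)
    return [(book.get("id"), book.findtext("title")) for book in root.findall("book")]


print(list_books(DOC))
```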

Historically, XML was designed as a simplified subset of SGML (Standard Generalized Markup Language), drawing on experience with HTML. Compared with HTML, it differs in two main ways: tags can be custom-defined, and documents must be 'well formed', meaning every open tag has a matching close tag (unless the tag is self-closing, such as <br/>) and tags are properly nested. XML is also case sensitive, so <Title> and <title> are different tags.
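
The well-formedness and case-sensitivity rules can be checked with a non-validating parser. This sketch assumes Python's `xml.etree.ElementTree`, which raises `ParseError` on input that is not well formed; the sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as well-formed XML.

    This checks well-formedness only (closed, properly nested,
    case-matching tags); it does not validate against a DTD.
    """
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


SAMPLES = [
    "<book><title>XML</title><br/></book>",  # every tag closed; <br/> self-closes
    "<book><title>XML</book>",               # <title> is never closed
    "<Title>XML</title>",                    # case differs, so the tags do not match
    "<b><i>XML</b></i>",                     # tags overlap instead of nesting
    "<p>line<br>next</p>",                   # HTML-style <br> with no close tag
]

for sample in SAMPLES:
    print(is_well_formed(sample), sample)
```

The last sample is ordinary HTML, which browsers accept but an XML parser rejects because `<br>` is never closed.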

XML for Metadata

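XML is widely used for defining metadata. In eBooks, its main use is in OPF (Open Packaging Format) package files, which record details such as a book's title, author, language and identifier using Dublin Core elements.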
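
A sketch of reading OPF metadata, assuming an OPF 2.0 package file and Python's `xml.etree.ElementTree`. The sample package and its values are placeholders, and a real OPF file also contains `<manifest>` and `<spine>` sections that are omitted here.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by OPF 2.0 package files and Dublin Core metadata.
NS = {
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Stripped-down package file with placeholder values.
SAMPLE_OPF = """<?xml version="1.0"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>Example Book</dc:title>
    <dc:creator>Jane Author</dc:creator>
    <dc:language>en</dc:language>
    <dc:identifier id="bookid">urn:uuid:00000000-0000-0000-0000-000000000000</dc:identifier>
  </metadata>
</package>"""


def read_opf_metadata(opf_text: str) -> dict[str, str]:
    """Return the Dublin Core fields found in the OPF <metadata> block."""
    root = ET.fromstring(opf_text)
    metadata = root.find("opf:metadata", NS)
    fields: dict[str, str] = {}
    if metadata is None:
        return fields
    for name in ("title", "creator", "language", "identifier"):
        element = metadata.find(f"dc:{name}", NS)
        if element is not None and element.text:
            fields[name] = element.text.strip()
    return fields


print(read_opf_metadata(SAMPLE_OPF))
```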

XML eBook Formats

In practice, XML has become the leading way to define the structure of the source documents for most modern eBooks.

  • The international standardization effort is led by the International Digital Publishing Forum (IDPF); see http://www.idpf.org/specs.htm. Its ePUB standard defines the book content, written in XHTML 1.1 (a W3C specification), and also a container mechanism to hold all of the various pieces of a book.
  • A second standard is RSS, an XML format used to distribute news releases and blog posts on the Internet. As eBooks move into daily news reading, RSS is likely to become increasingly important.
  • A third XML standard is LRS, the source format for Sony's BBeB eBook format. LRS is compiled into LRF files or, if protected with DRM, LRX files. This format is also known as the Xylog XML format.
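
As a rough illustration of the ePUB container, this Python sketch builds a skeleton archive in memory and then locates the package file. It assumes the OCF layout used by ePUB: a zip file whose first entry, `mimetype`, is stored uncompressed and contains `application/epub+zip`, plus a `META-INF/container.xml` file pointing to the OPF package. The path `OEBPS/content.opf` is a common convention rather than a requirement, the helper names are hypothetical, and the skeleton is not a complete, valid book.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}

CONTAINER_XML = """<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""


def build_skeleton_epub() -> bytes:
    """Build a skeleton ePUB container in memory (not a complete book)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        # The mimetype entry must come first and be stored uncompressed.
        zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML, compress_type=zipfile.ZIP_DEFLATED)
        # Placeholder package file; a real one holds metadata, manifest and spine.
        zf.writestr("OEBPS/content.opf", "<package/>", compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


def find_package_file(epub_bytes: bytes) -> str:
    """Return the OPF path named in META-INF/container.xml."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as zf:
        entries = zf.infolist()
        if not entries or entries[0].filename != "mimetype":
            raise ValueError("mimetype must be the first entry")
        if zf.read(entries[0]) != b"application/epub+zip":
            raise ValueError("not an ePUB container")
        container = ET.fromstring(zf.read("META-INF/container.xml"))
    rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
    if rootfile is None:
        raise ValueError("container.xml names no rootfile")
    return rootfile.get("full-path")


print(find_package_file(build_skeleton_epub()))
```

Reading `container.xml` first, rather than guessing a file name, is what lets a reading system find the package file wherever the publisher put it.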

Other XML formats for documents

Besides the formats proposed and implemented in the eBook community, there is an ongoing debate over XML-based formats for document exchange. Because these are similar to the eBook formats, they are listed here for reference.

  • ODF - The OASIS OpenDocument Format is an XML-based format backed by Sun, IBM and others. It stores its XML files in a zip archive to keep file sizes down. The OpenDocument Foundation, an advocacy group separate from the OASIS committee that maintains ODF, later shifted its support to the W3C's CDF format.
  • CDF - The Compound Document Format is an XML format proposed by the W3C, the consortium that maintains such important standards as HTML and XHTML. See http://www.w3.org/2004/CDF/
  • CDFML - An XML exchange format proposed by NASA for its Common Data Format (see http://cdf.gsfc.nasa.gov/), which is used for the open exchange of scientific data rather than text documents.
  • Microsoft Office Open XML - The document exchange format promoted by Microsoft and used as the default save format (.docx) in Word 2007. Like ODF, it stores its XML in a zip archive and keeps it in that compressed form to save space.
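
To illustrate the XML-in-zip approach shared by ODF and Office Open XML, the sketch below pulls paragraph text out of the main part of a word-processing file. It assumes the part lives at `word/document.xml` (where Word 2007 puts it) and uses the WordprocessingML namespace; the in-memory stand-in contains only that one part, whereas a real .docx also carries `[Content_Types].xml` and relationship files. The helper names are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Minimal stand-in for word/document.xml: paragraphs (w:p) contain runs (w:r),
# and runs contain text (w:t).
DOCUMENT_XML = f"""<?xml version="1.0"?>
<w:document xmlns:w="{W_NS}">
  <w:body>
    <w:p><w:r><w:t>Hello</w:t></w:r><w:r><w:t xml:space="preserve"> world</w:t></w:r></w:p>
    <w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>
  </w:body>
</w:document>"""


def make_stand_in_docx() -> bytes:
    """Zip the single document part; not a file Word itself would open."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("word/document.xml", DOCUMENT_XML)
    return buffer.getvalue()


def docx_paragraphs(docx_bytes: bytes) -> list[str]:
    """Return the text of each w:p element, joining the text of its runs."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    return [
        "".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t"))
        for p in root.iter(f"{{{W_NS}}}p")
    ]


print(docx_paragraphs(make_stand_in_docx()))
```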

Related Information

The following articles, while not specific to XML as used in eBooks, are related.

  • Metadata is used to describe eBooks and is generally stored in XML even when the eBook itself is not.
  • MathML is an XML format for describing mathematical notation, usable in eBooks and web browsers.
  • ePUB is the eBook format currently drawing the most attention, and it is built on XML throughout.
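
As a small example of MathML itself, this Python sketch builds presentation MathML for a^2 + b^2 = c^2 with `xml.etree.ElementTree`. The `msup`, `mi`, `mn` and `mo` elements are standard presentation MathML; setting `xmlns` as a plain attribute is a shortcut (an assumption of this sketch) that keeps the output free of generated namespace prefixes.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def pythagoras_mathml() -> str:
    """Build presentation MathML for a^2 + b^2 = c^2 and return it as a string."""
    math = ET.Element("math", xmlns=MATHML_NS)
    row = ET.SubElement(math, "mrow")

    def power(base: str, exponent: str) -> None:
        # <msup> holds a base and a superscript: here an identifier and a number.
        sup = ET.SubElement(row, "msup")
        ET.SubElement(sup, "mi").text = base
        ET.SubElement(sup, "mn").text = exponent

    power("a", "2")
    ET.SubElement(row, "mo").text = "+"
    power("b", "2")
    ET.SubElement(row, "mo").text = "="
    power("c", "2")
    return ET.tostring(math, encoding="unicode")


print(pythagoras_mathml())
```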