XML

From MobileRead

Revision as of 18:20, 21 May 2009 by DaleDe

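XML (Extensible Markup Language) is a general-purpose markup language for the exchange of information. It is extensible in that it allows users to define their own tags, so a document may come with a Document Type Definition (DTD) describing which tags and attributes are allowed. The DTD is usually referenced by a DOCTYPE declaration near the top of the file, although it is optional: a document only has to be well formed to be parsed, and a DTD is needed only to check that it is also valid.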

Historically, XML was designed as a simplified subset of SGML (Standard Generalized Markup Language), the same language on which HTML was based. Compared to HTML, XML allows custom tag definitions and requires the tag structure to be 'well formed': every open tag must have a matching close tag unless the tag is self-closing (such as <br/>), and elements must be properly nested. XML is also case sensitive, so <Title> and <title> are different tags.
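A quick way to see the well-formedness and case-sensitivity rules in action is to hand a few fragments to a parser. The sketch below uses Python's standard-library xml.etree.ElementTree; the is_well_formed helper and the sample fragments are illustrative only.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the text parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # ElementTree raises ParseError for mismatched or unclosed tags.
        return False


samples = {
    "matched tags": "<book><title>Example</title></book>",
    "self-closing tag": "<book><br/></book>",
    "missing close tag": "<book><title>Example</book>",
    "case mismatch": "<book><Title>Example</title></book>",  # Title != title
    "bad nesting": "<book><b><i>text</b></i></book>",
}

for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```

Only the first two samples parse; the other three are rejected, including the one whose only fault is the case of a tag name.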


XML for Metadata

XML is the language of choice for defining eBook metadata. Its main use is in OPF (Open Packaging Format) files, the package files used by ePUB.
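As a rough illustration, the sketch below reads Dublin Core metadata out of an OPF package file with Python's xml.etree.ElementTree. It assumes the OPF 2.0 namespace (http://www.idpf.org/2007/opf) and the Dublin Core namespace (http://purl.org/dc/elements/1.1/); the sample package and the read_metadata helper are hypothetical, and a real OPF file also contains a manifest and a spine.

```python
import xml.etree.ElementTree as ET

# Namespace URIs assumed for an OPF 2.0 package file and its Dublin Core metadata.
NS = {
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical, trimmed package file (no manifest or spine).
OPF_SAMPLE = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>An Example Book</dc:title>
    <dc:creator>Jane Author</dc:creator>
    <dc:language>en</dc:language>
    <dc:identifier id="bookid">urn:uuid:00000000-0000-0000-0000-000000000000</dc:identifier>
  </metadata>
</package>"""


def read_metadata(opf_text: str) -> dict:
    """Collect a few Dublin Core fields from an OPF package document."""
    root = ET.fromstring(opf_text)
    # <metadata> has no prefix, so it inherits the default OPF namespace.
    metadata = root.find("opf:metadata", NS)
    result = {}
    for field in ("title", "creator", "language", "identifier"):
        elem = metadata.find(f"dc:{field}", NS)
        if elem is not None:
            result[field] = elem.text
    return result


print(read_metadata(OPF_SAMPLE))
```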

XML eBook Formats

In practice, XML has become the leading way to define the structure of the source documents for most modern eBooks.

  • The international standardization effort is ePUB, maintained by the International Digital Publishing Forum (IDPF); see http://www.idpf.org/specs.htm. Its content documents are based on XHTML 1.1, and the standard defines both the book data and a container mechanism that holds all of the various pieces of a book.
  • A second standard is RSS, which is used to distribute many news releases and blogs on the Internet. As eBooks move into daily news reading, the RSS format is likely to become very important. RSS is also based on XML.
  • A third XML standard is used in the publication of Sony's BBeB format for eBooks. This LRS format is compiled into LRF files or, if protected with DRM, LRX files. It is also known as the Xylog XML format.
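Because RSS is plain XML, a news reader can pull stories out of a feed with an ordinary XML parser. The sketch below assumes an RSS 2.0 feed (whose elements carry no namespace) and uses Python's xml.etree.ElementTree; the feed contents and the item_titles helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, minimal RSS 2.0 feed.
FEED = """<rss version="2.0">
  <channel>
    <title>Example News</title>
    <link>http://example.com/</link>
    <description>Sample feed</description>
    <item>
      <title>First story</title>
      <link>http://example.com/1</link>
    </item>
    <item>
      <title>Second story</title>
      <link>http://example.com/2</link>
    </item>
  </channel>
</rss>"""


def item_titles(feed_text: str) -> list:
    """Return the <title> of every <item> in the feed's channel."""
    root = ET.fromstring(feed_text)
    # "channel/item" matches each <item> directly under <channel>.
    return [item.findtext("title", default="") for item in root.iterfind("channel/item")]


print(item_titles(FEED))
```

Feeds in other RSS flavours (such as RDF-based RSS 1.0) use namespaces, so the element paths would need adjusting.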

Other XML formats for documents

Besides the formats being proposed and implemented in the eBook community, there is an ongoing debate about XML-based formats for document exchange. These are similar to the eBook formats, so they are listed here for reference.

  • ODF - The OASIS OpenDocument Format is an XML-based format standardized by OASIS and backed by Sun, IBM and others. The OpenDocument Foundation, an ODF advocacy group, has recently shifted its support to the CDF format proposed by the W3C. ODF packages its XML files in a zip archive to reduce file size; text documents use the .ODT file name extension.
  • CDF - Compound Document Formats, an XML-based effort of the W3C, the consortium that maintains such important standards as HTML and XHTML. See http://www.w3.org/2004/CDF/
  • CDFML - The XML representation of the Common Data Format, proposed by NASA (http://cdf.gsfc.nasa.gov/) for the open exchange of scientific data.
  • Microsoft Office Open XML - The document exchange format promoted by Microsoft and used as the default save format in Word 2007. Its XML parts are packaged in a zip archive, which is kept in compressed form to save space. Word documents in this format use the .DOCX file name extension.
  • OSIS - The Open Scripture Information Standard, an XML schema for Bibles and other biblical research texts. It is used by several Bible study tools.
  • ABW - The XML-based file format of AbiWord, a freely available word processor released under the GNU General Public License.
  • XPS - The XML Paper Specification, used in Windows Vista as the print spool format.
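Several of these formats (ODF, Office Open XML, and ePUB as well) store their XML inside a zip archive. The sketch below builds a tiny ODF-style archive in memory and then reads the paragraphs back out of content.xml using Python's zipfile and xml.etree.ElementTree. The content.xml shown is a hand-built, heavily trimmed stand-in (a real .odt also has meta.xml, styles.xml and META-INF/manifest.xml), and the helper names are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# Hand-built, trimmed stand-in for an ODF content.xml.
CONTENT_XML = f"""<office:document-content
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:text="{TEXT_NS}">
  <office:body>
    <office:text>
      <text:p>Hello from inside a zip file.</text:p>
      <text:p>Second paragraph.</text:p>
    </office:text>
  </office:body>
</office:document-content>"""


def build_sample_archive() -> bytes:
    """Create a minimal ODF-like zip archive in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # ODF stores the mimetype entry first and uncompressed.
        zf.writestr("mimetype", "application/vnd.oasis.opendocument.text",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("content.xml", CONTENT_XML, compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()


def paragraphs(archive: bytes) -> list:
    """Unzip content.xml and return the text of each <text:p> element."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        root = ET.fromstring(zf.read("content.xml"))
    return [p.text for p in root.iter(f"{{{TEXT_NS}}}p")]


print(paragraphs(build_sample_archive()))
```

A .DOCX file can be opened the same way; its main body text lives in the word/document.xml part, under a different namespace.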

XML character entity references

Unlike traditional HTML with its large range of character entity references, in XML there are only five predefined character entity references. These are used to escape characters that are markup sensitive in certain contexts:

  • &amp; → & (ampersand, U+0026)
  • &lt; → < (less-than sign, U+003C)
  • &gt; → > (greater-than sign, U+003E)
  • &quot; → " (quotation mark, U+0022)
  • &apos; → ' (apostrophe, U+0027)
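In practice these five references are usually produced by a library rather than typed by hand. The sketch below uses escape and unescape from Python's standard xml.sax.saxutils module: escape always handles &, < and >, and the extra mapping passed to it adds the two quote entities. The helper names are illustrative, and for XHTML output the apostrophe could be mapped to &#39; instead.

```python
from xml.sax.saxutils import escape, unescape

# escape() handles &, < and > itself; these mappings cover the two quote entities.
QUOTES_OUT = {'"': "&quot;", "'": "&apos;"}
QUOTES_IN = {"&quot;": '"', "&apos;": "'"}


def escape_all_five(text: str) -> str:
    """Replace all five markup-sensitive characters with entity references."""
    return escape(text, QUOTES_OUT)


def unescape_all_five(text: str) -> str:
    """Turn the five predefined entity references back into characters."""
    return unescape(text, QUOTES_IN)


raw = 'Tom & Jerry say "1 < 2" isn\'t > 3'
encoded = escape_all_five(raw)
print(encoded)  # Tom &amp; Jerry say &quot;1 &lt; 2&quot; isn&apos;t &gt; 3
print(unescape_all_five(encoded) == raw)  # True
```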

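All other named entity references must be declared, typically in a DTD, before they can be used; numeric character references such as &#233; need no declaration. Use of &apos; in XHTML should generally be avoided for compatibility reasons, since it is not defined in HTML 4 and some older browsers do not recognize it; &#39; or &#x0027; may be used instead.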

Here is the syntax for declaring an entity in a DTD. Once declared, the entity below can be referenced in the document text as &greeting1;.

<!ENTITY greeting1 "Hello world">
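To see the declaration at work, the sketch below places the greeting1 entity in an internal DTD subset (inside the DOCTYPE declaration) and lets Python's xml.etree.ElementTree expand the reference. The <note> element and the undeclared greeting2 name are made up for illustration.

```python
import xml.etree.ElementTree as ET

# The entity from the text, declared in an internal DTD subset and then used.
DOC = """<!DOCTYPE note [
  <!ENTITY greeting1 "Hello world">
]>
<note>&greeting1;, from XML.</note>"""

root = ET.fromstring(DOC)
print(root.text)  # Hello world, from XML.

# Referencing an entity that was never declared makes the document malformed.
try:
    ET.fromstring("<note>&greeting2;</note>")
except ET.ParseError as err:
    print("error:", err)
```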

Related Information

While not specific to the XML format as used in eBooks, the following articles are related.

  • Metadata is used to describe eBooks and is generally in XML format even if the eBook isn't.
  • MathML is an XML format designed for describing mathematical notation, so that equations can be displayed in eBooks and web browsers.
  • ePUB is the current focus of eBook format development and embraces XML's capabilities.

  • DocBook is an XML vocabulary for technical documentation, with XSL stylesheets available to convert DocBook documents to HTML, PDF and other formats.
  • DTBook is a standard using XML to support Digital Talking Books.