PBO

From MobileRead

The PBO (Parallel Book) format is used by the Aglona Reader.

Overview

This description comes from the Aglona Reader web site.

The PBO (Parallel Book) file format was designed specifically for storing parallel texts used for language learning. Files in this format have the PBO extension and are essentially XML files containing the text of an original book ("source") and its translation into another language ("target") as a set of so-called "fragment pairs". The format has the following features:

  • Every pair can contain corresponding text on a sub- or super-sentence level, that is, correspondence can be established between sentences, parts of sentences or even several sentences at once. That means it is possible to break longer sentences into shorter parts or combine shorter sentences into bigger fragments.
  • Either side of a pair can act as a "paragraph-starter" independently of the other side. This information helps to keep the original structure of the "source" and "target" texts even after having partitioned them into fragment pairs.
  • Every fragment pair has a "structure level" of 0, 1, 2, or 3, where 0 stands for plain text and 1 to 3 stand for headers of level 1 to 3 respectively. This information makes it possible to store the table of contents of the book (chapters, volumes, etc.).
  • Metadata is available for the book, such as Author, Title, Info and the Language ISO code, for both sides.

The internals

This description is based on inspecting the PBO file of the book "The Picture of Dorian Gray" by Oscar Wilde. The internal format is not intended to be edited directly. The file contains no line feeds or carriage returns, and the reader will not work if any are present. The file is not compressed. The format is based on XML but does not fully follow the rules of XML.

There are just two tags. The first, <ParallelBook>, defines the format and holds all of the metadata for the book as attributes: lang1=, author1=, title1=, info1=, lang2=, author2=, title2=, info2= and info=. The attribute values are always enclosed in quotes.
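As an illustration of the attribute list above, here is a minimal Python sketch that pulls the metadata out of the opening tag. Because the format reportedly does not follow XML rules strictly, the sketch uses regular expressions rather than an XML parser. The function name `read_metadata` is hypothetical, and two things are assumptions rather than documented behaviour: that values use XML-style entity escapes, and that no attribute value contains a literal ">" character.

```python
import re
from html import unescape

# Attribute names as listed in the description of the <ParallelBook> tag.
METADATA_KEYS = ("lang1", "author1", "title1", "info1",
                 "lang2", "author2", "title2", "info2", "info")


def read_metadata(pbo_text):
    """Return a dict of the quoted attribute values of <ParallelBook>.

    Assumption: values are double-quoted and contain no literal '>'.
    """
    match = re.search(r"<ParallelBook\b([^>]*)>", pbo_text)
    if match is None:
        raise ValueError("no <ParallelBook> tag found")
    attrs = dict(re.findall(r'(\w+)="([^"]*)"', match.group(1)))
    # Assumption: XML-style entities such as &amp; are used for escaping.
    return {key: unescape(attrs[key]) for key in METADATA_KEYS if key in attrs}


# Made-up sample data, not taken from a real PBO file.
sample = ('<ParallelBook lang1="en" author1="Oscar Wilde" '
          'lang2="de" info="example"></ParallelBook>')
metadata = read_metadata(sample)
print(metadata["lang1"], metadata["lang2"])
```

Attributes that are missing from the tag are simply left out of the returned dict.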

The second tag is <p />. It carries no data apart from its attributes, whose values, like those of the tag above, are always enclosed in quotes. There is always an s= attribute containing a source fragment and a t= attribute containing a target fragment. Optionally, there can be an l= (level) attribute at the beginning. It contains a 3 for paragraph boundaries and a 4 for TOC entries. I have seen a 1 as well, but it had no visible effect.

There seems to be no way to force a page break. There is no special treatment for items such as a title page, and there does not seem to be support for figures. The whole book is in one file, which ends with the closing </ParallelBook> tag.
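The structure described so far (one opening tag with metadata, a run of p tags, a closing tag, and no line breaks) can be sketched as a small Python serializer. This is only an illustration: the names `build_pbo` and `_attr` are hypothetical, the use of XML entity escaping for special characters is an assumption, and the output has not been checked against the Aglona Reader.

```python
from xml.sax.saxutils import escape

_QUOTE_ENTITY = {'"': "&quot;"}


def _attr(name, value):
    """Format one name="value" attribute on a single line."""
    # Replace line breaks with spaces, since the reader reportedly
    # does not accept line feeds or carriage returns in the file.
    flat = str(value).replace("\r\n", " ").replace("\n", " ").replace("\r", " ")
    # Assumption: special characters are written as XML entities.
    return name + '="' + escape(flat, _QUOTE_ENTITY) + '"'


def build_pbo(metadata, pairs):
    """Serialize metadata (dict) and pairs [(level, source, target), ...].

    A level of None omits the optional l= attribute.
    """
    head = " ".join(_attr(key, value) for key, value in metadata.items())
    body = []
    for level, source, target in pairs:
        attrs = []
        if level is not None:
            attrs.append(_attr("l", level))  # l= comes first when present
        attrs.append(_attr("s", source))
        attrs.append(_attr("t", target))
        body.append("<p " + " ".join(attrs) + " />")
    return "<ParallelBook " + head + ">" + "".join(body) + "</ParallelBook>"


text = build_pbo({"lang1": "en", "lang2": "de"},
                 [("4", "CHAPTER 1", "Erstes Kapitel")])
print(text)
```

Everything is emitted on one line, so the result contains no line feeds or carriage returns.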

The content of the first tag can be displayed at any time using the Book/Information command. The lang1 and lang2 entries define the languages using standard language codes. The info= entry usually names the person who created the document or gives other information.

The following example of p elements, taken from an English/German version, shows a chapter heading and the start of a paragraph.

<p l="4" s="CHAPTER 1" t="Erstes Kapitel" /><p l="3" s="The studio was filled with the rich odour of roses," t="Das Atelier schwamm in einem starken Rosendufte," />
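A line like the example above can be split into fragment pairs with a short Python sketch. It uses regular expressions instead of an XML parser because the format reportedly does not follow XML rules strictly. The names `Pair` and `parse_pairs` are hypothetical, and the sketch assumes that attribute values are double-quoted, contain no literal ">" character, and use XML-style entities for escaping.

```python
import re
from html import unescape
from typing import NamedTuple, Optional


class Pair(NamedTuple):
    level: Optional[str]  # raw l= value, or None when the attribute is absent
    source: str           # s= fragment
    target: str           # t= fragment


P_TAG = re.compile(r"<p\b([^>]*?)/>")   # one self-closing <p ... /> tag
ATTR = re.compile(r'(\w+)="([^"]*)"')   # one name="value" attribute


def parse_pairs(pbo_text):
    """Return every <p /> element in pbo_text as a Pair, in file order."""
    pairs = []
    for tag in P_TAG.finditer(pbo_text):
        attrs = dict(ATTR.findall(tag.group(1)))
        pairs.append(Pair(attrs.get("l"),
                          unescape(attrs.get("s", "")),
                          unescape(attrs.get("t", ""))))
    return pairs


example = ('<p l="4" s="CHAPTER 1" t="Erstes Kapitel" />'
           '<p l="3" s="The studio was filled with the rich odour of roses," '
           't="Das Atelier schwamm in einem starken Rosendufte," />')
for pair in parse_pairs(example):
    print(pair.level, "|", pair.source, "|", pair.target)
```

The level is kept as the raw string from the file, since the meaning of the individual values is based on observation only.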
