This page covers the Sound capabilities that are found in some eBook Readers. It is not intended to be a complete technical page on digital sound.
 Sound Use
Many eBook reading devices do not support sound, or support only beeps. Those that do tend to use it for background music while reading eBooks. Some, however, treat audio as another eBook format in its own right (called audio books). For example, MP3 books are readily available as an alternative to reading. Audible is a company that specializes in producing audio books using a highly compressed proprietary format.
For readers who are trying to learn a new language, having the audio play while reading a book can help with pronunciation. Some eBook readers even support reading a book aloud using synthesized TTS (text-to-speech) technology. This permits the text pages to be synced exactly with the sound, although some may argue that synthesized speech does not pronounce many words well. Other read-aloud techniques can be used to provide synchronized sound recordings.
This page covers the digital formats used in the recording and reproduction of sounds, also called audio formats. It is also possible to define music using a notation system and then create music directly from the notation. See the article on MIDI for a discussion of this system.
Many eBook readers advertise long battery life, measured in weeks between charges. However, this does not hold if you use the reader for music or other sounds. The long life relies on the fact that the internal electronics can be mostly shut down while you are reading a page, which is not possible while audio is playing. Expect battery life to be 4 to 8 hours when audio is in use, although a few devices may stretch it out. The volume affects battery life, so feeding the sound output into external amplified speakers will often extend the battery life, while driving un-amplified speakers loud enough to hear without headphones will shorten it.
Of course, audio is also an integral part of most video systems. A video format and a (usually synchronized) audio format are combined and placed inside a container format. These container formats are described in the video article.
 Sound Formats
Advanced Audio Coding (AAC) is a standardized, lossy compression and encoding scheme for digital audio. AAC is promoted as the successor to the MP3 format by MP3’s creator, Fraunhofer IIS.
AAC generally achieves better sound quality than MP3 at the same bitrate, particularly below 192 kbit/s. AAC uses a modular approach in encoding providing the flexibility to use different tools depending on the application and complexity requirements. AAC offers different profiles by making use of tools such as TNS (Temporal Noise Shaping), PNS (Perceptual Noise Substitution), Mid-Side stereo, Intensity stereo, and frequency domain prediction.
AAC’s most famous use is as the default audio format of Apple's iPhone, iPod, iTunes, and the format used for all iTunes Store audio (with extensions for proprietary Digital Rights Management, DRM, where used).
AAC is also the standard audio format for Sony’s PlayStation 3 and the MPEG-4 video standard. HE-AAC is part of digital radio standards such as DAB+ and Digital Radio Mondiale. AAC-LC is a low complexity version of AAC that does not support the frequency domain prediction tool. AAC is standardized in MPEG-2 (part 7) and MPEG-4 (part 3).
The extensions that can be used for AAC files include:
- .m4a, no DRM
- .m4b, m4a files with bookmarking feature
- .m4p, using FairPlay DRM
- .m4v, video files
- .m4r, ring tones
- .mp4, full video and audio
A newer, improved version of AAC is called High-Efficiency AAC (HE-AAC). It has better compression and improved audio performance, particularly at lower bitrates.
MPEG-1 Audio Layer 3, more commonly referred to as MP3, is an audio encoding format. It usually has a .mp3 file extension.
It uses a lossy compression algorithm that is designed to greatly reduce the amount of data required to represent the audio recording, yet still sound like a faithful reproduction of the original uncompressed audio to most listeners. It was invented by a team of European engineers at Philips, CCETT (Centre commun d'études de télévision et télécommunications), IRT and Fraunhofer Society, who worked in the framework of the EUREKA 147 DAB digital radio research program, and it became an ISO/IEC standard in 1991.
MP3 is an audio-specific format. The compression removes certain parts of the sound that are outside the hearing range of most people. It represents pulse-code modulation (PCM) audio in much less space than straightforward methods by using psychoacoustic models to discard components less audible to human hearing and recording the remaining information in an efficient manner. This is quite different from the principles used by, say, JPEG, an image compression format, which are purely frequency domain based.
The lower the bitrate of an MP3 recording, the lower the frequency response and the smaller the file. For spoken text, a low bitrate and a small file are adequate. A monophonic format will also reduce the file size and should always be used for audio books.
The standard bitrate for music in MP3 is 128 kbit/s. This offers good quality, but many audiophiles prefer higher bitrates such as 192 or even 320 kbit/s. There are fewer artifacts at these higher bitrates but, of course, the files are larger. The quality depends on the bitrate but also on the quality of the encoder used to create the file. In addition, the sample rate affects the quality of the file. Some files use a variable bit rate, spending more bits on difficult waveforms and fewer on simple ones, to provide better quality while decreasing the size of the file.
For audio books, 64 kbit/s provides high quality voice, and even lower bitrates can be used with good results. Bitrates supported in the MPEG-1 Layer 3 standard include 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256 and 320 kbit/s. The available sampling frequencies are 32, 44.1 and 48 kHz, with 44.1 kHz being the most used since it matches a CD.
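The relationship between bitrate and file size can be sketched with a little arithmetic. The snippet below is only an illustration: it assumes a constant bitrate, ignores metadata and container overhead, and the function name `audio_size_bytes` and the 10 hour book length are made up for the example.

```python
def audio_size_bytes(bitrate_kbps: float, duration_s: float) -> float:
    """Approximate size of a constant-bitrate stream.

    bits per second * seconds gives total bits; dividing by 8 gives bytes.
    Metadata and framing overhead are ignored (an assumption of this sketch).
    """
    return bitrate_kbps * 1000 * duration_s / 8


# A hypothetical 10 hour audio book at several of the standard bitrates.
ten_hours_s = 10 * 60 * 60
for kbps in (32, 64, 128, 320):
    megabytes = audio_size_bytes(kbps, ten_hours_s) / 1_000_000
    print(f"{kbps:>3} kbit/s -> {megabytes:.0f} MB")
```

Halving the bitrate halves the file, which is why a voice-oriented bitrate such as 64 kbit/s or lower is attractive for long audio books.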
MP3 metadata can also be appended to the file or prepended in some cases to allow its display while streaming.
Windows Media Audio is an audio format developed by Microsoft to compete with MP3. Currently it is a widely accepted format available on a number of hardware platforms. WMA can refer to the data or to the codec that encodes and decodes the data.
WMA consists of four distinct codecs.
- The original WMA codec, known simply as WMA, was conceived as a competitor to the popular MP3 and RealAudio codecs. Today it is one of the most popular codecs, together with MP3 and MPEG-4 AAC.
- WMA Pro, a newer and more advanced codec, supports multichannel and high resolution audio.
- A lossless codec, WMA Lossless, compresses audio data without loss of audio fidelity. This is a newer more efficient replacement for the original Microsoft WAV format.
- WMA Voice, targeted at voice content, applies compression using a range of low bit rates.
The file itself is a container with data in one of the 4 codec forms and a header that specifies which one is used. The container can optionally contain DRM. Not all players can play all 4 forms or support DRM. WMA with DRM is often used by lending libraries that lend audio books.
WAV or WAVE, short for Waveform Audio File Format, is a Microsoft and IBM audio file format standard for storing an audio bitstream on PCs, using the .wav extension. It is a variant of the RIFF bitstream format, which stores data in “chunks”, and is thus also close to the AIFF format used on Macintosh computers. It is the main format used on Windows systems for raw and typically uncompressed audio. Though a WAV file can hold compressed audio, the most common WAV format contains uncompressed audio in the pulse-code modulation (PCM) format. PCM audio is the standard audio format for CDs, containing two channels of 44,100 samples per second at 16 bits per sample. Since PCM is an uncompressed, lossless storage method that keeps all the samples of an audio track, professional users or audio experts may use the WAV format for maximum audio quality.
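As a rough illustration of the PCM-in-RIFF layout described above, the following sketch uses Python's standard `wave` module to write a short 16-bit mono tone into memory and read its parameters back. The helper name `make_tone_wav` and the tone settings are arbitrary choices for the example, not part of any standard.

```python
import io
import math
import struct
import wave


def make_tone_wav(freq_hz=440.0, seconds=0.1, rate=44100):
    """Return the bytes of a mono, 16-bit PCM WAV file holding a sine tone."""
    n_samples = int(rate * seconds)
    # Each sample is a signed 16-bit little-endian integer at half amplitude.
    frames = b"".join(
        struct.pack("<h", int(32767 * 0.5 * math.sin(2 * math.pi * freq_hz * i / rate)))
        for i in range(n_samples)
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as writer:
        writer.setnchannels(1)      # monophonic
        writer.setsampwidth(2)      # 2 bytes = 16 bits per sample
        writer.setframerate(rate)   # samples per second
        writer.writeframes(frames)
    return buf.getvalue()


wav_bytes = make_tone_wav()

# The file starts with the RIFF chunk header followed by the WAVE form type.
print(wav_bytes[:4], wav_bytes[8:12])

# Reading it back recovers the PCM parameters stored in the header.
with wave.open(io.BytesIO(wav_bytes), "rb") as reader:
    channels = reader.getnchannels()
    sample_width = reader.getsampwidth()
    frame_rate = reader.getframerate()
    n_frames = reader.getnframes()
print(channels, sample_width * 8, frame_rate, n_frames)
```

The same reader calls work on a .wav file on disk by passing a filename to `wave.open` instead of the in-memory buffer.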
 Ogg Vorbis
.ogg is the file extension used for Ogg Vorbis files. This is an open music format, with a public domain specification, that is intended as an alternative to MP3. It claims to sound better than MP3 at the same bit rate. For voice audio, Ogg uses the Speex format.
Audible provides audio books for playing on a wide variety of platforms. Audible files have a .AA extension. They come in 4 formats, depending on the fidelity required by the user. The higher the format number, the higher the quality and the larger the file. Not all players can read all 4 formats, nor are all books available in all formats. DRM is always employed on Audible files, even if they are free. Multiple devices can use the same Audible file. Any or all of the available formats can be downloaded, and the site keeps a backup of all files owned by the user.
Some Magazines, Newspapers, and radio shows are supported using a subscription plan.
|Format|File size|Quality|
|2|3.7 MB|AM Radio|
|3|7.2 MB|FM Radio|
|4|14.4 MB|MP3 standard|
- Format 4 is roughly the equivalent of MP3 with 128K encoding.
- e - Enhanced Audio for CD quality sound.
Audible has developed a new format (.AAX) aimed at children and audio books that need images for completeness. Enhanced audio is audio embedded with other features, such as images, graphs, maps, or links. The "enhancements" play along on your computer or iPod screen as you listen. At AudibleKids, they have two types of Enhanced Audiobooks: Picture Books and Graphic Novels, both of which have still images that play along with the audiobooks.
Some of the regular audio books are also available in this format. Only certain devices can support this new format and not all enhancements will be supported in all devices. Typically the device needs to be capable of displaying video as well as audio. The Amazon Kindle 2 and Kindle DX support this format. This format is considered experimental and may be subject to change.
The Compact Disc standard specifies the digital format used to record audio on a CD. It uses PCM (Pulse Code Modulation) to record 2 channels on the CD. The audio is uncompressed, and each sample is represented with 16 bits of data. The sample frequency is 44.1 kHz, which is more than twice the highest frequency expected to be recorded. This standard has been around for a long time and offers very high quality, but it does have limitations. For example, because there is no compression, only the loudest sounds use all 16 bits. As the sound gets softer, fewer bits are used to encode the data, so quiet portions may be encoded with only a few bits. One way that CD players get around this problem is to fake additional bits: they assume a smooth transition between volume changes and waveform changes and add a bit (or perhaps 2 or even more) to the least significant portion of the data. With 16 bits, each bit represents about 6 dB of volume change, which is how the (6 x 16) 96 dB signal to noise ratio is produced.
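The figures above can be checked with a short calculation. This is a sketch only: it uses the common textbook approximation that the dynamic range of linear PCM is 20·log10(2^bits), and the function names are invented for the example.

```python
import math


def dynamic_range_db(bits: int) -> float:
    """Theoretical dynamic range of linear PCM: 20 * log10(2 ** bits)."""
    return 20 * math.log10(2 ** bits)


def pcm_bitrate(sample_rate_hz: int, bits: int, channels: int) -> int:
    """Raw PCM data rate in bits per second."""
    return sample_rate_hz * bits * channels


print(f"per bit:  {dynamic_range_db(1):.2f} dB")    # roughly 6 dB per bit
print(f"16 bits:  {dynamic_range_db(16):.2f} dB")   # roughly 96 dB
print(f"CD audio: {pcm_bitrate(44100, 16, 2)} bit/s")
```

The "6 dB per bit" rule of thumb is therefore a rounding of about 6.02 dB, and CD audio works out to roughly 1.4 Mbit/s before any compression.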
A standard audio CD holds up to about 74 minutes of music, and many discs carry less. Higher capacity discs fit more data by spacing the spiral track more tightly than usual. (A CD is read at a constant linear velocity, so the disc spins faster while the short inner part of the spiral is being read than it does for the longer outer part.) Some players may have difficulty with such discs, but as much as 80 minutes can be made to work. These high capacity CDs are often used for audio books.
There are many programs available to convert CDs to a format such as MP3 that is usable on portable eBook reading devices. An MP3 file is typically about one tenth the size of the same audio on a CD, so roughly 10 hours of MP3 can be placed on one CD. CDs are digital media and can be formatted to accept any standard digital content. It is even possible to mix CD music content with other digital content on the same CD.
DVDs are typically used for video/audio but can also be used for any digital data, with far more capacity than a CD.
G.711 is a codec that also uses PCM (like CDs), but it is intended primarily for voice communication such as telephony (monophonic). Its formal name is PCM for Voice Frequencies. It can also be used for FAX communication. It has a sampling frequency of 8 kHz and a 64 kbit/s bitrate (8 kHz sampling frequency x 8 bits per sample). The audio frequency response is 300–3400 Hz. It is not compressed, but there are two extensions: G.711.0, which adds lossless compression (2x), and G.711.1, which has a wider frequency response using more bandwidth.
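The "10 times" figure can be sanity-checked with a quick estimate. The sketch below assumes a 128 kbit/s MP3, the 1,411,200 bit/s rate of CD audio (44,100 samples x 16 bits x 2 channels), and a nominal 700 MB data CD; the 700 MB capacity is an assumption of this example rather than a figure from this page.

```python
CD_PCM_BITRATE = 44100 * 16 * 2        # bits per second of CD audio
MP3_BITRATE = 128_000                  # bits per second, a common music setting
DATA_CD_BYTES = 700 * 1_000_000        # assumed nominal data CD capacity

# How many times smaller the MP3 stream is than the CD audio stream.
compression_ratio = CD_PCM_BITRATE / MP3_BITRATE

# How many hours of MP3 fit on the disc when it is used for data files.
mp3_hours = DATA_CD_BYTES * 8 / MP3_BITRATE / 3600

print(f"compression ratio: about {compression_ratio:.1f} to 1")
print(f"MP3 playing time on one data CD: about {mp3_hours:.1f} hours")
```

At a voice-oriented bitrate such as 64 kbit/s the playing time doubles again, which is why a whole audio book can often fit on a single disc.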
The G.711.1 encoder codec can encode signals at 16 kHz with a bandwidth of 50–7000 Hz at 80 and 96 kbit/s, and for 8-kHz sampling the output may produce signals with a bandwidth ranging from 50 up to 4000 Hz, operating at 64 and 80 kbit/s.
The format is used with some video formats to provide the audio component and is thus a required standard in a variety of applications.
Dolby® Digital (AC-3) is an industry-standard surround sound audio codec. It is designed to provide more than 2 channels.
AMR is a proprietary adaptive format that uses variable bitrate and restricted frequency filters to provide support for speech.
Opus, which is based on the SILK and CELT codecs, is a more recent format for use with video formats in multimedia applications. It can transition smoothly from speech quality compression to music compression and has low latency, permitting it to be used for live broadcasts.
Nearly all of the sound formats mentioned above can be used for music, although the Audible format is not normally used for music content and some of the formats are specifically targeted at voice. Generally, music needs the higher fidelity range of the format in question, while voice requires much less fidelity to be understood. Singing, however, requires the same quality as music if it is to be preserved.
In addition to the above formats there is the MIDI format, which is designed specifically for music reproduction. It does not record the sound of music as it is being played but rather records the notes of the music from an electronic instrument. These notes can then be used to play a song, similar to the way a player piano uses a recording of the notes to play music on the piano itself, although MIDI uses a synthesized sound.
LRC is a lyric format that can be used with Karaoke devices or when text needs to be synchronized with music or other sounds. It makes use of the fact that sound formats have internal timing available.
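As an illustration of how such timed text can be used, the sketch below parses the simple form of LRC, in which each line starts with a `[mm:ss.xx]` time tag, and looks up the line that should be showing at a given playback time. The sample lyrics and the function names are made up for the example, and lines carrying several time tags are not handled.

```python
import re

# One time tag of the form [mm:ss] or [mm:ss.xx] followed by the lyric text.
LRC_LINE = re.compile(r"\[(\d+):(\d+(?:\.\d+)?)\](.*)")


def parse_lrc(text):
    """Return (seconds, lyric) pairs sorted by time; untimed lines are skipped."""
    entries = []
    for line in text.splitlines():
        match = LRC_LINE.match(line.strip())
        if match:
            minutes, seconds, lyric = match.groups()
            entries.append((int(minutes) * 60 + float(seconds), lyric.strip()))
    return sorted(entries)


def line_at(entries, playback_s):
    """Return the lyric whose start time most recently passed, or '' if none."""
    current = ""
    for start, lyric in entries:
        if start > playback_s:
            break
        current = lyric
    return current


sample = """[ti:Sample Song]
[00:01.50]First line
[00:05.00]Second line
[01:02.25]Third line"""

entries = parse_lrc(sample)
print(entries)
print(line_at(entries, 6.0))
```

A player would call something like `line_at` with the current playback position, which it gets from the timing information inside the sound format.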
SMIL (Synchronized Multimedia Integration Language) is a format that is also used to control the playing of music. It is used in .wpl (Windows Play List) files to support playing more than one file. Many players can use a playlist to determine both which files to play and the order in which they are played. Of course, SMIL is overkill for the trivial case where you could just list all the filenames in a simple text file to be read by the player. Other formats for playlists are also in use. Often the player program generates and reads the playlist in response to interactive commands from the user.
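The trivial case, a plain text file with one filename per line, might look like the following sketch. The filenames are hypothetical, and treating lines that begin with `#` as comments is an assumption borrowed from M3U-style playlists rather than something this page specifies.

```python
def build_playlist(filenames):
    """Return playlist text with one filename per line, in playing order."""
    return "\n".join(filenames) + "\n"


def read_playlist(text):
    """Return the filenames in order, skipping blank lines and # comments."""
    tracks = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            tracks.append(line)
    return tracks


# Hypothetical chapter files for an audio book.
chapters = ["chapter01.mp3", "chapter02.mp3", "chapter03.mp3"]
playlist_text = build_playlist(chapters)
print(playlist_text, end="")
print(read_playlist("# my audio book\n" + playlist_text))
```

The order of the lines is the playing order, which is all that a simple player needs.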
Music Notation, also known as Sheet Music, can be generated with an appropriate program such as MuseScore which can edit scores and produce music and notation output in various formats including PDF, PNG, SVG and sound formats such as WAV, MP3, OGG, FLAC, MIDI. Both PNG and SVG are useful for eBook formats such as ePub. It can also share output using MusicXML format. More formats are supported including input formats such as MIDI.
 Lossless formats
Most of the formats described above are lossy formats, in that some data is thrown away as part of the compression. Presumably this is data that would not be heard anyway, but for the highest fidelity there are also lossless formats that can be used. A few of these include FLAC, ALE (Apple Lossless, also known as ALAC), and APE (Monkey's Audio). Of course, general compression techniques such as ZIP and RAR could also be used. None of these formats can achieve the amount of compression that is achieved by the more popular lossy formats.
While casual users need not be too concerned with lossless formats, for audiophiles with the right playback equipment these formats offer distinctly better sound, as good as 16 bit CD quality. If 24 bits are used they can be even better than CD.