----------------------------------------------------------------------------
                                                        The Florida SunFlash

                   Multimedia: Audio (4 of 6)

SunFLASH Vol 40 #28					          April 1992 
----------------------------------------------------------------------------
                             4 Audio

Audio plays an important role in multimedia applications. When a
service representative adds a voice note to a credit record, when
executives hold a video conference, when travelers listen to voice mail
or have their email read to them over the phone, or when new employees
complete training modules, they all use desktop audio.

                       Audio Applications

Audio can be used in many applications, including voice annotation,
voice conferencing, voice mail, training and presentations,
text-to-speech, and speech recognition. The following paragraphs
describe some of these applications.

                        Voice Annotation

A voice annotation application enables audio comments to be added to
documents, database records, and so on. These applications can be
built using fairly simple record and playback capabilities. For
example, someone records a voice comment and attaches it to a certain
spot in a document. The document then displays an indicator, showing
that an audio note is attached. While viewing the document, the reader
can select the indicator to play back the audio recording. Voice
annotation applications may also provide some simple audio editing
capabilities for message creation.

                       Voice Conferencing

Voice conferences enable people to speak to each other in real time
over the network. Voice conferencing is an alternative to video
conferencing when the parties do not need to see each other. It is
often used in a collaborative environment, where two people sit at
their workstations, look at the same document at the same time, and
make verbal comments about it.

                           Voice Mail

Voice mail provides a way to send and receive voice-quality audio
recordings in a multimedia email message. Audio messages can be
recorded, sent as attachments to a multimedia email message, and played
back by the recipient.  With the integration of telephony into the
desktop, messages could be recorded automatically by a
telephone-answering application, and forwarded as email to the
telephone owner.

                   Training and Presentations

Audio can be very effective in training applications. It can provide a
soundtrack for video segments, or for illustrations of any form. It can
also provide help and feedback to the student with a more personal
feel, and without interrupting the student's focus. Audio can also
provide richer, more interesting, and more effective presentations.
Audio in combination with other multimedia technologies enables authors
to create presentations of a quality that meets the expectations of
today's consumers.

                         Text-to-Speech

Text-to-speech technology enables information stored as text to be
converted to speech, in effect to be read aloud. Applications can
use text-to-speech to provide verbal help, or to read your calendar,
address file information, email messages, or other information. When
integrated with telephony capabilities, text-to-speech technology can
provide remote access to information. For example, you could telephone
your computer and have it read your mail or appointments. This
technology can also provide spoken desktop messages, such as a reminder
of a pending appointment, without imposing a pop-up window in the
middle of your current work. These desktop alerts could include
announcing an incoming email message or, with telephone integration, an
incoming phone call. They can also provide spoken status messages or
warnings from the system.

                       Speech Recognition

Speech recognition enables you to speak to your computer through a
microphone or telephone. The workstation translates the voice input
into text that the system can understand. This technology enables the
use of voice as an additional input channel to supplement the keyboard
and mouse. For example, radiologists save valuable time by dictating
their x-ray reports into the computer for immediate viewing. The
reports can then be edited by voice or by keyboard. You could also
give commands to open and close windows or start applications without
moving away from your current work. With telephony integration, speech
recognition would enable you to give commands to your workstation
verbally over the telephone, for example, "Read the headers of any
mail messages from William Tell." In the near future, speech
recognition technology will be limited in the size of its vocabulary,
and will typically require you to train your system to your own voice.
Eventually, these restrictions should disappear.

                       Key Audio Concepts

Multimedia audio applications depend on the interaction of a number of
variables such as the type of audio and how it can be digitized,
edited, stored, and played back. The following paragraphs describe some
of these key concepts.

Types of Audio

Workstations generally support two types of audio input and output:

   o    Music-quality audio (often called CD-quality audio or 16-bit audio)
   o    Voice-quality audio (also known as telephone-quality or 8-bit audio)

MIDI data, which specifies music, is also often included in the audio
category.

                        CD-quality Audio

CD-quality audio requires both higher sample rates and greater sampling
precision (more bits of data), thus making greater storage and
processing demands on the workstation. Today, applications requiring
CD-quality audio are found primarily in the music industry. The use of
CD-quality audio for business training and presentations, while
currently limited, is expected to expand considerably throughout the
corporate marketplace. CD-quality audio is typically input from a CD
player or DAT (Digital Audio Tape) player, and is output through a
high-quality speaker.

                       Voice-quality Audio

Voice-quality audio can reproduce the comparatively limited dynamic
range of the human voice. Voice-quality audio is standard on every Sun
desktop workstation, enabling multimedia applications ranging from
electronic voice mail to voice annotation of documents to voice control
of your workstation.  Voice-quality audio is commonly input from a
microphone or over a telephone, and can be output through a speaker
built into or attached to the workstation, or using a telephone handset
or speaker phone.

                               MIDI

MIDI (Musical Instrument Digital Interface) is a note-oriented control
language for specifying music. MIDI data consists of codes specifying
notes and timing.  These codes can be generated by or output to
MIDI-compatible devices such as keyboards or synthesizers. MIDI
applications are generally found in the computer music industry, used
for studio control and audio production.
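
As a rough illustration of what "codes specifying notes and timing"
can look like, the sketch below builds two three-byte MIDI messages.
The status values 0x90 (Note On) and 0x80 (Note Off) come from the MIDI
1.0 specification rather than from this article, and the function names
and the (delay, message) list are hypothetical, chosen only for the
example.

```python
def note_on(channel: int, note: int, velocity: int) -> bytes:
    """Build a 3-byte Note On message: status byte, note number, velocity."""
    if not 0 <= channel <= 15:
        raise ValueError("channel must be 0-15")
    if not 0 <= note <= 127 or not 0 <= velocity <= 127:
        raise ValueError("note and velocity must be 0-127")
    return bytes([0x90 | channel, note, velocity])


def note_off(channel: int, note: int) -> bytes:
    """Build a 3-byte Note Off message with a release velocity of 0."""
    if not 0 <= channel <= 15:
        raise ValueError("channel must be 0-15")
    if not 0 <= note <= 127:
        raise ValueError("note must be 0-127")
    return bytes([0x80 | channel, note, 0])


MIDDLE_C = 60  # note number conventionally used for middle C

# Timing is kept alongside the codes; here as hypothetical
# (delay in milliseconds, message) pairs: play middle C for half a second.
phrase = [
    (0, note_on(0, MIDDLE_C, 100)),
    (500, note_off(0, MIDDLE_C)),
]
```

Note that the data describes which notes to play and when, not the
sound itself, which is why MIDI data is so much smaller than digitized
audio.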

                 Audio Editing and Manipulation

You can perform various operations on audio data stored in a file in
addition to playing it back. Probably the most common operation is to
edit the audio data. Programs that do audio editing typically generate
a display of the waveform representing the data, and then enable you to
specify sections of data to cut out or relocate. Editing can be used to
isolate segments of interest (for example, to create a "sound bite"),
or remove leading or trailing noise, silence, or pauses.
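
The editing operations described above can be sketched on a plain list
of integer samples. This is a minimal illustration, not how any
particular editor works; the function names and the silence threshold
are assumptions made for the example.

```python
def trim_silence(samples, threshold=2):
    """Drop leading and trailing samples whose magnitude is at or below threshold."""
    start = 0
    end = len(samples)
    while start < end and abs(samples[start]) <= threshold:
        start += 1
    while end > start and abs(samples[end - 1]) <= threshold:
        end -= 1
    return samples[start:end]


def cut(samples, begin, finish):
    """Remove samples[begin:finish] and join the remaining pieces."""
    return samples[:begin] + samples[finish:]


recording = [0, 1, -1, 40, -35, 0, 22, 1, 0]
trimmed = trim_silence(recording)   # quiet lead-in and tail removed
shortened = cut(trimmed, 1, 3)      # middle section cut out
```

Only the quiet samples at the two ends are removed; a quiet sample in
the middle of the recording is treated as a pause and kept.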

Another common operation is the mixing of sound files, for example to
combine a voice overlay on top of a music background for a training
application or in a presentation.
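
Mixing can be sketched as adding corresponding samples and keeping the
result within the sample range. The sketch below assumes 16-bit signed
samples held in Python lists; the function name and the `music_gain`
parameter are hypothetical.

```python
def mix(voice, music, music_gain=0.5, lo=-32768, hi=32767):
    """Mix two sample lists, scaling the music and clipping to [lo, hi]."""
    length = max(len(voice), len(music))
    mixed = []
    for i in range(length):
        v = voice[i] if i < len(voice) else 0   # pad the shorter input with silence
        m = music[i] if i < len(music) else 0
        total = round(v + music_gain * m)
        mixed.append(max(lo, min(hi, total)))   # clip instead of wrapping around
    return mixed


overlay = mix([1000, 2000], [400, 400, 400])
```

Lowering the gain of the music keeps the voice intelligible, and
clipping prevents loud passages from overflowing the sample range.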

                         Audio Playback

Playing back stored audio data requires regenerating the analog audio
signal from the digital data. This is done by a digital-to-analog
converter, or DAC. The analog signal can then be output to a speaker
built into or attached to your workstation, to the speaker in a
telephone handset, or to a speakerphone.

                 Capturing and Digitizing Audio

Sound, or audio, is analog data. Before a computer can store,
manipulate, or enhance it, the sound must be digitized, that is,
converted to a computer-readable format. Audio starts as a complex
analog waveform coming from some form of input device, such as a
microphone, telephone handset, or CD player connected to your
workstation. An audio signal is
characterized by its bandwidth, the highest frequency in cycles per
second or hertz (Hz) that can be represented in the waveform.
Digitizing this signal involves two processes, sampling and
quantization. These functions are generally performed by a chip known
as an analog-to-digital converter, or ADC. Today the ADC and its
counterpart, the DAC, are sometimes combined into a single chip called
a Coder-Decoder or CODEC. The quality of audio a workstation supports
is primarily determined by the capabilities of the ADC and DAC
components.
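
The two steps an ADC performs, sampling and quantization, can be
imitated in software. The sketch below is only an illustration: a sine
function stands in for the analog input, the quantization is linear,
and the 8,000 samples-per-second rate is an assumption (a common choice
for telephone-quality audio, not a figure stated here).

```python
import math


def sample_and_quantize(freq_hz, sample_rate_hz, duration_s, bits):
    """Sample a sine tone and quantize each sample to a signed integer."""
    max_level = 2 ** (bits - 1) - 1              # 127 for 8 bits, 32767 for 16
    count = round(sample_rate_hz * duration_s)
    samples = []
    for n in range(count):
        t = n / sample_rate_hz                   # sampling: pick discrete instants
        x = math.sin(2 * math.pi * freq_hz * t)  # stand-in "analog" value in [-1, 1]
        samples.append(round(x * max_level))     # quantization: nearest integer level
    return samples


# One millisecond of a 1 kHz tone: 8 samples of 8 bits each.
tone = sample_and_quantize(1000, 8000, 0.001, 8)
```

Raising the sample rate captures higher frequencies, and raising the
number of bits reduces the rounding error in each sample; both increase
the amount of data produced.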

                       Audio Data Storage

Once the audio input stream has been captured and digitized, it can be
stored in a data file for later playback or for editing or other
processing. Even voice-quality audio is data intensive; one minute of
voice-quality audio on a SPARCstation takes almost half a megabyte of
storage space. One minute of uncompressed CD-quality audio (16-bit,
44.1 kHz stereo) requires more than 10 Mbytes of storage space.
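
These storage figures follow from simple arithmetic: samples per
second, times bytes per sample, times channels, times 60 seconds. The
sketch below assumes voice-quality audio is sampled 8,000 times per
second with one byte per sample on a single channel; that rate is an
assumption, chosen because it is consistent with the half-megabyte
figure.

```python
def bytes_per_minute(sample_rate_hz, bits_per_sample, channels):
    """Uncompressed storage needed for one minute of audio, in bytes."""
    return sample_rate_hz * (bits_per_sample // 8) * channels * 60


voice = bytes_per_minute(8000, 8, 1)     # 480,000 bytes, almost half a megabyte
cd = bytes_per_minute(44100, 16, 2)      # 10,584,000 bytes, roughly 10 Mbytes
ratio = cd / voice                       # CD-quality needs about 22 times more
```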

Besides the raw data, you also need to store information about the
data, such as its sampling rate, the number of bits per sample, and the
encoding algorithm used. This information is needed to reproduce the
original signal. Thus, audio data is commonly stored in files with a
special format that includes this information, often in some sort of
header structure. Reading and writing such files often requires
special routines.
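
A header of this kind can be sketched with a fixed binary layout placed
in front of the raw samples. The layout below is hypothetical (it is
not the format of any real audio file type), as are the magic string
"AUD0", the field order, and the function names; it only shows the
idea of storing the description together with the data.

```python
import struct

# Hypothetical big-endian header: magic, sample rate, bits per sample,
# channel count, encoding id, and length of the sample data in bytes.
HEADER = struct.Struct(">4sIHHHI")
MAGIC = b"AUD0"


def pack_audio(data, sample_rate, bits, channels, encoding):
    """Return the header followed by the raw sample bytes."""
    header = HEADER.pack(MAGIC, sample_rate, bits, channels, encoding, len(data))
    return header + data


def unpack_audio(blob):
    """Split a blob into a description dictionary and the raw sample bytes."""
    magic, rate, bits, channels, encoding, length = HEADER.unpack_from(blob)
    if magic != MAGIC:
        raise ValueError("not a recognized audio blob")
    info = {"sample_rate": rate, "bits": bits,
            "channels": channels, "encoding": encoding}
    start = HEADER.size
    return info, blob[start:start + length]


blob = pack_audio(b"\x00\x10\x20\x30", 8000, 8, 1, 1)
info, samples = unpack_audio(blob)
```

The resulting bytes can be written to a file opened in binary mode and
read back the same way. Without the header, a reader would have to
guess the sampling rate and sample size, and a wrong guess plays the
sound at the wrong speed or as noise.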

                       Multi-Channel Audio

Many workstations, such as today's SPARCstation family, support one
channel of audio, or monaural sound. Multiple channels are also
possible. Supporting two channels (stereo) requires two input and two
output ports, independent ADC/DAC components for each data stream (or
components designed to handle two channels), and a data representation
format for the storage of multiple channels of data.
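
One common data representation for multiple channels is interleaving:
the samples for each instant are stored together, left then right. The
sketch below assumes this layout for illustration; other layouts, such
as storing each channel as a separate block, are also possible, and the
function names are hypothetical.

```python
def interleave(left, right):
    """Combine two equal-length channels into one L, R, L, R, ... sequence."""
    if len(left) != len(right):
        raise ValueError("channels must have the same number of samples")
    frames = []
    for l_sample, r_sample in zip(left, right):
        frames.extend((l_sample, r_sample))
    return frames


def deinterleave(frames):
    """Split an interleaved stereo sequence back into (left, right)."""
    return frames[0::2], frames[1::2]


stereo = interleave([1, 2, 3], [10, 20, 30])
left, right = deinterleave(stereo)
```

Interleaving keeps the two channels synchronized during playback,
because the samples that belong to the same instant are always read
together.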

                           Challenges

There are still some challenging issues to tackle before audio becomes
commonplace on the desktop. One of the most significant is the
development of more effective ways to handle the volume of data that
audio involves. Compression algorithms that reduce both the storage
space audio occupies and the network bandwidth needed to transmit it
across computer networks are an area for further research.
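
As one concrete example of reducing audio data, the sketch below shows
mu-law companding, a long-established technique from telephony that
lets 8 bits per sample cover a range that would otherwise need more
bits. It is offered only as an illustration and is not a method this
article describes; it uses the continuous mu-law formula with mu = 255
rather than the exact bit layout of any telephony standard, and the
function names are hypothetical.

```python
import math

MU = 255  # companding constant used in mu-law telephony


def compress(x):
    """Map x in [-1, 1] to a logarithmically companded value in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def expand(y):
    """Inverse of compress()."""
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)


def encode(x, bits=8):
    """Compand x, then quantize it to a signed integer code."""
    levels = 2 ** (bits - 1) - 1
    return round(compress(x) * levels)


def decode(code, bits=8):
    """Recover an approximate sample value from an integer code."""
    levels = 2 ** (bits - 1) - 1
    return expand(code / levels)


quiet = 0.01                                # a quiet sample, 1% of full scale
companded = decode(encode(quiet))           # 8-bit companded round trip
linear = round(quiet * 127) / 127           # 8-bit linear round trip
```

For quiet samples the companded round trip lands much closer to the
original value than linear 8-bit quantization does, at the cost of
coarser steps for loud samples, where the error is less audible.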

Text-to-speech and speech recognition pose another challenge and
remain areas of ongoing research. More human-sounding speech
generation and more flexible and accurate speech recognition are
important goals for the future.