MPEG-4: The Interactive Revolution - Page 2
The standard also caters for a wider range of devices, operating over varying
communications channels, from broadcast TV and broadband networks down to low
bit-rate networks such as dial-up connections and wireless networks (mobile
telephone). As with the internet, content compilation and composition are performed
at the receiving equipment. Unlike the internet, an MPEG-4 enabled receiving
device is capable of intelligently rendering the presentation depending on
its capabilities or limitations, scaling down content when necessary or even
ignoring it altogether. This means that content needs only to be designed once,
leaving it up to the receiving device to decide what and how to render.
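The "design once, let the receiver decide" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Variant` and `Device` classes, the `choose_variant` function and the bit-rate figures are hypothetical and are not part of the MPEG-4 standard, which defines its own scalability mechanisms.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Variant:
    """One quality level of a piece of content (hypothetical model)."""
    label: str
    bitrate_kbps: int


@dataclass(frozen=True)
class Device:
    """A receiving device with an assumed bandwidth budget."""
    name: str
    max_bitrate_kbps: int


def choose_variant(device: Device, variants: List[Variant]) -> Optional[Variant]:
    """Return the richest variant the device can handle, or None to skip it."""
    affordable = [v for v in variants if v.bitrate_kbps <= device.max_bitrate_kbps]
    if not affordable:
        return None  # the device ignores this content altogether
    return max(affordable, key=lambda v: v.bitrate_kbps)


# The content is authored once, with all quality levels described together.
video = [Variant("full", 4000), Variant("reduced", 300), Variant("thumbnail", 40)]

for device in (Device("tv", 6000), Device("phone", 64), Device("pager", 10)):
    chosen = choose_variant(device, video)
    print(device.name, "->", chosen.label if chosen else "skipped")
```

The author never writes a separate presentation per device; the selection logic lives entirely at the receiving end.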
The eventual aim is that the TV set of the future will be able to handle
interactive multimedia content along with high-quality broadcast content. Similarly,
the internet will provide broadcast-quality performance and presentations within
its already multimedia-rich layouts.
As was stated earlier, the internet uses HTML to describe scene content. On
its own, however, HTML describes only the placement of the content and not its
behaviour, so designers usually need to resort to scripting languages in order
to achieve dynamic, interactive presentations. MPEG-4 provides a much more
versatile language known as Binary Format for Scenes, or BIFS, which is more
closely related to the Virtual Reality Modelling Language (VRML), itself widely
used on the internet for 3-D modelling and manipulation of 3-D objects.
BIFS offers a way to describe not only the scene’s contents and their
placement, but also how objects behave in response to user events. It can also
be used to animate objects and change their characteristics dynamically.
Another feature of BIFS, in keeping with the overall philosophy of the MPEG-4
standard of making the most efficient use of the available bandwidth, is that
it is a binary format rather than text, as HTML and VRML are, which makes
it much more compact.
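To give a feel for why a binary format is more compact than text, the sketch below encodes the same scene node both ways. It does not reproduce the real BIFS bitstream syntax: the `NODE_TYPE_IDS` table, the fixed one-byte type code and the 16-bit coordinates are assumptions made purely for illustration.

```python
import struct

# A VRML-style textual description of a node, for comparison only.
text_form = "Transform2D { translation 120 45 }"

# Hypothetical table mapping node names to small integer codes.
NODE_TYPE_IDS = {"Transform2D": 1}


def encode_binary(node_type: str, x: int, y: int) -> bytes:
    """Pack a node as one type byte plus two signed 16-bit coordinates."""
    return struct.pack(">Bhh", NODE_TYPE_IDS[node_type], x, y)


def decode_binary(payload: bytes) -> tuple:
    """Unpack the (type id, x, y) triple written by encode_binary."""
    return struct.unpack(">Bhh", payload)


binary_form = encode_binary("Transform2D", 120, 45)

print("text bytes:  ", len(text_form.encode("ascii")))
print("binary bytes:", len(binary_form))
print("decoded:     ", decode_binary(binary_form))
```

Because both ends agree on the table of node types in advance, the name of the node never has to be transmitted, which is where most of the saving comes from in this toy case.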
At the heart of any presentation is its content, and MPEG-4 is very precise
about how that content is defined. A single element used in a presentation
is referred to as an object. Objects can be combined to produce compound objects,
and a collection of objects and/or compound objects makes up a scene.
An object can be natural or synthetic (i.e. computer generated), and includes
still images, audio, video, text, 2D and 3D meshes, and synthetic face and
body objects. Each object is independent of all other objects and can exist
in two- or three-dimensional space; this includes sound objects. As has been said, objects can
be combined to form new, compound objects such as a human figure with its associated
voice. The designer or author creates a scene by combining as many objects
and compound objects as are required.
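The object, compound object and scene hierarchy described above maps naturally onto a small tree structure. The following is a minimal sketch, not the data model defined by the standard; the class names `MediaObject`, `CompoundObject` and `Scene`, and the helper `leaf_objects`, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class MediaObject:
    """A single element: natural or synthetic (hypothetical model)."""
    name: str
    kind: str  # e.g. "video", "audio", "image", "text", "mesh"
    synthetic: bool = False


@dataclass
class CompoundObject:
    """A named group of objects and/or other compound objects."""
    name: str
    parts: List["Node"] = field(default_factory=list)


Node = Union[MediaObject, CompoundObject]


@dataclass
class Scene:
    """A collection of objects and compound objects."""
    nodes: List[Node] = field(default_factory=list)


def leaf_objects(node: Node) -> List[MediaObject]:
    """Flatten a node into the individual media objects it contains."""
    if isinstance(node, MediaObject):
        return [node]
    leaves: List[MediaObject] = []
    for part in node.parts:
        leaves.extend(leaf_objects(part))
    return leaves


# A human figure combined with its associated voice, as in the text.
presenter = CompoundObject(
    "presenter",
    [MediaObject("figure", "video"), MediaObject("voice", "audio")],
)
scene = Scene([presenter, MediaObject("backdrop", "image", synthetic=True)])

for node in scene.nodes:
    print(node.name, "->", [obj.name for obj in leaf_objects(node)])
```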
An obvious advantage in using objects is reusability, since an object is
defined once but can be used as often as required in a scene, with each instance
having different characteristics such as size or colour. Perhaps the major
advancement, however, is the ability for the author to allow the end-user
to interact with the scene’s objects: to move them to a different
location, change an object’s characteristics, or change the viewpoint. For
example, in a hypothetical advertisement in which the latest model four-wheel-drive
is cruising down a serene country road, the user can change the colour of
the car, swap in the 2-door or 4-door model in the scene, add or remove roof-racks,
or even change the background scenery from an autumn to a summer’s
day.
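Both ideas, reuse of a single definition and author-controlled interaction, can be illustrated with a short sketch. It assumes a hypothetical `Instance` class in which the author lists the properties the end-user is allowed to change; this is not how BIFS expresses interaction, only a simplified model of the behaviour described above.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    """One use of an object definition within a scene (hypothetical model)."""
    definition: str                       # name of the shared definition
    properties: dict                      # per-instance characteristics
    user_editable: frozenset = frozenset()  # what the author lets users change

    def user_set(self, prop: str, value) -> bool:
        """Apply a user's change only if the author has permitted it."""
        if prop not in self.user_editable:
            return False
        self.properties[prop] = value
        return True


# The same "car" definition used twice, with different characteristics.
hero_car = Instance(
    "car",
    {"colour": "red", "doors": 4, "roof_racks": False},
    frozenset({"colour", "doors", "roof_racks"}),
)
parked_car = Instance("car", {"colour": "grey", "doors": 2, "roof_racks": False})

print(hero_car.user_set("colour", "blue"), hero_car.properties)
print(parked_car.user_set("colour", "blue"), parked_car.properties)
```

The second car ignores the user's request because the author did not expose any of its properties, which mirrors the point that interaction is something the author chooses to allow.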