IBM Research

MPEG-4 Overview


This section provides a high-level feature analysis, or executive overview, of MPEG-4 as compared to other media technologies and standards in use in the industry today.

An official overview of MPEG-4, which is full of detail from the ISO/IEC standards body, can be read in a document created by the MPEG group.

MPEG-4 follows on from the very successful MPEG-1 and MPEG-2 standards, which define video and audio compression for CD-quality and broadcast-quality video/audio content. MPEG-1 audio includes the popular mp3 compression; MPEG-2 is used for digital TV and on DVDs.

MPEG-4 built on that compression work and introduced new audio and video compression technologies that scale from very low bit rate Internet-type applications to high-quality TV broadcast and studio applications. In addition, the MPEG-4 standard includes MPEG-4 Systems, which can describe complete dynamic, interactive, intelligent, animated 2D and 3D content that can include this video and audio. With this, MPEG-4 moved beyond simple video/audio-only content to allow engaging, immersive rich-media applications.


MPEG-4 Introduction

MPEG-4 is an International Standard developed by ISO/IEC. The standard itself, ISO/IEC 14496, comprises several parts, including reference software and conformance parts. The first three parts are of most interest:

  • ISO/IEC 14496-1: Systems
  • ISO/IEC 14496-2: Visual
  • ISO/IEC 14496-3: Audio

Each part is self-contained in that the technologies can be used on their own. The Systems part, although it can be used standalone, is more often used to integrate the visual and audio functions into one seamless composite media presentation.

The Visual and Audio parts are fairly self-explanatory. The Visual part contains MPEG-4 video as well as other visual technologies such as still image texture, compressed mesh, and face and body animation. The video covers a wide range of applications, from low bit rates suitable for low-complexity mobile devices, through broadcast TV quality (overlapping with MPEG-2), right up to studio applications with very high quality and resolution. The Audio part contains audio codecs covering a wide range of audio applications, from very low bit rate speech to high-quality music. It also has synthetic audio and text to speech. Audio objects can be AAC, TwinVQ, CELP, HVXC (parametric speech), TTSI (text to speech), Main synthetic, Wavetable synthesis, General MIDI, Algorithmic Synthesis and Audio FX, plus error-resilient flavors of the above, including scalable AAC.

The Systems part contains description and control frameworks for the composite scene presentation as well as for the fundamental elementary streams (audio, video etc) that can go into making up that scene. The scene takes the form of a tree-like description that relates objects to one another hierarchically and provides links to the media in the elementary stream framework that are to be rendered. The elementary streams can contain media, such as video or audio, as well as a number of other types. The scene was based on VRML (ISO/IEC 14772), with additions for 2D and other animation and streaming extensions (VRML being a static, non-streamed scene). Systems also contains the mp4 file format for storage and exchange of mp4 content; it defines MPEG-J interfaces so that Java byte code forming so-called MPEG-lets can be executed and can interact with the scene; and it defines the eXtensible MPEG-4 Textual format (XMT), an XML-based language designed for MPEG-4 Systems for the express purposes of authoring, machine creation and interchange of content.
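
The tree-like scene description, with its links into the elementary stream framework, can be pictured with a small sketch. The Python below is illustrative only: `SceneNode`, `stream_id` and `referenced_streams` are hypothetical names rather than node types or fields defined by ISO/IEC 14496-1, and the example scene is made up.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SceneNode:
    """Hypothetical scene node (not a node type from the standard)."""
    name: str
    stream_id: Optional[int] = None  # assumed link to an elementary stream
    children: List["SceneNode"] = field(default_factory=list)


def referenced_streams(node: SceneNode) -> List[int]:
    """Walk the tree depth-first and collect the stream ids it links to."""
    ids = [] if node.stream_id is None else [node.stream_id]
    for child in node.children:
        ids.extend(referenced_streams(child))
    return ids


# A made-up scene: a video background with a text caption and narration.
scene = SceneNode("root", children=[
    SceneNode("background_video", stream_id=1),
    SceneNode("overlay", children=[
        SceneNode("caption_text"),
        SceneNode("narration_audio", stream_id=2),
    ]),
])

print(referenced_streams(scene))
```

The sketch keeps the two roles separate, as the Systems part does: the tree carries the hierarchy, while media is only referenced from it, so a node such as the caption can exist without any stream behind it.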

There are two other parts of the standard that may be of interest:

  • ISO/IEC 14496-6: DMIF
  • ISO/IEC 14496-8: Carriage of ISO/IEC 14496 contents over IP networks

DMIF is the Delivery Multimedia Integration Framework and provides an abstraction of the delivery interface for the purposes of specification; MPEG-4 has been designed to be transport independent and so does not specify networks or network protocols. MPEG has, however, produced a framework document for carriage of MPEG-4 content over IP. This work has been done in conjunction with the IETF, and a number of RFCs have been created to specify payload formats etc.

Finally, one new part under development is a new video codec. This is joint work with the ITU, which was defining an H.26L codec (a follow-on to H.261 and H.263). The work is being done by the Joint Video Team (JVT) working group and will become a new MPEG-4 video standard as part 10, i.e. ISO/IEC 14496-10, called Advanced Video Coding. The specification is technically complete and thus, as part of the standardization process, is frozen apart from necessary corrections. Corrections, review and voting cycles for the national bodies mean that the standard will be finally published, and be publicly available, after February 2003.


MPEG-4 Profiles and Levels

MPEG as a standards organization does not specify end-user products or equipment. MPEG standardizes what it calls tools, which can then be selected and used to build products. Tools for video codecs include advanced motion vectors, ¼-pel compensation etc; for Systems, the tools are the individual nodes that represent the scene, individual commands etc.

What MPEG does standardize, though, are Profiles. A Profile is a selection of tools that a group of companies participating in the standard has chosen as a basis for deploying products in specified application areas. To be standardized, a Profile passes through a requirements process in which the tools and applications are reviewed and voted on; if there are sufficient supporting companies, the Profile can be standardized as an interoperable profile for the industry.

Within each profile there can be one or more levels. Levels allow for increasing complexity of the tools, giving some diversity within a profile to address devices of varying performance. Levels may thus restrict bit rates, size, number of nodes etc. The restrictions are tighter at the low end of the level scale than at the high end, where there may even be no restriction.
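
One way to picture the relationship between tools, profiles and levels is as a set of permitted tools plus per-level limits. The following Python is a minimal sketch under that assumption; the `Profile` and `Level` classes, the `conforms` function, the profile name "Example 2D" and every numeric limit are invented for illustration and are not taken from the standard.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional


@dataclass(frozen=True)
class Level:
    """Limits for one level; None means that limit is unrestricted."""
    max_bitrate_kbps: Optional[int]
    max_nodes: Optional[int]


@dataclass(frozen=True)
class Profile:
    """A named selection of tools plus its levels (all values invented)."""
    name: str
    tools: FrozenSet[str]
    levels: Dict[int, Level]


def conforms(profile, level, tools_used, bitrate_kbps, node_count):
    """True if the content uses only the profile's tools and fits the level."""
    limits = profile.levels[level]
    if not set(tools_used) <= profile.tools:
        return False
    if limits.max_bitrate_kbps is not None and bitrate_kbps > limits.max_bitrate_kbps:
        return False
    if limits.max_nodes is not None and node_count > limits.max_nodes:
        return False
    return True


example = Profile(
    name="Example 2D",  # hypothetical profile, not one listed by MPEG
    tools=frozenset({"Circle", "Rectangle", "Text"}),
    levels={
        1: Level(max_bitrate_kbps=64, max_nodes=50),       # constrained devices
        2: Level(max_bitrate_kbps=None, max_nodes=None),   # no restriction
    },
)

print(conforms(example, 1, {"Circle", "Text"}, 48, 20))
print(conforms(example, 2, {"Circle"}, 128, 20))
```

In this model the tool set is fixed by the profile and only the limits vary by level, which mirrors the description above: a higher level admits more demanding content without adding tools.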

The wide application range for MPEG-4 can be seen in the names of the profiles.

MPEG-4 Systems is broken down into four profile sets: two for the scene, one for the Object Descriptor (OD) framework describing the elementary streams, and one for MPEG-J. Audio and Visual have just one set of profiles each.

The two profile sets for MPEG-4 Systems that cover the scene are the SceneGraph profiles, which contain mainly the tools forming the structure of the tree, and the Graphics profiles, which contain the renderable tools such as Circle, Rectangle, Text etc.

The Systems SceneGraph Profiles specified are:

  1. Simple 2D
  2. Audio
  3. 3D Audio
  4. Basic 2D
  5. Core 2D
  6. Main 2D
  7. Advanced 2D

and the Systems Graphics Profiles are:

  1. Simple 2D
  2. Simple 2D + Text
  3. Core 2D
  4. Advanced 2D

For Audio the following profiles are defined (some have up to 8 levels):

  1. Main
  2. Scalable
  3. Speech
  4. Synthesis
  5. High Quality Audio
  6. Low Delay Audio
  7. Natural Audio
  8. Mobile Audio Internetworking

And for Visual the following:

  1. Simple
  2. Simple Scalable
  3. Core
  4. Main
  5. N-Bit
  6. Hybrid
  7. Basic Animated Texture
  8. Scalable Texture
  9. Simple Face Animation
  10. Simple FBA
  11. Advanced Real Time Simple
  12. Core Scalable
  13. Advanced Coding Efficiency
  14. Advanced Core
  15. Advanced Scalable Texture
  16. Simple Studio
  17. Core Studio
  18. Advanced Simple
  19. FGS

It is expected and highly desired, although not required, that industry groups building products select tools by selecting one or more profiles and levels as standardized by MPEG. Further restriction of the profiles is acceptable practice; for example, ISMA has selected the Visual Simple Profile but restricted the maximum bit rate to 64 kbps and allowed only one video object to be coded within the stream.
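
Further restriction of a profile can be thought of as taking the stricter of two sets of limits. The Python below is a hedged sketch of that idea: `Limits`, `tighter` and `restrict` are hypothetical names, and the base limits are placeholders rather than the real Simple Profile numbers. Only the 64 kbps and single-video-object figures come from the ISMA example in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Limits:
    """Hypothetical limit record; None means no restriction."""
    max_bitrate_kbps: Optional[int]
    max_video_objects: Optional[int]


def tighter(a: Optional[int], b: Optional[int]) -> Optional[int]:
    """Return the stricter of two optional limits."""
    if a is None:
        return b
    if b is None:
        return a
    return min(a, b)


def restrict(base: Limits, extra: Limits) -> Limits:
    """Combine a profile's limits with an industry group's extra limits."""
    return Limits(
        tighter(base.max_bitrate_kbps, extra.max_bitrate_kbps),
        tighter(base.max_video_objects, extra.max_video_objects),
    )


# Placeholder base limits (NOT the real Simple Profile numbers).
base = Limits(max_bitrate_kbps=384, max_video_objects=4)
# The ISMA restriction as described in the text.
isma = Limits(max_bitrate_kbps=64, max_video_objects=1)

print(restrict(base, isma))
```

Because the combined limits are never looser than the base limits, content that satisfies the restricted set in this model also satisfies the original profile.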


Patents and Licensing

Companies can bring technologies to MPEG on which they have patents, and the technologies may be selected for inclusion into the standard. Each part of the standard lists the companies that have provided patent statements to the standards organization. Patents would generally cover certain tool(s) and may cover encoding and/or decoding processes. MPEG, however, standardizes only decoders, so the specification, the conformance, and the reference bit streams are all for decoders. Leaving encoders unstandardized allows their implementation to vary, so long as they produce conformant bit streams that conformant decoders can decode. Patents can still cover either or both aspects, though.

When patented tools/technologies are accepted into the standard, the company or companies in question are required to provide licenses for any patents reading on those tools under reasonable and non-discriminatory terms.

Profiles were discussed above and these were described as a set of tools selected from the standard to address particular application area(s). When choosing a particular profile there may be patents on the tools therein so that licensing is required.

So how does one get a license for use? A convenient route for essential patents is to go to a licensing administration company that has set up a license pool for essential patents covering those technologies. For MPEG Visual and Systems licensing, MPEG-LA is such an administrator. As of June 2003, Via Licensing is providing licensing for MPEG-4 audio and has begun the process for AVC (Advanced Video Coding).

To give an idea of how this comes about: under MPEG-LA, any company believing it holds relevant essential patents is invited to submit them, and for a fee they are evaluated by independent experts to determine their essentiality. The companies determined to hold essential patents form a pool under the management of MPEG-LA. Terms and conditions for the licensing are then worked out amongst those companies.

Note that the presence of these licensing pools does not preclude individual negotiations with each of the companies holding patents. A company that wants to ship a product can therefore either negotiate with the individual companies involved or, more straightforwardly, license the technologies from a relevant licensing administration where one exists.

See the MPEG-4 Industry Forum site for further comprehensive information on patents and licensing.



