1 What is H.264?
H.264 is an industry standard for video compression, the
process of converting digital video into a format that
takes up less capacity when it is stored or transmitted.
Video compression (or video coding) is an essential
technology for applications such as digital television,
DVD-Video, mobile TV, videoconferencing and internet
video streaming. Standardizing video compression makes
it possible for products from different manufacturers
(e.g. encoders, decoders and storage media) to
inter-operate. An encoder converts video into a
compressed format and a decoder converts compressed
video back into an uncompressed format.
Recommendation H.264: Advanced Video Coding is a
document published by the international standards bodies
ITU-T (International Telecommunication Union) and ISO/IEC
(International Organization for Standardization /
International Electrotechnical Commission). It defines a
format (syntax) for compressed video and a method for
decoding this syntax to produce a displayable video
sequence. The standard document does not actually
specify how to encode (compress) digital video – this is
left to the manufacturer of a video encoder – but in
practice the encoder is likely to mirror the steps of
the decoding process. Figure 1 shows the encoding and
decoding processes and highlights the parts that are
covered by the H.264 standard.
The H.264/AVC standard was first published in 2003. It
builds on the concepts of earlier standards such as
MPEG-2 and MPEG-4 Visual and offers the potential for
better compression efficiency (i.e. better-quality
compressed video) and greater flexibility in
compressing, transmitting and storing video.

2 How does an H.264 codec work?
An H.264 video encoder carries out prediction, transform
and encoding processes (see Figure 1) to produce a
compressed H.264 bit stream. An H.264 video decoder
carries out the complementary processes of decoding,
inverse transform and reconstruction to produce a
decoded video sequence.
2.1 Encoder processes
Prediction
The encoder processes a frame of video in units of a
macroblock (16x16 displayed pixels). It forms a
prediction of the macroblock based on previously-coded
data, either from the current frame (intra prediction)
or from other frames that have already been coded and
transmitted (inter prediction). The encoder subtracts
the prediction from the current macroblock to form a
residual (see note 1).
The prediction methods supported by H.264 are more
flexible than those in previous standards, enabling
accurate predictions and hence efficient video
compression. Intra prediction uses 16x16 and 4x4 block
sizes to predict the macroblock from surrounding,
previously-coded pixels within the same frame (Figure
2).
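
As a rough illustration of intra prediction, the sketch below predicts a 4x4 block as the rounded mean of the eight previously-coded pixels above and to the left of it (a "DC" style prediction), then forms the residual. The pixel values and function names are hypothetical, and a real encoder would choose among several prediction modes rather than always using this one.

```python
def dc_predict_4x4(above, left):
    """Predict a 4x4 block as the rounded mean of its 8 neighbouring pixels."""
    dc = (sum(above) + sum(left) + 4) >> 3  # +4 rounds before dividing by 8
    return [[dc] * 4 for _ in range(4)]


def residual(current, prediction):
    """Subtract the prediction from the current block, sample by sample."""
    return [[c - p for c, p in zip(crow, prow)]
            for crow, prow in zip(current, prediction)]


# Hypothetical previously-coded neighbours and current block.
above = [100, 102, 104, 106]
left = [98, 100, 102, 104]
current = [
    [101, 103, 105, 107],
    [100, 102, 104, 106],
    [99, 101, 103, 105],
    [98, 100, 102, 104],
]

prediction = dc_predict_4x4(above, left)
res = residual(current, prediction)
for row in res:
    print(row)
```

The residual values are much smaller than the original pixel values, which is what makes them cheaper to code.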
Inter prediction uses a range of block sizes (from 16x16
down to 4x4) to predict pixels in the current frame from
similar regions in previously-coded frames (Figure 3).
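
One common way to find an inter prediction is a block-matching search that minimises the sum of absolute differences (SAD). The sketch below is a minimal full search over a small window, assuming whole-pixel displacements and a single 4x4 block size. The frame contents and function names are made up for illustration, and the standard does not mandate any particular search method.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def get_block(frame, top, left, size):
    """Extract a size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def motion_search(current, reference, top, left, size=4, search_range=2):
    """Return (cost, dy, dx) for the best match inside the search window."""
    target = get_block(current, top, left, size)
    height, width = len(reference), len(reference[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue  # candidate block would fall outside the frame
            cost = sad(target, get_block(reference, y, x, size))
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best


# Hypothetical 8x8 reference frame; the current frame is the same picture
# shifted one pixel to the right.
reference = [[10 * y + x for x in range(8)] for y in range(8)]
current = [[reference[y][max(x - 1, 0)] for x in range(8)] for y in range(8)]

cost, dy, dx = motion_search(current, reference, top=2, left=2)
print(cost, dy, dx)
```

Here the search finds a perfect match one pixel to the left in the reference frame, so the residual for this block would be all zeros.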


Note 1: Finding a suitable inter prediction is often described
as motion estimation. Subtracting an inter prediction
from the current macroblock is motion compensation.

Transform and quantization
A block of residual samples is transformed using a 4x4
or 8x8 integer transform, an approximate form of the
Discrete Cosine Transform (DCT). The transform outputs a
set of coefficients, each of which is a weighting value
for a standard basis pattern. When combined, the
weighted basis patterns re-create the block of residual
samples. Figure 4 shows how the inverse DCT creates an
image block by weighting each basis pattern according to
a coefficient value and combining the weighted basis
patterns.
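
The idea of weighted basis patterns can be checked numerically. The sketch below uses an ordinary floating-point 4x4 DCT (not the integer approximation that H.264 uses) to transform a hypothetical residual block, then rebuilds the block by summing each basis pattern multiplied by its coefficient. All names and sample values are illustrative.

```python
import math

N = 4


def dct_matrix(n=N):
    """Orthonormal DCT-II matrix: row k holds the k-th 1-D basis function."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        rows.append([scale * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


A = dct_matrix()


def forward(block):
    """2-D transform: coefficients = A * block * A^T."""
    return matmul(matmul(A, block), transpose(A))


def basis_pattern(u, v):
    """4x4 pattern contributed by the coefficient at position (u, v)."""
    return [[A[u][y] * A[v][x] for x in range(N)] for y in range(N)]


def inverse_by_basis(coeffs):
    """Rebuild the block as a weighted sum of the 16 basis patterns."""
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            pattern = basis_pattern(u, v)
            for y in range(N):
                for x in range(N):
                    out[y][x] += coeffs[u][v] * pattern[y][x]
    return out


# Hypothetical residual block.
block = [[-1, 1, 3, 5], [-2, 0, 2, 4], [-3, -1, 1, 3], [-4, -2, 0, 2]]
coeffs = forward(block)
rebuilt = inverse_by_basis(coeffs)
print([[round(v) for v in row] for row in rebuilt])
```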
The output of the transform, a block of transform
coefficients, is quantized, i.e. each coefficient is
divided by an integer value. Quantization reduces the
precision of the transform coefficients according to a
quantization parameter (QP). Typically, the result is a
block in which most or all of the coefficients are zero,
with a few non-zero coefficients. Setting QP to a high
value means that more coefficients are set to zero,
resulting in high compression at the expense of poor
decoded image quality. Setting QP to a low value means
that more non-zero coefficients remain after
quantization, resulting in better decoded image quality
but lower compression.
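
The effect of QP can be illustrated with a simplified floating-point model. The sketch below assumes the commonly quoted relationship in which the quantizer step size doubles for every increase of 6 in QP, starting from the base values listed in the code. A real codec performs this with integer multiplications and shifts, and the coefficient block here is hypothetical.

```python
# Assumed step sizes for QP 0..5; the step doubles for each further 6 in QP.
QSTEP_BASE = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp):
    """Approximate quantizer step size for a given QP."""
    return QSTEP_BASE[qp % 6] * (2 ** (qp // 6))


def quantize(coeffs, qp):
    """Divide each coefficient by the step size and round to the nearest level."""
    step = qstep(qp)

    def level(c):
        magnitude = int(abs(c) / step + 0.5)
        return -magnitude if c < 0 else magnitude

    return [[level(c) for c in row] for row in coeffs]


def count_nonzero(block):
    return sum(1 for row in block for v in row if v != 0)


# Hypothetical block of transform coefficients.
coeffs = [
    [140.0, -31.0, 12.0, -4.0],
    [-25.0, 9.0, -3.0, 1.0],
    [8.0, -3.0, 1.0, 0.0],
    [-2.0, 1.0, 0.0, 0.0],
]

for qp in (10, 40):
    levels = quantize(coeffs, qp)
    print(qp, qstep(qp), count_nonzero(levels))
```

With these sample values, the low QP keeps most coefficients non-zero, while the high QP leaves only the largest one.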

Bitstream encoding
The video coding process produces a number of values
that must be encoded to form the compressed bitstream.
These values include:
- quantized transform coefficients
- information to enable the decoder to re-create the
  prediction
- information about the structure of the compressed data
  and the compression tools used during encoding
- information about the complete video sequence.
These values and parameters (syntax elements) are
converted into binary codes using variable length coding
and/or arithmetic coding. Each of these encoding methods
produces an efficient, compact binary representation of
the information. The encoded bitstream can then be
stored and/or transmitted.
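
As one example of variable length coding, many H.264 syntax elements are written as Exp-Golomb codes, in which small (more frequent) values receive shorter codewords. The sketch below covers only the unsigned variant and represents bits as a text string for readability. It does not attempt to model the context-adaptive schemes (CAVLC and CABAC) used for coefficient data.

```python
def exp_golomb_encode(value):
    """Unsigned Exp-Golomb codeword: leading zeros, then binary of value + 1."""
    if value < 0:
        raise ValueError("unsigned Exp-Golomb needs a non-negative value")
    code = bin(value + 1)[2:]
    return "0" * (len(code) - 1) + code


def exp_golomb_decode(bits, pos=0):
    """Decode one codeword starting at pos; return (value, next position)."""
    zeros = 0
    while bits[pos + zeros] == "0":
        zeros += 1
    end = pos + 2 * zeros + 1
    value = int(bits[pos + zeros:end], 2) - 1
    return value, end


values = [0, 3, 1, 7]
stream = "".join(exp_golomb_encode(v) for v in values)

decoded, pos = [], 0
while pos < len(stream):
    value, pos = exp_golomb_decode(stream, pos)
    decoded.append(value)

print(stream, decoded)
```

Because each codeword announces its own length through the run of leading zeros, the decoder can split the stream without any separators.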
2.2 Decoder processes
Bitstream decoding
A video decoder receives the compressed H.264 bitstream,
decodes each of the syntax elements and extracts the
information described above (quantized transform
coefficients, prediction information, etc.). This
information is then used to reverse the coding process
and recreate a sequence of video images.
Rescaling and inverse transform
The quantized transform coefficients are re-scaled. Each
coefficient is multiplied by an integer value to restore
its original scale (see note 2). An inverse transform combines the
standard basis patterns, weighted by the re-scaled
coefficients, to re-create each block of residual data.
These blocks are combined together to form a residual
macroblock.
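
A small numeric sketch shows why re-scaling restores the scale of a coefficient but not its exact value. It assumes a single hypothetical integer step size for every coefficient, which is simpler than the position-dependent scaling a real decoder applies.

```python
def quantize(coeff, step):
    """Encoder side: divide by the step size, rounding to the nearest level."""
    sign = -1 if coeff < 0 else 1
    return sign * ((abs(coeff) + step // 2) // step)


def rescale(level, step):
    """Decoder side: multiply the level by the same step size."""
    return level * step


step = 8  # hypothetical step size
original = [141, -30, 13, -5, 3, 0]
levels = [quantize(c, step) for c in original]
restored = [rescale(level, step) for level in levels]
errors = [o - r for o, r in zip(original, restored)]

print(levels)
print(restored)
print(errors)
```

The restored values are close to the originals, but the rounding error introduced at the encoder remains, and small coefficients such as 3 are lost entirely.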
Reconstruction
For each macroblock, the decoder forms an identical
prediction to the one created by the encoder. The
decoder adds the prediction to the decoded residual to
reconstruct a decoded macroblock which can then be
displayed as part of a video frame.
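
The reconstruction step amounts to an addition followed by clipping to the valid pixel range. The sketch below assumes 8-bit samples and uses a tiny hypothetical 2x2 block to keep the output readable.

```python
def clip_pixel(value, bit_depth=8):
    """Clamp a sample to the range allowed by the bit depth (0..255 for 8 bits)."""
    return max(0, min((1 << bit_depth) - 1, value))


def reconstruct(prediction, residual):
    """Add the decoded residual to the prediction, sample by sample."""
    return [[clip_pixel(p + r) for p, r in zip(prow, rrow)]
            for prow, rrow in zip(prediction, residual)]


# Hypothetical prediction and decoded residual.
prediction = [[102, 102], [250, 3]]
residual = [[-1, 3], [10, -5]]

decoded = reconstruct(prediction, residual)
print(decoded)
```

Clipping matters because the decoded residual is only an approximation of the original one, so the sum can stray slightly outside the displayable range.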
3 H.264 in practice
3.1 Performance
Perhaps the biggest advantage of H.264 over previous
standards is its compression performance. Compared with
standards such as MPEG-2 and MPEG-4 Visual, H.264 can
deliver:
- Better image quality at the same compressed bitrate, or
- A lower compressed bitrate for the same image quality.
For example, a single-layer DVD can store a movie of
around 2 hours’ length in MPEG-2 format. Using H.264, it
should be possible to store 4 hours or more of
movie-quality video on the same disc (i.e. lower bitrate
for the same quality). Alternatively, the
H.264 compression format can deliver better quality at
the same bitrate compared with MPEG-2 and MPEG-4 Visual
(Figure 5).
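
The DVD example can be turned into approximate bitrates. The calculation below assumes the nominal 4.7 GB (decimal) capacity of a single-layer DVD and ignores audio, subtitles and file-system overhead, so the figures are upper bounds on the video bitrate rather than measured values.

```python
DVD_CAPACITY_BYTES = 4.7e9  # assumed nominal single-layer capacity


def average_bitrate_mbps(capacity_bytes, duration_hours):
    """Average bitrate in Mbit/s that fills the disc in the given duration."""
    return capacity_bytes * 8 / (duration_hours * 3600) / 1e6


mpeg2_rate = average_bitrate_mbps(DVD_CAPACITY_BYTES, 2)
h264_rate = average_bitrate_mbps(DVD_CAPACITY_BYTES, 4)

print(round(mpeg2_rate, 2), round(h264_rate, 2))
```

Doubling the playing time on the same disc means halving the average bitrate, which is the sense in which the text describes a lower bitrate for the same quality.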
Note 2: This is often described as inverse quantization, but it
is important to note that quantization is not a
fully-reversible process. Information removed during
quantization cannot be restored during re-scaling.
The improved compression performance of H.264 comes at
the price of greater computational cost. H.264 is more
sophisticated than earlier compression methods and this
means that it can take significantly more processing
power to compress and decompress H.264 video.

3.2 Applications
As well as its improved compression performance, H.264
offers greater flexibility in terms of compression
options and transmission support. An H.264 encoder can
select from a wide variety of compression tools, making
it suitable for applications ranging from low-bitrate,
low-delay mobile transmission through high definition
consumer TV to professional television production. The
standard provides integrated support for transmission or
storage, including a packetised compressed format and
features that help to minimize the effect of
transmission errors.
H.264/AVC is being adopted for an increasing range of
applications, including:
- High Definition DVDs (HD DVD and Blu-ray formats)
- High Definition TV broadcasting in Europe
- Apple products including iTunes video downloads, iPod
  video and MacOS
- NATO and US DoD video applications
- Mobile TV broadcasting
- Internet video
- Videoconferencing

H.264 Profiles
The standard includes the following seven sets
of capabilities, which are referred to as
profiles, targeting specific classes of
applications:
- Baseline Profile (BP): Primarily for lower-cost
  applications with limited computing resources,
  this profile is used widely in videoconferencing
  and mobile applications.
- Main Profile (MP): Originally intended as the
  mainstream consumer profile for broadcast and
  storage applications, the importance of this
  profile faded when the High profile was
  developed for those applications.
- Extended Profile (XP): Intended as the streaming
  video profile, this profile has relatively high
  compression capability and some extra tricks for
  robustness to data losses and server stream
  switching.
- High Profile (HiP): The primary profile for broadcast
  and disc storage applications, particularly for
  high-definition television applications (this is
  the profile adopted into HD DVD and Blu-ray Disc,
  for example).
- High 10 Profile (Hi10P): Going beyond today's
  mainstream consumer product capabilities, this
  profile builds on top of the High Profile,
  adding support for up to 10 bits per sample of
  decoded picture precision.
- High 4:2:2 Profile (Hi422P): Primarily targeting
  professional applications that use interlaced
  video, this profile builds on top of the High 10
  Profile, adding support for the 4:2:2 chroma
  sampling format while using up to 10 bits per
  sample of decoded picture precision.
- High 4:4:4 Predictive Profile (Hi444PP): This
  profile builds on top of the High 4:2:2 Profile,
  supporting up to 4:4:4 chroma sampling, up to
  14 bits per sample, and additionally supporting
  efficient lossless region coding and the coding
  of each picture as three separate color planes.

In addition, the standard now contains four
additional all-Intra profiles, which are defined
as simple subsets of other corresponding
profiles. These are mostly for professional
(e.g., camera and editing system) applications:
- High 10 Intra Profile: The High 10 Profile
  constrained to all-Intra use.
- High 4:2:2 Intra Profile: The High 4:2:2 Profile
  constrained to all-Intra use.
- High 4:4:4 Intra Profile: The High 4:4:4 Profile
  constrained to all-Intra use.
- CAVLC 4:4:4 Intra Profile: The High 4:4:4
  Profile constrained to all-Intra use and to
  CAVLC entropy coding (i.e., not supporting CABAC).

H.264 profile                      | Baseline | Extended | Main | High | High 10 | High 4:2:2 | High 4:4:4 Predictive
I and P Slices                     | Yes      | Yes      | Yes  | Yes  | Yes     | Yes        | Yes
B Slices                           | No       | Yes      | Yes  | Yes  | Yes     | Yes        | Yes
SI and SP Slices                   | No       | Yes      | No   | No   | No      | No         | No
Multiple Reference Frames          | Yes      | Yes      | Yes  | Yes  | Yes     | Yes        | Yes
In-Loop Deblocking Filter          | Yes      | Yes      | Yes  | Yes  | Yes     | Yes        | Yes
CAVLC Entropy Coding               | Yes      | Yes      | Yes  | Yes  | Yes     | Yes        | Yes
CABAC Entropy Coding               | No       | No       | Yes  | Yes  | Yes     | Yes        | Yes
Flexible Macroblock Ordering (FMO) | Yes      | Yes      | No   | No   | No      | No         | No
Arbitrary Slice Ordering (ASO)     | Yes      | Yes      | No   | No   | No      | No         | No
Redundant Slices (RS)              | Yes      | Yes      | No   |      |         |            |

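
The feature table lends itself to a simple lookup. The sketch below encodes the rows of the table as Python sets and lists the profiles that support a required combination of tools. The data structure and function name are illustrative only, and the Redundant Slices row is left out because the table does not give its value for every profile.

```python
COMMON = {"I and P slices", "Multiple reference frames",
          "In-loop deblocking filter", "CAVLC"}
MAIN_TOOLS = COMMON | {"B slices", "CABAC"}

# Insertion order follows the column order of the table.
PROFILE_FEATURES = {
    "Baseline": COMMON | {"FMO", "ASO"},
    "Extended": COMMON | {"B slices", "SI and SP slices", "FMO", "ASO"},
    "Main": MAIN_TOOLS,
    "High": MAIN_TOOLS,
    "High 10": MAIN_TOOLS,
    "High 4:2:2": MAIN_TOOLS,
    "High 4:4:4 Predictive": MAIN_TOOLS,
}


def profiles_supporting(required):
    """Return the profiles whose tool set includes every required tool."""
    required = set(required)
    return [name for name, tools in PROFILE_FEATURES.items()
            if required <= tools]


print(profiles_supporting({"B slices", "CABAC"}))
print(profiles_supporting({"B slices", "FMO"}))
print(profiles_supporting({"CABAC", "ASO"}))
```

The last query returns an empty list: according to the table, no profile combines CABAC entropy coding with arbitrary slice ordering.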
Profile Name       | Tool description | Applications
Baseline profile   | B-picture type prediction is not allowed; CABAC is not allowed; weighted prediction is not allowed; error resilience is allowed | Telephony applications
Extended profile   | B-picture type prediction is allowed; switching slice feature; the rest is similar to Baseline profile | Streaming applications
Main profile       | Includes I-, P- and B-picture types; error resilience is not allowed; CABAC is allowed | General AV applications
High profile       | 8x8 DCT is allowed; the rest is similar to Main profile | Blu-ray Disc, HD DVD
High 10 profile    | 10-bit accuracy coding instead of 8-bit | Industrial use
High 4:2:2 profile | 4:2:2 video format instead of 4:2:0 | Professional use
High 4:4:4 profile | 4:4:4 video format instead of 4:2:0; lossless coding included | Professional use

Video content is rapidly transitioning from
standard definition to high definition, and is
increasingly being distributed over IP networks. To
enable higher-quality video with faster delivery and
reduced storage requirements, the H.264 video standard
was developed to provide over twice the compression
ratio of MPEG-2. As the pioneer in IP surveillance
technology development, Multipix developed the world's first
HD IP camera based on the H.264 standard: the
MP264HD IP camera series.