Article: 17005 of comp.compression
From: pacalet@elec.enst.fr (Renaud Pacalet)
Newsgroups: comp.compression
Subject: Re: What is Discrete Cosine Transforms ?
Date: 12 Jan 1995 10:06:16 GMT
Organization: Ecole Nationale Superieure des Telecommunications, Paris France

[ The following post was made in response to the question what is
  a "discrete cosine transform" and how is it used in JPEG for compression? ]

Sorry in advance for my english, etc. I will try to make it understandable.
In JPEG, MPEG-I, MPEG-II and H261, the DCT is a two-dimentional one, applied
on 8x8 blocks of pixels. The result of a DCT on such a block is a 8x8 block
of coefficients. It can be described as a matrix product:

F = TC * X * C

Where X is the original pixel block, F is the DCT result of X, C and TC are
8x8 matrix of constants. Let A(i,j) be the element of the matrix A that is
located on row i and column j. The matrix C is defined by:

C(i,j) = K(j) / 2 * cos((2 * i + 1) * j * PI / 16)
where K(0) = sqrt(1 / 2)
      K(j) = 1 when j != 0

TC(i,j) = C(j,i)

The DCT is said to be an orthogonal transforme because TC * C = I (and
C * TC = I), where I(i,i) = 1 and I(i,j) = 0 when i != j.

So, as you can see,

X = C * F * TC

This second transform is the inverse DCT (IDCT). The original form of block X
can be restored from F, its DCT form. In fact, this is not true, because the
DCT coefficients are usually rounded to integer values and the rouding noise
makes the restored block be different from the original one. This transform is
not lossless. But, as you can be as accurate as you want in your computations
and keep as many digits as you need in the result, this loss may be considered
as neglictable (and it gives you the "lossless" version of JPEG).

What is the DCT used for ? Compression. This may look strange to you because
starting from a 8x8 block of natural values range 0 to 255, all you gain with
DCT is another 8x8 block of integer values range -721 to 721 ! Where is the
compression ? Well, consider the DCT as the real part of a Fourrier transform.
Your pixel block comes from the picture domain. The DCT form of the same block
belongs to the frequency domain. In the upper left corner area of the DCT
block you find the low frequency coefficients. In the lower right corner area
you find the high frequency coefficients. That is, if your pixel block is from a
uniform area of the picture, the DCT form will have only one non-zero
coefficient in the upper left corner. If it comes from a part of the picture
with no high frequencies (a very common situation in real pictures) most of
the coefficients will have zero value. This fact is the first compression in
DCT based coding schemes because when transferring (or storing) the transformed
picture, you dont scan it in a usual way:

 0  1  2  3  4  5  6  7
 8  9 10 11 12 13 14 15
16 17 18 19 20 21 22 23
24 25 26 27 28 29 30 31
32 33 34 35 36 37 38 39
40 41 42 43 44 45 46 47
48 49 50 51 52 53 54 55
56 57 58 59 60 61 62 63

but in something like:

 0  1  5  6 14 15 27 28
 2  4  7 13 16 26 29 42
 3  8 12 17 25 30 41 43
 9 11 18 24 31 40 44 53
10 19 23 32 39 45 52 54
20 22 33 38 46 51 55 60
21 34 37 47 50 56 59 61
35 36 48 49 57 58 62 63

It gives you a 64 long series of values with a lot of consecutive zeros by the
end. Such series are efficiently encoded with a run-length method.

Take as an example the following DCT block:

129  12   9   0   0   1   0   1
 28   8  34   5   1   0   0   0
 17   0   0   2   0   0   0   0
  6   1   0   0   0   0   1   0
  0   2   0   0   1   0   0   0
  0   0   0   0   1   0   0   0
  0   0   4   0   0   7   0   0
  8   1   5   0   1   0   0   0

It becomes a 24 long series of events (an event beeing a non-zero value and a
number of zero values):

(129,0) (12,0) (28,0) (17,0) (8,0) (9,1) (34,1) (6,1) (1,1) (5,1) (1,0) (1,0)
(2,1) (2,8) (1,6) (8,0) (1,0) (4,1) (1,4) (1,1) (1,1) (5,7) (7,0) (1,6)

that you can encode again using a Huffman like method.

The main idea is to optimize the encoding of (almost) uniform areas of the
picture.

In a second (lossy) processing you can gain even more. The human eye beeing not
so sensitive to high frequencies than to low frequencies, you can apply a strong
quantization transform to high frequencies (forcing them to zero and increasing
the number of consecutive zeros in the final series, making the run-length
encoding even more efficient). This is the second idea: the DCT makes it easier
to remove useless informations from a picture. In our example, after
quantization has been applied, we could have a 13 long series of events, with
almost no visible difference for a human eye:

129  12   9   0   0   0   0   0
 28   8  34   5   1   0   0   0
 17   0   0   2   0   0   0   0
  6   1   0   0   0   0   0   0
  0   2   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0

(129,0) (12,0) (28,0) (17,0) (8,0) (9,1) (34,1) (6,1) (1,1) (5,2) (1,0) (2,1)
(2,44)

For instance, if your picture represents a guy wearing a striped suit with very
narrow black and white lines on it (this is a high frequency area of the
picture), you can replace the suit by a uniform grey one, very easy to encode
and it will make no difference to you, espacially if your TV set is that one
your grand mother gave to you years ago.

In MPEG, the DCT is applied after differential coding has been applied to the
blocks (this is called motion compensation), so the "pixels" are between -255
and 255. The motion compensation reduces the amount of informations to be
coded by using informations of a previous picture (and even following picture
in some modes, amazing, dont you think ?) in the stream.

What else to say ? The DCT is not the best transform we could use for the
purpose of compression but it's easy to implement in hardware devices.

Sorry if this is too long, hope it helps anyway.

+-------------------------------------------+-----------------------+
| Renaud Pacalet                            | Mes idees et opinions |
| Departement Electronique, Ecole Nationale | n'engagent que moi.   |
| Superieure des Telecommunications         | Elles ne constituent  |
| 46, Rue Barrault 75634 Paris Cedex 13     | en aucun cas le       |
| Tel : +33 1 45 81 78 08                   | reflet d'une          |
| Fax : +33 1 45 80 40 36                   | politique de mon      |
| Email : pacalet@elec.enst.fr              | employeur.            |
+-------------------------------------------+-----------------------+