The Cooley-Tukey algorithm for the discrete Fourier transform is an example. It is based on a factorization algorithm by Good:
where the Ai are very sparse matrices.
This reduces the number of operations to
.
We illustrate this by the 8-point Walsh transform (
Orthogonal Functions), which uses the same algorithm with different coefficients:
with g = W8 f where f and g are 8-vectors. If this transformation were to be carried out in a straightforward way, 64 additions or subtractions would be necessary. Good's sparse matrix factorization for this case reads W8 = A1 A2 A3, with the definitions
In the first step, only sums and differences of neighbouring pixels are formed. They are then used in the second step to produce expressions of four pixels, etc. Only three steps are necessary to obtain the entire transform:
The following signal flowchart shows the three steps; solid and dashed lines indicate additions and subtractions, respectively:
Blind execution needs 64(N2) additions or subtractions (plus a small number of multiplications by 2 or
).
Suppressing all zero values, this is reduced to
.
The corresponding signal flowchart shows that only 14, or more generally,
additions or subtractions are necessary for its computation (remember that N is a power of 2). This is by far the fastest of all unitary transforms.
For the Fourier transform, these operations are complex, for all the others real. Because the non-sinusoidal Walsh and Haar transform matrices consist only of +1, -1 and zero, or simple multiples thereof, only additions or subtractions have to be executed. Comparing complex multiplication with real addition, and considering the necessary high precision for the Fourier kernels
, the difference in processing time between Fourier and Walsh or Haar transforms is evidently substantial.
For
e.g. [Beauchamp87], [Kunt80]more details and references ,
[Pratt78], or [Ahmed75].