New matrix formulation for two-dimensional DCT/IDCT computation and its distributed-memory VLSI implementation
A direct method for computing the 2-D DCT/IDCT on a linear-array architecture is presented. The 2-D DCT/IDCT is first converted into a corresponding 1-D DCT/IDCT problem through proper reordering of the input and output indices. A new coefficient matrix factorisation is then derived, leading to a cascade of several basic computation blocks. Previously proposed high-speed 2-D N×N DCT/IDCT processors usually require intermediate transpose memory and have computational complexity O(N³). In contrast, the proposed hardware-efficient architecture, with its distributed memory structure, has computational complexity O(N² log₂ N) and requires only log₂ N multipliers. The new pipelinable and scalable 2-D DCT/IDCT processor uses storage elements local to the processing elements and thus requires neither address generation hardware nor global memory-to-array routing.
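For orientation, the conventional row-column approach that the abstract contrasts against can be sketched as follows. This is not the proposed factorisation; it is a minimal illustration, assuming the orthonormal DCT-II, of why the baseline needs an intermediate transpose and about 2N³ multiplications (2N one-dimensional transforms of N² multiplications each). All function names are hypothetical and chosen only for this sketch.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II coefficient matrix C (n x n), as a list of rows."""
    c = []
    for k in range(n):
        # The DC row (k = 0) uses a smaller scale so that C is orthonormal.
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        c.append([scale * math.cos((2 * m + 1) * k * math.pi / (2 * n))
                  for m in range(n)])
    return c

def dct1(c, v):
    """Forward 1-D DCT of vector v: C v (N^2 multiplications)."""
    return [sum(ck * vk for ck, vk in zip(row, v)) for row in c]

def idct1(c, v):
    """Inverse 1-D DCT of vector v: C^T v (C is orthonormal)."""
    n = len(v)
    return [sum(c[k][m] * v[k] for k in range(n)) for m in range(n)]

def transpose(a):
    """Explicit transpose, standing in for the intermediate transpose memory."""
    return [list(col) for col in zip(*a)]

def _row_column(x, transform):
    c = dct_matrix(len(x))
    rows = [transform(c, r) for r in x]                 # pass 1: every row
    cols = [transform(c, r) for r in transpose(rows)]   # pass 2: every column
    return transpose(cols)

def dct2_row_column(x):
    """2-D DCT of a square block x, computed as C X C^T by two 1-D passes."""
    return _row_column(x, dct1)

def idct2_row_column(y):
    """2-D IDCT of a square block y, computed as C^T Y C by two 1-D passes."""
    return _row_column(y, idct1)
```

Applying `dct2_row_column` to an N×N block and then `idct2_row_column` to the result recovers the original block up to floating-point rounding. The explicit `transpose` between the two passes plays the role of the transpose memory that the proposed architecture is designed to avoid.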