Area-efficient and scalable solution to real-data fast fourier transform via regularised fast Hartley transform
A highly parallel hardware solution to the regularised fast Hartley transform (FHT), an algorithm developed only recently for the efficient computation of the discrete Hartley transform (DHT) and hence of the real-data fast Fourier transform (FFT), is discussed. A drawback of conventional FHT algorithms lies in the loss of regularity arising from the need for two types of ‘butterfly’ for catering for fixed-radix formulations. A generic radix-4 double butterfly was therefore developed for the regularised FHT which overcame the problem in an elegant fashion. An architecture for the parallel computation of the generic double butterfly and the resulting regularised FHT is now discussed in some detail; this exploits a single high-performance processing element (PE) which yields an attractive solution, particularly when implemented with field-programmable gate array (FPGA) technology, which is both area-efficient and scalable in terms of transform length. High performance is achieved by having the PE able to process the input/output data sets to the double butterfly in parallel, this in turn implying the need to be able to access simultaneously, and without conflict, both multiple data and multiple twiddle factors, or coefficients, from their respective memories. The resulting area-efficient and scalable single-PE architecture is shown to yield a generic solution to the real-data FFT which is capable of achieving the throughput of the most advanced commercially available solutions for just a fraction of the silicon area and, for new applications, at negligible re-design cost.