Radix-43 based two-dimensional FFT architecture with efficient data reordering scheme

Radix-43 based two-dimensional FFT architecture with efficient data reordering scheme

For access to this article, please select a purchase option:

Buy article PDF
(plus tax if applicable)
Buy Knowledge Pack
10 articles for $120.00
(plus taxes if applicable)

IET members benefit from discounts to all IET publications and free access to E&T Magazine. If you are an IET member, log in to your account and the discounts will automatically be applied.

Learn more about IET membership 

Recommend Title Publication to library

You must fill out fields marked with: *

Librarian details
Your details
Why are you recommending this title?
Select reason:
IET Computers & Digital Techniques — Recommend this title to your library

Thank you

Your recommendation has been sent to your librarian.

Multi-dimensional Discrete Fourier Transforms (DFTs) play an important role in signal and image processing applications. Image reconstruction is a key component in signal processing applications like medical imaging, computer vision, face recognition etc. Two dimensional fast Fourier Transform (2D FFT) and Inverse FFT plays vital role in reconstruction. In this paper we present a fast 64 × 64 point 2D FFT architecture based on radix-43 algorithm using a parallel unrolled radix-43 FFT as the basic block. Our radix-43 architecture is a memory optimized parallel architecture which computes 64-point FFT, with least execution time. Proposed architecture produces reordered output of both 64-point one dimensional (1D) FFT and 64 × 64 point 2D FFT, without using any additional hardware for reordering. The proposed architecture has been implemented in UMC 40nm CMOS technology with clock frequency of 500 MHz and area of 0.841 mm2. The power consumption of proposed architecture is 358 mW at 500 MHz. Energy efficiency (FFTs computed per unit of energy) is 341 points/Joule. Computation time of 64 × 64 point FFT is 8.19 μs. ASIC implementation results shows better performance of proposed work in terms of computation time when compared with state-of-art implementation. Proposed architecture has also been implemented in Virtex-7 FPGA which gives comparable area.


    1. 1)
      • 1. Proakis, J.G., Manolakis, D.G.: ‘Digital signal processing: principles, algorithms, and applications’ (Prentice-Hall, Inc., New Jersey, 1996, 3rd edn.).
    2. 2)
      • 2. Yu, C.L., Chakrabarti, C., Park, S., et al: ‘Bandwidth-intensive FPGA architecture for multi-dimensional DFT’. 2010 IEEE Int. Conf. on Acoustics, Speech and Signal Processing, 2010, pp. 14861489, doi: 10.1109/ICASSP.2010.5495495.
    3. 3)
      • 3. Puschel, M., Moura, J.M.F., Johnson, J.R., et al: ‘Spiral: code generation for DSP transforms’, Proc. IEEE, 2005, 93, (2), pp. 232275, 10.1109/JPROC.2004.840306.
    4. 4)
      • 4. Frigo, M., Johnson, S.G.: ‘FFTW: an adaptive software architecture for the FFT’. Proc. 1998 IEEE Int. Conf. on Acoustics, Speech and Signal Processing, Seattle, WA, USA, vol. 3, 1998, pp. 13811384, doi: 10.1109/ICASSP.1998.681704.
    5. 5)
      • 5. Yu, C.-L., Kim, J.-S., Deng, L., et al: ‘FPGA architecture for 2D discrete Fourier transform based on 2D decomposition for large-sized data’, J. Signal Process. Syst., 2011, 64, (1), pp. 109122.
    6. 6)
      • 6. D'Alberto, P., Milder, P.A., Sandryhaila, A., et al: ‘Generating FPGA-accelerated DFT libraries’. 15th Annual IEEE Symp. on Field-Programmable Custom Computing Machines (FCCM 2007), Napa, CA, USA, 2007, pp. 173184. doi: 10.1109/FCCM.2007.58.
    7. 7)
      • 7. Akin, B., Milder, P.A., Franchetti, F., et al: ‘Memory bandwidth efficient two-dimensional fast Fourier transform algorithm and implementation for large problem sizes’. 2012 IEEE 20th Int. Symp. on Field-Programmable Custom Computing Machines, Toronto, ON, Canada, 2012, pp. 188191. doi: 10.1109/FCCM.2012.40.
    8. 8)
      • 8. Kee, H., Bhattacharyya, S.S., Petersen, N., et al: ‘Resource-efficient acceleration of 2-dimensional fast Fourier transform computations on FPGAs’. Int. Conf. on Distributed Smart Cameras, Como, Italy, 2009, pp. 14.
    9. 9)
      • 9. Babionitakis, K., Chouliaras, V., Manolopoulos, K., et al: ‘Fully systolic FFT architecture for Giga-sample applications’, J. Signal Process. Syst., 2010, 58, pp. 281299.
    10. 10)
      • 10. Kala, S., Nalesh, S., Maity, A., et al: ‘High throughput, low latency, memory optimized 64k point FFT architecture using novel radix-4 butterfly unit’. 2013 IEEE Int. Symp. on Circuits and Systems (ISCAS2013), Beijing, China, 2013, pp. 30343037. doi: 10.1109/ISCAS.2013.6572518.
    11. 11)
      • 11. Cooley, J.W., Tukey, J.W.: ‘An algorithm for machine computation of complex Fourier series’, Math Comput., 1965, 9, pp. 297301.
    12. 12)
      • 12. He, S., Torkelson, M.: ‘Design and implementation of a 1024-point pipeline FFT processor’. Proc. IEEE 1998 Custom Integrated Circuits Conf. (Cat. No. 98CH36143), Santa Clara, CA, USA, 1998, pp. 131134. doi: 10.1109/CICC.1998.694922.
    13. 13)
      • 13. Oh, J.Y., Lim, M.S.: ‘New radix-2 to the 4th power pipeline FFT processor’, IEICE Trans. Electron., 2005, E88-C, (8), pp. 17401746.
    14. 14)
      • 14. Cho, T., Lee, H., Park, J., et al: ‘A high-speed low-complexity modified radix-25 FFT processor for gigabit WPAN applications’. 2011 IEEE Int. Symp. on Circuits and Systems (ISCAS), Rio de Janeiro, Brazil, 2011, pp. 12591262. doi: 10.1109/ISCAS.2011.5937799.
    15. 15)
      • 15. Lin, H.L., Lin, H., Chang, R.C., et al: ‘A high-speed highly pipelined 2n-point FFT architecture for a dual OFDM processor’. Proc. Int. Conf. on Mixed Design of Integrated Circuits and System, 2006 (MIXDES 2006), Gdynia, Poland, 2006, pp. 627631, doi: 10.1109/MIXDES.2006.1706659.
    16. 16)
      • 16. Yin, X.B., Yu, F., Ma, Z.G.: ‘Resource-efficient pipelined architectures for radix-2 real-valued FFT with real datapaths’, IEEE Trans. Circuits Syst. II, Express Briefs, 2016, 63, (8), pp. 803807, 10.1109/TCSII.2016.2530862.
    17. 17)
      • 17. Cohen, D.: ‘Simplified control of FFT hardware’, IEEE Trans. Acoust. Speech Signal Process., 1976, 24, (6), pp. 577579, 10.1109/TASSP.1976.1162854.
    18. 18)
      • 18. Hsiao, C.F., Chen, Y., Lee, C.Y.: ‘A generalized mixed-radix algorithm for memory-based FFT processors’, IEEE Trans. Circuits Syst. II, Express Briefs, 2010, 57, (1), pp. 2630, 10.1109/TCSII.2009.2037262.
    19. 19)
      • 19. Tsai, P.Y., Lin, C.Y.: ‘A generalized conflict-free memory addressing scheme for continuous-flow parallel-processing FFT processors with rescheduling’, IEEE Trans. Very Large Scale Integr. (VLSI) Syst., 2011, 19, (12), pp. 22902302, 10.1109/TVLSI.2010.2077314.
    20. 20)
      • 20. Reisis, D., Vlassopoulos, N.: ‘Conflict-free parallel memory accessing techniques for FFT architectures’, IEEE Trans. Circuits Syst. I, Regul. Pap., 2008, 55, (11), pp. 34383447, 10.1109/TCSI.2008.924889.
    21. 21)
      • 21. Ma, Z.G., Yin, X.B., Yu, F.: ‘A novel memory-based FFT architecture for real-valued signals based on a radix-2 decimation-in-frequency algorithm’, IEEE Trans. Circuits Syst. II, Express Briefs, 2015, 62, (9), pp. 876880, 10.1109/TCSII.2015.2435522.
    22. 22)
      • 22. Xia, K., Wu, B., Zhou, X., et al: ‘A generalized conflict-free address scheme for arbitrary 2k-point memory-based FFT processors’. 2016 IEEE Int. Symp. On Circuits and Systems (ISCAS), Montreal, QC, Canada, 2016, pp. 21262129.
    23. 23)
      • 23. Xing, Q.J., Ma, Z.G., Xu, Y.K.: ‘A novel conflict-free parallel memory access scheme for FFT processors’, IEEE Trans. Circuits Syst. II, Express Briefs, 2017, 64, (11), pp. 13471351, 10.1109/TCSII.2017.2683643.
    24. 24)
      • 24. Huang, S.J., Chen, S.G.: ‘A high-parallelism memory-based FFT processor with high SQNR and novel addressing scheme’. 2016 IEEE Int. Symp. on Circuits and Systems (ISCAS), Montreal, QC, Canada, 2016, pp. 26712674, doi: 10.1109/ISCAS.2016.7539143.
    25. 25)
      • 25. Dillon, T.: ‘Two virtex-II FPGAs deliver fastest, cheapest, best high-performance image processing system’, Xilinx Xcell J., 2001, 41, pp. 7071.
    26. 26)
      • 26. Lenart, T., Gustafsson, M., wall, V.: ‘A hardware acceleration platform for digital holographic imaging’, J. Signal Process. Syst., 2008, 52, pp. 297311, 10.1007/s11265-008-0161-2.
    27. 27)
      • 27. Chen, R., Prasanna, V.K.: ‘Energy optimizations for FPGA based 2d FFT architecture’. IEEE High Performance Extreme Computing Conf. (HPEC), Waltham, MA, USA, 2014, pp. 16.
    28. 28)
      • 28. Sorokin, H., Takala, J.: ‘Conflict-free parallel access scheme for mixed-radix FFT supporting I/O permutations’. 2011 IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), Prague, Czech Republic, 2011, pp. 17091712, doi: 10.1109/ICASSP.2011.5946830.
    29. 29)
      • 29. Hu, H.-S., Chen, H.-Y., Jou, S.-J.: ‘Novel FFT processor with parallel-in-parallel-out in normal order’. 2009 Int. Symp. on VLSI Design, Automation and Test, Hsinchu, Taiwan, 2009, pp. 150153, doi: 10.1109/VDAT.2009.5158117.
    30. 30)
      • 30. Garrido, M., Grajal, J., Gustafsson, O.: ‘Optimum circuits for bit reversal’, IEEE Trans. Circuits Syst. II, Express Briefs, 2011, 58, (10), pp. 657661, 10.1109/TCSII.2011.2164141.
    31. 31)
      • 31. Yang, K.J., Tsai, S.H., Chuang, G.C.H.: ‘MDC FFT/IFFT processor with variable length for MIMO-OFDM systems’, IEEE Trans. Very Large Scale Integr. (VLSI) Syst., 2013, 21, (4), pp. 720731, 10.1109/TVLSI.2012.2194315.
    32. 32)
      • 32. Kristensen, F., Nilsson, P., Olsson, A.: ‘Flexible baseband transmitter for OFDM’. IASTED Conf. on Circuits Signals Syst., Cancun, Mexico, 2003, pp. 356361.
    33. 33)
      • 33. Chakraborty, T.S., Chakrabarti, S.: ‘On output reorder buffer design of bit reversed pipelined continuous data FFT architecture’. APCCAS 2008–2008 IEEE Asia Pacific Conf. on Circuits and Systems, Macao, China, 2008, pp. 11321135, doi: 10.1109/APCCAS.2008.4746224.
    34. 34)
      • 34. Huang, S.J., Chen, S.G.: ‘A high-throughput radix-16 FFT processor with parallel and normal input/output ordering for IEEE 802.15.3c systems’, IEEE Trans. Circuits Syst. I, Regul. Pap., 2012, 59, (8), pp. 17521765, 10.1109/TCSI.2011.2180430.
    35. 35)
      • 35. Marti-Puig, P., Bolano, R.R.: ‘Radix-4 FFT algorithms with ordered input and output data’. IEEE DSP, Santorini-Hellas, Greece, 2009.
    36. 36)
      • 36. Kala, S., Nalesh, S., Nandy, S.K., et al: ‘Design of a low power 64 point FFT architecture for WLAN applications’. 2013 25th Int. Conf. on Microelectronics (ICM), Beirut, Lebanon, 2013, pp. 14. doi: 10.1109/ICM.2013.6734951.
    37. 37)
      • 37. Lin, Y.-W., Liu, H.-Y., Lee, C.-Y.: ‘A dynamic scaling FFT processor for DVB-T applications’, IEEE J. Solid State Circuits, 2004, 39, (11), pp. 20052013.
    38. 38)
      • 38. Rodrguez-Ramos, J.M., Castell, E.M., Conde, C.D., et al: ‘2D-FFT implementation on FPGA for wavefront phase recovery from the CAFADIS camera’, Proc. SPIE, 2008, 7015, pp. 111.
    39. 39)
      • 39. Wang, W., Duan, B., Zhang, C., et al: ‘Accelerating 2D FFT with non-power-of-two problem size on FPGA’. 2010 Int. Conf. on Reconfigurable Computing and FPGAs, Quintana Roo, Mexico, 2010, pp. 208213, doi: 10.1109/ReConFig.2010.13.
    40. 40)
      • 40. Deng, L., Yu, C.-L., Chakrabarti, C., et al: ‘Efficient image reconstruction using partial 2d Fourier transform’. SIPS 2008, Washington DC, USA, 2008.
    41. 41)
      • 41. Mahmood, F., Toots, M., Öfverstedt, L.-G., et al: ‘2D discrete Fourier transform with simultaneous edge artifact removal for real-time applications’. 2015 Int. Conf. on Field Programmable Technology (FPT), Queenstown, New Zealand, doi: 10.1109/FPT.2015.7393157.
    42. 42)
      • 42. Uzun, I.S., Amira, A., Bouridane, A.: ‘FPGA implementations of fast Fourier transforms for real-time signal and image processing’, IEE Proc. Vis. Image Signal Process., 2005, 152, (3), pp. 283296.
    43. 43)
      • 43. Shirazi, N., Athanas, P.M., Abbott, A.L.: ‘Implementation of a 2-d fast Fourier transform on an FPGA-based custom computing machine’. 5th Int. Workshop Field Programmable Logic and Applications, Oxford, UK, 1995, pp. 282292.
    44. 44)
      • 44. Kim, Y.J., Lee, H.S.: ‘The implementation of 2D FFT using multiple topology on 4 × 4 torus’. 2009 9th Int. Symp. on Communications and Information Technology, Icheon, Republic of Korea, 2009, pp. 615620. doi: 10.1109/ISCIT.2009.5341171.
    45. 45)
      • 45. Wang, M., Li, Z.: ‘A hybrid SDC/SDF architecture for area and power minimization of floating-point FFT computations’. IEEE Int. Symp. on Circuits and Systems (ISCAS2016), Montreal, Canada, 2016, pp. 21702173.

Related content

This is a required field
Please enter a valid email address