Efficient digital implementation of a multi-precision square-root algorithm

IET Computers & Digital Techniques

High-performance computing and signal processing rely on a basic set of essential mathematical functions. While addition, subtraction and multiplication are well understood, there is less literature on the square root, which is a particularly time- and resource-consuming function. Traditional non-restoring algorithms produce a mantissa half the length of the input mantissa, causing a loss of precision. This study presents a method for increasing the accuracy of this algorithm. It is shown to work for all IEEE-754R standard floating-point numbers. Error analysis shows a 57-fold improvement (for half-precision) and a 134 × 10^6-fold improvement (for double-precision) in the normalised error, equivalent to an error of at most 1 unit of least precision (ULP). Resource-optimised and performance-optimised variants are presented and their throughput is analysed. On an Intel Stratix V device, performance-optimised implementations achieve a throughput of 717 MFLOPs. Resource-optimised implementations on a low-cost device require only 127 Adaptive Logic Modules and 232 registers, with a throughput of 8.56 MFLOPs. All implementations use neither DSP blocks nor memory, saving valuable resources. The maximum throughput of the presented design is 15.5 times greater than that proposed by Pimentel et al. and two orders of magnitude greater than typical multiply-accumulate methods.
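To make the precision loss concrete, the following is a minimal software sketch of a textbook non-restoring integer square root. It is not the implementation described in the article: the function names `nonrestoring_isqrt` and `extended_root` are hypothetical, and the zero-padding used to obtain extra result bits is a generic illustration assumed here, not necessarily the method the study proposes. The sketch shows that a `width`-bit radicand yields only `width // 2` root bits, and that appending `2 * extra_bits` zero bits to the radicand yields `extra_bits` additional root bits.

```python
import math
from typing import Tuple


def nonrestoring_isqrt(radicand: int, width: int) -> Tuple[int, int]:
    """Return (root, remainder) of an unsigned radicand that fits in `width` bits.

    `width` must be even; the root has width // 2 bits. Two radicand bits are
    consumed per iteration, and a negative partial remainder is corrected by
    adding in the next step instead of being restored immediately.
    """
    if width <= 0 or width % 2 or radicand < 0 or radicand >> width:
        raise ValueError("radicand must be non-negative and fit in an even width")
    root, rem = 0, 0
    for i in range(width // 2 - 1, -1, -1):
        pair = (radicand >> (2 * i)) & 0b11      # next two radicand bits
        if rem >= 0:
            rem = (rem << 2) + pair - ((root << 2) | 1)
        else:
            rem = (rem << 2) + pair + ((root << 2) | 3)
        root = (root << 1) | (1 if rem >= 0 else 0)
    if rem < 0:                                   # single final correction
        rem += (root << 1) | 1
    return root, rem


def extended_root(radicand: int, width: int, extra_bits: int) -> int:
    """Assumed illustration: pad the radicand with 2 * extra_bits zeros so the
    root gains extra_bits fractional bits (root is scaled by 2 ** extra_bits)."""
    root, _ = nonrestoring_isqrt(radicand << (2 * extra_bits),
                                 width + 2 * extra_bits)
    return root


if __name__ == "__main__":
    # An 8-bit radicand gives only a 4-bit root.
    print(nonrestoring_isqrt(180, 8))
    # Eight extra fractional bits for sqrt(2), scaled by 2 ** 8.
    print(extended_root(2, 2, 8) / 2 ** 8)
```

In hardware the loop body maps naturally to one adder/subtractor stage per result bit, which is consistent with the abstract's statement that the designs need no DSP blocks or memory; how the article pipelines or shares those stages is not stated in this section.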


1. Pimentel, J., Bohmenstiehl, B., Baas, B.: ‘Hybrid hardware/software floating-point implementations for optimised area and throughput tradeoffs’, IEEE Trans. Very Large Scale Integr. (VLSI) Syst., 2017, 25, (1), pp. 100–113.
2. Liang-Kai, W., Schulte, M.: ‘Decimal floating-point square root using Newton-Raphson iteration’. IEEE Int. Conf. on Application-Specific Systems, Architecture Processors (ASAP'05), Samos, Greece, 2005, pp. 309–315.
3. Tack-Jun, K., Sondeen, J., Draper, J.: ‘Floating-point division and square root using a Taylor-series expansion algorithm’. 50th Midwest Symp. on Circuits and Systems, Montreal, QC, Canada, 2007, pp. 305–308.
4. Cornea-Hasegan, M., Golliver, R., Markstein, P.: ‘Correctness proofs outline for Newton-Raphson based floating-point divide and square root algorithms’. IEEE Symp. on Computer Arithmetic, Adelaide, SA, Australia, 1999, pp. 96–105.
5. Kachhwal, P., Rout, B.: ‘Novel square root algorithm and its FPGA implementation’. Int. Conf. on Signal Propagation and Computer Technology (ICSPCT 2014), Ajmer, India, 2014, pp. 158–162.
6. Putra, R.: ‘A novel fixed-point square root algorithm and its digital hardware design’. Int. Conf. on ICT for Smart Society, Jakarta, Indonesia, 2013, pp. 1–4.
7. Suresh, S., Beldianu, S., Ziavras, S.: ‘FPGA and ASIC square root designs for high performance and power efficiency’. IEEE 24th Int. Conf. on Application-Specific Systems, Architectures and Processors, Washington DC, USA, 2013, pp. 269–272.
8. Vijeyakumar, K., Sumath, V., Vasakipriya, P., et al.: ‘FPGA implementation of low power high speed square root circuits’. IEEE Int. Conf. on Computational Intelligence and Computing Research, Coimbatore, India, 2012, pp. 1–5.
9. Yamin, L., Wanming, C.: ‘Implementation of single precision floating point square root on FPGAs’. Proc. The 5th Annual IEEE Symp. on Field Programmable Custom Computing Machines, Napa Valley, CA, USA, 1997, pp. 226–232.
10. Yamin, L., Wanming, C.: ‘A new non-restoring square root algorithm and its VLSI implementations’. Proc. Int. Conf. on Computer Design, VLSI in Computers and Processors, Austin, TX, USA, 1996, pp. 538–544.
11. Hasnat, A., Bhattacharyya, T., Dey, A., et al.: ‘A fast FPGA based architecture for computation of square root and inverse square root’. 2017 Devices for Integrated Circuit (DevIC), Kalyani, 2017, pp. 383–387.
12. de Dinechin, F., Joldes, M., Pasca, B., et al.: ‘Multiplicative square root algorithms for FPGAs’. 2010 Int. Conf. on Field Programmable Logic and Applications, Milano, 2010, pp. 574–577.
13. Amaricai, A., Boncalo, O.: ‘FPGA implementation of very high radix square root with prescaling’. 2012 19th IEEE Int. Conf. on Electronics, Circuits, and Systems (ICECS 2012), Seville, 2012, pp. 221–224.
14. ‘IEEE Standard for Floating-Point Arithmetic’, IEEE Std 754-2008, 2008, pp. 1–70.
15. Intel: ‘Cyclone V FPGAs & SoCs’. Available at, accessed: 20 April 2018.
16. Intel: ‘Stratix V FPGAs’. Available at, accessed: 20 April 2018.
17. TerASIC: ‘DE10-standard’. Available at, accessed: 20 April 2018.
18. Intel: ‘Stratix V device datasheet’. Available at, accessed: 21 June 2018.
19. Intel: ‘Floating-point IP cores user guide’. Available at, accessed: 25 January 2017.
20. Peterson, R.L., Ziemer, R.E., Borth, D.E.: ‘Introduction to spread spectrum communications’ (Prentice-Hall, Inc., New Jersey, USA, 1995).
