The quadratic convergence of the Newton-Raphson method is combined with the precision characteristics of digit-by-digit algorithms to obtain fast division units that satisfy the requirements of the IEEE 754 floating-point standard. A general design methodology is presented that leads to a class of alternative architectures with attractive division performance, together with one example implementation. In particular, the proposed implementation achieves a speedup of 20% to about 30% over a previous architecture by Fandrianto, at a relatively small additional hardware cost if a multiplier is already available in the arithmetic unit.
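
As a reminder of what quadratic convergence means here, the following minimal sketch iterates the textbook Newton-Raphson recurrence for the reciprocal, x_{k+1} = x_k (2 - d x_k), for which the relative error e_k = 1 - d x_k satisfies e_{k+1} = e_k^2. It illustrates only this convergence property, not the proposed architecture or its digit-by-digit stage; the function names, the divisor 3/4, and the initial guess 1 are assumptions chosen for the example, and exact rational arithmetic is used so that the squaring of the error is visible without rounding effects.

```python
from fractions import Fraction


def newton_reciprocal(d, x0, steps):
    """Return the Newton-Raphson iterates x_0..x_steps approximating 1/d.

    Hypothetical helper for illustration: uses exact rationals, whereas a
    hardware unit would work on fixed-width operands.
    """
    d = Fraction(d)
    xs = [Fraction(x0)]
    for _ in range(steps):
        x = xs[-1]
        # One Newton-Raphson step for f(x) = 1/x - d.
        xs.append(x * (2 - d * x))
    return xs


def relative_errors(d, xs):
    """Return e_k = 1 - d * x_k for each iterate x_k."""
    d = Fraction(d)
    return [1 - d * x for x in xs]


# Assumed example values: divisor 3/4, initial guess 1.
iterates = newton_reciprocal(Fraction(3, 4), 1, 3)
errors = relative_errors(Fraction(3, 4), iterates)
# errors: 1/4, 1/16, 1/256, 1/65536 (each is the square of the previous one)
```

Each step roughly doubles the number of correct bits, which is why a good initial approximation reduces the number of multiplier passes needed to reach a given precision.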