This paper proposes design techniques for the efficient VLSI implementation of bit-serial multiplication over a modulus. These techniques reduce multiplication into simple cyclic shifts, where the number representation of the data is chosen appropriately. This representation will, in general, be highly redundant, implying a relatively poor throughput for the multiplier. It is then shown how, by splitting the multiplier into two pipelined multipliers, the throughput of the unit can be increased, whilst still retaining a cyclic-shift implementation. The split multiplier requires a mid-computation basis conversion, and the two number representations, used within the unit, are only moderately redundant. Thus, high-throughput, bit-serial multipliers are achieved, with most of the complexity contained within systolic basis converter modules. The multipliers are applicable to the VLSI implementation of high-throughput, signal processing operations performed over finite fields, in particular, transform and filter operations.