IET Computers & Digital Techniques
Volume 1, Issue 4, July 2007
Online ISSN 1751-861X | Print ISSN 1751-8601
- Author(s): P.H.W. Leong ; A. Koch ; E. Boemo
- Source: IET Computers & Digital Techniques, Volume 1, Issue 4, p. 265 –266
- DOI: 10.1049/iet-cdt:20079015
- Type: Article
- Author(s): Y. Lin ; M. Hutton ; L. He
- Source: IET Computers & Digital Techniques, Volume 1, Issue 4, p. 267 –275
- DOI: 10.1049/iet-cdt:20060185
- Type: Article
Process variations affecting timing and power are an important issue for modern integrated circuits in nanometre technologies. Field programmable gate arrays (FPGAs) are similar to application-specific integrated circuits (ASICs) in their susceptibility to these variations, but face the unique challenge that critical paths are unknown at test time. The first in-depth study of applying statistical timing analysis with cross-chip and on-chip variations to speed-binning and guard-banding in FPGAs is presented. Considering the re-programmability unique to FPGAs, the effects of guard-banded timing models and speed-binning on statistical performance and timing yield are quantified. A variation-aware statistical placement algorithm, the first statistical algorithm for FPGA layout, has also been developed; for Microelectronics Center of North Carolina (MCNC) and Quartus University Interface Program (QUIP) designs, it reduces yield loss to 29.7% of the original with guard-banding and to 4% of the original with speed-binning.

- Author(s): L. Singhal and E. Bozorgzadeh
- Source: IET Computers & Digital Techniques, Volume 1, Issue 4, p. 276 –294
- DOI: 10.1049/iet-cdt:20070012
- Type: Article
Partial dynamic reconfiguration is an emerging technique in field programmable gate array (FPGA) design, used to save device area and cost. To reduce the reconfiguration overhead, two consecutive similar sub-designs should be placed in the same locations so that common components are reused as much as possible. This requires that all future designs be considered while floorplanning any given design. A comprehensive framework for floorplanning designs on a partially reconfigurable architecture is provided. Several reconfiguration-specific floorplanning cost functions and moves that aim to reduce the reconfiguration overhead are introduced. A new multi-layer sequence-pair-representation-based floorplanner that allows overlap of static and non-static components of multiple designs and guarantees a feasible overlapping floorplan with minimal-area packing is introduced. A new matching algorithm that covers all possible matchings of static blocks during floorplanning for multiple designs is presented. Experiments show that the proposed floorplanner gives more than 50% savings in reconfiguration frames compared with a scheme in which no reuse is done. Further, compared with a traditional sequential floorplanner, it removes infeasibility in many designs, improves the clock period by 12% on average and significantly reduces place-and-route time. The proposed floorplanner can be used to find high-quality floorplans for applications that use partial reconfiguration.

- Author(s): K. Danne ; R. Mühlenbernd ; M. Platzner
- Source: IET Computers & Digital Techniques, Volume 1, Issue 4, p. 295 –302
- DOI: 10.1049/iet-cdt:20060186
- Type: Article
A prototype system that executes a set of periodic real-time tasks utilising dynamic hardware reconfiguration is presented. The proposed scheduling technique, merge server distribute load (MSDL), is not only able to give an offline guarantee for the feasibility of the task set, but also minimises the number of device configurations. After describing this technique, the schedulability analysis is extended to cover different runtime system overheads, including the device reconfiguration time. Then, a light-weight runtime system that performs the online part of the MSDL scheduling technique is detailed. The runtime system is implemented entirely in hardware. Finally, the corresponding synthesis tool flow is outlined and the overhead posed by the runtime system is reported.

- Author(s): N. Baradaran and P.C. Diniz
- Source: IET Computers & Digital Techniques, Volume 1, Issue 4, p. 303 –311
- DOI: 10.1049/iet-cdt:20060181
- Type: Article
Configurable architectures offer the unique opportunity of customising storage allocation to meet a specific application's needs. A compiler approach that maps the arrays of a loop-based computation to the internal memories of a configurable architecture, with the objective of minimising overall execution time, is described. An algorithm that considers the data access patterns of the arrays along the critical path of the computation, as well as the available storage and memory bandwidth, is presented. Experimental results demonstrate the application of this approach to a set of kernel codes targeting a field-programmable gate array. The results reveal that the proposed algorithm outperforms naive and custom data layout techniques by an average of 33% and 15% in execution time, while taking into account the available hardware resources.

- Author(s): D.B. Thomas and W. Luk
- Source: IET Computers & Digital Techniques, Volume 1, Issue 4, p. 312 –321
- DOI: 10.1049/iet-cdt:20060188
- Type: Article
A hardware architecture for non-uniform random number generation, which allows the generator's distribution to be modified at run-time without reconfiguration, is presented. The architecture is based on a piecewise linear approximation, using just one table lookup, one comparison and one subtraction to map a uniform source to an arbitrary non-uniform distribution, resulting in very low area utilisation and high speeds. Customisation of the distribution is fully automatic, requiring less than a second of CPU time to approximate a new distribution, and typically around 1000 cycles to switch distributions at run-time. Comparison with Gaussian-specific generators shows that the new architecture uses less than half the resources, provides a higher sample rate and retains statistical quality for up to 50 billion samples, yet can also generate other distributions. When higher statistical quality is required and multiple samples are needed per cycle, a two-level piecewise generator can be used, reducing the RAM required per generated sample while retaining the simplicity and speed of the basic technique.
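As a loose software illustration of the piecewise linear idea (a sketch of the general principle, not the authors' hardware datapath; the function and seed below are invented for the example), the comparison trick here turns two uniform samples into a variate with a linear-ramp density — the kind of shaping step a piecewise linear generator would apply within each table segment:

```python
import random

def ramp_sample(rng):
    """One comparison shapes two uniforms into a linear-ramp variate.

    min(u, v) of two independent uniforms on [0, 1) has density
    f(x) = 2 * (1 - x), a decreasing linear ramp; a piecewise linear
    generator composes pieces like this, each scaled and offset into
    its own segment (where a table lookup and subtraction come in).
    """
    u, v = rng.random(), rng.random()
    return u if u < v else v

rng = random.Random(2007)  # fixed seed, purely for reproducibility
xs = [ramp_sample(rng) for _ in range(200_000)]
mean = sum(xs) / len(xs)   # E[X] = 1/3 for this density
```

In the hardware version described above, the per-segment parameters would presumably live in a small on-chip RAM, which is what makes switching distributions at run-time cheap.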
Editorial: Field-programmable logic and applications
Statistical placement for FPGAs considering process variation
Multi-layer floorplanning for reconfigurable designs
Server-based execution of periodic tasks on dynamically reconfigurable hardware
Exploiting parallelism in configurable architectures through custom array mapping
Non-uniform random number generation through piecewise linear approximations
- Author(s): M. Hosseinabady ; P. Lotfi-Kamran ; F. Lombardi ; Z. Navabi
- Source: IET Computers & Digital Techniques, Volume 1, Issue 4, p. 322 –333
- DOI: 10.1049/iet-cdt:20050133
- Type: Article
A novel design-for-test (DFT) method that requires minor modifications to the controller in the register-transfer level (RTL) description of a circuit is presented. The control/data flow graph representation of an RTL circuit is used for analysing the testability of individual RTL operations within the RTL circuit. Using a non-scan arrangement, existing data paths are utilised to provide controllability and observability to RTL operations. Furthermore, additional data paths are introduced by altering the controller states or adding new transitions. This method considerably reduces the test application time by ignoring unnecessary control states in the test process. The proposed method is applied to behavioural and RTL benchmarks. The results show the effectiveness of this method when compared with some other DFT insertion methods.

- Author(s): J. Hu ; N. Vijaykrishnan ; M.J. Irwin ; M. Kandemir
- Source: IET Computers & Digital Techniques, Volume 1, Issue 4, p. 334 –348
- DOI: 10.1049/iet-cdt:20060170
- Type: Article
As the issue width and the number of function units of superscalar processors continue to increase, the fetch unit must support a large fetch bandwidth in order to fully utilise the datapath resources. This trend worsens the power problem in the fetch unit, since the traditional instruction fetch mechanism is not optimised for power consumption. This paper explores the extra power consumed in traditional instruction caches because of dynamic control flow. By capturing the dynamic paths and characteristics of code during execution, trace caches provide a potential framework for power optimisation in the fetch unit. Our study shows that conventional trace caches (CTC) may increase power consumption in the fetch unit because of simultaneous access to both the trace cache and the instruction cache, while sequential trace caches (STC) offer lower power consumption at the cost of a significant performance loss. To address this problem, we perform a detailed study of trace distribution and access locality. Based on this study, we first propose a new model, the selective trace cache (SLTC), which uses both compiler and hardware support to selectively control trace cache lookup and update. Experimental evaluation shows that the selective trace cache achieves up to 42.2% power reduction over CTC and an additional reduction of up to 21.8% over STC on average, while trading a performance loss of no more than 1.8% relative to CTC. Further, we propose a dynamic direction prediction based trace cache (DPTC), which eliminates the need for the compilation and instruction set architecture (ISA) modifications involved in SLTC. Powered by a fetch direction predictor, DPTC achieves competitive power efficiency: on average, it reduces fetch-unit power consumption by up to 40.5% and 17.6% compared to CTC and STC, respectively, at a performance loss of less than 2.4% relative to CTC.

- Author(s): K.H. Tsoi ; K.H. Leung ; P.H.W. Leong
- Source: IET Computers & Digital Techniques, Volume 1, Issue 4, p. 349 –352
- DOI: 10.1049/iet-cdt:20050173
- Type: Article
A field programmable gate array (FPGA)-based implementation of a physical random number generator (PRNG) is presented. The PRNG uses an alternating step generator construction to decorrelate an oscillator-phase-noise-based physical random source. The resulting design can be implemented completely in digital technology, requires no external components, is very small in area, achieves very high throughput and has good statistical properties. The PRNG was implemented on an FPGA device and tested using the NIST, Diehard and TestU01 random number test suites.

- Author(s): I. Pomeranz and S.M. Reddy
- Source: IET Computers & Digital Techniques, Volume 1, Issue 4, p. 353 –363
- DOI: 10.1049/iet-cdt:20060120
- Type: Article
Test sets that detect each target fault n times (n-detection test sets) are typically generated for restricted values of n because test set size increases with n. Both a worst-case analysis and an average-case analysis are performed to investigate the effects of restricting n on the unmodelled fault coverage of an (arbitrary) n-detection test set of a full-scan circuit. The analysis is independent of any particular test set or test generation procedure. It is based on a specific set of target faults and a specific set of untargeted faults. It shows that, depending on the circuit, very large values of n may be needed to guarantee the detection of all the untargeted faults. The implications of these results are discussed, and it is also demonstrated that the proposed analysis methods can be used to evaluate the effects of incorporating into the n-detection test generation procedure specific strategies aimed at improving n-detection test set quality.

- Author(s): A. El-Maleh and S. Khursheed
- Source: IET Computers & Digital Techniques, Volume 1, Issue 4, p. 364 –368
- DOI: 10.1049/iet-cdt:20070004
- Type: Article
Test compaction is an effective technique for reducing test data volume and test application time. The authors present a new static test compaction technique based on test vector decomposition and clustering. Test vectors are decomposed and clustered for faults in increasing order of fault detection count. This clustering order provides a greater degree of freedom and results in better compaction. Experimental results demonstrate the effectiveness of the proposed approach in achieving higher compaction with much less CPU time than previous clustering-based test compaction approaches.

- Author(s): M.-H. Yang ; Y. Kim ; Y. Park ; D. Lee ; S. Kang
- Source: IET Computers & Digital Techniques, Volume 1, Issue 4, p. 369 –376
- DOI: 10.1049/iet-cdt:20060114
- Type: Article
A new low-power testing methodology is proposed to reduce the excessive power dissipation associated with scan-based designs under deterministic test patterns generated by linear feedback shift registers (LFSRs) in built-in self-test. The method utilises two split LFSRs to reduce the amount of switching activity. The original test cubes are partitioned into zero-set and one-set cubes according to the specified bits in the test cubes, and the split LFSR generates a zero-set or one-set cube for the given test cube. When the current scan shift value is a don't-care bit with respect to the output values of the LFSRs, the last value shifted into the scan chain is shifted in again, so no transition is produced. Experimental results for the largest ISCAS'89 benchmark circuits show that the proposed scheme can reduce switching activity by 50% with little hardware overhead compared with previous schemes.

- Author(s): G. Jaberipur and A. Kaivani
- Source: IET Computers & Digital Techniques, Volume 1, Issue 4, p. 377 –381
- DOI: 10.1049/iet-cdt:20060160
- Type: Article
With the growing popularity of decimal computer arithmetic in scientific, commercial, financial and Internet-based applications, hardware realisation of decimal arithmetic algorithms is gaining importance. Hardware decimal arithmetic units now serve as an integral part of some recently commercialised general purpose processors, where complex decimal arithmetic operations, such as multiplication, have been realised by rather slow iterative hardware algorithms. However, with the rapid advances in very large scale integration (VLSI) technology, semi- and fully parallel hardware decimal multiplication units are expected to evolve soon. The dominant representation for decimal digits is the binary-coded decimal (BCD) encoding. The BCD-digit multiplier can serve as the key building block of a decimal multiplier, irrespective of the degree of parallelism. A BCD-digit multiplier produces a two-digit BCD product from two input BCD digits. We provide a novel design for such a multiplier, showing some advantages in BCD multiplier implementations.

- Author(s): G. Yang ; X. Song ; M.A. Perkowski ; W.N.N. Hung ; J. Biamonte ; Z. Tang
- Source: IET Computers & Digital Techniques, Volume 1, Issue 4, p. 382 –388
- DOI: 10.1049/iet-cdt:20060097
- Type: Article
Two questions are addressed here: how many logic levels are necessary to synthesise a given instance of a reversible circuit, and how much is gained by larger gate libraries and which gates such libraries should contain. Group theory is applied to 3-bit reversible gate synthesis to create a library useful in hierarchical design. It is shown that an arbitrary 3-bit reversible circuit can be synthesised in four logic levels using this new gate library. The corresponding universal library for four-level synthesis is constructed and optimised at the level of nuclear magnetic resonance pulses. A very fast algorithm that synthesises an arbitrary 3-bit reversible function into gates from this library is also presented. The algorithm demonstrates a dramatic speed benefit and yields a circuit of at most four levels for any 3-bit reversible function. The gates are optimised at the pulse level to decrease their cost and to allow objective comparison with standard CNOT, NOT and Toffoli (CNT) circuits. This library guarantees a four-level circuit for any 3-qubit reversible function and is also intended for use in the hierarchical design of larger circuits.

- Author(s): A.A-A. Gutub
- Source: IET Computers & Digital Techniques, Volume 1, Issue 4, p. 389 –396
- DOI: 10.1049/iet-cdt:20060183
- Type: Article
Modular inversion is a fundamental operation in several cryptographic systems. It can be computed in software or hardware, but hardware computation has proven to be faster and more secure. This research focuses on improving a scalable inversion hardware architecture proposed in 2004 for the finite field GF(p). The architecture comprises two parts, a computing unit and a memory unit. The memory unit holds all the data bits of the computation, whereas the computing unit performs all the arithmetic operations on a word-by-word (digit-by-digit) basis so that the design is scalable. The main objective of this paper is to show the cost and benefit of modifying the memory unit to include shifting, previously one of the tasks of the scalable computing unit. The study remodels the entire hardware architecture, removing the shifter from the scalable computing part and embedding it in the non-scalable memory unit instead. This modification speeds up the complete inversion process at the cost of an area increase due to the new memory shifting unit. Several design schemes are compared, giving the user a complete picture from which to choose according to the application's needs.

- Author(s): B. Gupta ; S. Rahimi ; Z. Liu
- Source: IET Computers & Digital Techniques, Volume 1, Issue 4, p. 397 –404
- DOI: 10.1049/iet-cdt:20060102
- Type: Article
An efficient roll-forward checkpointing/recovery scheme for distributed systems is presented. This work improves on our earlier work. The use of forced checkpoints helps to design a single-phase non-blocking algorithm to find consistent global checkpoints. It offers the main advantages of both the synchronous and the asynchronous approaches, namely simple recovery and a simple way to create checkpoints. The algorithm produces a reduced number of checkpoints. Since each process independently decides whether to take a forced checkpoint, the algorithm is simple, fast and efficient. The proposed work offers better performance than some noted existing works. Moreover, the advantages stated above ensure that the algorithm can also work efficiently in a mobile computing environment.

- Author(s): F. Burns ; J. Murphy ; D. Shang ; A. Koelmans ; A. Yakovlev
- Source: IET Computers & Digital Techniques, Volume 1, Issue 4, p. 405 –413
- DOI: 10.1049/iet-cdt:20060121
- Type: Article
A dynamic global security-aware synthesis flow using the SystemC language is presented. SystemC security models are first specified at the system or behavioural level using a library of SystemC behavioural descriptions that provides for the reuse and extension of security modules. At the core of the system is a global security-aware scheduling algorithm that allows scheduling with a mixture of components of varying security levels. The output from the scheduler is translated into annotated nets, which are subsequently passed to allocation, optimisation and mapping tools for mapping into circuits. The synthesised circuits incorporate asynchronous, secure, power-balanced and fault-protected components. Results show that the approach offers robust implementations and efficient security/area trade-offs, leading to significant improvements in turnaround.

- Author(s): K.-J. Cho and J.-G. Chung
- Source: IET Computers & Digital Techniques, Volume 1, Issue 4, p. 414 –422
- DOI: 10.1049/iet-cdt:20060033
- Type: Article
This study presents a design method for a fixed-width two's complement squarer that receives an n-bit input and produces an n-bit squared product. To efficiently compensate for the truncation error, modified Booth-folding encoder signals are used to generate the error compensation bias. The truncated bits are divided into two groups depending on their effect on the truncation error, and a different error compensation method is applied to each group. Simulations show that the performance of the proposed method is close to that of true rounding and much better than that of other methods. The proposed fixed-width two's complement squarers also achieve about 34% reduction in area, 35% reduction in power consumption and 10% improvement in speed compared with conventional squarers.

- Author(s): B. Pal ; A. Sinha ; P. Dasgupta ; P.P. Chakrabarti ; K. De
- Source: IET Computers & Digital Techniques, Volume 1, Issue 4, p. 423 –433
- DOI: 10.1049/iet-cdt:20070016
- Type: Article
Recent design and verification languages, such as SystemVerilog, support a rich test bench language with significant support for developing layered, structured, constrained-random test bench architectures. Typically, the test bench language offers many features that are not synthesisable and therefore cannot be carried into hardware for hardware-accelerated simulation. One of the main challenges in improving the performance of hardware-accelerated simulation is to run the task of random value selection under specified constraints in hardware. This problem is addressed (possibly for the first time) and a two-step approach is presented. In the first step, the constraints are pre-processed in software to generate a set of entailed regions. In the second step, random value selection is performed in hardware using the entailed regions pre-computed in the first step. It is shown that this method has modest area overhead and produces constraint-satisfying random valuations within very few cycles. Results on test bench architectures for the ARM AMBA Bus and IBM CoreConnect protocol suites are reported.

- Author(s): J.H. Park and Y. Chu
- Source: IET Computers & Digital Techniques, Volume 1, Issue 4, p. 434 –442
- DOI: 10.1049/iet-cdt:20060113
- Type: Article
An efficient operating-system-based power management scheme for DRAM in the multiprogramming/time-sharing environment is presented. Formulas for evaluating the condition of positive energy gain are developed, and a finite state machine (FSM) for selecting the best power mode for a given idle time is designed. In the proposed scheme, the scheduler selectively assigns the most efficient power mode to each idle memory bank at context-switching time based on the FSM. For computing the idle time, two efficient and practical prediction methods are developed and tested for performance. The proposed scheme achieves further energy savings by starting the resynchronisation of idle memory banks as early as possible. Combining the prediction methods with early resynchronisation yields multiple versions of the proposed scheme, each tested for performance. The scheme utilises events occurring at context-switching time in the multiprogramming/time-sharing environment, and the scheduling ensures the maximum energy gain without significantly degrading performance. The proposed scheme is tested on a simulated system, and the experimental results demonstrate its efficiency: the energy gain ranges from 17.84% to 52.21% depending on the time quantum sizes tested.
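As a toy illustration of the mode-selection idea (the mode names, power figures and resynchronisation costs below are invented for the example, not taken from the paper), choosing the power mode that minimises total energy over a predicted idle interval reduces to weighing idle power draw against the one-off resynchronisation cost:

```python
# Hypothetical DRAM power modes: (name, idle power in mW,
# one-off resynchronisation cost in uJ).  Deeper modes draw less
# power while idle but pay more energy/latency to wake up.
MODES = [
    ("active",     300.0,  0.0),
    ("standby",    180.0,  0.5),
    ("nap",         30.0,  4.0),
    ("power_down",   3.0, 20.0),
]

def best_mode(idle_ms):
    """Select the mode minimising total energy over the predicted idle
    time: idle power * idle time + resynchronisation cost on wake-up
    (mW * ms = uJ, so the units line up)."""
    return min(MODES, key=lambda m: m[1] * idle_ms + m[2])[0]

# Longer predicted idle intervals justify progressively deeper modes.
for t in (0.001, 0.01, 0.1, 1.0):
    print(t, best_mode(t))
```

The paper's FSM plays an analogous role at context-switching time, with the idle time supplied by one of the prediction methods; break-even thresholds like these are what the positive-energy-gain formulas capture.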
Low overhead DFT using CDFG by modifying controller
Optimising power efficiency in trace cache fetch unit
High performance physical random number generator
Worst-case and average-case analysis of n-detection test sets and test generation strategies
Efficient test compaction for combinational circuits based on fault detection count-directed clustering
Deterministic built-in self-test using split linear feedback shift register reseeding for low-power testing
Binary-coded decimal digit multipliers
Four-level realisation of 3-qubit reversible functions
High-speed hardware architecture to compute Galois field GF(p) Montgomery inversion with scalability features
Novel low-overhead roll-forward recovery scheme for distributed systems
Dynamic global security-aware synthesis using SystemC
Low error fixed-width two's complement squarer design using Booth-folding technique
Hardware accelerated constrained random test generation
Finite state machine-based DRAM power management with early resynchronisation