Online ISSN: 1751-861X
Print ISSN: 1751-8601
IET Computers & Digital Techniques
Volume 6, Issue 4, July 2012
- Author(s): I. Voyiatzis ; C. Efstathiou ; H. Antonopoulou ; A. Milidonis
- Source: IET Computers & Digital Techniques, Volume 6, Issue 4, p. 195–204
- DOI: 10.1049/iet-cdt.2010.0061
- Type: Article
Built-in self test (BIST) techniques perform test pattern generation and response verification on chip, reducing the need for external testing. BIST techniques that use arithmetic modules already present in the circuit (accumulators, counters, etc.) to perform the test-generation and response-verification operations have been proposed in the open literature. Two-pattern tests are applied to detect complementary metal oxide semiconductor (CMOS) stuck-open faults and to assure correct temporal circuit operation at clock speed (delay fault testing). In this study, a novel arithmetic module-based BIST architecture for two-pattern testing (ABAS) is presented that exercises arithmetic modules to generate two-pattern tests; provided that such modules are available, the hardware overhead required by the presented scheme is by far the lowest of all schemes presented for the same purpose in the open literature.
- Author(s): A. Gellert ; H. Calborean ; L. Vintan ; A. Florea
- Source: IET Computers & Digital Techniques, Volume 6, Issue 4, p. 205–213
- DOI: 10.1049/iet-cdt.2011.0116
- Type: Article
This work extends an earlier manual design space exploration (DSE) of the authors' selective load value prediction-based superscalar architecture to the unified L2 cache. The authors then perform an automatic DSE with a specially developed software tool by varying several architectural parameters. The goal is to find optimal configurations in terms of cycles per instruction and energy consumption. With the 19 architectural parameters that the authors propose to vary, the design space exceeds 2.5 million billion configurations, so only a heuristic search can be considered. The authors therefore propose different automatic DSE methods, based on their framework for automatic design space exploration, which allow them to evaluate only 2500 configurations of this huge design space. The experimental results show that, under the proposed multi-objective approach, their automatic DSE provides significantly better configurations than the previous manual DSE.
- Author(s): J.-S. Lee ; S. Venkateswaran ; M. Choi
- Source: IET Computers & Digital Techniques, Volume 6, Issue 4, p. 214–222
- DOI: 10.1049/iet-cdt.2011.0025
- Type: Article
The recently proposed asynchronous nanowire crossbar architecture is envisioned to enhance the manufacturability and robustness of nanowire crossbar-based configurable digital circuits by removing various timing-related failure modes. Even though the proposed clock-free nanowire crossbar architecture has numerous technical merits over its clocked counterparts, it is still subject to high defect rates inherently induced by the non-deterministic nanoscale assembly of nanowire crossbars. To address this issue, a novel functional testing scheme has been proposed to validate threshold gates configured on programmable gate macro blocks (PGMBs). The proposed approach selectively tests the crosspoints programmed as ON-state using test vectors tailored to the given threshold gate macro and its functionality. Therefore, high fault coverage can be achieved at significantly reduced test overhead. In addition, numerous replacement and reconfiguration schemes based on the functional testing scheme have been proposed to repair partially faulty configured PGMBs by locating incorrectly programmed crosspoints and replacing them with defect-free spares. Specific figures of merit have also been coined to quantify the performance of the proposed testing and reconfiguration algorithms. These findings have been extensively validated by a series of parametric simulations.
- Author(s): I. Pomeranz
- Source: IET Computers & Digital Techniques, Volume 6, Issue 4, p. 223–231
- DOI: 10.1049/iet-cdt.2011.0163
- Type: Article
When a logic block is embedded in a larger design, the input sequences applicable to it may be constrained by other logic blocks in the design. This has an impact on what would constitute overtesting of the logic block by scan-based tests. This study defines functional broadside tests that avoid overtesting for an embedded block based on functional broadside tests for the larger design. The definition is constructive and results in a procedure for generating the tests. This study compares these tests with ones generated for the logic block as a stand-alone circuit. The results demonstrate that any discussion of overtesting should consider the extent to which the functionality of an embedded logic block is utilised as a part of the design. Under certain conditions it is possible to apply to the logic block functional broadside tests that were generated for it as a stand-alone circuit, in order to maximise the fault coverage without overtesting and to reduce the computational complexity of test generation.
- Author(s): I. Pomeranz and S.M. Reddy
- Source: IET Computers & Digital Techniques, Volume 6, Issue 4, p. 232–239
- DOI: 10.1049/iet-cdt.2011.0131
- Type: Article
Functional broadside tests were defined to avoid overtesting that may occur under scan-based tests because of non-functional operation conditions created by unreachable scan-in states. Functional broadside tests were computed assuming that functional operation starts after the circuit is initialised by applying a synchronising sequence. This study discusses the definition of functional broadside tests for the case where hardware reset is used for bringing the circuit into a known state before functional operation starts. This study shows that the set of reachable states for a circuit with hardware reset contains the set of reachable states based on a synchronising sequence. Consequently, the set of functional broadside tests and the set of detectable faults for a circuit with hardware reset contain those obtained based on a synchronising sequence. In addition, there are differences between different reset states in the sets of reachable states and the sets of detectable faults. This study also discusses the case where hardware reset is provided only for a subset of the state variables (referred to as partial reset).
- Author(s): S.P. Mohanty and E. Kougianos
- Source: IET Computers & Digital Techniques, Volume 6, Issue 4, p. 240–248
- DOI: 10.1049/iet-cdt.2011.0166
- Type: Article
Low power consumption and stability in static random access memories (SRAMs) are essential for embedded applications. This study presents a novel design flow for power minimisation of nano-complementary metal-oxide semiconductor SRAMs, while maintaining stability. A 32 nm high-κ/metal-gate SRAM has been used as an example circuit. The baseline circuit is subjected to power minimisation using a dual-threshold voltage assignment based on a novel combined design of experiments and integer linear programming (DOE-ILP) approach. However, this leads to a 15% reduction in the static noise margin (SNM) of the cell. Conjugate gradient optimisation overcomes this SNM degradation while reducing the power consumption. The final SRAM design shows an 86% reduction in power consumption (including leakage) and an 8% increase in the SNM compared with the baseline design. The variability analysis of the optimised cell is performed by considering the effect of 12 parameters. SRAM arrays of different sizes are constructed to demonstrate the feasibility of the proposed SRAM cell. To the best of the authors’ knowledge, this is the first study that makes use of DOE-ILP and the conjugate gradient method for simultaneous stability and power optimisation in high-κ/metal-gate SRAM circuits.
- Author(s): Ž. Jovanović and V. Milutinović
- Source: IET Computers & Digital Techniques, Volume 6, Issue 4, p. 249–256
- DOI: 10.1049/iet-cdt.2011.0132
- Type: Article
This study presents the architecture and implementation of a field-programmable gate array (FPGA) accelerator for double-precision floating-point matrix multiplication. The architecture is oriented towards minimising resource utilisation and maximising clock frequency. It employs the block matrix multiplication algorithm, which returns the result blocks to the host processor as soon as they are computed. This avoids output buffering and simplifies placement and routing on the chip. The authors show that such an architecture is especially well suited for full-duplex communication links between the accelerator and the host processor. The architecture requires the result blocks to be accumulated by the host processor; however, the authors show that typically more than 99% of all arithmetic operations are performed by the accelerator. The implementation focuses on efficient use of embedded FPGA resources, in order to allow for a large number of processing elements (PEs). Each PE uses eight Virtex-6 DSP blocks. Both adders and multipliers are deeply pipelined and use several FPGA-specific techniques to achieve small area and high clock frequency. Finally, the authors quantify the performance of the accelerator implemented in a Xilinx Virtex-6 FPGA, with 252 PEs running at 403 MHz (achieving 203.1 Giga FLOPS (GFLOPS)), by comparing it with the double-precision matrix multiplication functions from the MKL, ACML, GotoBLAS and ATLAS libraries executing on Intel Core2Quad and AMD Phenom X4 microprocessors running at 2.8 GHz. The accelerator performs 4.5 times faster than the fastest processor/library pair.
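As a rough illustration of the host-accumulated block scheme summarised in this abstract, the following Python sketch multiplies two matrices block by block, with a plain function standing in for the accelerator and the host adding each returned partial block into the result. The function names, the block size and the way operations are counted are illustrative assumptions, not details from the paper. The helper `peak_gflops` additionally assumes that each PE completes one multiplication and one addition per clock cycle, which is one reading under which 252 PEs at 403 MHz reproduce the quoted 203.1 GFLOPS.

```python
def accelerator_block_product(a_blk, b_blk):
    """Hypothetical stand-in for the accelerator: product of two small blocks."""
    inner = len(b_blk)
    cols = len(b_blk[0])
    return [[sum(row[p] * b_blk[p][j] for p in range(inner)) for j in range(cols)]
            for row in a_blk]


def host_block_matmul(a, b, block):
    """Block matrix multiply in which the host accumulates the partial blocks.

    Returns the product and the share of floating-point operations done by
    the simulated accelerator (under the counting assumption stated above).
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    accel_flops = 0
    host_flops = 0
    for i in range(0, n, block):
        for j in range(0, m, block):
            for p in range(0, k, block):
                a_blk = [row[p:p + block] for row in a[i:i + block]]
                b_blk = [row[j:j + block] for row in b[p:p + block]]
                partial = accelerator_block_product(a_blk, b_blk)
                inner = len(b_blk)
                # Host side: one addition per element of the returned block.
                for r, partial_row in enumerate(partial):
                    for s, value in enumerate(partial_row):
                        c[i + r][j + s] += value
                elements = len(partial) * len(partial[0])
                # Accelerator side: inner multiplies and inner - 1 adds per element.
                accel_flops += elements * (2 * inner - 1)
                host_flops += elements
    return c, accel_flops / (accel_flops + host_flops)


def peak_gflops(num_pes, clock_mhz, flops_per_pe_per_cycle=2):
    """Peak throughput in GFLOPS; the default of 2 FLOPs per cycle is an assumption."""
    return num_pes * clock_mhz * flops_per_pe_per_cycle / 1000.0
```

Under this counting, the accelerator's share of the work is 1 - 1/(2b) for an inner block dimension b, so it passes 99% once b reaches 50. That is in line with the abstract's claim, although the paper's actual block sizes are not given here.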
Article titles in this issue (in page order):
Arithmetic module-based built-in self test architecture for two-pattern testing
Multi-objective optimisations for a superscalar architecture with selective value prediction
Efficient post-configuration testing of an asynchronous nanowire crossbar system for reliability
Functional broadside tests for embedded logic blocks
Reset and partial-reset-based functional broadside tests
Design of experiments and integer linear programming-assisted conjugate-gradient optimisation of high-κ/metal-gate nano-complementary metal-oxide semiconductor static random access memory
FPGA accelerator for floating-point matrix multiplication
Most cited content for this Journal
-
High-performance elliptic curve cryptography processor over NIST prime fields
- Author(s): Md Selim Hossain ; Yinan Kong ; Ehsan Saeedi ; Niras C. Vayalil
- Type: Article
-
Majority-based evolution state assignment algorithm for area and power optimisation of sequential circuits
- Author(s): Aiman H. El-Maleh
- Type: Article
-
Scalable GF(p) Montgomery multiplier based on a digit–digit computation approach
- Author(s): M. Morales-Sandoval and A. Diaz-Perez
- Type: Article
-
Fabrication and characterisation of Al gate n-metal–oxide–semiconductor field-effect transistor, on-chip fabricated with silicon nitride ion-sensitive field-effect transistor
- Author(s): Rekha Chaudhary ; Amit Sharma ; Soumendu Sinha ; Jyoti Yadav ; Rishi Sharma ; Ravindra Mukhiya ; Vinod K. Khanna
- Type: Article
-
Adaptively weighted round-robin arbitration for equality of service in a many-core network-on-chip
- Author(s): Hanmin Park and Kiyoung Choi
- Type: Article