Online ISSN
1751-861X
Print ISSN
1751-8601
IET Computers & Digital Techniques
Volume 5, Issue 4, July 2011
-
- Author(s): Y. Dotan ; N. Levison ; D. Lilja
- Source: IET Computers & Digital Techniques, Volume 5, Issue 4, p. 221 –230
- DOI: 10.1049/iet-cdt.2010.0009
- Type: Article
Future nano-scale devices are expected to shrink to ever smaller dimensions, to operate at low voltages and high frequencies, to be more sensitive to environmental influences and to be characterised by high dynamic fault rates and defect densities. Fundamentally new fault-tolerant architectures are required in order to produce reliable systems that will operate correctly. Simple replication of micro-architecture blocks will no longer suffice, as all replicated blocks will have faults. The history index of correct computation (HICC) is examined in recursive and non-recursive fault-tolerant approaches at the bit and module levels to identify reliable blocks on-the-fly and forward their computation results, while ignoring results from unreliable blocks. Simulation results show that recursive and non-recursive HICC offer the best resilience to faults when faults are non-uniformly distributed among redundant blocks. A correct computation rate of 99% is achieved using the recursive HICC when decision units at the bit and module levels are fault free, despite an average fault injection rate of 20%, compared to a 68% correct computation rate for the recursive triple modular redundancy voter. When faults are injected everywhere in the design, the non-recursive HICC supports the best correct computation percentage. The effects of circuit size and history indices are also examined and discussed.
-
- Author(s): G.R. Jagadeesh ; T. Srikanthan ; C.M. Lim
- Source: IET Computers & Digital Techniques, Volume 5, Issue 4, p. 231 –237
- DOI: 10.1049/iet-cdt.2009.0072
- Type: Article
There exist several practical applications that require high-speed shortest-path computations. In many situations, especially in embedded applications, a field programmable gate array (FPGA)-based accelerator for computing the shortest paths can help to achieve high performance at low cost. This study presents an FPGA-based distributed architecture for solving the single-source shortest-path problem in a fast and efficient manner. The proposed architecture is based on the Bellman–Ford algorithm adapted to facilitate early termination of the computation. One of the novelties of the architecture is that it does not involve any centralised control, and the processing elements (PEs), which are identical in construction, operate in perfect synchronisation with each other. The functional correctness of the design has been verified through simulations and also in actual hardware. It has been shown that the implementation on a Xilinx Virtex-5 FPGA is more than twice as fast as a software implementation of the algorithm on a high-end general-purpose processor that runs at an order-of-magnitude faster clock. The speed-up offered by the design can be further improved by adopting an interconnection topology that maximises the data transfer rate among the PEs.
-
- Author(s): W.-C. Wang ; C.-Y. Hsu ; J. Li ; Y.-C. Sung ; A. Rao ; L.-T. Wang
- Source: IET Computers & Digital Techniques, Volume 5, Issue 4, p. 238 –246
- DOI: 10.1049/iet-cdt.2010.0041
- Type: Article
This study presents a row-linear feedback shift register-column (RLC) masking technique that is capable of handling many unknowns in the test responses. The proposed technique exploits the fact that most unknowns are locally clustered after the test compactor. With three novel masking mechanisms [direct row, direct column and linear feedback shift register (LFSR) column masking], RLC masks all unknowns in test responses using a very short LFSR. Experiments on a real design show that the proposed technique is able to mask up to 7.38% of unknowns with only 0.61% fault coverage loss. By providing a very high test response compaction ratio, the RLC masking technique enables massively parallel testing of many-core system chips.
-
- Author(s): I. Pomeranz and S.M. Reddy
- Source: IET Computers & Digital Techniques, Volume 5, Issue 4, p. 247 –253
- DOI: 10.1049/iet-cdt.2009.0022
- Type: Article
Partially functional broadside tests were defined to address the tradeoff that exists between fault coverage and proximity to functional operation conditions during the application of scan-based tests for delay faults. Proximity to functional operation conditions is important for avoiding overtesting. The definition of a partially functional broadside test does not take into consideration the extent of deviation from functional operation conditions that occurs during the second pattern of a test. The authors define two-dimensional partially functional broadside tests to address this issue. The authors demonstrate through experimental results that this improves the ability of the test to remain close to functional operation conditions during the application of both patterns.
-
- Author(s): A.Z. Jooya ; A. Baniasadi ; M. Analoui
- Source: IET Computers & Digital Techniques, Volume 5, Issue 4, p. 254 –262
- DOI: 10.1049/iet-cdt.2009.0045
- Type: Article
The authors introduce a history-aware, resource-based dynamic (or simply HARD) scheduler for heterogeneous chip multi-processors (CMPs). HARD relies on recording application resource utilisation and throughput to adaptively change cores for applications during runtime. The authors show that HARD can be configured to achieve both performance and power improvements, and compare HARD to an alternative dynamic scheduler and a static scheduler to provide a better understanding.
-
- Author(s): M.G. Mohammad
- Source: IET Computers & Digital Techniques, Volume 5, Issue 4, p. 263 –270
- DOI: 10.1049/iet-cdt.2010.0083
- Type: Article
Chalcogenide-based phase change memory (PCM) is a type of non-volatile memory that will most likely replace the currently widespread flash memory. Current research on PCM targets the feasibility, as well as the reliability, of integrating this memory technology into the currently used complementary metal oxide semiconductor (CMOS) process. Such studies have identified special failure modes, known as disturbs, as well as other PCM-specific faults. In this study, the authors identify these failures, analyse their behaviours and develop fault primitives/models that describe these faults accurately and effectively. In addition, the authors propose an efficient test algorithm, called March-PCM, to test for these faults and compare its performance to some previously developed test algorithms.
-
- Author(s): M.E. Angelopoulou ; C.-S. Bouganis ; P.Y.K. Cheung
- Source: IET Computers & Digital Techniques, Volume 5, Issue 4, p. 271 –286
- DOI: 10.1049/iet-cdt.2009.0053
- Type: Article
Restoration methods, such as super-resolution (SR), largely depend on the accuracy of the point spread function (PSF). PSF estimation is an ill-posed problem, and a linear and uniform motion is often assumed. In real-life systems, this may deviate significantly from the actual motion, impairing subsequent restoration. To address the above, this work proposes a dynamically configurable imaging system that combines algorithmic video enhancement, field programmable gate array (FPGA)-based video processing and adaptive image sensor technology. Specifically, a joint blur identification and validation (BIV) scheme is proposed, which validates the initial linear and uniform motion assumption. For the cases that deviate significantly from that assumption, the real-time reconfiguration property of an adaptive image sensor is utilised, and the sensor is locally reconfigured to larger pixels that produce higher frame-rate samples with reduced blur. Results demonstrate that once the sensor reconfiguration gives rise to a valid motion assumption, highly accurate PSFs are estimated, resulting in improved SR reconstruction quality. To enable real-time reconstruction, an FPGA-based BIV architecture is proposed. The system's throughput is significantly higher than 25 fps for frame sizes up to 1024 × 1024, and its performance is robust to noise for signal-to-noise ratios (SNRs) as low as 20 dB.
-
- Author(s): X. Zhou and P. Petrov
- Source: IET Computers & Digital Techniques, Volume 5, Issue 4, p. 287 –295
- DOI: 10.1049/iet-cdt.2009.0030
- Type: Article
A novel page table organisation for real-time and memory-constrained embedded systems is presented. Increasingly, many high-end embedded processors offer virtual memory support in the form of a hardware memory management unit, which is responsible for caching and rapidly looking up the address mappings required to access memory. However, to completely implement virtual memory support, the system software needs to maintain a page table per task, whose goal is to capture the virtual-to-physical page translation information for the entire address space. Page tables have traditionally been designed for general-purpose systems, where their size and real-time performance have not been of primary importance; the average performance of page table traversal has been the major concern. Many embedded systems, however, impose strict real-time requirements coupled with limited memory resources. To address these problems, a novel page table organisation is proposed, which not only requires significantly less memory than traditional page tables but also enables a rapid and deterministic hardware-based page table traversal. This is achieved by exploiting application knowledge regarding the memory footprint of the program under execution and, in particular, the fact that large sequences of consecutive virtual pages are often mapped to a non-fragmented region in physical memory comprising consecutive physical memory frames.
-
- Author(s): B.A. Al Jassani ; N. Urquhart ; A.E.A. Almaini
- Source: IET Computers & Digital Techniques, Volume 5, Issue 4, p. 296 –305
- DOI: 10.1049/iet-cdt.2010.0045
- Type: Article
In this study, a new approach using a multi-objective genetic algorithm (MOGA) is proposed to determine the optimal state assignment, with reduced area and power dissipation, for completely and incompletely specified sequential circuits. The goal is to find the best assignments, which reduce the component count and switching activity. The MOGA employs a Pareto ranking scheme and produces a set of state assignments that are optimal in both objectives. The ESPRESSO tool is used to optimise the combinational parts of the sequential circuits. Experimental results are given using a personal computer with an Intel CPU of 2.4 GHz and 2 GB RAM. The algorithm is implemented using C++ and fully tested with benchmark examples. The experimental results show that savings in components and switching activity are achieved in most of the benchmarks tested, compared with recently published research.
-
- Author(s): S. Almukhaizim and O. Sinanoglu
- Source: IET Computers & Digital Techniques, Volume 5, Issue 4, p. 306 –315
- DOI: 10.1049/iet-cdt.2009.0075
- Type: Article
N-modular redundancy (NMR) is the simplest and most effective fault-tolerant design method for integrated circuits, where N copies of a circuit are employed and a majority voter produces the voted output. Asynchronous circuits, however, exhibit various characteristics that limit the applicability of NMR. Specifically, the hazard-free property of the output in these circuits must be preserved when hardware providing fault tolerance, such as a majority voter, is added. In this work, we first demonstrate that a typical majority voter design would fail to preserve the hazard-free property of its response. We then propose a hazard-free majority voter design for the triple-modular redundancy fault-tolerance design paradigm, which enters an output-holding state to preserve the output value when transient errors may be sensitised to its inputs. By exploring various conditions to exit from the output-holding state, we describe several extensions of the voter into an NMR one, each yielding a distinct implementation with different tolerance characteristics and area cost. We generalise this extension based on the exit condition and analyse the associated tolerance capability of the extended NMR voter. Finally, the proposed hazard-free voter is simulated using HSPICE, and detailed area cost formulations are derived for the proposed voter designs.
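The NMR voting rule described in the abstract above can be illustrated behaviourally. The following is a minimal Python sketch of majority voting over N replicated module outputs; it shows only the logical voting function and does not model the hazard-free, gate-level asynchronous voter that is the paper's actual contribution:

```python
# Behavioural model of an N-modular redundancy (NMR) majority voter:
# the voted output is the bit produced by more than half of the N modules.
# Illustration only -- hazards and output-holding states, which the paper
# addresses at the gate level, cannot be captured in a software model.

def nmr_vote(outputs):
    """outputs: list of N module output bits (N odd); returns the majority bit."""
    assert len(outputs) % 2 == 1, "use an odd N so a strict majority always exists"
    return int(sum(outputs) > len(outputs) // 2)

# Triple-modular redundancy (TMR) is the N = 3 special case:
print(nmr_vote([1, 1, 0]))        # 1: a single faulty module is outvoted
print(nmr_vote([1, 0, 1, 1, 0]))  # 1: 5-modular redundancy tolerates two faults
```

With an odd N, up to (N - 1)/2 simultaneously faulty modules are outvoted, which is the tolerance/area tradeoff the extended NMR voters in the paper explore.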
Fault tolerance for nanotechnology devices at the bit and module levels with history index of correct computation
Field programmable gate array-based acceleration of shortest-path computation
Row-linear feedback shift register-column X-masking technique for simultaneous testing of many-core system chips
Two-dimensional partially functional broadside tests
History-aware, resource-based dynamic scheduling for heterogeneous multi-core processors
Fault model and test procedure for phase change memory
Blur identification with assumption validation for sensor-based video reconstruction and its implementation on field programmable gate array
Towards virtual memory support in real-time and memory-constrained embedded applications: the interval page table
State assignment for sequential circuits using multi-objective genetic algorithm
Novel hazard-free majority voter for N-modular redundancy-based fault tolerance in asynchronous circuits
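The early-termination adaptation of the Bellman–Ford algorithm mentioned in the shortest-path abstract above can be sketched in software. This is a minimal sequential illustration of the general idea only; the paper's contribution is a distributed FPGA architecture with synchronised processing elements, which this sketch does not model:

```python
# Bellman-Ford single-source shortest paths with early termination:
# stop as soon as a full relaxation pass changes no distance, instead of
# always running the worst-case num_nodes - 1 passes.

def bellman_ford(num_nodes, edges, source):
    """edges: list of (u, v, weight) tuples; returns a list of distances."""
    INF = float("inf")
    dist = [INF] * num_nodes
    dist[source] = 0
    for _ in range(num_nodes - 1):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:   # relax edge (u, v)
                dist[v] = dist[u] + w
                changed = True
        if not changed:                 # early termination: converged
            break
    return dist

edges = [(0, 1, 4), (0, 2, 1), (2, 1, 2), (1, 3, 1), (2, 3, 5)]
print(bellman_ford(4, edges, 0))  # [0, 3, 1, 4]
```

On graphs whose shortest paths stabilise after a few passes, the `changed` flag cuts the run short, which is the behaviour the hardware architecture exploits.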
-
- Author(s): K.S. Stevens and A. Yakovlev
- Source: IET Computers & Digital Techniques, Volume 5, Issue 4, p. 316 –317
- DOI: 10.1049/iet-cdt.2011.9055
- Type: Article
-
- Author(s): M.R. Casu
- Source: IET Computers & Digital Techniques, Volume 5, Issue 4, p. 318 –330
- DOI: 10.1049/iet-cdt.2010.0116
- Type: Article
Synchronous elastic circuits borrow the tolerance of computation and communication latencies from the asynchronous design style. The datapath is made elastic by turning registers into elastic buffers and adding a control layer that uses synchronous handshake signals and join/fork controllers. Join elements are the subject of two improvements discussed in this study. Half-buffer retiming allows the creation of input queues by relocating one of the latches of the elastic buffer that follows the join controller. Token cages improve the performance of join controllers that use the early-evaluation firing rule. Their effect on throughput is discussed by means of examples representative of typical topologies, simulations with synthetic benchmarks and a realistic microarchitecture. Area and power costs of the control logic, and the possible impact on the datapath, are evaluated based on the results of logic synthesis experiments on a 45 nm CMOS technology.
-
- Author(s): W.B. Toms and D.A. Edwards
- Source: IET Computers & Digital Techniques, Volume 5, Issue 4, p. 331 –341
- DOI: 10.1049/iet-cdt.2010.0107
- Type: Article
Self-timed circuits present an attractive solution to the problem of process variation. However, implementing self-timed combinational logic is complex and expensive. As there are no external timing references, data must be encoded within a delay-insensitive (DI) unordered encoding, and the outputs of functions must indicate to the environment that transitions on inputs and internal signals have taken place. Mapping large function blocks into cell libraries is extremely difficult, as decomposing gates introduces new signals that may violate indication. This study presents a novel method for implementing any m-of-n-encoded function block using ‘bounded gates’, where any gate may be decomposed without violating indication. This is achieved by successively decomposing the input encoding into smaller unordered codes. The study presents algorithms to determine and quantify potential re-encodings. An exact branch-and-bound approach to the solution is shown, but the complexity of determining unordered encodings restricts the size of the function blocks that may be decomposed. To overcome this problem, an approach is proposed that uses algebraic extraction techniques to efficiently determine and quantify potential encodings. The results of the synthesis procedures are demonstrated on a range of combinational function blocks.
-
- Author(s): O.C. Akgun ; J.N. Rodrigues ; J. Sparsø
- Source: IET Computers & Digital Techniques, Volume 5, Issue 4, p. 342 –353
- DOI: 10.1049/iet-cdt.2010.0118
- Type: Article
This study addresses the design of self-timed energy-minimum circuits operating in the sub-VT domain, and a generic implementation template using bundled-data circuitry and current-sensing completion detection (CSCD). Furthermore, a fully decoupled latch controller was developed, which integrates with the current-sensing circuitry. Different configurations that utilise the proposed latch controller are highlighted. A contemporary synchronous electronic design automation (EDA) tools-based design flow, which transforms a synchronous design into a corresponding self-timed circuit, is outlined. Different use cases of the CSCD system are examined. The design flow and the current-sensing technique are validated by the implementation of a self-timed version of a wavelet-based event detector for cardiac pacemaker applications in a standard 65 nm CMOS process. The chip was fabricated and verified to operate down to 250 mV. SPICE simulations indicate a gain of 52.58% in throughput due to asynchronous operation. By trading off this throughput improvement, energy dissipation is reduced by 16.8% at the energy-minimum supply voltage.
Editorial: Selected papers from the 16th IEEE International Symposium on Asynchronous Circuits and Systems
Half-buffer retiming and token cages for synchronous elastic circuits
Indicating combinational logic decomposition
Energy-minimum sub-threshold self-timed circuits using current-sensing completion detection