IET Computers & Digital Techniques
Volume 12, Issue 3, May 2018
- Author(s): Simi Zerine Sleeba ; John Jose ; Maniyelil Govindankutty Mini
- Source: IET Computers & Digital Techniques, Volume 12, Issue 3, p. 69–79
- DOI: 10.1049/iet-cdt.2017.0006
- Type: Article
New-generation multi-processor systems-on-chip integrate hundreds of processing elements on a single chip, and these elements communicate through an on-chip communication network, commonly known as a network-on-chip (NoC). Routers are the most critical NoC components, and deflection routing is a technique used in buffer-less routers for better energy efficiency. Massive integration of devices, together with fabrication at deep sub-micron feature sizes, increases the likelihood of wear-out and damage to components, resulting in unreliable operation of the chip. Hence the NoC fabric in general, and routers in particular, should be equipped with built-in fault tolerance mechanisms to ensure reliable operation in the presence of faults. The authors propose an energy-efficient routing technique that tolerates permanent faults in NoC links by introducing a simple logic unit placed next to the output port allocation stage of the deflection router pipeline. This technique incurs minimal wiring overhead and promises stable network throughput at high fault rates. Evaluation of the proposed method on an 8 × 8 mesh NoC for various fault rates shows a reduced flit deflection rate and hop power, which yields a significant reduction in dynamic power consumption at the inter-router links compared to state-of-the-art fault tolerance techniques.
- Author(s): Irith Pomeranz
- Source: IET Computers & Digital Techniques, Volume 12, Issue 3, p. 80–86
- DOI: 10.1049/iet-cdt.2017.0032
- Type: Article
Functional broadside tests are important for avoiding overtesting of delay faults during the application of scan-based tests. Multicycle tests have advantages in defect detection and test compaction. This study addresses the on-chip generation of primary input sequences for the application of multicycle functional broadside tests to a circuit that is embedded in a larger design. In this study, multicycle functional broadside tests are considered under two types of constraints: (i) functional constraints that the design imposes on the primary input patterns of the circuit, and (ii) test application constraints when direct access to the primary inputs of the circuit is not available, and the application of two or more consecutive primary input patterns requires hardware support. The use of multicycle functional broadside tests also results in an increased fault coverage.
- Author(s): Yang Zhang ; Zuocheng Xing ; Chuan Tang ; Cang Liu
- Source: IET Computers & Digital Techniques, Volume 12, Issue 3, p. 87–94
- DOI: 10.1049/iet-cdt.2017.0004
- Type: Article
Graphics processing units (GPUs) are playing an increasingly important role in parallel computing. Using their multi-threaded execution model, GPUs can accelerate many parallel programmes and save energy. In contrast to their strong computing power, GPUs have limited on-chip memory space, which easily becomes inadequate. The throughput-oriented execution model of a GPU introduces thousands of hardware threads, which may access the small cache simultaneously. This causes cache thrashing and contention problems and limits GPU performance. Motivated by these issues, the authors put forward a locality-protected method based on the instruction programme counter (LPC) to exploit data locality in the L1 data cache with very low hardware overhead. First, they use a simple programme counter (PC)-based locality detector to collect reuse information for each cache line. Then, a hardware-efficient prioritised cache allocation unit combines the data reuse information with time-stamp information to predict the reuse possibility of each cache line and to evict the line with the lowest reuse possibility. Their simulator experiments show that LPC provides a speedup of up to 17.8% and an average improvement of 5.0% over the baseline method with very low overhead.
- Author(s): Nehal N. Shah and Upena D. Dalal
- Source: IET Computers & Digital Techniques, Volume 12, Issue 3, p. 95–104
- DOI: 10.1049/iet-cdt.2016.0178
- Type: Article
Video coding based on fast-search block matching algorithms (BMAs) provides reasonably good video quality at a small computational cost. In a fast BMA, reading pixel data takes considerably more clock cycles than the matching operation itself, because the candidate macroblocks (CMBs) lie at irregular locations. To reduce the number of clock cycles, this study uses a parallel memory system, which accelerates the reading of CMBs and speeds up motion vector (MV) computation. A novel register array is introduced to organise the CMBs, which expedites the computation-hungry search process. Owing to the shape of the register array, less space is needed to store CMBs, and the architecture supports a wide range of search patterns. The proposed sum of absolute difference processor with parallel memory system computes the MV of one macroblock in 28 clock cycles in the average case. Compared to a single memory system, it saves 68% and 80% of the clock cycles for CMB access in the initial search and the intermediate search, respectively. The hardware architecture is tested on a Xilinx Virtex5 field programmable gate array. The proposed architecture with a fixed 8×8 macroblock size processes 354 high definition (HD) (1080p) frames per second (fps), and the configurable architecture processes 201 HD fps, which is more than adequate for real-time encoding.
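As a rough software illustration of the matching step this abstract refers to, the sketch below computes the sum of absolute differences (SAD) between a current macroblock and a handful of candidate macroblocks, and keeps the displacement with the lowest SAD as the motion vector. It is a plain sequential model, not the paper's register-array hardware; the function names `sad` and `best_motion_vector`, the frame layout (lists of rows) and the candidate list are assumptions made for this example.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized pixel blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def best_motion_vector(current, reference, top, left, size, candidates):
    """Return (sad, (dy, dx)) for the candidate displacement with the lowest SAD.

    `candidates` is a hypothetical list of (dy, dx) offsets, standing in for
    whatever search pattern a fast BMA would generate.
    """
    cur_block = [row[left:left + size] for row in current[top:top + size]]
    best = None
    for dy, dx in candidates:
        r, c = top + dy, left + dx
        # Skip candidate macroblocks that fall outside the reference frame.
        if r < 0 or c < 0 or r + size > len(reference) or c + size > len(reference[0]):
            continue
        ref_block = [row[c:c + size] for row in reference[r:r + size]]
        cost = sad(cur_block, ref_block)
        if best is None or cost < best[0]:
            best = (cost, (dy, dx))
    return best
```

Each candidate requires reading a full block from the reference frame, which is the memory access cost the abstract says dominates over the matching arithmetic.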
- Author(s): Hao Xiao ; Xiang Yin ; Ning Wu ; Xin Chen ; Jun Li ; Xiaoxing Chen
- Source: IET Computers & Digital Techniques, Volume 12, Issue 3, p. 105–110
- DOI: 10.1049/iet-cdt.2017.0060
- Type: Article
The fast Fourier transform (FFT) plays an important role in digital signal processing systems. In this study, the authors explore the very large-scale integration (VLSI) design of a high-precision fixed-point reconfigurable FFT processor. To achieve high accuracy under a limited wordlength, this study analyses the quantisation noise in FFT computation and proposes the mixed use of multiple scaling approaches to compensate for the noise. In addition, a statistics-based optimisation scheme is proposed to configure the scaling operations of the cascaded arithmetic blocks at each stage so as to yield the best accuracy for a given FFT length. On the basis of this approach, they further present a VLSI implementation of an area-efficient and high-precision FFT processor, which can perform power-of-two FFTs from 32 to 8192 points. By using the SMIC process, the area of the proposed FFT processor is with a maximum operating frequency of 400 MHz. When the FFT processor is configured to perform an 8192-point FFT at 40 MHz, the signal-to-quantisation-noise ratio is up to 53.28 dB and the power consumption measured by post-layout simulation is 35.7 mW.
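The signal-to-quantisation-noise ratio quoted above is conventionally the ratio of reference signal energy to the energy of the error between the reference and the fixed-point result, expressed in decibels. The sketch below shows that standard definition only; it does not reproduce the paper's scaling scheme, and the helper names `quantise` and `sqnr_db` and the rounding mode are assumptions made for this example.

```python
import math


def quantise(value, frac_bits):
    """Round a real value to the nearest multiple of 2**-frac_bits (assumed rounding mode)."""
    scale = 1 << frac_bits
    return round(value * scale) / scale


def sqnr_db(reference, quantised):
    """SQNR in dB: 10 * log10(signal energy / quantisation-error energy).

    Both arguments are equal-length sequences of real or complex samples.
    """
    signal = sum(abs(r) ** 2 for r in reference)
    noise = sum(abs(r - q) ** 2 for r, q in zip(reference, quantised))
    if noise == 0:
        return math.inf  # no quantisation error at all
    return 10 * math.log10(signal / noise)
```

In practice the reference would be a floating-point FFT output and the quantised sequence the output of the fixed-point datapath for the same input.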
- Author(s): Yanyun Tao ; Yuzhen Zhang ; Qinyu Wang ; Jian Cao
- Source: IET Computers & Digital Techniques, Volume 12, Issue 3, p. 111–120
- DOI: 10.1049/iet-cdt.2016.0199
- Type: Article
With the development of deep-submicron and nano-scale technology, leakage power minimisation has become as important as dynamic power reduction in IC design. To achieve low-power state assignment for finite-state machine (FSM) synthesis, a multi-population genetic algorithm (MPGA)-based state assignment method is proposed. MPGA consists of an outer loop and a set of inner genetic algorithms (GAs). Each inner GA is a local search component for finding a low-power state assignment, using selection, crossover and mutation to vary individuals. The cost function is defined from the power dissipation formulation of a complementary metal oxide semiconductor (CMOS) gate and estimates both dynamic power and leakage power. The outer loop optimises the parameters of the inner GAs through a population variation schema, intra-specific competition and newborn. Twenty-three FSMs commonly used as benchmarks are employed to test the effectiveness of MPGA and to compare different state assignment methods. Experimental results show that MPGA achieves a significant improvement over previously published methods in both dynamic power and leakage power reduction on most benchmarks.
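To give a feel for what a state assignment cost function evaluates, the sketch below scores an encoding by the expected number of state-register bit flips per transition, a common proxy for dynamic power. This is a deliberately simplified stand-in, not the CMOS gate-level power formulation the abstract describes, and it ignores leakage; the names `hamming` and `switching_cost` and the transition probabilities are assumptions made for this example.

```python
def hamming(a, b):
    """Number of bit positions in which two integer state codes differ."""
    return bin(a ^ b).count("1")


def switching_cost(encoding, transitions):
    """Expected state-register bit flips per clock cycle.

    encoding:    dict mapping state name -> integer code
    transitions: iterable of (source, destination, probability) tuples
    """
    return sum(
        prob * hamming(encoding[src], encoding[dst])
        for src, dst, prob in transitions
    )
```

A GA-based search would treat each `encoding` as an individual and prefer those with lower cost; for instance, with transitions A→B (0.5), B→C (0.25) and C→A (0.25), the codes A=00, B=01, C=11 score lower than A=00, B=11, C=01 because the most frequent transition flips only one bit.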
Energy-efficient fault tolerant technique for deflection routers in two-dimensional mesh Network-on-Chips
On-chip generation of primary input sequences for multicycle functional broadside tests
Locality-protected cache allocation scheme with low overhead on GPUs
Register array-based sum of absolute difference processor with parallel memory system for fast motion estimation
VLSI design of low-cost and high-precision fixed-point reconfigurable FFT processors
MPGA: an evolutionary state assignment for dynamic and leakage power reduction in FSM synthesis