IET Computers & Digital Techniques
Volume 14, Issue 6, November 2020
- Author(s): Mingfu Xue ; Chongyan Gu ; Weiqiang Liu ; Shichao Yu ; Máire O'Neill
- Source: IET Computers & Digital Techniques, Volume 14, Issue 6, p. 231 –246
- DOI: 10.1049/iet-cdt.2020.0041
- Type: Article
Hardware Trojan detection techniques have been studied extensively. However, to develop reliable and effective defenses, it is important to understand how hardware Trojans are implemented in practical scenarios. The authors review hardware Trojan designs and implementations from the last decade and provide an outlook. Unlike previous surveys, which discuss Trojans from the defender's perspective, the authors study Trojans from the attacker's perspective for the first time, focusing on the attacker's methods, capabilities, and challenges when designing and implementing a hardware Trojan. First, the authors present adversarial models in terms of the adversary's methods, capabilities, and challenges in seven practical hardware Trojan implementation scenarios: in-house design team attacks, third-party intellectual property vendor attacks, computer-aided design tool attacks, fabrication stage attacks, testing stage attacks, distribution stage attacks, and field-programmable gate array Trojan attacks. Second, the authors analyse the hardware Trojan implementation methods under each adversarial model in terms of seven aspects/metrics: hardware Trojan attack scenarios, the attacker's motivation, feasibility, detectability (anti-detection capability), protection and prevention suggestions for the designer, overhead analysis, and case studies of Trojan implementations. Finally, future directions for hardware Trojan attacks and defenses are discussed.
Ten years of hardware Trojans: a survey from the attacker's perspective
- Author(s): M. Mohamed Asan Basiri
- Source: IET Computers & Digital Techniques, Volume 14, Issue 6, p. 247 –255
- DOI: 10.1049/iet-cdt.2020.0038
- Type: Article
The discrete wavelet transform (DWT) is widely used in image and video compression due to its high compression ratio and resolution. This study proposes efficient very large scale integration (VLSI) architectures of lifting-based 3D-DWT using (5,3) and (9,7) Daubechies wavelets. The advantage of the proposed architectures is the absence of storage buffers between the row, column, and temporal processes. Also, five and nine frames of the 3D signal can be processed in parallel using the proposed (5,3) and (9,7) lifting-based DWTs, respectively. Due to this parallelism and the elimination of storage buffers, the throughput of the proposed design is greater than that of existing techniques. The authors have implemented all the existing and proposed 3D-DWTs using a 45 nm CMOS library with Cadence and an Artix-7 FPGA with Xilinx Vivado. The synthesis results show that the proposed designs achieve a significant improvement in throughput over various existing designs. For example, the proposed (9,7) lifting-based 3D-DWT achieves an 85.4% improvement in throughput over the conventional design.
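For readers unfamiliar with lifting, the following is a minimal sketch of one level of a 1D reversible (5,3) lifting transform in software. It assumes the integer predict/update steps and symmetric boundary extension commonly used for the (5,3) filter; the function names are hypothetical, and the sketch does not reproduce the 3D VLSI architecture proposed in the article.

```python
def lift53_forward(x):
    """One level of the integer (5,3) lifting transform on an even-length list of ints."""
    n = len(x)
    assert n >= 2 and n % 2 == 0
    even, odd = x[0::2], x[1::2]
    half = n // 2
    # Predict step: detail = odd sample minus the floor-average of its even neighbours
    # (the right edge reuses the last even sample, i.e. symmetric extension).
    d = [odd[i] - ((even[i] + even[min(i + 1, half - 1)]) >> 1) for i in range(half)]
    # Update step: approximation = even sample plus a rounded quarter of the neighbouring details
    # (the left edge reuses the first detail sample).
    s = [even[i] + ((d[max(i - 1, 0)] + d[i] + 2) >> 2) for i in range(half)]
    return s, d


def lift53_inverse(s, d):
    """Undo lift53_forward by running the lifting steps in reverse order."""
    half = len(s)
    even = [s[i] - ((d[max(i - 1, 0)] + d[i] + 2) >> 2) for i in range(half)]
    odd = [d[i] + ((even[i] + even[min(i + 1, half - 1)]) >> 1) for i in range(half)]
    x = [0] * (2 * half)
    x[0::2] = even
    x[1::2] = odd
    return x
```

Because each lifting step is undone exactly by subtracting what was added, the transform is perfectly invertible in integer arithmetic. A 3D-DWT applies such a 1D transform along rows, columns, and frames.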
- Author(s): Nejmeddine Bahri and Randa Khemiri
- Source: IET Computers & Digital Techniques, Volume 14, Issue 6, p. 256 –262
- DOI: 10.1049/iet-cdt.2019.0197
- Type: Article
High-efficiency video coding (HEVC) is the latest video coding standard, which aims to reduce the bitrate by half for the same video quality compared to H.264/AVC. This encoding performance makes HEVC more suitable for high-definition video applications. However, this performance is coupled with a high computational complexity, which makes it hard to achieve real-time video encoding with a classic embedded processor. Multicore technology of programmable processors could be a very promising solution to overcome this computational complexity. Moreover, software optimisations that propose fast algorithms for the most complex functions could also be an efficient way to speed up the encoding process. In this context, this study presents a fast mode decision algorithm for the intra prediction module. This algorithm aims to reduce the number of intra prediction modes to be tested instead of performing a full intra mode search. Experimental results for the all-Intra configuration show that the proposed fast intra mode decision saves up to 46.79% of the intra prediction time on average. Encoding performance in terms of video quality and bitrate is not significantly affected.
- Author(s): Khokan Mondal ; Subhajit Das ; Tuhina Samanta
- Source: IET Computers & Digital Techniques, Volume 14, Issue 6, p. 263 –271
- DOI: 10.1049/iet-cdt.2020.0010
- Type: Article
The coupling capacitance and inductance of 2D and 3D integrated circuit (IC) interconnects in deep sub-micron technology have increased due to reduced coupling distance, to the point that their magnitudes are comparable to the area and fringing capacitance of an interconnect. This leads to an increasing risk of failure due to unintentional noise and a need for accurate noise assessment. Incorrect noise estimation can either cause defects in the circuit design, when design resources are underestimated, or waste resources, when they are overestimated. In this study, a crosstalk noise model for coupled RLC on-chip interconnects is demonstrated. Subsequently, a novel time-efficient method is proposed to estimate and optimise the crosstalk noise precisely. The proposed method calculates coupling noise and optimises crosstalk noise, and has been validated using SPICE. Besides estimating crosstalk noise for 2D interconnects, this study also estimates the crosstalk noise for through-silicon vias (TSVs), which are used to connect different dies vertically in a 3D IC. Under high-frequency operation, the effects of signal rise time, TSV structure (height of the TSV), substrate resistivity, and the guarding TSV termination on crosstalk noise have also been studied in this work.
- Author(s): Weng Xiaodong ; Liu Yi ; Yang Yintang
- Source: IET Computers & Digital Techniques, Volume 14, Issue 6, p. 272 –280
- DOI: 10.1049/iet-cdt.2019.0212
- Type: Article
With the development of network-on-chip (NoC) theory, many mapping algorithms have been proposed to solve the application mapping problem, which is an NP-hard (non-deterministic polynomial-time hard) problem. Most of these algorithms are heuristic. They are limited by the number of iterations, not by the distance between iterations, because of the isomorphism of mapping sequences. In this study, the authors define and analyse this isomorphism using the genetic algorithm (GA), which is a heuristic algorithm. They then propose an approach called the density direction transform algorithm to eliminate the isomorphism of mapping sequences and accelerate the convergence of the population. To verify this approach, they developed a density-direction-based genetic mapping algorithm (DDGMAP) and compared it with the genetic mapping algorithm (GMA). The experiment demonstrates that, compared to the random algorithm, DDGMAP achieves on average a 23.48% delay reduction and a 7.15% power reduction. DDGMAP also performs better than GA in searching for the optimal solution.
- Author(s): Lalengmawia Chhangte and Alok Chakrabarty
- Source: IET Computers & Digital Techniques, Volume 14, Issue 6, p. 281 –289
- DOI: 10.1049/iet-cdt.2019.0257
- Type: Article
Quantum computers based on technologies such as superconducting qubits and quantum dots impose a physical constraint that requires interacting qubits to be adjacent. The initial placement of qubits and the swap gate insertion technique affect the circuit cost. The authors propose a global qubit ordering technique that considers fewer permutations, based on the number of interactions each qubit has with the other qubits of its circuit. They also perform local re-ordering of qubits, attempting to reduce the cost as much as possible; the cost is estimated by defining a window with weights assigned so that gates nearer to the current gate are given higher weight. Experiments have been conducted on NCV benchmarks, and the results have been compared with those of recent state-of-the-art techniques. Compared with existing works, the proposed method shows improvements of up to 53.3% for smaller benchmarks and up to 51.61% for larger benchmarks.
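To make the adjacency constraint concrete, the sketch below counts the swaps needed to bring each pair of interacting qubits next to each other on a 2D grid. It assumes the common nearest-neighbour cost metric (Manhattan distance minus one per two-qubit gate) and a fixed placement; the function name and data layout are hypothetical, and the article's weighted look-ahead window is not modelled.

```python
def nearest_neighbour_cost(gates, placement):
    """Sum, over all two-qubit gates, of the swaps needed to make the pair adjacent.

    gates: list of (qubit_a, qubit_b) pairs.
    placement: dict mapping each qubit name to its (row, column) grid cell.
    """
    total = 0
    for a, b in gates:
        (row_a, col_a), (row_b, col_b) = placement[a], placement[b]
        distance = abs(row_a - row_b) + abs(col_a - col_b)
        # Adjacent qubits (distance 1) need no swap; each extra step costs one swap.
        total += max(distance - 1, 0)
    return total
```

A placement that puts frequently interacting qubits in neighbouring cells lowers this cost, which is why the initial ordering matters.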
- Author(s): Sagar Reddy Vumanthala and Bikshalu Kalagadda
- Source: IET Computers & Digital Techniques, Volume 14, Issue 6, p. 290 –298
- DOI: 10.1049/iet-cdt.2020.0034
- Type: Article
In this study, the authors present a novel speech enhancement method that exploits the benefits of non-local means (NLM) estimation and optimised empirical mode decomposition (OEMD) with cubic-spline interpolation. The optimal parameters responsible for improving the performance are estimated using the path-finder algorithm. First, the noisy speech signal is decomposed into several scaled signals called intrinsic-mode functions (IMFs) using a temporal decomposition method, called the sifting process, in the OEMD approach. Each IMF contains essential information about the signal on some scale or frequency band. The obtained IMFs are processed by the NLM estimation technique, based on the non-local similarities present in each IMF, to reduce the ill effects caused by interfering noise. The proposed NLM-based method is effective at eliminating low-frequency noise. A field-programmable gate array architecture is implemented using Xilinx ISE 14.5, and the proposed method offers good performance, with a high signal-to-noise ratio (SNR) and low mean-square error compared to other approaches. The performance evaluation is carried out for different speech signals taken from the TIMIT database and noises taken from the NOISEX-92 database at SNR levels of 0, 5 and 10 dB.
- Author(s): Sumitra Velayudham ; Sivakumar Rajagopal ; Yeragudipati Venkata Ramana Rao ; Seok-Bum Ko
- Source: IET Computers & Digital Techniques, Volume 14, Issue 6, p. 299 –312
- DOI: 10.1049/iet-cdt.2019.0082
- Type: Article
A configurable, self-calibrated, power-efficient five-bit error correction code is proposed to correct both single-bit random errors and burst errors of up to five bits, providing 100% error correction probability with crosstalk avoidance. It can also correct higher-order errors of up to 9 bits with an error correction probability tolerance of 73% for on-chip interconnection links. Single error correction and double error detection with the extended Hamming code (22,16) is utilised along with standard triplication error correction methods in the proposed code. A self-calibration algorithm and a data stream rerouting block are integrated into the error correction code to achieve power efficiency. Reliability, link power consumption, and link swing voltage are estimated using an analytical model used in network-on-chip. Area, power, and delay of the codec are obtained using Synopsys tools with UMC 90 nm technology. The proposed method provides 32–73% power saving and 22.3–60.6% delay reduction with negligible area overhead compared with state-of-the-art works. Estimated results show that it provides a 40.5–50% reduction in link swing voltage and link power consumption compared with state-of-the-art works. The proposed code is well suited to on-chip interconnect links, where it provides high reliability and low swing voltage with high error correction capability compared with existing codes.
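The single error correction and double error detection (SEC-DED) building block mentioned above can be illustrated with a small software model of an extended Hamming (22,16) code. This is a sketch under assumptions: the bit layout (parity bits at positions 1, 2, 4, 8, 16 and an overall parity bit at index 0) and the function names are chosen for illustration and are not taken from the article, and the triplication, self-calibration, and rerouting parts of the proposed code are not shown.

```python
PARITY_POSITIONS = (1, 2, 4, 8, 16)


def secded_encode(data_bits):
    """Encode 16 data bits into a 22-bit extended Hamming codeword (list of 0/1)."""
    assert len(data_bits) == 16
    code = [0] * 22  # index 0: overall parity; indices 1..21: Hamming positions
    bits = iter(data_bits)
    for pos in range(1, 22):
        if pos & (pos - 1):  # not a power of two, so this is a data position
            code[pos] = next(bits)
    for p in PARITY_POSITIONS:
        # Parity bit p covers every position whose index has bit p set.
        code[p] = sum(code[pos] for pos in range(1, 22) if pos & p) % 2
    code[0] = sum(code[1:]) % 2  # overall parity enables double error detection
    return code


def secded_decode(code):
    """Return (data_bits, status); status is 'ok', 'corrected', 'double' or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 22):
        if code[pos]:
            syndrome ^= pos  # XOR of set positions is zero for a valid codeword
    overall = sum(code) % 2
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        if syndrome > 21:
            return None, "uncorrectable"  # more than two errors
        code[syndrome] ^= 1  # syndrome 0 means the overall parity bit itself flipped
        status = "corrected"
    else:
        return None, "double"  # even number of errors with nonzero syndrome
    data = [code[pos] for pos in range(1, 22) if pos & (pos - 1)]
    return data, status
```

A single flipped bit changes the overall parity and yields a syndrome equal to its position, so it can be corrected; two flipped bits leave the overall parity unchanged but give a nonzero syndrome, so they are detected without being corrected.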
- Author(s): Mahdi Abbasi and Milad Rafiee
- Source: IET Computers & Digital Techniques, Volume 14, Issue 6, p. 313 –321
- DOI: 10.1049/iet-cdt.2019.0118
- Type: Article
The categorisation of network packets according to multiple parameters, such as sender and receiver addresses, is called packet classification. Packet classification lies at the core of Software-Defined Networking (SDN)-based network applications. Due to the increasing speed of network traffic, there is an urgent need for packet classification at higher speeds. Although it is possible to accelerate packet classification algorithms through hardware implementation, this solution imposes high costs and offers limited development capacity. On the other hand, current software methods for this problem are relatively slow. A practical solution is to parallelise packet classification using multi-core processors. In this study, the Thread, parallel patterns library (PPL), open multi-processing (OpenMP), and threading building blocks (TBB) libraries are examined and used to parallelise three packet classification algorithms, i.e. tuple space search, tuple pruning search, and hierarchical tree. According to the results, the type of algorithm and the rulesets may influence the performance of the parallelisation libraries. In general, the TBB-based method shows the best performance among the parallelisation libraries due to its work-stealing mechanism, and can accelerate the classification process up to 8.3 times on a system with a quad-core processor.
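As an illustration of one of the algorithms named above, the following is a minimal sequential sketch of tuple space search. It assumes rules on two 32-bit address fields with prefix lengths, one hash table per (source length, destination length) tuple, and highest-priority-wins matching; the class and method names are hypothetical, and the multi-threaded versions studied in the article are not reproduced.

```python
from collections import defaultdict


def prefix(value, length, width=32):
    """Keep only the top `length` bits of a `width`-bit value."""
    return value >> (width - length) if length else 0


class TupleSpace:
    def __init__(self):
        # (src_len, dst_len) -> {(src_prefix, dst_prefix): (priority, action)}
        self.tables = defaultdict(dict)

    def add_rule(self, src, src_len, dst, dst_len, priority, action):
        key = (prefix(src, src_len), prefix(dst, dst_len))
        self.tables[(src_len, dst_len)][key] = (priority, action)

    def classify(self, src, dst):
        """Probe one hash table per tuple and return the highest-priority action, or None."""
        best = None
        for (src_len, dst_len), table in self.tables.items():
            hit = table.get((prefix(src, src_len), prefix(dst, dst_len)))
            if hit is not None and (best is None or hit[0] > best[0]):
                best = hit
        return best[1] if best else None
```

Each tuple is probed independently of the others, so the loop in `classify` is a natural candidate for being split across threads.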
- Author(s): Rajib Lochan Jana ; Soumyajit Dey ; Arijit Mondal ; Pallab Dasgupta
- Source: IET Computers & Digital Techniques, Volume 14, Issue 6, p. 322 –335
- DOI: 10.1049/iet-cdt.2019.0283
- Type: Article
Bug traces serve as references for patching a microprocessor design after a bug has been found. Unless the root cause of a bug has been detected and patched, variants of the bug may return through alternative bug traces, following a different sequence of micro-architectural events. To avoid such a situation, the verification engineer must think of every possible way in which the bug may return, which is a complex problem for a modern microprocessor. This study proposes a methodology that gleans high-level descriptions of the micro-architectural steps and uses them in an artificial intelligence planning framework to find alternative pathways through which a bug may return. The plans are then translated into simulation test cases that explore these potential bug scenarios. The planning tool essentially automates the verification engineer's task of exploring possible alternative sequences of micro-architectural steps that may allow a bug to return. The proposed methodology is demonstrated in three case studies.
- Author(s): Mrinal Goswami ; Jayanta Pal ; Mayukh Roy Choudhury ; Pritam P. Chougule ; Bibhash Sen
- Source: IET Computers & Digital Techniques, Volume 14, Issue 6, p. 336 –343
- DOI: 10.1049/iet-cdt.2020.0008
- Type: Article
Conventional computing systems face enormous pressure to cope with today's rising demand for computing speed. In the search for high-speed computing in the nano-scale era, it is necessary to explore a viable alternative that overcomes the physical limits of complementary metal-oxide-semiconductor (CMOS) technology. In that direction, processing-in-memory (PIM) is gaining importance, as it keeps computation as close as possible to memory. It promises lower latency than the conventional stored-program concept by embedding storage and data computation in a single unit. The bit storing and processing capability of the Akers array provides a foundation for PIM. In addition, quantum-dot cellular automata (QCA) is emerging as a promising nanoelectronic technology to replace CMOS and deliver fast devices in the nanoelectronics era. This work presents a novel PIM concept, embedding the Akers array in QCA to achieve high-speed computing at the nano-scale. A QCA implementation of universal logic utilizing the Akers array demonstrates its processing power and potential. A universal function is considered for testing the effectiveness of the proposed PIM cell. The performance evaluation indicates the efficacy of QCA PIM over the conventional Von Neumann architecture.
- Author(s): Karim Shahbazi and Seok-Bum Ko
- Source: IET Computers & Digital Techniques, Volume 14, Issue 6, p. 344 –352
- DOI: 10.1049/iet-cdt.2019.0179
- Type: Article
This study presents a high-throughput field-programmable gate array (FPGA) implementation of the advanced encryption standard-128 (AES-128). AES is a well-known symmetric key encryption algorithm that offers high security against different attacks and is widely used in different applications. The main goal of this study is to design a cryptosystem with high throughput and high FPGA efficiency (FPGA-Eff) for high-traffic applications. To achieve high throughput, loop-unrolling and inner and outer pipelining techniques are employed. In AES, substitution bytes (Sub-Bytes) is one of the costly functions, occupying a large number of resources and incurring a large delay. To reduce the area of Sub-Bytes, a new-affine-transformation, which is the combination of the inverse isomorphic and affine transformations, is proposed and employed. In addition, AES has been modified according to the proposed architecture. For the first nine rounds, Shift-Rows and Sub-Bytes have been exchanged, and Shift-Rows is merged with Add-Round-Key. To equalise the latency between stages, Mix-Columns is divided into two different stages. AES is implemented in counter mode on a Xilinx Virtex-5 using VHDL. The proposed implementation achieves a throughput of 79.7 Gbps, an FPGA-Eff of 13.3 Mbps/slice, and a frequency of 622.4 MHz. Compared to the state-of-the-art work, the proposed design improves data throughput by 8.02% and FPGA-Eff by 22.63%.
Efficient VLSI architectures of lifting based 3D discrete wavelet transform
Optimised HEVC encoder intra-only configuration
Rectilinear routing algorithm for crosstalk minimisation in 2D and 3D IC
Network-on-chip heuristic mapping algorithm based on isomorphism elimination for NoC optimisation
Technique for two-dimensional nearest neighbour realisation of quantum circuits using weighted look-ahead
Real-time speech enhancement using optimised empirical mode decomposition and non-local means estimation
Power efficient error correction coding for on-chip interconnection links
Efficient parallelisation of the packet classification algorithms on multi-core central processing units using multi-threading application program interfaces
Automated planning for finding alternative bug traces
In memory computation using quantum-dot cellular automata
High throughput and area-efficient FPGA implementation of AES for high-traffic applications
- Author(s): Xiaokun Yang ; Shi Sha ; Ishaq Unwala ; Jiang Lu
- Source: IET Computers & Digital Techniques, Volume 14, Issue 6, p. 353 –362
- DOI: 10.1049/iet-cdt.2019.0090
- Type: Article
Integrating third-party intellectual properties (IPs) into a new system-on-chip (SoC) architecture is a major challenge. This study therefore first presents a new bus protocol named integrated bus (IBUS) and, more importantly, proposes a configurable bus wrapper for connecting AXI3-interfaced IPs to IBUS, aiming to find the optimal balance between bus efficiency and resource cost in terms of field-programmable gate array slice count, bus transfer latency, and energy consumption. As a case study, the authors implemented three IBUS wrappers for integrating three AXI3-interfaced verification IPs into an IBUS SoC. Experimental results show that the proposed work achieves a higher valid data throughput ( in the block test and in the cipher test) compared with designs based on conventional bridge-based SoC integration, as well as a large reduction in the normalised slice-time-power (18.73% in the block benchmark and 23.45% in the cipher benchmark) when the same weights are set for slice number, data transfer latency, and energy dissipation.
Towards IP integration on SoC: a case study of high-throughput and low-cost wrapper design on a novel IBUS architecture