IET Computers & Digital Techniques
Volume 11, Issue 4, July 2017
- Author(s): Haythem Bahri ; Fatma Sayadi ; Randa Khemiri ; Marwa Chouchene ; Mohamed Atri
- Source: IET Computers & Digital Techniques, Volume 11, Issue 4, p. 125–132
- DOI: 10.1049/iet-cdt.2016.0135
- Type: Article
Optimising the computing time of applications is an increasingly important task in many areas, including scientific and industrial applications. The graphics processing unit (GPU) is considered a powerful engine for computationally demanding applications because it offers a highly parallel architecture. In this context, the authors introduce an algorithm to optimise the computing time of feature extraction methods for colour images. They choose the generalised Fourier descriptor (GFD) and generalised colour Fourier descriptor (GCFD) models as methods to extract image features for applications such as real-time colour object recognition or image retrieval. They compare the experimental computing times on a central processing unit (CPU) and a GPU. They also present a case study of these descriptors on two platforms: an NVIDIA GeForce GT525M and an NVIDIA GeForce GTX480. Their experimental results demonstrate that the execution time can be reduced considerably, by up to 34× for GFD and 56× for GCFD.
- Author(s): Giovani Gracioli and Antônio Augusto Fröhlich
- Source: IET Computers & Digital Techniques, Volume 11, Issue 4, p. 133–139
- DOI: 10.1049/iet-cdt.2016.0114
- Type: Article
A two-phase colour-aware real-time scheduler to reduce the contention caused by the cache coherence protocol due to accesses to shared cache partitions in a multicore processor is proposed. The first phase is a colour-aware task partitioning (CAP) algorithm that assigns tasks that share colours to a common processor whenever possible. The second phase is a dynamic colour-aware scheduler that detects cache coherence activities at run-time, preventing the execution of tasks that interfere with each other and thus reducing the contention caused by the cache coherence protocol. The authors compare the proposed scheduler with a CAP without run-time optimisation and with the best-fit decreasing heuristic in terms of deadline misses and tardiness of several task sets using a real-time operating system and a modern 8-core processor. The results indicate that the proposed scheduler improves deadline tardiness and provides hard real-time guarantees by combining cache and task partitioning with scheduling optimisations.
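The best-fit decreasing heuristic that the abstract names as a baseline can be sketched as follows. This is a minimal textbook version, not the authors' implementation: the function name `best_fit_decreasing`, the use of per-task utilisation as the packing weight, and the per-core capacity of 1.0 are all assumptions made for illustration.

```python
def best_fit_decreasing(utilisations, num_cores, capacity=1.0):
    """Partition tasks onto cores with the best-fit decreasing heuristic.

    utilisations: dict mapping a task name to its utilisation (assumed weight).
    Returns a dict mapping each task name to a core index.
    """
    load = [0.0] * num_cores
    assignment = {}

    # "Decreasing": consider the heaviest tasks first.
    order = sorted(utilisations, key=utilisations.get, reverse=True)

    for task in order:
        u = utilisations[task]
        # Cores that can still accept the task (small tolerance for rounding).
        fitting = [c for c in range(num_cores) if load[c] + u <= capacity + 1e-9]
        if not fitting:
            raise ValueError(f"task {task!r} does not fit on any core")
        # "Best fit": the fullest core that still fits, i.e. least slack left.
        core = max(fitting, key=lambda c: load[c])
        load[core] += u
        assignment[task] = core

    return assignment


if __name__ == "__main__":
    tasks = {"t1": 0.6, "t2": 0.5, "t3": 0.4, "t4": 0.3}  # hypothetical task set
    print(best_fit_decreasing(tasks, num_cores=2))
```

A heuristic of this kind looks only at load, which is why a colour-aware partitioner that also considers shared cache partitions can behave differently on the same task set.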
- Author(s): Usman Ali Gulzari ; Sheraz Anjum ; Shahrukh Aghaa ; Sarzamin Khan ; Frank Sill Torres
- Source: IET Computers & Digital Techniques, Volume 11, Issue 4, p. 140–148
- DOI: 10.1049/iet-cdt.2016.0184
- Type: Article
This study presents an efficient and scalable networks-on-chip (NoC) topology termed cross-by-pass-mesh (CBP-Mesh). The proposed architecture is derived from the traditional mesh topology by the addition of cross-by-pass links in the network. The design and impact of adding cross-by-pass links to the topology are analysed in detail with the help of synthetic, hotspot and embedded traffic traces. The advantages of the proposed CBP-Mesh over its competitor topologies include a reduction in the network diameter, an increase in bisection bandwidth, a reduction in the average number of hops, and improvements in the symmetry and regularity of the network. The synthetic traffic traces and some real embedded system workloads are applied to the proposed CBP-Mesh and its competitor two-dimensional NoC topologies. The comparison of analytical results in terms of performance and cost for different network dimensions indicates that the proposed CBP-Mesh offers short latency, high throughput and good scalability at a small increase in power and energy.
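Two of the metrics named in the abstract, network diameter and average number of hops, can be computed for a plain 2D mesh with a breadth-first search, as in the sketch below. The abstract does not say where the cross-by-pass links are placed, so the two corner-to-corner links used in the demo are purely hypothetical and only show how extra links affect the metrics. All function names are assumptions.

```python
from collections import deque
from itertools import product


def mesh_links(n):
    """Adjacency map (node -> set of neighbours) of an n x n 2D mesh."""
    adj = {node: set() for node in product(range(n), repeat=2)}
    for x, y in adj:
        for nx, ny in ((x + 1, y), (x, y + 1)):
            if nx < n and ny < n:
                adj[(x, y)].add((nx, ny))
                adj[(nx, ny)].add((x, y))
    return adj


def add_links(adj, extra):
    """Return a copy of adj with extra bidirectional links added."""
    new = {node: set(nbs) for node, nbs in adj.items()}
    for a, b in extra:
        new[a].add(b)
        new[b].add(a)
    return new


def hop_counts(adj, source):
    """Minimum hop count from source to every reachable node (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def diameter_and_average_hops(adj):
    """Longest shortest path and mean hop count over all ordered node pairs."""
    longest, total, pairs = 0, 0, 0
    for source in adj:
        for target, d in hop_counts(adj, source).items():
            if target != source:
                longest = max(longest, d)
                total += d
                pairs += 1
    return longest, total / pairs


if __name__ == "__main__":
    mesh = mesh_links(4)
    # Hypothetical extra links, NOT the CBP-Mesh placement from the paper.
    extended = add_links(mesh, [((0, 0), (3, 3)), ((0, 3), (3, 0))])
    print("mesh:", diameter_and_average_hops(mesh))
    print("mesh + extra links:", diameter_and_average_hops(extended))
```

Because added links can only shorten paths, the average hop count of the extended network is never higher than that of the base mesh; how much it drops depends entirely on where the links are placed.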
- Author(s): Hao Zhang ; Dongdong Chen ; Seok-Bum Ko
- Source: IET Computers & Digital Techniques, Volume 11, Issue 4, p. 149–158
- DOI: 10.1049/iet-cdt.2016.0100
- Type: Article
In this study, an area- and power-efficient iterative floating-point (FP) multiplier with a pipelined architecture is designed and implemented on FPGA devices. The proposed multiplier supports both single-precision (SP) and double-precision (DP) operations. The operation mode can be switched at run time by changing the precision selection signal. The Karatsuba algorithm is applied when mapping the mantissa multiplier in order to reduce the number of digital signal processing (DSP) blocks required. For DP operations, an iterative method is applied, which requires much less hardware than a fully pipelined DP multiplier and thus reduces the power consumption. To further reduce the power consumption, the logic blocks that are unused in a specific operation mode are disabled. Compared to previous work, the proposed multiplier achieves a 33% reduction in DSP blocks, 4.3% fewer look-up tables (LUTs) and 31.2% fewer flip-flops, with a 4% higher clock frequency on Virtex-5 devices. Compared to the intellectual property core DP multiplier provided by the FPGA vendors, the proposed multiplier requires fewer DSP blocks and achieves lower power consumption. The mapping solutions and implementation results of the proposed multiplier on Xilinx Virtex-7 and Altera Arria-10 devices are also presented. In addition, the results of a direct implementation of the proposed architecture on an STM-90 nm ASIC platform are reported.
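The reason Karatsuba saves multiplier resources can be seen in a single-level software sketch: splitting each operand into two halves replaces four half-width products with three, at the cost of a few extra additions. The function name `karatsuba_split` and the 27-bit split of a 53-bit operand are assumptions for illustration and do not reflect the paper's actual mapping onto DSP blocks.

```python
def karatsuba_split(a: int, b: int, half_bits: int) -> int:
    """Multiply two non-negative integers with one level of Karatsuba.

    Each operand is split at bit position half_bits into a high and a low
    part. Three sub-multiplications are used instead of the four a
    schoolbook split would need.
    """
    mask = (1 << half_bits) - 1
    a_hi, a_lo = a >> half_bits, a & mask
    b_hi, b_lo = b >> half_bits, b & mask

    low = a_lo * b_lo                      # sub-multiplication 1
    high = a_hi * b_hi                     # sub-multiplication 2
    # Sub-multiplication 3; subtracting high and low leaves the cross terms
    # a_hi*b_lo + a_lo*b_hi without computing them separately.
    mid = (a_hi + a_lo) * (b_hi + b_lo) - high - low

    return (high << (2 * half_bits)) + (mid << half_bits) + low


if __name__ == "__main__":
    # Two hypothetical 53-bit values, the width of a DP significand
    # including the implicit leading one.
    a = (1 << 52) | 0x123456789ABCD
    b = (1 << 52) | 0xFEDCBA9876543
    assert karatsuba_split(a, b, 27) == a * b
    print(hex(karatsuba_split(a, b, 27)))
```

In hardware the trade-off is the same in spirit: fewer multiplier blocks are exchanged for additional adders and slightly wider intermediate operands.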
- Author(s): Aiman H. El-Maleh
- Source: IET Computers & Digital Techniques, Volume 11, Issue 4, p. 159–164
- DOI: 10.1049/iet-cdt.2016.0085
- Type: Article
Recently, a finite state machine-based fault tolerance technique for sequential circuits has been proposed, based on protecting a few states that have a high probability of occurrence. In this study, the authors propose an algorithm that starts with a given state assignment targeting the optimisation of either area or power and generates a state assignment that preserves the original state assignment and satisfies the fault tolerance requirements for the protected states. Experimental results demonstrate the effectiveness of the proposed algorithm in significantly reducing the area and power of synthesised sequential circuits while enhancing their fault tolerance.
Image feature extraction algorithm based on CUDA architecture: case study GFD and GCFD
Two-phase colour-aware multicore real-time scheduler
Efficient and scalable cross-by-pass-mesh topology for networks-on-chip
Area- and power-efficient iterative single/double-precision merged floating-point multiplier on FPGA
Finite state machine-based fault tolerance technique with enhanced area and power of synthesised sequential circuits
Most cited content for this Journal
-
High-performance elliptic curve cryptography processor over NIST prime fields
- Author(s): Md Selim Hossain ; Yinan Kong ; Ehsan Saeedi ; Niras C. Vayalil
- Type: Article
-
Majority-based evolution state assignment algorithm for area and power optimisation of sequential circuits
- Author(s): Aiman H. El-Maleh
- Type: Article
-
Scalable GF(p) Montgomery multiplier based on a digit–digit computation approach
- Author(s): M. Morales-Sandoval and A. Diaz-Perez
- Type: Article
-
Fabrication and characterisation of Al gate n-metal–oxide–semiconductor field-effect transistor, on-chip fabricated with silicon nitride ion-sensitive field-effect transistor
- Author(s): Rekha Chaudhary ; Amit Sharma ; Soumendu Sinha ; Jyoti Yadav ; Rishi Sharma ; Ravindra Mukhiya ; Vinod K. Khanna
- Type: Article
-
Adaptively weighted round-robin arbitration for equality of service in a many-core network-on-chip
- Author(s): Hanmin Park and Kiyoung Choi
- Type: Article