IET Computers & Digital Techniques
Volume 11, Issue 1, January 2017
- Author(s): Bahar Asgari ; Mahdi Fazeli ; Ahmad Patooghy ; Seyed Vahid Azhari
- Source: IET Computers & Digital Techniques, Volume 11, Issue 1, pp. 1–7
- DOI: 10.1049/iet-cdt.2015.0155
- Type: Article
In this paper, the authors propose two approaches that employ spin transfer torque random access memory (STTRAM) in the design of the register file, an important part of embedded processors. However, STTRAM suffers from limited endurance and long latency in write operations. Consequently, employing STTRAM in the register file entails two challenges: (i) the lifetime decreases significantly because data are frequently written into the register file; (ii) the critical-path delay increases because of the slow write operation. To minimise the number of write operations into the register file, the authors use two well-known micro-architectural approaches: last-write-update (LWU) and value locality (VL). The main observation behind LWU is that only the last write operation to a given register in the re-order buffer needs to be considered. VL exploits the fact that a limited set of values is written into the register file far more often than others. Simulation results show that, with LWU and VL, the lifetime of the STTRAM-based register file extends to about 40 years on average and around 3.5 years in the worst case, while power consumption improves by 30% compared with a traditional static RAM (SRAM)-based register file architecture.
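The two write-reduction ideas can be illustrated with a short write-counting sketch. It assumes that each commit window is a list of (register, value) writes in program order and that LWU lets only the last write to each register in a window reach the STTRAM cells. It also assumes, hypothetically, that VL serves writes of values found in a small frequent-value table without touching the STTRAM cells. The function names, windows and frequent-value table are illustrative, not taken from the paper.

```python
def lwu_filter(window):
    """Last-write-update: keep only the final write to each register in a window.

    `window` is a list of (register, value) pairs in program order; the
    surviving writes keep their original relative order.
    """
    last_index = {}
    for i, (reg, _) in enumerate(window):
        last_index[reg] = i
    return [w for i, w in enumerate(window) if last_index[w[0]] == i]


def vl_filter(window, frequent_values):
    """Hypothetical value-locality step: writes of frequent values are assumed
    to be encoded elsewhere, so they do not write the STTRAM cells."""
    return [w for w in window if w[1] not in frequent_values]


def sttram_writes(windows, frequent_values):
    """Count the writes that still reach the STTRAM register file."""
    return sum(len(vl_filter(lwu_filter(w), frequent_values)) for w in windows)


# Two commit windows of (register, value) writes, in program order.
windows = [
    [("r1", 5), ("r2", 0), ("r1", 7), ("r3", 1)],
    [("r2", 42), ("r2", 0)],
]
baseline = sum(len(w) for w in windows)       # 6 writes without filtering
after_lwu = sttram_writes(windows, set())     # 4 writes with LWU only
after_both = sttram_writes(windows, {0, 1})   # 1 write with LWU and VL
```

In this toy trace, LWU removes the overwritten `r1` and `r2` writes, and the hypothetical VL table absorbs the remaining writes of 0 and 1, leaving a single STTRAM write out of six.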
- Author(s): Hao Xiao ; Busheng Zheng ; Tsuyoshi Isshiki ; Hiroaki Kunieda
- Source: IET Computers & Digital Techniques, Volume 11, Issue 1, pp. 8–15
- DOI: 10.1049/iet-cdt.2015.0217
- Type: Article
Ultra-wideband (UWB) is a well-known radio technology whose media access control (MAC) protocol, WiMedia MAC, has considerable potential to ensure high-speed and high-quality data communication for wireless personal area networks. However, these benefits come with a heavy computational workload, which challenges the conventional very-large-scale integration (VLSI) approach in providing the required performance and power efficiency. Therefore, this study aims to optimise the VLSI implementation of the WiMedia MAC system by proposing a hybrid shared-memory and message-passing multiprocessor system-on-chip (MPSoC) architecture. The proposed solution combines state-of-the-art MPSoC technology and application-specific instruction-set processor techniques to (i) accelerate the MAC protocol at task level through parallel processing, (ii) enable the use of custom instructions that optimise inter-processor communication through an explicit message-passing mechanism, and (iii) ease the implementation process through a high-level software/hardware co-design methodology. The proposed platform is implemented both in SystemC at the system level, for architecture exploration, and in standard-cell technology, for future chip implementation. Experimental results show that the proposed hybrid MPSoC architecture achieves a 24% performance improvement and 22% power savings over a conventional shared-memory-only architecture.
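Task-level parallelism with explicit messages, instead of polling a shared buffer, can be illustrated by analogy with two Python threads and a bounded FIFO. This is only an analogy for the hybrid MPSoC: a thread stands in for a processor, `queue.Queue` stands in for a hardware message channel, and the two tasks (splitting a frame into a sequence number and payload, and a toy modulo-256 checksum) are hypothetical placeholders, not WiMedia MAC functions.

```python
import queue
import threading


def split_stage(frames, channel):
    """Hypothetical task 1: split each frame into a sequence number and payload."""
    for frame in frames:
        channel.put({"seq": frame[0], "payload": frame[1:]})
    channel.put(None)  # end-of-stream message


def checksum_stage(channel, results):
    """Hypothetical task 2: consume messages and compute a toy checksum."""
    while True:
        msg = channel.get()  # blocks until the producer sends a message
        if msg is None:
            break
        results.append((msg["seq"], sum(msg["payload"]) % 256))


def run_pipeline(frames):
    channel = queue.Queue(maxsize=4)  # bounded FIFO standing in for a message channel
    results = []
    consumer = threading.Thread(target=checksum_stage, args=(channel, results))
    consumer.start()
    split_stage(frames, channel)      # producer runs on the calling thread
    consumer.join()
    return results


# run_pipeline([bytes([1, 16, 32]), bytes([2, 200, 100])]) returns [(1, 48), (2, 44)]
```

Because the bounded queue blocks the sender when it is full and the receiver when it is empty, the two stages synchronise through the messages themselves, which mirrors how a message-passing channel, unlike a shared buffer, carries both data and synchronisation.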
- Author(s): Pawan Singh and Nirayo Hailu
- Source: IET Computers & Digital Techniques, Volume 11, Issue 1, pp. 16–23
- DOI: 10.1049/iet-cdt.2016.0097
- Type: Article
Energy consumption is currently a major concern in online non-clairvoyant job scheduling, which has been studied less extensively than online clairvoyant scheduling. The authors study the non-clairvoyant scheduling problem of minimising the total prioritised flow time plus energy, where jobs with arbitrary sizes and priorities arrive online. They consider the unbounded speed model in a multiprocessor setting, where the speed of each of the m processors can vary from zero to infinity, i.e. [0, ∞), to save energy and to optimise the prioritised flow time plus energy. The authors use the traditional power function P = s^α, where s is the speed of a processor and α > 1 is a constant. They introduce an online non-clairvoyant scheduling algorithm, multiprocessor priority round robin (MPRR), and show that it is competitive, with a competitive ratio of 5.33 for α = 2 and 7.3 for α = 3. The algorithm is analysed using a potential function against an optimal offline adversary.
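The objective, prioritised flow time plus energy under P = s^α, can be made concrete with a small event-driven simulator. This is a single-processor sketch, not the authors' MPRR, which runs on m processors. It assumes that each active job receives a share of the processor proportional to its priority and that the speed is set so that the power s^α equals the total priority W of the active jobs, a rule commonly used in speed-scaling analyses; under that rule the energy always equals the prioritised flow time. The job-tuple layout and function name are hypothetical, and priorities must be positive.

```python
def priority_round_robin(jobs, alpha):
    """Simulate priority-proportional round robin on one speed-scalable processor.

    jobs: list of (release_time, size, priority) with priority > 0.
    Returns (prioritised_flow_time, energy) under power P(s) = s**alpha,
    with speed chosen so that P equals the total active priority W.
    """
    jobs = sorted(jobs)
    active = {}  # job index -> [remaining work, priority]
    t, i = 0.0, 0
    flow = energy = 0.0
    while i < len(jobs) or active:
        if not active:
            t = max(t, jobs[i][0])  # idle until the next release
        while i < len(jobs) and jobs[i][0] <= t:
            active[i] = [jobs[i][1], jobs[i][2]]
            i += 1
        total_priority = sum(p for _, p in active.values())
        speed = total_priority ** (1.0 / alpha)
        # Job j is processed at rate speed * p_j / W until the next event.
        to_finish = min(rem / (speed * p / total_priority) for rem, p in active.values())
        to_arrival = jobs[i][0] - t if i < len(jobs) else float("inf")
        dt = min(to_finish, to_arrival)
        flow += total_priority * dt      # every active job accrues p_j * dt
        energy += speed ** alpha * dt    # equals total_priority * dt
        for key in list(active):
            active[key][0] -= speed * active[key][1] / total_priority * dt
            if active[key][0] <= 1e-9:   # finished (up to rounding)
                del active[key]
        t += dt
    return flow, energy
```

For two unit-size jobs released at time 0 with priorities 3 and 1 and α = 2, the higher-priority job finishes at t = 2/3 and the other at t = 4/3, so the prioritised flow time is 3 · 2/3 + 1 · 4/3 = 10/3, and the energy takes the same value.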
- Author(s): Gaizhen Yan ; Ning Wu ; Fen Ge ; Hao Xiao ; Fang Zhou
- Source: IET Computers & Digital Techniques, Volume 11, Issue 1, pp. 24–32
- DOI: 10.1049/iet-cdt.2015.0198
- Type: Article
Three-dimensional networks-on-chip are beneficial for performance improvement, but suffer from severe thermal issues. Dynamic thermal management (DTM) schemes have been proposed to keep the temperature below the thermal limit while improving system performance. However, existing fully-throttling DTM schemes degrade network availability and thus decrease system performance. In this study, a novel collaborative fuzzy-based partially-throttling DTM (CFP-DTM) scheme is developed. The CFP-DTM involves two main components: (i) a fuzzy-based clock gating scheme that dynamically adjusts the throttling ratio and selects the throttled nodes; (ii) a highly adaptive throttling-aware routing scheme that lets packets detour around easily congested channels. Experiments show that, compared with the fully-throttling-based vertical throttling scheme, the proposed CFP-DTM improves throughput by 27.5% and reduces thermal control oscillation by 3°C under the maximum system workload.
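The fuzzy clock-gating component can be sketched as a tiny zero-order Sugeno controller that maps a temperature reading to a throttling ratio and then gates the hottest nodes. The abstract does not give the controller's inputs, membership functions or rules, so everything here is hypothetical: a single input (the hottest node's temperature), breakpoints at 70, 80 and 90 °C, three rules, and hottest-first node selection. The throttling-aware routing component is not modelled.

```python
import math

# Hypothetical temperature breakpoints in degrees Celsius (not from the paper).
COOL_END, WARM_PEAK, HOT_FULL = 70.0, 80.0, 90.0


def ramp_down(x, a, b):
    """Membership 1 below a, 0 above b, linear in between."""
    if x <= a:
        return 1.0
    if x >= b:
        return 0.0
    return (b - x) / (b - a)


def triangle(x, a, b, c):
    """Triangular membership rising from a, peaking at b, falling to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


def throttling_ratio(temp_c):
    """Zero-order Sugeno inference: weighted average of the rule outputs."""
    memberships = {
        "cool": ramp_down(temp_c, COOL_END, WARM_PEAK),
        "warm": triangle(temp_c, COOL_END, WARM_PEAK, HOT_FULL),
        "hot": 1.0 - ramp_down(temp_c, WARM_PEAK, HOT_FULL),
    }
    rule_output = {"cool": 0.0, "warm": 0.5, "hot": 1.0}  # fraction of nodes to gate
    total = sum(memberships.values())  # always > 0 for these overlapping sets
    return sum(memberships[k] * rule_output[k] for k in memberships) / total


def throttled_nodes(node_temps, ratio):
    """Gate the hottest ceil(ratio * N) nodes; the others keep routing traffic."""
    count = math.ceil(ratio * len(node_temps))
    hottest = sorted(node_temps, key=node_temps.get, reverse=True)
    return hottest[:count]


node_temps = {"n0": 85.0, "n1": 60.0, "n2": 78.0, "n3": 72.0}
ratio = throttling_ratio(max(node_temps.values()))
gated = throttled_nodes(node_temps, ratio)
```

With these breakpoints, a peak of 85 °C yields a ratio of 0.75, so three of the four example nodes are gated while the coolest one stays active; this partial gating is what distinguishes the approach from fully throttling a region.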
- Author(s): Md Selim Hossain ; Yinan Kong ; Ehsan Saeedi ; Niras C. Vayalil
- Source: IET Computers & Digital Techniques, Volume 11, Issue 1, pp. 33–42
- DOI: 10.1049/iet-cdt.2016.0033
- Type: Article
This study presents an efficient hardware implementation of an elliptic curve cryptography processor (ECP) for modern security applications. A high-performance elliptic curve scalar multiplication (ECSM), the key operation of an ECP, is developed in both affine and Jacobian coordinates over a prime field of size p, following the National Institute of Standards and Technology (NIST) standard. A novel combined point doubling and point addition architecture that uses efficient modular arithmetic is proposed to achieve high speed and low hardware utilisation for the ECP in Jacobian coordinates. The architecture has been synthesised for both application-specific integrated circuit (ASIC) and field-programmable gate array (FPGA) platforms. A 65 nm CMOS ASIC implementation of the proposed ECP in Jacobian coordinates takes 0.56 ms and 0.73 ms for 224-bit and 256-bit elliptic curve cryptography, respectively. The ECSM is also implemented on an FPGA, where it achieves lower delay than previous designs. The design is area-efficient: on an FPGA it requires few resources and uses no digital signal processing (DSP) slices. Moreover, its area–delay product is very low compared with similar designs. To the best of the authors' knowledge, the proposed ECP outperforms available hardware in terms of area and timing.
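The arithmetic that a combined doubling/addition datapath evaluates can be sketched in software. This is not the authors' hardware architecture: it uses the standard Jacobian-coordinate doubling and addition formulas for a short Weierstrass curve y² = x³ + ax + b, where (X, Y, Z) represents the affine point (X/Z², Y/Z³) so that no modular inversion is needed until the final conversion. It drives them with a left-to-right double-and-add loop on a hypothetical toy curve over F_97 so the numbers stay readable; a NIST prime-field curve such as P-224 or P-256 would substitute its published p, a, b and base point. The code is not constant-time and is unsuitable for real cryptographic use; `pow(Z, -1, p)` needs Python 3.8 or later.

```python
# Hypothetical toy curve y^2 = x^3 + 2x + 3 over F_97 (illustration only).
P_MOD, A, B_COEF = 97, 2, 3  # B_COEF is only needed to check points lie on the curve
INF = (1, 1, 0)              # point at infinity in Jacobian coordinates


def jacobian_double(pt):
    X, Y, Z = pt
    if Z == 0 or Y == 0:
        return INF
    YY = Y * Y % P_MOD
    S = 4 * X * YY % P_MOD
    M = (3 * X * X + A * pow(Z, 4, P_MOD)) % P_MOD
    X3 = (M * M - 2 * S) % P_MOD
    Y3 = (M * (S - X3) - 8 * YY * YY) % P_MOD
    Z3 = 2 * Y * Z % P_MOD
    return (X3, Y3, Z3)


def jacobian_add(p1, p2):
    if p1[2] == 0:
        return p2
    if p2[2] == 0:
        return p1
    X1, Y1, Z1 = p1
    X2, Y2, Z2 = p2
    Z1Z1, Z2Z2 = Z1 * Z1 % P_MOD, Z2 * Z2 % P_MOD
    U1, U2 = X1 * Z2Z2 % P_MOD, X2 * Z1Z1 % P_MOD
    S1, S2 = Y1 * Z2 * Z2Z2 % P_MOD, Y2 * Z1 * Z1Z1 % P_MOD
    if U1 == U2:
        # Same x-coordinate: either the same point (double) or inverses (infinity).
        return jacobian_double(p1) if S1 == S2 else INF
    H = (U2 - U1) % P_MOD
    R = (S2 - S1) % P_MOD
    HH = H * H % P_MOD
    HHH = H * HH % P_MOD
    X3 = (R * R - HHH - 2 * U1 * HH) % P_MOD
    Y3 = (R * (U1 * HH - X3) - S1 * HHH) % P_MOD
    Z3 = H * Z1 * Z2 % P_MOD
    return (X3, Y3, Z3)


def to_affine(pt):
    X, Y, Z = pt
    if Z == 0:
        return None  # point at infinity
    z_inv = pow(Z, -1, P_MOD)  # single inversion at the very end
    return (X * z_inv ** 2 % P_MOD, Y * z_inv ** 3 % P_MOD)


def scalar_mult(k, affine_pt):
    """Left-to-right double-and-add: one doubling per bit, one addition per set bit."""
    result = INF
    base = (affine_pt[0], affine_pt[1], 1)
    for bit in bin(k)[2:]:
        result = jacobian_double(result)
        if bit == "1":
            result = jacobian_add(result, base)
    return to_affine(result)
```

On the toy curve, doubling P = (3, 6) gives 2P = (80, 10), which agrees with the affine tangent formula λ = (3x² + a)/(2y).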
- Author(s): Ranjan Kumar Barik and Manoranjan Pradhan
- Source: IET Computers & Digital Techniques, Volume 11, Issue 1, pp. 43–49
- DOI: 10.1049/iet-cdt.2016.0043
- Type: Article
This study presents a generalised architecture for the cube operation based on the Yavadunam sutra of Vedic mathematics. The algorithm converts the cube of a large-magnitude number into a computation on a smaller-magnitude number plus addition operations. The Vedic sutra for decimal numbers is extended to the binary (radix-2) number system for digital platforms. The cube architecture is synthesised and simulated using Xilinx ISE 14.1 and implemented on various field-programmable gate array (FPGA) devices for comparison. Cadence Encounter(R) RTL Compiler RC13.10 (v13.10-s006_1) is also used to target an application-specific integrated circuit (ASIC) platform. Performance parameters such as delay, area and power are obtained from the synthesis reports. The results show that the proposed architecture is suitable for low-area, high-speed applications in microprocessor environments.
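The identity behind the Yavadunam cube can be checked in a few lines. Writing N = B + d for a base B and deviation d, the sutra arranges N³ = B³ + 3B²d + 3Bd² + d³ as (N + 2d)·B² + 3d²·B + d³. The sketch assumes a radix-2 base B = 2^k chosen as the largest power of two not exceeding N, so both multiplications by powers of B become left shifts, and it recursively cubes the smaller deviation d. The choice of base and the recursive handling of d³ are assumptions; the paper's generalised architecture may choose the base or handle the d³ term differently.

```python
def yavadunam_cube(n):
    """Cube a non-negative integer via N^3 = (N + 2d) * B^2 + 3 * d^2 * B + d^3,
    with B = 2**k the largest power of two <= N and d = N - B (0 <= d < B)."""
    if n < 0:
        raise ValueError("sketch handles non-negative integers only")
    if n == 0:
        return 0
    k = n.bit_length() - 1          # B = 1 << k
    d = n - (1 << k)                # deviation from the base
    first = (n + 2 * d) << (2 * k)  # (N + 2d) * B^2 as a shift
    middle = (3 * d * d) << k       # 3 * d^2 * B as a shift
    last = yavadunam_cube(d)        # smaller-magnitude cube, since d < B
    return first + middle + last
```

For N = 7 (B = 4, d = 3) the three parts are 208, 108 and 27, which sum to 343; each recursive call at least halves the operand, so the recursion depth grows only logarithmically with N.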
Micro-architectural approach to the efficient employment of STTRAM cells in a microprocessor register file
Hybrid shared-memory and message-passing multiprocessor system-on-chip for UWB MAC layer
Energy-aware online non-clairvoyant multiprocessor scheduling: multiprocessor priority round robin
Collaborative fuzzy-based partially-throttling dynamic thermal management scheme for three-dimensional networks-on-chip
High-performance elliptic curve cryptography processor over NIST prime fields
Efficient ASIC and FPGA implementation of cube architecture
Most cited content for this Journal
High-performance elliptic curve cryptography processor over NIST prime fields
- Author(s): Md Selim Hossain ; Yinan Kong ; Ehsan Saeedi ; Niras C. Vayalil
- Type: Article
Majority-based evolution state assignment algorithm for area and power optimisation of sequential circuits
- Author(s): Aiman H. El-Maleh
- Type: Article
Scalable GF(p) Montgomery multiplier based on a digit–digit computation approach
- Author(s): M. Morales-Sandoval and A. Diaz-Perez
- Type: Article
Fabrication and characterisation of Al gate n-metal–oxide–semiconductor field-effect transistor, on-chip fabricated with silicon nitride ion-sensitive field-effect transistor
- Author(s): Rekha Chaudhary ; Amit Sharma ; Soumendu Sinha ; Jyoti Yadav ; Rishi Sharma ; Ravindra Mukhiya ; Vinod K. Khanna
- Type: Article
Adaptively weighted round-robin arbitration for equality of service in a many-core network-on-chip
- Author(s): Hanmin Park and Kiyoung Choi
- Type: Article