IET Computers & Digital Techniques
Volume 13, Issue 6, November 2019
- Source: IET Computers & Digital Techniques, Volume 13, Issue 6, pp. 415–416
- DOI: 10.1049/iet-cdt.2019.0233
- Type: Article
- Author(s): Dipika Deb ; John Jose ; Maurizio Palesi
- Source: IET Computers & Digital Techniques, Volume 13, Issue 6, pp. 417–428
- DOI: 10.1049/iet-cdt.2019.0035
- Type: Article
As the number of processing cores has grown, performance has increased, but energy consumption and memory access latency have become crucial factors in determining system performance. In a tiled chip multiprocessor, tiles are interconnected by a network and different applications run on different tiles. Non-uniform load distribution across applications results in varying L1 cache usage patterns: an application with a larger memory footprint uses most of its L1 cache. Prefetching on top of such an application may cause cache pollution by evicting useful demand blocks from the cache. This generates further cache misses, which increase network traffic. Therefore, an inefficient prefetch block placement strategy may generate more traffic, increasing congestion and power consumption in the network. It also slows packet movement, which increases the miss penalty at the cores and thereby degrades the Average Memory Access Time (AMAT). The authors propose ECAP, an energy-efficient caching strategy for prefetch blocks. ECAP uses the less-used cache sets of nearby tiles running light applications as virtual cache memory in which tiles running heavy applications place their prefetch blocks. Compared with the conventional prefetch placement technique, ECAP reduces AMAT, router power and link power in the NoC by 23.54%, 14.42% and 27%, respectively.
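The placement idea can be illustrated with a small sketch. It is not the authors' ECAP algorithm: the per-set utilisation values, the `busy_threshold`, the `max_hops` limit and the mesh coordinates are hypothetical parameters chosen for illustration. A prefetch block whose home set is busy is redirected to the nearby tile whose corresponding set is least used.

```python
from dataclasses import dataclass

@dataclass
class Tile:
    tile_id: int
    x: int
    y: int
    set_utilisation: list[float]  # hypothetical: fraction of ways holding useful blocks, per set

def hops(a: Tile, b: Tile) -> int:
    """Manhattan distance between two tiles on a 2-D mesh NoC."""
    return abs(a.x - b.x) + abs(a.y - b.y)

def choose_prefetch_host(home: Tile, tiles: list[Tile], set_index: int,
                         busy_threshold: float = 0.75, max_hops: int = 1) -> int:
    """Return the id of the tile that should hold a prefetch block mapping to set_index.

    Hypothetical policy: keep the block locally unless the home set is busy; otherwise
    use the least-utilised corresponding set among tiles within max_hops.
    """
    if home.set_utilisation[set_index] < busy_threshold:
        return home.tile_id
    neighbours = [t for t in tiles
                  if t.tile_id != home.tile_id and hops(home, t) <= max_hops]
    if not neighbours:
        return home.tile_id
    # Prefer the least-used set, then the closer tile, then the lower id (deterministic ties).
    best = min(neighbours,
               key=lambda t: (t.set_utilisation[set_index], hops(home, t), t.tile_id))
    # Only redirect if the neighbour's set is actually less loaded than the home set.
    if best.set_utilisation[set_index] < home.set_utilisation[set_index]:
        return best.tile_id
    return home.tile_id
```

For example, with tile 0 at (0, 0) whose set 0 is 90% used, one-hop neighbours at 30% and 10% utilisation and a two-hop neighbour at 0%, the block goes to the 10% neighbour; the distance limit keeps the extra NoC traffic for remote hits small.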
- Author(s): Sumanta Pyne
- Source: IET Computers & Digital Techniques, Volume 13, Issue 6, pp. 429–442
- DOI: 10.1049/iet-cdt.2019.0028
- Type: Article
The wake-up of power gating (PG) components causes an inrush current that quickly discharges the battery. An instruction-controlled hybrid battery–supercapacitor (SC) arrangement extends battery life in systems with PG. The present work extends a battery–single-SC system (BSC) model to an equivalent battery–dual-SC system (B2SC). Two instructions, disconnect battery (db) and connect battery (cb), are introduced along with architectural support for B2SC. During wake-up, db disconnects (i) the battery from the PG components and (ii) either one or both of the SCs from the battery. Hence, either both SCs discharge simultaneously, or one discharges while the other charges. The cb instruction reconnects the battery to the PG components and the SCs. A suboptimal version of B2SC (B2SCsopt) is also introduced, in which the SCs are connected to the PG components requiring higher inrush current while the rest remain connected to the battery. The efficacy of the proposed methods is evaluated on a cardiac pacemaker, an unmanned aerial vehicle and benchmark programmes. B2SC reduces the battery rate-capacity effect (Crate) by an average of 21.87% at the cost of an average performance loss of 9.25%, while B2SCsopt reduces Crate by an average of 29.37% at the cost of an average performance loss of 16.87%.
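To make the db/cb semantics concrete, here is a minimal model of the switch configuration each instruction produces. The field names, the `WakeMode` enum and the choice that SC1 discharges while SC2 keeps charging are illustrative assumptions, not the paper's instruction encoding or hardware interface; it also assumes that after cb the SCs recharge from the battery rather than feeding the PG components.

```python
from dataclasses import dataclass
from enum import Enum

class WakeMode(Enum):
    BOTH_DISCHARGE = "both"  # both SCs cut off from the battery, both supply the PG components
    ONE_DISCHARGE = "one"    # SC1 supplies the PG components, SC2 keeps charging from the battery

@dataclass(frozen=True)
class Switches:
    """Illustrative switch state (assumption), not the paper's hardware description."""
    battery_to_pg: bool               # battery supplies the power-gated components
    sc_to_battery: tuple[bool, bool]  # (SC1, SC2) connected to the battery for charging
    sc_to_pg: tuple[bool, bool]       # (SC1, SC2) supplying the wake-up inrush current

def db(mode: WakeMode) -> Switches:
    """'disconnect battery' at wake-up: the battery no longer sees the inrush current."""
    if mode is WakeMode.BOTH_DISCHARGE:
        return Switches(battery_to_pg=False, sc_to_battery=(False, False), sc_to_pg=(True, True))
    return Switches(battery_to_pg=False, sc_to_battery=(False, True), sc_to_pg=(True, False))

def cb() -> Switches:
    """'connect battery': battery feeds the PG components again and recharges both SCs."""
    return Switches(battery_to_pg=True, sc_to_battery=(True, True), sc_to_pg=(False, False))
```

In every state the model keeps an SC either charging or discharging, never both, which is the invariant that lets one SC absorb the inrush current while the other is replenished.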
- Author(s): James Clay ; Naveena Elango ; Sheena Ratnam Priya ; Shixiong Jiang ; Ramalingam Sridhar
- Source: IET Computers & Digital Techniques, Volume 13, Issue 6, pp. 443–452
- DOI: 10.1049/iet-cdt.2019.0040
- Type: Article
Large-scale machine-learning (ML) algorithms require extensive memory interactions, so managing or reducing data movement can significantly increase the speed and efficiency of many ML tasks. Towards this end, the authors devise an energy-efficient in-memory computing (IMC) kernel for linear classification and design an initial prototype. They achieve power savings of over 6.4 times compared with a conventional discrete system while improving reliability by 54.67%. A split-data-aware technique is employed to manage process, voltage and temperature variations and to achieve fair trade-offs between energy efficiency, area and accuracy. A trimodal architecture with a hierarchical tree structure further decreases power consumption. The authors also explore alternatives to the hierarchical tree structure that use significantly fewer linear regression blocks while maintaining competitive classification accuracy. Overall, the scheme provides a fast, energy-efficient and competitively accurate binary classification kernel.
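The kernel's core computation is a linear decision function, sign(w·x + b). The sketch below shows how partial dot products computed in separate blocks (here, fixed-size chunks of the feature vector) can be combined by a pairwise reduction tree to give the same decision as a flat computation. The chunk size and the binary tree are illustrative assumptions; the paper's split-data-aware and trimodal details are not modelled.

```python
def partial_dot(w, x, start, end):
    """One block: dot product over a slice of the weight and feature vectors."""
    return sum(wi * xi for wi, xi in zip(w[start:end], x[start:end]))

def tree_reduce(values):
    """Pairwise (binary tree) summation of the blocks' partial results."""
    values = list(values)
    if not values:
        return 0.0
    while len(values) > 1:
        paired = [values[i] + values[i + 1] for i in range(0, len(values) - 1, 2)]
        if len(values) % 2:            # odd count: carry the last value up one level
            paired.append(values[-1])
        values = paired
    return values[0]

def classify(w, x, b, chunk=4):
    """Binary linear classifier: +1 if w.x + b >= 0, else -1 (chunk size is an assumption)."""
    partials = [partial_dot(w, x, i, i + chunk) for i in range(0, len(w), chunk)]
    return 1 if tree_reduce(partials) + b >= 0 else -1
```

A tree keeps the adder depth logarithmic in the number of blocks, which is one reason hierarchical reductions are attractive in hardware.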
- Author(s): Biswajit Mishra ; Sanket Thakkar ; Nupur Jain
- Source: IET Computers & Digital Techniques, Volume 13, Issue 6, pp. 453–460
- DOI: 10.1049/iet-cdt.2019.0027
- Type: Article
A low-power single-lead electrocardiogram (ECG) front-end acquisition system in 0.18 μm CMOS operating at 0.5 V is presented. The analogue blocks that perform amplification and DC offset cancellation, namely the low-noise amplifier (LNA), filters and passive elements, are replaced by a moving average voltage-to-time converter (MA-VTC), which provides amplification and anti-aliasing in the time domain. A digital feedback algorithm cancels the DC offset. The front-end operates its MOS devices in the sub-threshold region to reduce power consumption. The proposed architecture consumes 50 nW with a gain of 670 μs/V. The front-end output is fed to an all-digital time-to-digital converter (TDC) that operates in the near-threshold region with a resolution of 586.4 ps and a power consumption of 32.5 μW.
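The two reported figures can be related with a little arithmetic. Assuming an ideal, linear voltage-to-time conversion with no offset (a simplification), an input change ΔV produces a time shift of 670 μs/V × ΔV, which the TDC quantises in steps of 586.4 ps. One TDC step therefore corresponds to roughly 0.875 μV at the input, and a 1 mV input step (670 ns) spans 1142 TDC steps. The function names are hypothetical.

```python
VTC_GAIN_S_PER_V = 670e-6   # MA-VTC gain: 670 μs/V (from the abstract)
TDC_LSB_S = 586.4e-12       # TDC resolution: 586.4 ps (from the abstract)

def input_referred_lsb() -> float:
    """Input voltage change that moves the VTC output by one TDC step (ideal model)."""
    return TDC_LSB_S / VTC_GAIN_S_PER_V

def tdc_code(delta_v: float) -> int:
    """Ideal TDC code for an input voltage change (truncating quantiser, no offset)."""
    delta_t = VTC_GAIN_S_PER_V * delta_v   # time-domain amplification
    return int(delta_t // TDC_LSB_S)       # number of whole TDC steps
```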
- Author(s): Jinti Hazarika ; Mohd. Tasleem Khan ; Shaik Rafi Ahamed ; Harshal B. Nemade
- Source: IET Computers & Digital Techniques, Volume 13, Issue 6, pp. 461–469
- DOI: 10.1049/iet-cdt.2019.0025
- Type: Article
This study presents an energy-efficient serial pipelined fast Fourier transform (FFT) architecture for real-valued signals. A new data mapping scheme yields normal-order input and output without a post-processing stage, which reduces the computational workload on the hardware resources, as confirmed through mathematical derivations. The design also includes a novel quadrant multiplier with relatively low hardware complexity; it performs a quarter of a complex multiplication in one clock cycle and therefore consumes relatively little power. Moreover, in the last stage, a merged unit for butterfly computation and data reordering is proposed, which performs either a half-butterfly operation or a data interchange, thereby reducing hardware usage. Application-specific integrated circuit (ASIC) synthesis and field-programmable gate array (FPGA) results show that, for a 1024-point FFT, the proposed architecture offers savings of 10.26% in area, 20.83% in power, 16.98% in area-delay product, 26.76% in energy per sample, 7.79% in slice look-up tables and 11.93% in flip-flops over the best existing design.
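Real-valued FFT architectures save work because of a standard property: for real input x[n], the DFT is conjugate-symmetric, X[N−k] = X[k]*, so only N/2 + 1 output bins are independent. The direct O(N²) DFT below is for illustration only (it is not a pipelined FFT or the paper's data mapping) and checks that property numerically.

```python
import cmath

def dft(x):
    """Direct O(N^2) DFT: X[k] = sum_n x[n] * exp(-j*2*pi*k*n/N)."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_pts) for n in range(n_pts))
            for k in range(n_pts)]

def independent_bins(x_real):
    """For real input, bins 0..N/2 carry all the information; the rest are conjugates."""
    spectrum = dft(x_real)
    return spectrum[: len(x_real) // 2 + 1]
```

For N = 1024, this means a real-valued design only has to produce 513 distinct bins, with X[0] and X[N/2] purely real.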
- Author(s): Sanjay Moulik ; Rajesh Devaraj ; Arnab Sarkar
- Source: IET Computers & Digital Techniques, Volume 13, Issue 6, pp. 470–480
- DOI: 10.1049/iet-cdt.2019.0023
- Type: Article
Devising energy-efficient scheduling strategies for real-time periodic tasks on heterogeneous platforms is both challenging and computationally demanding. This study proposes HEALERS, a low-overhead heuristic strategy for energy-aware scheduling, enabled by dynamic voltage and frequency scaling (DVFS) and dynamic power management (DPM), of a set of periodic tasks executing on a heterogeneous multi-core system. The strategy first applies deadline partitioning to obtain a set of distinct time slices. At each time-slice boundary, a three-phase procedure produces the schedule for the next time slice. First, it computes the fragments of the execution demands of all tasks on each of the different processing cores in the platform. Next, it schedules each task on one or more processing cores such that the total execution demand of all tasks is satisfied. Finally, HEALERS applies DVFS and DPM to all processing cores so that energy consumption within the time slice is minimised without jeopardising the execution requirements of the scheduled tasks. Experimental results show that the proposed scheme not only achieves appreciable energy savings over the state of the art (5–42% on average) but also significantly improves resource utilisation (by as much as 58%).
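Deadline partitioning can be sketched as follows, assuming implicit-deadline periodic tasks given as (WCET, period) pairs: the slice boundaries are the union of all task deadlines up to the hyperperiod. The per-slice demand shown here is a proportional fluid share, C_i/P_i × slice length, as used in DP-Fair-style schedulers. That share is an assumption for illustration; HEALERS' per-core fragment computation on heterogeneous cores is more involved.

```python
from fractions import Fraction
from math import lcm

def slice_boundaries(periods):
    """Distinct deadlines of implicit-deadline periodic tasks over one hyperperiod."""
    hyper = lcm(*periods)
    points = {k * p for p in periods for k in range(hyper // p + 1)}
    return sorted(points)

def slice_demands(tasks):
    """Fluid execution demand of each task in each time slice.

    tasks: list of (wcet, period) pairs.
    Returns a list of (start, end, [demand of each task in this slice]).
    """
    bounds = slice_boundaries([p for _, p in tasks])
    result = []
    for start, end in zip(bounds, bounds[1:]):
        length = end - start
        # Assumed proportional share: utilisation of the task times the slice length.
        result.append((start, end, [Fraction(c, p) * length for c, p in tasks]))
    return result
```

For tasks (1, 4) and (2, 6), the slices are [0, 4), [4, 6), [6, 8) and [8, 12), and the per-slice demands add up to exactly 3 and 4 units over the hyperperiod, so every job completes by its deadline if each slice's demand is met.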
- Author(s): Khushboo Rani and Hemangee K. Kapoor
- Source: IET Computers & Digital Techniques, Volume 13, Issue 6, pp. 481–492
- DOI: 10.1049/iet-cdt.2019.0039
- Type: Article
With advances in CMOS technology and multiple processors on a chip, communication among these cores is managed by a network-on-chip (NoC), whose power and performance have become significant factors. The authors aim to reduce the leakage power of NoC buffers by using non-volatile spin-transfer torque random access memory (STT-RAM)-based buffers. STT-RAM offers high density and low leakage but suffers from low endurance. This low endurance affects the lifetime of the router as a whole because of the unwanted write variation caused by virtual channel (VC) allocation policies. Here, various VC allocation policies that distribute writes uniformly across the buffers are proposed. Iso-capacity and iso-area alternatives for replacing SRAM buffers with STT-RAM buffers are also presented. Pure STT-RAM buffers, however, increase network latency. To mitigate this, a hybrid variant of the proposed policies is proposed that uses alternative VCs built from SRAM under heavy network traffic. Full-system simulation shows that the proposed policies reduce write variation by 99% and improve lifetime by 3.2 times and 1093 times, respectively. An energy-delay product gain of 55.5% is also obtained.
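Write-variation-aware VC allocation can be illustrated by comparing a baseline that always grants the lowest-numbered free VC with a policy that grants the free VC that has received the fewest writes. Both policies, the per-VC write counters and the one-write-per-allocation simplification are a hypothetical sketch, not the specific policies proposed in the paper.

```python
def lowest_index_policy(free_vcs, writes):
    """Baseline: always pick the lowest-numbered free VC."""
    return min(free_vcs)

def least_written_policy(free_vcs, writes):
    """Wear-levelling: pick the free VC with the fewest writes (ties -> lowest index)."""
    return min(free_vcs, key=lambda vc: (writes[vc], vc))

def simulate(policy, num_vcs, requests):
    """requests: iterable of sets of free VCs, one per allocation; returns per-VC write counts."""
    writes = [0] * num_vcs
    for free_vcs in requests:
        vc = policy(free_vcs, writes)
        writes[vc] += 1   # simplification: each allocation counts as one write to that VC buffer
    return writes

def write_variation(writes):
    """Spread between the most- and least-written VC buffers."""
    return max(writes) - min(writes)
```

With four always-free VCs and 100 allocations, the baseline puts all 100 writes on VC 0, while the least-written policy spreads them 25/25/25/25. Because the lifetime of an STT-RAM buffer is bounded by its most-written cells, levelling writes in this way directly extends router lifetime.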
- Author(s): Sumana Ghosh ; Soumyajit Dey ; Pallab Dasgupta
- Source: IET Computers & Digital Techniques, Volume 13, Issue 6, pp. 493–504
- DOI: 10.1049/iet-cdt.2019.0030
- Type: Article
Embedded control systems are prevalent in many domains such as automotive, avionics and industrial control. For such systems, robustness against non-idealities of the compute platform, such as hardware-level transient faults (memory errors, sensor reading errors), network packet drops and late message arrivals, must be ensured at the design level. In traditional regular periodic execution of control loops, such guarantees are obtained by oversampling the plant, which requires extra rounds of sensing, control law computation and actuation. One way to reduce this over-provisioning of computing and communication resources is to allow occasional drops in execution, leading to better resource management and energy efficiency. This work presents a methodology for deriving window-based bounds on possible drops in control loop executions while assuring formal performance guarantees even in the presence of platform-level non-idealities. Deriving such relaxation bounds on control loop executions helps in choosing energy-efficient scheduling solutions for low-power, resource-constrained embedded control platforms while retaining control performance. The present work provides a structured methodology for deriving performance-aware control execution patterns given an energy budget and an uncertainty specification of the platform executing a set of embedded control loops.
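One common way to express a window-based bound on dropped executions is a constraint of the form "at most d drops in any window of w consecutive control-loop instances". The checker below is a minimal sketch of such a constraint. The parameter names and the exact form of the bound are assumptions, since the abstract does not state the paper's formulation.

```python
def satisfies_window_bound(pattern, window, max_drops):
    """Check a drop pattern against a sliding-window bound.

    pattern: sequence of 1 (executed) / 0 (dropped) control-loop instances.
    Returns True if every `window` consecutive instances contain at most `max_drops` drops.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    drops = [1 - p for p in pattern]
    if len(drops) < window:
        return sum(drops) <= max_drops
    current = sum(drops[:window])
    if current > max_drops:
        return False
    for i in range(window, len(drops)):
        current += drops[i] - drops[i - window]   # slide the window by one instance
        if current > max_drops:
            return False
    return True
```

For instance, the pattern 1,0,1,1,1,0,1,1 satisfies "at most 1 drop in any 4 instances", whereas 1,0,0,1,1,1 violates it because of the two consecutive drops; a scheduler could use such a check to decide which loop instances it may skip to save energy.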
- Author(s): Pramod Kumar Bharti ; Neelam Surana ; Joycee Mekie
- Source: IET Computers & Digital Techniques, Volume 13, Issue 6, pp. 505–513
- DOI: 10.1049/iet-cdt.2019.0019
- Type: Article
The widespread availability of high-speed Internet and the rapid growth in smartphone users have significantly increased online video viewing. Video decoders such as H.264/H.265/MPEG consume a significant amount of power in static random access memory (SRAM) buffers. In this study, the authors propose 1 kb (32 × 32) heterogeneous 8T SRAM architectures, with and without truncation of the two lower-order bits, for an H.264 video decoder. Heterogeneous SRAM cell sizing and bit truncation are used together to obtain a low-power memory design for the H.264 video decoder. The authors show that the proposed approximate memory provides high video quality for the H.264 decoder even at a low power and area budget of 0.3 µW/pixel and 5.2 µm²/pixel, respectively, at 0.5 V and 20 MHz in UMC 28 nm CMOS technology. The proposed memory architecture is compared with existing approximate memories, namely heterogeneous 6T, hybrid 8T/6T, all-identical 6T and all-identical 8T SRAM. The results show that the proposed memory architectures perform better overall than existing techniques in terms of dynamic power, leakage power and area.
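The quality cost of dropping the two lower-order bits can be estimated directly. Assuming 8-bit pixel values uniformly distributed over 0–255 (an idealisation; real video statistics differ), truncation introduces an error of 0 to 3 with a mean squared error of 3.5, i.e. a PSNR of about 42.7 dB. This bounds only the truncation error and ignores read failures in low-voltage SRAM cells; the helper names are hypothetical.

```python
import math

def truncate_lsbs(pixel: int, bits: int = 2) -> int:
    """Zero the `bits` lower-order bits of an 8-bit pixel, as if they were not stored."""
    return pixel & ~((1 << bits) - 1) & 0xFF

def psnr(original, approx, peak: int = 255) -> float:
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    mse = sum((a - b) ** 2 for a, b in zip(original, approx)) / len(original)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)
```

Usage: `psnr(list(range(256)), [truncate_lsbs(p) for p in range(256)])` evaluates the uniform case described above.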
- Author(s): Somdip Dey ; Amit Kumar Singh ; Klaus Dieter McDonald-Maier
- Source: IET Computers & Digital Techniques, Volume 13, Issue 6, pp. 514–523
- DOI: 10.1049/iet-cdt.2019.0037
- Type: Article
Thermal cycling, as well as spatial and thermal gradients, affects the lifetime reliability and performance of heterogeneous multi-processor systems-on-chip (MPSoCs). Conventional temperature management techniques are not intelligent enough to balance performance, energy efficiency and the operating temperature of the system. In this study, the authors propose P-EdgeCoolingMode, a novel lightweight thermal management mechanism in the form of an intelligent software agent that monitors and regulates the operating temperature of the CPU cores to improve system reliability while meeting performance requirements. P-EdgeCoolingMode proactively monitors performance and takes the necessary action based on the user's demand, making the methodology well suited to existing as well as conceptual edge devices that use heterogeneous MPSoCs with dynamic voltage and frequency scaling (DVFS) capabilities. The authors validated the methodology on the Odroid-XU4 MPSoC and the Huawei P20 Lite (HiSilicon Kirin 659 MPSoC). For the chosen test cases, P-EdgeCoolingMode reduces the operating temperature while improving performance and reducing power consumption compared with the state of the art. For applications with demanding performance requirements, P-EdgeCoolingMode reduces power consumption by up to 30.62% compared with existing state-of-the-art power management methodologies.
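The general shape of an agent that trades temperature against performance through DVFS can be sketched as a simple control loop with hysteresis. This is not the P-EdgeCoolingMode algorithm: the frequency table, the temperature thresholds and the performance demand expressed as a minimum frequency are hypothetical placeholders.

```python
from dataclasses import dataclass

@dataclass
class ThermalAgent:
    """Hypothetical DVFS/thermal control loop; not the P-EdgeCoolingMode algorithm."""
    freqs_mhz: list[int]      # available DVFS levels, ascending (hypothetical values)
    t_high_c: float = 80.0    # step down above this temperature (hypothetical threshold)
    t_low_c: float = 70.0     # may step up only below this temperature (hypothetical)
    level: int = 0            # current index into freqs_mhz

    def step(self, temp_c: float, required_mhz: int) -> int:
        """One monitoring period: update the DVFS level and return the frequency to apply."""
        top = len(self.freqs_mhz) - 1
        if temp_c > self.t_high_c and self.level > 0:
            self.level -= 1                                  # too hot: lower V/f
        elif (temp_c < self.t_low_c and self.level < top
              and self.freqs_mhz[self.level] < required_mhz):
            self.level += 1                                  # cool and too slow: raise V/f
        elif self.level > 0 and self.freqs_mhz[self.level - 1] >= required_mhz:
            self.level -= 1                                  # demand met one level lower: save power
        return self.freqs_mhz[self.level]
```

The gap between `t_low_c` and `t_high_c` keeps the agent from oscillating between levels, which also limits the thermal cycling the abstract identifies as a reliability concern.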
- Author(s): Spandana Rachamalla ; Shashidhar Reddy ; Arun Joseph
- Source: IET Computers & Digital Techniques, Volume 13, Issue 6, pp. 524–531
- DOI: 10.1049/iet-cdt.2019.0031
- Type: Article
In FinFET-based microprocessors, where dynamic power dominates, the chip power profile is significantly heterogeneous owing to various factors. The workloads the microprocessors run are inherently heterogeneous in their switching characteristics, and power varies across the chip, even within an IP block for a given workload. The aim is to provide a comprehensive industry perspective on analysing and mitigating the problems and challenges posed by this heterogeneity in dynamic power signatures in the design of next-generation 14 nm FinFET-based microprocessors such as IBM POWER9. This broader view of the design principles of a modern microprocessor such as POWER9 is useful for adopting the presented techniques and should further fuel related research on dynamic power management of FinFET-based microprocessor designs. Additional focus is placed on clock-gating techniques and other use cases. New approaches to heterogeneity-aware, per-clock-gating-domain parameterised power abstractions that enable rapid hierarchical chip power analysis are presented.
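Per-clock-gating-domain power abstractions build on the standard dynamic power relation P = αCV²f. The sketch below is a minimal illustration, not the paper's model. It assumes each domain is summarised by a clock-network capacitance, a data-path capacitance, a workload-dependent switching activity and the fraction of cycles its clock is enabled (all hypothetical parameters), and it sums the per-domain contributions so that workload heterogeneity enters only through those parameters.

```python
from dataclasses import dataclass

@dataclass
class ClockGatingDomain:
    """Hypothetical per-domain parameters for a dynamic power abstraction."""
    name: str
    c_clock_f: float      # clock-network capacitance switched every enabled cycle (F)
    c_data_f: float       # data-path capacitance (F)
    activity: float       # data switching activity per enabled cycle, 0..1 (workload dependent)
    enable_frac: float    # fraction of cycles the domain's clock is not gated, 0..1

    def power_w(self, vdd: float, freq_hz: float) -> float:
        """Dynamic power: enable_frac * (C_clk + activity * C_data) * Vdd^2 * f."""
        c_eff = self.c_clock_f + self.activity * self.c_data_f
        return self.enable_frac * c_eff * vdd ** 2 * freq_hz

def chip_dynamic_power(domains, vdd: float, freq_hz: float) -> float:
    """Hierarchical roll-up: chip (or unit) power is the sum over clock-gating domains."""
    return sum(d.power_w(vdd, freq_hz) for d in domains)
```

For example, a domain with 20 pF of clock capacitance, 100 pF of data-path capacitance, activity 0.2 and its clock enabled half the time dissipates 0.5 × 40 pF × (0.8 V)² × 4 GHz ≈ 51.2 mW, while a fully gated domain contributes nothing.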
- Author(s): Vitor Ferreira Torres and Frank Sill Torres
- Source: IET Computers & Digital Techniques, Volume 13, Issue 6, pp. 532–542
- DOI: 10.1049/iet-cdt.2019.0036
- Type: Article
As machine learning applications increase the demand for optimised implementations on both embedded and high-end processing platforms, industry and the research community have responded with different approaches to implementing these solutions. This work presents approximations of arithmetic operations and mathematical functions that, combined with a customised adaptive training method for artificial neural networks based on RMSProp, provide reliable and efficient classifier implementations. The proposed solution does not rely on mixed operations with higher precision or on the complex rounding methods that are commonly applied. The intention of this work is not to find the optimal simplifications for specific deep learning problems, but to present an optimised framework that can be used as reliably as one implemented with precise operations, standard training algorithms and the same network structures and hyper-parameters. By simplifying the 'half-precision' floating-point format and approximating the exponentiation and square root operations, the authors' work drastically reduces field-programmable gate array (FPGA) implementation complexity (e.g. reductions of 43% and 57% in two of the component resources). The reciprocal square root approximation is simple enough to be implemented with combinational logic alone. In a full software implementation for a mixed-precision platform, only two of the approximations compensate for the processing overhead of precision conversions.
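A well-known reciprocal square root approximation that needs only a shift and an integer subtraction (and is therefore combinational in hardware) is the bit-level trick popularised by Quake III Arena. It is shown here for IEEE 754 single precision, inside an RMSProp update where 1/√v appears. It illustrates the class of approximation the abstract describes; the paper's own approximations target a simplified half-precision format and are not reproduced here, and the function names and default hyper-parameters are assumptions.

```python
import struct

MAGIC = 0x5F3759DF  # classic single-precision constant for the initial estimate

def approx_rsqrt(x: float, newton_steps: int = 0) -> float:
    """Approximate 1/sqrt(x) for x > 0 using the float32 bit-level trick."""
    i = struct.unpack("<I", struct.pack("<f", x))[0]  # reinterpret float32 bits as uint32
    i = MAGIC - (i >> 1)                               # shift + subtract: initial estimate
    y = struct.unpack("<f", struct.pack("<I", i))[0]
    for _ in range(newton_steps):
        y = y * (1.5 - 0.5 * x * y * y)                # optional Newton-Raphson refinement
    return y

def rmsprop_step(w, g, v, lr=1e-3, rho=0.9, eps=1e-8, newton_steps=0):
    """One RMSProp update per parameter, using the approximate reciprocal square root."""
    new_w, new_v = [], []
    for wi, gi, vi in zip(w, g, v):
        vi = rho * vi + (1 - rho) * gi * gi           # running mean of squared gradients
        new_v.append(vi)
        new_w.append(wi - lr * gi * approx_rsqrt(vi + eps, newton_steps))
    return new_w, new_v
```

Without refinement the estimate is within a few per cent of 1/√x; one Newton step brings it to roughly 0.2%, which is often acceptable for the step-size normalisation in RMSProp.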
Guest Editorial: Energy-efficient Computing for Embedded and IoT Devices
ECAP: energy-efficient caching for prefetch blocks in tiled chip multiprocessors
Scheduling of dual supercapacitor for longer battery lifetime in safety-critical embedded systems with power gating
Energy-efficient and reliable in-memory classifier for machine-learning applications
Ultra-low power digital front-end for single lead ECG acquisition integrated with a time-to-digital converter
Energy efficient VLSI architecture of real-valued serial pipelined FFT
HEALERS: a heterogeneous energy-aware low-overhead real-time scheduler
Write-variation aware alternatives to replace SRAM buffers with non-volatile buffers in on-chip interconnects
Performance and energy aware robust specification of control execution patterns under dropped samples
Hetro8T: power and area efficient approximate heterogeneous 8T SRAM for H.264 video decoder
P-EdgeCoolingMode: an agent-based performance aware thermal management unit for DVFS enabled heterogeneous MPSoCs
Heterogeneity aware power abstractions for dynamic power dominated FinFET-based microprocessors
Resilient training of neural network classifiers with approximate computing techniques for hardware-optimised implementations