IET Computers & Digital Techniques
Volume 10, Issue 4, July 2016
- Author(s): Liu Han ; Hao Zhang ; Seok-Bum Ko
- Source: IET Computers & Digital Techniques, Volume 10, Issue 4, p. 147–156
- DOI: 10.1049/iet-cdt.2015.0058
- Type: Article
Decimal floating-point (DFP) arithmetic has attracted attention in financial and commercial computing applications. However, the processing efficiency of DFP still lags far behind that of binary designs. Meanwhile, the floating-point fused multiply-add (FMA) function is widely used in many processors within functional iterations to implement division, square root, and many other functions, owing to the better accuracy achieved by rounding only once after a consecutive multiplication and addition. In this work, a new FMA architecture is proposed to speed up DFP processing. Compared with previous architectures, first, the proposed design applies a specific decimal redundant encoding system, which simplifies the circuits that decide and shift the rounding position on a redundant result. Second, the only digit-set conversion in the entire design is combined with the rounding operation to further reduce the critical path. Third, the techniques applied in different previous FMAs are merged in the proposed design. In addition, the multiplier and adder adopted from the previous designs are further optimised. Consequently, compared with the fastest previous design, the synthesis results show a speed advantage of about 33.7% and an area advantage of about 16.6%.
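As a software illustration of why a single rounding helps (this is not the hardware architecture of the paper), the sketch below uses Python's standard `decimal` module at an artificially small precision of 4 digits. The operand values are arbitrary and chosen only to make the difference visible.

```python
from decimal import Decimal, localcontext

# Arbitrary illustrative operands (not taken from the paper).
a = Decimal("1.234")
b = Decimal("5.678")
c = Decimal("-7.006")

with localcontext() as ctx:
    ctx.prec = 4  # tiny precision so the rounding effect is easy to see

    # Two roundings: the product is rounded to 4 digits, then the sum is rounded.
    separate = a * b + c

    # One rounding: Decimal.fma computes a*b + c with an exact intermediate product.
    fused = a.fma(b, c)

print(separate)  # rounded product 7.007 plus -7.006
print(fused)     # exact product 7.006652 plus -7.006, rounded once
```

The exact product is 7.006652, so rounding it first discards the digits that dominate the final result after cancellation.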
- Author(s): Isuru Nawinne ; Haris Javaid ; Roshan Ragel ; Sri Parameswaran
- Source: IET Computers & Digital Techniques, Volume 10, Issue 4, p. 157–164
- DOI: 10.1049/iet-cdt.2015.0114
- Type: Article
Caches are used to improve memory access time and energy consumption. The cache configuration which enables the best performance often differs between applications due to diverse memory access patterns. The authors present a new concept, called switchable cache, where multiple cache configurations exist on chip, leveraging the abundant transistors available due to what is known as the dark silicon phenomenon. Only one cache configuration is active at any given time based on the application under execution, while all other configurations remain inactive (dark). They describe an architecture to enable seamless integration of multiple cache configurations, and a novel design space exploration methodology to rapidly pre-determine the optimal set of configurations at design-time, for a given group of applications. For design spaces containing trillions of design points, the authors’ exploration methodology always found the optimal solution in less than 2 s. The switchable cache improved memory access time by up to 26.2% when compared to a fixed cache.
- Author(s): Sima Afsharpour ; Ahmad Patooghy ; Mahdi Fazeli
- Source: IET Computers & Digital Techniques, Volume 10, Issue 4, p. 165–173
- DOI: 10.1049/iet-cdt.2015.0131
- Type: Article
This study proposes an efficient task migration algorithm for mesh-based multi- and many-core chips. The proposed algorithm collects the tasks running on a rectangular set of cores, that is, the source sub-mesh, and moves them to another rectangular set of cores in order to remove chip temperature hotspots and balance the load on the chip. The algorithm uses the concept of gathering/scattering to minimise the traffic induced by the migration. To this end, it uses a selected node in each row of the source sub-mesh to gather the tasks of all cores in that row. The gathering node is selected based on its location in the row and the traffic rate of the other cores in the row. Once the gathering nodes have been migrated to the destination sub-mesh, they scatter their tasks among the cores in their rows according to the same pattern. The proposed migration algorithm is simulated with the Access Noxim simulator under a range of network conditions with application graphs of D263DECMP3DEC, DMPEG4, and DVOPD. The simulation results show that the proposed algorithm offers 36% better performance, 28% lower energy consumption, and 7% lower temperature in comparison with previously proposed migration algorithms.
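One possible reading of the gathering-node selection is sketched below. The abstract does not give the actual selection rule, so the cost model here (traffic rate weighted by hop distance along the row) and the function name `pick_gathering_node` are assumptions made purely for illustration.

```python
def pick_gathering_node(traffic):
    """Return the column index of the gathering node for one mesh row.

    traffic[i] is a hypothetical traffic rate for the core in column i.
    Assumed cost model: each core sends its tasks to the gathering node,
    costing (traffic rate) x (hop distance along the row).
    """
    def cost(candidate):
        return sum(rate * abs(candidate - col) for col, rate in enumerate(traffic))

    # min() returns the leftmost column when several candidates tie.
    return min(range(len(traffic)), key=cost)

# A heavily loaded core at the end of the row pulls the gathering node towards it.
print(pick_gathering_node([1, 1, 1, 9]))
# With uniform traffic, a central column is chosen.
print(pick_gathering_node([1, 1, 1, 1]))
```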
- Author(s): Qutaiba Ibrahim
- Source: IET Computers & Digital Techniques, Volume 10, Issue 4, p. 174–185
- DOI: 10.1049/iet-cdt.2015.0135
- Type: Article
In this study, a green vehicular ad-hoc network (VANET) infrastructure is suggested. The main players in such an infrastructure are the road side units (RSUs), which are able to harvest the energy they need from the surrounding environment, especially solar energy. This allows the RSUs to be installed anywhere, regardless of power supply availability, so that the VANET infrastructure covers an extensive area with improved performance. To achieve this goal, a new distributed power management scheme called duty cycle estimation-event driven duty cycling is suggested and installed locally in the RSUs in order to decrease their power consumption and extend the lifetime of their batteries. The embedded UBICOM IP2022 network processor platform is adopted to implement the proposed RSU, and the detailed design steps are described, while the values of the system components, such as the number of solar cell panels and the battery cell capacity, are tuned to suit the design goals. The suggested method is compared with other duty cycling methods to show its effectiveness in building a green VANET infrastructure.
- Author(s): Viacheslav Borisovich Marakhovsky and Alexey Vadimovich Surkov
- Source: IET Computers & Digital Techniques, Volume 10, Issue 4, p. 186–192
- DOI: 10.1049/iet-cdt.2015.0130
- Type: Article
The problem of organising the temporal behaviour of globally asynchronous systems consisting of parallel interacting blocks is discussed. System blocks are represented by the Moore state machine model. The earlier suggested GALA (Globally Asynchronous, Locally Arbitrary) design methodology is used. This methodology is based on decomposing the system into a Processors Stratum (stratum of blocks) and a Synchronisation Stratum (synch-stratum). The synch-stratum acts as a distributed asynchronous clock network that produces local synch-signals for the processor stratum, which basically can be a synchronous prototype. The synch-stratum is a self-timed circuit that interacts with the processor stratum (system devices) via the handshake protocol. Every local device that has received the request signal from the synch-stratum produces the acknowledgment signal and sends it back. In this study, some logic circuits of universal modules are suggested. They provide an easy way to design any synch-stratum for parallel synchronisation of system blocks with arbitrary interconnection graphs and for wave synchronisation of system blocks with acyclic interconnection graphs.
- Author(s): Liang Geng ; Jizhong Shen ; Congyuan Xu
- Source: IET Computers & Digital Techniques, Volume 10, Issue 4, p. 193–201
- DOI: 10.1049/iet-cdt.2015.0139
- Type: Article
In this study, a novel power-efficient implicit pulse-triggered flip-flop with an embedded clock-gating and pull-up control scheme (IPFF-CGPC) is proposed. By applying an XOR-based clock-gating scheme in the pulse generating stage, which conditionally disables the inverter chain when the input remains unchanged, IPFF-CGPC reduces power consumption by eliminating redundant transitions of internal nodes. Meanwhile, a pull-up control scheme is applied to enhance the discharging path and save short-circuit power when D makes a ‘0’-to-‘1’ transition. To further improve the robustness of the proposed design, the XOR-based comparator in the clock-gating scheme is replaced by a transmission gate-based comparator, which results in an enhanced version (IPFF-ECGPC). Based on the SMIC 65 nm technology, extensive post-layout simulation results show that IPFF-CGPC exhibits excellent power characteristics, with a reduction of 32.06–85.89% against its rival designs at 10% data switching activity. Owing to this power efficiency, its power-delay product (PDP) improves by up to 73.94% under the same condition. Moreover, IPFF-ECGPC also achieves outstanding total-power and PDP efficiency at 10% data switching activity. Therefore, the proposed designs are suitable for power-constrained, speed-insensitive applications in very-large-scale integration designs.
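The effect of XOR-based clock gating can be illustrated with a purely behavioural model. This is not the circuit from the paper: the function `count_pulses` and the input sequence are hypothetical, and the model only counts how often the pulse generator would fire.

```python
def count_pulses(data, gated):
    """Count internal pulses generated over one clock cycle per input bit.

    Behavioural assumption: an ungated flip-flop generates a pulse every
    cycle, while a gated one generates a pulse only when D differs from
    the stored value Q (the XOR comparison).
    """
    q = 0        # stored value, assumed to start at 0
    pulses = 0
    for d in data:
        if not gated or (d ^ q):
            pulses += 1
            q = d  # the pulse captures D into Q
    return pulses

# Hypothetical input with two transitions in ten cycles.
data = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0]

print(count_pulses(data, gated=False))  # one pulse per cycle
print(count_pulses(data, gated=True))   # one pulse per change of D
```

With low data switching activity most cycles see no change in D, which is where the gated design avoids redundant internal transitions.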
Decimal floating-point fused multiply-add with redundant internal encodings
Switchable cache: utilising dark silicon for application specific cache optimisations
Performance/energy aware task migration algorithm for many-core chips
Enhanced power management scheme for embedded road side units
Globally asynchronous systems of interactive Moore state machines
Design of flip-flops with clock-gating and pull-up control scheme for power-constrained and speed-insensitive applications
Most cited content for this Journal
-
High-performance elliptic curve cryptography processor over NIST prime fields
- Author(s): Md Selim Hossain ; Yinan Kong ; Ehsan Saeedi ; Niras C. Vayalil
- Type: Article
-
Majority-based evolution state assignment algorithm for area and power optimisation of sequential circuits
- Author(s): Aiman H. El-Maleh
- Type: Article
-
Scalable GF(p) Montgomery multiplier based on a digit–digit computation approach
- Author(s): M. Morales-Sandoval and A. Diaz-Perez
- Type: Article
-
Fabrication and characterisation of Al gate n-metal–oxide–semiconductor field-effect transistor, on-chip fabricated with silicon nitride ion-sensitive field-effect transistor
- Author(s): Rekha Chaudhary ; Amit Sharma ; Soumendu Sinha ; Jyoti Yadav ; Rishi Sharma ; Ravindra Mukhiya ; Vinod K. Khanna
- Type: Article
-
Adaptively weighted round-robin arbitration for equality of service in a many-core network-on-chip
- Author(s): Hanmin Park and Kiyoung Choi
- Type: Article