Online ISSN
1751-861X
Print ISSN
1751-8601
IET Computers & Digital Techniques
Volume 2, Issue 2, March 2008
- Author(s): X. Zhou and P. Petrov
- Source: IET Computers & Digital Techniques, Volume 2, Issue 2, p. 75–85
- DOI: 10.1049/iet-cdt:20070090
- Type: Article
An arithmetic-based address translation technique is presented for low-power and real-time embedded processors with virtual memory support. General-purpose virtual memory support suffers from excessive power consumption and nondeterministic execution times, which are the main reasons virtual memory is not adopted in energy-efficient and real-time embedded systems. To address these issues, an application-driven address translation is proposed, in which most of the address translations traditionally performed as translation lookaside buffer (TLB) lookups are replaced with fast and energy-efficient addition operations. To achieve this, program and system-wide information is used to identify sequences of consecutive virtual page numbers that are mapped to sequences of consecutive physical page frames. For such pairs of page sequences, only the addition of a constant to the virtual page number is needed to produce the physical page frame. The proposed methodology relies on the combined efforts of compiler, operating system and hardware architecture to achieve a significant power reduction. As the approach fundamentally eliminates conflicts inherent in the hardware translation table, execution time is not only improved but also made predictable for a large number of memory reference instructions. Experiments show power reductions in the range of 80–95% compared to a general-purpose TLB.
- Author(s): X. Yang ; Y.Y. Tang ; J. Cao
- Source: IET Computers & Digital Techniques, Volume 2, Issue 2, p. 86–93
- DOI: 10.1049/iet-cdt:20050219
- Type: Article
A number of parallel algorithms admit a static torus-structured task graph. Hexagonal honeycomb torus (HHT) networks are regarded as promising candidates for interconnection networks. In order to efficiently execute a torus-structured parallel algorithm on an HHT, it is essential to map the tasks to processors so that the communication overhead is minimised. The study proves that a (3n, 2n) torus can be embedded into an nth-order HHT with dilation 3, congestion 4, expansion 1 and load factor 1. Consequently, a parallel algorithm with a (3n, 2n) torus task graph can be executed on an nth-order HHT efficiently.
- Author(s): S. del Pino ; D. Chaver ; L. Pinuel ; M. Prieto ; F. Tirado
- Source: IET Computers & Digital Techniques, Volume 2, Issue 2, p. 94–107
- DOI: 10.1049/iet-cdt:20060179
- Type: Article
A highly efficient fetch unit is essential not only to obtain good performance but also to achieve energy efficiency. However, existing commercial fetch designs are not adaptable and, depending on the program behaviour, can be either insufficient or overkill. A phase-based adaptive fetch mechanism that can be dynamically adjusted based on feedback information about the program behaviour is introduced. This design adds very little hardware complexity and relegates complex tasks to the software components. It is also very effective, saving 35% and 52% of fetch energy on average compared with a conventional and a trace cache-based fetch unit, respectively. At the same time, performance is improved by 4.7% and 0.8%, respectively.
- Author(s): A. Al-Yamani ; N. Devta-Prasanna ; A. Gunda
- Source: IET Computers & Digital Techniques, Volume 2, Issue 2, p. 108–117
- DOI: 10.1049/iet-cdt:20070037
- Type: Article
An analysis of the trade-off between hardware overhead, runtime and test data volume is presented for systematic scan reconfiguration implemented using centralised and distributed architectures of the segmented addressable scan, which is an Illinois-scan-based architecture. The results show that the centralised scheme offers better data volume compression, similar automatic test pattern generation (ATPG) runtime results and lower hardware overhead. The cost of the centralised scheme is routing congestion.
- Author(s): S.P. Mohanty ; E. Kougianos ; D.K. Pradhan
- Source: IET Computers & Digital Techniques, Volume 2, Issue 2, p. 118–131
- DOI: 10.1049/iet-cdt:20070108
- Type: Article
The authors present two polynomial time-complexity heuristic algorithms for optimisation of gate-oxide leakage (tunnelling current) during behavioural synthesis through simultaneous scheduling and binding. One algorithm considers the time constraint explicitly and the other considers it implicitly, whereas both account for resource constraints. The algorithms selectively bind the off-critical operations to instances of the pre-characterised resources consisting of transistors of higher oxide thickness, and critical operations to the resources of lower oxide thickness, for power and performance optimisation. The authors design and characterise functional and storage units of different gate-oxide thicknesses and build a data path library. Extensive experiments on several behavioural synthesis benchmarks for 45 nm complementary metal-oxide-semiconductor technology show that a reduction as high as 85% can be obtained.
- Author(s): H.-T. Lin and J.C.-M. Li
- Source: IET Computers & Digital Techniques, Volume 2, Issue 2, p. 132–141
- DOI: 10.1049/iet-cdt:20070088
- Type: Article
An automatic test pattern generation (ATPG) technique, which simultaneously reduces capture and shift power during scan testing, is presented. This ATPG performs power reduction during dynamic test compaction so the test length overhead is very small. This low-power test generator implements several novel techniques, such as parity backtrace, confined propagation, dynamic controllability and post-fill test regeneration. The experimental data on ISCAS benchmark circuits show that the peak capture power and the peak shift power are reduced by 31% and 26%, respectively.
- Author(s): J.W. Kwak and C.S. Jhon
- Source: IET Computers & Digital Techniques, Volume 2, Issue 2, p. 142–154
- DOI: 10.1049/iet-cdt:20060130
- Type: Article
To achieve higher performance in embedded systems, recent embedded microprocessor cores have gradually adopted the technologies of general high-performance microprocessor cores. For branch prediction, embedded microprocessor cores have usually used simple bimodal branch predictors. That is, until now, most branch predictors in embedded processor cores have utilised the address of the branch instruction (program counter, PC), and recently some predictors in advanced embedded cores have used dynamic branch prediction with global branch history (GBH). The authors suggest branch direction history (BDH) as a new component of the input vector for branch prediction. Additionally, a new embedded branch predictor that utilises BDH information, called the direction–gshare predictor, is proposed as an implementation example. In the simulations, a neural network with three branch prediction input vectors (PC, GBH and BDH) is modelled and their actual impact on branch prediction accuracy is analysed. Then the new embedded branch predictor, the direction–gshare predictor, is simulated. The simulation results show that aliasing in the pattern history table is reduced by 48.9% on average through the additional use of BDH information. Moreover, the direction–gshare predictor outperforms previous embedded branch predictors, such as the bimodal predictor, the two-level adaptive predictor and the gshare predictor, by up to 15.32%, 5.41% and 5.74%, respectively.
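For readers unfamiliar with the baseline named in this abstract, the following is a minimal sketch of a conventional gshare predictor, in which the PC is XORed with the global branch history to index a pattern history table of 2-bit saturating counters. It does not model the authors' direction–gshare predictor: the abstract does not say how BDH is combined with the other inputs, so that part is left out. The class name, table size, branch address and synthetic trace are all assumptions made for illustration.

```python
class GsharePredictor:
    """Baseline gshare sketch: 2-bit counters indexed by PC XOR global history."""

    def __init__(self, index_bits=10):
        self.mask = (1 << index_bits) - 1
        self.history = 0                      # global branch history (GBH) register
        self.table = [1] * (1 << index_bits)  # pattern history table, weakly not-taken

    def _index(self, pc):
        # Fold the branch address and the recent outcomes into one table index.
        return (pc ^ self.history) & self.mask

    def predict(self, pc):
        # Counter values 2 and 3 mean "predict taken".
        return self.table[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)
        # Shift the actual outcome into the history register.
        self.history = ((self.history << 1) | int(taken)) & self.mask


def accuracy(predictor, trace):
    """Fraction of correct predictions over a list of (pc, taken) pairs."""
    correct = 0
    for pc, taken in trace:
        if predictor.predict(pc) == taken:
            correct += 1
        predictor.update(pc, taken)
    return correct / len(trace)


# Hypothetical trace: one branch at a made-up address that alternates
# between taken and not taken.
trace = [(0x400, i % 2 == 0) for i in range(200)]
gshare_accuracy = accuracy(GsharePredictor(index_bits=4), trace)
```

On this alternating trace the history register separates the two outcomes into different table entries, so the predictor settles after a short warm-up. Aliasing, which the abstract reports BDH reduces, arises when unrelated branches map to the same table entry.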
Low-power and real-time address translation through arithmetic operations for virtual memory support in embedded systems
Embedding torus in hexagonal honeycomb torus
Energy reduction of the fetch mechanism through dynamic adaptation
Comparative study of centralised and distributed compatibility-based test data compression
Simultaneous scheduling and binding for low gate leakage nano-complementary metal-oxide-semiconductor data path circuit behavioural synthesis
Simultaneous capture and shift power reduction test pattern generator for scan testing
High-performance embedded branch predictor by combining branch direction history and global branch history
Most cited content for this Journal
-
High-performance elliptic curve cryptography processor over NIST prime fields
- Author(s): Md Selim Hossain ; Yinan Kong ; Ehsan Saeedi ; Niras C. Vayalil
- Type: Article
-
Majority-based evolution state assignment algorithm for area and power optimisation of sequential circuits
- Author(s): Aiman H. El-Maleh
- Type: Article
-
Scalable GF(p) Montgomery multiplier based on a digit–digit computation approach
- Author(s): M. Morales-Sandoval and A. Diaz-Perez
- Type: Article
-
Fabrication and characterisation of Al gate n-metal–oxide–semiconductor field-effect transistor, on-chip fabricated with silicon nitride ion-sensitive field-effect transistor
- Author(s): Rekha Chaudhary ; Amit Sharma ; Soumendu Sinha ; Jyoti Yadav ; Rishi Sharma ; Ravindra Mukhiya ; Vinod K. Khanna
- Type: Article
-
Adaptively weighted round-robin arbitration for equality of service in a many-core network-on-chip
- Author(s): Hanmin Park and Kiyoung Choi
- Type: Article