IEE Proceedings - Computers and Digital Techniques
Volume 153, Issue 6, November 2006
- Author(s): C.-W. Chiang ; Y.-C. Lee ; C.-N. Lee ; T.-Y. Chou
- Source: IEE Proceedings - Computers and Digital Techniques, Volume 153, Issue 6, p. 373–380
- DOI: 10.1049/ip-cdt:20050196
- Type: Article
PC clusters have recently received considerable interest as cost-effective parallel platforms for CPU-intensive applications. A cluster of PCs generally comprises a collection of heterogeneous processing elements (PEs). To make effective use of a PC cluster, a parallel program, which is characterised by a node- and edge-weighted directed acyclic graph (DAG), can usually be decomposed into a set of precedence-constrained atomic tasks such that the PEs are able to accommodate these tasks and minimise the overall program-completion time. Consequently, techniques for task matching and scheduling become extremely important for effectively harnessing the computing power of the target cluster-based system. This work presents a constructive algorithm based on ant colony optimisation (ACO). The proposed algorithm, namely ACO-TMS, adopts a new state transition rule that reduces the time required to find satisfactory scheduling results. It also integrates a local search procedure to help improve the scheduling results. The performance of this algorithm is demonstrated by comparing it against other existing algorithms, such as the genetic-algorithm-based scheduling method and the dynamic priority scheduling (DPS) heuristic, in terms of the overall schedule length of randomly generated DAGs. Experimental results indicate that the proposed algorithm outperforms the genetic algorithm and the DPS heuristic for high communication-to-computation ratios and heterogeneous computing environments.
- Author(s): A. Mahjur and A.H. Jahangir
- Source: IEE Proceedings - Computers and Digital Techniques, Volume 153, Issue 6, p. 381–388
- DOI: 10.1049/ip-cdt:20050197
- Type: Article
Hardware prefetching schemes that divide misses into streams are generally preferred to other hardware-based schemes. However, because they do not know when the next miss of a stream will occur, they cannot prefetch a block at the appropriate time. Some of them use a substantial amount of hardware storage to keep the predicted miss blocks from all streams. Other approaches follow the program flow and prefetch all target addresses, including blocks that already exist in the L1 data cache. The approach presented predicts the stream of the next miss and then prefetches only the next miss address of that stream. It offers a general prefetching framework, the two-phase prediction algorithm (TPP), that lets each stream have its own address predictor. Comparing the TPP algorithm with the latest variant of stream buffers and the Markov predictor using SPEC CPU 2000 benchmarks shows that, on average, (1) the TPP approach achieves an 18% speedup, compared with 1% for Markov and 0.05% for stream buffers, and (2) 78% of the TPP prefetches are useful, whereas only 18% and 24% of the prefetches are useful in stream buffers and Markov, respectively.
- Author(s): F. Castro ; D. Chaver ; L. Pinuel ; M. Prieto ; M.C. Huang ; F. Tirado
- Source: IEE Proceedings - Computers and Digital Techniques, Volume 153, Issue 6, p. 389–398
- DOI: 10.1049/ip-cdt:20050218
- Type: Article
The load–store queue (LSQ) of modern superscalar processors is a critical and non-scalable component responsible for maintaining the order of memory operations. As new architectures become more aggressive, the number of in-flight memory instructions increases, and the LSQ must satisfy higher capacity requirements. An efficient LSQ state filtering mechanism based on Bloom filtering is proposed, which, in conjunction with a dynamic or profiling-based predictor, provides a significant energy reduction (up to 55% in the LSQ and 4% in the whole processor) while incurring only a small performance loss.
- Author(s): R.A. Patel ; M. Benaissa ; S. Boussakta
- Source: IEE Proceedings - Computers and Digital Techniques, Volume 153, Issue 6, p. 399–405
- DOI: 10.1049/ip-cdt:20050166
- Type: Article
A new modulo 2^n−1 addition algorithm is presented, which is applicable in the residue number system. In contrast to previous work, the input carry in the first stage of the addition is set to one. The associated output carry is then used to conditionally modify the sum to produce the correct modulo 2^n−1 result. Moreover, unlike recent adders in the literature, the result never exceeds the dynamic range of the modulus. Actual VLSI implementations using 130 nm standard-cell technology show that the corresponding architectures provide improved trade-offs in the power–delay–area space when compared against existing designs.
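The carry-based correction described in this abstract can be illustrated arithmetically. The following is a minimal Python sketch of one plausible reading, not the authors' hardware architecture: the function name `add_mod_2n_minus_1` and the assumption that both operands lie in the range 0 to 2^n−2 are illustrative choices that do not come from the abstract.

```python
def add_mod_2n_minus_1(a: int, b: int, n: int) -> int:
    """Add a and b modulo 2**n - 1 (hypothetical helper, illustration only).

    Assumes 0 <= a, b <= 2**n - 2, i.e. operands are already reduced.
    """
    modulus = (1 << n) - 1
    if not (0 <= a < modulus and 0 <= b < modulus):
        raise ValueError("operands must lie in [0, 2**n - 2]")

    # First stage: add with the input carry forced to one.
    total = a + b + 1

    # Output carry of the n-bit addition: 1 exactly when a + b >= 2**n - 1.
    carry_out = total >> n

    # Low n bits of the first-stage sum.
    low_bits = total & modulus

    # Conditional correction: with a carry, the forced +1 has already
    # performed the wrap-around, so the low bits are the result.
    # Without a carry, the forced +1 is undone.
    return low_bits if carry_out else low_bits - 1
```

For example, with `n = 4` (modulus 15), `add_mod_2n_minus_1(9, 8, 4)` gives 2 and `add_mod_2n_minus_1(7, 8, 4)` gives 0. Under the stated operand assumption the result is never 15, so it stays within the range 0 to 2^n−2.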
Ant colony optimisation for task matching and scheduling
Two-phase prediction of L1 data cache misses
LSQ: a power efficient and scalable implementation
Efficient new approach for modulo 2^n−1 addition in RNS