© The Institution of Electrical Engineers
A fine grain mapping strategy for multiprocessor systems especially suitable for RISC and VLIW type MIMD computers is described. The objective of the strategy is to minimise the total execution time of the application algorithms that are to be executed on the target systems. An application algorithm which has been coded as a straight line program is compiled to generate intermediate codes. This code is then represented by a data dependence graph. A node in the graph corresponds to the operation of one intermediate code statement, and arcs between nodes represent the data dependence between operations. The mapping strategy, called SR-mapper, is a generalised list scheduler. When SR-mapper does the mapping procedure, the data dependence between nodes, the pipeline effect of each processing element within the system, and the communication cost between processing elements are considered. SR-mapper uses slot reservation technique to insert send nodes for the immediate successor nodes when a node has been scheduled to maintain the data dependence between nodes and to achieve the synchronisation between processing elements. SR-mapper has been implemented and several scientific application algorithms, such as matrix multiplication, L-U decomposition, weighted median filter, convolution kernel etc. have been used as the testing inputs. Results show that on the average the speedup obtained by SR-mapper is 4.480 times as large as the one obtained by the mapper which does not employ the pipeline effect assuming the weight of operations are 4 for addition and subtraction and 6 for multiplication and division. During the mapping phase, if the slot reservation technique is adopted, the gain of speedup is 21%, on the average, more than the one obtained by the mapper which does not apply the slot reservation technique. If all nodes have an equal weight of 1, the average efficiency of the systems with the number of processing elements ranging from 2 to 32 and communication costs ranging from 0 to 20 is 0.481. These results are encouraging.
References
-
-
1)
-
S. Aryak
.
An optimal instruction — scheduling models for class of vectors processor.
IEEE Trans. Computers
,
11 ,
981 -
995
-
2)
-
W.W. Chu ,
L.J. Holloway ,
M.T. Lan ,
K. Efe
.
Task allocation in distributed data processing.
Computer
,
57 -
69
-
3)
-
J.R. Ellis
.
, Bulldog: A compiler for VLIW architectures.
-
4)
-
J.A. Fisher
.
Trace scheduling: A technique for global microcode compaction.
IEEE Trans. Computers
,
7 ,
478 -
490
-
5)
-
Solworth, J.A.: `The microflow architecture', Proc. of Int. Conf. Parallel Processing, 1988, 1, p. 113–117.
-
6)
-
M. Annaratone ,
E. Arnould ,
T. Gross ,
H.T. Kung
.
The WARP computer: architecture, implementation and performance.
IEEE Trans. Computer
,
12 ,
1523 -
1538
-
7)
-
M.J. Gonzalez
.
Deterministic processor scheduling.
ACM Computing Surveys
,
3 ,
173 -
204
-
8)
-
G.S. Almasi ,
A. Gottlieb
.
(1989)
, Highly parallel computing.
-
9)
-
B. Krautrachue ,
T. Lewis
.
Grain size determination for parallel processing.
IEEE Software
,
23 -
32
-
10)
-
K. Hwang ,
F.A. Briggs
.
(1984)
, Computer architecture and Parallel processing.
-
11)
-
R.L. Burden ,
J.D. Faires ,
A.C. Reynolds
.
(1981)
, Numerical analysis.
-
12)
-
E.J. Dijkstra
.
(1968)
Co-operating sequential processes, Programming language.
-
13)
-
Shih, L.: `Automatic synthesis of concurrent control for multiprocessor systems of general topology through fine-grain mapping', 1989, PhD thesis, Case Western Reserve University, Dept. of computer Engineering and Science.
-
14)
-
B.J. Smith ,
J. Kowalil
.
(1985)
The architecture of HEP, Parallel MIMD computation: The HEP supercomputer and its application.
-
15)
-
R.L. Graham
.
Bounds on multiprocessing timing anomalies.
SIAM, J. Applied Mathematics
,
2 ,
416 -
429
-
16)
-
R.S. Bucy ,
K.D. Senne
.
Nonlinear filtering algorithms for vector machines.
Computers and Mathematices
,
3 ,
317 -
338
-
17)
-
Shieh, J.J.: `Fine grain mapping strategies for pipelined computer system', 1990, PhD thesis, Case Western Reserve University, Dept. of computer Engineering and Science.
-
18)
-
C.D. Polychronopoulos
.
(1988)
, Parallel programming and compilers.
-
19)
-
B.R. Rau ,
W.J. David ,
W.Y. Yen ,
A.T. Ross
.
The Cydra 5 departmental supercomputer.
IEEE Computer
,
36 -
44
-
20)
-
Kuck, D.J.: `Dependence graphs and compiler optimizations', ACM 8th Annual Symposium on Principles of Programming Languages, January 1981, p. 207–218.
-
21)
-
H.S. Stone
.
Multimicroprocessor scheduling with the aid of network flow algorithm.
IEEE Trans. Software Engineering
,
1 ,
85 -
93
-
22)
-
M. Kumar
.
Measuring parallelism in computation intensive scientific/engineering applications.
IEEE Trans. Computers
,
9 ,
1088 -
1098
-
23)
-
E.G. Coffman
.
(1976)
, Introduction to deterministic scheduling theory.
-
24)
-
Sarkar, V.: `Partitioning and scheduling parallel programs for execution on multiprocessors', , PhD thesis, Stanford University, Computer Systems Laboratory, Dept. of Electrical Engineering and Computer Science, Technique Report No. CSL-TR-87-328.
-
25)
-
J. Bruno ,
J.W. Jones ,
K. So
.
Deterministic scheduling with pipelined processors.
IEEE Trans.
,
308 -
316
-
26)
-
O. Carvalho ,
G. Roucairol ,
Paker ,
Verjus
.
(1983)
Assertion, decomposition and partial correctness of distributed control algorithms, Distributed computing systems.
-
27)
-
H.J. Siegel
.
(1985)
, Interconnection networks for large-scale parallel processing.
-
28)
-
Ramamoorthy, C.V., Gonzalez, M.J.: `A survey of techniques for recognizing parallel processable streams in computer programs', AFIPS, Fall Joint Computer Conference, November 1969, p. 1–14.
-
29)
-
D.A. Patterson
.
RISCs.
Communications the ACM
,
8 -
21
-
30)
-
A.V. Aho ,
J.E. Hopcroft ,
J.D. Ullman
.
(1974)
, The design and analysis of computer algorithms.
-
31)
-
Auslander, M., Hopkins, M.: `An Overview of the PL.8 compiler', Proc. of the SIGPLAN'82 Symposium on computer construction, June 1982, p. 22–31.
-
32)
-
G. Zorpette
.
Computing at 100 MIPS: Architecture for success.
IEEE The Institute
-
33)
-
M.R. Garey ,
D.S. Johnson
.
(1979)
, Computers and intractability: a guide to the theory of NP-completeness.
-
34)
-
A.V. Aho ,
R. Sethi ,
J.D. Ullman
.
(1986)
, Compilers: principles techniques and tools.
-
35)
-
Shrira, L., Francez, N., Rodeh, M.: `Distributed K-selection: from a sequential to a distributed algorithm', ACM SIGACT-SIGOPS Symposium on Principles of Distributed Computing, August 1983.
-
36)
-
D.A. Padua ,
M.J. Wolfe
.
Advanced compiler optimizations for supercomputers.
Communications the ACM
,
12 ,
1184 -
1200
-
37)
-
R.W. Hockney ,
C.R. Jesshope
.
(1981)
, Parallel computers.
-
38)
-
C.A.R. Hoare
.
Monitors: an operating structuring.
CACM
,
549 -
557
-
39)
-
P.M. Kogge
.
(1981)
, The architecture of pipelined computers.
-
40)
-
H.F. Jordan ,
J. Kowalil
.
(1985)
HEP architecture, programming and performance, Parallel MIMD computation: The HEP supercomputer and its application.
-
41)
-
D.R.K. Brownrigg
.
The weighted median filters.
CACM
,
8 ,
807 -
818
-
42)
-
A. Nicolau ,
J.A. Fisher
.
Measuring the parallelism available for very long instruction word architectures.
IEEE Trans. Computers
,
11
-
43)
-
G. LeLann
.
(1977)
, Distributed systems, towards a formal approach.
-
44)
-
J.A. Fisher
.
The VLIW machine: A multiprocessor for compiling scientific code.
IEEE Computer
,
45 -
53
-
45)
-
T.L. Rodefheffer
.
(1985)
Compiling ordinary programs for execution on an asynchronous multiprocessor, CMU-CS-85-155.
-
46)
-
Cohn, R., Gross, T., Lam, M., Tseng, P.S.: `Architecture and compiler tradeoffs for a long instruction word microprocessor', Proc. of the Third Conf. on Architectural Support for Programming Languages and Operating Systems, 1989, p. 2–14.
-
47)
-
T.L. Adam ,
K.M. Chandy ,
J.R. Dickson
.
A comparision of list schedules for parallel processing systems.
Comm. of ACM
,
12
-
48)
-
G.H. Katevenis
.
(1985)
, Reduced instruction set computer architecture for VLSI.
-
49)
-
Burke, M., Cytron, R., Ferrante, J.: `Automatic discovery of parallelism: a tool and experiment', ACM SIGPLAN Symp. parallel processing: experiments with applications, languages and systems, July 1988, p. 77–84.
-
50)
-
P.B. Hansen
.
The programming language: concurrent pascal.
IEEE Trans. Software Engineering
,
199 -
207
http://iet.metastore.ingenta.com/content/journals/10.1049/ip-e.1991.0015
Related content
content/journals/10.1049/ip-e.1991.0015
pub_keyword,iet_inspecKeyword,pub_concept
6
6