Fine grain mapping strategy for multiprocessor systems

IEE Proceedings E (Computers and Digital Techniques)

A fine-grain mapping strategy for multiprocessor systems, especially suitable for RISC- and VLIW-type MIMD computers, is described. The objective of the strategy is to minimise the total execution time of the application algorithms to be executed on the target systems. An application algorithm coded as a straight-line program is compiled to generate intermediate code, which is then represented by a data dependence graph: a node in the graph corresponds to the operation of one intermediate-code statement, and arcs between nodes represent the data dependences between operations. The mapping strategy, called SR-mapper, is a generalised list scheduler. During the mapping procedure, SR-mapper takes into account the data dependences between nodes, the pipeline effect of each processing element in the system, and the communication cost between processing elements. SR-mapper uses a slot-reservation technique: when a node has been scheduled, send nodes are inserted for its immediate successor nodes, both to maintain the data dependences between nodes and to achieve synchronisation between processing elements. SR-mapper has been implemented, and several scientific application algorithms, such as matrix multiplication, L-U decomposition, the weighted median filter and the convolution kernel, have been used as test inputs. Results show that, on average, the speedup obtained by SR-mapper is 4.480 times that obtained by a mapper which does not exploit the pipeline effect, assuming operation weights of 4 for addition and subtraction and 6 for multiplication and division. If the slot-reservation technique is adopted during the mapping phase, the speedup is, on average, 21% greater than that obtained by a mapper which does not apply it.
If all nodes have an equal weight of 1, the average efficiency of systems with between 2 and 32 processing elements and communication costs ranging from 0 to 20 is 0.481. These results are encouraging.
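The core mechanism described above, a list scheduler that assigns data-dependence-graph nodes to processing elements while accounting for operation weights and inter-processor communication costs, can be illustrated with a small sketch. This is a hypothetical, simplified reconstruction for clarity, not the paper's SR-mapper: it omits the pipeline effect and the slot-reservation insertion of send nodes, and all names (`list_schedule`, the node priority based on longest path to exit) are illustrative assumptions.

```python
# Hypothetical sketch of a generalised list scheduler in the spirit of the
# strategy described above (simplified: no pipeline modelling, no explicit
# send nodes). A fixed communication cost is charged whenever a value
# crosses processing-element (PE) boundaries.

def list_schedule(graph, weights, num_pes, comm_cost):
    """graph: dict node -> list of successor nodes (a DAG).
    weights: dict node -> execution time of the operation.
    Returns (assignment, finish_times, makespan)."""
    # Build predecessor lists from the successor lists.
    preds = {n: [] for n in graph}
    for n, succs in graph.items():
        for s in succs:
            preds[s].append(n)

    # Priority of a node: length of the longest path from it to an exit
    # node (a common list-scheduling priority function).
    prio = {}
    def level(n):
        if n not in prio:
            prio[n] = weights[n] + max((level(s) for s in graph[n]), default=0)
        return prio[n]
    for n in graph:
        level(n)

    pe_free = [0] * num_pes          # time at which each PE becomes idle
    assign, finish = {}, {}
    scheduled = set()
    while len(scheduled) < len(graph):
        # Ready nodes: all predecessors already scheduled.
        ready = [n for n in graph if n not in scheduled
                 and all(p in scheduled for p in preds[n])]
        n = max(ready, key=lambda x: prio[x])   # highest priority first
        best = None
        for pe in range(num_pes):
            # Data from the same PE is available immediately; data from
            # another PE arrives after comm_cost extra time units.
            arrival = max((finish[p] + (0 if assign[p] == pe else comm_cost)
                           for p in preds[n]), default=0)
            start = max(pe_free[pe], arrival)
            if best is None or start < best[0]:
                best = (start, pe)
        start, pe = best
        assign[n], finish[n] = pe, start + weights[n]
        pe_free[pe] = finish[n]
        scheduled.add(n)
    return assign, finish, max(finish.values())
```

On a diamond-shaped graph with unit weights, the scheduler spreads independent nodes across PEs when communication is free, but keeps the whole computation on one PE when communication is expensive, which mirrors the trade-off the abstract's communication-cost experiments explore.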

http://iet.metastore.ingenta.com/content/journals/10.1049/ip-e.1991.0015