SPARC-based VLIW testbed

IEE Proceedings - Computers and Digital Techniques

The performance of very long instruction word (VLIW) microprocessors depends on close co-operation between the compiler and the architecture. Designing a high-performance VLIW therefore requires a testbed that allows detailed co-evaluation of compilation techniques and architectural features. The paper introduces a new VLIW testbed based on the SPARC instruction set architecture, comprising an aggressive scheduling compiler and a fast VLIW simulator. The compiler takes gcc-generated optimised SPARC code as input and generates parallelised VLIW code targeting advanced VLIW architectures; it can generate high-performance VLIW code, especially for non-numerical integer programs. The VLIW code is translated into a dedicated C program for fast and simple compiled simulation, which generates detailed data for performance evaluation. The authors have performed a comprehensive empirical study on the testbed for both large-resource and small-resource machines. The results show that a geometric-mean speedup of as much as fourfold is obtainable on nontrivial integer benchmarks without using branch probabilities when performing speculative code motion. The characteristics of the useful and useless ALU operations in each cycle are also analysed to see how the speedup is obtained. The analysis indicates that around half of the useful ALUs execute speculative instructions whose original paths are taken (thus being ‘hit’), yet a substantial number of ALUs are also wasted owing to useless speculative execution or copy execution.
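
As a rough illustration of how a geometric-mean speedup such as the one quoted above can be computed from simulator cycle counts, here is a minimal Python sketch. The benchmark names and cycle counts are hypothetical placeholders rather than data from the paper, and the helper names `speedup` and `geometric_mean` are assumptions introduced only for this example.

```python
import math


def speedup(sequential_cycles, vliw_cycles):
    """Speedup of the VLIW schedule over the sequential baseline."""
    return sequential_cycles / vliw_cycles


def geometric_mean(values):
    """Geometric mean of positive values, computed via logarithms."""
    values = list(values)
    if not values:
        raise ValueError("geometric_mean requires at least one value")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical (sequential cycles, VLIW cycles) pairs; NOT results from the paper.
cycle_counts = {
    "bench_a": (8_000, 2_000),
    "bench_b": (9_000, 4_500),
    "bench_c": (6_000, 750),
}

speedups = [speedup(seq, vliw) for seq, vliw in cycle_counts.values()]
overall = geometric_mean(speedups)
arithmetic = sum(speedups) / len(speedups)

print(f"per-benchmark speedups: {speedups}")
print(f"geometric mean: {overall:.3f}, arithmetic mean: {arithmetic:.3f}")
```

The geometric mean is the usual choice for summarising ratios such as speedups, because a single benchmark with a very large speedup pulls it up less than it would an arithmetic mean.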

http://iet.metastore.ingenta.com/content/journals/10.1049/ip-cdt_19982025