Exploiting parallelism in configurable architectures through custom array mapping
Configurable architectures offer the unique opportunity of customising the storage allocation to meet specific applications' needs. A compiler approach to map the arrays of a loop-based computation to internal memories of a configurable architecture with the objective of minimising the overall execution time is described. An algorithm that considers the data access patterns of the arrays along the critical path of the computation as well as the available storage and memory bandwidth is presented. Experimental results are presented which demonstrate the application of this approach for a set of kernel codes when targeting a field-programmable gate-array. The results reveal that the proposed algorithm outperforms the naive and custom data layout techniques by an average of 33% and 15% in terms of execution time, while taking into account the available hardware resources.