© The Institution of Engineering and Technology
FPGA-based reconfigurable systems are widely used to implement data-dominated applications, in which data transfer and storage consume a large proportion of the system energy. Exploiting data reuse can yield significant power savings, but it also requires additional on-chip memory. To aid data-reuse design exploration early in the design cycle, the authors present an optimisation approach that finds a power-optimal design satisfying an on-chip memory constraint on a targeted FPGA-based platform. The data-reuse exploration problem is formulated mathematically and shown to be equivalent to the multiple-choice knapsack problem. For a given application code, the solution determines which array references are buffered on-chip and where in the code the reused data of those array references are loaded into on-chip memory, so as to minimise power consumption for a fixed on-chip memory size. The authors also present an experimentally verified power model that provides the relative power of the different data-reuse design options of an application, enabling fast and efficient design-space exploration. The experimental results demonstrate that the approach finds the most power-efficient design for all the benchmark circuits tested.
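To make the multiple-choice knapsack formulation concrete, the following is a minimal sketch and not the authors' implementation. It assumes each array reference forms one class of mutually exclusive data-reuse options, each option described by a hypothetical pair (on-chip memory blocks required, relative power), and that exactly one option is chosen per array reference. The function name `solve_mckp`, the integer memory unit, and all numbers are illustrative assumptions; the solver is a plain dynamic programme over the memory budget.

```python
from typing import List, Optional, Tuple

# One data-reuse option: (on-chip memory blocks needed, relative power).
# Both fields are illustrative; real values would come from a power model.
Option = Tuple[int, float]


def solve_mckp(classes: List[List[Option]],
               capacity: int) -> Optional[Tuple[float, List[int]]]:
    """Pick exactly one option per class, minimising total power
    subject to total memory <= capacity. Returns (power, chosen indices),
    or None when no selection fits in the memory budget."""
    inf = float("inf")
    # best[m]: minimum power of the classes seen so far with budget m.
    best = [0.0] * (capacity + 1)
    picks_per_class: List[List[int]] = []

    for options in classes:
        new_best = [inf] * (capacity + 1)
        picks = [-1] * (capacity + 1)
        for m in range(capacity + 1):
            for j, (mem, power) in enumerate(options):
                if mem <= m and best[m - mem] + power < new_best[m]:
                    new_best[m] = best[m - mem] + power
                    picks[m] = j
        best = new_best
        picks_per_class.append(picks)

    if best[capacity] == inf:
        return None

    # Walk backwards through the classes to recover the chosen options.
    selection: List[int] = []
    m = capacity
    for i in range(len(classes) - 1, -1, -1):
        j = picks_per_class[i][m]
        selection.append(j)
        m -= classes[i][j][0]
    selection.reverse()
    return best[capacity], selection


# Hypothetical example: two array references, three options each.
# Option 0 of each class means "no on-chip buffering" (zero memory).
example_classes = [
    [(0, 10.0), (2, 6.0), (4, 3.0)],  # array reference A
    [(0, 8.0), (1, 4.0), (3, 1.0)],   # array reference B
]
```

With a budget of 4 memory blocks, `solve_mckp(example_classes, 4)` returns `(10.0, [1, 1])`: both array references take their middle option, because the most power-saving options together would need 7 blocks. The table-based search costs time proportional to the capacity times the total number of options, which is adequate for a sketch of this size.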
http://iet.metastore.ingenta.com/content/journals/10.1049/iet-cdt.2008.0039