© The Institution of Engineering and Technology
Recent advances in multi-million-gate platform field-programmable gate arrays (FPGAs) have made it possible to design and implement complex parallel systems on a programmable chip that also incorporate hardware floating-point units (FPUs). Such systems can take advantage of resource reconfiguration. Whereas most of the FPGA community still uses reconfigurable logic to develop algorithm-specific circuitry, our FPGA-based mixed-mode reconfigurable computing machine can simultaneously implement a variety of parallel execution modes and is also user programmable. Our heterogeneous reconfigurable architecture (HERA) machine can implement the single-instruction, multiple-data (SIMD), multiple-instruction, multiple-data (MIMD) and multiple-SIMD (M-SIMD) execution modes. Each processing element (PE) is centred on a single-precision IEEE 754 FPU with tightly coupled local memory, and supports dynamic switching between SIMD and MIMD at runtime. Mixed-mode parallelism has the potential to match the characteristics of each subtask in an application, and thus to sustain high performance. HERA's performance is evaluated with two common computation-intensive benchmarks: matrix–matrix multiplication (MMM) and LU factorisation of sparse doubly-bordered-block-diagonal (DBBD) matrices. Experimental results with electrical power network matrices show that mixed-mode scheduling for LU factorisation can yield speedups of about 19% and 15.5% over the SIMD and MIMD implementations, respectively.
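To illustrate why DBBD matrices suit mixed-mode execution, the following is a minimal Python sketch of block LU factorisation on a DBBD matrix. It is not HERA's implementation: the function names (`lu`, `dbbd_lu` and the helpers), the dense list-of-lists storage, the absence of pivoting and the tiny example matrix are all assumptions made for illustration. The point it shows is structural: each diagonal block and its two border blocks can be processed independently of the others, and only the final corner block (the Schur complement) depends on every block.

```python
def lu(a):
    """Doolittle LU without pivoting; returns (L, U) as lists of lists."""
    n = len(a)
    L = [[float(i == j) for j in range(n)] for i in range(n)]
    U = [[float(x) for x in row] for row in a]
    for k in range(n):
        for i in range(k + 1, n):
            L[i][k] = U[i][k] / U[k][k]
            for j in range(k, n):
                U[i][j] -= L[i][k] * U[k][j]
    return L, U

def matmul(a, b):
    """Plain dense matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def forward_cols(L, B):
    """Solve L X = B for X, where L is unit lower triangular."""
    n, m = len(L), len(B[0])
    X = [[0.0] * m for _ in range(n)]
    for c in range(m):
        for i in range(n):
            X[i][c] = B[i][c] - sum(L[i][k] * X[k][c] for k in range(i))
    return X

def right_solve_upper(C, U):
    """Solve X U = C for X, where U is upper triangular."""
    m, n = len(C), len(U)
    X = [[0.0] * n for _ in range(m)]
    for r in range(m):
        for j in range(n):
            s = sum(X[r][k] * U[k][j] for k in range(j))
            X[r][j] = (C[r][j] - s) / U[j][j]
    return X

def dbbd_lu(diag, right, bottom, corner):
    """Block LU of a DBBD matrix given its diagonal, border and corner blocks."""
    schur = [[float(x) for x in row] for row in corner]
    factors = []
    # Each iteration touches only its own blocks, so the iterations are
    # independent of one another (hypothetically, one per PE or PE group).
    for A, B, C in zip(diag, right, bottom):
        L, U = lu(A)
        W = forward_cols(L, B)        # U-factor part of the right border
        V = right_solve_upper(C, U)   # L-factor part of the bottom border
        VW = matmul(V, W)
        for i in range(len(schur)):
            for j in range(len(schur)):
                schur[i][j] -= VW[i][j]
        factors.append((L, U, V, W))
    # The corner block is factored last, after all contributions are in.
    return factors, lu(schur)

# Illustrative data only: two 2x2 diagonal blocks with a 1x1 corner.
diag = [[[4, 1], [1, 3]], [[5, 2], [1, 4]]]
right = [[[1], [2]], [[0], [1]]]
bottom = [[[1, 0]], [[2, 1]]]
corner = [[10]]
full = [[4, 1, 0, 0, 1],
        [1, 3, 0, 0, 2],
        [0, 0, 5, 2, 0],
        [0, 0, 1, 4, 1],
        [1, 0, 2, 1, 10]]
factors, (Ln, Un) = dbbd_lu(diag, right, bottom, corner)
```

In this sketch the per-block loop is the part that independent instruction streams could handle, while the dense arithmetic inside each block is regular enough for lockstep execution. How HERA actually assigns these subtasks to SIMD and MIMD modes is not specified in the abstract.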