Performance Optimization and Comparison of the Alternating Direction Implicit CFD Solver on Multi-core and Many-Core Architectures

We accelerate a double-precision alternating direction implicit (ADI) solver for the three-dimensional compressible Navier-Stokes equations, taken from our in-house computational fluid dynamics (CFD) software, on the latest multi-core and many-core architectures (Intel Sandy Bridge CPUs, Intel Many Integrated Core (MIC) coprocessors and NVIDIA Kepler K20c GPUs). Several performance optimization techniques are discussed in detail. We provide an in-depth analysis of the performance difference between Sandy Bridge and MIC. Experimental results show that the proposed GPU-enabled ADI solver achieves a speedup of 5.5 on a Kepler GPU relative to two Sandy Bridge CPUs, and that our optimization techniques improve the performance of the ADI solver by 2.5-fold on two Sandy Bridge CPUs and 1.7-fold on an Intel MIC coprocessor. We also perform a cross-platform performance analysis (between GPU and MIC), which serves as a case study for developers selecting the right accelerator for their target applications.
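For readers unfamiliar with ADI, the following is a minimal sketch of the kind of line solve such a scheme performs along each grid direction. It is not the solver described in the abstract, which gives no implementation details. It assumes a scalar tridiagonal system with constant coefficients on every grid line, whereas an ADI scheme for the compressible Navier-Stokes equations would typically lead to block systems. The names `thomas_solve` and `adi_sweep_x` are hypothetical and chosen only for illustration.

```python
def thomas_solve(lower, diag, upper, rhs):
    """Solve one tridiagonal system with the Thomas algorithm.

    lower[i] multiplies x[i-1] (lower[0] is unused),
    diag[i]  multiplies x[i],
    upper[i] multiplies x[i+1] (upper[-1] is unused).
    Assumes the system is well conditioned (e.g. diagonally dominant),
    since no pivoting is performed.
    """
    n = len(diag)
    c = [0.0] * n  # modified upper diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    # Forward elimination: remove the lower diagonal row by row.
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    # Back substitution: recover the unknowns from last to first.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def adi_sweep_x(grid, lower, diag, upper):
    """One implicit sweep along x (illustrative assumption: same
    coefficients on every line). Each row is an independent line solve."""
    return [thomas_solve(lower, diag, upper, row) for row in grid]
```

Each line solve is sequential along its own direction, but the lines are independent of one another, and it is this independence that a multi-core or many-core implementation can exploit.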

Inspec keywords: multiprocessing systems; parallel algorithms; parallel architectures; graphics processing units; computational fluid dynamics; Navier-Stokes equations

Other keywords: many-core architectures; performance optimization techniques; NVIDIA Kepler K20c GPUs; multicore; three-dimensional compressible Navier-Stokes equations; in-house Computational fluid dynamics software; performance difference; Intel Many integrated core coprocessors; cross-platform performance analysis; Kepler GPU; alternating direction implicit CFD solver; GPU-enabled ADI solver; double precision Alternating direction implicit solver; in-depth analysis; Intel Sandy Bridge CPUs; Sandy Bridge CPU; Intel MIC coprocessor

Subjects: Multiprocessing systems; Physics and chemistry computing; General fluid dynamics theory, simulation and other computational methods; Parallel software; Microprocessors and microcomputers; Microprocessor chips; Parallel architecture

http://iet.metastore.ingenta.com/content/journals/10.1049/cje.2018.03.011