Intensive computing on a large data volume with a short-vector single instruction multiple data processor

In this study, the authors evaluate the performance of the PowerXCell 8i processor, which is based on the Cell Broadband Engine architecture. For this purpose, they chose an algorithm for the k-nearest neighbour (k-NN) problem and optimised it to make efficient use of the facilities provided by this architecture. They evaluated the PowerXCell 8i by running the algorithm with single- and double-precision calculations, in both cases with and without SIMDisation. For single-precision calculations, the authors achieved a maximum speed-up of 43.85 with SIMDisation using 6 synergistic processor elements (SPEs) and 39.73 without SIMDisation using 16 SPEs. For double-precision calculations, they achieved a maximum speed-up of 34.79 with SIMDisation using 9 SPEs and 32.71 without SIMDisation using 12 SPEs. These speed-ups are measured relative to execution on the PowerPC processor element (PPE). They stem from the way the SPE cores access main memory: through DMA transfers that are performed in parallel with the computation. The authors conclude that this processor can be used efficiently to execute algorithms that require intensive computation on large data volumes.
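The abstract does not include the authors' code. As a rough illustration of the workload only, here is a minimal brute-force k-NN sketch. It rests on a few assumptions. It uses squared Euclidean distance, because the paper's metric is not stated here. It introduces a hypothetical `chunk_size` parameter that stands in for the fixed-size blocks an SPE would fetch from main memory by DMA. It is portable, scalar Python, not the authors' Cell implementation: it uses no SPE intrinsics or SIMD, and the chunk loop only mimics the block-wise data access without overlapping transfers and computation.

```python
import heapq
from typing import List, Sequence, Tuple

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance. The square root is skipped because it
    does not change which points are nearest."""
    return sum((x - y) * (x - y) for x, y in zip(a, b))


def knn_chunked(query: Point, data: Sequence[Point], k: int,
                chunk_size: int = 4) -> List[Tuple[float, int]]:
    """Return up to k (squared_distance, index) pairs nearest to `query`,
    sorted by distance and then by index.

    `chunk_size` is a hypothetical stand-in for the size of one DMA block;
    the data is scanned block by block, as an SPE would stream it.
    """
    if k <= 0:
        return []
    # Max-heap of the k best candidates so far, stored as (-dist, -index)
    # so heapq's min-heap keeps the current worst candidate on top.
    best: List[Tuple[float, int]] = []
    for start in range(0, len(data), chunk_size):
        block = data[start:start + chunk_size]  # one simulated "DMA block"
        for offset, point in enumerate(block):
            d = squared_distance(query, point)
            idx = start + offset
            if len(best) < k:
                heapq.heappush(best, (-d, -idx))
            elif d < -best[0][0]:
                # Strictly closer than the current worst: replace it.
                heapq.heapreplace(best, (-d, -idx))
    return sorted((-nd, -ni) for nd, ni in best)
```

A common Cell pattern is double buffering, and it is presumably close to what the abstract describes. Each SPE would run this inner loop over its own share of the data while the DMA for the next block is already in flight. The per-SPE partial results would then be merged into the global k best. The sketch above does not model either step.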

Inspec keywords: multiprocessing systems; microprocessor chips; performance evaluation

Other keywords: SPE cores; computing operations; double-precision calculations; DMA transfers; Cell Broadband Engine architecture; k-nearest neighbour problem; large data volume; SPE processors; PowerPC processor element processor; PowerXCell 8i processor; synergistic processor element processors; performance evaluation; single-precision calculations; short-vector single instruction multiple data processor

Subjects: Microprocessors and microcomputers; Performance evaluation and testing; Multiprocessing systems; Microprocessor chips

http://iet.metastore.ingenta.com/content/journals/10.1049/iet-cdt.2013.0149