© The Institution of Engineering and Technology
Dynamic dataflow allows instructions from different iterations of a loop to execute simultaneously, increasing the parallelism that can be exploited. In this model, operands are tagged with their instance number, which is incremented each time they go through the loop, and an instruction fires when all of its input operands carrying the same tag are available. However, this traditional tagging mechanism often requires several control instructions to manipulate tags and guarantee correct matching. To address this problem, this work presents three dataflow loop optimisation techniques. Stack-tagged dataflow is a tagging mechanism that uses stacks of tags to reduce control overheads in dataflow. Because nested loops may increase the overhead of comparing stack tags, tag resetting can be used to set the tag to zero whenever it is safe to do so, reducing the stack depth by one level. Finally, loop skipping further avoids stack-comparison overhead in loops whose number of iterations can be determined by the compiler. Experimental results show the overhead, drawbacks and benefits of the three optimisations. Moreover, the results suggest that a hybrid compilation approach can be used to obtain the best performance from each technique.
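The stack-tagging idea can be illustrated with a minimal Python sketch. It assumes that a tag is a tuple of per-loop iteration counters (outermost loop first), that entering a loop pushes a new counter set to zero, that looping back increments only the innermost counter, and that leaving a loop pops it. The names `enter_loop`, `next_iteration`, `exit_loop`, `MatchingUnit` and `run_nested_loops` are hypothetical and do not describe the paper's actual implementation; tag resetting and loop skipping are not modelled.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical stack tag: one iteration counter per enclosing loop,
# outermost first. Tuples are hashable, so they can key the matching store.
Tag = Tuple[int, ...]


def enter_loop(tag: Tag) -> Tag:
    """Assumed semantics: push a new level (iteration 0) on loop entry."""
    return tag + (0,)


def next_iteration(tag: Tag) -> Tag:
    """Increment only the innermost counter when a token loops back."""
    return tag[:-1] + (tag[-1] + 1,)


def exit_loop(tag: Tag) -> Tag:
    """Pop the innermost level when a token leaves the loop."""
    return tag[:-1]


class MatchingUnit:
    """Fires a two-input 'add' once both operands with the same tag arrive."""

    def __init__(self) -> None:
        # tag -> {input port: operand value}; each lookup hashes and
        # compares the whole stack, so its cost grows with nesting depth.
        self.pending: Dict[Tag, Dict[int, int]] = defaultdict(dict)

    def receive(self, port: int, tag: Tag, value: int) -> Optional[Tuple[Tag, int]]:
        slots = self.pending[tag]
        slots[port] = value
        if len(slots) == 2:  # both operands present: the instruction fires
            del self.pending[tag]
            return tag, slots[0] + slots[1]
        return None


def run_nested_loops(outer_trips: int, inner_trips: int) -> List[Tuple[Tag, int]]:
    """Tag operands of a two-level loop nest and deliver them out of order."""
    unit = MatchingUnit()
    left: List[Tuple[Tag, int]] = []
    right: List[Tuple[Tag, int]] = []
    outer = enter_loop(())                   # tag (0,)
    for i in range(outer_trips):
        inner = enter_loop(outer)            # tag (i, 0)
        for j in range(inner_trips):
            left.append((inner, 10 * i))
            right.append((inner, j))
            inner = next_iteration(inner)
        if exit_loop(inner) != outer:        # popping restores the outer tag
            raise RuntimeError("unbalanced tag stack")
        outer = next_iteration(outer)

    fired: List[Tuple[Tag, int]] = []
    for tag, value in left:                  # one operand alone never fires
        unit.receive(0, tag, value)
    for tag, value in reversed(right):       # reversed arrival order
        result = unit.receive(1, tag, value)
        if result is not None:
            fired.append(result)
    return fired


if __name__ == "__main__":
    for tag, value in run_nested_loops(2, 3):
        print(tag, value)
```

Even though the right-hand operands arrive in reverse order, each one meets its partner only when the full stacks are equal, so operands from different iterations never mix. The depth-dependent comparison in `receive` is the kind of cost that tag resetting and loop skipping are intended to reduce.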
http://iet.metastore.ingenta.com/content/journals/10.1049/iet-cds.2015.0148