© The Institution of Engineering and Technology
Hadoop is a popular analytical platform for enterprises. Cloud vendors host Hadoop clusters in their datacentres to provide high-performance analytical computing facilities to customers who demand a parallel programming model to deal with huge data. Effective cost/time management and judicious resource consumption among concurrent users must be primary concerns; without them, the key aspiration behind high-performance cloud computing would suffer. Workflows portray such high-performance applications in terms of individual jobs and the dependencies between them, and can be scheduled onto virtual machines (VMs) in the datacentre to make the best possible use of resources. In the authors' earlier work, a mechanism was proposed to pack and execute customer jobs as workflows on the Hadoop platform, minimising VM cost while still executing the workflow jobs within their deadlines. In this work, the authors optimise further parameters, such as the load on the cloud, the response time for workflows, and resource-usage effectiveness, by applying soft computing methods. Stochastic hill climbing (SHC) is a soft computing approach used to solve many optimisation problems. In this study, the authors employ the SHC approach to schedule workflow jobs to VMs and thereby optimise the above-mentioned multiple parameters in the cloud datacentre.
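The scheduling idea can be illustrated with a minimal stochastic hill-climbing sketch. This is not the authors' actual scheduler: the job running times, the VM count, and the combined makespan/imbalance cost function below are illustrative assumptions. The SHC loop starts from a random job-to-VM assignment and accepts a randomly chosen single-job move whenever it lowers the cost, rather than searching all neighbours for the best move.

```python
import random


def assignment_cost(assignment, job_times, n_vms):
    """Combined cost: makespan (a response-time proxy) plus load imbalance."""
    loads = [0.0] * n_vms
    for job, vm in enumerate(assignment):
        loads[vm] += job_times[job]
    makespan = max(loads)
    imbalance = makespan - min(loads)
    return makespan + imbalance


def stochastic_hill_climb(job_times, n_vms, iterations=2000, seed=0):
    """Assign each job to a VM, accepting any randomly chosen improving move."""
    rng = random.Random(seed)
    assignment = [rng.randrange(n_vms) for _ in job_times]
    cost = assignment_cost(assignment, job_times, n_vms)
    for _ in range(iterations):
        job = rng.randrange(len(job_times))
        new_vm = rng.randrange(n_vms)
        if new_vm == assignment[job]:
            continue
        candidate = assignment[:]
        candidate[job] = new_vm
        new_cost = assignment_cost(candidate, job_times, n_vms)
        if new_cost < cost:  # SHC: accept the first improving neighbour found
            assignment, cost = candidate, new_cost
    return assignment, cost
```

For example, `stochastic_hill_climb([5, 3, 8, 2, 7, 4], 3)` spreads six jobs over three VMs so that no VM's total load is far from the others. A real workflow scheduler would additionally fold deadline slack and VM price into the cost function, but the accept-an-improving-random-neighbour structure is the same.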