Resource optimised workflow scheduling in Hadoop using stochastic hill climbing technique

Hadoop on the datacentre is a popular analytical platform for enterprises. Cloud vendors host Hadoop clusters in the datacentre to provide high-performance analytical computing facilities to their customers, who demand a parallel programming model to deal with huge data. Effective cost/time management and judicious resource consumption among concurrent users must be the primary concern, without which the key aspiration behind high-performance cloud computing would suffer. Workflows portray such high-performance applications in terms of individual jobs and the dependencies between them. Workflows can be scheduled on virtual machines (VMs) in the datacentre to make the best possible use of resources. In the authors’ earlier work, a mechanism to pack and execute customer jobs as workflows on the Hadoop platform was proposed, which minimises the VM cost and also executes the workflow jobs within their deadline. In this work, the authors try to optimise certain other parameters, such as the load on the cloud, the response time for workflows and resource usage effectiveness, by applying soft computing methods. Stochastic hill climbing (SHC) is a soft computing approach used to solve many optimisation problems. In this study, they have employed the SHC approach to schedule workflow jobs to VMs and thereby optimise the above-mentioned parameters in the cloud datacentre.
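
The abstract does not spell out the authors' SHC formulation, so the following is only a minimal sketch of generic stochastic hill climbing applied to a job-to-VM assignment. It rests on simplifying assumptions that are not from the paper: jobs are independent (no workflow dependencies or deadlines), each job has a fixed runtime, all VMs are identical, and the only objective is the load of the busiest VM (the makespan). The names `makespan` and `stochastic_hill_climb` are hypothetical.

```python
import random


def makespan(assignment, runtimes, n_vms):
    """Return the load of the busiest VM for a job -> VM assignment."""
    loads = [0.0] * n_vms
    for job, vm in enumerate(assignment):
        loads[vm] += runtimes[job]
    return max(loads)


def stochastic_hill_climb(runtimes, n_vms, seed=0, max_steps=10_000):
    """Assign jobs to VMs by stochastic hill climbing (illustrative only).

    At each step, every single-job move that strictly lowers the
    makespan is collected, and one of them is chosen at random.
    The search stops at a local optimum or after max_steps.
    """
    rng = random.Random(seed)
    # Start from a random assignment: assignment[job] is a VM index.
    assignment = [rng.randrange(n_vms) for _ in runtimes]
    cost = makespan(assignment, runtimes, n_vms)

    for _ in range(max_steps):
        improving = []
        for job, current_vm in enumerate(assignment):
            for vm in range(n_vms):
                if vm == current_vm:
                    continue
                candidate = assignment[:job] + [vm] + assignment[job + 1:]
                candidate_cost = makespan(candidate, runtimes, n_vms)
                if candidate_cost < cost:
                    improving.append((candidate, candidate_cost))
        if not improving:
            break  # no uphill move left: local optimum
        # The stochastic step: pick any improving move, not the best one.
        assignment, cost = rng.choice(improving)

    return assignment, cost


if __name__ == "__main__":
    job_runtimes = [4.0, 3.0, 3.0, 2.0, 2.0, 2.0]  # made-up runtimes
    plan, busiest = stochastic_hill_climb(job_runtimes, n_vms=2)
    print("job -> VM:", plan, "busiest VM load:", busiest)
```

Choosing a random improving move rather than the steepest one is what distinguishes the stochastic variant from plain hill climbing. A scheduler closer to the one described above would also have to fold workflow dependencies, deadlines, VM cost and response time into the objective.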

Inspec keywords: scheduling; virtual machines; parallel programming; workflow management software; stochastic processes; cloud computing; data handling; operating systems (computers)

Other keywords: parallel programming model; stochastic hill climbing technique; SCH; cloud computing; VM; workflows portray; Hadoop platform; resource consumption; computing facilities; datacentre; resource optimised workflow scheduling; cloud vendors; Hadoop clusters; virtual machines; concurrent users

Subjects: Parallel software; Parallel programming; Other topics in statistics; Operating systems; Internet software; Data handling techniques

http://iet.metastore.ingenta.com/content/journals/10.1049/iet-sen.2016.0289