Support vector regression for predicting the productivity of higher education graduate students from individually developed software projects
Author(s): Cuauhtémoc López-Martín (1); Rosa Leonor Ulloa-Cazarez (2); Andrés García-Floriano (3)
Affiliations:
1: Department of Information Systems, Universidad de Guadalajara, Periférico Norte # 799, Zapopan, Jalisco, México
2: Sistema de Universidad Virtual, Universidad de Guadalajara, Av. La Paz # 2453, Guadalajara, Jalisco, México
3: Centro de Investigación en Computación, Instituto Politécnico Nacional, Av. Juan de Dios Bátiz, México, México
Source: Volume 11, Issue 5, October 2017, pp. 265–270
DOI: 10.1049/iet-sen.2016.0304, Print ISSN 1751-8806, Online ISSN 1751-8814
Predicting the productivity of a software engineer is necessary to determine whether corrective actions are needed and to identify improvement options that produce better results. Such prediction can be performed at several levels of abstraction: organisation, team project, individual project, or task. Software engineering education and training has focused its efforts on the individual level. In this study, the authors propose applying a data mining technique, support vector regression (SVR), to predict the productivity of individuals (i.e. graduate students). Its prediction accuracy was compared with that of a statistical regression model and with those of two neural networks. After applying a Wilcoxon statistical test, the results suggest that an SVR with a linear kernel, using new and changed lines of code and programming language experience as independent variables, could be used to predict the individual productivity of a higher education graduate student when the software projects, coded in either Java or C++, have been developed by following a disciplined process specifically proposed for academic environments.
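The abstract names the ingredients of the approach (a linear-kernel SVR with two independent variables, new and changed lines of code and programming language experience, plus a Wilcoxon test to compare prediction accuracy) without showing how they fit together. The following is a minimal sketch under loudly stated assumptions, not the authors' implementation: the SVR is trained by plain batch subgradient descent on the primal ε-insensitive objective instead of a dedicated solver; the data are synthetic and invented purely for illustration; accuracy is measured as absolute residuals on held-out projects; and the competitor is a naive mean predictor standing in for the statistical regression and neural network models the study actually used. All function names and hyperparameter values (`C`, `epsilon`, `lr`, `epochs`) are hypothetical.

```python
import math
import random


def column_stats(X):
    """Per-column (mean, std) of a list of feature rows; a zero std falls back to 1.0."""
    stats = []
    for col in zip(*X):
        m = sum(col) / len(col)
        s = math.sqrt(sum((v - m) ** 2 for v in col) / len(col)) or 1.0
        stats.append((m, s))
    return stats


def scale(X, stats):
    """Z-score the rows of X using previously computed column statistics."""
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in X]


def predict(w, b, x):
    """Linear-kernel SVR prediction f(x) = w . x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svr(X, y, C=1.0, epsilon=0.1, lr=0.001, epochs=2000):
    """Batch subgradient descent on the primal linear SVR objective
    0.5 * ||w||^2 + C * sum_i max(0, |y_i - f(x_i)| - epsilon)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = list(w), 0.0  # list(w) is the gradient of 0.5 * ||w||^2
        for xi, yi in zip(X, y):
            r = yi - predict(w, b, xi)
            if abs(r) > epsilon:  # only points outside the epsilon-tube contribute
                sign = 1.0 if r > 0 else -1.0
                grad_w = [g - C * sign * xj for g, xj in zip(grad_w, xi)]
                grad_b -= C * sign
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def wilcoxon_signed_rank(a, b):
    """Two-sided Wilcoxon signed-rank test for paired samples (normal approximation).
    Zero differences are dropped; tied |d| get average ranks; no tie-corrected variance."""
    d = [x - y for x, y in zip(a, b) if x != y]
    n = len(d)
    if n == 0:
        return 0.0, 1.0  # no non-zero differences: no evidence of a difference
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        i = j + 1
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    return w_plus, math.erfc(abs(z) / math.sqrt(2))  # p = 2 * (1 - Phi(|z|))


def demo(seed=0):
    rng = random.Random(seed)
    # Hypothetical synthetic data, NOT the study's dataset:
    # features = [new_and_changed_loc, language_experience], target = made-up productivity.
    X = [[rng.uniform(20, 300), rng.uniform(1, 60)] for _ in range(60)]
    y = [0.05 * loc + 0.3 * experience + rng.gauss(0, 2) for loc, experience in X]
    X_tr, y_tr, X_te, y_te = X[:40], y[:40], X[40:], y[40:]

    stats = column_stats(X_tr)  # scale with training statistics only
    w, b = train_linear_svr(scale(X_tr, stats), y_tr, C=1.0, epsilon=0.5)
    svr_ar = [abs(t - predict(w, b, x)) for x, t in zip(scale(X_te, stats), y_te)]
    mean_y = sum(y_tr) / len(y_tr)  # naive baseline: always predict the training mean
    base_ar = [abs(t - mean_y) for t in y_te]
    _, p = wilcoxon_signed_rank(svr_ar, base_ar)
    return sum(svr_ar) / len(svr_ar), sum(base_ar) / len(base_ar), p


svr_mar, base_mar, p_value = demo()
print(f"MAR SVR: {svr_mar:.3f}  MAR mean baseline: {base_mar:.3f}  Wilcoxon p: {p_value:.4f}")
```

Because both models are evaluated on the same held-out projects, their absolute residuals are paired, which is why a signed-rank test (rather than a rank-sum test) fits this kind of comparison. Standardising the features with training-set statistics keeps the two inputs, which live on very different scales, from dominating the linear kernel unevenly.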
Inspec keywords: Java; computer science education; statistical testing; training; software engineering; C++ language; regression analysis; support vector machines; further education; data mining
Other keywords: software engineering education; SVR; higher education graduate student productivity; data mining technique; linear kernel; software engineering training; higher education graduate student prediction; programming language experience; support vector regression; software engineer productivity prediction; C++ programming languages; academic environments; statistical regression model; software projects; Java; neural networks; Wilcoxon statistical test
Subjects: Software engineering techniques; Other topics in statistics; Computing education and training; Knowledge engineering techniques; Data handling techniques