SamEn-SVR: using sample entropy and support vector regression for bug number prediction

Monitoring and predicting the trend of the bug number time series of a software system is crucial for both software project managers and software end-users. For project managers, accurate prediction of the bug number of a software system assists in making timely decisions, such as effort investment and resource allocation. For end-users, knowing the likely bug number of their systems in advance enables them to take timely action to cope with the loss caused by possible system failures. This study proposes an approach called SamEn-SVR that combines sample entropy and support vector regression (SVR) to predict the software bug number using time series analysis. The basic idea is to use the template vectors with the smallest complexity as input vectors for SVR classifiers, to ensure the predictability of the time series. Using Mozilla Firefox bug data, we conduct extensive experiments comparing the proposed approach with state-of-the-art techniques, including auto-regressive integrated moving average (ARIMA), X12 enhanced ARIMA and polynomial regression, in predicting bug number time series. Experimental results demonstrate that the proposed SamEn-SVR approach outperforms these techniques in bug number prediction.
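
The basic idea above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the standard Richman-Moorman definition of sample entropy, and it assumes that "template vectors with the smallest complexity" means choosing the template length `m` whose sample entropy is lowest. The function names `sample_entropy`, `pick_dimension` and `build_templates`, and the tolerance `r`, are hypothetical names introduced here for illustration.

```python
import math


def sample_entropy(series, m=2, r=0.2):
    """SampEn(m, r) with an absolute tolerance r (Richman-Moorman definition)."""
    n = len(series)

    def count_matches(length):
        # Use the first n - m templates for both lengths so the counts are comparable.
        templates = [series[i:i + length] for i in range(n - m)]
        matches = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):  # self-matches are excluded
                # Chebyshev distance between two template vectors
                dist = max(abs(a - b) for a, b in zip(templates[i], templates[j]))
                if dist <= r:
                    matches += 1
        return matches

    b = count_matches(m)      # matching pairs of length m
    a = count_matches(m + 1)  # matching pairs of length m + 1
    if a == 0 or b == 0:
        return float("inf")   # undefined; treated here as maximally irregular
    return -math.log(a / b)


def pick_dimension(series, candidates, r):
    """Assumed selection rule: the template length with the smallest sample entropy."""
    return min(candidates, key=lambda m: sample_entropy(series, m, r))


def build_templates(series, m):
    """Sliding windows of length m as inputs, the next observation as target."""
    inputs = [series[i:i + m] for i in range(len(series) - m)]
    targets = [series[i + m] for i in range(len(series) - m)]
    return inputs, targets
```

The resulting `inputs` and `targets` could then be passed to any support vector regressor, for example scikit-learn's `SVR`, to forecast the next bug count. The kernel and hyperparameters used in the study are not stated in the abstract, so they are left open here.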

Inspec keywords: pattern classification; vectors; software management; entropy; regression analysis; autoregressive moving average processes; time series; support vector machines; program debugging

Other keywords: template vectors; bug number prediction; software bug number; SVR classifiers; Mozilla Firefox bug data; software system; support vector regression; SamEn-SVR approach; software end-users; input vectors; time series analysis; bug number time series; software project managers; sample entropy

Subjects: Other topics in statistics; Diagnostic, testing, debugging and evaluating systems; Algebra; Software management; Software engineering techniques; Knowledge engineering techniques

http://iet.metastore.ingenta.com/content/journals/10.1049/iet-sen.2017.0168