Multiple-components weights model for cross-project software defect prediction

Shaojian Qiu; Lu Lu; Siyu Jiang

Author(s): Shaojian Qiu¹; Lu Lu¹,²; Siyu Jiang³
- Affiliations: 1: School of Computer Science and Engineering, South China University of Technology, Guangzhou 510000, People's Republic of China;
  2: Modern Industrial Technology Research Institute, South China University of Technology, Zhongshan 528400, People's Republic of China;
  3: School of Software Engineering, South China University of Technology, Guangzhou 510000, People's Republic of China
Source: IET Software, Volume 12, Issue 4, August 2018, pp. 345–355
DOI: 10.1049/iet-sen.2017.0111, Print ISSN 1751-8806, Online ISSN 1751-8814

Received 18/05/2017, Revised 01/03/2018, Accepted 29/03/2018, Published 04/04/2018

Software defect prediction (SDP) technology is receiving wide attention, and most SDP models are trained on data from the same project. However, at an early phase of the software lifecycle, there is little to no within-project training data with which to learn a supervised defect-prediction model. Thus, cross-project defect prediction (CPDP), which learns a defect predictor for a target project by using labelled data from a source project, has shown promising value in SDP. To better perform CPDP, most current studies focus on filtering instances or selecting features to weaken the impact of irrelevant cross-project data. Instead, the authors propose a novel multiple-components weights (MCWs) learning model that analyses the varying auxiliary power of multiple components in a source project to construct a more precise ensemble classifier for a target project. By combining the MCW model with the kernel mean matching algorithm, their proposed approach adjusts the source-instance weights and source-component weights to jointly alleviate the negative impacts of irrelevant cross-project data. The authors conducted comprehensive experiments on 15 real-world datasets to demonstrate the advantages and effectiveness of their proposed approach.

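The component-weighting half can be pictured as a weighted vote. Each component of the source project contributes a classifier, and components judged more helpful for the target project carry more weight. The abstract does not say how the MCW model learns these weights, so the sketch below substitutes a hypothetical heuristic: a softmax over the negative squared maximum mean discrepancy (MMD) between each component and the target sample. `component_weights`, `ensemble_predict`, `temperature` and the 0.5 decision threshold are all illustrative assumptions, not the authors' method.

```python
import math
from typing import Callable, List, Sequence

Point = Sequence[float]
Classifier = Callable[[Point], float]  # returns an estimated P(defective)


def rbf(a: Point, b: Point, gamma: float = 1.0) -> float:
    """Gaussian (RBF) kernel k(a, b) = exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def mmd2(xs: Sequence[Point], ys: Sequence[Point], gamma: float = 1.0) -> float:
    """Biased empirical estimate of the squared MMD between samples xs and ys."""
    kxx = sum(rbf(a, b, gamma) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(rbf(a, b, gamma) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(rbf(a, b, gamma) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2.0 * kxy


def component_weights(components: Sequence[Sequence[Point]],
                      target: Sequence[Point],
                      gamma: float = 1.0, temperature: float = 1.0) -> List[float]:
    """Hypothetical stand-in for MCW learning: softmax of -MMD^2 / temperature,
    so components distributed more like the target get larger weights."""
    scores = [-mmd2(c, target, gamma) / temperature for c in components]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_predict(classifiers: Sequence[Classifier], weights: Sequence[float],
                     x: Point, threshold: float = 0.5) -> bool:
    """Weighted average of per-component defect probabilities, then threshold."""
    score = sum(w * clf(x) for clf, w in zip(classifiers, weights)) / sum(weights)
    return score >= threshold
```

With one component near the target (points 0.0 and 0.1) and one far from it (5.0 and 5.1), the near component receives most of the weight, so its classifier dominates the vote. In a real pipeline the per-component classifiers would themselves be trained on KMM-weighted source instances.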