An Extended Software Defect Prediction Framework for Improving Accuracy
Abstract
Ensuring the quality and reliability of software systems is a critical challenge, mainly because software defects are difficult to predict accurately. Traditional models often struggle to maintain high accuracy across diverse datasets and classifiers. This study addresses this challenge by integrating L2 regularisation with feature selection and dataset resampling techniques to improve defect prediction accuracy, forming the With Feature Selection and Regularization (WFS+RG) model.
The evaluation was conducted using five classifiers (Multilayer Perceptron, Bayes Net, Lazy IBK, J48, and Logistic Regression) applied to two datasets, PC1 and JM1, within the Waikato Environment for Knowledge Analysis (WEKA). The results indicate that the WFS+RG model outperforms the traditional With Feature Selection (WFS) model for most classifiers. Specifically, on the PC1 dataset, Lazy IBK achieved an accuracy of 97.29%, J48 reached 95.67%, and Multilayer Perceptron obtained 94.40%. Bayes Net and Logistic Regression also performed strongly, with 91.25% and 93.86% accuracy, respectively. On the JM1 dataset, Lazy IBK demonstrated significant improvement with an accuracy of 90.21%, while J48 and Multilayer Perceptron achieved 84.92% and 81.02%, respectively. Bayes Net and Logistic Regression reported 76.93% and 81.10% accuracy, respectively. However, the improvements are not uniform across all classifiers, with some showing minimal change. Statistical analysis conducted in Minitab confirms the robustness and significance of the WFS+RG model's impact, underscoring its potential as a more practical approach to software defect prediction.
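To make the L2 regularisation component concrete, the sketch below fits a logistic regression by gradient descent with a penalty of (lam / 2) * ||w||^2 added to the mean log-loss, and compares the learned weights with and without the penalty. It is an illustration only, not the authors' WEKA configuration: the data are made up, and the names `train_logistic_l2`, `lam`, `lr`, and `epochs` are assumptions introduced here. Feature selection and resampling are not shown.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_l2(X, y, lam=0.0, lr=0.5, epochs=2000):
    """Fit logistic regression by batch gradient descent.

    lam is the L2 penalty strength (hypothetical name); lam=0 disables it.
    The bias term is left unpenalised.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one sample: p(defective) - label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Mean log-loss gradient plus the gradient of (lam / 2) * ||w||^2.
        w = [wj - lr * (gj / n + lam * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def l2_norm(w):
    return math.sqrt(sum(wj * wj for wj in w))


# Toy, made-up data: two numeric "metrics" per module, label 1 = defective.
X = [[0.1, 0.2], [0.3, 0.1], [0.2, 0.4], [0.9, 0.8], [0.8, 1.0], [1.0, 0.7]]
y = [0, 0, 0, 1, 1, 1]

w_plain, b_plain = train_logistic_l2(X, y, lam=0.0)
w_reg, b_reg = train_logistic_l2(X, y, lam=0.5)

print("weight norm without L2:", l2_norm(w_plain))
print("weight norm with L2:   ", l2_norm(w_reg))
```

On this separable toy set the unpenalised weights keep growing with more training, while the penalty holds them to a bounded size. That shrinkage is the mechanism by which L2 regularisation is generally expected to reduce overfitting.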
Copyright (c) 2024 Isah Abdullateef, Souley Boukari, Usman Ali Abdullahi

This work is licensed under a Creative Commons Attribution 4.0 International License.



