A Large Language Model Framework for Early Software Bug Prediction in Software Engineering

Authors

  • Akhil Reddy Duggasani, Independent Researcher, USA.

DOI:

https://doi.org/10.63282/3050-9246.IJETCSIT-V7I2P137

Keywords:

Software Quality, Bug Detection, Software Defect Prediction, Machine Learning, SHAP Explainability, Random Forest, DistilBERT

Abstract

Software defect prediction models let test managers anticipate defect-prone modules, which helps teams deliver high-quality products. Early defect discovery is essential to improving software quality and reducing costs throughout the development process. This study proposes a software bug detection framework using Random Forest (RF) and DistilBERT models. Preprocessing and feature engineering are applied to improve prediction performance: SMOTE is used to handle class imbalance, and RobustScaler is used for feature normalization. The performance of the proposed models is assessed using accuracy, precision, recall, F1 score, ROC AUC, and training time. Experimental results show that the RF model performs best, with 99.98% accuracy, a predictive capability of 99.39%, and an F1 score of 99.69%. ROC analysis yields near-perfect AUC values, indicating that both models are robust. The proposed models are also compared against baseline models, including Artificial Neural Network (ANN), Convolutional Neural Network (CNN), Decision Tree (DT), AdaBoost (ADA), and Support Vector Machine (SVM), and are shown to outperform them. Moreover, a SHAP-based interpretability analysis identifies the most important factors influencing software defect predictions, making the proposed models transparent and interpretable. Overall, the study provides an accurate, interpretable, and efficient solution for the early prediction of software defects and for enhancing software quality.
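
The preprocessing and evaluation steps named in the abstract can be illustrated with a minimal sketch. It is not the author's implementation, which is not shown on this page. It assumes plain Python with no external libraries, so robust_scale, smote_like, and binary_metrics are hypothetical helper names, the data are made up, and smote_like uses a single nearest neighbour rather than the k neighbours of a full SMOTE implementation. In practice the same steps would more likely come from scikit-learn's RobustScaler and imbalanced-learn's SMOTE.

```python
import random
import statistics


def robust_scale(column):
    """Centre a feature on its median and divide by its interquartile range."""
    median = statistics.median(column)
    q1, _, q3 = statistics.quantiles(column, n=4, method="inclusive")
    iqr = (q3 - q1) or 1.0  # fall back to 1.0 for constant features
    return [(x - median) / iqr for x in column]


def smote_like(minority, n_new, rng):
    """Interpolate between a minority row and its nearest minority neighbour.

    This is the core idea of SMOTE with k=1; it needs at least two rows.
    """
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbour = min((row for row in minority if row is not base),
                        key=lambda row: dist2(base, row))
        gap = rng.random()  # position on the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for labels where 1 means defective."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


# Toy, made-up data: one metric column with an outlier, three defective rows.
loc_per_module = [1, 2, 3, 4, 100]
print(robust_scale(loc_per_module))  # [-1.0, -0.5, 0.0, 0.5, 48.5]

defective_rows = [[1.0, 1.0], [2.0, 2.0], [1.5, 3.0]]
print(smote_like(defective_rows, n_new=2, rng=random.Random(0)))

print(binary_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0, 0, 1]))
```

Scaling by the median and interquartile range keeps the outlier (100) from compressing the other values, which is the usual reason for preferring a robust scaler on skewed code metrics. Oversampling of this kind is normally applied to the training split only, so that synthetic rows do not leak into the evaluation data.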

Published

2026-05-18

Section

Articles

How to Cite

1.
Duggasani AR. A Large Language Model Framework for Early Software Bug Prediction in Software Engineering. IJETCSIT [Internet]. 2026 May 18 [cited 2026 Jun. 12];7(2):304-11. Available from: https://www.ijetcsit.org/index.php/ijetcsit/article/view/747
