A Large Language Model Framework for Early Software Bug Prediction in Software Engineering

Akhil Reddy Duggasani

doi:10.63282/3050-9246.IJETCSIT-V7I2P137

Authors

Akhil Reddy Duggasani, Independent Researcher, USA

DOI:

https://doi.org/10.63282/3050-9246.IJETCSIT-V7I2P137

Keywords:

Software Quality, Bug Detection, Software Defect Prediction, Machine Learning, SHAP Explainability, Random Forest, DistilBERT

Abstract

Software defect prediction models allow test managers to anticipate defect-prone modules, which helps them deliver high-quality products. Early problem discovery is essential for improving software quality and reducing costs throughout the development process. This study proposes a software bug detection framework based on Random Forest (RF) and DistilBERT models. Advanced preprocessing and feature engineering are applied to improve prediction performance: SMOTE is used to handle class imbalance, and RobustScaler is used for feature normalization. The proposed models are evaluated using accuracy (acc), precision (prec), recall (rec), F1-score (F1), ROC AUC, and training time. Experimental results show that the RF model achieved the best performance, with 99.98% accuracy, a predictive capability of 99.39%, and an F1-score of 99.69%. ROC analysis yields near-perfect AUC values for both models, indicating their robustness. The proposed models are also compared against baseline models, including Artificial Neural Network (ANN), Convolutional Neural Network (CNN), Decision Tree (DT), AdaBoost (ADA), and Support Vector Machine (SVM), and outperform them. In addition, a SHAP-based interpretability analysis identifies the most important factors influencing software defect predictions, making the proposed models transparent and interpretable. Overall, the study provides an accurate, interpretable, and efficient solution for the early prediction of software flaws and for enhancing software quality.
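
The abstract names the preprocessing and evaluation steps (RobustScaler, SMOTE, Random Forest, and the accuracy, precision, recall, F1, ROC AUC, and training-time metrics) but not how they are wired together. The following is a minimal sketch of one plausible arrangement, assuming scikit-learn and NumPy. It is not the study's code: the dataset is synthetic (`make_classification`), `smote_oversample` is a hypothetical, simplified SMOTE written here instead of a library call, and the step order (scale, then oversample the training split only) and hyperparameters (`n_estimators=200`, `k=5`) are assumptions. It does not reproduce the reported scores.

```python
import time

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import RobustScaler


def smote_oversample(X, y, minority_label=1, k=5, seed=0):
    """Hypothetical simplified SMOTE: create synthetic minority rows by
    interpolating between a minority row and one of its k nearest minority
    neighbours, until the minority count equals the majority count."""
    rng = np.random.default_rng(seed)
    X_min = X[y == minority_label]
    n_needed = int(np.sum(y != minority_label)) - len(X_min)
    if n_needed <= 0:
        return X, y
    # Ask for k + 1 neighbours because each row is its own nearest neighbour.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, neighbours = nn.kneighbors(X_min)
    base = rng.integers(0, len(X_min), size=n_needed)
    partner = neighbours[base, rng.integers(1, k + 1, size=n_needed)]
    gap = rng.random((n_needed, 1))  # interpolation factor in [0, 1)
    synthetic = X_min[base] + gap * (X_min[partner] - X_min[base])
    X_res = np.vstack([X, synthetic])
    y_res = np.concatenate([y, np.full(n_needed, minority_label)])
    return X_res, y_res


def evaluate_rf(seed=0):
    # Synthetic stand-in for a defect dataset: roughly 10% defective modules.
    X, y = make_classification(n_samples=1000, n_features=20, n_informative=8,
                               weights=[0.9, 0.1], random_state=seed)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed)
    # Fit RobustScaler (median/IQR scaling) on the training split only.
    scaler = RobustScaler().fit(X_tr)
    X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)
    # Oversample the training split only; the test split keeps its real imbalance.
    X_tr, y_tr = smote_oversample(X_tr, y_tr, seed=seed)
    clf = RandomForestClassifier(n_estimators=200, random_state=seed)
    start = time.perf_counter()
    clf.fit(X_tr, y_tr)
    train_seconds = time.perf_counter() - start
    pred = clf.predict(X_te)
    proba = clf.predict_proba(X_te)[:, 1]  # score for the defective class
    return {
        "accuracy": accuracy_score(y_te, pred),
        "precision": precision_score(y_te, pred, zero_division=0),
        "recall": recall_score(y_te, pred, zero_division=0),
        "f1": f1_score(y_te, pred, zero_division=0),
        "roc_auc": roc_auc_score(y_te, proba),
        "train_seconds": train_seconds,
    }


if __name__ == "__main__":
    for name, value in evaluate_rf().items():
        print(f"{name}: {value:.4f}")
```

Oversampling only after the train/test split is the key design choice here: applying SMOTE before splitting lets synthetic rows derived from test modules leak into training and inflates every reported metric. For real use, the imbalanced-learn library provides a full `SMOTE` implementation and a `Pipeline` that applies it only during fitting.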

Similar Articles

Enterprise Risk Intelligence: Machine Learning Models for Predicting Compliance, Fraud, and Operational Failures

Accelerating Defect and Vulnerability Discovery with ML + HPC: High-Throughput Simulation Analytics for Software Quality Engineering

Evaluating the Efficacy of Machine Learning Algorithms in Credit Card Limit Optimization and Customer Segmentation

Hybrid Deep Learning Approach for Early Detection of Railway Track Faults to Enhance Railway Safety

Agile Software Development in AI-Driven Applications: Challenges and Strategies

AI-Augmented Software Engineering: Automated Code Generation and Optimization Using Large Language Models

Churn Prediction Through Content Interaction Pattern Analysis: A Machine Learning Approach for Digital Service Providers

Anomaly Identification in IoT-Networks Using Artificial Intelligence-Based Data-Driven Techniques in Cloud Environment

AI-Enhanced API Reliability Testing for Digital Banking: Improving Accuracy, Resilience, and Integrity in Financial Transaction Processing

AI-Driven Decision Intelligence for Agile Software Lifecycle Governance: An Architecture-Centered Framework Integrating Machine Learning Defect Prediction and Automated Testing