Handling Class Imbalance in SMS Spam Datasets Using Advanced Sampling Techniques

Authors

  • Vempalli Mopuru Rakesh Reddy, Systems Engineer, Tata Consultancy Services

DOI:

https://doi.org/10.63282/3050-9246.IJETCSIT-V7I1P108

Keywords:

SMS Spam Detection, Class Imbalance, Imbalanced Datasets, Sampling Techniques, SMOTE, ADASYN, Ensemble Resampling, Text Classification, Machine Learning, Spam Filtering, Predictive Accuracy, False Positive Reduction

Abstract

The prevalence of SMS spam poses significant challenges for automated messaging systems, and effective detection is often hindered by the class imbalance inherent in SMS datasets, where legitimate messages vastly outnumber spam. This study investigates how advanced sampling techniques affect classification performance on imbalanced SMS datasets. The Synthetic Minority Oversampling Technique (SMOTE), Adaptive Synthetic Sampling (ADASYN), and ensemble-based resampling methods are evaluated for their effectiveness in balancing the dataset and improving the predictive accuracy of machine learning classifiers. Experimental results show that these sampling strategies significantly improve spam detection rates while reducing false positives. The findings offer practical insights for developing robust SMS spam filters and highlight the importance of addressing class imbalance in real-world text classification problems.
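The abstract names SMOTE and ADASYN without describing how they generate data. The following is a minimal, dependency-free Python sketch of both, assuming each message has already been converted to a numeric feature vector (for example with TF-IDF). The function names (`smote`, `adasyn`, `adasyn_weights`) and the toy data are hypothetical illustrations, not the authors' implementation. In practice a library such as imbalanced-learn (`SMOTE` and `ADASYN` in `imblearn.over_sampling`) would normally be used instead.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_indices(i, pool, k):
    """Indices of the k points in `pool` closest to pool[i], excluding i itself."""
    others = [j for j in range(len(pool)) if j != i]
    others.sort(key=lambda j: euclidean(pool[i], pool[j]))
    return others[:k]


def interpolate(a, b, rng):
    """A synthetic point on the segment a -> b: a + gap * (b - a), with gap in [0, 1)."""
    gap = rng.random()
    return [x + gap * (y - x) for x, y in zip(a, b)]


def smote(minority, n_new, k=5, seed=0):
    """Simplified SMOTE: each synthetic sample interpolates between a randomly chosen
    minority sample and one of its k nearest minority-class neighbors."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    neighbors = [nearest_indices(i, minority, k) for i in range(len(minority))]
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(neighbors[i])
        synthetic.append(interpolate(minority[i], minority[j], rng))
    return synthetic


def adasyn_weights(minority, majority, k=5):
    """ADASYN difficulty weights: the share of majority-class points among each minority
    sample's k nearest neighbors in the full data, normalized to sum to 1."""
    data = minority + majority  # minority samples occupy indices 0 .. len(minority) - 1
    k = min(k, len(data) - 1)
    ratios = [
        sum(1 for j in nearest_indices(i, data, k) if j >= len(minority)) / k
        for i in range(len(minority))
    ]
    total = sum(ratios)
    if total == 0:
        # No minority point has a majority neighbor; original ADASYN is undefined here,
        # so this sketch falls back to uniform weights (an assumption).
        return [1.0 / len(minority)] * len(minority)
    return [r / total for r in ratios]


def adasyn(minority, majority, n_new, k=5, seed=0):
    """ADASYN: like SMOTE, but harder minority samples (those surrounded by majority
    points) receive proportionally more synthetic neighbors. Rounding means the
    output size can differ slightly from n_new."""
    rng = random.Random(seed)
    weights = adasyn_weights(minority, majority, k)
    k_min = min(k, len(minority) - 1)
    synthetic = []
    for i, w in enumerate(weights):
        neighbors = nearest_indices(i, minority, k_min)
        for _ in range(round(w * n_new)):
            j = rng.choice(neighbors)
            synthetic.append(interpolate(minority[i], minority[j], rng))
    return synthetic


if __name__ == "__main__":
    # Toy 2-D vectors standing in for vectorized messages (hypothetical data).
    spam = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    ham = [[1.2, 1.2], [5.0, 5.0], [6.0, 5.0], [5.0, 6.0], [6.0, 6.0], [5.5, 5.5]]
    print(len(smote(spam, n_new=6, k=3)))
    print(adasyn_weights(spam, ham, k=3))
    print(len(adasyn(spam, ham, n_new=6, k=3)))
```

The only difference between the two sketches is how synthetic samples are allocated: SMOTE picks minority samples uniformly, while ADASYN concentrates generation on spam messages whose neighborhoods are dominated by legitimate ones. Resampling of this kind is typically applied to the training split only, after vectorization, so that synthetic spam never reaches the evaluation data and the reported detection and false-positive rates are not inflated.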

Published

2026-01-23

Issue

Vol. 7 No. 1 (2026)

Section

Articles

How to Cite

Rakesh Reddy VM. Handling Class Imbalance in SMS Spam Datasets Using Advanced Sampling Techniques. IJETCSIT [Internet]. 2026 Jan. 23 [cited 2026 Feb. 8];7(1):57-62. Available from: https://www.ijetcsit.org/index.php/ijetcsit/article/view/557