Query Optimization Using Machine Learning

Authors

  • Nagireddy Karri, Senior IT Administrator Database, Sherwin-Williams, USA.
  • Partha Sarathi Reddy Pedda Muntala, Software Developer, Cisco Systems, Inc., USA.

DOI:

https://doi.org/10.63282/3050-9246.IJETCSIT-V4I4P112

Keywords:

Query optimization, machine learning, neural cost models, reinforcement learning, join ordering, cost-based optimization, tail latency

Abstract

Traditional database optimizers rely on hand-crafted heuristics and cost models that assume predicate independence, uniform distributions, and stable runtime conditions. These assumptions often fail on contemporary, heterogeneous workloads (deep join trees, correlated attributes, UDFs, and elastic cloud resources), giving rise to cascading cardinality errors, poor plan choices, and inflated tail latencies. Machine learning (ML) offers a data-driven alternative. This paper surveys and synthesizes ML methods that enhance three optimizer layers: (i) predictive (learned cardinality estimators, neural cost models, plan-time latency predictors), (ii) decision (reinforcement learning for join ordering, operator selection, and knob tuning), and (iii) control (bandits for online adaptation, uncertainty-sensitive pruning, and rollback guardrails). It presents feature representations of SQL/ASTs, logical/physical plan DAGs, and operator-level sketches; contrasts offline training on historical logs with online continual learning; and studies robustness to distribution shift and drift. Empirical results on both analytical and mixed workloads indicate consistent double-digit reductions in end-to-end execution time and SLO violations, with low optimization overhead. The paper also discusses engineering concerns, including cold start, query-log privacy, interaction with concurrency control, and reproducibility, and outlines integration patterns ranging from advisory scoring to end-to-end learned optimizers. It concludes with a research agenda focused on uncertainty-calibrated planning, cross-database transfer, explainability, and standardized, tail-aware benchmarks.
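
To make the predicate-independence problem named in the abstract concrete, the following is a minimal sketch, not taken from the paper. It builds a synthetic table in which one column fully determines another and compares the row count an independence-based estimate would give with the true count. The table, the column names (`city`, `country`), and the function `estimate_vs_actual` are all illustrative assumptions.

```python
import random


def estimate_vs_actual(n_rows=10_000, seed=0):
    """Compare an independence-based cardinality estimate with the true count.

    Hypothetical setup: `city` fully determines `country`, so the predicates
    city = 'Paris' AND country = 'FR' are perfectly correlated.
    """
    rng = random.Random(seed)
    city_to_country = {"Paris": "FR", "Lyon": "FR", "Berlin": "DE", "Munich": "DE"}
    city_names = list(city_to_country)

    # Each row is a (city, country) pair drawn uniformly over the four cities.
    rows = []
    for _ in range(n_rows):
        city = rng.choice(city_names)
        rows.append((city, city_to_country[city]))

    # Per-predicate selectivities, as a simple optimizer would keep them.
    sel_city = sum(city == "Paris" for city, _ in rows) / n_rows
    sel_country = sum(country == "FR" for _, country in rows) / n_rows

    # Independence assumption: multiply the selectivities.
    independent_estimate = sel_city * sel_country * n_rows

    # Ground truth: rows satisfying both predicates.
    actual = sum(city == "Paris" and country == "FR" for city, country in rows)
    return independent_estimate, actual


if __name__ == "__main__":
    est, actual = estimate_vs_actual()
    print(f"independence estimate: {est:.0f} rows, actual: {actual} rows")
```

In this setup every `Paris` row already has `country = 'FR'`, so the second predicate filters nothing, yet the independence estimate is scaled down by its selectivity and comes out at roughly half the true count. Errors of this kind compound across joins, which is the gap that learned cardinality estimators aim to close.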

Published

2023-12-30

Section

Articles

How to Cite

1.
Karri N, Pedda Muntala PSR. Query Optimization Using Machine Learning. IJETCSIT [Internet]. 2023 Dec. 30 [cited 2025 Oct. 27];4(4):109-17. Available from: https://www.ijetcsit.org/index.php/ijetcsit/article/view/411
