Automated Hyperparameter Optimization for Deep Neural Networks Using Bayesian Optimization and Genetic Algorithms
DOI: https://doi.org/10.63282/3050-9246.IJETCSIT-V2I4P103

Keywords: Bayesian Optimization, Genetic Algorithm, Hyperparameter Optimization, Deep Neural Networks, Convergence Speed, Computational Cost, Validation Accuracy, Machine Learning, Neural Architecture Search, Optimization Techniques

Abstract
Deep Neural Networks (DNNs) have achieved remarkable success in various domains, including computer vision, natural language processing, and reinforcement learning. However, the performance of these models is highly dependent on the choice of hyperparameters, which are often set manually through trial and error. This process is time-consuming, resource-intensive, and requires significant expertise. To address this challenge, this paper explores the use of automated hyperparameter optimization (HPO) techniques, specifically Bayesian Optimization (BO) and Genetic Algorithms (GA), to improve the efficiency and effectiveness of hyperparameter tuning for DNNs. We provide a comprehensive review of the theoretical foundations of BO and GA, discuss their implementation in the context of DNNs, and evaluate their performance on a variety of benchmark datasets. Our results demonstrate that both BO and GA can significantly enhance the performance of DNNs, with BO generally outperforming GA in terms of convergence speed and final model performance. We also discuss the limitations and potential future directions for research in this area.
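To make the BO workflow concrete, the following is a minimal sketch of Bayesian hyperparameter optimization for a small neural network, assuming scikit-optimize (skopt) and scikit-learn are installed. The digits dataset, the three-dimensional search space, and the 25-evaluation budget are illustrative choices only and do not reflect the paper's experimental setup.

from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from skopt import gp_minimize
from skopt.space import Real, Integer
from skopt.utils import use_named_args

# Small benchmark dataset standing in for the paper's benchmarks (illustrative).
X, y = load_digits(return_X_y=True)

# Illustrative search space: learning rate, L2 penalty, hidden-layer width.
space = [
    Real(1e-4, 1e-1, prior="log-uniform", name="learning_rate_init"),
    Real(1e-6, 1e-2, prior="log-uniform", name="alpha"),
    Integer(16, 256, name="hidden_units"),
]

@use_named_args(space)
def objective(**params):
    # gp_minimize minimizes, so return negative cross-validated accuracy.
    model = MLPClassifier(
        hidden_layer_sizes=(int(params["hidden_units"]),),
        learning_rate_init=params["learning_rate_init"],
        alpha=params["alpha"],
        max_iter=200,
        random_state=0,
    )
    return -cross_val_score(model, X, y, cv=3, n_jobs=-1).mean()

# Gaussian-process surrogate with the expected-improvement acquisition
# function, 25 objective evaluations in total.
result = gp_minimize(objective, space, acq_func="EI", n_calls=25, random_state=0)
print("Best hyperparameters:", result.x)
print("Best CV accuracy:", -result.fun)

A GA-based alternative would evolve the same three-dimensional hyperparameter vector through selection, crossover, and mutation rather than fitting a probabilistic surrogate model, which is the trade-off the paper compares.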