Large Language Models for Automated Financial Code Generation and Documentation in Data Pipelines

Authors

  • Nihari Paladugu, Independent Financial Technology Researcher, Columbus, OH, USA

DOI:

https://doi.org/10.63282/3050-9246.IJETCSIT-V4I1P113

Keywords:

Large Language Models, financial code generation, data pipelines, regulatory compliance, automated documentation

Abstract

The growing complexity of financial data pipelines creates significant challenges for code development and regulatory compliance documentation. This paper presents a simulation-based evaluation of Large Language Models (LLMs) for automated financial code generation and documentation. Using controlled simulation environments, we assess the feasibility of applying LLMs to generate financial data processing code while maintaining regulatory compliance requirements. Our simulation framework employs synthetic financial datasets, standardized code generation benchmarks, and automated compliance validation. We implemented a controlled testing environment using GPT-4-based models fine-tuned on publicly available financial code repositories and regulatory documentation. The study evaluates code generation across multiple financial computing scenarios, including risk calculations, regulatory reporting, and market data processing. Controlled experiments on synthetic datasets yield promising results: 87.3% functional accuracy on code generation tasks, 91.2% compliance with regulatory code patterns, and significant potential for development efficiency improvements. The simulation generated 12,847 code components across these scenarios, providing insight into the feasibility and limitations of LLM-based financial code automation. This work provides a foundation for understanding the potential and challenges of applying modern AI techniques to financial software development through rigorous simulation-based evaluation.



Published

2023-03-30

Issue

Section

Articles

How to Cite

Paladugu N. Large Language Models for Automated Financial Code Generation and Documentation in Data Pipelines. IJETCSIT [Internet]. 2023 Mar. 30 [cited 2025 Sep. 25];4(1):112-23. Available from: https://www.ijetcsit.org/index.php/ijetcsit/article/view/372
