From Data Lakes to Visual Narratives: Harnessing Data Pipelines for Impactful Insights

Authors

  • Lalmohan Behera, Senior IEEE member and IETE member

DOI:

https://doi.org/10.63282/3050-9246.IJETCSIT-V4I1P106

Keywords:

Data Pipelines, Data Lakes, Data Visualization, Machine Learning, Artificial Intelligence, Big Data, Business Intelligence

Abstract

The use of data-driven technology has grown sharply in recent years, and with it the rate at which information is produced. The central problem is not storage space, which many of us have and will continue to grapple with; it is identifying useful information within the data. A data pipeline is the set of processes through which data is collected, processed, in some cases analyzed, and then visualized. This paper traces the progression from data lakes to visual narratives, with the data pipeline as the intermediary that turns stored data into data intelligence. It covers frameworks, approaches, and technologies for constructing effective data pipelines. It also presents examples of data analysis in real-world scenarios, issues that should be considered, and recommendations for achieving business value. These examples show how contextually informative metadata improves decision-making in healthcare, finance, and smart cities. The paper further acknowledges the need to apply machine learning (ML) and artificial intelligence (AI) to automate and streamline the data pipeline process. The findings indicate that effective use of well-built data pipelines is strategically crucial for turning raw data into visually engaging stories that support decision-making.

Published

2023-03-30

Issue

Section

Articles

How to Cite

1. Behera L. From Data Lakes to Visual Narratives: Harnessing Data Pipelines for Impactful Insights. IJETCSIT [Internet]. 2023 Mar. 30 [cited 2025 Sep. 13];4(1):44-52. Available from: https://www.ijetcsit.org/index.php/ijetcsit/article/view/322
