Architecting for Resilience: Designing Fault-Tolerant Systems in Multi-Cloud Environments
DOI: https://doi.org/10.63282/3050-9246.IJETCSIT-V5I2P112
Keywords: Multi-Cloud, Resilience, Fault Tolerance, High Availability, Redundancy, Cloud Architecture, Failover Systems, Disaster Recovery, Service Level Agreements (SLAs), Load Balancing, Auto-scaling, System Reliability
Abstract
In today's fast-changing digital landscape, system resilience is not optional; it is a necessity. As businesses rely more heavily on cloud infrastructure for core operations, the shift to multi-cloud environments is changing resilience and fault-tolerance strategies. This article investigates the need for fault-tolerant design to keep systems operating through unanticipated interruptions such as software failures, hardware breakdowns, or regional outages. Although multi-cloud architecture provides greater flexibility and redundancy, it also greatly complicates orchestration, interoperability, and consistent policy enforcement. The goal is to clarify the principles of building robust systems across multiple cloud platforms by offering practical best practices and methods that go beyond theoretical models. We examine the core elements that allow systems to recover, that is, to stay resilient: distributed data replication, automated failover techniques, observability, and proactive monitoring. Using case studies from businesses that have successfully addressed these challenges, the article examines real-world concerns such as vendor lock-in, latency management, and service compatibility. These findings not only support the recommended approaches but also show the specific benefits of building for resilience: improved uptime, greater user trust, and conformance with regulatory standards. Ultimately, fault-tolerance planning in a multi-cloud system calls for a holistic approach combining dynamic automation, careful design, and continuous testing. It is about thriving despite obstacles, not just overcoming them. For builders, engineers, and decision-makers trying to create systems that are both robust and flexible within an unpredictable cloud environment, this article serves as a practical road map.