AFP: An SLA-Aware Adaptive Freshness Protocol for Log Collection in Large-Scale Geographically Distributed Systems
DOI: https://doi.org/10.63282/3050-9246.IJETCSIT-V6I1P119

Keywords: Adaptive Freshness, SLA-Aware Log Collection, Geo-Distributed Systems, Per-Stream Optimization, Shared Logs, Graceful Degradation

Abstract
Anyone who has operated a large-scale distributed system across multiple datacenters knows the frustration: log data piles up at staggering rates, and moving it from its point of origin to a place where someone can query it is a constant exercise in tradeoffs. Today’s log pipelines handle this with a one-size-fits-all freshness model. If even a single consumer needs sub-second access to a log stream, the whole pipeline for that stream runs at full tilt: synchronous replication, eager ordering, and immediate indexing, regardless of whether the other 95% of downstream consumers would have been perfectly happy waiting thirty seconds. The waste adds up fast.
This paper introduces AFP (Adaptive Freshness Protocol), a protocol that rethinks this assumption. AFP lets each consumer declare its own freshness SLA, then dynamically tunes the pipeline's replication mode, ordering strategy, and indexing priority on a per-stream basis to meet the tightest active SLA at the lowest possible cost. When the most demanding consumer disconnects, the pipeline relaxes on its own. We formalize the problem as a constrained optimization over composable stage-latency functions, show it can be solved greedily in O(S log S) time per scheduling epoch, and introduce a degradation policy for WAN partitions that prioritizes critical streams while guaranteeing zero data loss. Under a workload mix we believe is representative of production environments (5% critical, 20% interactive, 75% batch consumers), our analysis indicates AFP cuts cross-region bandwidth by 58%, indexing CPU by 49%, and ordering overhead by 66% compared to uniform provisioning, all while keeping 99.7% of SLA contracts satisfied.
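
To make the selection step concrete, the following Python sketch illustrates the per-stream greedy choice the abstract describes: pick the cheapest pipeline configuration whose latency bound still meets the tightest SLA among a stream's currently connected consumers, and relax automatically when that consumer departs. The names (PipelineTier, choose_tier) and the three illustrative tiers are assumptions made for exposition, not the paper's API; AFP itself optimizes over composable stage-latency functions rather than a fixed tier menu.

    # Hypothetical sketch of per-stream greedy tier selection under AFP's model.
    # Tier names, latency bounds, and costs are illustrative assumptions only.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class PipelineTier:
        name: str               # e.g. "sync-eager" vs. "async-batch"
        latency_bound_s: float  # worst-case end-to-end freshness this tier delivers
        cost: float             # relative resource cost (bandwidth + CPU)

    # Menu of pipeline configurations, sorted cheapest-first.
    TIERS = sorted([
        PipelineTier("async-batch",   30.0, 1.0),  # lazy replication, deferred indexing
        PipelineTier("async-ordered",  5.0, 2.5),  # eager ordering, lazy indexing
        PipelineTier("sync-eager",     0.5, 6.0),  # sync replication, immediate indexing
    ], key=lambda t: t.cost)

    def choose_tier(consumer_slas_s: list[float]) -> PipelineTier:
        """Return the cheapest tier meeting the tightest active SLA on a stream."""
        tightest = min(consumer_slas_s)  # most demanding connected consumer
        for tier in TIERS:               # cheapest-first scan
            if tier.latency_bound_s <= tightest:
                return tier
        raise ValueError(f"no tier satisfies an SLA of {tightest}s")

    # A sub-second consumer forces the expensive tier; once it disconnects,
    # the stream relaxes on its own, as the abstract describes.
    print(choose_tier([0.5, 30.0, 30.0]).name)  # -> sync-eager
    print(choose_tier([30.0, 30.0]).name)       # -> async-batch

Sorting the tier menu once and scanning it cheapest-first per stream is what gives the greedy procedure its O(S log S) per-epoch flavor when applied across S streams, under the assumption that per-stream stage costs compose independently.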
