AFP: An SLA-Aware Adaptive Freshness Protocol for Log Collection in Large-Scale Geographically Distributed Systems
DOI: https://doi.org/10.63282/3050-9246.IJETCSIT-V6I1P119

Keywords: Adaptive Freshness, SLA-Aware Log Collection, Geo-Distributed Systems, Per-Stream Optimization, Shared Logs, Graceful Degradation

Abstract
Anyone who has operated a large-scale distributed system across multiple datacenters knows the frustration: log data piles up at staggering rates, and moving it from its point of origin to a place where someone can query it is a constant exercise in tradeoffs. Today’s log pipelines handle this with a one-size-fits-all freshness model. If even a single consumer needs sub-second access to a log stream, the whole pipeline for that stream runs at full tilt: synchronous replication, eager ordering, and immediate indexing, regardless of whether the other 95% of downstream consumers would have been perfectly happy waiting thirty seconds. The waste adds up fast.
This paper introduces AFP (Adaptive Freshness Protocol), a protocol that rethinks this assumption. AFP lets each consumer declare its own freshness SLA, then dynamically tunes the pipeline's replication mode, ordering strategy, and indexing priority on a per-stream basis to meet the tightest active SLA at the lowest possible cost. When the most demanding consumer disconnects, the pipeline relaxes on its own. We formalize the problem as a constrained optimization over composable stage-latency functions, show it can be solved greedily in O(S log S) time per scheduling epoch, and introduce a degradation policy for WAN partitions that prioritizes critical streams while guaranteeing zero data loss. Under a workload mix we believe is representative of production environments (5% critical, 20% interactive, 75% batch consumers), our analysis indicates AFP cuts cross-region bandwidth by 58%, indexing CPU by 49%, and ordering overhead by 66% compared to uniform provisioning, all while keeping 99.7% of SLA contracts satisfied.
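
To make the selection step concrete, the following Python sketch illustrates the per-stream greedy choice the abstract describes: pick the cheapest pipeline configuration whose latency bound still meets the tightest SLA among a stream's currently connected consumers, and relax automatically when that consumer departs. The names (PipelineTier, choose_tier) and the three illustrative tiers are assumptions made for exposition, not the paper's API; AFP itself optimizes over composable stage-latency functions rather than a fixed tier menu.

    # Hypothetical sketch of per-stream greedy tier selection under AFP's model.
    # Tier names, latency bounds, and costs are illustrative assumptions only.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class PipelineTier:
        name: str               # e.g. "sync-eager" vs. "async-batch"
        latency_bound_s: float  # worst-case end-to-end freshness this tier delivers
        cost: float             # relative resource cost (bandwidth + CPU)

    # Menu of pipeline configurations, sorted cheapest-first.
    TIERS = sorted([
        PipelineTier("async-batch",   30.0, 1.0),  # lazy replication, deferred indexing
        PipelineTier("async-ordered",  5.0, 2.5),  # eager ordering, lazy indexing
        PipelineTier("sync-eager",     0.5, 6.0),  # sync replication, immediate indexing
    ], key=lambda t: t.cost)

    def choose_tier(consumer_slas_s: list[float]) -> PipelineTier:
        """Return the cheapest tier meeting the tightest active SLA on a stream."""
        tightest = min(consumer_slas_s)  # most demanding connected consumer
        for tier in TIERS:               # cheapest-first scan
            if tier.latency_bound_s <= tightest:
                return tier
        raise ValueError(f"no tier satisfies an SLA of {tightest}s")

    # A sub-second consumer forces the expensive tier; once it disconnects,
    # the stream relaxes on its own, as the abstract describes.
    print(choose_tier([0.5, 30.0, 30.0]).name)  # -> sync-eager
    print(choose_tier([30.0, 30.0]).name)       # -> async-batch

Sorting the tier menu once and scanning it cheapest-first per stream is what gives the greedy procedure its O(S log S) per-epoch flavor when applied across S streams, under the assumption that per-stream stage costs compose independently.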
