Oreoluwa Omoike

Work place: Department of Computer Science, Olabisi Onabanjo University, Ogun State, Nigeria

E-mail: oreisreal@gmail.com

Website: https://orcid.org/0009-0001-2131-9518


Biography

Oreoluwa Omoike was born in Nigeria. She earned her degree in Computer Science at Olabisi Onabanjo
University, Ogun State, Nigeria, where she is currently affiliated with the Department of Computer Science.
She works in the areas of cloud-native distributed systems, site reliability engineering (SRE), observability
frameworks, DevOps automation, and the application of statistical causal inference to real-time system
reliability problems. Her methodological research focuses on the adaptation of Granger vector autoregression,
Bayesian probabilistic graphical models, and information-theoretic causality measures to Kubernetes-based
microservice environments. A secondary research direction investigates the convergence of reliability
observability and cloud-native cybersecurity anomaly detection.
Mrs. Omoike is a member of the computer science research community.

Author Articles
Implementing Causal Observability for Practical Site Reliability Engineering in Cloud-Native Distributed Systems

By Oreoluwa Omoike

DOI: https://doi.org/10.5815/ijwmt.2026.02.02, Pub. Date: 8 Apr. 2026

This paper presents a Causal Observability Framework designed to enhance the reliability and performance of cloud-native distributed systems through structured integration with the DevOps pipeline. The framework unifies three interdependent components: real-time telemetry collection, dual-domain causal tracing, and probabilistic causal inference. The causal tracing layer combines a time-domain vector autoregressive Granger causality model with a frequency-domain extension based on the discrete Fourier transform. The causal inference layer employs Bayesian network propagation, updated online via the Expectation-Maximisation algorithm, to compute posterior probabilities of downstream failure from observed upstream anomalies. Validation was conducted through a controlled, three-replicate experimental study of a seven-service AI-powered recommendation application deployed on a dual-provider, six-node Kubernetes cluster (AWS EKS and GCP GKE) under three traffic profiles ranging from 50 to 500 requests per second. Against a conventional threshold-based monitoring baseline, the proposed framework achieved a 35% reduction in incident response time (70 minutes to 45 minutes), a 40% reduction in mean time to recovery (50 minutes to 30 minutes), a 1.5 percentage-point improvement in system availability (98.0% to 99.5%), a 61% reduction in the false-positive alert rate (18% to 7%), and a 63% relative improvement in root-cause localisation accuracy (54% to 88%, i.e. 34 percentage points). All five improvements were statistically significant at p < 0.05 (paired t-test). In the fault-injection scenario, the framework provided a nine-minute early-warning lead time over conventional detection. Seven formal equations underpin the methodology, covering Granger vector autoregression, F-test inference, AIC-based lag selection, normalised causality scoring, frequency-domain spectral causality, Bayesian posterior propagation, and expected detection lead time.
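The abstract names the time-domain Granger components (vector autoregression, F-test inference, AIC-based lag selection) but does not reproduce the equations. The following is a minimal bivariate sketch of how such a test is commonly built, assuming a standard restricted versus unrestricted OLS formulation. It is not the author's implementation. The function names, the synthetic "upstream"/"downstream" series and all parameters are hypothetical, and the frequency-domain extension, normalised scoring and Bayesian layer are not shown.

```python
import numpy as np


def lagged_design(y, x, p, start):
    """Build OLS regressors for y_t from p lags of y (restricted) and p lags of y and x (unrestricted).

    Rows run from t = start to len(y) - 1, so models with different lag
    orders can share one sample when start is fixed (needed for AIC).
    """
    rows = range(start, len(y))
    own = np.array([[y[t - k] for k in range(1, p + 1)] for t in rows])
    cross = np.array([[x[t - k] for k in range(1, p + 1)] for t in rows])
    ones = np.ones((len(rows), 1))  # intercept column
    target = np.asarray(y[start:])
    return target, np.hstack([ones, own]), np.hstack([ones, own, cross])


def rss(target, design):
    """Residual sum of squares of an ordinary least-squares fit."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    return float(resid @ resid)


def select_lag_aic(y, x, max_lag):
    """Pick the lag order minimising AIC = n*ln(RSS/n) + 2k of the unrestricted model."""
    best_p, best_aic = 1, np.inf
    for p in range(1, max_lag + 1):
        target, _, full = lagged_design(y, x, p, start=max_lag)
        n, k = full.shape
        aic = n * np.log(rss(target, full) / n) + 2 * k
        if aic < best_aic:
            best_p, best_aic = p, aic
    return best_p


def granger_f(y, x, p):
    """F statistic for H0: the p lags of x add no predictive power for y."""
    target, restricted, full = lagged_design(y, x, p, start=p)
    rss_r, rss_u = rss(target, restricted), rss(target, full)
    n, k_u = full.shape  # k_u = 2p + 1 parameters
    return ((rss_r - rss_u) / p) / (rss_u / (n - k_u))


# Hypothetical example: an upstream metric x drives a downstream metric y with lag 1.
rng = np.random.default_rng(0)
n_obs = 500
x = rng.normal(size=n_obs)
y = np.zeros(n_obs)
for t in range(1, n_obs):
    y[t] = 0.8 * x[t - 1] + 0.3 * rng.normal()

p = select_lag_aic(y, x, max_lag=5)
f_xy = granger_f(y, x, p)  # does x Granger-cause y?
f_yx = granger_f(x, y, p)  # does y Granger-cause x?
print(p, round(f_xy, 1), round(f_yx, 2))
```

In this synthetic case the x-to-y statistic should be very large and the reverse statistic close to 1. If SciPy is available, `scipy.stats.f.sf(F, p, n - 2p - 1)` converts a statistic into a p-value. Fixing the sample start at `max_lag` during lag selection keeps the AIC values comparable across orders.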

Other Articles