Cost-Aware LLM-Style Routing for AIOps Log Analysis: Log Parsing, Anomaly Detection, Fault Diagnosis, and Incident Summarization on LogEval Task Files

Chenyu Li; Ge Liu; Zoe Zhao

doi:10.51903/jtie.v5i2.538

Authors

Chenyu Li, Applied Analytics, Columbia University, NY, USA
Ge Liu, Computer Science, USC, CA, USA
Zoe Zhao, Computer Science, UCSD, CA, USA

Keywords:

AIOps, log analysis, anomaly detection, fault diagnosis, incident summarization, cost-aware inference, log parsing

Abstract

This study investigates a local, cost-aware routing framework for AIOps log analysis using the LogEval benchmark. The evaluation covers four tasks: log parsing, anomaly detection, fault diagnosis, and incident summarization. Rather than calling external large language model (LLM) APIs, the experiments use deterministic local policies that simulate zero-shot and few-shot LLM-style inference under controlled token-cost and latency assumptions. Six approaches were compared: regex normalization, TF-IDF with machine learning, a local character-based classifier, a zero-shot policy, a few-shot retrieval policy, and a routing cascade. At a risk threshold of 0.20, the router directed only 12.9% of queries to the few-shot retrieval policy while achieving a parsing accuracy of 0.991, an anomaly F1-score of 1.000, a diagnosis accuracy of 1.000, a ROUGE-L of 0.743, and a BLEU-1 of 0.814. Relative to always using few-shot retrieval, the routing strategy reduced simulated token cost by 80.1%. An additional unseen-template evaluation revealed limited generalization for closed-label classifiers and retrieval methods when they encounter unseen patterns. The findings indicate that routing can effectively reduce AIOps inference costs, but further validation with real LLMs and stronger generalization testing are required before production deployment.
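The routing cascade described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the risk function, the vocabulary, the per-query token costs, and the rule that a query escalates when its risk exceeds the threshold are all assumptions made for illustration. Only the 0.20 threshold and the idea of comparing against an always-few-shot baseline come from the abstract.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class Policy:
    """A local policy with a simulated per-query token cost (hypothetical values)."""
    name: str
    predict: Callable[[str], str]
    tokens_per_query: int


def route(query: str, risk_fn: Callable[[str], float],
          cheap: Policy, expensive: Policy,
          threshold: float = 0.20) -> Tuple[str, Policy]:
    """Send the query to the expensive policy only if its risk exceeds the threshold.

    Assumption: "risk above threshold escalates"; the abstract does not state the rule.
    """
    policy = expensive if risk_fn(query) > threshold else cheap
    return policy.predict(query), policy


def evaluate_routing(queries: Sequence[str], risk_fn: Callable[[str], float],
                     cheap: Policy, expensive: Policy,
                     threshold: float = 0.20) -> Tuple[float, float]:
    """Return (share of escalated queries, token saving vs. always using `expensive`)."""
    escalated = 0
    routed_tokens = 0
    for q in queries:
        _, policy = route(q, risk_fn, cheap, expensive, threshold)
        routed_tokens += policy.tokens_per_query
        escalated += policy is expensive
    baseline_tokens = len(queries) * expensive.tokens_per_query
    return escalated / len(queries), 1.0 - routed_tokens / baseline_tokens


# Hypothetical risk score: share of tokens not found in a small known vocabulary.
KNOWN = frozenset({"connection", "established", "to", "host"})


def toy_risk(query: str) -> float:
    tokens = query.lower().split()
    if not tokens:
        return 1.0
    return sum(t not in KNOWN for t in tokens) / len(tokens)


# Placeholder policies; the predictions and token costs are invented for the demo.
cheap = Policy("local-classifier", lambda q: "normal", 100)
few_shot = Policy("few-shot-retrieval", lambda q: "anomalous", 1000)

queries = ["connection established to host"] * 9 + ["kernel panic unrecoverable fault"]
share, saving = evaluate_routing(queries, toy_risk, cheap, few_shot)
print(f"escalated share: {share:.1%}, token saving: {saving:.1%}")
# prints "escalated share: 10.0%, token saving: 81.0%"
```

In this toy setup the saving equals 1 - (p + (1 - p) * r), where p is the escalated share and r is the ratio of cheap to expensive per-query cost, so the saving depends as much on the assumed cost ratio as on the threshold. The numbers printed here are properties of the invented inputs and do not reproduce the reported 12.9% and 80.1%.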
