Multi-Horizon GPU Demand Forecasting with Workload Semantics and Operational Risk Curves: An Empirical Study on Alibaba Clusterdata GPU Trace

Authors

  • Siming Zhao, Business Analytics, Columbia University, NY, USA
  • Jingwen Bai, Data Science, Columbia University, NY, USA
  • Drew Roberson, Computer Science, Clemson University, SC, USA

DOI:

https://doi.org/10.51903/jtie.v4i3.498

Keywords:

GPU clusters, capacity planning, multi-horizon forecasting, workload semantics

Abstract

This study addresses the operational challenge of multi-horizon GPU demand forecasting in large-scale computing clusters, where GPUs are costly resources and demand fluctuates under constraint-driven scheduling. The objective is to evaluate whether integrating workload semantics improves forecasting accuracy at horizons of up to 72 hours. A reproducible empirical benchmark is built on the Alibaba Clusterdata GPU trace (cluster-trace-gpu-v2023), which comprises 8,152 pods spanning approximately 149 days on a cluster with a total capacity of 6,212 GPUs. The study compares two statistical baselines, ARIMA(48,0,0) and a seasonal-trend additive model, with three lightweight deep learning models: a Temporal Convolutional Network (TCN), Informer-lite, and TFT-lite. Workload semantics are approximated by converting hourly job metadata into textual summaries, embedding the summaries with TF-IDF followed by truncated SVD to 8 dimensions, and supplying the embeddings as exogenous covariates. Evaluation uses the symmetric mean absolute percentage error (sMAPE) and the mean absolute scaled error (MASE) at horizons from 1 to 72 hours, complemented by peak-aware metrics and operational risk curves. The seasonal-trend model achieves the best overall accuracy (15.34% sMAPE), and TCN is the strongest deep model (17.20% sMAPE). Semantic embeddings do not improve accuracy at short horizons (1–48 hours) but reduce 72-hour sMAPE by 11.1% and lower peak-window error. These findings indicate that autoregressive signals dominate short-term forecasting, whereas semantic context becomes beneficial at longer horizons. The study argues that combining point accuracy with risk-based evaluation is essential for GPU capacity planning under dynamic and uncertain demand.
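The abstract does not spell out how the pod-level trace becomes an hourly demand series. The sketch below shows one plausible aggregation, assuming each pod contributes its requested GPUs for every second between its start and end times, so the value for hour h is the time-averaged number of GPUs requested in that hour. The function name and the (start, end, gpus) tuple layout are hypothetical; mapping trace columns onto them (for example `creation_time`, `deletion_time`, and `num_gpu`, with fractional requests derived from `gpu_milli`) is an assumption to verify against the trace documentation.

```python
from collections import defaultdict

SECONDS_PER_HOUR = 3600


def hourly_gpu_demand(pods, n_hours=None):
    """Time-averaged GPUs requested in each hour (hypothetical helper).

    pods: iterable of (start_s, end_s, gpus), times in seconds from trace start.
    Pods without a valid end time should be filtered or clipped beforehand.
    """
    gpu_seconds = defaultdict(float)
    last_hour = 0
    for start, end, gpus in pods:
        if end <= start or gpus <= 0:
            continue  # skip malformed or GPU-less pods
        h = int(start // SECONDS_PER_HOUR)
        while h * SECONDS_PER_HOUR < end:
            lo = max(start, h * SECONDS_PER_HOUR)
            hi = min(end, (h + 1) * SECONDS_PER_HOUR)
            gpu_seconds[h] += gpus * (hi - lo)  # GPU-seconds overlapping hour h
            h += 1
        last_hour = max(last_hour, h)
    n = last_hour if n_hours is None else n_hours
    return [gpu_seconds.get(h, 0.0) / SECONDS_PER_HOUR for h in range(n)]
```

For example, a 2-GPU pod running from 0 s to 5,400 s plus a 4-GPU pod running from 1,800 s to 3,600 s yields an average of 4 GPUs in hour 0 and 1 GPU in hour 1. Dividing each hourly value by the 6,212-GPU capacity turns the series into an allocation ratio.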
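Workload semantics are described as hourly textual summaries embedded with TF-IDF and truncated SVD to 8 dimensions. A minimal NumPy sketch of that pipeline follows. The summary template in `summarize_hour` and its keys (`gpu_spec`, `qos`, `num_gpu`) are hypothetical. Apart from tokenization (plain whitespace splitting here), the smoothed IDF and L2 row normalization follow scikit-learn's `TfidfVectorizer` defaults, and the projection matches what `TruncatedSVD(n_components=8)` estimates, up to the sign of each component. Fitting on training hours only and projecting later hours with `embed` keeps test-period statistics out of the features.

```python
from collections import Counter

import numpy as np


def summarize_hour(jobs):
    """Hypothetical text summary of one hour of job metadata (assumed keys)."""
    if not jobs:
        return "idle"
    words = []
    for job in jobs:
        size = "multigpu" if job["num_gpu"] > 1 else "singlegpu"
        words.append(f"{job['gpu_spec']} {job['qos']} {size}")
    return " ".join(words).lower()


def _tf_matrix(docs, index):
    tf = np.zeros((len(docs), len(index)))
    for r, doc in enumerate(docs):
        for tok, count in Counter(doc.split()).items():
            if tok in index:  # tokens unseen during fitting are ignored
                tf[r, index[tok]] = count
    return tf


def _l2_rows(x):
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.where(norms == 0, 1.0, norms)


def fit_tfidf_svd(train_docs, k=8):
    """Fit vocabulary, smoothed IDF, and the top-k right singular vectors."""
    vocab = sorted({tok for doc in train_docs for tok in doc.split()})
    index = {tok: i for i, tok in enumerate(vocab)}
    tf = _tf_matrix(train_docs, index)
    n = len(train_docs)
    df = (tf > 0).sum(axis=0)
    idf = np.log((1 + n) / (1 + df)) + 1.0  # smoothed IDF
    _, _, vt = np.linalg.svd(_l2_rows(tf * idf), full_matrices=False)
    return index, idf, vt[:k]


def embed(docs, index, idf, components):
    """Project TF-IDF rows onto the fitted components: (n_docs, k)."""
    return _l2_rows(_tf_matrix(docs, index) * idf) @ components.T
```

With a realistic number of training hours and a richer vocabulary, `components` has 8 rows and each hour maps to an 8-dimensional covariate vector; on tiny examples k is capped by the matrix rank.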
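Using the embeddings as exogenous covariates requires aligning them with the demand history. One leakage-safe layout for direct multi-horizon training is sketched below: each sample pairs the demand and covariates of the `lookback` hours before a forecast origin with the demand of the next `horizon` hours, so no covariate from after the origin enters the inputs. The function name and the 48-hour lookback default (chosen only to match the ARIMA(48,0,0) lag order) are assumptions, not settings reported in the abstract.

```python
import numpy as np


def make_windows(y, covariates, lookback=48, horizon=72):
    """Build (inputs, targets) for direct multi-horizon forecasting.

    y: (T,) hourly demand; covariates: (T,) or (T, d), e.g. semantic embeddings.
    inputs: (N, lookback, 1 + d); targets: (N, horizon).
    """
    y = np.asarray(y, dtype=float)
    z = np.asarray(covariates, dtype=float).reshape(len(y), -1)
    if len(y) < lookback + horizon:
        raise ValueError("series too short for the requested lookback and horizon")
    feats = np.column_stack([y, z])  # demand plus covariates for each hour
    inputs, targets = [], []
    for origin in range(lookback, len(y) - horizon + 1):
        inputs.append(feats[origin - lookback:origin])  # hours before the origin
        targets.append(y[origin:origin + horizon])      # hours origin .. origin+horizon-1
    return np.stack(inputs), np.stack(targets)
```

Column h-1 of the targets is the lead-h demand, which lines up directly with per-horizon error reporting.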
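The ARIMA(48,0,0) baseline is an autoregression on the previous 48 hourly values with a constant. As a dependency-light sketch, the code below estimates the coefficients by ordinary least squares and produces multi-step forecasts recursively, feeding each prediction back as a lag. This is an assumed stand-in rather than the study's implementation: statsmodels' `ARIMA(y, order=(48, 0, 0)).fit()` estimates the same model by maximum likelihood, so its coefficients will differ somewhat.

```python
import numpy as np


def fit_ar(y, p=48):
    """OLS estimate of y_t = c + sum_{i=1..p} phi_i * y_{t-i}."""
    y = np.asarray(y, dtype=float)
    # each row: [1, y_{t-1}, ..., y_{t-p}] for t = p .. T-1
    rows = np.array([np.r_[1.0, y[t - p:t][::-1]] for t in range(p, len(y))])
    coef, *_ = np.linalg.lstsq(rows, y[p:], rcond=None)
    return coef  # [c, phi_1, ..., phi_p]


def forecast_ar(y, coef, horizon):
    """Recursive multi-step forecast from the end of y."""
    p = len(coef) - 1
    history = list(np.asarray(y, dtype=float)[-p:])
    preds = []
    for _ in range(horizon):
        lags = np.array(history[-p:][::-1])  # most recent value first
        yhat = coef[0] + float(coef[1:] @ lags)
        preds.append(yhat)
        history.append(yhat)
    return np.array(preds)
```

Recursive forecasting compounds errors as the horizon grows, which is one reason a pure autoregression can be competitive at 1 hour yet fall behind at 72 hours.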
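The abstract does not specify the seasonal-trend additive model further. A simplified version in the spirit of additive decomposition is a linear trend plus Fourier terms for daily (24 h) and weekly (168 h) cycles, fitted by least squares and extrapolated over the forecast horizon. The periods, the three harmonics per period, and the absence of changepoints or holiday effects are all assumptions of this sketch.

```python
import numpy as np


def _design(t, periods=(24.0, 168.0), harmonics=3):
    """Columns: intercept, linear trend, sin/cos pairs for each period and harmonic."""
    cols = [np.ones_like(t), t]
    for period in periods:
        for k in range(1, harmonics + 1):
            w = 2.0 * np.pi * k * t / period
            cols += [np.sin(w), np.cos(w)]
    return np.column_stack(cols)


def seasonal_trend_forecast(y, horizon, periods=(24.0, 168.0), harmonics=3):
    """Fit trend + Fourier seasonality on y and extrapolate `horizon` hours ahead."""
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y), dtype=float)
    coef, *_ = np.linalg.lstsq(_design(t, periods, harmonics), y, rcond=None)
    t_future = np.arange(len(y), len(y) + horizon, dtype=float)
    return _design(t_future, periods, harmonics) @ coef
```

Because the forecast depends only on the hour index and not on recent values, its error does not compound with the horizon, which is consistent with such a model holding up well across 1 to 72 hours.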
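The Temporal Convolutional Network is built from causal, dilated 1-D convolutions. The NumPy sketch below shows the core operation and the receptive-field arithmetic for a stack whose dilation doubles at each level with two convolutions per residual block, as in the generic TCN design. The kernel size, depth, and weight-ordering convention are illustrative assumptions, not the study's configuration.

```python
import numpy as np


def causal_dilated_conv(x, weights, dilation=1):
    """out[t] = sum_i weights[i] * x[t - i * dilation], zero-padded on the left.

    Causal: out[t] depends only on x[0..t], never on future hours.
    """
    x = np.asarray(x, dtype=float)
    pad = (len(weights) - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])
    out = np.zeros_like(x)
    for i, w in enumerate(weights):
        shift = i * dilation
        out += w * xp[pad - shift:pad - shift + len(x)]  # x shifted back by `shift`
    return out


def tcn_receptive_field(kernel_size, n_levels, convs_per_block=2):
    """Past steps visible to the top layer with dilations 1, 2, ..., 2**(n_levels-1)."""
    return 1 + convs_per_block * (kernel_size - 1) * (2 ** n_levels - 1)
```

For example, kernel size 3 with six levels (dilations 1 to 32) sees 1 + 2*2*63 = 253 past hours, enough to cover a full week of hourly history.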
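sMAPE and MASE are reported per forecast horizon. The sketch below uses the common M4-style definitions: sMAPE = (100/n) Σ 2|F − A| / (|A| + |F|), and MASE divides the forecast MAE by the in-sample MAE of a seasonal naive forecast with period m. The choice m = 24, the treatment of zero denominators, and the (origins, H) layout expected by `per_horizon` are assumptions of this sketch.

```python
import numpy as np


def smape(actual, forecast):
    """Symmetric MAPE in percent; terms with |A| + |F| = 0 count as zero error."""
    a = np.asarray(actual, dtype=float)
    f = np.asarray(forecast, dtype=float)
    denom = np.abs(a) + np.abs(f)
    ratio = np.where(denom == 0, 0.0, 2.0 * np.abs(f - a) / np.where(denom == 0, 1.0, denom))
    return 100.0 * ratio.mean()


def mase(actual, forecast, train, m=24):
    """Forecast MAE scaled by the in-sample MAE of a seasonal naive forecast."""
    train = np.asarray(train, dtype=float)
    scale = np.mean(np.abs(train[m:] - train[:-m]))  # undefined if train is constant
    err = np.abs(np.asarray(actual, dtype=float) - np.asarray(forecast, dtype=float))
    return err.mean() / scale


def per_horizon(actuals, forecasts, metric):
    """actuals, forecasts: (n_origins, H); element h-1 of the result is the lead-h score."""
    A = np.asarray(actuals, dtype=float)
    F = np.asarray(forecasts, dtype=float)
    return np.array([metric(A[:, h], F[:, h]) for h in range(A.shape[1])])
```

For instance, an actual of 100 GPUs forecast as 110 gives an sMAPE of 2 x 10 / 210 x 100 ≈ 9.52%. MASE is passed to `per_horizon` with the training series bound in, e.g. `lambda a, f: mase(a, f, train)`.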
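The abstract names peak-aware metrics and operational risk curves without defining them. One plausible reading, sketched below purely as an assumption, scores error only on hours whose actual demand is at or above a high percentile, and traces a risk curve by provisioning the forecast plus a headroom fraction (capped at the 6,212-GPU capacity), then recording for each headroom the share of hours with a shortfall and the idle GPU-hours it costs. The function names, the 90th-percentile default, and the multiplicative headroom rule are hypothetical.

```python
import numpy as np


def peak_mask(actual, q=90.0):
    """Hours whose actual demand is at or above the q-th percentile."""
    a = np.asarray(actual, dtype=float)
    return a >= np.percentile(a, q)


def peak_mae(actual, forecast, q=90.0):
    """Mean absolute error restricted to peak hours."""
    a = np.asarray(actual, dtype=float)
    f = np.asarray(forecast, dtype=float)
    m = peak_mask(a, q)
    return np.mean(np.abs(a[m] - f[m]))


def risk_curve(actual, forecast, headrooms, capacity=6212.0):
    """(headroom, shortfall rate, idle GPU-hours) for each headroom fraction."""
    a = np.asarray(actual, dtype=float)
    f = np.asarray(forecast, dtype=float)
    curve = []
    for r in headrooms:
        provisioned = np.minimum(f * (1.0 + r), capacity)
        shortfall_rate = np.mean(a > provisioned)  # share of under-provisioned hours
        idle_gpu_hours = np.sum(np.maximum(provisioned - a, 0.0))
        curve.append((r, shortfall_rate, idle_gpu_hours))
    return curve
```

Plotting shortfall rate against idle GPU-hours across headrooms gives a trade-off curve; a model whose curve lies lower and further left needs less spare capacity for the same level of risk, which a single sMAPE figure does not reveal.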

Published

2025-12-20

How to Cite

Zhao, S., Bai, J., & Roberson, D. (2025). Multi-Horizon GPU Demand Forecasting with Workload Semantics and Operational Risk Curves: An Empirical Study on Alibaba Clusterdata GPU Trace. Journal of Technology Informatics and Engineering, 4(3), 544–571. https://doi.org/10.51903/jtie.v4i3.498