Multi-Horizon GPU Demand Forecasting with Workload Semantics and Operational Risk Curves: An Empirical Study on Alibaba Clusterdata GPU Trace
DOI:
https://doi.org/10.51903/jtie.v4i3.498
Keywords:
GPU clusters, capacity planning, multi-horizon forecasting, workload semantics
Abstract
This study addresses the operational challenge of multi-horizon GPU demand forecasting in large-scale computing clusters, where GPUs are costly resources and demand fluctuates under constraint-driven scheduling. The objective is to evaluate whether integrating workload semantics improves forecasting performance at horizons of up to 72 hours. A reproducible empirical benchmark is developed using the Alibaba Clusterdata GPU trace (cluster-trace-gpu-v2023), comprising 8,152 pods over approximately 149 days with a total capacity of 6,212 GPUs. The study compares two statistical baselines, ARIMA(48,0,0) and a seasonal-trend additive model, with three lightweight deep learning models: Temporal Convolutional Network (TCN), Informer-lite, and TFT-lite. Workload semantics are approximated by converting hourly job metadata into textual summaries, embedding them with TF-IDF followed by truncated SVD (8 dimensions), and supplying the embeddings to the models as exogenous covariates. Evaluation uses sMAPE and MASE across multiple horizons (1–72 hours), along with peak-aware metrics and operational risk curves. Results show that the seasonal-trend model achieves the best overall accuracy (15.34% sMAPE), while TCN is the strongest deep model (17.20% sMAPE). Semantic embeddings do not improve accuracy at short horizons (1–48 hours) but reduce 72-hour sMAPE by 11.1% and improve peak-window error. These findings indicate that autoregressive signals dominate short-term forecasting, whereas semantic context becomes beneficial at longer horizons. The study concludes that effective GPU capacity planning under dynamic and uncertain demand requires combining point accuracy with risk-based evaluation.
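As a reading aid for the two point-accuracy metrics named in the abstract, the following is a minimal sketch of sMAPE and MASE in plain Python. It is not the authors' evaluation code. The abstract does not state which sMAPE variant is used, so the common form with the mean of absolute actual and forecast in the denominator is assumed here. The MASE scaling by an in-sample seasonal naive forecast with period `m` follows Hyndman and Koehler (2006). The function names and the sample numbers are illustrative only.

```python
from typing import Sequence


def smape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Symmetric MAPE in percent (assumed variant: denominator is the
    mean of |actual| and |forecast|). Pairs where both are zero add 0."""
    if len(actual) == 0 or len(actual) != len(forecast):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for a, f in zip(actual, forecast):
        denom = (abs(a) + abs(f)) / 2.0
        total += 0.0 if denom == 0 else abs(a - f) / denom
    return 100.0 * total / len(actual)


def mase(
    actual: Sequence[float],
    forecast: Sequence[float],
    train: Sequence[float],
    m: int = 1,
) -> float:
    """Forecast MAE divided by the in-sample MAE of a seasonal naive
    forecast with period m (m=1 is the plain naive forecast)."""
    if len(actual) == 0 or len(actual) != len(forecast):
        raise ValueError("inputs must be non-empty and of equal length")
    if len(train) <= m:
        raise ValueError("training series must be longer than the period m")
    scale = sum(abs(train[i] - train[i - m]) for i in range(m, len(train)))
    scale /= len(train) - m
    if scale == 0:
        raise ValueError("naive in-sample error is zero; MASE is undefined")
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    return mae / scale


if __name__ == "__main__":
    # Made-up hourly GPU demand values, for illustration only.
    history = [10.0, 12.0, 14.0, 16.0]
    observed = [18.0, 20.0]
    predicted = [17.0, 22.0]
    print(f"sMAPE: {smape(observed, predicted):.2f}%")
    print(f"MASE:  {mase(observed, predicted, history, m=1):.2f}")
```

A MASE below 1 means the forecast beats the naive baseline on average. For hourly demand with a daily cycle, a period of `m=24` would be a natural choice, though the abstract does not say which period the study uses.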
License
Copyright (c) 2025 Siming Zhao, Jingwen Bai, Drew Roberson

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

