Few-Shot Cold-Start Workload Forecasting for New AI Inference Tenants with Time-Series Foundation Models

Shilu He; Chengliang Li; Hengning Rao

doi:10.51903/jtie.v4i1.546

Authors

Shilu He, Mathematics, UW-Madison, WI, USA
Chengliang Li, Information Studies, Trine University, VA, USA
Hengning Rao, Electrical and Computer Engineering, UIUC, IL, USA

Keywords:

cold-start forecasting, time-series foundation models, DLRM serving, GPU disaggregation, capacity planning, few-shot learning

Abstract

This paper presents a reproducible empirical study of few-shot cold-start workload forecasting for new AI inference tenants using the Alibaba GPU-disaggregated DLRM serving trace. Instance lifecycles are transformed into hourly active-demand series, and resource reservations are normalized into capacity units to evaluate 24-hour forecasting under zero-shot, 5-shot, 10-shot, and full-history settings. Seven forecasting methods are compared: archetype mean prior, persistence, moving average, linear trend, seasonal naive, global residual ridge, and CT-TSFM, a compact cross-tenant time-series foundation model. The cold-start evaluation uses 46 held-out tenants, with 110 source tenants for pretraining and calibration. Results show that hourly demand is strongly persistence-dominated. Zero-shot forecasting yields a mean absolute error (MAE) of 326.26 normalized capacity units, whereas only five observations reduce MAE to 4.00 for persistence, global residual ridge, and CT-TSFM alike. Validation consistently selects a residual gate of 0.0 for CT-TSFM, indicating that retaining the persistence prior and rejecting cross-tenant residual transfer is the most reliable strategy. The calibrated prediction intervals achieve approximately 85–87% coverage against a 90% target, slightly below the nominal level. These findings indicate that a few recent observations substantially improve cold-start forecasting, while source-tenant metadata alone provides limited zero-shot planning capability.
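The pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the overlap rule used to build the hourly series, the additive form of the residual gate, and all numbers are assumptions chosen for illustration. The sketch shows how instance lifecycles might become an hourly active-demand series, how a k-shot persistence forecast is formed, how a gate of 0.0 reduces a gated model to pure persistence, and how MAE and interval coverage are computed.

```python
import math
from statistics import mean


def hourly_active_demand(lifecycles, n_hours):
    """Sum capacity units of instances active in each hour.

    Assumption: each lifecycle is (start_hour, end_hour, units) and an
    instance counts toward every hour bucket it overlaps.
    """
    series = [0.0] * n_hours
    for start, end, units in lifecycles:
        first = max(0, math.floor(start))
        last = min(n_hours, math.ceil(end))
        for h in range(first, last):
            series[h] += units
    return series


def persistence_forecast(history, horizon=24):
    """Repeat the most recent observation across the forecast horizon."""
    if not history:
        raise ValueError("persistence needs at least one observation")
    return [history[-1]] * horizon


def gated_forecast(history, residual, gate, horizon=24):
    """Persistence prior plus a gated residual (hypothetical additive form).

    gate = 0.0 returns the persistence forecast unchanged;
    gate = 1.0 applies the full residual correction.
    """
    base = persistence_forecast(history, horizon)
    return [b + gate * r for b, r in zip(base, residual)]


def mae(actual, predicted):
    """Mean absolute error over paired values."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def coverage(actual, lower, upper):
    """Fraction of actual values that fall inside [lower, upper]."""
    hits = sum(lo <= a <= hi for a, lo, hi in zip(actual, lower, upper))
    return hits / len(actual)


# Synthetic data, for illustration only.
demo_series = hourly_active_demand([(0.0, 3.0, 2.0), (1.5, 2.5, 1.0)], n_hours=4)

history = [100, 102, 101, 103, 104]            # a 5-shot context
actual = [104 + (h % 3) for h in range(24)]    # next 24 hours
residual = [5.0] * 24                          # hypothetical cross-tenant residual

mae_persistence = mae(actual, persistence_forecast(history))
mae_gate_off = mae(actual, gated_forecast(history, residual, gate=0.0))
mae_gate_on = mae(actual, gated_forecast(history, residual, gate=1.0))

# Symmetric interval of half-width 1.5 around the persistence forecast.
base = persistence_forecast(history)
interval_coverage = coverage(actual, [b - 1.5 for b in base], [b + 1.5 for b in base])

print(demo_series, mae_persistence, mae_gate_off, mae_gate_on, interval_coverage)
```

In this toy setup the gate-off forecast matches persistence exactly, while the full residual increases the error. This mirrors the kind of comparison a validation step would use to select the gate, though the study's actual selection procedure is not specified here.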
