Profit-Aware Spot GPU Admission Control with Cost-Sensitive Loss and Evidence-Grounded Policy Memos for AI Workload Supply-Demand Matching

Siming Zhao; Yuxuan Ren; Xiaohan Chang

doi:10.51903/jtie.v5i2.545

Authors

Siming Zhao, Business Analytics, Columbia University, NY, USA
Yuxuan Ren, Chemical Engineering, University of Washington, WA, USA
Xiaohan Chang, Computer Science, University of Connecticut, CT, USA

Keywords:

Spot GPU, AI infrastructure, admission control, cost-sensitive learning, workload forecasting, GPU scheduling

Abstract

AI clusters increasingly operate with heterogeneous GPU resources, where production workloads and opportunistic spot jobs compete for limited accelerator capacity. This study presents a trace-driven admission-control framework built on the Alibaba cluster-trace-v2026-spot-gpu dataset, which consists of 466,867 job records and 4,278 GPU-node records. The experiments evaluate GPU demand forecasting, profit-aware spot admission control, and evidence-grounded operational policy generation using chronological training, validation, and test splits. Hourly spot GPU demand forecasting was evaluated across six GPU models; Ridge regression achieved the best test performance, with an RMSE of 38.50 requested GPUs per hour, improving over both the last-hour and seasonal naive baselines. The admission-control evaluation compared FIFO, greedy packing, classifier-based acceptance, utility ranking, and the proposed cost-sensitive policy. The proposed approach achieved a test profit of 67,278.96, a 1.97% improvement over the accuracy-oriented classifier, while increasing the spot success rate and reducing costly false acceptances by 13.17%. Sensitivity analysis showed that the optimal policy depends on the protection cost assigned to high-priority workloads. A deterministic evidence-grounded explanation layer generated 500 policy memos and passed numeric, policy, and evidence consistency checks. The findings suggest that profit-aware admission control can serve as a practical scheduling guardrail before detailed GPU placement and resource allocation decisions.
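
To illustrate why the optimal policy can shift with the protection cost, the following is a minimal sketch of a generic cost-sensitive acceptance rule in the spirit of Elkan (2001). It is not the authors' implementation: the function names, the `gain` and `protection_cost` parameters, and the numeric values are hypothetical and chosen only for illustration. The sketch assumes a spot job earns `gain` when it succeeds and incurs `protection_cost` when a false acceptance harms high-priority workloads.

```python
def acceptance_threshold(gain: float, protection_cost: float) -> float:
    """Break-even success probability for admitting a spot job.

    Admitting has positive expected profit when
    p * gain - (1 - p) * protection_cost > 0,
    which rearranges to p > protection_cost / (gain + protection_cost).
    """
    if gain <= 0 or protection_cost < 0:
        raise ValueError("gain must be positive and protection_cost non-negative")
    return protection_cost / (gain + protection_cost)


def admit(p_success: float, gain: float, protection_cost: float) -> bool:
    """Return True when the expected profit of admitting the job is positive."""
    expected_profit = p_success * gain - (1.0 - p_success) * protection_cost
    return expected_profit > 0.0


if __name__ == "__main__":
    # Hypothetical sweep: the same job (p_success = 0.7) flips from accept
    # to reject as the assumed protection cost grows.
    for cost in (1.0, 4.0):
        print(cost, acceptance_threshold(1.0, cost), admit(0.7, 1.0, cost))
```

Under these assumed values, raising the protection cost from 1 to 4 moves the break-even probability from 0.5 to 0.8, so a job with a predicted success probability of 0.7 is admitted in the first case and rejected in the second.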

References

Alibaba Cluster Data. (2026). cluster-trace-v2026-spot-gpu. Alibaba Cluster Trace Program. https://github.com/alibaba/clusterdata/tree/master/cluster-trace-v2026-spot-gpu

Amazon Web Services. (2026a). Getting price list files using the AWS Price List Bulk API. AWS Documentation.

Amazon Web Services. (2026b). View Spot Instance pricing history. Amazon EC2 User Guide.

Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D. M., Wu, J., Winter, C., ... Amodei, D. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877-1901.

Burges, C. J. C., Shaked, T., Renshaw, E., Lazier, A., Deeds, M., Hamilton, N., & Hullender, G. (2005). Learning to rank using gradient descent. Proceedings of the 22nd International Conference on Machine Learning, 89-96.

Burns, B., Grant, B., Oppenheimer, D., Brewer, E., & Wilkes, J. (2016). Borg, Omega, and Kubernetes. Communications of the ACM, 59(5), 50-57.

Cheng, Y., Chai, Z., & Anwar, A. (2018). Characterizing co-located datacenter workloads: An Alibaba case study. Proceedings of the 9th Asia-Pacific Workshop on Systems.

Delimitrou, C., & Kozyrakis, C. (2014). Quasar: Resource-efficient and QoS-aware cluster management. Proceedings of ASPLOS, 127-144.

Elkan, C. (2001). The foundations of cost-sensitive learning. Proceedings of the 17th International Joint Conference on Artificial Intelligence, 973-978.

Ghodsi, A., Zaharia, M., Hindman, B., Konwinski, A., Shenker, S., & Stoica, I. (2011). Dominant resource fairness: Fair allocation of multiple resource types. Proceedings of NSDI, 323-336.

Grandl, R., Ananthanarayanan, G., Kandula, S., Rao, S., & Akella, A. (2014). Multi-resource packing for cluster schedulers. Proceedings of ACM SIGCOMM, 455-466.

Gu, J., Chowdhury, M., Shin, K. G., Zhu, Y., Jeon, M., Qian, J., Liu, H., & Guo, C. (2019). Tiresias: A GPU cluster manager for distributed deep learning. Proceedings of NSDI, 485-500.

Guo, J., Chang, Z., Wang, S., Ding, H., Feng, Y., Mao, L., & Bao, Y. (2019). Who limits the resource efficiency of my datacenter: An analysis of Alibaba datacenter traces. Proceedings of IWQoS.

Hindman, B., Konwinski, A., Zaharia, M., Ghodsi, A., Joseph, A. D., Katz, R., Shenker, S., & Stoica, I. (2011). Mesos: A platform for fine-grained resource sharing in the data center. Proceedings of NSDI, 295-308.

Hyndman, R. J., & Koehler, A. B. (2006). Another look at measures of forecast accuracy. International Journal of Forecasting, 22(4), 679-688.

Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W.-t., Rocktäschel, T., Riedel, S., & Kiela, D. (2020). Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems, 33, 9459-9474.

Lin, T.-Y., Goyal, P., Girshick, R., He, K., & Dollár, P. (2017). Focal loss for dense object detection. Proceedings of the IEEE International Conference on Computer Vision, 2980-2988.

Lundberg, S. M., & Lee, S.-I. (2017). A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems, 30.

Mahajan, K., Singhvi, A., Balasubramanian, A., Lee, B., Venkataraman, S., Akella, A., & Phanishayee, A. (2020). Themis: Fair and efficient GPU cluster scheduling. Proceedings of NSDI, 289-304.

Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C. L., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., Schulman, J., Hilton, J., Kelton, F., Miller, L., Simens, M., Askell, A., Welinder, P., Christiano, P., Leike, J., & Lowe, R. (2022). Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35, 27730-27744.

Peng, Y., Bao, Y., Chen, Y., Wu, C., & Guo, C. (2018). Optimus: An efficient dynamic resource management system for deep learning clusters. Proceedings of EuroSys.

Qiao, A., Lee, B., Chandrashekar, P., Zhao, Y., Zhang, W., Li, X., Chen, M., Zhang, S., Mars, J., & Tang, L. (2021). Pollux: Co-adaptive cluster scheduling for goodput-optimized deep learning. Proceedings of OSDI, 1-18.

Reiss, C., Tumanov, A., Ganger, G. R., Katz, R. H., & Kozuch, M. A. (2012). Heterogeneity and dynamicity of clouds at scale: Google trace analysis. Proceedings of SoCC.

Rendle, S., Freudenthaler, C., Gantner, Z., & Schmidt-Thieme, L. (2009). BPR: Bayesian personalized ranking from implicit feedback. Proceedings of UAI, 452-461.

Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). Why should I trust you? Explaining the predictions of any classifier. Proceedings of KDD, 1135-1144.

Verma, A., Pedrosa, L., Korupolu, M., Oppenheimer, D., Tune, E., & Wilkes, J. (2015). Large-scale cluster management at Google with Borg. Proceedings of EuroSys.

Weng, Q., Xiao, W., Yu, Y., Wang, W., Wang, C., He, J., Li, Y., Zhang, L., Lin, W., & Ding, Y. (2022). MLaaS in the wild: Workload analysis and scheduling in large-scale heterogeneous GPU clusters. Proceedings of NSDI, 945-960.

Weng, Q., Yang, L., Yu, Y., Wang, W., Tang, X., Yang, G., & Zhang, L. (2023). Beware of fragmentation: Scheduling GPU-sharing workloads with fragmentation gradient descent. Proceedings of USENIX ATC, 995-1008.

Xiao, W., Bhardwaj, R., Ramjee, R., Sivathanu, M., Kwatra, N., Han, Z., Patel, P., Peng, X., Zhao, H., Zhang, Q., Yang, F., & Zhou, L. (2018). Gandiva: Introspective cluster scheduling for deep learning. Proceedings of OSDI, 595-610.

Zadrozny, B., Langford, J., & Abe, N. (2003). Cost-sensitive learning by cost-proportionate example weighting. Proceedings of ICDM, 435-442.