Early Warning, Grade Prediction, and Teacher-Facing LLM-Ready Explanations Toward an Open Volleyball Course: Reproducible Evidence from Four Public Education Datasets
DOI:
https://doi.org/10.51903/jtie.v5i2.525
Keywords:
learning analytics, early warning, educational data mining, student dropout, grade prediction
Abstract
Open online courses and public skill-development programs often experience learner dropout not due to content limitations, but because instructors receive delayed and non-actionable feedback. This study proposes and empirically evaluates an integrated framework for an open volleyball course that combines early warning prediction, grade estimation, and teacher-oriented LLM-generated academic status explanations. The predictive models were tested on four lightweight educational datasets: xAPI-Edu-Data, Predict Students’ Dropout and Academic Success, Student Performance, and Higher Education Students Performance Evaluation. A unified preprocessing pipeline was applied using one-hot encoding, an 80/20 train-test split, and 5-fold cross-validation. Decision Tree, Random Forest, and XGBoost models were evaluated for classification, alongside their regression variants for grade prediction. Results show consistent performance across datasets. Random Forest achieved the best macro-F1 on xAPI-Edu-Data (0.799) with a macro-AUC of 0.914, while XGBoost performed best on the dropout dataset (macro-F1 = 0.689, macro-AUC = 0.892). For Student Performance, early-warning models without prior grades reached an RMSE of 3.086, improving to 1.398 when full information was available. On the higher education dataset, performance remained limited due to small sample size and multi-grade targets, with Random Forest achieving a macro-F1 of 0.248. Ablation results confirmed that behavioral and progression features significantly improve predictive accuracy. An explanation layer translated model outputs into structured, teacher-facing natural language linked to key risk indicators and intervention cues. Overall, the framework demonstrates analytic feasibility for structured volleyball course monitoring, though results should be interpreted as pre-deployment evidence rather than validation in real instructional settings. 
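The classification results above are reported as macro-F1, which averages per-class F1 without weighting by class size. The following is a minimal sketch of that metric in plain Python, not the study's evaluation code; the function name `macro_f1` and the toy labels are hypothetical and chosen only for illustration.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    scores = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels (low / medium / high), not taken from any of the four datasets.
y_true = ["L", "L", "M", "M", "M", "H"]
y_pred = ["L", "M", "M", "M", "M", "M"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred)
print(f"accuracy={accuracy:.3f} macro_f1={score:.3f}")
```

In this toy case the minority class "H" is never predicted, so its F1 of zero pulls the macro average well below plain accuracy. That sensitivity to small classes is one reason macro-F1 can stay low on small, multi-grade targets.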
Explanation quality improves when explanations are grounded in observed behavioral signals rather than generated generically.
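One way to picture grounding is to assemble the teacher-facing text from the model output and the learner's observed indicators before any language model is involved. The sketch below is an assumption about how such a structured summary could look, not the explanation layer used in the study; `RiskSignal`, `build_explanation`, the 0.5 threshold, and the feature names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RiskSignal:
    name: str             # hypothetical indicator name
    value: float          # value observed for this learner
    cohort_median: float  # reference value for comparison


def build_explanation(learner_id, risk_prob, signals, threshold=0.5):
    """Render a structured status summary tied to observed indicators."""
    status = "at risk" if risk_prob >= threshold else "on track"
    lines = [f"Learner {learner_id}: {status} (predicted risk {risk_prob:.2f})."]
    for s in signals:
        direction = "below" if s.value < s.cohort_median else "at or above"
        lines.append(
            f"- {s.name}: {s.value:g} ({direction} cohort median {s.cohort_median:g})"
        )
    return "\n".join(lines)


# Hypothetical learner record for illustration only.
signals = [
    RiskSignal("sessions_attended", 2, 6),
    RiskSignal("drills_submitted", 5, 4),
]
text = build_explanation("S-017", 0.72, signals)
print(text)
```

A summary like this could be shown to the teacher directly or passed to an LLM as the only facts it may restate, which keeps the generated wording tied to what was actually observed.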
License
Copyright (c) 2026 Jubin Zhang

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

