Early Warning, Grade Prediction, and Teacher-Facing LLM-Ready Explanations Toward an Open Volleyball Course: Reproducible Evidence from Four Public Education Datasets

Authors

  • Jubin Zhang, Department of Physical Education, North China Institute of Aerospace Engineering, Langfang 065000, China

DOI:

https://doi.org/10.51903/jtie.v5i2.525

Keywords:

learning analytics, early warning, educational data mining, student dropout, grade prediction

Abstract

Open online courses and public skill-development programs often lose learners not because of content limitations but because instructors receive delayed, non-actionable feedback. This study proposes and empirically evaluates an integrated framework for an open volleyball course that combines early-warning prediction, grade estimation, and teacher-oriented, LLM-generated explanations of academic status. The predictive models were tested on four lightweight public educational datasets: xAPI-Edu-Data, Predict Students’ Dropout and Academic Success, Student Performance, and Higher Education Students Performance Evaluation. A unified preprocessing pipeline was applied, using one-hot encoding, an 80/20 train-test split, and 5-fold cross-validation. Decision Tree, Random Forest, and XGBoost models were evaluated for classification, alongside their regression variants for grade prediction. Random Forest achieved the best macro-F1 on xAPI-Edu-Data (0.799), with a macro-AUC of 0.914, while XGBoost performed best on the dropout dataset (macro-F1 = 0.689, macro-AUC = 0.892). For Student Performance, early-warning models without prior grades reached an RMSE of 3.086, which improved to 1.398 when full information (including prior grades) was available. On the higher education dataset, performance remained limited because of the small sample size and multi-grade targets, with Random Forest achieving a macro-F1 of 0.248. Ablation results confirmed that behavioral and progression features significantly improve predictive accuracy. An explanation layer translated model outputs into structured, teacher-facing natural language linked to key risk indicators and intervention cues. Explanation quality improves when explanations are grounded in observed behavioral signals rather than generated generically. Overall, the framework demonstrates analytic feasibility for structured volleyball course monitoring, though the results should be interpreted as pre-deployment evidence rather than validation in real instructional settings.
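
For readers less familiar with the two headline metrics, the following is a minimal sketch of how macro-F1 and RMSE are conventionally defined. It is not the authors' evaluation code: the function names `macro_f1` and `rmse`, the class labels, and the toy values are illustrative assumptions and do not come from the study's datasets or results.

```python
from math import sqrt


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    per_class = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted grades."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sqrt(sum(squared) / len(squared))


# Hypothetical three-class example (labels are illustrative only).
true_levels = ["L", "L", "M", "M", "H", "H"]
pred_levels = ["L", "M", "M", "M", "H", "L"]
print(round(macro_f1(true_levels, pred_levels), 3))

# Hypothetical grade example (values are illustrative only).
true_grades = [10, 12, 15, 8]
pred_grades = [11, 12, 13, 8]
print(round(rmse(true_grades, pred_grades), 3))
```

Because macro-F1 weights every class equally regardless of how many students fall into it, a target with many sparsely populated grade levels can pull the score down sharply, which is consistent with the low value reported for the small higher education dataset.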

Published

2026-06-17

How to Cite

Early Warning, Grade Prediction, and Teacher-Facing LLM-Ready Explanations Toward an Open Volleyball Course: Reproducible Evidence from Four Public Education Datasets. (2026). Journal of Technology Informatics and Engineering, 5(2), 20-44. https://doi.org/10.51903/jtie.v5i2.525