Efficient Temporal Segmentation And Classification Of Short-Form Video Content Using Lightweight CNN-LSTM Architecture
DOI:
https://doi.org/10.51903/jtie.v5i1.441
Keywords:
Lightweight deep learning, Temporal segmentation, Short-form video classification, CNN-LSTM, Multimedia content analysis
Abstract
The exponential rise of short-form video platforms such as TikTok, Instagram Reels, and YouTube Shorts has transformed digital content consumption patterns, creating both opportunities and challenges in media analysis. One critical need is the efficient segmentation and classification of temporal segments within these videos to enable applications in content moderation, targeted advertising, and audience behavior research. This study proposes a lightweight deep learning architecture that integrates Convolutional Neural Networks (CNN) for visual feature extraction and Long Short-Term Memory (LSTM) networks for temporal sequence modeling. The proposed CNN-LSTM framework is optimized for computational efficiency while maintaining high classification accuracy, making it suitable for deployment in resource-constrained environments. Experimental evaluations on a curated short-form video dataset show that the model achieves competitive performance compared with larger architectures, with significant reductions in memory usage and inference time. Furthermore, the temporal segmentation module effectively isolates meaningful visual-audio segments, enabling more precise classification outcomes. The results highlight the potential of lightweight architectures to address the scalability demands of modern video analysis systems without sacrificing accuracy. This research contributes to the growing discourse on efficient multimedia processing by bridging the gap between high-performance models and practical, real-time applications in the evolving short-form video ecosystem.
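The abstract does not say how the temporal segmentation module decides where one segment ends and the next begins. As a minimal sketch only, the idea of isolating segments before classification could look like the following, assuming per-frame feature vectors are already available (for example from the CNN stage) and assuming a simple distance-threshold rule. The function names, the threshold parameter, and the toy data are all hypothetical and are not taken from the paper.

```python
from math import sqrt
from typing import List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def segment_frames(
    features: List[Sequence[float]], threshold: float
) -> List[Tuple[int, int]]:
    """Split a frame sequence into (start, end) index ranges, end exclusive.

    A new segment starts wherever the distance between consecutive
    frame features exceeds `threshold` (a hypothetical tuning parameter,
    not a value reported by the authors).
    """
    if not features:
        return []
    segments = []
    start = 0
    for i in range(1, len(features)):
        if euclidean(features[i - 1], features[i]) > threshold:
            segments.append((start, i))  # close the current segment
            start = i                    # open the next one
    segments.append((start, len(features)))
    return segments


# Toy per-frame features (assumed): two visually distinct runs of frames.
frames = [[0.0, 0.1], [0.1, 0.1], [0.0, 0.2], [5.0, 5.1], [5.1, 5.0]]
print(segment_frames(frames, threshold=1.0))  # [(0, 3), (3, 5)]
```

In a pipeline of the kind the abstract describes, each resulting index range would then be passed to the sequence model as one unit, so that the classifier sees a coherent segment rather than the whole clip.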
License
Copyright (c) 2026 Ben Liu Tan, Chstina Angel Liem, Mohamed Amen

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

