Transfer Learning Approach for Sentiment Analysis in Low-Resource Austronesian Languages Using Multilingual BERT
DOI: https://doi.org/10.51903/jtie.v4i1.276
Keywords: Sentiment Analysis, Austronesian Languages, Cross-Lingual Transfer, Data Augmentation, Multilingual BERT
Abstract
Sentiment analysis for low-resource languages, particularly Austronesian languages, remains challenging due to the limited availability of annotated datasets. Traditional approaches often struggle to achieve high accuracy, necessitating strategies such as cross-lingual transfer and data augmentation. While multilingual models such as mBERT offer promising results, their performance depends heavily on fine-tuning techniques. This study aims to improve sentiment analysis for Austronesian languages by fine-tuning mBERT with augmented training data. The proposed method leverages cross-lingual transfer learning to enhance model robustness, addressing the scarcity of labeled data. Experiments were conducted using a dataset enriched with augmentation techniques such as back-translation and synonym replacement. The fine-tuned mBERT model achieved an accuracy of 92%, outperforming XLM-RoBERTa at 91.41%, while mT5 obtained the highest accuracy at 99.61%. Improvements in precision, recall, and F1-score further validated the model’s effectiveness in capturing subtle sentiment variations. These findings demonstrate that combining data augmentation and cross-lingual strategies significantly enhances sentiment classification for underrepresented languages. This study contributes to the development of scalable Natural Language Processing (NLP) models for Austronesian languages. Future research should explore larger and more diverse datasets, optimize real-time implementations, and extend the approach to tasks such as Named Entity Recognition (NER) and machine translation. The promising results underscore the importance of integrating robust transfer learning techniques with comprehensive data augmentation to overcome challenges in resource-limited NLP scenarios.
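The abstract describes the approach at a high level only. As an illustrative sketch, and not the authors' released code, the following Python snippet shows how mBERT (the bert-base-multilingual-cased checkpoint) might be fine-tuned for sentiment classification with the Hugging Face Transformers library; the file names, the three-way label set, and all hyperparameters are assumptions, and the augmented training examples (back-translation, synonym replacement) are assumed to have been produced in a separate preprocessing step.

# Illustrative sketch only: fine-tuning mBERT for sentiment classification.
# File names, label set, and hyperparameters are assumptions, not values
# reported in the study.
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

MODEL_NAME = "bert-base-multilingual-cased"  # multilingual BERT (mBERT)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME,
    num_labels=3,  # assumed: negative / neutral / positive, integer-encoded
)

# Hypothetical CSV files with "text" and "label" columns; the training split
# is assumed to already contain the augmented examples.
dataset = load_dataset(
    "csv",
    data_files={"train": "train_augmented.csv", "validation": "dev.csv"},
)

def tokenize(batch):
    # Fixed-length padding keeps batching simple with the default collator.
    return tokenizer(
        batch["text"], truncation=True, padding="max_length", max_length=128
    )

dataset = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="mbert-sentiment",
        num_train_epochs=3,              # assumed
        per_device_train_batch_size=16,  # assumed
        learning_rate=2e-5,              # assumed
    ),
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
)
trainer.train()
print(trainer.evaluate())  # validation loss after fine-tuning

Comparable runs with XLM-RoBERTa or mT5, as reported in the abstract, would swap in the corresponding checkpoints and model classes; accuracy, precision, recall, and F1 would be computed from the trained model's predictions on a held-out test split.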
License
Copyright (c) 2025 Journal of Technology Informatics and Engineering

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
This license allows others to copy, distribute, display, and perform the work and derivative works based upon it, for both commercial and non-commercial purposes, as long as they credit the original author(s) and license their new creations under identical terms.