Transfer Learning Approach for Sentiment Analysis in Low-Resource Austronesian Languages Using Multilingual BERT

Authors

  • Li Wen Hao, Monash University Malaysia, Jalan Lagoon Selatan, 47500 Subang Jaya, Malaysia
  • Robert Kuan Liu, Monash University Malaysia, Jalan Lagoon Selatan, 47500 Subang Jaya, Malaysia

DOI:

https://doi.org/10.51903/jtie.v4i1.276

Keywords:

Sentiment Analysis, Austronesian Languages, Cross-Lingual Transfer, Data Augmentation, Multilingual BERT

Abstract

Sentiment analysis for low-resource languages, particularly Austronesian languages, remains challenging due to the limited availability of annotated datasets. Traditional approaches often struggle to achieve high accuracy, necessitating strategies such as cross-lingual transfer and data augmentation. While multilingual models such as mBERT offer promising results, their performance depends heavily on the fine-tuning technique. This study aims to improve sentiment analysis for Austronesian languages by fine-tuning mBERT on augmented training data. The proposed method leverages cross-lingual transfer learning to enhance model robustness and address the scarcity of labeled data. Experiments were conducted on a dataset enriched with augmentation techniques such as back-translation and synonym replacement. The fine-tuned mBERT model achieved an accuracy of 92%, outperforming XLM-RoBERTa at 91.41%, while mT5 obtained the highest accuracy at 99.61%. Improvements in precision, recall, and F1-score further validated the model’s effectiveness in capturing subtle sentiment variations. These findings demonstrate that combining data augmentation with cross-lingual strategies significantly enhances sentiment classification for underrepresented languages. This study contributes to the development of scalable Natural Language Processing (NLP) models for Austronesian languages. Future research should explore larger and more diverse datasets, optimize real-time implementations, and extend the approach to tasks such as Named Entity Recognition (NER) and machine translation. The promising results underscore the importance of integrating robust transfer learning techniques with comprehensive data augmentation to overcome the challenges of resource-limited NLP scenarios.
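
Illustrative sketch

As a concrete illustration of the fine-tuning setup described in the abstract, the sketch below shows how multilingual BERT can be fine-tuned for binary sentiment classification with the Hugging Face Transformers and Datasets libraries. This is a minimal sketch under stated assumptions, not the authors' implementation: the toy Malay examples, label scheme, hyperparameters, and output directory are assumptions made for illustration, and in practice the training set would be the augmented corpus produced by back-translation and synonym replacement.

# Minimal illustrative sketch (assumed setup, not the paper's code):
# fine-tuning multilingual BERT for sentiment classification.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Toy in-memory examples standing in for the augmented training corpus
# (back-translated and synonym-replaced sentences would be added here).
train_data = {
    "text": [
        "Filem ini sangat bagus",            # Malay: "This movie is very good"
        "Saya tidak suka perkhidmatan itu",  # Malay: "I did not like that service"
    ],
    "label": [1, 0],  # 1 = positive, 0 = negative
}

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=2
)

def tokenize(batch):
    # Pad/truncate to a fixed length so the default collator can batch tensors.
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=128)

train_ds = Dataset.from_dict(train_data).map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="mbert-sentiment",       # hypothetical output directory
    num_train_epochs=3,                 # assumed hyperparameters
    per_device_train_batch_size=16,
    learning_rate=2e-5,
)

Trainer(model=model, args=args, train_dataset=train_ds).train()

The same pipeline could be pointed at other checkpoints (for example XLM-RoBERTa or mT5-based classifiers) by swapping the model name, with accuracy, precision, recall, and F1-score computed on a held-out test split; the exact splits and evaluation protocol used in the study are not reproduced here.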

Published

2025-04-21

How to Cite

Transfer Learning Approach for Sentiment Analysis in Low-Resource Austronesian Languages Using Multilingual BERT. (2025). Journal of Technology Informatics and Engineering, 4(1), 75-94. https://doi.org/10.51903/jtie.v4i1.276