Prediction and Detection of Scam Threats on Digital Platforms for Indonesian Users Using Machine Learning Models

Budi Raharjo; Rudjiono; Yuli Fitrianto

doi:10.51903/jtie.v3i3.208

Authors

Budi Raharjo, University of Science and Computer Technology
Rudjiono, Universitas Sains dan Teknologi Komputer
Yuli Fitrianto

Keywords:

Scam Detection, Machine Learning, Digital Security

Abstract

Scam threats on digital platforms continue to rise alongside the rapid adoption of technology in Indonesia. Characteristics of Indonesian digital users, such as low digital literacy and high social media usage, make them particularly vulnerable to scams, including phishing, impersonation, and emotional manipulation. This study develops a machine learning-based model for predicting and detecting scams by identifying threat patterns within a local context. A survey-based dataset was collected from Indonesian digital users, capturing language patterns and user interaction behaviors. The dataset was processed through text cleaning, tokenization, and normalization, and represented using TF-IDF and word embeddings. Two models, Random Forest and Support Vector Machine (SVM), were evaluated using accuracy, precision, recall, and F1-score. Hyperparameter tuning was conducted to optimize model performance, and k-fold cross-validation was used to reduce the risk of overfitting. Random Forest achieved the best performance, with an accuracy of 92.5%, precision of 90.7%, recall of 94.1%, and F1-score of 92.4%. Using local datasets improved detection accuracy by 7.8% compared to global datasets, highlighting the importance of contextual representation in identifying scam patterns specific to Indonesia. The model was also effective in recognizing distinctive threat patterns, such as informal language and manipulative phrases in scam messages. The study contributes to digital security by providing an effective machine learning-based approach to detecting scam threats in Indonesia. The findings also underscore the importance of developing local datasets and educating users as part of a holistic solution, and of incorporating cultural and contextual factors into technology-driven approaches for combating scams in developing countries like Indonesia.
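
As a rough illustration of two steps named in the abstract, TF-IDF representation and evaluation with accuracy, precision, recall, and F1-score, the following standard-library Python sketch may help. It is not the authors' implementation: the function names, the whitespace tokenizer, the smoothed IDF formula, and the three toy messages are all assumptions made for this example, and no Random Forest or SVM is trained here.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase and split on whitespace only. The study's own
    # cleaning and normalization steps are not specified in the abstract.
    return text.lower().split()


def tfidf(docs):
    """Return one {term: weight} dict per document (term count times smoothed IDF)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Assumed smoothing: ln((1 + N) / (1 + df)) + 1, so rarer terms weigh more.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return [
        {t: count * idf[t] for t, count in Counter(tokens).items()}
        for tokens in tokenized
    ]


def f1_from(precision, recall):
    # F1 is the harmonic mean of precision and recall.
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for a binary scam / not-scam task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": correct / len(pairs) if pairs else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }


# Invented toy messages, not taken from the study's dataset.
messages = [
    "selamat anda menang hadiah transfer sekarang",
    "rapat tim besok pagi",
    "hadiah undian klik link sekarang",
]
weights = tfidf(messages)

# Consistency check on the reported Random Forest figures:
# precision 90.7% and recall 94.1% imply an F1-score of about 92.4%.
print(round(f1_from(0.907, 0.941), 3))  # 0.924
```

In this sketch a term that occurs in only one message ("menang") receives a higher weight than one shared across messages ("hadiah"), which is the property that lets a downstream classifier focus on distinctive wording. The final line shows that the reported F1-score agrees with the reported precision and recall.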
