Prediction and Detection of Scam Threats on Digital Platforms for Indonesian Users Using Machine Learning Models
DOI:
https://doi.org/10.51903/jtie.v3i3.208
Keywords:
Scam Detection, Machine Learning, Digital Security
Abstract
Scam threats on digital platforms continue to rise alongside the rapid adoption of technology in Indonesia. The unique characteristics of Indonesian digital users, such as low digital literacy and high social media usage, make them particularly vulnerable to various forms of scams, including phishing, impersonation, and emotional manipulation. This study aims to develop a machine learning-based model for predicting and detecting scams by identifying threat patterns within a local context. The methodology involves collecting a survey-based dataset from Indonesian digital users, capturing language patterns and user interaction behaviors. The dataset was processed through text-cleaning techniques, tokenization, normalization, and representation using TF-IDF and Word Embeddings. The machine learning models employed in this study are Random Forest and Support Vector Machine (SVM), evaluated using accuracy, precision, recall, and F1-score metrics. Hyperparameter tuning was conducted to optimize model performance, while k-fold cross-validation was utilized to minimize the risk of overfitting. The results indicate that the Random Forest model achieved the best performance, with an accuracy of 92.5%, precision of 90.7%, recall of 94.1%, and F1-score of 92.4%. The use of local datasets improved detection accuracy by 7.8% compared to global datasets, highlighting the critical importance of contextual representation in identifying scam patterns specific to Indonesia. The model was also effective in recognizing unique threat patterns, such as the use of informal language and manipulative phrases in scam messages. This study makes a significant contribution to the field of digital security by providing an effective machine learning-based approach to detecting scam threats in Indonesia. Moreover, the findings underscore the importance of developing local datasets and educating users as part of a holistic solution to enhance digital security. 
These insights emphasize the necessity of incorporating cultural and contextual factors into technology-driven approaches for combating scams in developing countries like Indonesia.
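The four evaluation metrics named in the abstract are related by simple formulas, and the reported F1-score can be checked against the reported precision and recall. The following is a minimal sketch in plain Python, not the authors' evaluation code. The function names are illustrative, and the confusion-matrix counts in the second half are hypothetical values chosen only to show the arithmetic; they are not data from the study.

```python
def precision(tp: int, fp: int) -> float:
    """Share of messages flagged as scams that really are scams."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Share of actual scam messages that the model flagged."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Share of all messages that were classified correctly."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision p and recall r."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Precision and recall as reported in the abstract for Random Forest.
reported_precision = 0.907
reported_recall = 0.941
derived_f1 = f1_score(reported_precision, reported_recall)
print(f"F1 derived from reported precision and recall: {derived_f1:.3f}")

# Hypothetical confusion-matrix counts (assumption, not from the paper).
tp, tn, fp, fn = 80, 105, 10, 5
print(f"precision={precision(tp, fp):.3f}")
print(f"recall={recall(tp, fn):.3f}")
print(f"accuracy={accuracy(tp, tn, fp, fn):.3f}")
```

The derived value rounds to 0.924, which agrees with the 92.4% F1-score stated in the abstract. Recall being higher than precision means the reported model misses few scams at the cost of somewhat more false alarms.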
Copyright (c) 2024 Journal of Technology Informatics and Engineering
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.