Automatic Detection and Explanation of Dark Patterns from Interface Microcopy: Empirical Comparison of BERT-Style Encoders, RoBERTa-Style Encoders, and LLM-Style Decoders on the ec-darkpattern Dataset

Haosen Xu; Yushan Chen; Aron Med

doi:10.51903/jtie.v4i3.491

Authors

Haosen Xu Electrical Engineering and Computer Science, University of California, Berkeley, CA, USA
Yushan Chen Service Design, Savannah College of Art and Design, GA, USA
Aron Med California Institute of the Arts, CA, USA

Keywords:

dark patterns, deceptive design, interface microcopy, e-commerce, text classification

Abstract

Dark patterns (also called deceptive design patterns) are interface choices that steer or pressure users into unintended actions such as rushed purchases, unnecessary disclosures, or hard-to-cancel subscriptions. In e-commerce, many dark patterns are expressed directly in microcopy (e.g., button labels, banners, and inline messages), which makes text-only detection attractive for scalable auditing. This paper presents a fully reproducible experimental study on ec-darkpattern, a text-based dataset of e-commerce interface strings with balanced binary labels (1,178 dark pattern vs. 1,178 non-dark pattern) and seven dark pattern categories. We compare (i) a rule-based lexicon baseline, (ii) hashed n-gram linear models, (iii) a lightweight BERT-style bidirectional transformer encoder with word tokenization, (iv) a lightweight RoBERTa-style bidirectional transformer encoder with character tokenization, and (v) an LLM-style causal decoder trained as a classifier on the same inputs. On a fixed 80/10/10 split with seed 42, the best-performing model is a hashing + linear SVM baseline (F1=0.9437, ROC-AUC=0.9810), while the BERT-style encoder achieves F1=0.9038 (ROC-AUC=0.9695), the RoBERTa-style encoder achieves F1=0.8907 (ROC-AUC=0.9573), and the LLM-style decoder achieves F1=0.7884 (ROC-AUC=0.8808). These results should be interpreted as a controlled comparison under low-resource, no-pretraining conditions on a single fixed split, rather than as a general ranking of encoder-style versus decoder-style transformers. To support explainability, we generate token-level attributions using gradient-based saliency, summarize them as key phrases, and estimate explanation consistency via top-k token overlap on an exploratory 20-instance sample (mean Jaccard up to 0.7482 between the two character-based transformers). Finally, we curate an error-case library that links misclassifications to their most influential phrases. 
Within this short-microcopy setting, the findings indicate that lexical baselines are especially strong, while transformer directionality and tokenization affect both accuracy and the stability of highlighted cues.
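The explanation-consistency measure summarized in the abstract (top-k token overlap reported as a mean Jaccard score) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes that both models' saliency scores are aligned to the same token sequence for each instance (as they would be for two models that share a character tokenization), that tokens are ranked by absolute attribution with ties broken by position, and that two empty top-k sets count as identical. The function names and the choice of k are hypothetical; the abstract does not state which k was used.

```python
from statistics import mean


def top_k_positions(scores, k):
    """Return the indices of the k tokens with the largest absolute attribution.

    Ranking by absolute value is an assumption. sorted() is stable (also with
    reverse=True), so tied scores keep their left-to-right order.
    """
    order = sorted(range(len(scores)), key=lambda i: abs(scores[i]), reverse=True)
    return set(order[:k])


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B|; two empty sets count as identical."""
    union = a | b
    return 1.0 if not union else len(a & b) / len(union)


def mean_top_k_jaccard(scores_a, scores_b, k=5):
    """Mean per-instance top-k Jaccard overlap between two models' saliency maps.

    scores_a[i] and scores_b[i] hold one attribution per token for instance i,
    aligned to the same token sequence (assumption: shared tokenization).
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both models must be scored on the same instances")
    overlaps = []
    for sa, sb in zip(scores_a, scores_b):
        if len(sa) != len(sb):
            raise ValueError("attributions must cover the same tokens")
        overlaps.append(jaccard(top_k_positions(sa, k), top_k_positions(sb, k)))
    return mean(overlaps)


# Toy saliency scores for two instances (illustrative values, not from the paper).
scores_a = [[0.9, 0.1, -0.8, 0.05], [0.2, 0.7, 0.1]]
scores_b = [[0.7, -0.6, 0.3, 0.0], [0.3, 0.9, 0.1]]
print(round(mean_top_k_jaccard(scores_a, scores_b, k=2), 4))  # → 0.6667
```

Comparing token positions rather than token strings keeps repeated characters (frequent under character tokenization) from collapsing into a single set element. For two models with different tokenizers, such as a word-level and a character-level encoder, the attributions would first need to be mapped onto a shared unit (for example, whole words) before this overlap is meaningful.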

References

Ashofi, A. A. (2023). The Impact of Data Security, Ease of Use, and Access Speed on User Trust in Mobile Banking Applications. Journal of Management and Informatics, 2(3), 106–115. https://doi.org/10.51903/jmi.v2i3.148

Brignull, H. (2010). Dark Patterns. DarkPatterns.org. https://www.darkpatterns.org

Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of NAACL-HLT 2019, 4171–4186. https://aclanthology.org/n19-1423

Di Geronimo, L., Braz, L., Fregnan, E., Palomba, F., & Bacchelli, A. (2020). UI Dark Patterns and Where to Find Them. In Proceedings of CHI 2020, 1–14. https://doi.org/10.1145/3313831.3376600

Fawcett, T. (2006). An Introduction to ROC Analysis. Pattern Recognition Letters, 27(8), 861–874. https://doi.org/10.1016/j.patrec.2005.10.010

Gray, C. M., Kou, Y., Battles, B., Hoggatt, J., & Toombs, A. L. (2018). The Dark (Patterns) Side of UX Design. In Proceedings of CHI 2018, 1–14. https://doi.org/10.1145/3173574.3174108

Heraditya, N. C., Firmansyah, T. W., Yulianto, N. B., Faisal, S. A., & Supriyono. (2026). Implementing Odoo-Based ERP Sales and Inventory Modules (Case Study: UMKM Sirup Cap Manggis). JUISI: Jurnal Ilmiah Sistem Informasi, 5(2), 1–15. https://doi.org/10.51903/ygw51693

Jain, S., & Wallace, B. C. (2019). Attention Is Not Explanation. arXiv Preprint arXiv:1902.10186. https://arxiv.org/abs/1902.10186

Joachims, T. (1998). Text Categorization With Support Vector Machines: Learning With Many Relevant Features. In Proceedings of the 10th European Conference on Machine Learning (ECML 1998), 137–142. https://doi.org/10.1007/bfb0026683

Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., ... Stoyanov, V. (2019). RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv Preprint arXiv:1907.11692. https://arxiv.org/abs/1907.11692

Loshchilov, I., & Hutter, F. (2019). Decoupled Weight Decay Regularization. In Proceedings of ICLR 2019. https://arxiv.org/abs/1711.05101

Lundberg, S. M., & Lee, S.-I. (2017). A Unified Approach to Interpreting Model Predictions. In Advances in Neural Information Processing Systems 30, 4765–4774. https://proceedings.neurips.cc/paper/2017/hash/8a20a8621978632d76c43dfd28b67767-Abstract.html

Mathur, A., Acar, G., Friedman, M. J., Lucherini, E., Mayer, J., Chetty, M., & Narayanan, A. (2019). Dark Patterns at Scale: Findings From a Crawl of 11K Shopping Websites. Proceedings of the ACM on Human-Computer Interaction, 3(CSCW), Article 81. https://doi.org/10.1145/3359183

Mathur, A., Narayanan, A., & Chetty, M. (2021). What Makes a Dark Pattern... Dark? arXiv Preprint arXiv:2101.04843. https://arxiv.org/abs/2101.04843

Nouwens, M., Liccardi, I., Veale, M., Karger, D., & Kagal, L. (2020). Dark Patterns After the GDPR. In Proceedings of CHI 2020, 1–13. https://doi.org/10.1145/3313831.3376321

Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). "Why Should I Trust You?": Explaining the Predictions of Any Classifier. In Proceedings of KDD 2016, 1135–1144. https://doi.org/10.1145/2939672.2939778

Samek, W., Montavon, G., Vedaldi, A., Hansen, L. K., & Müller, K.-R. (Eds.). (2019). Explainable AI: Interpreting, Explaining and Visualizing Deep Learning. Springer Cham. https://doi.org/10.1007/978-3-030-28954-6

Santoso, J. T., & Yan, S. (2024). A Hybrid Approach to Typo Correction in Indonesian Documents Using Levenshtein Distance. Journal of Technology Informatics and Engineering, 3(2), 151–168. https://doi.org/10.51903/jtie.v3i2.184

Soe, W. H., Santos, C., & Slavkovik, M. (2022). Automated Detection of Dark Patterns. arXiv Preprint arXiv:2204.11836. https://arxiv.org/abs/2204.11836

Sundararajan, M., Taly, A., & Yan, Q. (2017). Axiomatic Attribution for Deep Networks. In Proceedings of the 34th International Conference on Machine Learning, 3319–3328. https://proceedings.mlr.press/v70/sundararajan17a.html

Thaler, R. H., & Sunstein, C. R. (2008). Nudge: Improving Decisions About Health, Wealth and Happiness. Yale University Press. https://yalebooks.co.uk/book/9780300146813/nudge

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention Is All You Need. In Advances in Neural Information Processing Systems 30, 5998–6008. https://arxiv.org/abs/1706.03762

Wibowo, M. C., & Santoso, J. T. (2024). Utilizing PHPMyAdmin for System Design in Enterprise Administration. Journal of Technology Informatics and Engineering, 3(2), 217–234. https://doi.org/10.51903/jtie.v3i2.193

Yada, Y., Feng, J., Matsumoto, T., Fukushima, N., Kido, F., & Yamana, H. (2022). Dark Patterns in E-Commerce: A Dataset and Its Baseline Evaluations. In Proceedings of the 2022 IEEE International Conference on Big Data (Big Data 2022), 3015–3022. https://doi.org/10.1109/bigdata55660.2022.10020800