Evidence-Chain Reliable RAG: Hallucination Detection, Source Attribution, and Deterministic Provenance Explanations

Jiaying Jin

doi:10.51903/jtie.v4i2.535

Authors

Jiaying Jin, Applied Analytics, Columbia University, NY, USA

Keywords:

retrieval-augmented generation, hallucination detection, RAGTruth, evidence attribution, provenance explanation, trustworthy AI, natural language processing

Abstract

Retrieval-augmented generation (RAG) reduces unsupported generation by grounding answers in source content, but retrieval alone does not guarantee that every output claim is attributable to evidence. This paper presents Evidence-Chain Reliable RAG, an empirical hallucination-detection and provenance framework that scores whether generated response sentences are supported by the corresponding RAG source record. The evaluation uses the complete RAGTruth JSONL data available for this study: 2,965 source records, 17,790 assistant responses, and 14,289 exact-offset hallucination spans across Data2Text, QA, and summarization. The experiment converts word-level spans into response-level, sentence-level, and character-span targets; extracts lexical, BM25, TF-IDF, unsupported-number, unsupported-entity, refusal, and Evidence-Chain Score features; and compares seven methods. On the official held-out test split of 2,700 responses, RandomForest achieved the best case-level F1 of 0.626 and PR-AUC of 0.553. The proposed ECS-Span model achieved case-level F1 of 0.614, ROC-AUC of 0.742, and PR-AUC of 0.536 while also producing deterministic provenance explanations. At sentence level, RandomForest again achieved the highest F1 of 0.321; the proposed method obtained F1 of 0.312, ROC-AUC of 0.777, and PR-AUC of 0.245. Exact character-span localization remained difficult, with character-level F1 of 0.197 because sentence-level predictions often include supported text around shorter hallucinated spans. The findings indicate that evidence-chain features are useful for interpretable RAG auditing, but precise span extraction requires token-level sequence labeling or a comparable fine-grained model.
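The span-to-sentence conversion and the character-level scoring described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the punctuation-based sentence splitter, the regex number matcher, and the toy source and response are all assumptions made for illustration. The sketch marks a sentence as hallucinated when it overlaps an annotated span, counts numbers in a sentence that are absent from the source (one plausible form of an unsupported-number feature), and computes character-level F1 when a whole sentence is predicted.

```python
import re

NUMBER = r"\d+(?:\.\d+)?"  # assumed number pattern, for illustration only


def sentence_offsets(text):
    """Naive splitter (assumption): returns (start, end) offsets per sentence."""
    offsets = []
    for m in re.finditer(r"[^.!?]+[.!?]*", text):
        start, end = m.start(), m.end()
        while start < end and text[start].isspace():
            start += 1  # drop leading whitespace from the sentence span
        if start < end:
            offsets.append((start, end))
    return offsets


def sentence_labels(sentences, gold_spans):
    """Label a sentence 1 if it overlaps any annotated hallucination span."""
    return [
        int(any(s < g_end and g_start < e for g_start, g_end in gold_spans))
        for s, e in sentences
    ]


def unsupported_numbers(sentence, source):
    """Numbers that appear in the sentence but nowhere in the source text."""
    source_numbers = set(re.findall(NUMBER, source))
    return [n for n in re.findall(NUMBER, sentence) if n not in source_numbers]


def char_f1(pred_spans, gold_spans):
    """Character-level F1 between predicted and annotated spans."""
    pred = {i for s, e in pred_spans for i in range(s, e)}
    gold = {i for s, e in gold_spans for i in range(s, e)}
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy example (hypothetical data, not drawn from RAGTruth).
source = "The cafe opened in 2010 and has 30 seats."
response = "The cafe opened in 2010. It has 45 seats and a rooftop bar."
start = response.index("45")
gold_spans = [(start, start + 2)]  # only "45" is annotated as hallucinated

sentences = sentence_offsets(response)
labels = sentence_labels(sentences, gold_spans)
predicted = [span for span, label in zip(sentences, labels) if label]

print(labels)
print([unsupported_numbers(response[s:e], source) for s, e in sentences])
print(round(char_f1(predicted, gold_spans), 3))
```

In this toy case the second sentence is correctly flagged, yet predicting its full 34 characters against a 2-character annotated span gives perfect recall with very low precision. That is the same effect the abstract reports for character-span localization, where sentence-level predictions include supported text around shorter hallucinated spans.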
