Accounting-Aware Evidence Retrieval for Institutional Due Diligence of Tokenized Trade Receivable RWA

Yuanzheng Chen; Sihan Zhou; Emma Lin

doi:10.51903/jtie.v4i3.542

Authors

Yuanzheng Chen, Accounting, UIUC, IL, USA
Sihan Zhou, Enterprise Risk Management, Columbia University, NY, USA
Emma Lin, Computer Engineering, UCSD, CA, USA

DOI:

https://doi.org/10.51903/jtie.v4i3.542

Keywords:

ambiguity-aware retrieval, retrieval-augmented generation, FinDER, trade receivables, real-world assets, tokenization, reranking, evidence selection, answer abstention, institutional due diligence

Abstract

Institutional investors evaluating tokenized real-world asset (RWA) transactions need retrieval systems that can answer short, ambiguous, and legally loaded due-diligence questions with traceable evidence. Trade receivable pools are especially difficult because the same question may require accounting policy, financial metrics, footnote disclosure, legal covenants, insurance language, servicer reporting, or waterfall mechanics. This study implements and evaluates an accounting-aware evidence-retrieval pipeline for tokenized trade receivable RWA due diligence. The main experiment uses the official FinDER benchmark with 5,703 query-evidence-answer triples, 6,121 annotated evidence references, and 5,830 deduplicated evidence passages derived from financial disclosures. The pipeline compares vanilla sparse retrieval, accounting-aware query rewriting, feature reranking, section-aware evidence selection, and calibrated abstention. On the official FinDER evaluation, query rewriting increased Recall@10 from 28.25% to 28.62%; reranking increased Recall@10 to 33.86% and answer-support accuracy to 24.57%; and section-aware evidence selection achieved 34.44% Recall@10, 24.04% nDCG@10, 8.32% EvidencePrecision@3, and 25.23% answer-support accuracy. The accounting-relevant subset, defined as the Accounting, Financials, and Footnotes categories, achieved 37.10% Recall@10 and 26.54% answer-support accuracy. A supplementary stress check using a public receivables purchase agreement and SEC 2026-04 financial statement notes showed that the same retrieval logic can surface schedule, lock-box, GAAP, receivable, and note-disclosure evidence, while also highlighting the need for table extraction and field-level numerical validation. The findings support a narrower deployment claim: accounting-aware RAG can improve evidence discovery and analyst review, but it is not yet suitable for autonomous investment or accounting decision-making.
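The ranking metrics reported in the abstract can be illustrated with a minimal Python sketch. It assumes the conventional binary-relevance definitions of Recall@k, Precision@k, and nDCG@k, computed per query and then averaged; this page does not give the study's exact formulas, so these definitions may differ from the ones the authors used. The function names and passage IDs are hypothetical, and each ranked list is assumed to contain no duplicate passage IDs.

```python
import math
from typing import Sequence, Set


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the annotated evidence passages that appear in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for pid in ranked[:k] if pid in relevant)
    return hits / len(relevant)


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 3) -> float:
    """Fraction of the top-k slots filled by annotated evidence passages."""
    if k <= 0:
        return 0.0
    hits = sum(1 for pid in ranked[:k] if pid in relevant)
    return hits / k


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Binary-relevance nDCG: discounted gain divided by the ideal ordering's gain."""
    dcg = sum(
        1.0 / math.log2(rank + 2)  # rank is 0-based, so position 1 has discount log2(2)
        for rank, pid in enumerate(ranked[:k])
        if pid in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical query: three annotated passages, two of them retrieved.
ranked_ids = ["p7", "p2", "p9", "p4"]
gold_ids = {"p2", "p4", "p5"}

recall_10 = recall_at_k(ranked_ids, gold_ids, k=10)
precision_3 = precision_at_k(ranked_ids, gold_ids, k=3)
ndcg_10 = ndcg_at_k(ranked_ids, gold_ids, k=10)
```

In this toy query, two of the three annotated passages are retrieved, but only one of them lands in the top three. Under these definitions that gives a moderate recall alongside a low precision at 3, the same qualitative pattern as the gap between the reported Recall@10 and EvidencePrecision@3.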

References

Araci, D. (2019). FinBERT: Financial sentiment analysis with pre-trained language models. arXiv.

Catalini, C., & Gans, J. S. (2020). Some simple economics of the blockchain. Communications of the ACM, 63(7), 80-90.

Chen, D., Fisch, A., Weston, J., & Bordes, A. (2017). Reading Wikipedia to answer open-domain questions. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 1870-1879.

Chen, Z., Chen, W., Smiley, C., Shah, S., Borova, I., Langdon, D., Moussa, R., Beane, M., Huang, T. H., Routledge, B., & Wang, W. Y. (2021). FinQA: A dataset of numerical reasoning over financial data. Proceedings of EMNLP, 3697-3711.

Choi, C., Lin, K., Nguyen, A., & Linq AI Research. (2025). FinDER: Financial Dataset for Question Answering and Evaluating Retrieval-Augmented Generation. arXiv:2504.15800.

Cong, L. W., & He, Z. (2019). Blockchain disruption and smart contracts. The Review of Financial Studies, 32(5), 1754-1797.

Gao, Y., Xiong, Y., Gao, X., Jia, K., Pan, J., Bi, Y., Dai, Y., Sun, J., Guo, M., Wang, H., & Wang, H. (2023). Retrieval-augmented generation for large language models: A survey. arXiv.

Gunning, D., Stefik, M., Choi, J., Miller, T., Stumpf, S., & Yang, G. Z. (2019). XAI-Explainable artificial intelligence. Science Robotics, 4(37), eaay7120.

Harvey, C. R., Ramachandran, A., & Santoro, J. (2021). DeFi and the future of finance. Wiley.

Karpukhin, V., Oğuz, B., Min, S., Lewis, P., Wu, L., Edunov, S., Chen, D., & Yih, W. T. (2020). Dense passage retrieval for open-domain question answering. Proceedings of EMNLP, 6769-6781.

Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Kuettler, H., Lewis, M., Yih, W. T., Rocktaeschel, T., Riedel, S., & Kiela, D. (2020). Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems, 33, 9459-9474.

Manning, C. D., Raghavan, P., & Schuetze, H. (2008). Introduction to information retrieval. Cambridge University Press.

Nogueira, R., & Cho, K. (2019). Passage re-ranking with BERT. arXiv.

Robertson, S., & Zaragoza, H. (2009). The probabilistic relevance framework: BM25 and beyond. Foundations and Trends in Information Retrieval, 3(4), 333-389.

Targa Receivables LLC. (2013). Receivables Purchase Agreement dated January 10, 2013. U.S. Securities and Exchange Commission, Exhibit 10.1.

Thakur, N., Reimers, N., Rueckle, A., Srivastava, A., & Gurevych, I. (2021). BEIR: A heterogeneous benchmark for zero-shot evaluation of information retrieval models. Proceedings of NeurIPS Datasets and Benchmarks.

U.S. Securities and Exchange Commission. (2026). Financial Statement and Notes Data Sets. Division of Economic and Risk Analysis.

Voorhees, E. M. (2002). The TREC question answering track. Natural Language Engineering, 8(4), 361-378.

Yang, Y., Uy, M. C. S., & Huang, A. (2020). FinBERT: A pretrained language model for financial communications. arXiv.

Zhu, F., Lei, W., Huang, Y., Wang, C., Zhang, S., Lv, J., Feng, F., & Chua, T. S. (2021). TAT-QA: A question answering benchmark on a hybrid of tabular and textual content in finance. Proceedings of ACL, 3277-3287.
