LLM-Inspired Offline Reranking for Financial Search: Query Rewriting, Hybrid Retrieval, and Listwise Relevance Ranking on FiQA
DOI: https://doi.org/10.51903/jtie.v5i1.537
Keywords: Financial information retrieval, FiQA, BEIR, query rewriting, hybrid retrieval, BM25, dense retrieval, LLM reranking
Abstract
Financial search has high practical value because investors and retail users often ask natural-language questions whose wording differs from that of the relevant financial passages. This paper evaluates a multi-stage retrieval pipeline on FiQA, a financial question-answering retrieval collection in BEIR. The systems compared are BM25, Dense LSA, BM25-LSA hybrid retrieval, reciprocal-rank fusion, a compact linear reranker, fixed pointwise and listwise relevance rubrics inspired by LLM reranking, query rewriting, and the proposed pipeline that combines query rewriting, hybrid retrieval, and listwise reranking. The evaluation used the full 57,638-document FiQA corpus, 6,648 available queries, and the 648-query BEIR FiQA test qrels with 1,706 binary relevance judgments. BM25 was the best-performing system, with nDCG@10 = 0.2285, MAP = 0.1863, MRR = 0.2994, and Recall@100 = 0.5207. The proposed full pipeline underperformed BM25. The listwise rubric ranked second on nDCG@10 (0.2228) and improved over the pointwise rubric, suggesting that candidate-list normalization can be useful in this setting. The rubric rerankers are fixed local scoring rules, so these results should be read as an evaluation of LLM-inspired ranking logic rather than as a benchmark of an actual prompt-based LLM reranker. Dense LSA retrieval alone was weak (nDCG@10 = 0.0287), which illustrates the limits of a conservative non-neural dense baseline for financial semantic matching. Query rewriting reduced average effectiveness. The findings support using strong lexical baselines, gating rewrites conservatively, and evaluating carefully before adopting prompt-based or model-based LLM rerankers in financial search.
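To make two of the components named in the abstract concrete, the following is a minimal Python sketch of reciprocal-rank fusion and of nDCG@10 with binary relevance. It is an illustration only, not the authors' implementation: the function names, the toy run lists, the document identifiers, and the fusion constant k = 60 (a commonly used default) are all assumptions that the abstract does not specify.

```python
import math
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids.

    Each document receives 1 / (k + rank) from every list it appears in.
    k=60 is an assumed default, not a value taken from the paper.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def ndcg_at_k(ranked_ids, relevant_ids, k=10):
    """nDCG@k with binary gains (a document is relevant or it is not)."""
    dcg = sum(
        1.0 / math.log2(position + 1)
        for position, doc_id in enumerate(ranked_ids[:k], start=1)
        if doc_id in relevant_ids
    )
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(position + 1) for position in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical toy runs standing in for a lexical and a dense retriever.
bm25_run = ["d3", "d1", "d7", "d2"]
lsa_run = ["d1", "d9", "d3", "d4"]

fused = reciprocal_rank_fusion([bm25_run, lsa_run])
score = ndcg_at_k(fused, {"d1", "d2"}, k=10)
print(fused, round(score, 4))
```

Because reciprocal-rank fusion uses only rank positions, it needs no score calibration between the lexical and dense runs, which is one reason it is a common fusion baseline.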
License
Copyright (c) 2026 Siquan Meng, Jing Chen, Isa Zheng

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

