arXiv:2311.11944
FinanceBench: A New Benchmark for Financial Question Answering
20 November 2023
Pranab Islam, Anand Kannappan, Douwe Kiela, Rebecca Qian, Nino Scherrer, Bertie Vidgen
Community: RALM
Papers citing "FinanceBench: A New Benchmark for Financial Question Answering"
50 / 53 papers shown
Privacy-Preserving Federated Embedding Learning for Localized Retrieval-Augmented Generation
Qianren Mao, Qili Zhang, Hanwen Hao, Zhentao Han, Runhua Xu, ..., Bo Li, Y. Song, Jin Dong, Jianxin Li, Philip S. Yu
27 Apr 2025
Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks
Yixin Cao, Shibo Hong, X. Li, Jiahao Ying, Yubo Ma, ..., Juanzi Li, Aixin Sun, Xuanjing Huang, Tat-Seng Chua, Yu Jiang
Communities: ALM, ELM
26 Apr 2025
SMARTFinRAG: Interactive Modularized Financial RAG Benchmark
Yiwei Zha
25 Apr 2025
FinDER: Financial Dataset for Question Answering and Evaluating Retrieval-Augmented Generation
Chanyeol Choi, Jihoon Kwon, Jaeseon Ha, Hojun Choi, Chaewoon Kim, Yongjae Lee, Jy-yong Sohn, Alejandro Lopez-Lira
Community: RALM
22 Apr 2025
SECQUE: A Benchmark for Evaluating Real-World Financial Analysis Capabilities
Noga Ben Yoash, Meni Brief, O. Ovadia, Gil Shenderovitz, Moshik Mishaeli, Rachel Lemberg, Eitam Sheetrit
Communities: ELM, AIFin
06 Apr 2025
M-DocSum: Do LVLMs Genuinely Comprehend Interleaved Image-Text in Document Summarization?
Haolong Yan, Kaijun Tan, Yeqing Shen, Xin Huang, Zheng Ge, Xiangyu Zhang, Si Li, Daxin Jiang
Community: VLM
27 Mar 2025
DeepFund: Will LLM be Professional at Fund Investment? A Live Arena Perspective
Changlun Li, Yao Shi, Yuyu Luo, Nan Tang
Community: AIFin
24 Mar 2025
MMCR: Benchmarking Cross-Source Reasoning in Scientific Papers
Yang Tian, Zheng Lu, Mingqi Gao, Zheng Liu, Bo Zhao
Community: LRM
21 Mar 2025
MTBench: A Multimodal Time Series Benchmark for Temporal Reasoning and Question Answering
Jialin Chen, Aosong Feng, Ziyu Zhao, Juan Garza, Gaukhar Nurbek, Cheng Qin, Ali Maatouk, Leandros Tassiulas, Yifeng Gao, Rex Ying
Community: AI4TS
21 Mar 2025
Towards Lighter and Robust Evaluation for Retrieval Augmented Generation
Alex-Razvan Ispas, Charles-Elie Simon, Fabien Caspani, Vincent Guigue
Community: RALM
20 Mar 2025
Optimizing Retrieval Strategies for Financial Question Answering Documents in Retrieval-Augmented Generation Systems
Sejong Kim, Hyunseo Song, Hyunwoo Seo, Hyunjun Kim
Community: RALM
19 Mar 2025
Bridging Language Models and Financial Analysis
Alejandro Lopez-Lira, Jihoon Kwon, Sangwoon Yoon, Jy-yong Sohn, Chanyeol Choi
Community: AIFin
14 Mar 2025
FinTMMBench: Benchmarking Temporal-Aware Multi-Modal RAG in Finance
Fengbin Zhu, Junfeng Li, Liangming Pan, W. Wang, Fuli Feng, Chao Wang, Huanbo Luan, Tat-Seng Chua
Community: AIFin
07 Mar 2025
Bián: A Bilingual Benchmark and Model for Hallucination Detection in Retrieval-Augmented Generation
Zhouyu Jiang, Mengshu Sun, Zhiqiang Zhang, Lei Liang
Communities: RALM, 3DV
26 Feb 2025
Position: Standard Benchmarks Fail -- LLM Agents Present Overlooked Risks for Financial Applications
Zichen Chen, Jiaao Chen, Jianda Chen, Misha Sra
Community: ELM
21 Feb 2025
Minions: Cost-efficient Collaboration Between On-device and Cloud Language Models
A. Narayan, D. Biderman, Sabri Eyuboglu, Avner May, Scott W. Linderman, James Zou, Christopher Ré
21 Feb 2025
REAL-MM-RAG: A Real-World Multi-Modal Retrieval Benchmark
Navve Wasserman, Roi Pony, O. Naparstek, Adi Raz Goldfarb, Eli Schwartz, Udi Barzelay, Leonid Karlinsky
Communities: 3DV, VLM
17 Feb 2025
FinMTEB: Finance Massive Text Embedding Benchmark
Yixuan Tang, Yi Yang
Community: AIFin
16 Feb 2025
Exposing Numeracy Gaps: A Benchmark to Evaluate Fundamental Numerical Abilities in Large Language Models
Haoyang Li, Xuejia Chen, Zhanchao Xu, Darian Li, Nicole Hu, ..., Y. Li, Luyu Qiu, C. Zhang, Qing Li, Lei Chen
Communities: LRM, ELM
16 Feb 2025
Expect the Unexpected: FailSafe Long Context QA for Finance
Kiran Kamble, M. Russak, Dmytro Mozolevskyi, Muayad Ali, Mateusz Russak, Waseem Alshikh
10 Feb 2025
FLAME: Financial Large-Language Model Assessment and Metrics Evaluation
Jiayu Guo, Yu Guo, Martha Li, Songtao Tan
Community: ELM
03 Jan 2025
Drowning in Documents: Consequences of Scaling Reranker Inference
Mathew Jacob, Erik Lindgren, Matei A. Zaharia, Michael Carbin, Omar Khattab, Andrew Drozdov
Community: OffRL
18 Nov 2024
Greenback Bears and Fiscal Hawks: Finance is a Jungle and Text Embeddings Must Adapt
Peter Anderson, Mano Vikash Janardhanan, Jason He, Wei Cheng, Charlie Flanagan
Community: RALM
11 Nov 2024
Long Context RAG Performance of Large Language Models
Quinn Leng, Jacob P. Portes, Sam Havens, Matei A. Zaharia, Michael Carbin
Communities: AIFin, RALM, 3DV
05 Nov 2024
MME-Finance: A Multimodal Finance Benchmark for Expert-level Understanding and Reasoning
Ziliang Gan, Yu Lu, D. Zhang, Haohan Li, Che Liu, ..., Haipang Wu, Chaoyou Fu, Z. Xu, Rongjunchen Zhang, Yong Dai
05 Nov 2024
VERITAS: A Unified Approach to Reliability Evaluation
Rajkumar Ramamurthy, Meghana Arakkal Rajeev, Oliver Molenschot, James Y. Zou, Nazneen Rajani
Community: HILM
05 Nov 2024
A Comparative Analysis of Instruction Fine-Tuning LLMs for Financial Text Classification
Sorouralsadat Fatemi, Yuheng Hu, Maryam Mousavi
04 Nov 2024
SV-RAG: LoRA-Contextualizing Adaptation of MLLMs for Long Document Understanding
Jian Chen, R. Zhang, Yufan Zhou, Tong Yu, Franck Dernoncourt, J. Gu, Ryan Rossi, Changyou Chen, Tong Sun
02 Nov 2024
Opportunities and Challenges of Generative-AI in Finance
Akshar Prabhu Desai, Ganesh Satish Mallya, Mohammad Luqman, Tejasvi Ravi, Nithya Kota, Pranjul Yadav
Community: AIFin
21 Oct 2024
FAMMA: A Benchmark for Financial Domain Multilingual Multimodal Question Answering
Siqiao Xue, Tingting Chen, Fan Zhou, Qingyang Dai, Zhixuan Chu, Hongyuan Mei
06 Oct 2024
FLAG: Financial Long Document Classification via AMR-based GNN
Bolun "Namir" Xia, Aparna Gupta, Mohammed J. Zaki
Communities: AI4TS, AIFin
02 Oct 2024
Mixing It Up: The Cocktail Effect of Multi-Task Fine-Tuning on LLM Performance -- A Case Study in Finance
Meni Brief, Oded Ovadia, Gil Shenderovitz, Noga Ben Yoash, Rachel Lemberg, Eitam Sheetrit
01 Oct 2024
DANA: Domain-Aware Neurosymbolic Agents for Consistency and Accuracy
Vinh Luong, Sang Dinh, Shruti Raghavan, William Nguyen, Zooey Nguyen, ..., Kentaro Maegaito, Loc Nguyen, Thao Nguyen, Anh Hai Ha, Christopher Nguyen
27 Sep 2024
Do We Need Domain-Specific Embedding Models? An Empirical Investigation
Yixuan Tang, Yi Yang
Community: AIFin
27 Sep 2024
ChemEval: A Comprehensive Multi-Level Chemical Evaluation for Large Language Models
Yuqing Huang, Rongyang Zhang, X. He, Xuyang Zhi, Hao Wang, ..., Guoping Hu, Guiquan Liu, Qi Liu, Defu Lian, Enhong Chen
Community: ELM
21 Sep 2024
KodeXv0.1: A Family of State-of-the-Art Financial Large Language Models
Neel Rajani, Lilli Kiessling, Aleksandr Ogaltsov, Claus Lang
Community: ALM
13 Sep 2024
HybridRAG: Integrating Knowledge Graphs and Vector Retrieval Augmented Generation for Efficient Information Extraction
Bhaskarjit Sarmah, Benika Hall, Rohan Rao, Sunil Patel, Stefano Pasquali, Dhagash Mehta
09 Aug 2024
Lynx: An Open Source Hallucination Evaluation Model
Selvan Sunitha Ravi, B. Mielczarek, Anand Kannappan, Douwe Kiela, Rebecca Qian
Communities: VLM, RALM, HILM
11 Jul 2024
MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations
Yubo Ma, Yuhang Zang, Liangyu Chen, Meiqi Chen, Yizhu Jiao, ..., Liangming Pan, Yu-Gang Jiang, Jiaqi Wang, Yixin Cao, Aixin Sun
Communities: ELM, RALM, VLM
01 Jul 2024
Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges
Aman Singh Thakur, Kartik Choudhary, Venkat Srinik Ramayapally, Sankaran Vaidyanathan, Dieuwke Hupkes
Communities: ELM, ALM
18 Jun 2024
A Survey of Large Language Models for Financial Applications: Progress, Prospects and Challenges
Yuqi Nie, Yaxuan Kong, Xiaowen Dong, John M. Mulvey, H. Vincent Poor, Qingsong Wen, Stefan Zohren
Community: AIFin
15 Jun 2024
Benchmarking Large Language Models on CFLUE -- A Chinese Financial Language Understanding Evaluation Dataset
Jie Zhu, Junhui Li, Yalong Wen, Lifan Guo
Communities: ELM, ALM
17 May 2024
CPsyExam: A Chinese Benchmark for Evaluating Psychology using Examinations
Jiahao Zhao, Jingwei Zhu, Minghuan Tan, Min Yang, Di Yang, Chenhao Zhang, Guancheng Ye, Chengming Li, Xiping Hu
Community: ELM
16 May 2024
FinTextQA: A Dataset for Long-form Financial Question Answering
Jian Chen, Peilin Zhou, Yining Hua, Yingxin Loh, Kehui Chen, Ziyuan Li, Bing Zhu, Junwei Liang
16 May 2024
Enhancing Q&A with Domain-Specific Fine-Tuning and Iterative Reasoning: A Comparative Study
Zooey Nguyen, Anthony Annunziata, Vinh Luong, Sang Dinh, Quynh Le, Anh Hai Ha, Chanh Le, Hong An Phan, Shruti Raghavan, Christopher Nguyen
Community: LRM
17 Apr 2024
No Language is an Island: Unifying Chinese and English in Financial Large Language Models, Instruction Data, and Benchmarks
Gang Hu, Ke Qin, Chenhan Yuan, Min Peng, Alejandro Lopez-Lira, Benyou Wang, Sophia Ananiadou, Wanlong Yu, Jimin Huang, Qianqian Xie
10 Mar 2024
HypoTermQA: Hypothetical Terms Dataset for Benchmarking Hallucination Tendency of LLMs
Cem Uluoglakci, T. Taşkaya-Temizel
Community: HILM
25 Feb 2024
FinBen: A Holistic Financial Benchmark for Large Language Models
Qianqian Xie, Weiguang Han, Zhengyu Chen, Ruoyu Xiang, Xiao Zhang, ..., Yanzhao Lai, Hao Wang, Min Peng, Sophia Ananiadou, Jimin Huang
Community: AIFin
20 Feb 2024
FinTral: A Family of GPT-4 Level Multimodal Financial Large Language Models
Gagan Bhatia, El Moatez Billah Nagoudi, Hasan Cavusoglu, Muhammad Abdul-Mageed
Community: AIFin
16 Feb 2024
CPSDBench: A Large Language Model Evaluation Benchmark and Baseline for Chinese Public Security Domain
Xin Tong, Bo Jin, Zhi Lin, Binjun Wang, Ting Yu, Qiang Cheng
Community: ELM
11 Feb 2024