arXiv:2311.09476
ARES: An Automated Evaluation Framework for Retrieval-Augmented Generation Systems
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
16 November 2023
Jon Saad-Falcon
Omar Khattab
Christopher Potts
Matei A. Zaharia
Papers citing "ARES: An Automated Evaluation Framework for Retrieval-Augmented Generation Systems"
50 / 75 papers shown
Auditing Google's AI Overviews and Featured Snippets: A Case Study on Baby Care and Pregnancy
Desheng Hu
Joachim Baumann
Aleksandra Urman
Elsa Lichtenegger
Robin Forsberg
Anikó Hannák
Christo Wilson
17 Nov 2025
EncouRAGe: Evaluating RAG Local, Fast, and Reliable
Jan Strich
Adeline Scharfenberg
Chris Biemann
Martin Semmann
31 Oct 2025
RCScore: Quantifying Response Consistency in Large Language Models
Dongjun Jang
Youngchae Ahn
Hyopil Shin
30 Oct 2025
Beyond the Crowd: LLM-Augmented Community Notes for Governing Health Misinformation
Jiaying Wu
Zihang Fu
Haonan Wang
Fanxiao Li
Min-Yen Kan
Preslav Nakov
13 Oct 2025
VersionRAG: Version-Aware Retrieval-Augmented Generation for Evolving Documents
Daniel Huwiler
Kurt Stockinger
Jonathan Fürst
09 Oct 2025
Exposing Citation Vulnerabilities in Generative Engines
Riku Mochizuki
Shusuke Komatsu
Souta Noguchi
Kazuto Ataka
08 Oct 2025
Auto-ARGUE: LLM-Based Report Generation Evaluation
William Walden
Orion Weller
Laura Dietz
Hannah Recknor
...
Gabrielle Kaili-May Liu
Yu Hou
Dawn J Lawrie
J. Mayfield
Eugene Yang
30 Sep 2025
TextMineX: Data, Evaluation Framework and Ontology-guided LLM Pipeline for Humanitarian Mine Action
Chenyue Zhou
Gürkan Solmaz
Flavio Cirillo
Kiril Gashteovski
Jonathan Fürst
18 Sep 2025
Linguistic Nepotism: Trading-off Quality for Language Preference in Multilingual RAG
Dayeon Ki
Marine Carpuat
Paul McNamee
Daniel Khashabi
Eugene Yang
Dawn J Lawrie
Kevin Duh
17 Sep 2025
LLM Ensemble for RAG: Role of Context Length in Zero-Shot Question Answering for BioASQ Challenge
Dima Galat
Diego Mollá Aliod
10 Sep 2025
Noise or Nuance: An Investigation Into Useful Information and Filtering For LLM Driven AKBC
Alex Clay
Ernesto Jiménez-Ruiz
Pranava Madhyastha
10 Sep 2025
Beyond Benchmark: LLMs Evaluation with an Anthropomorphic and Value-oriented Roadmap
Jun Wang
Ninglun Gu
Kailai Zhang
Zijiao Zhang
Yelun Bao
...
Liwei Liu
Yihuan Liu
Pengyong Li
Gary G. Yen
Junchi Yan
26 Aug 2025
Real-Time RAG for the Identification of Supply Chain Vulnerabilities
Jesse Ponnock
Grace Kenneally
Michael Robert Briggs
Elinor Yeo
Tyrone Patterson III
Nicholas Kinberg
Matthew Kalinowski
David Hechtman
23 Aug 2025
Test-time Corpus Feedback: From Retrieval to RAG
Mandeep Rathee
Venktesh V
Sean MacAvaney
Avishek Anand
21 Aug 2025
LongRecall: A Structured Approach for Robust Recall Evaluation in Long-Form Text
MohammadJavad Ardestani
Ehsan Kamalloo
Davood Rafiei
20 Aug 2025
Can we Evaluate RAGs with Synthetic Data?
Jonas van Elburg
Peter van der Putten
Maarten Marx
15 Aug 2025
When AIs Judge AIs: The Rise of Agent-as-a-Judge Evaluation for LLMs
Fangyi Yu
05 Aug 2025
PRGB Benchmark: A Robust Placeholder-Assisted Algorithm for Benchmarking Retrieval-Augmented Generation
ZheHao Tan
YiHan Jiao
Dan Yang
Lei Liu
Jie Feng
DuoLin Sun
Yue Shen
Jian Wang
Peng Wei
Jinjie Gu
23 Jul 2025
SEARA: An Automated Approach for Obtaining Optimal Retrievers
Zou Yuheng
Wang Yiran
Tian Yuzhu
Zhu Min
Huang Yanhua
09 Jul 2025
Benchmarking Vector, Graph and Hybrid Retrieval Augmented Generation (RAG) Pipelines for Open Radio Access Networks (ORAN)
Sarat Ahmad
Zeinab Nezami
Maryam Hafeez
Syed Ali Raza Zaidi
04 Jul 2025
A Vision for Geo-Temporal Deep Research Systems: Towards Comprehensive, Transparent, and Reproducible Geo-Temporal Information Synthesis
Bruno Martins
Piotr Szymański
Piotr Gramacki
17 Jun 2025
Cost-Optimal Active AI Model Evaluation
Anastasios Nikolas Angelopoulos
Jacob Eisenstein
Jonathan Berant
Alekh Agarwal
Adam Fisch
09 Jun 2025
GaRAGe: A Benchmark with Grounding Annotations for RAG Evaluation
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Ionut Teodor Sorodoc
Leonardo F. R. Ribeiro
Rexhina Blloshmi
Christopher Davis
Adria de Gispert
09 Jun 2025
Elementary Math Word Problem Generation using Large Language Models
Nimesh Ariyarathne
Harshani Bandara
Yasith Heshan
Omega Gamage
Surangika Ranathunga
...
Gayathri Lihinikaduarachchi
Tharoosha Vihidun
Meenambika Chandirakumar
Sanujen Premakumar
Sanjula Gathsara
06 Jun 2025
Data-efficient Meta-models for Evaluation of Context-based Questions and Answers in LLMs
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2025
Julia Belikova
Konstantin Polev
Rauf Parchiev
Dmitry Simakov
29 May 2025
Retrieval-Augmented Generation: A Comprehensive Survey of Architectures, Enhancements, and Robustness Frontiers
Chaitanya Sharma
28 May 2025
CogniBench: A Legal-inspired Framework and Dataset for Assessing Cognitive Faithfulness of Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Xiaqiang Tang
Jian Li
Keyu Hu
Du Nan
Xiaolong Li
Xi Zhang
Weigao Sun
Sihong Xie
27 May 2025
DeepResearchGym: A Free, Transparent, and Reproducible Evaluation Sandbox for Deep Research
Joao Coelho
Jingjie Ning
Jingyuan He
Kangrui Mao
Abhijay Paladugu
...
Jiahe Jin
Jamie Callan
João Magalhães
Bruno Martins
Chenyan Xiong
25 May 2025
FinRAGBench-V: A Benchmark for Multimodal RAG with Visual Citation in the Financial Domain
Suifeng Zhao
Zhuoran Jin
Sujian Li
Jun Gao
23 May 2025
THELMA: Task Based Holistic Evaluation of Large Language Model Applications-RAG Question Answering
Udita Patel
Rutu Mulkar
Jay Roberts
Cibi Chakravarthy Senthilkumar
Sujay Gandhi
Xiaofei Zheng
Naumaan Nayyar
Parul Kalra
Rafael Castrillo
16 May 2025
Securing RAG: A Risk Assessment and Mitigation Framework
Swiss Conference on Data Science (SDS), 2025
Lukas Ammann
Sara Ott
Christoph R. Landolt
Marco P. Lehmann
13 May 2025
Can LLMs Be Trusted for Evaluating RAG Systems? A Survey of Methods and Datasets
Swiss Conference on Data Science (SDS), 2025
Lorenz Brehme
Thomas Ströhle
Ruth Breu
28 Apr 2025
The Viability of Crowdsourcing for RAG Evaluation
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2025
Lukas Gienapp
Tim Hagen
Maik Fröbe
Matthias Hagen
Benno Stein
Martin Potthast
Harrisen Scells
22 Apr 2025
The Great Nugget Recall: Automating Fact Extraction and RAG Evaluation with Large Language Models
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2025
Ronak Pradeep
Nandan Thakur
Shivani Upadhyay
Daniel Fernando Campos
Nick Craswell
Jimmy Lin
21 Apr 2025
Support Evaluation for the TREC 2024 RAG Track: Comparing Human versus LLM Judges
Nandan Thakur
Ronak Pradeep
Shivani Upadhyay
Daniel Fernando Campos
Nick Craswell
Jimmy Lin
21 Apr 2025
CRAB: A Benchmark for Evaluating Curation of Retrieval-Augmented LLMs in Biomedicine
Hanmeng Zhong
Linqing Chen
Wentao Wu
Weilei Wang
15 Apr 2025
Automated Construction of a Knowledge Graph of Nuclear Fusion Energy for Effective Elicitation and Retrieval of Information
A. Loreti
K. Chen
R. George
R. Firth
A. Agnello
S. Tanaka
10 Apr 2025
Does Context Matter? ContextualJudgeBench for Evaluating LLM-based Judges in Contextual Settings
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Austin Xu
Srijan Bansal
Yifei Ming
Semih Yavuz
Shafiq Joty
19 Mar 2025
A Survey on Knowledge-Oriented Retrieval-Augmented Generation
Mingyue Cheng
Yucong Luo
Jie Ouyang
Qiang Liu
Huijie Liu
...
Bohou Zhang
Jiawei Cao
Jie Ma
Daoyu Wang
Tong Xu
11 Mar 2025
SePer: Measure Retrieval Utility Through The Lens Of Semantic Perplexity Reduction
International Conference on Learning Representations (ICLR), 2025
Lu Dai
Yijie Xu
Jinhui Ye
Hao Liu
Hui Xiong
03 Mar 2025
Towards Efficient Educational Chatbots: Benchmarking RAG Frameworks
Umar Ali Khan
Ekram Khan
Fiza Khan
A. A. Moinuddin
02 Mar 2025
PhantomWiki: On-Demand Datasets for Reasoning and Retrieval Evaluation
Albert Gong
Kamilė Stankevičiūtė
Chao-gang Wan
Anmol Kabra
Raphael Thesmar
Johann Lee
Julius Klenke
Daniel Schwalbe-Koda
Kilian Q. Weinberger
27 Feb 2025
Reference-Aligned Retrieval-Augmented Question Answering over Heterogeneous Proprietary Documents
Nayoung Choi
Grace Byun
Andrew Chung
Ellie S. Paek
S. Lee
Jinho D. Choi
26 Feb 2025
Judge as A Judge: Improving the Evaluation of Retrieval-Augmented Generation through the Judge-Consistency of Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Shuliang Liu
Xinze Li
Zhenghao Liu
Shi Yu
Cheng Yang
Zheni Zeng
Zhiyuan Liu
Maosong Sun
Ge Yu
26 Feb 2025
LettuceDetect: A Hallucination Detection Framework for RAG Applications
Adam Kovacs
Gábor Recski
24 Feb 2025
Evaluation of Large Language Models via Coupled Token Generation
N. C. Benz
Stratis Tsirtsis
Eleni Straitouri
Ivi Chatzi
Ander Artola Velasco
Suhas Thejaswi
Manuel Gomez Rodriguez
03 Feb 2025
RAGBench: Explainable Benchmark for Retrieval-Augmented Generation Systems
Robert Friel
Masha Belyi
Atindriyo Sanyal
17 Jan 2025
ASTRID -- An Automated and Scalable TRIaD for the Evaluation of RAG-based Clinical Question Answering Systems
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Mohita Chowdhury
Yajie Vera He
Jared Joselowitz
Aisling Higham
Ernest Lim
14 Jan 2025
LLM-Rubric: A Multidimensional, Calibrated Approach to Automated Evaluation of Natural Language Texts
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Helia Hashemi
J. Eisner
Corby Rosset
Benjamin Van Durme
Chris Kedzie
03 Jan 2025
From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge
Dawei Li
Bohan Jiang
Liangjie Huang
Alimohammad Beigi
Chengshuai Zhao
...
Canyu Chen
Tianhao Wu
Kai Shu
Lu Cheng
Huan Liu
25 Nov 2024