ARES: An Automated Evaluation Framework for Retrieval-Augmented Generation Systems

North American Chapter of the Association for Computational Linguistics (NAACL), 2023

16 November 2023

Papers citing "ARES: An Automated Evaluation Framework for Retrieval-Augmented Generation Systems"

Showing 25 of 75 papers

Rationale-Guided Retrieval Augmented Generation for Medical Question Answering

North American Chapter of the Association for Computational Linguistics (NAACL), 2024

461

01 Nov 2024

Do RAG Systems Cover What Matters? Evaluating and Optimizing Responses with Sub-Question Coverage

North American Chapter of the Association for Computational Linguistics (NAACL), 2024

Kaige Xie

Philippe Laban

Prafulla Kumar Choubey

Caiming Xiong

Chien-Sheng Wu

163

20 Oct 2024

Limits to scalable evaluation at the frontier: LLM as Judge won't beat twice the data

International Conference on Learning Representations (ICLR), 2024

407

17 Oct 2024

Towards Fair RAG: On the Impact of Fair Ranking in Retrieval-Augmented Generation

International Conference on the Theory of Information Retrieval (ICTIR), 2024

To Eun Kim

Fernando Diaz

662

17 Sep 2024

HyPA-RAG: A Hybrid Parameter Adaptive Retrieval-Augmented Generation System for AI Legal and Policy Applications

Zekun Wu

Adriano Soares Koshiyama

Philip C. Treleaven

RALM AILaw

294

29 Aug 2024

Can Unconfident LLM Annotations Be Used for Confident Conclusions?

North American Chapter of the Association for Computational Linguistics (NAACL), 2024

386

27 Aug 2024

RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation

293

05 Aug 2024

RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

...

Yishan Li

Zhenghao Liu

Xu Han

Zhiyuan Liu

Maosong Sun

669

02 Aug 2024

Retrieval-Augmented Generation for Natural Language Processing: A Survey

Shangyu Wu

Yufei Cui

...

Xue Liu

455

18 Jul 2024

Evaluation of RAG Metrics for Question Answering in the Telecom Domain

289

15 Jul 2024

Grounding and Evaluation for Large Language Models: Practical Challenges and Lessons Learned (Survey)

219

10 Jul 2024

BERGEN: A Benchmarking Library for Retrieval-Augmented Generation

304

01 Jul 2024

When Search Engine Services meet Large Language Models: Visions and Challenges

353

28 Jun 2024

Evaluating Quality of Answers for Retrieval-Augmented Generation: A Strong LLM Is All You Need

Yang Wang

Alberto Garcia Hernandez

Roman Kyslyi

Nicholas S. Kersting

316

26 Jun 2024

The Challenges of Evaluating LLM Applications: An Analysis of Automated, Human, and LLM-Based Approaches

Bhashithe Abeysinghe

Ruhan Circi

ELM

289

05 Jun 2024

Luna: An Evaluation Foundation Model to Catch Language Model Hallucinations with High Accuracy and Low Cost

443

03 Jun 2024

Evaluation of Retrieval-Augmented Generation: A Survey

Kai Zhang

Qi Liu

393

195

13 May 2024

RAG and RAU: A Survey on Retrieval-Augmented Language Model in Natural Language Processing

Yucheng Hu

Yuxing Lu

RALM

402

30 Apr 2024

Unlocking Multi-View Insights in Knowledge-Dense Retrieval-Augmented Generation

IEEE Transactions on Audio, Speech, and Language Processing (IEEE TASLP), 2024

211

19 Apr 2024

A Survey on Retrieval-Augmented Text Generation for Large Language Models

Yizheng Huang

Jimmy X. Huang

3DV RALM

318

17 Apr 2024

ClashEval: Quantifying the tug-of-war between an LLM's internal prior and external evidence

638

16 Apr 2024

AutoEval Done Right: Using Synthetic Data for Model Evaluation

Pierre Boyeau

Anastasios Nikolas Angelopoulos

350

09 Mar 2024

Prediction-Powered Ranking of Large Language Models

Ivi Chatzi

Eleni Straitouri

Suhas Thejaswi

Manuel Gomez Rodriguez

ALM

453

27 Feb 2024

CRUD-RAG: A Comprehensive Chinese Benchmark for Retrieval-Augmented Generation of Large Language Models

Yuanjie Lyu

Enhong Chen

300

30 Jan 2024

Billion-scale similarity search with GPUs

IEEE Transactions on Big Data (TBD), 2017

Jeff Johnson

Matthijs Douze

Edouard Grave

970

4,531

28 Feb 2017