2309.01431
Cited By
Benchmarking Large Language Models in Retrieval-Augmented Generation
4 September 2023
Jiawei Chen
Hongyu Lin
Xianpei Han
Le Sun
3DV
RALM
Papers citing
"Benchmarking Large Language Models in Retrieval-Augmented Generation"
37 / 37 papers shown
Title
Towards Requirements Engineering for RAG Systems
Tor Sporsem
Rasmus Ulfsnes
14
0
0
12 May 2025
CDE-Mapper: Using Retrieval-Augmented Language Models for Linking Clinical Data Elements to Controlled Vocabularies
Komal Gilani
Marlo Verket
Christof Peters
Michel Dumontier
Hans-Peter Brunner-La Rocca
V. Urovi
32
0
0
07 May 2025
Retrieval Augmented Generation Evaluation for Health Documents
Mario Ceresa
Lorenzo Bertolini
Valentin Comte
Nicholas Spadaro
Barbara Raffael
...
Sergio Consoli
Amalia Muñoz Piñeiro
Alex Patak
Maddalena Querci
Tobias Wiesenthal
RALM
3DV
31
0
1
07 May 2025
Traceback of Poisoning Attacks to Retrieval-Augmented Generation
Baolei Zhang
Haoran Xin
Minghong Fang
Zhuqing Liu
Biao Yi
Tong Li
Zheli Liu
SILM
AAML
59
0
0
30 Apr 2025
Can LLMs Be Trusted for Evaluating RAG Systems? A Survey of Methods and Datasets
Lorenz Brehme
Thomas Ströhle
Ruth Breu
56
0
0
28 Apr 2025
The Viability of Crowdsourcing for RAG Evaluation
Lukas Gienapp
Tim Hagen
Maik Fröbe
Matthias Hagen
Benno Stein
Martin Potthast
Harrisen Scells
21
0
0
22 Apr 2025
TPU-Gen: LLM-Driven Custom Tensor Processing Unit Generator
Deepak Vungarala
Mohammed E. Elbtity
Sumiya Syed
Sakila Alam
Kartik Pandit
Arnob Ghosh
Ramtin Zand
Shaahin Angizi
29
1
0
07 Mar 2025
Model-Based Offline Reinforcement Learning with Reliability-Guaranteed Sequence Modeling
Shenghong He
OffRL
90
0
0
10 Feb 2025
RAGBench: Explainable Benchmark for Retrieval-Augmented Generation Systems
Robert Friel
Masha Belyi
Atindriyo Sanyal
72
18
0
17 Jan 2025
Large Language Models, Knowledge Graphs and Search Engines: A Crossroads for Answering Users' Questions
Aidan Hogan
Xin Luna Dong
Denny Vrandečić
Gerhard Weikum
50
1
0
12 Jan 2025
Rango: Adaptive Retrieval-Augmented Proving for Automated Software Verification
Kyle Thompson
Nuno Saavedra
Pedro Carrott
Kevin Fisher
Alex Sanchez-Stern
Yuriy Brun
J. Ferreira
Sorin Lerner
E. First
LRM
98
1
0
18 Dec 2024
RAG-DDR: Optimizing Retrieval-Augmented Generation Using Differentiable Data Rewards
Xinze Li
Sen Mei
Zhenghao Liu
Yukun Yan
Shuo Wang
...
H. Chen
Ge Yu
Zhiyuan Liu
Maosong Sun
Chenyan Xiong
37
6
0
17 Oct 2024
MIRAGE-Bench: Automatic Multilingual Benchmark Arena for Retrieval-Augmented Generation Systems
Nandan Thakur
Suleman Kazi
Ge Luo
Jimmy J. Lin
Amin Ahmad
VLM
RALM
26
6
0
17 Oct 2024
TurboRAG: Accelerating Retrieval-Augmented Generation with Precomputed KV Caches for Chunked Text
Songshuo Lu
Hua Wang
Yutian Rong
Zhi Chen
Yaohua Tang
VLM
28
11
0
10 Oct 2024
FaithEval: Can Your Language Model Stay Faithful to Context, Even If "The Moon is Made of Marshmallows"
Yifei Ming
Senthil Purushwalkam
Shrey Pandit
Zixuan Ke
Xuan-Phi Nguyen
Caiming Xiong
Shafiq R. Joty
HILM
110
16
0
30 Sep 2024
Embodied-RAG: General Non-parametric Embodied Memory for Retrieval and Generation
Quanting Xie
So Yeon Min
Tianyi Zhang
Kedi Xu
Aarav Bajaj
Ruslan Salakhutdinov
Matthew Johnson-Roberson
Yonatan Bisk
LM&Ro
52
7
0
26 Sep 2024
MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines
Dongzhi Jiang
Renrui Zhang
Ziyu Guo
Yanmin Wu
Jiayi Lei
...
Guanglu Song
Peng Gao
Yu Liu
Chunyuan Li
Hongsheng Li
MLLM
27
16
0
19 Sep 2024
LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale
Jaehong Cho
Minsu Kim
Hyunmin Choi
Guseul Heo
Jongse Park
38
8
0
10 Aug 2024
High-Throughput Phenotyping of Clinical Text Using Large Language Models
D. B. Hier
S. I. Munzir
Anne Stahlfeld
Tayo Obafemi-Ajayi
M. Carrithers
LM&MA
43
1
0
02 Aug 2024
PersLLM: A Personified Training Approach for Large Language Models
Zheni Zeng
Jiayi Chen
Huimin Chen
Yukun Yan
Yuxuan Chen
Zhenghao Liu
Zhiyuan Liu
Maosong Sun
LLMAG
24
2
0
17 Jul 2024
Better RAG using Relevant Information Gain
Marc Pickett
Jeremy Hartman
Ayan Kumar Bhowmick
Raquib-ul Alam
Aditya Vempaty
RALM
28
3
0
16 Jul 2024
A Chatbot for Asylum-Seeking Migrants in Europe
Bettina Fazzinga
Elena Palmieri
Margherita Vestoso
Luca Bolognini
Andrea Galassi
F. Furfaro
Paolo Torroni
17
0
0
12 Jul 2024
Mobile Edge Intelligence for Large Language Models: A Contemporary Survey
Guanqiao Qu
Qiyuan Chen
Wei Wei
Zheng Lin
Xianhao Chen
Kaibin Huang
33
41
0
09 Jul 2024
A Tale of Trust and Accuracy: Base vs. Instruct LLMs in RAG Systems
Florin Cuconasu
Giovanni Trappolini
Nicola Tonellotto
Fabrizio Silvestri
51
2
0
21 Jun 2024
SEC-QA: A Systematic Evaluation Corpus for Financial QA
Viet Dac Lai
Michael Krumdick
Charles Lovering
Varshini Reddy
Craig W. Schmidt
Chris Tanner
41
3
0
20 Jun 2024
HelloFresh: LLM Evaluations on Streams of Real-World Human Editorial Actions across X Community Notes and Wikipedia edits
Tim Franzmeyer
Aleksandar Shtedritski
Samuel Albanie
Philip H. S. Torr
João F. Henriques
Jakob N. Foerster
22
1
0
05 Jun 2024
Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools
Varun Magesh
Faiz Surani
Matthew Dahl
Mirac Suzgun
Christopher D. Manning
Daniel E. Ho
HILM
ELM
AILaw
27
63
0
30 May 2024
Evaluation of Retrieval-Augmented Generation: A Survey
Hao Yu
Aoran Gan
Kai Zhang
Shiwei Tong
Qi Liu
Zhaofeng Liu
3DV
57
78
0
13 May 2024
ClashEval: Quantifying the tug-of-war between an LLM's internal prior and external evidence
Kevin Wu
Eric Wu
James Y. Zou
AAML
53
39
0
16 Apr 2024
Automating Research Synthesis with Domain-Specific Large Language Model Fine-Tuning
Teo Susnjak
Peter Hwang
N. Reyes
A. Barczak
Timothy R. McIntosh
Surangika Ranathunga
55
22
0
08 Apr 2024
Dialectical Alignment: Resolving the Tension of 3H and Security Threats of LLMs
Shu Yang
Jiayuan Su
Han Jiang
Mengdi Li
Keyuan Cheng
Muhammad Asif Ali
Lijie Hu
Di Wang
16
5
0
30 Mar 2024
Crafting Knowledge: Exploring the Creative Mechanisms of Chat-Based Search Engines
Lijia Ma
Xingchen Xu
Yong-Ming Tan
27
7
0
29 Feb 2024
Piecing Together Clues: A Benchmark for Evaluating the Detective Skills of Large Language Models
Zhouhong Gu
Lin Zhang
Jiangjie Chen
Haoning Ye
Xiaoxuan Zhu
...
Jianchen Wang
Yikai Zhang
Wenhao Huang
Yanghua Xiao
Hongwei Feng
RALM
ELM
28
0
0
11 Jul 2023
Rethinking with Retrieval: Faithful Large Language Model Inference
Hangfeng He
Hongming Zhang
Dan Roth
KELM
LRM
141
151
0
31 Dec 2022
Compositional Semantic Parsing with Large Language Models
Andrew Drozdov
Nathanael Schärli
Ekin Akyürek
Nathan Scales
Xinying Song
Xinyun Chen
Olivier Bousquet
Denny Zhou
ReLM
LRM
187
91
0
29 Sep 2022
Factual Error Correction for Abstractive Summarization Models
Meng Cao
Yue Dong
Jiapeng Wu
Jackie C.K. Cheung
HILM
KELM
167
159
0
17 Oct 2020
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
294
6,927
0
20 Apr 2018