Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems

1 July 2024

Philippe Laban

Alexander R. Fabbri

Caiming Xiong

Chien-Sheng Wu

RALM

Papers citing "Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems"

50 / 59 papers shown

NAMeGEn: Creative Name Generation via A Novel Agent-based Multiple Personalized Goal Enhancement Framework

352

19 Nov 2025

Stress Testing Factual Consistency Metrics for Long-Document Summarization

Zain Muhammad Mujahid

Dustin Wright

Isabelle Augenstein

HILM

197

10 Nov 2025

From Retrieval to Generation: Unifying External and Parametric Knowledge for Medical Question Answering

156

21 Oct 2025

Glyph: Scaling Context Windows via Visual-Text Compression

...

120

20 Oct 2025

PRISM: Agentic Retrieval with LLMs for Multi-Hop Question Answering

Md Mahadi Hasan Nahid

Davood Rafiei

RALM

163

16 Oct 2025

Rethinking Schema Linking: A Context-Aware Bidirectional Retrieval Approach for Text-to-SQL

Md Mahadi Hasan Nahid

130

16 Oct 2025

Document Intelligence in the Era of Large Language Models: A Survey

190

15 Oct 2025

Attribution Gradients: Incrementally Unfolding Citations for Critical Examination of Attributed AI Answers

141

01 Oct 2025

ClaimIQ at CheckThat! 2025: Comparing Prompted and Fine-Tuned Language Models for Verifying Numerical Claims

Anirban Saha Anik

Md Fahimul Kabir Chowdhury

Andrew Wyckoff

Sagnik Ray Choudhury

136

15 Sep 2025

Topic-Guided Reinforcement Learning with LLMs for Enhancing Multi-Document Summarization

160

11 Sep 2025

EviNote-RAG: Enhancing RAG Models via Answer-Supportive Evidence Notes

...

243

31 Aug 2025

Memory Limitations of Prompt Tuning in Transformers

139

30 Aug 2025

OpinioRAG: Towards Generating User-Centric Opinion Highlights from Large-scale Online Reviews

Mir Tafseer Nayeem

Davood Rafiei

150

30 Aug 2025

The Rarity Blind Spot: A Framework for Evaluating Statistical Reasoning in LLMs

Seiji Maekawa

Hayate Iso

Nikita Bhutani

140

29 Aug 2025

LLM Chatbot-Creation Approaches

120

28 Aug 2025

Towards a Holistic and Automated Evaluation Framework for Multi-Level Comprehension of LLMs in Book-Length Contexts

27 Aug 2025

Attribution, Citation, and Quotation: A Survey of Evidence-based Text Generation with Large Language Models

137

21 Aug 2025

BEE-RAG: Balanced Entropy Engineering for Retrieval-Augmented Generation

180

07 Aug 2025

NeedleChain: Measuring Intact Context Comprehension Capability of Large Language Models

Hyeonseok Moon

Heuiseok Lim

LLMAG RALM LRM

233

30 Jul 2025

Ref-Long: Benchmarking the Long-context Referencing Capability of Long-context Language Models

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

218

13 Jul 2025

GenerationPrograms: Fine-grained Attribution with Executable Programs

251

17 Jun 2025

Query-Focused Retrieval Heads Improve Long-Context Reasoning and Re-ranking

295

11 Jun 2025

Team Anotheroption at SemEval-2025 Task 8: Bridging the Gap Between Open-Source and Proprietary LLMs in Table QA

Nikolas Evkarpidi

Elena Tutubalina

LMTD

321

11 Jun 2025

GaRAGe: A Benchmark with Grounding Annotations for RAG Evaluation

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

Ionut Teodor Sorodoc

Leonardo F. R. Ribeiro

Rexhina Blloshmi

Christopher Davis

Adria de Gispert

134

09 Jun 2025

Diagnosing and Resolving Cloud Platform Instability with Multi-modal RAG LLMs

Yifan Wang

Kenneth P. Birman

350

27 May 2025

MiniLongBench: The Low-cost Long Context Understanding Benchmark for Large Language Models

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

292

26 May 2025

LLMs Get Lost In Multi-Turn Conversation

362

107

09 May 2025

Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks

...

554

26 Apr 2025

Estimating Optimal Context Length for Hybrid Retrieval-augmented Multi-document Summarization

Adithya Pratapa

Teruko Mitamura

RALM

222

17 Apr 2025

ML For Hardware Design Interpretability: Challenges and Opportunities

190

11 Apr 2025

Reasoning Beyond Limits: Advances and Open Problems for LLMs

ICT Express, 2025

814

26 Mar 2025

Extract, Match, and Score: An Evaluation Paradigm for Long Question-context-answer Triplets in Financial Analysis

247

20 Mar 2025

Does Context Matter? ContextualJudgeBench for Evaluating LLM-based Judges in Contextual Settings

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

375

19 Mar 2025

RAG-KG-IL: A Multi-Agent Hybrid Framework for Reducing Hallucinations and Enhancing LLM Reasoning through RAG and Incremental Knowledge Graph Learning Integration

Hong Qing Yu

Frank McQuade

273

14 Mar 2025

Lost-in-the-Middle in Long-Text Generation: Synthetic Dataset, Evaluation Framework, and Mitigation

134

10 Mar 2025

LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning

291

04 Mar 2025

U-NIAH: Unified RAG and LLM Evaluation for Long Context Needle-In-A-Haystack

297

01 Mar 2025

Do Retrieval-Augmented Language Models Adapt to Varying User Needs?

413

27 Feb 2025

Evaluating the Effect of Retrieval Augmentation on Social Biases

Tianhui Zhang

Yi Zhou

Danushka Bollegala

315

24 Feb 2025

Scaling Multi-Document Event Summarization: Evaluating Compression vs. Full-Text Approaches

North American Chapter of the Association for Computational Linguistics (NAACL), 2025

Adithya Pratapa

Teruko Mitamura

299

10 Feb 2025

From Cool Demos to Production-Ready FMware: Core Challenges and a Technology Roadmap

Gopi Krishnan Rajbahadur

G. Oliva

Dayi Lin

Ahmed E. Hassan

316

28 Jan 2025

Needle Threading: Can LLMs Follow Threads through Near-Million-Scale Haystacks?

International Conference on Learning Representations (ICLR), 2024

1.1K

07 Nov 2024

Long Context RAG Performance of Large Language Models

274

05 Nov 2024

CRMArena: Understanding the Capacity of LLM Agents to Perform Professional CRM Tasks in Realistic Environments

North American Chapter of the Association for Computational Linguistics (NAACL), 2024

477

04 Nov 2024

On Positional Bias of Faithfulness for Long-form Summarization

North American Chapter of the Association for Computational Linguistics (NAACL), 2024

256

31 Oct 2024

Understanding Synthetic Context Extension via Retrieval Heads

Xinyu Zhao

Fangcong Yin

Greg Durrett

595

29 Oct 2024

Do RAG Systems Cover What Matters? Evaluating and Optimizing Responses with Sub-Question Coverage

North American Chapter of the Association for Computational Linguistics (NAACL), 2024

Kaige Xie

Philippe Laban

Prafulla Kumar Choubey

Caiming Xiong

Chien-Sheng Wu

163

20 Oct 2024

From Single to Multi: How LLMs Hallucinate in Multi-Document Summarization

North American Chapter of the Association for Computational Linguistics (NAACL), 2024

350

17 Oct 2024

Enhancing LLM Trading Performance with Fact-Subjectivity Aware Reasoning

242

16 Oct 2024

Search Engines in an AI Era: The False Promise of Factual and Verifiable Source-Cited Responses

Pranav Narayanan Venkit

248

15 Oct 2024