Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems

1 July 2024
Philippe Laban
Alexander R. Fabbri
Caiming Xiong
Chien-Sheng Wu
    RALM
ArXiv (abs) | PDF | HTML | HuggingFace (90 upvotes)

Papers citing "Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems"

50 / 59 papers shown
NAMeGEn: Creative Name Generation via A Novel Agent-based Multiple Personalized Goal Enhancement Framework
Shanlin Zhou
Xinpeng Wang
Jianxun Lian
Zhenghao Liu
L. Lakshmanan
Xiaoyuan Yi
Yongtao Hao
LLMAG
342
0
0
19 Nov 2025
Stress Testing Factual Consistency Metrics for Long-Document Summarization
Zain Muhammad Mujahid
Dustin Wright
Isabelle Augenstein
HILM
193
0
0
10 Nov 2025
From Retrieval to Generation: Unifying External and Parametric Knowledge for Medical Question Answering
Lei Li
Xiao Zhou
Y. Zhang
X. Wu
RALM MedIm
155
0
0
21 Oct 2025
Glyph: Scaling Context Windows via Visual-Text Compression
Jiale Cheng
Y. Liu
X. Zhang
Yulin Fei
Wenyi Hong
...
Xiao-Yang Liu
Yushi Bai
Jie Tang
Hongning Wang
Shiyu Huang
VLM
118
6
0
20 Oct 2025
PRISM: Agentic Retrieval with LLMs for Multi-Hop Question Answering
Md Mahadi Hasan Nahid
Davood Rafiei
RALM
163
0
0
16 Oct 2025
Rethinking Schema Linking: A Context-Aware Bidirectional Retrieval Approach for Text-to-SQL
Md Mahadi Hasan Nahid
Davood Rafiei
Weiwei Zhang
Yong Zhang
LRM
125
1
0
16 Oct 2025
Document Intelligence in the Era of Large Language Models: A Survey
Weishi Wang
Hengchang Hu
Zhijie Zhang
Zhaochen Li
Hongxin Shao
Daniel Dahlmeier
AI4TS
188
0
0
15 Oct 2025
Attribution Gradients: Incrementally Unfolding Citations for Critical Examination of Attributed AI Answers
Hita Kambhamettu
Alyssa Hwang
Philippe Laban
Andrew Head
140
0
0
01 Oct 2025
ClaimIQ at CheckThat! 2025: Comparing Prompted and Fine-Tuned Language Models for Verifying Numerical Claims
Anirban Saha Anik
Md Fahimul Kabir Chowdhury
Andrew Wyckoff
Sagnik Ray Choudhury
128
1
0
15 Sep 2025
Topic-Guided Reinforcement Learning with LLMs for Enhancing Multi-Document Summarization
Chuyuan Li
Austin Xu
Shafiq Joty
Giuseppe Carenini
BDL
152
0
0
11 Sep 2025
EviNote-RAG: Enhancing RAG Models via Answer-Supportive Evidence Notes
Yuqin Dai
Guoqing Wang
Yuan Wang
Kairan Dou
Kaichen Zhou
...
Can Yi
Changhua Meng
Yuchen Zhou
Yongliang Shen
Shuai Lu
RALM
231
4
0
31 Aug 2025
Memory Limitations of Prompt Tuning in Transformers
Maxime Meyer
Mario Michelessa
C. Chaux
Vincent Y. F. Tan
VLM
132
0
0
30 Aug 2025
OpinioRAG: Towards Generating User-Centric Opinion Highlights from Large-scale Online Reviews
Mir Tafseer Nayeem
Davood Rafiei
145
0
0
30 Aug 2025
The Rarity Blind Spot: A Framework for Evaluating Statistical Reasoning in LLMs
Seiji Maekawa
Hayate Iso
Nikita Bhutani
137
0
0
29 Aug 2025
LLM Chatbot-Creation Approaches
Hemil Mehta
Tanvi Raut
Kohav Yadav
Edward F. Gehringer
120
0
0
28 Aug 2025
Towards a Holistic and Automated Evaluation Framework for Multi-Level Comprehension of LLMs in Book-Length Contexts
Jiaqi Deng
Yuho Lee
Nicole Hee-Yeon Kim
Hyangsuk Min
Taewon Yun
Minjeong Ban
Kim Yul
Hwanjun Song
86
1
0
27 Aug 2025
Attribution, Citation, and Quotation: A Survey of Evidence-based Text Generation with Large Language Models
Tobias Schreieder
Tim Schopf
Michael Färber
HILM
128
1
0
21 Aug 2025
BEE-RAG: Balanced Entropy Engineering for Retrieval-Augmented Generation
Yuhao Wang
Ruiyang Ren
Yucheng Wang
Jing Liu
Wayne Xin Zhao
Hua Wu
Haifeng Wang
170
0
0
07 Aug 2025
NeedleChain: Measuring Intact Context Comprehension Capability of Large Language Models
Hyeonseok Moon
Heuiseok Lim
LLMAG RALM LRM
226
0
0
30 Jul 2025
Ref-Long: Benchmarking the Long-context Referencing Capability of Long-context Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
J. Wu
Gefei Gu
Yanan Zheng
Dit-Yan Yeung
Arman Cohan
LLMAG ELM
203
3
0
13 Jul 2025
GenerationPrograms: Fine-grained Attribution with Executable Programs
David Wan
Eran Hirsch
Elias Stengel-Eskin
Ido Dagan
Mohit Bansal
251
0
0
17 Jun 2025
Query-Focused Retrieval Heads Improve Long-Context Reasoning and Re-ranking
Wuwei Zhang
Fangcong Yin
Howard Yen
Danqi Chen
Xi Ye
LRM
293
4
0
11 Jun 2025
Team Anotheroption at SemEval-2025 Task 8: Bridging the Gap Between Open-Source and Proprietary LLMs in Table QA
Nikolas Evkarpidi
Elena Tutubalina
LMTD
310
1
0
11 Jun 2025
GaRAGe: A Benchmark with Grounding Annotations for RAG Evaluation
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Ionut Teodor Sorodoc
Leonardo F. R. Ribeiro
Rexhina Blloshmi
Christopher Davis
Adria de Gispert
133
5
0
09 Jun 2025
Diagnosing and Resolving Cloud Platform Instability with Multi-modal RAG LLMs
Yifan Wang
Kenneth P. Birman
338
1
0
27 May 2025
MiniLongBench: The Low-cost Long Context Understanding Benchmark for Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Zhongzhan Huang
Guoming Ling
Shanshan Zhong
Hefeng Wu
Liang Lin
292
0
0
26 May 2025
LLMs Get Lost In Multi-Turn Conversation
Philippe Laban
Hiroaki Hayashi
Yingbo Zhou
Jennifer Neville
356
107
0
09 May 2025
Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks
Yixin Cao
Shibo Hong
Xuzhao Li
Jiahao Ying
Yubo Ma
...
Juanzi Li
Aixin Sun
Qi Zhang
Tat-Seng Chua
Tianwei Zhang
ALM ELM
548
23
0
26 Apr 2025
Estimating Optimal Context Length for Hybrid Retrieval-augmented Multi-document Summarization
Adithya Pratapa
Teruko Mitamura
RALM
216
0
0
17 Apr 2025
ML For Hardware Design Interpretability: Challenges and Opportunities
Raymond Baartmans
Andrew Ensinger
Victor Agostinelli
Lizhong Chen
183
1
0
11 Apr 2025
Reasoning Beyond Limits: Advances and Open Problems for LLMs
ICT Express, 2025
M. Ferrag
Norbert Tihanyi
Merouane Debbah
OffRL LRM AI4CE ELM
804
18
0
26 Mar 2025
Extract, Match, and Score: An Evaluation Paradigm for Long Question-context-answer Triplets in Financial Analysis
Bo Hu
Han Yuan
Vlad Pandelea
Wuqiong Luo
Yingzhu Zhao
Zheng Ma
241
2
0
20 Mar 2025
Does Context Matter? ContextualJudgeBench for Evaluating LLM-based Judges in Contextual Settings
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Austin Xu
Srijan Bansal
Yifei Ming
Semih Yavuz
Shafiq Joty
ELM
372
13
0
19 Mar 2025
RAG-KG-IL: A Multi-Agent Hybrid Framework for Reducing Hallucinations and Enhancing LLM Reasoning through RAG and Incremental Knowledge Graph Learning Integration
Hong Qing Yu
Frank McQuade
268
7
0
14 Mar 2025
Lost-in-the-Middle in Long-Text Generation: Synthetic Dataset, Evaluation Framework, and Mitigation
Junhao Zhang
Richong Zhang
Fanshuang Kong
Ziyang Miao
Yanhan Ye
Yaowei Zheng
SyDa
133
2
0
10 Mar 2025
LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning
Zhibin Lan
Liqiang Niu
Fandong Meng
Jie Zhou
Jinsong Su
VLM
283
1
0
04 Mar 2025
U-NIAH: Unified RAG and LLM Evaluation for Long Context Needle-In-A-Haystack
Yunfan Gao
Yun Xiong
Wenlong Wu
Zijing Huang
Bohan Li
Haoyu Wang
297
10
0
01 Mar 2025
Do Retrieval-Augmented Language Models Adapt to Varying User Needs?
Peilin Wu
Xinlu Zhang
Wenhao Yu
Xingyu Liu
Xinya Du
Zhiyu Zoey Chen
RALM
413
1
0
27 Feb 2025
Evaluating the Effect of Retrieval Augmentation on Social Biases
Tianhui Zhang
Yi Zhou
Danushka Bollegala
304
1
0
24 Feb 2025
Scaling Multi-Document Event Summarization: Evaluating Compression vs. Full-Text Approaches
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Adithya Pratapa
Teruko Mitamura
299
1
0
10 Feb 2025
From Cool Demos to Production-Ready FMware: Core Challenges and a Technology Roadmap
Gopi Krishnan Rajbahadur
G. Oliva
Dayi Lin
Ahmed E. Hassan
312
3
0
28 Jan 2025
Needle Threading: Can LLMs Follow Threads through Near-Million-Scale Haystacks?
International Conference on Learning Representations (ICLR), 2024
Jonathan Roberts
Kai Han
Samuel Albanie
LLMAG
1.0K
7
0
07 Nov 2024
Long Context RAG Performance of Large Language Models
Quinn Leng
Jacob P. Portes
Sam Havens
Matei A. Zaharia
Michael Carbin
AIFin RALM 3DV
268
24
0
05 Nov 2024
CRMArena: Understanding the Capacity of LLM Agents to Perform Professional CRM Tasks in Realistic Environments
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Kung-Hsiang Huang
Akshara Prabhakar
Sidharth Dhawan
Yixin Mao
Huan Wang
Silvio Savarese
Caiming Xiong
Philippe Laban
Chien-Sheng Wu
474
30
0
04 Nov 2024
On Positional Bias of Faithfulness for Long-form Summarization
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
David Wan
Jesse Vig
Joey Tianyi Zhou
Shafiq Joty
HILM
256
16
0
31 Oct 2024
Understanding Synthetic Context Extension via Retrieval Heads
Xinyu Zhao
Fangcong Yin
Greg Durrett
595
4
0
29 Oct 2024
Do RAG Systems Cover What Matters? Evaluating and Optimizing Responses with Sub-Question Coverage
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Kaige Xie
Philippe Laban
Prafulla Kumar Choubey
Caiming Xiong
Chien-Sheng Wu
157
3
0
20 Oct 2024
From Single to Multi: How LLMs Hallucinate in Multi-Document Summarization
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Catarina G. Belem
Pouya Pezeshkpour
Hayate Iso
Seiji Maekawa
Nikita Bhutani
Estevam R. Hruschka
HILM
349
11
0
17 Oct 2024
Enhancing LLM Trading Performance with Fact-Subjectivity Aware Reasoning
Qian Wang
Yuchen Gao
Zhenheng Tang
B. Luo
Bingsheng He
LRM
239
0
0
16 Oct 2024
Search Engines in an AI Era: The False Promise of Factual and Verifiable Source-Cited Responses
Pranav Narayanan Venkit
Philippe Laban
Yilun Zhou
Yixin Mao
Chien-Sheng Wu
ELM
242
15
0
15 Oct 2024