ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2407.01370
  4. Cited By
Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems

Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems

1 July 2024
Philippe Laban
Alexander R. Fabbri
Caiming Xiong
Chien-Sheng Wu
    RALM
ArXivPDFHTML

Papers citing "Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems"

36 / 36 papers shown
Title
LLMs Get Lost In Multi-Turn Conversation
LLMs Get Lost In Multi-Turn Conversation
Philippe Laban
Hiroaki Hayashi
Yingbo Zhou
Jennifer Neville
21
0
0
09 May 2025
Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks
Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks
Yixin Cao
Shibo Hong
X. Li
Jiahao Ying
Yubo Ma
...
Juanzi Li
Aixin Sun
Xuanjing Huang
Tat-Seng Chua
Yu Jiang
ALM
ELM
84
0
0
26 Apr 2025
Estimating Optimal Context Length for Hybrid Retrieval-augmented Multi-document Summarization
Estimating Optimal Context Length for Hybrid Retrieval-augmented Multi-document Summarization
Adithya Pratapa
Teruko Mitamura
RALM
26
0
0
17 Apr 2025
ML For Hardware Design Interpretability: Challenges and Opportunities
ML For Hardware Design Interpretability: Challenges and Opportunities
Raymond Baartmans
Andrew Ensinger
Victor Agostinelli
Lizhong Chen
24
0
0
11 Apr 2025
Reasoning Beyond Limits: Advances and Open Problems for LLMs
Reasoning Beyond Limits: Advances and Open Problems for LLMs
M. Ferrag
Norbert Tihanyi
Merouane Debbah
ELM
OffRL
LRM
AI4CE
56
2
0
26 Mar 2025
Extract, Match, and Score: An Evaluation Paradigm for Long Question-context-answer Triplets in Financial Analysis
Extract, Match, and Score: An Evaluation Paradigm for Long Question-context-answer Triplets in Financial Analysis
Bo Hu
Han Yuan
Vlad Pandelea
Wuqiong Luo
Yingzhu Zhao
Zheng Ma
45
0
0
20 Mar 2025
Does Context Matter? ContextualJudgeBench for Evaluating LLM-based Judges in Contextual Settings
Does Context Matter? ContextualJudgeBench for Evaluating LLM-based Judges in Contextual Settings
Austin Xu
Srijan Bansal
Yifei Ming
Semih Yavuz
Shafiq R. Joty
ELM
87
2
0
19 Mar 2025
RAG-KG-IL: A Multi-Agent Hybrid Framework for Reducing Hallucinations and Enhancing LLM Reasoning through RAG and Incremental Knowledge Graph Learning Integration
RAG-KG-IL: A Multi-Agent Hybrid Framework for Reducing Hallucinations and Enhancing LLM Reasoning through RAG and Incremental Knowledge Graph Learning Integration
Hong Qing Yu
Frank McQuade
41
1
0
14 Mar 2025
Lost-in-the-Middle in Long-Text Generation: Synthetic Dataset, Evaluation Framework, and Mitigation
Junhao Zhang
Richong Zhang
Fanshuang Kong
Ziyang Miao
Yanhan Ye
Yaowei Zheng
SyDa
41
0
0
10 Mar 2025
LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning
Zhibin Lan
Liqiang Niu
Fandong Meng
Jie Zhou
Jinsong Su
VLM
60
0
0
04 Mar 2025
U-NIAH: Unified RAG and LLM Evaluation for Long Context Needle-In-A-Haystack
Yunfan Gao
Yun Xiong
Wenlong Wu
Zijing Huang
Bohan Li
H. Wang
47
3
0
01 Mar 2025
Do Retrieval-Augmented Language Models Adapt to Varying User Needs?
Do Retrieval-Augmented Language Models Adapt to Varying User Needs?
Peilin Wu
Xinlu Zhang
Wenhao Yu
Xingyu Liu
Xinya Du
Zhiyu Zoey Chen
RALM
34
0
0
27 Feb 2025
Evaluating the Effect of Retrieval Augmentation on Social Biases
Evaluating the Effect of Retrieval Augmentation on Social Biases
Tianhui Zhang
Yi Zhou
Danushka Bollegala
31
0
0
24 Feb 2025
Scaling Multi-Document Event Summarization: Evaluating Compression vs. Full-Text Approaches
Scaling Multi-Document Event Summarization: Evaluating Compression vs. Full-Text Approaches
Adithya Pratapa
Teruko Mitamura
83
1
0
10 Feb 2025
From Cool Demos to Production-Ready FMware: Core Challenges and a Technology Roadmap
From Cool Demos to Production-Ready FMware: Core Challenges and a Technology Roadmap
Gopi Krishnan Rajbahadur
G. Oliva
Dayi Lin
Ahmed E. Hassan
35
0
0
28 Jan 2025
Understanding Synthetic Context Extension via Retrieval Heads
Understanding Synthetic Context Extension via Retrieval Heads
Xinyu Zhao
Fangcong Yin
Greg Durrett
31
0
0
31 Dec 2024
Needle Threading: Can LLMs Follow Threads through Near-Million-Scale Haystacks?
Needle Threading: Can LLMs Follow Threads through Near-Million-Scale Haystacks?
Jonathan Roberts
Kai Han
Samuel Albanie
LLMAG
62
0
0
07 Nov 2024
Long Context RAG Performance of Large Language Models
Long Context RAG Performance of Large Language Models
Quinn Leng
Jacob P. Portes
Sam Havens
Matei A. Zaharia
Michael Carbin
AIFin
RALM
3DV
25
6
0
05 Nov 2024
CRMArena: Understanding the Capacity of LLM Agents to Perform Professional CRM Tasks in Realistic Environments
CRMArena: Understanding the Capacity of LLM Agents to Perform Professional CRM Tasks in Realistic Environments
Kung-Hsiang Huang
Akshara Prabhakar
Sidharth Dhawan
Yixin Mao
Huan Wang
Silvio Savarese
Caiming Xiong
Philippe Laban
C. Wu
24
7
0
04 Nov 2024
On Positional Bias of Faithfulness for Long-form Summarization
On Positional Bias of Faithfulness for Long-form Summarization
David Wan
Jesse Vig
Mohit Bansal
Shafiq R. Joty
HILM
32
3
0
31 Oct 2024
Do RAG Systems Cover What Matters? Evaluating and Optimizing Responses
  with Sub-Question Coverage
Do RAG Systems Cover What Matters? Evaluating and Optimizing Responses with Sub-Question Coverage
Kaige Xie
Philippe Laban
Prafulla Kumar Choubey
Caiming Xiong
C. Wu
16
0
0
20 Oct 2024
From Single to Multi: How LLMs Hallucinate in Multi-Document Summarization
From Single to Multi: How LLMs Hallucinate in Multi-Document Summarization
Catarina G. Belem
Pouya Pezeskhpour
Hayate Iso
Seiji Maekawa
Nikita Bhutani
Estevam R. Hruschka
HILM
65
1
0
17 Oct 2024
Search Engines in an AI Era: The False Promise of Factual and Verifiable
  Source-Cited Responses
Search Engines in an AI Era: The False Promise of Factual and Verifiable Source-Cited Responses
Pranav Narayanan Venkit
Philippe Laban
Yilun Zhou
Yixin Mao
C. Wu
ELM
17
7
0
15 Oct 2024
Holistic Reasoning with Long-Context LMs: A Benchmark for Database Operations on Massive Textual Data
Holistic Reasoning with Long-Context LMs: A Benchmark for Database Operations on Massive Textual Data
Seiji Maekawa
Hayate Iso
Nikita Bhutani
RALM
72
1
0
15 Oct 2024
MathHay: An Automated Benchmark for Long-Context Mathematical Reasoning
  in LLMs
MathHay: An Automated Benchmark for Long-Context Mathematical Reasoning in LLMs
Lei Wang
Shan Dong
Yuhui Xu
Hanze Dong
Yalu Wang
Amrita Saha
Ee-Peng Lim
Caiming Xiong
Doyen Sahoo
LRM
24
0
0
07 Oct 2024
HELMET: How to Evaluate Long-Context Language Models Effectively and Thoroughly
HELMET: How to Evaluate Long-Context Language Models Effectively and Thoroughly
Howard Yen
Tianyu Gao
Minmin Hou
Ke Ding
Daniel Fleischer
Peter Izsak
Moshe Wasserblat
Danqi Chen
ALM
ELM
38
24
0
03 Oct 2024
DeFine: Enhancing LLM Decision-Making with Factor Profiles and
  Analogical Reasoning
DeFine: Enhancing LLM Decision-Making with Factor Profiles and Analogical Reasoning
Yebowen Hu
Xiaoyang Wang
Wenlin Yao
Yiming Lu
Daoan Zhang
H. Foroosh
Dong Yu
Fei Liu
28
4
0
02 Oct 2024
Retrieval Or Holistic Understanding? Dolce: Differentiate Our Long
  Context Evaluation Tasks
Retrieval Or Holistic Understanding? Dolce: Differentiate Our Long Context Evaluation Tasks
Zi Yang
18
0
0
10 Sep 2024
The Death of Schema Linking? Text-to-SQL in the Age of Well-Reasoned
  Language Models
The Death of Schema Linking? Text-to-SQL in the Age of Well-Reasoned Language Models
Karime Maamari
Fadhil Abubaker
Daniel Jaroslawicz
Amine Mhedhbi
LRM
50
19
0
14 Aug 2024
Introducing a new hyper-parameter for RAG: Context Window Utilization
Introducing a new hyper-parameter for RAG: Context Window Utilization
Kush Juvekar
A. Purwar
27
3
0
29 Jul 2024
Perceptions of Linguistic Uncertainty by Language Models and Humans
Perceptions of Linguistic Uncertainty by Language Models and Humans
Catarina G Belém
Markelle Kelly
M. Steyvers
Sameer Singh
Padhraic Smyth
33
1
0
22 Jul 2024
One Thousand and One Pairs: A "novel" challenge for long-context
  language models
One Thousand and One Pairs: A "novel" challenge for long-context language models
Marzena Karpinska
Katherine Thai
Kyle Lo
Tanya Goyal
Mohit Iyyer
LRM
28
10
0
24 Jun 2024
MileBench: Benchmarking MLLMs in Long Context
MileBench: Benchmarking MLLMs in Long Context
Dingjie Song
Shunian Chen
Guiming Hardy Chen
Fei Yu
Xiang Wan
Benyou Wang
VLM
53
34
0
29 Apr 2024
Benchmarking Large Language Models in Complex Question Answering
  Attribution using Knowledge Graphs
Benchmarking Large Language Models in Complex Question Answering Attribution using Knowledge Graphs
Nan Hu
Jiaoyan Chen
Yike Wu
Guilin Qi
Sheng Bi
Tongtong Wu
Jeff Z. Pan
HILM
32
8
0
26 Jan 2024
M4LE: A Multi-Ability Multi-Range Multi-Task Multi-Domain Long-Context
  Evaluation Benchmark for Large Language Models
M4LE: A Multi-Ability Multi-Range Multi-Task Multi-Domain Long-Context Evaluation Benchmark for Large Language Models
Wai-Chung Kwan
Xingshan Zeng
Yufei Wang
Yusen Sun
Liangyou Li
Lifeng Shang
Qun Liu
Kam-Fai Wong
ELM
86
10
0
30 Oct 2023
Teaching Machines to Read and Comprehend
Teaching Machines to Read and Comprehend
Karl Moritz Hermann
Tomás Kociský
Edward Grefenstette
L. Espeholt
W. Kay
Mustafa Suleyman
Phil Blunsom
167
3,504
0
10 Jun 2015
1