
L-Eval: Instituting Standardized Evaluation for Long Context Language Models

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

20 July 2023

Lingpeng Kong

Xipeng Qiu

ELM

ALM


Papers citing "L-Eval: Instituting Standardized Evaluation for Long Context Language Models"

50 / 138 papers shown

MEBench: Benchmarking Large Language Models for Cross-Document Multi-Entity Question Answering

348

26 Feb 2025

The Rotary Position Embedding May Cause Dimension Inefficiency in Attention Heads for Long-Distance Retrieval

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

Ting-Rui Chiang

Dani Yogatama

111

16 Feb 2025

Memory, Benchmark & Robots: A Benchmark for Solving Complex Tasks with Reinforcement Learning

514

14 Feb 2025

LCIRC: A Recurrent Compression Approach for Efficient Long-form Context and Query Dependent Modeling in LLMs

North American Chapter of the Association for Computational Linguistics (NAACL), 2025

617

10 Feb 2025

Can LLMs Maintain Fundamental Abilities under KV Cache Compression?

1.1K

04 Feb 2025

LongReason: A Synthetic Long-Context Reasoning Benchmark via Context Expansion

435

25 Jan 2025

ELITR-Bench: A Meeting Assistant Benchmark for Long-Context Language Models

International Conference on Computational Linguistics (COLING), 2024

468

20 Jan 2025

VisDoM: Multi-Document QA with Visually Rich Elements Using Multimodal Retrieval-Augmented Generation

364

14 Dec 2024

Investigating Factuality in Long-Form Text Generation: The Roles of Self-Known and Self-Unknown

314

24 Nov 2024

LIFBench: Evaluating the Instruction Following Performance and Stability of Large Language Models in Long-Context Scenarios

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

1.1K

11 Nov 2024

Needle Threading: Can LLMs Follow Threads through Near-Million-Scale Haystacks?

International Conference on Learning Representations (ICLR), 2024

1.1K

07 Nov 2024

Guide-LLM: An Embodied LLM Agent and Text-Based Topological Map for Robotic Guidance of People with Visual Impairments

Karthick Thiyagarajan

Jodi Martin

LM&Ro

378

28 Oct 2024

Llama Scope: Extracting Millions of Features from Llama-3.1-8B with Sparse Autoencoders

Junxuan Wang

...

Qipeng Guo

Xuanjing Huang

Zuxuan Wu

Yu-Gang Jiang

Xipeng Qiu

328

27 Oct 2024

ETHIC: Evaluating Large Language Models on Long-Context Tasks with High Information Coverage

North American Chapter of the Association for Computational Linguistics (NAACL), 2024

341

22 Oct 2024

Distance between Relevant Information Pieces Causes Bias in Long-Context LLMs

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

...

319

18 Oct 2024

Forgetting Curve: A Reliable Method for Evaluating Memorization Capability for Long-context Models

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

Xinyu Liu

Runsong Zhao

Pengcheng Huang

Chunyang Xiao

Bei Li

Jingang Wang

Tong Xiao

Jingbo Zhu

162

07 Oct 2024

MathHay: An Automated Benchmark for Long-Context Mathematical Reasoning in LLMs

Lei Wang

Hanze Dong

Caiming Xiong

135

07 Oct 2024

LongGenBench: Long-context Generation Benchmark

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

Xiaowen Chu

375

05 Oct 2024

L-CiteEval: Do Long-Context Models Truly Leverage Context for Responding?

Juntao Li

Min Zhang

266

03 Oct 2024

HELMET: How to Evaluate Long-Context Language Models Effectively and Thoroughly

Peter Izsak

Danqi Chen

369

03 Oct 2024

How to Train Long-Context Language Models (Effectively)

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

664

03 Oct 2024

Locret: Enhancing Eviction in Long-Context LLM Inference with Trained Retaining Heads on Consumer-Grade Devices

Yuxiang Huang

Binhang Yuan

Xu Han

Chaojun Xiao

Zhiyuan Liu

RALM

469

02 Oct 2024

Bridging Context Gaps: Leveraging Coreference Resolution for Long Contextual Understanding

International Conference on Learning Representations (ICLR), 2024

454

02 Oct 2024

Beyond Prompts: Dynamic Conversational Benchmarking of Large Language Models

Neural Information Processing Systems (NeurIPS), 2024

David Castillo-Bolado

Joseph Davidson

Finlay Gray

Marek Rosa

266

30 Sep 2024

Retrieval Or Holistic Understanding? Dolce: Differentiate Our Long Context Evaluation Tasks

Zi Yang

178

10 Sep 2024

LongGenBench: Benchmarking Long-Form Generation in Long Context LLMs

International Conference on Learning Representations (ICLR), 2024

535

03 Sep 2024

MedDec: A Dataset for Extracting Medical Decisions from Discharge Summaries

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

215

23 Aug 2024

HiAgent: Hierarchical Working Memory Management for Solving Long-Horizon Agent Tasks with Large Language Model

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

Yao Mu

276

18 Aug 2024

Making Long-Context Language Models Better Multi-Hop Reasoners

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

294

06 Aug 2024

Long Input Benchmark for Russian Analysis

162

05 Aug 2024

Keep the Cost Down: A Review on Methods to Optimize LLM's KV-Cache Consumption

533

25 Jul 2024

Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and Hybrid Approach

248

23 Jul 2024

ReAttention: Training-Free Infinite Context with Finite Attention Scope

Xiaoran Liu

Ruixiao Li

Linlin Li

Qun Liu

Xipeng Qiu

LLMAG

208

21 Jul 2024

SEED-Story: Multimodal Long Story Generation with Large Language Model

Yingcong Chen

404

11 Jul 2024

Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems

Philippe Laban

Alexander R. Fabbri

Caiming Xiong

Chien-Sheng Wu

RALM

350

01 Jul 2024

Is It Really Long Context if All You Need Is Retrieval? Towards Genuinely Difficult Long Context NLP

443

29 Jun 2024

Mixture of In-Context Experts Enhance LLMs' Long Context Awareness

Yang Song

Hengshu Zhu

Rui Yan

207

28 Jun 2024

From Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs by Finetuning on Synthetic Data

Zheyang Xiong

Vasilis Papageorgiou

Kangwook Lee

Dimitris Papailiopoulos

SyDa RALM

250

27 Jun 2024

Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA

Minzheng Wang

Longze Chen

Cheng Fu

Shengyi Liao

Xinghua Zhang

...

Run Luo

Yunshui Li

Min Yang

Fei Huang

Yongbin Li

RALM

254

103

25 Jun 2024

LongIns: A Challenging Long-context Instruction-based Exam for LLMs

317

25 Jun 2024

One Thousand and One Pairs: A "novel" challenge for long-context language models

392

24 Jun 2024

Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving

696

120

24 Jun 2024

MedOdyssey: A Medical Domain Benchmark for Long Context Evaluation Up to 200K Tokens

Yongqi Fan

Hongli Sun

Kui Xue

Xiaofan Zhang

Shaoting Zhang

Tong Ruan

302

21 Jun 2024

DoubleDipper: Improving Long-Context LLMs via Context Recycling

...

291

19 Jun 2024

What Kinds of Tokens Benefit from Distant Text? An Analysis on Long Context Language Modeling

137

17 Jun 2024

BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack

Neural Information Processing Systems (NeurIPS), 2024

Artyom Sorokin

RALM ALM LRM ReLM ELM

274

142

14 Jun 2024

3D-RPE: Enhancing Long-Context Modeling Through 3D Rotary Position Encoding

AAAI Conference on Artificial Intelligence (AAAI), 2024

Xindian Ma

Wenyuan Liu

Peng Zhang

Nan Xu

189

14 Jun 2024

CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery

Guanting Dong

...

392

12 Jun 2024

Analyzing Temporal Complex Events with Large Language Models? A Benchmark towards Temporal, Long Context Understanding

Yixin Cao

251

04 Jun 2024

PyramidInfer: Pyramid KV Cache Compression for High-throughput LLM Inference

Yao Hu

239

112

21 May 2024