L-Eval: Instituting Standardized Evaluation for Long Context Language Models

Annual Meeting of the Association for Computational Linguistics (ACL), 2023
20 July 2023
Chen An
Shansan Gong
Ming Zhong
Xingjian Zhao
Mukai Li
Jun Zhang
Lingpeng Kong
Xipeng Qiu
ELM, ALM

Papers citing "L-Eval: Instituting Standardized Evaluation for Long Context Language Models"

50 / 138 papers shown
MEBench: Benchmarking Large Language Models for Cross-Document Multi-Entity Question Answering
Teng Lin
Yuyu Luo
Honglin Zhang
Jicheng Zhang
Chunlin Liu
Kaishun Wu
Nan Tang
RALM
348
7
0
26 Feb 2025
The Rotary Position Embedding May Cause Dimension Inefficiency in Attention Heads for Long-Distance Retrieval
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Ting-Rui Chiang
Dani Yogatama
111
0
0
16 Feb 2025
Memory, Benchmark & Robots: A Benchmark for Solving Complex Tasks with Reinforcement Learning
Egor Cherepanov
Nikita Kachaev
A. Kovalev
Aleksandr I. Panov
OffRL
514
7
0
14 Feb 2025
LCIRC: A Recurrent Compression Approach for Efficient Long-form Context and Query Dependent Modeling in LLMs
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Sumin An
Junyoung Sung
Wonpyo Park
Chanjun Park
Paul Hongsuck Seo
617
0
0
10 Feb 2025
Can LLMs Maintain Fundamental Abilities under KV Cache Compression?
Xiang Liu
Zhenheng Tang
Hong Chen
Peijie Dong
Zeyu Li
Xiuze Zhou
Bo Li
Xuming Hu
Xiaowen Chu
1.1K
14
0
04 Feb 2025
LongReason: A Synthetic Long-Context Reasoning Benchmark via Context Expansion
Zhan Ling
Kang Liu
Kai Yan
Yue Yang
Weijian Lin
Ting-Han Fan
Lingfeng Shen
Zhengyin Du
Jiecao Chen
ReLM, LRM, ELM
435
19
0
25 Jan 2025
ELITR-Bench: A Meeting Assistant Benchmark for Long-Context Language Models
International Conference on Computational Linguistics (COLING), 2024
Thibaut Thonet
Jos Rozen
Laurent Besacier
RALM
468
7
0
20 Jan 2025
VisDoM: Multi-Document QA with Visually Rich Elements Using Multimodal Retrieval-Augmented Generation
Manan Suri
Puneet Mathur
Franck Dernoncourt
Kanika Goswami
Ryan Rossi
Dinesh Manocha
364
17
0
14 Dec 2024
Investigating Factuality in Long-Form Text Generation: The Roles of Self-Known and Self-Unknown
Lifu Tu
Rui Meng
Shafiq Joty
Yingbo Zhou
Semih Yavuz
HILM
314
2
0
24 Nov 2024
LIFBench: Evaluating the Instruction Following Performance and Stability of Large Language Models in Long-Context Scenarios
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Xiaodong Wu
Minhao Wang
Yichen Liu
Xiaoming Shi
He Yan
Xiangju Lu
Junmin Zhu
Wei Zhang
1.1K
10
0
11 Nov 2024
Needle Threading: Can LLMs Follow Threads through Near-Million-Scale Haystacks?
International Conference on Learning Representations (ICLR), 2024
Jonathan Roberts
Kai Han
Samuel Albanie
LLMAG
1.1K
7
0
07 Nov 2024
Guide-LLM: An Embodied LLM Agent and Text-Based Topological Map for Robotic Guidance of People with Visual Impairments
Sangmim Song
S. Kodagoda
A. Gunatilake
Marc G. Carmichael
Karthick Thiyagarajan
Jodi Martin
LM&Ro
378
4
0
28 Oct 2024
Llama Scope: Extracting Millions of Features from Llama-3.1-8B with Sparse Autoencoders
Zhengfu He
Wentao Shu
Xuyang Ge
Lingjie Chen
Junxuan Wang
...
Qipeng Guo
Xuanjing Huang
Zuxuan Wu
Yu-Gang Jiang
Xipeng Qiu
328
74
0
27 Oct 2024
ETHIC: Evaluating Large Language Models on Long-Context Tasks with High Information Coverage
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Taewhoo Lee
Chanwoong Yoon
Kyochul Jang
Donghyeon Lee
Minju Song
Hyunjae Kim
Jaewoo Kang
ELM
341
11
0
22 Oct 2024
Distance between Relevant Information Pieces Causes Bias in Long-Context LLMs
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Runchu Tian
Yanghao Li
Yuepeng Fu
Siyang Deng
Qinyu Luo
...
Zhong Zhang
Yesai Wu
Yankai Lin
Huadong Wang
Xiaojiang Liu
319
8
0
18 Oct 2024
Forgetting Curve: A Reliable Method for Evaluating Memorization Capability for Long-context Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Xinyu Liu
Runsong Zhao
Pengcheng Huang
Chunyang Xiao
Bei Li
Jingang Wang
Tong Xiao
Jingbo Zhu
162
5
0
07 Oct 2024
MathHay: An Automated Benchmark for Long-Context Mathematical Reasoning in LLMs
Lei Wang
Shan Dong
Yuhui Xu
Hanze Dong
Yalu Wang
Amrita Saha
Ee-Peng Lim
Caiming Xiong
Doyen Sahoo
LRM
135
5
0
07 Oct 2024
LongGenBench: Long-context Generation Benchmark
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Xiang Liu
Peijie Dong
Xuming Hu
Xiaowen Chu
RALM
375
19
0
05 Oct 2024
L-CiteEval: Do Long-Context Models Truly Leverage Context for Responding?
Zecheng Tang
Keyan Zhou
Juntao Li
Baibei Ji
Jianye Hou
Min Zhang
266
7
0
03 Oct 2024
HELMET: How to Evaluate Long-Context Language Models Effectively and Thoroughly
Howard Yen
Tianyu Gao
Minmin Hou
Ke Ding
Daniel Fleischer
Peter Izsak
Moshe Wasserblat
Danqi Chen
ALM, ELM
369
71
0
03 Oct 2024
How to Train Long-Context Language Models (Effectively)
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Tianyu Gao
Alexander Wettig
Howard Yen
Danqi Chen
RALM
664
90
0
03 Oct 2024
Locret: Enhancing Eviction in Long-Context LLM Inference with Trained Retaining Heads on Consumer-Grade Devices
Yuxiang Huang
Binhang Yuan
Xu Han
Chaojun Xiao
Zhiyuan Liu
RALM
469
11
0
02 Oct 2024
Bridging Context Gaps: Leveraging Coreference Resolution for Long Contextual Understanding
International Conference on Learning Representations (ICLR), 2024
Yanming Liu
Xinyue Peng
Jiannan Cao
Yanxin Shen
Sheng Cheng
Xun Wang
Jianwei Yin
Xuhong Zhang
454
18
0
02 Oct 2024
Beyond Prompts: Dynamic Conversational Benchmarking of Large Language Models
Neural Information Processing Systems (NeurIPS), 2024
David Castillo-Bolado
Joseph Davidson
Finlay Gray
Marek Rosa
266
15
0
30 Sep 2024
Retrieval Or Holistic Understanding? Dolce: Differentiate Our Long Context Evaluation Tasks
Zi Yang
178
2
0
10 Sep 2024
LongGenBench: Benchmarking Long-Form Generation in Long Context LLMs
International Conference on Learning Representations (ICLR), 2024
Yuhao Wu
Ming Shan Hee
Zhiqing Hu
Roy Ka-wei Lee
RALM
535
0
0
03 Sep 2024
MedDec: A Dataset for Extracting Medical Decisions from Discharge Summaries
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Mohamed Elgaar
Jiali Cheng
Nidhi Vakil
Hadi Amiri
Leo Anthony Celi
215
2
0
23 Aug 2024
HiAgent: Hierarchical Working Memory Management for Solving Long-Horizon Agent Tasks with Large Language Model
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Mengkang Hu
Tianxing Chen
Qiguang Chen
Yao Mu
Wenqi Shao
Ping Luo
LM&Ro, LLMAG, RALM
276
36
0
18 Aug 2024
Making Long-Context Language Models Better Multi-Hop Reasoners
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Yanyang Li
Shuo Liang
Michael R. Lyu
Liwei Wang
LLMAG, LRM
294
26
0
06 Aug 2024
Long Input Benchmark for Russian Analysis
I. Churin
Murat Apishev
Maria Tikhonova
Denis Shevelev
Aydar Bulatov
Yuri Kuratov
Sergej Averkiev
Alena Fenogenova
162
2
0
05 Aug 2024
Keep the Cost Down: A Review on Methods to Optimize LLM's KV-Cache Consumption
Shi Luohe
Hongyi Zhang
Yao Yao
Z. Li
Zhao Hai
533
93
0
25 Jul 2024
Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and Hybrid Approach
Zhuowan Li
Cheng-rong Li
Mingyang Zhang
Qiaozhu Mei
Michael Bendersky
3DV, RALM
248
96
0
23 Jul 2024
ReAttention: Training-Free Infinite Context with Finite Attention Scope
Xiaoran Liu
Ruixiao Li
Yuerong Song
Zhigeng Liu
Kai Lv
Hang Yan
Linlin Li
Qun Liu
Xipeng Qiu
LLMAG
208
1
0
21 Jul 2024
SEED-Story: Multimodal Long Story Generation with Large Language Model
Shuai Yang
Yuying Ge
Yang Li
Yukang Chen
Yixiao Ge
Mingyu Ding
Yingcong Chen
VGen, DiffM
404
57
0
11 Jul 2024
Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems
Philippe Laban
Alexander R. Fabbri
Caiming Xiong
Chien-Sheng Wu
RALM
350
86
0
01 Jul 2024
Is It Really Long Context if All You Need Is Retrieval? Towards Genuinely Difficult Long Context NLP
Omer Goldman
Alon Jacovi
Aviv Slobodkin
Aviya Maimon
Ido Dagan
Reut Tsarfaty
443
18
0
29 Jun 2024
Mixture of In-Context Experts Enhance LLMs' Long Context Awareness
Hongzhan Lin
Ang Lv
Yuhan Chen
Chen Zhu
Yang Song
Hengshu Zhu
Rui Yan
207
22
0
28 Jun 2024
From Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs by Finetuning on Synthetic Data
Zheyang Xiong
Vasilis Papageorgiou
Kangwook Lee
Dimitris Papailiopoulos
SyDa, RALM
250
19
0
27 Jun 2024
Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA
Minzheng Wang
Longze Chen
Cheng Fu
Shengyi Liao
Xinghua Zhang
...
Run Luo
Yunshui Li
Min Yang
Fei Huang
Yongbin Li
RALM
254
103
0
25 Jun 2024
LongIns: A Challenging Long-context Instruction-based Exam for LLMs
Shawn Gavin
Tuney Zheng
Jiaheng Liu
Quehry Que
Noah Wang
Jian Yang
Chenchen Zhang
Wenhao Huang
Ge Zhang
LRM, RALM
317
7
0
25 Jun 2024
One Thousand and One Pairs: A "novel" challenge for long-context language models
Marzena Karpinska
Katherine Thai
Kyle Lo
Tanya Goyal
Mohit Iyyer
LRM
392
76
0
24 Jun 2024
Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving
Ruoyu Qin
Zheming Li
Weiran He
Mingxing Zhang
Yongwei Wu
Weimin Zheng
Xinran Xu
696
120
0
24 Jun 2024
MedOdyssey: A Medical Domain Benchmark for Long Context Evaluation Up to 200K Tokens
Yongqi Fan
Hongli Sun
Kui Xue
Xiaofan Zhang
Shaoting Zhang
Tong Ruan
302
10
0
21 Jun 2024
DoubleDipper: Improving Long-Context LLMs via Context Recycling
Arie Cattan
Alon Jacovi
Alex Fabrikant
Jonathan Herzig
Roee Aharoni
...
Dror Marcus
Avinatan Hassidim
Yossi Matias
Idan Szpektor
Avi Caciularu
RALM
291
0
0
19 Jun 2024
What Kinds of Tokens Benefit from Distant Text? An Analysis on Long Context Language Modeling
Yutong Hu
Quzhe Huang
Kangcheng Luo
Yansong Feng
137
2
0
17 Jun 2024
BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack
Neural Information Processing Systems (NeurIPS), 2024
Yuri Kuratov
Aydar Bulatov
Petr Anokhin
Ivan Rodkin
Dmitry Sorokin
Artyom Sorokin
Andrey Kravchenko
RALM, ALM, LRM, ReLM, ELM
274
142
0
14 Jun 2024
3D-RPE: Enhancing Long-Context Modeling Through 3D Rotary Position Encoding
AAAI Conference on Artificial Intelligence (AAAI), 2024
Xindian Ma
Wenyuan Liu
Peng Zhang
Nan Xu
189
8
0
14 Jun 2024
CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery
Xiaoshuai Song
Muxi Diao
Guanting Dong
Zhengyang Wang
Yujia Fu
...
Yejie Wang
Zhuoma Gongque
Jianing Yu
Qiuna Tan
Weiran Xu
ELM
392
27
0
12 Jun 2024
Analyzing Temporal Complex Events with Large Language Models? A Benchmark towards Temporal, Long Context Understanding
Zhihan Zhang
Yixin Cao
Chenchen Ye
Yunshan Ma
Lizi Liao
Tat-Seng Chua
251
27
0
04 Jun 2024
PyramidInfer: Pyramid KV Cache Compression for High-throughput LLM Inference
Dongjie Yang
Xiaodong Han
Yan Gao
Yao Hu
Shilin Zhang
Hai Zhao
239
112
0
21 May 2024