Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2305.13091
Cited By
Large Language Models are Not Yet Human-Level Evaluators for Abstractive Summarization
22 May 2023
Chenhui Shen
Liying Cheng
Xuan-Phi Nguyen
Yang You
Lidong Bing
ELM
ALM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Large Language Models are Not Yet Human-Level Evaluators for Abstractive Summarization"
49 / 49 papers shown
Title
KETCHUP: K-Step Return Estimation for Sequential Knowledge Distillation
Jiabin Fan
Guoqing Luo
Michael Bowling
Lili Mou
OffRL
61
0
0
26 Apr 2025
KOFFVQA: An Objectively Evaluated Free-form VQA Benchmark for Large Vision-Language Models in the Korean Language
Yoonshik Kim
Jaeyoon Jung
35
0
0
31 Mar 2025
A Survey on Transformer Context Extension: Approaches and Evaluation
Yijun Liu
Jinzheng Yu
Yang Xu
Zhongyang Li
Qingfu Zhu
LLMAG
64
0
0
17 Mar 2025
Analyzing and Evaluating Correlation Measures in NLG Meta-Evaluation
Mingqi Gao
Xinyu Hu
Li Lin
Xiaojun Wan
28
1
0
28 Jan 2025
Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators
Yinhong Liu
Han Zhou
Zhijiang Guo
Ehsan Shareghi
Ivan Vulić
Anna Korhonen
Nigel Collier
ALM
128
64
0
20 Jan 2025
Context-Aware Deep Learning for Multi Modal Depression Detection
Genevieve Lam
Huang Dongyan
Weisi Lin
26
0
0
26 Dec 2024
FedPT: Federated Proxy-Tuning of Large Language Models on Resource-Constrained Edge Devices
Zhidong Gao
Yu Zhang
Zhenxiao Zhang
Yanmin Gong
Yuanxiong Guo
13
0
0
01 Oct 2024
A Critical Look at Meta-evaluating Summarisation Evaluation Metrics
Xiang Dai
Sarvnaz Karimi
Biaoyan Fang
17
0
0
29 Sep 2024
LongGenBench: Benchmarking Long-Form Generation in Long Context LLMs
Yuhao Wu
Ming Shan Hee
Zhiqing Hu
Roy Ka-Wei Lee
RALM
22
0
0
03 Sep 2024
ExpertAF: Expert Actionable Feedback from Video
Kumar Ashutosh
Tushar Nagarajan
Georgios Pavlakos
Kris M. Kitani
Kristen Grauman
VGen
42
2
0
01 Aug 2024
Automated Review Generation Method Based on Large Language Models
Shican Wu
Xiao Ma
Dehui Luo
Lulu Li
Xiangcheng Shi
...
Ran Luo
Chunlei Pei
Zhijian Zhao
Zhi-Jian Zhao
Jinlong Gong
69
0
0
30 Jul 2024
Towards Dataset-scale and Feature-oriented Evaluation of Text Summarization in Large Language Model Prompts
Sam Yu-Te Lee
Aryaman Bahukhandi
Dongyu Liu
Kwan-Liu Ma
AAML
20
4
0
16 Jul 2024
CLAVE: An Adaptive Framework for Evaluating Values of LLM Generated Responses
Jing Yao
Xiaoyuan Yi
Xing Xie
ELM
ALM
29
7
0
15 Jul 2024
A Systematic Survey and Critical Review on Evaluating Large Language Models: Challenges, Limitations, and Recommendations
Md Tahmid Rahman Laskar
Sawsan Alqahtani
M Saiful Bari
Mizanur Rahman
Mohammad Abdullah Matin Khan
...
Chee Wei Tan
Md. Rizwan Parvez
Enamul Hoque
Shafiq R. Joty
Jimmy Huang
ELM
ALM
19
25
0
04 Jul 2024
FineSurE: Fine-grained Summarization Evaluation using LLMs
Hwanjun Song
Hang Su
Igor Shalyminov
Jason (Jinglun) Cai
Saab Mansour
HILM
20
29
0
01 Jul 2024
PerSEval: Assessing Personalization in Text Summarizers
Sourish Dasgupta
Ankush Chander
Parth Borad
Isha Motiyani
Tanmoy Chakraborty
27
0
0
29 Jun 2024
Large Language Models have Intrinsic Self-Correction Ability
Dancheng Liu
Amir Nassereldine
Ziming Yang
Chenhui Xu
Yuting Hu
Jiajie Li
Utkarsh Kumar
Changjae Lee
Jinjun Xiong
KELM
ReLM
LRM
21
9
0
21 Jun 2024
A Systematic Survey of Text Summarization: From Statistical Methods to Large Language Models
Haopeng Zhang
Philip S. Yu
Jiawei Zhang
30
1
0
17 Jun 2024
A Better LLM Evaluator for Text Generation: The Impact of Prompt Output Sequencing and Optimization
Kuanchao Chu
Yi-Pei Chen
Hideki Nakayama
32
9
0
14 Jun 2024
LLM as a Scorer: The Impact of Output Order on Dialogue Evaluation
Yi-Pei Chen
Kuanchao Chu
Hideki Nakayama
LRM
18
1
0
05 Jun 2024
Auto Arena of LLMs: Automating LLM Evaluations with Agent Peer-battles and Committee Discussions
Ruochen Zhao
Wenxuan Zhang
Yew Ken Chia
Deli Zhao
Lidong Bing
30
9
0
30 May 2024
Lessons from the Trenches on Reproducible Evaluation of Language Models
Stella Biderman
Hailey Schoelkopf
Lintang Sutawika
Leo Gao
J. Tow
...
Xiangru Tang
Kevin A. Wang
Genta Indra Winata
Franccois Yvon
Andy Zou
ELM
ALM
120
16
3
23 May 2024
DEBATE: Devil's Advocate-Based Assessment and Text Evaluation
Alex G. Kim
Keonwoo Kim
Sangwon Yoon
ELM
16
1
0
16 May 2024
DOLOMITES: Domain-Specific Long-Form Methodical Tasks
Chaitanya Malaviya
Priyanka Agrawal
Kuzman Ganchev
Pranesh Srinivasan
Fantine Huot
Jonathan Berant
Mark Yatskar
Dipanjan Das
Mirella Lapata
Chris Alberti
24
6
0
09 May 2024
ACORN: Aspect-wise Commonsense Reasoning Explanation Evaluation
Ana Brassard
Benjamin Heinzerling
Keito Kudo
Keisuke Sakaguchi
Kentaro Inui
LRM
37
0
0
08 May 2024
Self-Improving Customer Review Response Generation Based on LLMs
Guy Azov
Tatiana Pelc
Adi Fledel Alon
Gila Kamhi
19
0
0
06 May 2024
METAL: Towards Multilingual Meta-Evaluation
Rishav Hada
Varun Gumma
Mohamed Ahmed
Kalika Bali
Sunayana Sitaram
ELM
24
2
0
02 Apr 2024
Detectors for Safe and Reliable LLMs: Implementations, Uses, and Limitations
Swapnaja Achintalwar
Adriana Alvarado Garcia
Ateret Anaby-Tavor
Ioana Baldini
Sara E. Berger
...
Aashka Trivedi
Kush R. Varshney
Dennis L. Wei
Shalisha Witherspooon
Marcel Zalmanovici
25
10
0
09 Mar 2024
FENICE: Factuality Evaluation of summarization based on Natural language Inference and Claim Extraction
Alessandro Sciré
Karim Ghonim
Roberto Navigli
HILM
14
6
0
04 Mar 2024
Wisdom of the Silicon Crowd: LLM Ensemble Prediction Capabilities Rival Human Crowd Accuracy
P. Schoenegger
Indre Tuminauskaite
Peter S. Park
Rafael Valdece Sousa Bastos
P. Tetlock
29
24
0
29 Feb 2024
LLM-based NLG Evaluation: Current Status and Challenges
Mingqi Gao
Xinyu Hu
Jie Ruan
Xiao Pu
Xiaojun Wan
ELM
LM&MA
53
28
0
02 Feb 2024
PROXYQA: An Alternative Framework for Evaluating Long-Form Text Generation with Large Language Models
Haochen Tan
Zhijiang Guo
Zhan Shi
Lu Xu
Zhili Liu
...
Xiaoguang Li
Yasheng Wang
Lifeng Shang
Qun Liu
Linqi Song
22
12
0
26 Jan 2024
Adapting Large Language Models for Education: Foundational Capabilities, Potentials, and Challenges
Qingyao Li
Lingyue Fu
Weiming Zhang
Xianyu Chen
Jingwei Yu
Wei Xia
Weinan Zhang
Ruiming Tang
Yong Yu
AI4Ed
ELM
25
17
0
27 Dec 2023
Benchmarking Generation and Evaluation Capabilities of Large Language Models for Instruction Controllable Summarization
Yixin Liu
Alexander R. Fabbri
Jiawen Chen
Yilun Zhao
Simeng Han
Shafiq R. Joty
Pengfei Liu
Dragomir R. Radev
Chien-Sheng Wu
Arman Cohan
ELM
36
57
0
15 Nov 2023
Exploring the Potential of Large Language Models in Computational Argumentation
Guizhen Chen
Liying Cheng
Anh Tuan Luu
Lidong Bing
LLMAG
LRM
9
21
0
15 Nov 2023
ODSum: New Benchmarks for Open Domain Multi-Document Summarization
Yijie Zhou
Kejian Shi
Wencai Zhang
Yixin Liu
Yilun Zhao
Arman Cohan
RALM
29
2
0
16 Sep 2023
Cross-lingual Cross-temporal Summarization: Dataset, Models, Evaluation
Ran Zhang
Jihed Ouni
Steffen Eger
13
6
0
22 Jun 2023
Zero-Shot Listwise Document Reranking with a Large Language Model
Xueguang Ma
Xinyu Crystina Zhang
Ronak Pradeep
Jimmy J. Lin
65
48
0
03 May 2023
Can Large Language Models Be an Alternative to Human Evaluations?
Cheng-Han Chiang
Hung-yi Lee
ALM
LM&MA
206
559
0
03 May 2023
Large Language Models are Diverse Role-Players for Summarization Evaluation
Ning Wu
Ming Gong
Linjun Shou
Shining Liang
Daxin Jiang
57
44
0
27 Mar 2023
Large Language Model Is Not a Good Few-shot Information Extractor, but a Good Reranker for Hard Samples!
Yubo Ma
Yixin Cao
YongChing Hong
Aixin Sun
RALM
80
85
0
15 Mar 2023
Large Language Models are Zero-Shot Reasoners
Takeshi Kojima
S. Gu
Machel Reid
Yutaka Matsuo
Yusuke Iwasawa
ReLM
LRM
291
2,712
0
24 May 2022
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Xuezhi Wang
Jason W. Wei
Dale Schuurmans
Quoc Le
Ed H. Chi
Sharan Narang
Aakanksha Chowdhery
Denny Zhou
ReLM
BDL
LRM
AI4CE
297
3,163
0
21 Mar 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
MReD: A Meta-Review Dataset for Structure-Controllable Text Generation
Chenhui Shen
Liying Cheng
Ran Zhou
Lidong Bing
Yang You
Luo Si
39
33
0
14 Oct 2021
Challenges in Detoxifying Language Models
Johannes Welbl
Amelia Glaese
J. Uesato
Sumanth Dathathri
John F. J. Mellor
Lisa Anne Hendricks
Kirsty Anderson
Pushmeet Kohli
Ben Coppin
Po-Sen Huang
LM&MA
242
191
0
15 Sep 2021
Understanding Factuality in Abstractive Summarization with FRANK: A Benchmark for Factuality Metrics
Artidoro Pagnoni
Vidhisha Balachandran
Yulia Tsvetkov
HILM
215
305
0
27 Apr 2021
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
275
1,561
0
18 Sep 2019
Teaching Machines to Read and Comprehend
Karl Moritz Hermann
Tomás Kociský
Edward Grefenstette
L. Espeholt
W. Kay
Mustafa Suleyman
Phil Blunsom
170
3,504
0
10 Jun 2015
1