arXiv:2304.02554
Human-like Summarization Evaluation with ChatGPT
5 April 2023
Mingqi Gao
Jie Ruan
Renliang Sun
Xunjian Yin
Shiping Yang
Xiaojun Wan
ALM
AI4MH
Papers citing
"Human-like Summarization Evaluation with ChatGPT"
31 / 31 papers shown
From General to Targeted Rewards: Surpassing GPT-4 in Open-Ended Long-Context Generation
Zhihan Guo
Jiele Wu
Wenqian Cui
Yifei Zhang
Minda Hu
Yufei Wang
Irwin King
ALM
LRM
15
0
0
19 Jun 2025
Efficient Online RFT with Plug-and-Play LLM Judges: Unlocking State-of-the-Art Performance
Rudransh Agnihotri
Ananya Pandey
OffRL
ALM
62
0
0
06 Jun 2025
DRP: Distilled Reasoning Pruning with Skill-aware Step Decomposition for Efficient Large Reasoning Models
Yuxuan Jiang
Dawei Li
Frank Ferraro
LRM
164
1
0
20 May 2025
Benchmarking LLM Faithfulness in RAG with Evolving Leaderboards
Manveer Singh Tamber
F. S. Bao
Chenyu Xu
Ge Luo
Suleman Kazi
Minseok Bae
Miaoran Li
Ofer Mendelevitch
Renyi Qu
Jimmy J. Lin
VLM
68
1
0
07 May 2025
Sequential-NIAH: A Needle-In-A-Haystack Benchmark for Extracting Sequential Needles from Long Contexts
Yifei Yu
Qian Zhang
Lingfeng Qiao
Di Yin
Fang Li
Jie Wang
Zheyu Chen
Suncong Zheng
Xiaolong Liang
Xingwu Sun
95
0
0
07 Apr 2025
Unequal Opportunities: Examining the Bias in Geographical Recommendations by Large Language Models
Shiran Dudy
Thulasi Tholeti
R. Ramachandranpillai
Muhammad Ali
Toby Jia-Jun Li
Ricardo Baeza-Yates
115
1
0
16 Mar 2025
GraphEval: A Lightweight Graph-Based LLM Framework for Idea Evaluation
Tao Feng
Yihang Sun
Jiaxuan You
161
1
0
16 Mar 2025
BPO: Towards Balanced Preference Optimization between Knowledge Breadth and Depth in Alignment
Sizhe Wang
Yongqi Tong
Hengyuan Zhang
Dawei Li
Xin Zhang
Tianlong Chen
212
10
0
21 Feb 2025
LAMD: Context-driven Android Malware Detection and Classification with LLMs
Xingzhi Qian
Xinran Zheng
Yiling He
Shuo Yang
Lorenzo Cavallaro
152
4
0
18 Feb 2025
Preference Leakage: A Contamination Problem in LLM-as-a-judge
Dawei Li
Renliang Sun
Yue Huang
Ming Zhong
Bohan Jiang
Jiawei Han
Wei Wei
Wei Wang
Huan Liu
174
30
0
03 Feb 2025
Evaluating Small Language Models for News Summarization: Implications and Factors Influencing Performance
Borui Xu
Yao Chen
Zeyi Wen
Weiguo Liu
Bingsheng He
186
2
0
02 Feb 2025
An Investigation into Value Misalignment in LLM-Generated Texts for Cultural Heritage
Fan Bu
Zheng Wang
Siyi Wang
Ziyao Liu
74
2
0
03 Jan 2025
Evaluate Summarization in Fine-Granularity: Auto Evaluation with LLM
Dong Yuan
Eti Rastogi
Fen Zhao
Sagar Goyal
Gautam Naik
Sree Prasanna Rajagopal
63
0
0
31 Dec 2024
From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge
Dawei Li
Bohan Jiang
Liangjie Huang
Alimohammad Beigi
Chengshuai Zhao
...
Canyu Chen
Tianhao Wu
Kai Shu
Lu Cheng
Huan Liu
ELM
AILaw
357
112
0
25 Nov 2024
Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates
Xiaosen Zheng
Tianyu Pang
Chao Du
Qian Liu
Jing Jiang
Min Lin
89
13
0
09 Oct 2024
Cohesive Conversations: Enhancing Authenticity in Multi-Agent Simulated Dialogues
Kuanchao Chu
Yi-Pei Chen
Hideki Nakayama
LLMAG
82
5
0
13 Jul 2024
A Comprehensive Survey on Process-Oriented Automatic Text Summarization with Exploration of LLM-Based Methods
Hanlei Jin
Yang Zhang
Dan Meng
Jun Wang
Jinghua Tan
249
96
0
05 Mar 2024
Prediction-Powered Ranking of Large Language Models
Ivi Chatzi
Eleni Straitouri
Suhas Thejaswi
Manuel Gomez Rodriguez
ALM
127
9
0
27 Feb 2024
LLM-based NLG Evaluation: Current Status and Challenges
Mingqi Gao
Xinyu Hu
Jie Ruan
Xiao Pu
Xiaojun Wan
ELM
LM&MA
211
41
0
02 Feb 2024
Language Models Hallucinate, but May Excel at Fact Verification
Jian Guan
Jesse Dodge
David Wadden
Minlie Huang
Hao Peng
LRM
HILM
111
33
0
23 Oct 2023
Tuna: Instruction Tuning using Feedback from Large Language Models
Haoran Li
Yiran Liu
Xingxing Zhang
Wei Lu
Furu Wei
ALM
81
3
0
20 Oct 2023
Instructive Dialogue Summarization with Query Aggregations
Bin Wang
Zhengyuan Liu
Nancy F. Chen
87
3
0
17 Oct 2023
Improving Summarization with Human Edits
Zonghai Yao
Benjamin J Schloss
Sai P. Selvaraj
107
4
0
09 Oct 2023
Decoding ChatGPT: A Taxonomy of Existing Research, Current Challenges, and Possible Future Directions
S. Sohail
Faiza Farhat
Yassine Himeur
Mohammad Nadeem
D. Madsen
Yashbir Singh
Shadi Atalla
W. Mansoor
104
123
0
26 Jul 2023
Benchmarking Foundation Models with Language-Model-as-an-Examiner
Yushi Bai
Jiahao Ying
Yixin Cao
Xin Lv
Yuze He
...
Yijia Xiao
Haozhe Lyu
Jiayin Zhang
Juanzi Li
Lei Hou
ALM
ELM
107
149
0
07 Jun 2023
Investigating Table-to-Text Generation Capabilities of LLMs in Real-World Information Seeking Scenarios
Yilun Zhao
Haowei Zhang
Shengyun Si
Linyong Nan
Xiangru Tang
Arman Cohan
LMTD
104
12
0
24 May 2023
UniChart: A Universal Vision-language Pretrained Model for Chart Comprehension and Reasoning
Ahmed Masry
P. Kavehzadeh
Do Xuan Long
Enamul Hoque
Shafiq Joty
LRM
95
113
0
24 May 2023
DecipherPref: Analyzing Influential Factors in Human Preference Judgments via GPT-4
Ye Hu
Kaiqiang Song
Sangwoo Cho
Xiaoyang Wang
H. Foroosh
Fei Liu
96
13
0
24 May 2023
On Learning to Summarize with Large Language Models as References
Yixin Liu
Kejian Shi
Katherine S He
Longtian Ye
Alexander R. Fabbri
Pengfei Liu
Dragomir R. Radev
Arman Cohan
ELM
109
82
0
23 May 2023
APPLS: Evaluating Evaluation Metrics for Plain Language Summarization
Yue Guo
Tal August
Gondy Leroy
T. Cohen
Lucy Lu Wang
164
9
0
23 May 2023
DocAsRef: An Empirical Study on Repurposing Reference-Based Summary Quality Metrics Reference-Freely
F. S. Bao
Ruixuan Tu
Ge Luo
Yinfei Yang
Hebi Li
Minghui Qiu
Youbiao He
Cen Chen
75
2
0
20 Dec 2022