2310.13800
Cited By
Evaluation Metrics in the Era of GPT-4: Reliably Evaluating Large Language Models on Sequence to Sequence Tasks
20 October 2023
Andrea Sottana
Bin Liang
Kai Zou
Zheng Yuan
ALM
ELM
LM&MA
Papers citing "Evaluation Metrics in the Era of GPT-4: Reliably Evaluating Large Language Models on Sequence to Sequence Tasks"
39 / 39 papers shown
Title
Large Language Models as Span Annotators
Zdeněk Kasner
Vilém Zouhar
Patrícia Schmidtová
Ivan Kartáč
Kristýna Onderková
Ondřej Plátek
Dimitra Gkatzia
Saad Mahamood
Ondrej Dusek
Simone Balloccu
ALM
30
0
0
11 Apr 2025
Process Reward Modeling with Entropy-Driven Uncertainty
Lang Cao
Renhong Chen
Yingtian Zou
Chao Peng
Wu Ning
...
Y. Wang
Peishuo Su
Mofan Peng
Zijie Chen
Yitong Li
34
0
0
28 Mar 2025
A Multi-Stage Framework with Taxonomy-Guided Reasoning for Occupation Classification Using Large Language Models
Palakorn Achananuparp
Ee-Peng Lim
41
0
0
17 Mar 2025
Judge as A Judge: Improving the Evaluation of Retrieval-Augmented Generation through the Judge-Consistency of Large Language Models
Shuliang Liu
Xinze Li
Zhenghao Liu
Yukun Yan
Cheng Yang
Zheni Zeng
Zhiyuan Liu
Maosong Sun
Ge Yu
RALM
91
1
0
26 Feb 2025
Towards Reasoning Ability of Small Language Models
Gaurav Srivastava
Shuxiang Cao
Xuan Wang
ReLM
LRM
49
4
0
17 Feb 2025
ReLearn: Unlearning via Learning for Large Language Models
Haoming Xu
Ningyuan Zhao
Liming Yang
Sendong Zhao
Shumin Deng
Mengru Wang
Bryan Hooi
Nay Oo
H. Chen
N. Zhang
KELM
CLL
MU
65
0
0
16 Feb 2025
Bone Soups: A Seek-and-Soup Model Merging Approach for Controllable Multi-Objective Generation
Guofu Xie
Xiao Zhang
Ting Yao
Yunsheng Shi
MoMe
53
1
0
15 Feb 2025
Copyright-Protected Language Generation via Adaptive Model Fusion
Javier Abad
Konstantin Donhauser
Francesco Pinto
Fanny Yang
74
1
0
09 Dec 2024
Can LVLMs Describe Videos like Humans? A Five-in-One Video Annotations Benchmark for Better Human-Machine Comparison
Shiyu Hu
Xuchen Li
X. Li
Jing Zhang
Yipei Wang
Xin Zhao
Kang Hao Cheong
VLM
22
1
0
20 Oct 2024
Step Guided Reasoning: Improving Mathematical Reasoning using Guidance Generation and Step Reasoning
Lang Cao
Chao Peng
Renhong Chen
Wu Ning
Yingtian Zou
Yitong Li
LRM
16
0
0
18 Oct 2024
CoTKR: Chain-of-Thought Enhanced Knowledge Rewriting for Complex Knowledge Graph Question Answering
Yike Wu
Yi Huang
Nan Hu
Yuncheng Hua
Guilin Qi
Jiaoyan Chen
Jeff Z. Pan
33
6
0
29 Sep 2024
SimulBench: Evaluating Language Models with Creative Simulation Tasks
Qi Jia
Xiang Yue
Tianyu Zheng
Jie Huang
Bill Yuchen Lin
LM&MA
29
3
0
11 Sep 2024
CRAFT Your Dataset: Task-Specific Synthetic Dataset Generation Through Corpus Retrieval and Augmentation
Ingo Ziegler
Abdullatif Köksal
Desmond Elliott
Hinrich Schütze
38
5
0
03 Sep 2024
Trust or Escalate: LLM Judges with Provable Guarantees for Human Agreement
Jaehun Jung
Faeze Brahman
Yejin Choi
ALM
42
11
0
25 Jul 2024
CopyBench: Measuring Literal and Non-Literal Reproduction of Copyright-Protected Text in Language Model Generation
Tong Chen
Akari Asai
Niloofar Mireshghallah
Sewon Min
James Grimmelmann
Yejin Choi
Hannaneh Hajishirzi
Luke Zettlemoyer
Pang Wei Koh
48
17
0
09 Jul 2024
Themis: Towards Flexible and Interpretable NLG Evaluation
Xinyu Hu
Li Lin
Mingqi Gao
Xunjian Yin
Xiaojun Wan
ELM
29
6
0
26 Jun 2024
ARES: Alternating Reinforcement Learning and Supervised Fine-Tuning for Enhanced Multi-Modal Chain-of-Thought Reasoning Through Diverse AI Feedback
Ju-Seung Byun
Jiyun Chun
Jihyung Kil
Andrew Perrault
ReLM
LRM
27
1
0
25 Jun 2024
RuleR: Improving LLM Controllability by Rule-based Data Recycling
Ming Li
Han Chen
Chenguang Wang
Dang Nguyen
Dianqi Li
Tianyi Zhou
19
6
0
22 Jun 2024
Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges
Aman Singh Thakur
Kartik Choudhary
Venkat Srinik Ramayapally
Sankaran Vaidyanathan
Dieuwke Hupkes
ELM
ALM
45
55
0
18 Jun 2024
Uncertainty Aware Learning for Language Model Alignment
Yikun Wang
Rui Zheng
Liang Ding
Qi Zhang
Dahua Lin
Dacheng Tao
45
4
0
07 Jun 2024
mCSQA: Multilingual Commonsense Reasoning Dataset with Unified Creation Strategy by Language Models and Humans
Yusuke Sakai
Hidetaka Kamigaito
Taro Watanabe
LRM
38
2
0
06 Jun 2024
Towards Completeness-Oriented Tool Retrieval for Large Language Models
Changle Qu
Sunhao Dai
Xiaochi Wei
Hengyi Cai
Shuaiqiang Wang
Dawei Yin
Jun Xu
Jirong Wen
KELM
22
7
0
25 May 2024
FinTextQA: A Dataset for Long-form Financial Question Answering
Jian Chen
Peilin Zhou
Yining Hua
Yingxin Loh
Kehui Chen
Ziyuan Li
Bing Zhu
Junwei Liang
24
11
0
16 May 2024
Explainable Generative AI (GenXAI): A Survey, Conceptualization, and Research Agenda
Johannes Schneider
79
26
0
15 Apr 2024
Large Language Models Are State-of-the-Art Evaluator for Grammatical Error Correction
Masamune Kobayashi
Masato Mita
Mamoru Komachi
ELM
40
3
0
26 Mar 2024
Healthcare Copilot: Eliciting the Power of General LLMs for Medical Consultation
Zhiyao Ren
Yibing Zhan
Baosheng Yu
Liang Ding
Dacheng Tao
LM&MA
32
12
0
20 Feb 2024
Are LLM-based Evaluators Confusing NLG Quality Criteria?
Xinyu Hu
Mingqi Gao
Sen Hu
Yang Zhang
Yicheng Chen
Teng Xu
Xiaojun Wan
AAML
ELM
29
21
0
19 Feb 2024
Revisiting Knowledge Distillation for Autoregressive Language Models
Qihuang Zhong
Liang Ding
Li Shen
Juhua Liu
Bo Du
Dacheng Tao
KELM
39
15
0
19 Feb 2024
Rethinking the Roles of Large Language Models in Chinese Grammatical Error Correction
Yinghui Li
Shang Qin
Jingheng Ye
Shirong Ma
Yangning Li
Libo Qin
Xuming Hu
Wenhao Jiang
Hai-Tao Zheng
Philip S. Yu
LRM
20
5
0
18 Feb 2024
Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning
Ming Li
Yong Zhang
Shwai He
Zhitao Li
Hongyu Zhao
Jianzong Wang
Ning Cheng
Tianyi Zhou
24
62
0
01 Feb 2024
Beyond Traditional Benchmarks: Analyzing Behaviors of Open LLMs on Data-to-Text Generation
Zdeněk Kasner
Ondrej Dusek
25
8
0
18 Jan 2024
Leveraging Large Language Models for NLG Evaluation: Advances and Challenges
Zhen Li
Xiaohan Xu
Tao Shen
Can Xu
Jia-Chen Gu
Yuxuan Lai
Chongyang Tao
Shuai Ma
LM&MA
ELM
26
9
0
13 Jan 2024
LLM-SAP: Large Language Models Situational Awareness Based Planning
Liman Wang
Hanyang Zhong
LLMAG
23
2
0
26 Dec 2023
Extending Context Window of Large Language Models via Semantic Compression
WeiZhi Fei
Xueyan Niu
Pingyi Zhou
Lu Hou
Bo Bai
Lei Deng
Wei Han
23
26
0
15 Dec 2023
Wisdom of Instruction-Tuned Language Model Crowds. Exploring Model Label Variation
Flor Miriam Plaza del Arco
Debora Nozza
Dirk Hovy
ALM
24
4
0
24 Jul 2023
Can Large Language Models Be an Alternative to Human Evaluations?
Cheng-Han Chiang
Hung-yi Lee
ALM
LM&MA
206
559
0
03 May 2023
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
Multitask Prompted Training Enables Zero-Shot Task Generalization
Victor Sanh
Albert Webson
Colin Raffel
Stephen H. Bach
Lintang Sutawika
...
T. Bers
Stella Biderman
Leo Gao
Thomas Wolf
Alexander M. Rush
LRM
203
1,651
0
15 Oct 2021
Teaching Machines to Read and Comprehend
Karl Moritz Hermann
Tomás Kociský
Edward Grefenstette
L. Espeholt
W. Kay
Mustafa Suleyman
Phil Blunsom
170
3,504
0
10 Jun 2015