ResearchTrend.AI

Shepherd: A Critic for Language Model Generation
arXiv: 2308.04592

8 August 2023
Tianlu Wang
Ping Yu
Xiaoqing Ellen Tan
Sean O'Brien
Ramakanth Pasunuru
Jane Dwivedi-Yu
O. Yu. Golovneva
Luke Zettlemoyer
Maryam Fazel-Zarandi
Asli Celikyilmaz
ALM

Papers citing "Shepherd: A Critic for Language Model Generation"

50 / 69 papers shown
ICon: In-Context Contribution for Automatic Data Selection
Yixin Yang
Qingxiu Dong
Linli Yao
Fangwei Zhu
Zhifang Sui
41
0
0
08 May 2025
Sailing AI by the Stars: A Survey of Learning from Rewards in Post-Training and Test-Time Scaling of Large Language Models
Xiaobao Wu
LRM
70
1
0
05 May 2025
A Survey on Progress in LLM Alignment from the Perspective of Reward Design
Miaomiao Ji
Yanqiu Wu
Zhibin Wu
Shoujin Wang
Jian Yang
Mark Dras
Usman Naseem
37
0
0
05 May 2025
Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks
Yixin Cao
Shibo Hong
X. Li
Jiahao Ying
Yubo Ma
...
Juanzi Li
Aixin Sun
Xuanjing Huang
Tat-Seng Chua
Yu Jiang
ALM
ELM
84
0
0
26 Apr 2025
Med-CoDE: Medical Critique based Disagreement Evaluation Framework
Mohit Gupta
Akiko Aizawa
R. Shah
LM&MA
ELM
30
0
0
21 Apr 2025
Prejudge-Before-Think: Enhancing Large Language Models at Test-Time by Process Prejudge Reasoning
J. T. Wang
Jin Jiang
Yang Liu
M. Zhang
Xunliang Cai
LRM
32
0
0
18 Apr 2025
Instruct-of-Reflection: Enhancing Large Language Models Iterative Reflection Capabilities via Dynamic-Meta Instruction
Liping Liu
Chunhong Zhang
Likang Wu
Chuang Zhao
Zheng Hu
Ming He
Jianping Fan
LLMAG
LRM
36
0
0
02 Mar 2025
Rate, Explain and Cite (REC): Enhanced Explanation and Attribution in Automatic Evaluation by Large Language Models
Aliyah R. Hsu
James Zhu
Zhichao Wang
Bin Bi
Shubham Mehrotra
...
Sougata Chaudhuri
Regunathan Radhakrishnan
S. Asur
Claire Na Cheng
Bin Yu
ALM
LRM
67
0
0
20 Feb 2025
Malware Classification using a Hybrid Hidden Markov Model-Convolutional Neural Network
Ritik Mehta
Olha Jurečková
Mark Stamp
57
0
0
25 Dec 2024
Smaller Large Language Models Can Do Moral Self-Correction
Guangliang Liu
Zhiyu Xue
Rongrong Wang
K. Johnson
Kristen Marie Johnson
LRM
23
0
0
30 Oct 2024
Improving Model Factuality with Fine-grained Critique-based Evaluator
Yiqing Xie
Wenxuan Zhou
Pradyot Prakash
Di Jin
Yuning Mao
...
Sinong Wang
Han Fang
Carolyn Rose
Daniel Fried
Hejia Zhang
HILM
33
5
0
24 Oct 2024
MCQG-SRefine: Multiple Choice Question Generation and Evaluation with Iterative Self-Critique, Correction, and Comparison Feedback
Zonghai Yao
Aditya Parashar
Huixue Zhou
Won Seok Jang
Feiyun Ouyang
Zhichao Yang
Hong-ye Yu
ELM
44
2
0
17 Oct 2024
Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates
Xiaosen Zheng
Tianyu Pang
Chao Du
Qian Liu
Jing Jiang
Min-Bin Lin
33
8
0
09 Oct 2024
Better than Your Teacher: LLM Agents that learn from Privileged AI Feedback
Sanjiban Choudhury
Paloma Sodhi
LLMAG
26
3
0
07 Oct 2024
TICKing All the Boxes: Generated Checklists Improve LLM Evaluation and Generation
Jonathan Cook
Tim Rocktäschel
Jakob Foerster
Dennis Aumiller
Alex Wang
ALM
29
9
0
04 Oct 2024
GroUSE: A Benchmark to Evaluate Evaluators in Grounded Question Answering
Sacha Muller
António Loison
Bilel Omrani
Gautier Viaud
RALM
ELM
31
1
0
10 Sep 2024
Self-Judge: Selective Instruction Following with Alignment Self-Evaluation
Hai Ye
Hwee Tou Ng
ELM
ALM
27
4
0
02 Sep 2024
What Makes a Good Story and How Can We Measure It? A Comprehensive Survey of Story Evaluation
Dingyi Yang
Qin Jin
36
5
0
26 Aug 2024
DHP Benchmark: Are LLMs Good NLG Evaluators?
Yicheng Wang
Jiayi Yuan
Yu-Neng Chuang
Zhuoer Wang
Yingchi Liu
Mark Cusick
Param Kulkarni
Zhengping Ji
Yasser Ibrahim
Xia Hu
LM&MA
ELM
43
3
0
25 Aug 2024
Evaluating the Evaluator: Measuring LLMs' Adherence to Task Evaluation Instructions
Bhuvanashree Murugadoss
Christian Poelitz
Ian Drosos
Vu Le
Nick McKenna
Carina Negreanu
Chris Parnin
Advait Sarkar
ELM
ALM
35
12
0
16 Aug 2024
SAFETY-J: Evaluating Safety with Critique
Yixiu Liu
Yuxiang Zheng
Shijie Xia
Jiajun Li
Yi Tu
Chaoling Song
Pengfei Liu
ELM
24
2
0
24 Jul 2024
Intrinsic Self-correction for Enhanced Morality: An Analysis of Internal Mechanisms and the Superficial Hypothesis
Guang-Da Liu
Haitao Mao
Jiliang Tang
K. Johnson
LRM
22
7
0
21 Jul 2024
Localizing and Mitigating Errors in Long-form Question Answering
Rachneet Sachdeva
Yixiao Song
Mohit Iyyer
Iryna Gurevych
HILM
41
0
0
16 Jul 2024
CLAVE: An Adaptive Framework for Evaluating Values of LLM Generated Responses
Jing Yao
Xiaoyuan Yi
Xing Xie
ELM
ALM
31
7
0
15 Jul 2024
Learning to Refine with Fine-Grained Natural Language Feedback
Manya Wadhwa
Xinyu Zhao
Junyi Jessy Li
Greg Durrett
26
11
0
02 Jul 2024
Direct-Inverse Prompting: Analyzing LLMs' Discriminative Capacity in Self-Improving Generation
Jihyun Janice Ahn
Ryo Kamoi
Lu Cheng
Rui Zhang
Wenpeng Yin
33
1
0
27 Jun 2024
Human-AI Collaborative Taxonomy Construction: A Case Study in Profession-Specific Writing Assistants
Minhwa Lee
Zae Myung Kim
Vivek A. Khetan
Dongyeop Kang
39
3
0
26 Jun 2024
Themis: Towards Flexible and Interpretable NLG Evaluation
Xinyu Hu
Li Lin
Mingqi Gao
Xunjian Yin
Xiaojun Wan
ELM
29
6
0
26 Jun 2024
RichRAG: Crafting Rich Responses for Multi-faceted Queries in Retrieval-Augmented Generation
Shuting Wang
Xin Yu
Mang Wang
Weipeng Chen
Yutao Zhu
Zhicheng Dou
RALM
32
7
0
18 Jun 2024
A Survey on Human Preference Learning for Large Language Models
Ruili Jiang
Kehai Chen
Xuefeng Bai
Zhixuan He
Juntao Li
Muyun Yang
Tiejun Zhao
Liqiang Nie
Min Zhang
39
8
0
17 Jun 2024
SED: Self-Evaluation Decoding Enhances Large Language Models for Better Generation
Ziqin Luo
Haixia Han
Haokun Zhao
Guochao Jiang
Chengyu Du
Tingyun Li
Jiaqing Liang
Deqing Yang
Yanghua Xiao
46
3
0
26 May 2024
Language Models can Evaluate Themselves via Probability Discrepancy
Tingyu Xia
Bowen Yu
Yuan Wu
Yi-Ju Chang
Chang Zhou
ELM
29
4
0
17 May 2024
The Real, the Better: Aligning Large Language Models with Online Human Behaviors
Guanying Jiang
Lingyong Yan
Haibo Shi
Dawei Yin
28
2
0
01 May 2024
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing
Ye Tian
Baolin Peng
Linfeng Song
Lifeng Jin
Dian Yu
Haitao Mi
Dong Yu
LRM
ReLM
33
62
0
18 Apr 2024
Sample-Efficient Human Evaluation of Large Language Models via Maximum Discrepancy Competition
Kehua Feng
Keyan Ding
Kede Ma
Zhihua Wang
Qiang Zhang
Huajun Chen
19
10
0
10 Apr 2024
CodeUltraFeedback: An LLM-as-a-Judge Dataset for Aligning Large Language Models to Coding Preferences
M. Weyssow
Aton Kamanda
H. Sahraoui
ALM
59
30
0
14 Mar 2024
LLMCRIT: Teaching Large Language Models to Use Criteria
Weizhe Yuan
Pengfei Liu
Matthias Gallé
ALM
19
6
0
02 Mar 2024
Self-Refinement of Language Models from External Proxy Metrics Feedback
Keshav Ramji
Young-Suk Lee
R. Astudillo
M. Sultan
Tahira Naseem
Asim Munawar
Radu Florian
Salim Roukos
HILM
25
3
0
27 Feb 2024
CriticBench: Benchmarking LLMs for Critique-Correct Reasoning
Zicheng Lin
Zhibin Gou
Tian Liang
Ruilin Luo
Haowei Liu
Yujiu Yang
LRM
40
43
0
22 Feb 2024
CriticBench: Evaluating Large Language Models as Critic
Tian Lan
Wenwei Zhang
Chen Xu
Heyan Huang
Dahua Lin
Kai-xiang Chen
Xian-Ling Mao
ELM
AI4MH
LRM
39
3
0
21 Feb 2024
Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning
Ming Li
Lichang Chen
Jiuhai Chen
Shwai He
Jiuxiang Gu
Tianyi Zhou
16
50
0
15 Feb 2024
Suppressing Pink Elephants with Direct Principle Feedback
Louis Castricato
Nathan Lile
Suraj Anand
Hailey Schoelkopf
Siddharth Verma
Stella Biderman
58
9
0
12 Feb 2024
"Task Success" is not Enough: Investigating the Use of Video-Language Models as Behavior Critics for Catching Undesirable Agent Behaviors
L. Guan
Yifan Zhou
Denis Liu
Yantian Zha
H. B. Amor
Subbarao Kambhampati
LM&Ro
34
16
0
06 Feb 2024
LLM-based NLG Evaluation: Current Status and Challenges
Mingqi Gao
Xinyu Hu
Jie Ruan
Xiao Pu
Xiaojun Wan
ELM
LM&MA
53
29
0
02 Feb 2024
Quality of Answers of Generative Large Language Models vs Peer Patients for Interpreting Lab Test Results for Lay Patients: Evaluation Study
Zhe He
Balu Bhasuran
Qiao Jin
Shubo Tian
Karim Hanna
Cindy Shavor
Lisbeth Y. Garcia Arguello
Patrick Murray
Zhiyong Lu
LM&MA
ELM
AI4MH
6
9
0
23 Jan 2024
Leveraging Large Language Models for NLG Evaluation: Advances and Challenges
Zhen Li
Xiaohan Xu
Tao Shen
Can Xu
Jia-Chen Gu
Yuxuan Lai
Chongyang Tao
Shuai Ma
LM&MA
ELM
26
9
0
13 Jan 2024
Structsum Generation for Faster Text Comprehension
Parag Jain
Andreea Marzoca
Francesco Piccinno
ReLM
31
5
0
12 Jan 2024
The Critique of Critique
Shichao Sun
Junlong Li
Weizhe Yuan
Ruifeng Yuan
Wenjie Li
Pengfei Liu
ELM
27
0
0
09 Jan 2024
Reasons to Reject? Aligning Language Models with Judgments
Weiwen Xu
Deng Cai
Zhisong Zhang
Wai Lam
Shuming Shi
ALM
16
14
0
22 Dec 2023
Is Feedback All You Need? Leveraging Natural Language Feedback in Goal-Conditioned Reinforcement Learning
Sabrina McCallum
Max Taylor-Davies
Stefano V. Albrecht
Alessandro Suglia
11
1
0
07 Dec 2023