A Critical Evaluation of AI Feedback for Aligning Large Language Models
arXiv:2402.12366
19 February 2024
Archit Sharma, Sedrick Scott Keh, Eric Mitchell, Chelsea Finn, Kushal Arora, Thomas Kollar
Tags: ALM, LLMAG

Papers citing "A Critical Evaluation of AI Feedback for Aligning Large Language Models"

24 / 24 papers shown

TritonRL: Training LLMs to Think and Code Triton Without Cheating
Jiin Woo, Shaowei Zhu, Allen Nie, Zhen Jia, Yida Wang, Youngsuk Park
18 Oct 2025

How well can LLMs provide planning feedback in grounded environments?
Yuxuan Li, Victor Zhong
Tags: OffRL, LM&Ro, LRM
11 Sep 2025

Understanding Reinforcement Learning for Model Training, and future directions with GRAPE
Rohit Patel
Tags: OffRL
02 Sep 2025

TARS: MinMax Token-Adaptive Preference Strategy for MLLM Hallucination Reduction
Kejia Zhang, Keda Tao, Zhiming Luo, Chang Liu, Jiasheng Tang, Huan Wang
Tags: LRM
29 Jul 2025

Direct Reasoning Optimization: LLMs Can Reward And Refine Their Own Reasoning for Open-Ended Tasks
Yifei Xu, Tusher Chakraborty, Srinagesh Sharma, Leonardo Nunes, Emre Kıcıman, Songwu Lu, Ranveer Chandra
Tags: OffRL, LRM
16 Jun 2025

Text2Grad: Reinforcement Learning from Natural Language Feedback
Hanyang Wang, Lu Wang, Chaoyun Zhang, Tianjun Mao, Si Qin, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang
28 May 2025

Research on Superalignment Should Advance Now with Parallel Optimization of Competence and Conformity
HyunJin Kim, Xiaoyuan Yi, Jing Yao, Muhua Huang, Jinyeong Bak, James Evans, Xing Xie
08 Mar 2025

CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models
Alexander Zhang, Marcus Dong, Jing Liu, Wei Zhang, Yejie Wang, ..., Yancheng He, K. Deng, Wangchunshu Zhou, Wenhao Huang, Zhenru Zhang
Tags: LRM
23 Feb 2025

RLTHF: Targeted Human Feedback for LLM Alignment
Yifei Xu, Tusher Chakraborty, Emre Kıcıman, Bibek Aryal, Eduardo Rodrigues, ..., Rafael Padilha, Leonardo Nunes, Shobana Balakrishnan, Songwu Lu, Ranveer Chandra
19 Feb 2025

Scaling Autonomous Agents via Automatic Reward Modeling And Planning
International Conference on Learning Representations (ICLR), 2025
Zhenfang Chen, Delin Chen, Rui Sun, Wenjun Liu, Chuang Gan
Tags: LLMAG
17 Feb 2025

ExpressivityArena: Can LLMs Express Information Implicitly?
Joshua Tint, Som Sagar, Aditya Taparia, Kelly Raines, Bimsara Pathiraja, Caleb Liu, Ransalu Senanayake
12 Nov 2024

On The Global Convergence Of Online RLHF With Neural Parametrization
Mudit Gaur, Amrit Singh Bedi, Raghu Pasupathy, Vaneet Aggarwal
21 Oct 2024

Personality Alignment of Large Language Models
International Conference on Learning Representations (ICLR), 2024
Minjun Zhu, Linyi Yang, Yue Zhang
Tags: ALM
21 Aug 2024

Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
Charlie Snell, Jaehoon Lee, Kelvin Xu, Aviral Kumar
Tags: LRM
06 Aug 2024

Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs
Ashwinee Panda, Berivan Isik, Xiangyu Qi, Sanmi Koyejo, Tsachy Weissman, Prateek Mittal
Tags: MoMe
24 Jun 2024

SAIL: Self-Improving Efficient Online Alignment of Large Language Models
Mucong Ding, Souradip Chakraborty, Vibhu Agrawal, Zora Che, Alec Koppel, Mengdi Wang, Amrit Singh Bedi, Furong Huang
21 Jun 2024

RL on Incorrect Synthetic Data Scales the Efficiency of LLM Math Reasoning by Eight-Fold
Amrith Rajagopal Setlur, Saurabh Garg, Xinyang Geng, Naman Garg, Virginia Smith, Aviral Kumar
20 Jun 2024

Dialogue Action Tokens: Steering Language Models in Goal-Directed Dialogue with a Multi-Turn Planner
Kenneth Li, Yiming Wang, Fernanda Viégas, Martin Wattenberg
17 Jun 2024

Humor in AI: Massive Scale Crowd-Sourced Preferences and Benchmarks for Cartoon Captioning
Neural Information Processing Systems (NeurIPS), 2024
Jifan Zhang, Lalit P. Jain, Yang Guo, Jiayi Chen, Kuan Lok Zhou, ..., Scott Sievert, Timothy T. Rogers, Kevin Jamieson, Robert Mankoff, Robert Nowak
15 Jun 2024

Direct Preference Optimization for Suppressing Hallucinated Prior Exams in Radiology Report Generation
Oishi Banerjee, Hong-Yu Zhou, Subathra Adithan, Stephen Kwak, Kay Wu, Pranav Rajpurkar
Tags: MedIm
10 Jun 2024

Margin-aware Preference Optimization for Aligning Diffusion Models without Reference
Jiwoo Hong, Sayak Paul, Noah Lee, Kashif Rasul, James Thorne, Jongheon Jeong
10 Jun 2024

Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data
Fahim Tajwar, Anika Singh, Archit Sharma, Rafael Rafailov, Jeff Schneider, Tengyang Xie, Stefano Ermon, Chelsea Finn, Aviral Kumar
22 Apr 2024

Social Choice Should Guide AI Alignment in Dealing with Diverse Human Feedback
Vincent Conitzer, Rachel Freedman, J. Heitzig, Wesley H. Holliday, Bob M. Jacobs, ..., Eric Pacuit, Stuart Russell, Hailey Schoelkopf, Emanuel Tewolde, W. Zwicker
16 Apr 2024

Reinforcement Learning from Multi-role Debates as Feedback for Bias Mitigation in LLMs
Ruoxi Cheng, Haoxuan Ma, Shuirong Cao, Jiaqi Li, Aihua Pei, Zhiqiang Wang, Pengliang Ji, Haoyu Wang, Jiaqi Huo
Tags: AI4CE
15 Apr 2024