A Critical Evaluation of AI Feedback for Aligning Large Language Models
arXiv:2402.12366
19 February 2024
Archit Sharma, Sedrick Scott Keh, Eric Mitchell, Chelsea Finn, Kushal Arora, Thomas Kollar
Tags: ALM, LLMAG

Papers citing "A Critical Evaluation of AI Feedback for Aligning Large Language Models"

24 / 24 papers shown

TritonRL: Training LLMs to Think and Code Triton Without Cheating
Jiin Woo, Shaowei Zhu, Allen Nie, Zhen Jia, Yida Wang, Youngsuk Park
18 Oct 2025

How well can LLMs provide planning feedback in grounded environments?
Yuxuan Li, Victor Zhong
Tags: OffRL, LM&Ro, LRM
11 Sep 2025

Understanding Reinforcement Learning for Model Training, and future directions with GRAPE
Rohit Patel
Tags: OffRL
02 Sep 2025

TARS: MinMax Token-Adaptive Preference Strategy for MLLM Hallucination Reduction
Kejia Zhang, Keda Tao, Zhiming Luo, Chang Liu, Jiasheng Tang, Huan Wang
Tags: LRM
29 Jul 2025

Direct Reasoning Optimization: LLMs Can Reward And Refine Their Own Reasoning for Open-Ended Tasks
Yifei Xu, Tusher Chakraborty, Srinagesh Sharma, Leonardo Nunes, Emre Kıcıman, Songwu Lu, Ranveer Chandra
Tags: OffRL, LRM
16 Jun 2025

Text2Grad: Reinforcement Learning from Natural Language Feedback
Hanyang Wang, Lu Wang, Chaoyun Zhang, Tianjun Mao, Si Qin, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang
28 May 2025

Research on Superalignment Should Advance Now with Parallel Optimization of Competence and Conformity
HyunJin Kim, Xiaoyuan Yi, Jing Yao, Muhua Huang, Jinyeong Bak, James Evans, Xing Xie
08 Mar 2025

CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models
Alexander Zhang, Marcus Dong, Jing Liu, Wei Zhang, Yejie Wang, ..., Yancheng He, K. Deng, Wangchunshu Zhou, Wenhao Huang, Zhenru Zhang
Tags: LRM
23 Feb 2025

RLTHF: Targeted Human Feedback for LLM Alignment
Yifei Xu, Tusher Chakraborty, Emre Kıcıman, Bibek Aryal, Eduardo Rodrigues, ..., Rafael Padilha, Leonardo Nunes, Shobana Balakrishnan, Songwu Lu, Ranveer Chandra
19 Feb 2025

Scaling Autonomous Agents via Automatic Reward Modeling And Planning
International Conference on Learning Representations (ICLR), 2025
Zhenfang Chen, Delin Chen, Rui Sun, Wenjun Liu, Chuang Gan
Tags: LLMAG
17 Feb 2025

ExpressivityArena: Can LLMs Express Information Implicitly?
Joshua Tint, Som Sagar, Aditya Taparia, Kelly Raines, Bimsara Pathiraja, Caleb Liu, Ransalu Senanayake
12 Nov 2024

On The Global Convergence Of Online RLHF With Neural Parametrization
Mudit Gaur, Amrit Singh Bedi, Raghu Pasupathy, Vaneet Aggarwal
21 Oct 2024

Personality Alignment of Large Language Models
International Conference on Learning Representations (ICLR), 2024
Minjun Zhu, Linyi Yang, Yue Zhang
Tags: ALM
21 Aug 2024

Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
Charlie Snell, Jaehoon Lee, Kelvin Xu, Aviral Kumar
Tags: LRM
06 Aug 2024

Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs
Ashwinee Panda, Berivan Isik, Xiangyu Qi, Sanmi Koyejo, Tsachy Weissman, Prateek Mittal
Tags: MoMe
24 Jun 2024

SAIL: Self-Improving Efficient Online Alignment of Large Language Models
Mucong Ding, Souradip Chakraborty, Vibhu Agrawal, Zora Che, Alec Koppel, Mengdi Wang, Amrit Singh Bedi, Furong Huang
21 Jun 2024

RL on Incorrect Synthetic Data Scales the Efficiency of LLM Math Reasoning by Eight-Fold
Amrith Rajagopal Setlur, Saurabh Garg, Xinyang Geng, Naman Garg, Virginia Smith, Aviral Kumar
20 Jun 2024

Dialogue Action Tokens: Steering Language Models in Goal-Directed Dialogue with a Multi-Turn Planner
Kenneth Li, Yiming Wang, Fernanda Viégas, Martin Wattenberg
17 Jun 2024

Humor in AI: Massive Scale Crowd-Sourced Preferences and Benchmarks for Cartoon Captioning
Neural Information Processing Systems (NeurIPS), 2024
Jifan Zhang, Lalit P. Jain, Yang Guo, Jiayi Chen, Kuan Lok Zhou, ..., Scott Sievert, Timothy T. Rogers, Kevin Jamieson, Robert Mankoff, Robert Nowak
15 Jun 2024

Direct Preference Optimization for Suppressing Hallucinated Prior Exams in Radiology Report Generation
Oishi Banerjee, Hong-Yu Zhou, Subathra Adithan, Stephen Kwak, Kay Wu, Pranav Rajpurkar
Tags: MedIm
10 Jun 2024

Margin-aware Preference Optimization for Aligning Diffusion Models without Reference
Jiwoo Hong, Sayak Paul, Noah Lee, Kashif Rasul, James Thorne, Jongheon Jeong
10 Jun 2024

Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data
Fahim Tajwar, Anika Singh, Archit Sharma, Rafael Rafailov, Jeff Schneider, Tengyang Xie, Stefano Ermon, Chelsea Finn, Aviral Kumar
22 Apr 2024

Social Choice Should Guide AI Alignment in Dealing with Diverse Human Feedback
Vincent Conitzer, Rachel Freedman, J. Heitzig, Wesley H. Holliday, Bob M. Jacobs, ..., Eric Pacuit, Stuart Russell, Hailey Schoelkopf, Emanuel Tewolde, W. Zwicker
16 Apr 2024

Reinforcement Learning from Multi-role Debates as Feedback for Bias Mitigation in LLMs
Ruoxi Cheng, Haoxuan Ma, Shuirong Cao, Jiaqi Li, Aihua Pei, Zhiqiang Wang, Pengliang Ji, Haoyu Wang, Jiaqi Huo
Tags: AI4CE
15 Apr 2024