Self-Supervised Alignment with Mutual Information: Learning to Follow Principles without Preference Labels

22 April 2024

Papers citing "Self-Supervised Alignment with Mutual Information: Learning to Follow Principles without Preference Labels"

Latent Principle Discovery for Language Model Self-Improvement

Keshav Ramji

Tahira Naseem

Ramón Fernandez Astudillo

LRM

377

22 May 2025

Inference-Time Scaling for Generalist Reward Modeling

645

203

03 Apr 2025

Is Free Self-Alignment Possible?

474

24 Feb 2025

Generative Reward Models

319

102

02 Oct 2024

WPO: Enhancing RLHF with Weighted Preference Optimization

398

17 Jun 2024

Self-Control of LLM Behaviors by Compressing Suffix Gradient into Prefix Controller

Ziniu Hu

377

04 Jun 2024

Enhancing Large Vision Language Models with Self-Training on Image Comprehension

298

30 May 2024

STaR-GATE: Teaching Language Models to Ask Clarifying Questions

480

28 Mar 2024