Self-Supervised Alignment with Mutual Information: Learning to Follow Principles without Preference Labels

22 April 2024
Jan-Philipp Fränken
E. Zelikman
Rafael Rafailov
Kanishk Gandhi
Tobias Gerstenberg
Noah D. Goodman
arXiv: 2404.14313

Papers citing "Self-Supervised Alignment with Mutual Information: Learning to Follow Principles without Preference Labels"

7 papers
Inference-Time Scaling for Generalist Reward Modeling
Zijun Liu
P. Wang
R. Xu
Shirong Ma
Chong Ruan
Peng Li
Yang Janet Liu
Y. Wu
OffRL
LRM
46
10
0
03 Apr 2025
Is Free Self-Alignment Possible?
Dyah Adila
Changho Shin
Yijing Zhang
Frederic Sala
MoMe
108
2
0
24 Feb 2025
Enhancing Large Vision Language Models with Self-Training on Image Comprehension
Yihe Deng
Pan Lu
Fan Yin
Ziniu Hu
Sheng Shen
James Y. Zou
Kai-Wei Chang
Wei Wang
SyDa
VLM
LRM
36
36
0
30 May 2024
KTO: Model Alignment as Prospect Theoretic Optimization
Kawin Ethayarajh
Winnie Xu
Niklas Muennighoff
Dan Jurafsky
Douwe Kiela
159
444
0
02 Feb 2024
Specific versus General Principles for Constitutional AI
Sandipan Kundu
Yuntao Bai
Saurav Kadavath
Amanda Askell
Andrew Callahan
...
Zac Hatfield-Dodds
Sören Mindermann
Nicholas Joseph
Sam McCandlish
Jared Kaplan
AILaw
56
26
0
20 Oct 2023
Improving alignment of dialogue agents via targeted human judgements
Amelia Glaese
Nat McAleese
Maja Trębacz
John Aslanides
Vlad Firoiu
...
John F. J. Mellor
Demis Hassabis
Koray Kavukcuoglu
Lisa Anne Hendricks
G. Irving
ALM
AAML
225
500
0
28 Sep 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
308
11,909
0
04 Mar 2022