Deep reinforcement learning from human preferences
Paul Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, Dario Amodei
arXiv:1706.03741 · 12 June 2017
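The paper above learns a reward model from pairwise human preferences over trajectory segments, then optimizes that learned reward with deep RL. Below is a minimal sketch of its Bradley-Terry preference loss (not the authors' code; `reward_model`, the segment feature tensors, and the `pref` label encoding are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

def preference_loss(reward_model, seg_a, seg_b, pref):
    """Bradley-Terry preference loss over two trajectory segments.

    reward_model: hypothetical module mapping per-step features of
        shape (T, d) to per-step scalar rewards of shape (T,).
    seg_a, seg_b: (state, action) feature tensors for the two segments.
    pref: 1.0 if the human preferred seg_a, 0.0 if seg_b,
        0.5 if the comparison was marked a tie.
    """
    # Predicted return of each segment: sum of per-step rewards.
    returns = torch.stack([reward_model(seg_a).sum(),
                           reward_model(seg_b).sum()])
    # P[seg_a preferred] is a softmax over the two segment returns;
    # train with cross-entropy against the human label.
    log_p = F.log_softmax(returns, dim=0)
    return -(pref * log_p[0] + (1.0 - pref) * log_p[1])
```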
Papers citing "Deep reinforcement learning from human preferences"
(50 of 691 papers shown)
RoboMIND: Benchmark on Multi-embodiment Intelligence Normative Data for Robot Manipulation
Kun Wu, Chengkai Hou, Jiaming Liu, Zhengping Che, Xiaozhu Ju, ..., Zhenyu Wang, Pengju An, Siyuan Qian, Shanghang Zhang, Jian Tang
LM&Ro · 124 · 14 · 0 · 17 Feb 2025

A Critical Look At Tokenwise Reward-Guided Text Generation
Ahmad Rashid, Ruotian Wu, Julia Grosse, Agustinus Kristiadi, Pascal Poupart
OffRL · 78 · 0 · 0 · 17 Feb 2025

Evaluating the Paperclip Maximizer: Are RL-Based Language Models More Likely to Pursue Instrumental Goals?
Yufei He, Yuexin Li, Jiaying Wu, Yuan Sui, Yulin Chen, Bryan Hooi
ALM · 96 · 6 · 0 · 16 Feb 2025

Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model
Guoqing Ma, Haoyang Huang, K. Yan, L. Chen, Nan Duan, ..., Yansen Wang, Yuanwei Lu, Yu-Cheng Chen, Yu-Juan Luo, Yihao Luo
DiffM · VGen · 182 · 19 · 0 · 14 Feb 2025

Preference learning made easy: Everything should be understood through win rate
Lily H. Zhang, Rajesh Ranganath
87 · 0 · 0 · 14 Feb 2025

Fostering Appropriate Reliance on Large Language Models: The Role of Explanations, Sources, and Inconsistencies
Sunnie S. Y. Kim, J. Vaughan, Q. V. Liao, Tania Lombrozo, Olga Russakovsky
112 · 5 · 0 · 12 Feb 2025

AI Alignment at Your Discretion
Maarten Buyl, Hadi Khalaf, C. M. Verdun, Lucas Monteiro Paes, Caio Vieira Machado, Flavio du Pin Calmon
48 · 0 · 0 · 10 Feb 2025

ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates
L. Yang, Zhaochen Yu, Bin Cui, Mengdi Wang
ReLM · LRM · AI4CE · 101 · 12 · 0 · 10 Feb 2025
Barriers and Pathways to Human-AI Alignment: A Game-Theoretic Approach
Aran Nayebi
87 · 1 · 0 · 09 Feb 2025

Design Considerations in Offline Preference-based RL
Alekh Agarwal, Christoph Dann, T. V. Marinov
OffRL · 56 · 0 · 0 · 08 Feb 2025

ConRFT: A Reinforced Fine-tuning Method for VLA Models via Consistency Policy
Yuhui Chen, Shuai Tian, Shugao Liu, Yingting Zhou, Haoran Li, Dongbin Zhao
OffRL · 106 · 1 · 0 · 08 Feb 2025

Evolving LLMs' Self-Refinement Capability via Iterative Preference Optimization
Yongcheng Zeng, Xinyu Cui, Xuanfa Jin, Guoqing Liu, Zexu Sun, Quan He, Dong Li, Ning Yang, Haifeng Zhang, Jun Wang
LLMAG · LRM · 100 · 1 · 0 · 08 Feb 2025

Koel-TTS: Enhancing LLM based Speech Generation with Preference Alignment and Classifier Free Guidance
Shehzeen Samarah Hussain, Paarth Neekhara, Xuesong Yang, Edresson Casanova, Subhankar Ghosh, Mikyas T. Desta, Roy Fejgin, Rafael Valle, Jason Chun Lok Li
61 · 3 · 0 · 07 Feb 2025

CTR-Driven Advertising Image Generation with Multimodal Large Language Models
Xingye Chen, Wei Feng, Zhenbang Du, Weizhen Wang, Yuxiao Chen, ..., Jingping Shao, Yuanjie Shao, Xinge You, Changxin Gao, Nong Sang
OffRL · 47 · 2 · 0 · 05 Feb 2025

Learning from Active Human Involvement through Proxy Value Propagation
Zhenghao Peng, Wenjie Mo, Chenda Duan, Quanyi Li, Bolei Zhou
109 · 14 · 0 · 05 Feb 2025

Out-of-Distribution Detection using Synthetic Data Generation
Momin Abbas, Muneeza Azmat, R. Horesh, Mikhail Yurochkin
49 · 1 · 0 · 05 Feb 2025

Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search
Maohao Shen, Guangtao Zeng, Zhenting Qi, Zhang-Wei Hong, Zhenfang Chen, Wei Lu, G. Wornell, Subhro Das, David D. Cox, Chuang Gan
LLMAG · LRM · 240 · 8 · 0 · 04 Feb 2025
Adversarial ML Problems Are Getting Harder to Solve and to Evaluate
Javier Rando, Jie Zhang, Nicholas Carlini, F. Tramèr
AAML · ELM · 68 · 4 · 0 · 04 Feb 2025

Score as Action: Fine-Tuning Diffusion Generative Models by Continuous-time Reinforcement Learning
Hanyang Zhao, Haoxian Chen, Ji Zhang, D. Yao, Wenpin Tang
62 · 0 · 0 · 03 Feb 2025

Agent-Based Uncertainty Awareness Improves Automated Radiology Report Labeling with an Open-Source Large Language Model
Hadas Ben-Atya, N. Gavrielov, Zvi Badash, G. Focht, R. Cytter-Kuint, Talar Hagopian, Dan Turner, M. Freiman
66 · 0 · 0 · 02 Feb 2025

"I am bad": Interpreting Stealthy, Universal and Robust Audio Jailbreaks in Audio-Language Models
Isha Gupta, David Khachaturov, Robert D. Mullins
AAML · AuLLM · 69 · 2 · 0 · 02 Feb 2025

Diverse Preference Optimization
Jack Lanchantin, Angelica Chen, S. Dhuliawala, Ping Yu, Jason Weston, Sainbayar Sukhbaatar, Ilia Kulikov
107 · 4 · 0 · 30 Jan 2025

TimeHF: Billion-Scale Time Series Models Guided by Human Feedback
Yongzhi Qi, Hao Hu, Dazhou Lei, Jianshen Zhang, Zhengxin Shi, Yulin Huang, Zhengyu Chen, Xiaoming Lin, Zuo-jun Shen
AI4TS · AI4CE · 49 · 2 · 0 · 28 Jan 2025

LiPO: Listwise Preference Optimization through Learning-to-Rank
Tianqi Liu, Zhen Qin, Junru Wu, Jiaming Shen, Misha Khalman, ..., Mohammad Saleh, Simon Baumgartner, Jialu Liu, Peter J. Liu, Xuanhui Wang
141 · 51 · 0 · 28 Jan 2025

Inverse-RLignment: Large Language Model Alignment from Demonstrations through Inverse Reinforcement Learning
Hao Sun, M. Schaar
94 · 14 · 0 · 28 Jan 2025
Training Dialogue Systems by AI Feedback for Improving Overall Dialogue Impression
Kai Yoshida, M. Mizukami, Seiya Kawano, Canasai Kruengkrai, Hiroaki Sugiyama, Koichiro Yoshino
ALM · OffRL · 84 · 1 · 0 · 28 Jan 2025

BoKDiff: Best-of-K Diffusion Alignment for Target-Specific 3D Molecule Generation
Ali Khodabandeh Yalabadi, Mehdi Yazdani-Jahromi, O. Garibay
49 · 0 · 0 · 28 Jan 2025

Pre-train and Fine-tune: Recommenders as Large Models
Zhenhao Jiang, Chong Chen, Hao Feng, Yu Yang, Jin Liu, Jie Zhang, Jia Jia, Ning Hu
47 · 0 · 0 · 24 Jan 2025

CodeMonkeys: Scaling Test-Time Compute for Software Engineering
Ryan Ehrlich, Bradley Brown, Jordan Juravsky, Ronald Clark, Christopher Ré, Azalia Mirhoseini
57 · 8 · 0 · 24 Jan 2025

HumorReject: Decoupling LLM Safety from Refusal Prefix via A Little Humor
Zihui Wu, Haichang Gao, Jiacheng Luo, Zhaoxiang Liu
46 · 0 · 0 · 23 Jan 2025

Evolution and The Knightian Blindspot of Machine Learning
Joel Lehman, Elliot Meyerson, Tarek El-Gaaly, Kenneth O. Stanley, Tarin Ziyaee
96 · 2 · 0 · 22 Jan 2025

MONA: Myopic Optimization with Non-myopic Approval Can Mitigate Multi-step Reward Hacking
Sebastian Farquhar, Vikrant Varma, David Lindner, David Elson, Caleb Biddulph, Ian Goodfellow, Rohin Shah
96 · 1 · 0 · 22 Jan 2025

Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators
Yinhong Liu, Han Zhou, Zhijiang Guo, Ehsan Shareghi, Ivan Vulić, Anna Korhonen, Nigel Collier
ALM · 141 · 72 · 0 · 20 Jan 2025

Contrastive Policy Gradient: Aligning LLMs on sequence-level scores in a supervised-friendly fashion
Yannis Flet-Berliac, Nathan Grinsztajn, Florian Strub, Bill Wu, Eugene Choi, ..., Arash Ahmadian, Yash Chandak, M. G. Azar, Olivier Pietquin, Matthieu Geist
OffRL · 66 · 5 · 0 · 17 Jan 2025
Beyond Reward Hacking: Causal Rewards for Large Language Model Alignment
Chaoqi Wang, Zhuokai Zhao, Yibo Jiang, Zhaorun Chen, Chen Zhu, ..., Jiayi Liu, Lizhu Zhang, Xiangjun Fan, Hao Ma, Sinong Wang
82 · 4 · 0 · 17 Jan 2025

A Survey of Research in Large Language Models for Electronic Design Automation
Jingyu Pan, Guanglei Zhou, Chen-Chia Chang, Isaac Jacobson, Jiang Hu, Yuxiao Chen
79 · 2 · 0 · 17 Jan 2025

Can ChatGPT Overcome Behavioral Biases in the Financial Sector? Classify-and-Rethink: Multi-Step Zero-Shot Reasoning in the Gold Investment
Shuoling Liu, Gaoguo Jia, Yuhang Jiang, Liyuan Chen, Qiang Yang
AIFin · LRM · 95 · 0 · 0 · 17 Jan 2025

Revisiting Rogers' Paradox in the Context of Human-AI Interaction
K. M. Collins, Umang Bhatt, Ilia Sucholutsky
61 · 1 · 0 · 16 Jan 2025

RbRL2.0: Integrated Reward and Policy Learning for Rating-based Reinforcement Learning
Mingkang Wu, Devin White, Vernon J. Lawhern, Nicholas R. Waytowich, Yongcan Cao
OffRL · 39 · 0 · 0 · 13 Jan 2025

Pairwise Comparisons without Stochastic Transitivity: Model, Theory and Applications
Sze Ming Lee, Yunxiao Chen
44 · 0 · 0 · 13 Jan 2025

Foundation Models at Work: Fine-Tuning for Fairness in Algorithmic Hiring
Buse Sibel Korkmaz, Rahul Nair, Elizabeth M. Daly, Evangelos Anagnostopoulos, Christos Varytimidis, Antonio del Rio Chanona
42 · 0 · 0 · 13 Jan 2025

Tailored-LLaMA: Optimizing Few-Shot Learning in Pruned LLaMA Models with Task-Specific Prompts
Danyal Aftab, Steven Davy
ALM · 49 · 0 · 0 · 10 Jan 2025

Utility-inspired Reward Transformations Improve Reinforcement Learning Training of Language Models
Roberto-Rafael Maura-Rivero, Chirag Nagpal, Roma Patel, Francesco Visin
51 · 1 · 0 · 08 Jan 2025

Predictable Artificial Intelligence
Lexin Zhou, Pablo Antonio Moreno Casares, Fernando Martínez-Plumed, John Burden, Ryan Burnell, ..., Seán Ó hÉigeartaigh, Danaja Rutar, Wout Schellaert, Konstantinos Voudouris, José Hernández-Orallo
56 · 2 · 0 · 08 Jan 2025
SR-Reward: Taking The Path More Traveled
Seyed Mahdi Basiri Azad, Zahra Padar, Gabriel Kalweit, Joschka Boedecker
OffRL · 72 · 0 · 0 · 04 Jan 2025

CREW: Facilitating Human-AI Teaming Research
Lingyu Zhang, Zhengran Ji, Boyuan Chen
54 · 3 · 0 · 03 Jan 2025

DIPPER: Direct Preference Optimization to Accelerate Primitive-Enabled Hierarchical Reinforcement Learning
Utsav Singh, Souradip Chakraborty, Wesley A Suttle, Brian M. Sadler, Vinay P. Namboodiri, Amrit Singh Bedi
OffRL · 53 · 0 · 0 · 03 Jan 2025

Large Language Monkeys: Scaling Inference Compute with Repeated Sampling
Bradley Brown, Jordan Juravsky, Ryan Ehrlich, Ronald Clark, Quoc V. Le, Christopher Ré, Azalia Mirhoseini
ALM · LRM · 95 · 236 · 0 · 03 Jan 2025

Disentangling Preference Representation and Text Generation for Efficient Individual Preference Alignment
Jianfei Zhang, Jun Bai, Yangqiu Song, Yanmeng Wang, Rumei Li, Chenghua Lin, Wenge Rong
44 · 0 · 0 · 31 Dec 2024

Comprehensive Overview of Reward Engineering and Shaping in Advancing Reinforcement Learning Applications
Sinan Ibrahim, Mostafa Mostafa, Ali Jnadi, Hadi Salloum, Pavel Osinenko
OffRL · 52 · 14 · 0 · 31 Dec 2024