Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2204.05862
Cited By
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
12 April 2022
Yuntao Bai
Andy Jones
Kamal Ndousse
Amanda Askell
Anna Chen
Nova Dassarma
Dawn Drain
Stanislav Fort
Deep Ganguli
T. Henighan
Nicholas Joseph
Saurav Kadavath
John Kernion
Tom Conerly
S. E. Showk
Nelson Elhage
Zac Hatfield-Dodds
Danny Hernandez
Tristan Hume
Scott Johnston
Shauna Kravec
Liane Lovitt
Neel Nanda
Catherine Olsson
Dario Amodei
Tom B. Brown
Jack Clark
Sam McCandlish
C. Olah
Benjamin Mann
Jared Kaplan
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback"
50 / 1,795 papers shown
Title
OmniCam: Unified Multimodal Video Generation via Camera Control
Xiaoda Yang
Jiayang Xu
Kaixuan Luan
Xinyu Zhan
Hongshun Qiu
...
Shuai Yang
Li Zhang
Checheng Yu
Cewu Lu
Lixin Yang
DiffM
VGen
62
0
0
03 Apr 2025
Inference-Time Scaling for Generalist Reward Modeling
Zijun Liu
P. Wang
R. Xu
Shirong Ma
Chong Ruan
Peng Li
Yang Janet Liu
Y. Wu
OffRL
LRM
46
9
0
03 Apr 2025
Urban Computing in the Era of Large Language Models
Zhonghang Li
Lianghao Xia
Xubin Ren
J. Tang
Tianyi Chen
Yong-mei Xu
C. Huang
73
0
0
02 Apr 2025
PiCo: Jailbreaking Multimodal Large Language Models via
Pi
\textbf{Pi}
Pi
ctorial
Co
\textbf{Co}
Co
de Contextualization
Aofan Liu
Lulu Tang
Ting Pan
Yuguo Yin
Bin Wang
Ao Yang
MLLM
AAML
40
0
0
02 Apr 2025
Less-to-More Generalization: Unlocking More Controllability by In-Context Generation
Shaojin Wu
Mengqi Huang
Wenxu Wu
Yufeng Cheng
Fei Ding
Qian He
DiffM
50
4
0
02 Apr 2025
Review, Refine, Repeat: Understanding Iterative Decoding of AI Agents with Dynamic Evaluation and Selection
Souradip Chakraborty
Mohammadreza Pourreza
Ruoxi Sun
Yiwen Song
Nino Scherrer
...
Furong Huang
Amrit Singh Bedi
Ahmad Beirami
Hamid Palangi
Tomas Pfister
48
0
0
02 Apr 2025
Code Red! On the Harmfulness of Applying Off-the-shelf Large Language Models to Programming Tasks
Ali Al-Kaswan
Sebastian Deatc
Begüm Koç
A. van Deursen
M. Izadi
AAML
33
0
0
02 Apr 2025
Safeguarding Vision-Language Models: Mitigating Vulnerabilities to Gaussian Noise in Perturbation-based Attacks
Jiawei Wang
Yushen Zuo
Yuanjun Chai
Z. Liu
Yichen Fu
Yichun Feng
Kin-Man Lam
AAML
VLM
40
0
0
02 Apr 2025
AdPO: Enhancing the Adversarial Robustness of Large Vision-Language Models with Preference Optimization
Chaohu Liu
Tianyi Gui
Yu Liu
Linli Xu
VLM
AAML
68
1
0
02 Apr 2025
Misaligned Roles, Misplaced Images: Structural Input Perturbations Expose Multimodal Alignment Blind Spots
Erfan Shayegani
G M Shahariar
Sara Abdali
Lei Yu
Nael B. Abu-Ghazaleh
Yue Dong
AAML
56
0
0
01 Apr 2025
CONGRAD:Conflicting Gradient Filtering for Multilingual Preference Alignment
Jiangnan Li
Thuy-Trang Vu
Christian Herold
Amirhossein Tebbifakhr
Shahram Khadivi
Gholamreza Haffari
33
0
0
31 Mar 2025
Do Large Language Models Exhibit Spontaneous Rational Deception?
Samuel M. Taylor
Benjamin K. Bergen
LRM
38
0
0
31 Mar 2025
FeRG-LLM : Feature Engineering by Reason Generation Large Language Models
Jeonghyun Ko
Gyeongyun Park
Donghoon Lee
Kyunam Lee
LRM
47
0
0
30 Mar 2025
XL-Instruct: Synthetic Data for Cross-Lingual Open-Ended Generation
Vivek Iyer
Ricardo Rei
Pinzhen Chen
Alexandra Birch
SyDa
LM&MA
66
0
0
29 Mar 2025
A Framework for Lightweight Responsible Prompting Recommendation
Tiago Machado
Sara E. Berger
Cassia Sanctos
Vagner Figueiredo de Santana
Lemara Williams
Zhaoqing Wu
28
0
0
29 Mar 2025
Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback
Wei Shen
Guanlin Liu
Zheng Wu
Ruofei Zhu
Qingping Yang
Chao Xin
Yu Yue
Lin Yan
82
8
0
28 Mar 2025
Sharpe Ratio-Guided Active Learning for Preference Optimization in RLHF
Syrine Belakaria
Joshua Kazdan
Charles Marx
Chris Cundy
W. Neiswanger
Sanmi Koyejo
Barbara Engelhardt
Stefano Ermon
34
0
0
28 Mar 2025
Sell It Before You Make It: Revolutionizing E-Commerce with Personalized AI-Generated Items
Jianghao Lin
Peng Du
Jiaqi Liu
W. J. Li
Yong Yu
Weinan Zhang
Yang Cao
DiffM
36
0
0
28 Mar 2025
Modeling Challenging Patient Interactions: LLMs for Medical Communication Training
Anna Bodonhelyi
Christian Stegemann-Philipps
Alessandra Sonanini
Lea Herschbach
Márton Szép
Anne Herrmann-Werner
Teresa Festl-Wietek
Enkelejda Kasneci
Friederike Holderried
LM&MA
71
0
0
28 Mar 2025
Learning to Instruct for Visual Instruction Tuning
Zhihan Zhou
Feng Hong
Jiaan Luo
Jiangchao Yao
Dongsheng Li
Bo Han
Y. Zhang
Yanfeng Wang
VLM
59
0
0
28 Mar 2025
Collab: Controlled Decoding using Mixture of Agents for LLM Alignment
Souradip Chakraborty
Sujay Bhatt
Udari Madhushani Sehwag
Soumya Suvra Ghosal
Jiahao Qiu
Mengdi Wang
Dinesh Manocha
Furong Huang
Alec Koppel
Sumitra Ganesh
46
2
0
27 Mar 2025
3DGen-Bench: Comprehensive Benchmark Suite for 3D Generative Models
Y. Zhang
Mengchen Zhang
Tong Wu
Tengfei Wang
Gordon Wetzstein
D. Lin
Ziwei Liu
3DV
ELM
71
0
0
27 Mar 2025
Multi-head Reward Aggregation Guided by Entropy
Xiaomin Li
Xupeng Chen
Jingxuan Fan
Eric Hanchen Jiang
Mingye Gao
AAML
49
1
0
26 Mar 2025
GAPO: Learning Preferential Prompt through Generative Adversarial Policy Optimization
Zhouhong Gu
Xingzhou Chen
Xiaoran Shi
Tao Wang
Suhang Zheng
Tianyu Li
Hongwei Feng
Yanghua Xiao
67
0
0
26 Mar 2025
Mitigating Low-Level Visual Hallucinations Requires Self-Awareness: Database, Model and Training Strategy
Yinan Sun
Xiongkuo Min
Zicheng Zhang
Yixuan Gao
Y. Cao
Guangtao Zhai
VLM
59
0
0
26 Mar 2025
Scaling Evaluation-time Compute with Reasoning Models as Process Evaluators
Seungone Kim
Ian Wu
Jinu Lee
Xiang Yue
Seongyun Lee
...
Kiril Gashteovski
Carolin (Haas) Lawrence
J. Hockenmaier
Graham Neubig
Sean Welleck
LRM
48
2
0
25 Mar 2025
MetaSpatial: Reinforcing 3D Spatial Reasoning in VLMs for the Metaverse
Zhenyu Pan
Han Liu
OffRL
LRM
64
3
0
24 Mar 2025
Sun-Shine: A Large Language Model for Tibetan Culture
Cheng Huang
Fan Gao
Nyima Tashi
Yutong Liu
Xiangxiang Wang
...
Gadeng Luosang
Rinchen Dongrub
Dorje Tashi
Xiao Feng
Yongbin Yu
ALM
98
2
0
24 Mar 2025
Debiasing Multimodal Large Language Models via Noise-Aware Preference Optimization
Zefeng Zhang
Hengzhu Tang
Jiawei Sheng
Zhenyu Zhang
Yiming Ren
Zhenyang Li
Dawei Yin
Duohe Ma
Tingwen Liu
47
0
0
23 Mar 2025
ExpertRAG: Efficient RAG with Mixture of Experts -- Optimizing Context Retrieval for Adaptive LLM Responses
Esmail Gumaan
MoE
28
0
0
23 Mar 2025
Understanding the Effects of RLHF on the Quality and Detectability of LLM-Generated Texts
Beining Xu
Arkaitz Zubiaga
DeLMO
66
0
0
23 Mar 2025
MultiScale Contextual Bandits for Long Term Objectives
Richa Rastogi
Yuta Saito
Thorsten Joachims
OffRL
48
0
0
22 Mar 2025
Enhancing Persona Consistency for LLMs' Role-Playing using Persona-Aware Contrastive Learning
Ke Ji
Yixin Lian
Linxu Li
Jingsheng Gao
Weiyuan Li
Bin Dai
34
0
0
22 Mar 2025
Improving Preference Extraction In LLMs By Identifying Latent Knowledge Through Classifying Probes
Sharan Maiya
Yinhong Liu
Ramit Debnath
Anna Korhonen
30
0
0
22 Mar 2025
A Survey on Personalized Alignment -- The Missing Piece for Large Language Models in Real-World Applications
Jian-Yu Guan
J. Wu
J. Li
Chuanqi Cheng
Wei Yu Wu
LM&MA
69
0
0
21 Mar 2025
HAPI: A Model for Learning Robot Facial Expressions from Human Preferences
Dongsheng Yang
Qianying Liu
Wataru Sato
Takashi Minato
Chaoran Liu
Shin’ya Nishida
41
0
0
21 Mar 2025
What's Producible May Not Be Reachable: Measuring the Steerability of Generative Models
Keyon Vafa
Sarah Bentley
Jon M. Kleinberg
S. Mullainathan
38
0
0
21 Mar 2025
Adaptive Group Policy Optimization: Towards Stable Training and Token-Efficient Reasoning
Chen Li
Nazhou Liu
Kai Yang
38
3
0
20 Mar 2025
Echoes of Power: Investigating Geopolitical Bias in US and China Large Language Models
Andre G. C. Pacheco
Athus Cavalini
Giovanni Comarela
36
1
0
20 Mar 2025
Aligning Crowd-sourced Human Feedback for Reinforcement Learning on Code Generation by Large Language Models
M. Wong
C. Tan
ALM
83
3
0
19 Mar 2025
From 1,000,000 Users to Every User: Scaling Up Personalized Preference for User-level Alignment
J. Li
Jian-Yu Guan
Songhao Wu
Wei Yu Wu
Rui Yan
62
1
0
19 Mar 2025
Rolling Forward: Enhancing LightGCN with Causal Graph Convolution for Credit Bond Recommendation
Ashraf Ghiye
Baptiste Barreau
Laurent Carlier
Michalis Vazirgiannis
76
0
0
18 Mar 2025
AI Companies Should Report Pre- and Post-Mitigation Safety Evaluations
Dillon Bowen
Ann-Kathrin Dombrowski
Adam Gleave
Chris Cundy
ELM
48
0
0
17 Mar 2025
R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization
Jingyi Zhang
Jiaxing Huang
H. Yao
Shunyu Liu
Xikun Zhang
Shijian Lu
Dacheng Tao
LRM
57
17
0
17 Mar 2025
Using the Tools of Cognitive Science to Understand Large Language Models at Different Levels of Analysis
Alexander Ku
Declan Campbell
Xuechunzi Bai
Jiayi Geng
Ryan Liu
...
Ilia Sucholutsky
Veniamin Veselovsky
Liyi Zhang
Jian-Qiao Zhu
Thomas L. Griffiths
ELM
88
2
0
17 Mar 2025
MAP: Multi-user Personalization with Collaborative LLM-powered Agents
Christine P. Lee
Jihye Choi
Bilge Mutlu
LLMAG
70
0
1
17 Mar 2025
D3: Diversity, Difficulty, and Dependability-Aware Data Selection for Sample-Efficient LLM Instruction Tuning
Jia Zhang
Chen-Xi Zhang
Yao Liu
Yi-Xuan Jin
Xiao-Wen Yang
Bo Zheng
Y. Liu
Lan-Zhe Guo
47
2
0
14 Mar 2025
ASMA-Tune: Unlocking LLMs' Assembly Code Comprehension via Structural-Semantic Instruction Tuning
Xinyi Wang
Jiashui Wang
Peng Chen
Jinbo Su
Yanming Liu
Long Liu
Yangdong Wang
Qiyuan Chen
Kai Yun
Chunfu Jia
42
0
0
14 Mar 2025
Align in Depth: Defending Jailbreak Attacks via Progressive Answer Detoxification
Yingjie Zhang
Tong Liu
Zhe Zhao
Guozhu Meng
Kai Chen
AAML
51
1
0
14 Mar 2025
RankPO: Preference Optimization for Job-Talent Matching
Y. Zhang
M. Wang
Yu Wang
Xiaohui Wang
41
0
0
13 Mar 2025
Previous
1
2
3
4
5
6
...
34
35
36
Next