Cited By
Sequence to Sequence Reward Modeling: Improving RLHF by Language Feedback
AAAI Conference on Artificial Intelligence (AAAI), 2024
30 August 2024
Jiayi Zhou, Yalan Qin, Juntao Dai, Yaodong Yang
arXiv: 2409.00162
Papers citing "Sequence to Sequence Reward Modeling: Improving RLHF by Language Feedback" (4 of 4 papers shown)
SafeVLA: Towards Safety Alignment of Vision-Language-Action Model via Constrained Learning
Borong Zhang, Yuhao Zhang, Yalan Qin, Yingshan Lei, Josef Dai, Yuanpei Chen, Yaodong Yang
05 Mar 2025
Sentence-level Reward Model can Generalize Better for Aligning LLM from Human Preference
Wenjie Qiu, Yi-Chen Li, Xuqin Zhang, Tianyi Zhang, Yiming Zhang, Zongzhang Zhang, Yang Yu
01 Mar 2025
DPO Meets PPO: Reinforced Token Optimization for RLHF
Han Zhong, Zikang Shan, Guhao Feng, Wei Xiong, Xinle Cheng, Li Zhao, Di He, Jiang Bian, Liwei Wang
29 Apr 2024
Aligner: Efficient Alignment by Learning to Correct
Jiaming Ji, Boyuan Chen, Hantao Lou, Chongye Guo, Borong Zhang, Xuehai Pan, Juntao Dai, Tianyi Qiu, Yaodong Yang
04 Feb 2024