Sequence to Sequence Reward Modeling: Improving RLHF by Language Feedback

AAAI Conference on Artificial Intelligence (AAAI), 2024
30 August 2024
Jiayi Zhou, Yalan Qin, Juntao Dai, Yaodong Yang

Papers citing "Sequence to Sequence Reward Modeling: Improving RLHF by Language Feedback"

4 / 4 papers shown
SafeVLA: Towards Safety Alignment of Vision-Language-Action Model via Constrained Learning
Borong Zhang, Yuhao Zhang, Yalan Qin, Yingshan Lei, Josef Dai, Yuanpei Chen, Yaodong Yang
05 Mar 2025
Sentence-level Reward Model can Generalize Better for Aligning LLM from Human Preference
Wenjie Qiu, Yi-Chen Li, Xuqin Zhang, Tianyi Zhang, Yiming Zhang, Zongzhang Zhang, Yang Yu
01 Mar 2025
DPO Meets PPO: Reinforced Token Optimization for RLHF
Han Zhong, Zikang Shan, Guhao Feng, Wei Xiong, Xinle Cheng, Li Zhao, Di He, Jiang Bian, Liwei Wang
29 Apr 2024
Aligner: Efficient Alignment by Learning to Correct
Jiaming Ji, Boyuan Chen, Hantao Lou, Chongye Guo, Borong Zhang, Xuehai Pan, Juntao Dai, Tianyi Qiu, Yaodong Yang
04 Feb 2024