Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2402.17257
Cited By
RIME: Robust Preference-based Reinforcement Learning with Noisy Preferences
27 February 2024
Jie Cheng
Gang Xiong
Xingyuan Dai
Q. Miao
Yisheng Lv
Fei-Yue Wang
Re-assign community
ArXiv
PDF
HTML
Papers citing
"RIME: Robust Preference-based Reinforcement Learning with Noisy Preferences"
16 / 16 papers shown
Title
TREND: Tri-teaching for Robust Preference-based Reinforcement Learning with Demonstrations
Shuaiyi Huang
Mara Levy
Anubhav Gupta
Daniel Ekpo
Ruijie Zheng
Abhinav Shrivastava
19
0
0
09 May 2025
DAPPER: Discriminability-Aware Policy-to-Policy Preference-Based Reinforcement Learning for Query-Efficient Robot Skill Acquisition
Yuki Kadokawa
Jonas Frey
Takahiro Miki
Takamitsu Matsubara
Marco Hutter
21
0
0
09 May 2025
Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning
Jie Cheng
Ruixi Qiao
Lijun Li
Chao Guo
J. Z. Wang
Gang Xiong
Yisheng Lv
Fei-Yue Wang
LRM
56
0
0
21 Apr 2025
VARP: Reinforcement Learning from Vision-Language Model Feedback with Agent Regularized Preferences
Anukriti Singh
Amisha Bhaskar
Peihong Yu
Souradip Chakraborty
Ruthwik Dasyam
Amrit Singh Bedi
Pratap Tokekar
48
0
0
18 Mar 2025
Strategyproof Reinforcement Learning from Human Feedback
Thomas Kleine Buening
Jiarui Gan
Debmalya Mandal
Marta Z. Kwiatkowska
47
0
0
13 Mar 2025
Toward Stable World Models: Measuring and Addressing World Instability in Generative Environments
Soonwoo Kwon
Jin-Young Kim
Hyojun Go
Kyungjune Baek
53
0
0
11 Mar 2025
Skill Expansion and Composition in Parameter Space
Tenglong Liu
J. Li
Yinan Zheng
Haoyi Niu
Yixing Lan
Xin Xu
Xianyuan Zhan
51
4
0
09 Feb 2025
Preference VLM: Leveraging VLMs for Scalable Preference-Based Reinforcement Learning
Udita Ghosh
Dripta S. Raychaudhuri
Jiachen Li
Konstantinos Karydis
A. Roy-Chowdhury
VLM
55
0
0
03 Feb 2025
Ada-K Routing: Boosting the Efficiency of MoE-based LLMs
Tongtian Yue
Longteng Guo
Jie Cheng
Xuange Gao
J. Liu
MoE
18
0
0
14 Oct 2024
Scaling Offline Model-Based RL via Jointly-Optimized World-Action Model Pretraining
Jie Cheng
Ruixi Qiao
Gang Xiong
Binhua Li
Yingwei Ma
Binhua Li
Yongbin Li
Yisheng Lv
OffRL
OnRL
LM&Ro
32
3
0
01 Oct 2024
Robust Reinforcement Learning from Corrupted Human Feedback
Alexander Bukharin
Ilgee Hong
Haoming Jiang
Zichong Li
Qingru Zhang
Zixuan Zhang
Tuo Zhao
26
4
0
21 Jun 2024
Leveraging Sub-Optimal Data for Human-in-the-Loop Reinforcement Learning
Calarina Muslimani
M. E. Taylor
OffRL
38
2
0
30 Apr 2024
WARM: On the Benefits of Weight Averaged Reward Models
Alexandre Ramé
Nino Vieillard
Léonard Hussenot
Robert Dadashi
Geoffrey Cideron
Olivier Bachem
Johan Ferret
100
92
0
22 Jan 2024
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
SMARTS: Scalable Multi-Agent Reinforcement Learning Training School for Autonomous Driving
Ming Zhou
Jun-Jie Luo
Julian Villela
Yaodong Yang
David Rusu
...
H. Ammar
Hongbo Zhang
Wulong Liu
Jianye Hao
Jun Wang
131
192
0
19 Oct 2020
Curriculum Loss: Robust Learning and Generalization against Label Corruption
Yueming Lyu
Ivor W. Tsang
NoLa
47
170
0
24 May 2019
1