1910.00177
Cited By
Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning
1 October 2019
Xue Bin Peng
Aviral Kumar
Grace Zhang
Sergey Levine
OffRL
Papers citing
"Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning"
50 / 404 papers shown
Title
Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models
Zhanhui Zhou
Zhixuan Liu
Jie Liu
Zhichen Dong
Chao Yang
Yu Qiao
ALM
49
20
0
29 May 2024
Preferred-Action-Optimized Diffusion Policies for Offline Reinforcement Learning
Tianle Zhang
Jiayi Guan
Lin Zhao
Yihang Li
Dongjiang Li
...
Lei Sun
Yue Chen
Xuelong Wei
Lusong Li
Xiaodong He
43
1
0
29 May 2024
Reinforcement Learning in Dynamic Treatment Regimes Needs Critical Reexamination
Zhiyao Luo
Yangchen Pan
Peter Watkinson
Tingting Zhu
OffRL
33
0
0
28 May 2024
Offline-Boosted Actor-Critic: Adaptively Blending Optimal Historical Behaviors in Deep Off-Policy RL
Yu-Juan Luo
Tianying Ji
Gang Hua
Jianwei Zhang
Huazhe Xu
Xianyuan Zhan
OffRL
OnRL
50
2
0
28 May 2024
QUEST: Quality-Aware Metropolis-Hastings Sampling for Machine Translation
Gonçalo R. A. Faria
Sweta Agrawal
António Farinhas
Ricardo Rei
José G. C. de Souza
André F. T. Martins
33
4
0
28 May 2024
AlignIQL: Policy Alignment in Implicit Q-Learning through Constrained Optimization
Longxiang He
Li Shen
Junbo Tan
Xueqian Wang
56
1
0
28 May 2024
Diffusion-based Reinforcement Learning via Q-weighted Variational Policy Optimization
Shutong Ding
Ke Hu
Zhenhao Zhang
Kan Ren
Weinan Zhang
Jingyi Yu
Jingya Wang
Ye-ling Shi
42
8
0
25 May 2024
Towards Efficient LLM Grounding for Embodied Multi-Agent Collaboration
Yang Zhang
Shixin Yang
Chenjia Bai
Fei Wu
Xiu Li
Zhen Wang
Xuelong Li
LLMAG
43
25
0
23 May 2024
Offline Reinforcement Learning from Datasets with Structured Non-Stationarity
Johannes Ackermann
Takayuki Osa
Masashi Sugiyama
OffRL
42
2
0
23 May 2024
Task-agnostic Decision Transformer for Multi-type Agent Control with Federated Split Training
Zhiyuan Wang
Bokui Chen
Xiaoyang Qu
Zhenhou Hong
Jing Xiao
Jianzong Wang
46
0
0
22 May 2024
Curriculum Direct Preference Optimization for Diffusion and Consistency Models
Florinel-Alin Croitoru
Vlad Hondru
Radu Tudor Ionescu
N. Sebe
Mubarak Shah
EGVM
89
6
0
22 May 2024
SPO: Multi-Dimensional Preference Sequential Alignment With Implicit Reward Modeling
Xingzhou Lou
Junge Zhang
Jian Xie
Lifeng Liu
Dong Yan
Kaiqi Huang
45
11
0
21 May 2024
Value Augmented Sampling for Language Model Alignment and Personalization
Seungwook Han
Idan Shenfeld
Akash Srivastava
Yoon Kim
Pulkit Agrawal
OffRL
36
23
0
10 May 2024
Offline Model-Based Optimization via Policy-Guided Gradient Search
Yassine Chemingui
Aryan Deshwal
Trong Nghia Hoang
J. Doppa
OffRL
53
9
0
08 May 2024
Improving Offline Reinforcement Learning with Inaccurate Simulators
Yiwen Hou
Haoyuan Sun
Jinming Ma
Feng Wu
OffRL
37
5
0
07 May 2024
Reverse Forward Curriculum Learning for Extreme Sample and Demonstration Efficiency in Reinforcement Learning
Stone Tao
Arth Shukla
Tse-kai Chan
Hao Su
OffRL
41
4
0
06 May 2024
Ag2Manip: Learning Novel Manipulation Skills with Agent-Agnostic Visual and Action Representations
Puhao Li
Tengyu Liu
Yuyang Li
Muzhi Han
Haoran Geng
Shu Wang
Yixin Zhu
Song-Chun Zhu
Siyuan Huang
41
16
0
26 Apr 2024
REBEL: Reinforcement Learning via Regressing Relative Rewards
Zhaolin Gao
Jonathan D. Chang
Wenhao Zhan
Owen Oertell
Gokul Swamy
Kianté Brantley
Thorsten Joachims
J. Andrew Bagnell
Jason D. Lee
Wen Sun
OffRL
48
31
0
25 Apr 2024
Offline Reinforcement Learning with Behavioral Supervisor Tuning
Padmanaba Srinivasan
William J. Knottenbelt
OffRL
40
2
0
25 Apr 2024
AFU: Actor-Free critic Updates in off-policy RL for continuous control
Nicolas Perrin-Gilbert
OffRL
42
0
0
24 Apr 2024
Continual Offline Reinforcement Learning via Diffusion-based Dual Generative Replay
Jinmei Liu
Wenbin Li
Xiangyu Yue
Shilin Zhang
Chunlin Chen
Zhi Wang
OffRL
DiffM
44
5
0
16 Apr 2024
Diverse Randomized Value Functions: A Provably Pessimistic Approach for Offline Reinforcement Learning
Xudong Yu
Chenjia Bai
Hongyi Guo
Changhong Wang
Zhen Wang
OffRL
44
0
0
09 Apr 2024
Scaling Vision-and-Language Navigation With Offline RL
Valay Bundele
Mahesh Bhupati
Biplab Banerjee
Aditya Grover
OffRL
34
1
0
27 Mar 2024
IBCB: Efficient Inverse Batched Contextual Bandit for Behavioral Evolution History
Yi Xu
Weiran Shen
Xiao Zhang
Jun Xu
OffRL
51
0
0
24 Mar 2024
GeRM: A Generalist Robotic Model with Mixture-of-experts for Quadruped Robot
Wenxuan Song
Han Zhao
Pengxiang Ding
Can Cui
Shangke Lyu
Yaning Fan
Donglin Wang
OffRL
40
11
0
20 Mar 2024
Simple Ingredients for Offline Reinforcement Learning
Edoardo Cetin
Andrea Tirinzoni
Matteo Pirotta
A. Lazaric
Yann Ollivier
Ahmed Touati
OffRL
47
2
0
19 Mar 2024
A Simple Mixture Policy Parameterization for Improving Sample Efficiency of CVaR Optimization
Yudong Luo
Yangchen Pan
Han Wang
Philip Torr
Pascal Poupart
55
3
0
17 Mar 2024
A2PO: Towards Effective Offline Reinforcement Learning from an Advantage-aware Perspective
Yunpeng Qing
Shunyu Liu
Jingyuan Cong
Kaixuan Chen
Yihe Zhou
Mingli Song
OffRL
49
1
0
12 Mar 2024
Negating Negatives: Alignment without Human Positive Samples via Distributional Dispreference Optimization
Shitong Duan
Xiaoyuan Yi
Peng Zhang
Tun Lu
Xing Xie
Ning Gu
40
4
0
06 Mar 2024
Trial and Error: Exploration-Based Trajectory Optimization for LLM Agents
Yifan Song
Da Yin
Xiang Yue
Jie Huang
Sujian Li
Bill Yuchen Lin
45
68
0
04 Mar 2024
Offline Goal-Conditioned Reinforcement Learning for Safety-Critical Tasks with Recovery Policy
Chenyang Cao
Zichen Yan
Renhao Lu
Junbo Tan
Xueqian Wang
OffRL
47
2
0
04 Mar 2024
ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL
Yifei Zhou
Andrea Zanette
Jiayi Pan
Sergey Levine
Aviral Kumar
65
51
0
29 Feb 2024
DynaMITE-RL: A Dynamic Model for Improved Temporal Meta-Reinforcement Learning
Anthony Liang
Guy Tennenholtz
Chih-Wei Hsu
Yinlam Chow
Erdem Biyik
Craig Boutilier
OffRL
45
1
0
25 Feb 2024
Is Offline Decision Making Possible with Only Few Samples? Reliable Decisions in Data-Starved Bandits via Trust Region Enhancement
Ruiqi Zhang
Yuexiang Zhai
Andrea Zanette
56
0
0
24 Feb 2024
Foundation Policies with Hilbert Representations
Seohong Park
Tobias Kreiman
Sergey Levine
SSL
OffRL
55
21
0
23 Feb 2024
Generalizing Reward Modeling for Out-of-Distribution Preference Learning
Chen Jia
46
2
0
22 Feb 2024
COPR: Continual Human Preference Learning via Optimal Policy Regularization
Han Zhang
Lin Gui
Yu Lei
Yuanzhao Zhai
Yehong Zhang
...
Hui Wang
Yue Yu
Kam-Fai Wong
Bin Liang
Ruifeng Xu
CLL
42
4
0
22 Feb 2024
MORE-3S: Multimodal-based Offline Reinforcement Learning with Shared Semantic Spaces
Tianyu Zheng
Ge Zhang
Xingwei Qu
Ming Kuang
Stephen W. Huang
Zhaofeng He
OffRL
58
1
0
20 Feb 2024
RLVF: Learning from Verbal Feedback without Overgeneralization
Moritz Stephan
Alexander Khazatsky
Eric Mitchell
Annie S. Chen
Sheryl Hsu
Archit Sharma
Chelsea Finn
47
12
0
16 Feb 2024
A Dense Reward View on Aligning Text-to-Image Diffusion with Preference
Shentao Yang
Tianqi Chen
Mingyuan Zhou
EGVM
36
23
0
13 Feb 2024
SPO: Sequential Monte Carlo Policy Optimisation
Matthew Macfarlane
Edan Toledo
Donal Byrne
Paul Duckworth
Alexandre Laterre
32
1
0
12 Feb 2024
Offline Actor-Critic Reinforcement Learning Scales to Large Models
Jost Tobias Springenberg
A. Abdolmaleki
Jingwei Zhang
Oliver Groth
Michael Bloesch
...
Sarah Bechtle
Steven Kapturowski
Roland Hafner
N. Heess
Martin Riedmiller
OffRL
LRM
41
12
0
08 Feb 2024
Noise Contrastive Alignment of Language Models with Explicit Rewards
Huayu Chen
Guande He
Lifan Yuan
Ganqu Cui
Hang Su
Jun Zhu
63
44
0
08 Feb 2024
Transductive Reward Inference on Graph
B. Qu
Xiaofeng Cao
Qing Guo
Yi Chang
Ivor W. Tsang
Chengqi Zhang
OffRL
45
0
0
06 Feb 2024
Diffusion World Model: Future Modeling Beyond Step-by-Step Rollout for Offline Reinforcement Learning
Zihan Ding
Amy Zhang
Yuandong Tian
Qinqing Zheng
OffRL
55
17
0
05 Feb 2024
Deep autoregressive density nets vs neural ensembles for model-based offline reinforcement learning
Abdelhakim Benechehab
Albert Thomas
Balázs Kégl
OffRL
43
2
0
05 Feb 2024
The RL/LLM Taxonomy Tree: Reviewing Synergies Between Reinforcement Learning and Large Language Models
M. Pternea
Prerna Singh
Abir Chakraborty
Y. Oruganti
M. Milletarí
Sayli Bapat
Kebei Jiang
OffRL
36
7
0
02 Feb 2024
KTO: Model Alignment as Prospect Theoretic Optimization
Kawin Ethayarajh
Winnie Xu
Niklas Muennighoff
Dan Jurafsky
Douwe Kiela
182
463
0
02 Feb 2024
Towards Efficient Exact Optimization of Language Model Alignment
Haozhe Ji
Cheng Lu
Yilin Niu
Pei Ke
Hongning Wang
Jun Zhu
Jie Tang
Minlie Huang
63
12
0
01 Feb 2024
Context-Former: Stitching via Latent Conditioned Sequence Modeling
Ziqi Zhang
Jingzehua Xu
Jinxin Liu
Zifeng Zhuang
Donglin Wang
Miao Liu
Shuai Zhang
OffRL
50
4
0
29 Jan 2024