ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1910.00177
  4. Cited By
Advantage-Weighted Regression: Simple and Scalable Off-Policy
  Reinforcement Learning

Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning

1 October 2019
Xue Bin Peng
Aviral Kumar
Grace Zhang
Sergey Levine
    OffRL
ArXivPDFHTML

Papers citing "Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning"

50 / 404 papers shown
Title
VL-SAFE: Vision-Language Guided Safety-Aware Reinforcement Learning with World Models for Autonomous Driving
VL-SAFE: Vision-Language Guided Safety-Aware Reinforcement Learning with World Models for Autonomous Driving
Yansong Qu
Zilin Huang
Zihao Sheng
Jiancong Chen
Sikai Chen
Samuel Labi
OffRL
12
0
0
22 May 2025
WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning
WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning
Zhepei Wei
Wenlin Yao
Yao Liu
Weizhi Zhang
Qin Lu
...
Puyang Xu
Chao Zhang
Bing Yin
Hyokun Yun
Lihong Li
OffRL
CLL
OnRL
LRM
17
0
0
22 May 2025
FlowQ: Energy-Guided Flow Policies for Offline Reinforcement Learning
FlowQ: Energy-Guided Flow Policies for Offline Reinforcement Learning
Marvin Alles
Nutan Chen
Patrick van der Smagt
Botond Cseke
19
0
0
20 May 2025
Flattening Hierarchies with Policy Bootstrapping
Flattening Hierarchies with Policy Bootstrapping
John L. Zhou
Jonathan C. Kao
OffRL
19
0
0
20 May 2025
Option-aware Temporally Abstracted Value for Offline Goal-Conditioned Reinforcement Learning
Option-aware Temporally Abstracted Value for Offline Goal-Conditioned Reinforcement Learning
Hongjoon Ahn
Heewoong Choi
Jisu Han
Taesup Moon
OffRL
27
0
0
19 May 2025
TD-GRPC: Temporal Difference Learning with Group Relative Policy Constraint for Humanoid Locomotion
TD-GRPC: Temporal Difference Learning with Group Relative Policy Constraint for Humanoid Locomotion
Khang Nguyen
Khai Nguyen
An T. Le
Jan Peters
Manfred Huber
Ngo Anh Vien
Minh Nhat Vu
17
0
0
19 May 2025
Can Global XAI Methods Reveal Injected Bias in LLMs? SHAP vs Rule Extraction vs RuleSHAP
Can Global XAI Methods Reveal Injected Bias in LLMs? SHAP vs Rule Extraction vs RuleSHAP
Francesco Sovrano
29
0
0
16 May 2025
ReWiND: Language-Guided Rewards Teach Robot Policies without New Demonstrations
ReWiND: Language-Guided Rewards Teach Robot Policies without New Demonstrations
Jiahui Zhang
Yusen Luo
Abrar Anwar
Sumedh Anand Sontakke
Joseph J Lim
Jesse Thomason
Erdem Biyik
Jesse Zhang
OffRL
LM&Ro
29
0
0
16 May 2025
Spectral Policy Optimization: Coloring your Incorrect Reasoning in GRPO
Spectral Policy Optimization: Coloring your Incorrect Reasoning in GRPO
Peter Chen
Xiaopeng Li
Zhiyu Li
Xi Chen
Tianyi Lin
22
0
0
16 May 2025
Fine-tuning Diffusion Policies with Backpropagation Through Diffusion Timesteps
Fine-tuning Diffusion Policies with Backpropagation Through Diffusion Timesteps
Ningyuan Yang
Jiaxuan Gao
Feng Gao
Yi Wu
Chao Yu
48
0
0
15 May 2025
Adaptive Diffusion Policy Optimization for Robotic Manipulation
Adaptive Diffusion Policy Optimization for Robotic Manipulation
Huiyun Jiang
Zhuang Yang
34
0
0
13 May 2025
What Matters for Batch Online Reinforcement Learning in Robotics?
What Matters for Batch Online Reinforcement Learning in Robotics?
Perry Dong
Suvir Mirchandani
Dorsa Sadigh
Chelsea Finn
OffRL
36
0
0
12 May 2025
Video-Enhanced Offline Reinforcement Learning: A Model-Based Approach
Video-Enhanced Offline Reinforcement Learning: A Model-Based Approach
Minting Pan
Yitao Zheng
Jiajian Li
Yunbo Wang
Xiaokang Yang
OffRL
53
0
0
10 May 2025
Taming OOD Actions for Offline Reinforcement Learning: An Advantage-Based Approach
Taming OOD Actions for Offline Reinforcement Learning: An Advantage-Based Approach
Xuyang Chen
Keyu Yan
Lin Zhao
OffRL
61
0
0
08 May 2025
Flow-GRPO: Training Flow Matching Models via Online RL
Flow-GRPO: Training Flow Matching Models via Online RL
Jie Liu
Gongye Liu
Jiajun Liang
Yongqian Li
Jiaheng Liu
Xinyu Wang
Pengfei Wan
Di Zhang
Wanli Ouyang
AI4CE
78
0
0
08 May 2025
ARDNS-FN-Quantum: A Quantum-Enhanced Reinforcement Learning Framework with Cognitive-Inspired Adaptive Exploration for Dynamic Environments
ARDNS-FN-Quantum: A Quantum-Enhanced Reinforcement Learning Framework with Cognitive-Inspired Adaptive Exploration for Dynamic Environments
Umberto Gonçalves de Sousa
29
0
0
07 May 2025
VLM Q-Learning: Aligning Vision-Language Models for Interactive Decision-Making
VLM Q-Learning: Aligning Vision-Language Models for Interactive Decision-Making
Jake Grigsby
Yuke Zhu
Michael S Ryoo
Juan Carlos Niebles
OffRL
VLM
48
0
0
06 May 2025
Analytic Energy-Guided Policy Optimization for Offline Reinforcement Learning
Analytic Energy-Guided Policy Optimization for Offline Reinforcement Learning
Jifeng Hu
Sili Huang
Zheng Yang
Shengchao Hu
Li Shen
Hechang Chen
Lichao Sun
Yi-Ju Chang
Dacheng Tao
OffRL
248
0
0
03 May 2025
Towards Efficient Online Tuning of VLM Agents via Counterfactual Soft Reinforcement Learning
Towards Efficient Online Tuning of VLM Agents via Counterfactual Soft Reinforcement Learning
Lang Feng
Weihao Tan
Zhiyi Lyu
Longtao Zheng
Haiyang Xu
Ming Yan
Fei Huang
Jingyi Wang
31
0
0
01 May 2025
Dynamic Action Interpolation: A Universal Approach for Accelerating Reinforcement Learning with Expert Guidance
Dynamic Action Interpolation: A Universal Approach for Accelerating Reinforcement Learning with Expert Guidance
Wenjun Cao
52
0
0
26 Apr 2025
Direct Advantage Regression: Aligning LLMs with Online AI Reward
Direct Advantage Regression: Aligning LLMs with Online AI Reward
Li He
He Zhao
Stephen Wan
Dadong Wang
Lina Yao
Tongliang Liu
43
0
0
19 Apr 2025
SkyReels-V2: Infinite-length Film Generative Model
SkyReels-V2: Infinite-length Film Generative Model
Guibin Chen
D. Lin
Jiangping Yang
Chunze Lin
J. Zhu
...
Di Qiu
Debang Li
Zhengcong Fei
Yang Li
Yahui Zhou
DiffM
VGen
61
2
0
17 Apr 2025
An Optimal Discriminator Weighted Imitation Perspective for Reinforcement Learning
An Optimal Discriminator Weighted Imitation Perspective for Reinforcement Learning
Haoran Xu
Shuozhe Li
Harshit S. Sikchi
S. Niekum
Amy Zhang
OffRL
34
0
0
17 Apr 2025
A Clean Slate for Offline Reinforcement Learning
A Clean Slate for Offline Reinforcement Learning
Matthew Jackson
Uljad Berdica
Jarek Liesen
Shimon Whiteson
Jakob Foerster
OffRL
OnRL
57
0
0
15 Apr 2025
Offline Reinforcement Learning using Human-Aligned Reward Labeling for Autonomous Emergency Braking in Occluded Pedestrian Crossing
Offline Reinforcement Learning using Human-Aligned Reward Labeling for Autonomous Emergency Braking in Occluded Pedestrian Crossing
Vinal Asodia
Zhenhua Feng
Saber Fallah
OffRL
50
0
0
11 Apr 2025
Sample, Don't Search: Rethinking Test-Time Alignment for Language Models
Sample, Don't Search: Rethinking Test-Time Alignment for Language Models
Gonçalo Faria
Noah A. Smith
39
1
0
04 Apr 2025
Exploring the Evolution of Physics Cognition in Video Generation: A Survey
Exploring the Evolution of Physics Cognition in Video Generation: A Survey
Minghui Lin
Xiang Wang
Yansen Wang
Shu Wang
Fengqi Dai
...
Cunxiang Wang
Zhengrong Zuo
Nong Sang
Siteng Huang
Donglin Wang
EGVM
VGen
90
3
0
27 Mar 2025
Offline Reinforcement Learning with Discrete Diffusion Skills
Offline Reinforcement Learning with Discrete Diffusion Skills
Ruixi Qiao
Jie Cheng
Xingyuan Dai
Yonglin Tian
Yisheng Lv
OffRL
84
0
0
26 Mar 2025
One Framework to Rule Them All: Unifying RL-Based and RL-Free Methods in RLHF
One Framework to Rule Them All: Unifying RL-Based and RL-Free Methods in RLHF
Xin Cai
41
1
0
25 Mar 2025
COLSON: Controllable Learning-Based Social Navigation via Diffusion-Based Reinforcement Learning
COLSON: Controllable Learning-Based Social Navigation via Diffusion-Based Reinforcement Learning
Yuki Tomita
Kohei Matsumoto
Yuki Hyodo
Ryo Kurazume
71
0
0
18 Mar 2025
Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning
Yuxiao Qu
Matthew Y. R. Yang
Amrith Rajagopal Setlur
Lewis Tunstall
E. Beeching
Ruslan Salakhutdinov
Aviral Kumar
OffRL
75
20
0
10 Mar 2025
THE-SEAN: A Heart Rate Variation-Inspired Temporally High-Order Event-Based Visual Odometry with Self-Supervised Spiking Event Accumulation Networks
Chaoran Xiong
Litao Wei
Kehui Ma
Zhen Sun
Yan Xiang
Zihan Nan
Trieu-Kien Truong
Ling Pei
46
0
0
07 Mar 2025
DiffPO: Diffusion-styled Preference Optimization for Efficient Inference-Time Alignment of Large Language Models
Ruizhe Chen
Wenhao Chai
Zhifei Yang
Xiaotian Zhang
Qiufeng Wang
Tony Q.S. Quek
Soujanya Poria
Zuozhu Liu
55
0
0
06 Mar 2025
Behavior Preference Regression for Offline Reinforcement Learning
Padmanaba Srinivasan
William J. Knottenbelt
OffRL
38
0
0
02 Mar 2025
Fewer May Be Better: Enhancing Offline Reinforcement Learning with Reduced Dataset
Fewer May Be Better: Enhancing Offline Reinforcement Learning with Reduced Dataset
Yiqin Yang
Quanwei Wang
Chenghao Li
Hao Hu
Chengjie Wu
...
Dianyu Zhong
Ziyou Zhang
Qianchuan Zhao
Chongjie Zhang
Xu Bo
OffRL
57
0
0
26 Feb 2025
Diverse Transformer Decoding for Offline Reinforcement Learning Using Financial Algorithmic Approaches
Diverse Transformer Decoding for Offline Reinforcement Learning Using Financial Algorithmic Approaches
D. Elbaz
Oren Salzman
OffRL
37
0
0
13 Feb 2025
Digi-Q: Learning Q-Value Functions for Training Device-Control Agents
Hao Bai
Yifei Zhou
Li Erran Li
Sergey Levine
Aviral Kumar
OffRL
53
2
0
13 Feb 2025
Advancing Autonomous VLM Agents via Variational Subgoal-Conditioned Reinforcement Learning
Advancing Autonomous VLM Agents via Variational Subgoal-Conditioned Reinforcement Learning
Qingyuan Wu
Jianheng Liu
Haifeng Zhang
Jun Wang
Kun Shao
OffRL
107
1
0
11 Feb 2025
DrugImproverGPT: A Large Language Model for Drug Optimization with Fine-Tuning via Structured Policy Optimization
DrugImproverGPT: A Large Language Model for Drug Optimization with Fine-Tuning via Structured Policy Optimization
Xuefeng Liu
Songhao Jiang
Siyu Chen
Zhuoran Yang
Yuxin Chen
Ian Foster
Rick L. Stevens
LM&MA
OffRL
60
0
0
11 Feb 2025
Temporal Representation Alignment: Successor Features Enable Emergent Compositionality in Robot Instruction Following
Temporal Representation Alignment: Successor Features Enable Emergent Compositionality in Robot Instruction Following
Vivek Myers
Bill Chunyuan Zheng
Anca Dragan
Kuan Fang
Sergey Levine
72
0
0
08 Feb 2025
The Best Instruction-Tuning Data are Those That Fit
The Best Instruction-Tuning Data are Those That Fit
Dylan Zhang
Qirun Dai
Hao Peng
ALM
120
4
0
06 Feb 2025
Think Smarter not Harder: Adaptive Reasoning with Inference Aware Optimization
Think Smarter not Harder: Adaptive Reasoning with Inference Aware Optimization
Zishun Yu
Tengyu Xu
Di Jin
Karthik Abinav Sankararaman
Yun He
...
Eryk Helenowski
Chen Zhu
Sinong Wang
Hao Ma
Han Fang
LRM
56
5
0
29 Jan 2025
Temporal Logic Specification-Conditioned Decision Transformer for Offline Safe Reinforcement Learning
Temporal Logic Specification-Conditioned Decision Transformer for Offline Safe Reinforcement Learning
Zijian Guo
Weichao Zhou
Wenchao Li
OffRL
105
2
0
28 Jan 2025
WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning
WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning
Zehan Qi
Xiao-Chang Liu
Iat Long Iong
Hanyu Lai
Xingwu Sun
...
Shuntian Yao
Tianjie Zhang
Wei Xu
J. Tang
Yuxiao Dong
111
18
0
28 Jan 2025
Improving Video Generation with Human Feedback
Improving Video Generation with Human Feedback
Jie Liu
Gongye Liu
Jiajun Liang
Ziyang Yuan
Xiaokun Liu
...
Pengfei Wan
Di Zhang
Kun Gai
Yujiu Yang
Wanli Ouyang
VGen
EGVM
69
13
0
23 Jan 2025
Direct Unlearning Optimization for Robust and Safe Text-to-Image Models
Direct Unlearning Optimization for Robust and Safe Text-to-Image Models
Yong-Hyun Park
Sangdoo Yun
Jin-Hwa Kim
Junho Kim
Geonhui Jang
Yonghyun Jeong
Junghyo Jo
Gayoung Lee
78
14
0
17 Jan 2025
Deterministic Uncertainty Propagation for Improved Model-Based Offline Reinforcement Learning
Deterministic Uncertainty Propagation for Improved Model-Based Offline Reinforcement Learning
Abdullah Akgul
Manuel Haußmann
M. Kandemir
OffRL
83
1
0
17 Jan 2025
Preference-Based Multi-Agent Reinforcement Learning: Data Coverage and Algorithmic Techniques
Preference-Based Multi-Agent Reinforcement Learning: Data Coverage and Algorithmic Techniques
Natalia Zhang
X. Wang
Qiwen Cui
Runlong Zhou
Sham Kakade
Simon S. Du
OffRL
61
0
0
10 Jan 2025
OMG-RL:Offline Model-based Guided Reward Learning for Heparin Treatment
OMG-RL:Offline Model-based Guided Reward Learning for Heparin Treatment
Yooseok Lim
Sujee Lee
OffRL
152
0
0
03 Jan 2025
Geometric-Averaged Preference Optimization for Soft Preference Labels
Geometric-Averaged Preference Optimization for Soft Preference Labels
Hiroki Furuta
Kuang-Huei Lee
Shixiang Shane Gu
Y. Matsuo
Aleksandra Faust
Heiga Zen
Izzeddin Gur
58
7
0
31 Dec 2024
123456789
Next