ResearchTrend.AI
Reinforcement learning

16 May 2024
Florentin Wörgötter
arXiv: 2405.10369

Papers citing "Reinforcement learning"

50 / 89 papers shown
Co-Reinforcement Learning for Unified Multimodal Understanding and Generation
Jingjing Jiang
Chongjie Si
Jun Luo
Hanwang Zhang
Chao Ma
83
0
0
23 May 2025
SimpleDeepSearcher: Deep Information Seeking via Web-Powered Reasoning Trajectory Synthesis
Shuang Sun
Huatong Song
Yuhao Wang
Ruiyang Ren
Jinhao Jiang
...
Wayne Xin Zhao
Zheng Liu
Lei Fang
Zhongyuan Wang
Ji-Rong Wen
LRM
27
4
0
22 May 2025
Bi-level Mean Field: Dynamic Grouping for Large-Scale MARL
Yuxuan Zheng
Yihe Zhou
Feiyang Xu
Mingli Song
Shunyu Liu
OffRL
42
0
0
10 May 2025
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Bowen Jin
Hansi Zeng
Zhenrui Yue
Jinsung Yoon
Sercan O. Arik
Dong Wang
Hamed Zamani
Jiawei Han
RALM
ReLM
KELM
OffRL
AI4TS
LRM
118
77
0
12 Mar 2025
HWC-Loco: A Hierarchical Whole-Body Control Approach to Robust Humanoid Locomotion
Sixu Lin
Guanren Qiao
Yunxin Tai
Ang Li
Kui Jia
Guiliang Liu
51
1
0
02 Mar 2025
Balancing Act: Prioritization Strategies for LLM-Designed Restless Bandit Rewards
Shresth Verma
Niclas Boehmer
Lingkai Kong
Milind Tambe
92
2
0
17 Jan 2025
CORD: Generalizable Cooperation via Role Diversity
Kanefumi Matsuyama
Kefan Su
Jiangxing Wang
Deheng Ye
Zongqing Lu
53
0
0
04 Jan 2025
Enhancing Answer Reliability Through Inter-Model Consensus of Large Language Models
Alireza Amiri-Margavi
Iman Jebellat
Ehsan Jebellat
Seyed Pouyan Mousavi Davoudi
126
2
0
25 Nov 2024
Root Cause Attribution of Delivery Risks via Causal Discovery with Reinforcement Learning
Shi Bo
Minheng Xiao
53
7
0
11 Aug 2024
Automated radiotherapy treatment planning guided by GPT-4Vision
Sheng Liu
O. Pastor-Serrano
Yizheng Chen
Matthew Gopaulchan
Weixing Liang
...
Michael Gensheimer
P. Dong
Yong Yang
James Zou
Lei Xing
53
6
0
21 Jun 2024
Review-based Recommender Systems: A Survey of Approaches, Challenges and Future Perspectives
Emrul Hasan
Mizanur Rahman
Chen Ding
Jimmy Xiangji Huang
Shaina Raza
37
5
0
09 May 2024
Employing Federated Learning for Training Autonomous HVAC Systems
Fredrik Hagström
Vikas Garg
Fabricio Oliveira
AI4CE
95
0
0
01 May 2024
SERL: A Software Suite for Sample-Efficient Robotic Reinforcement Learning
Jianlan Luo
Zheyuan Hu
Charles Xu
You Liang Tan
Jacob Berg
Archit Sharma
S. Schaal
Chelsea Finn
Abhishek Gupta
Sergey Levine
OffRL
OnRL
44
44
0
29 Jan 2024
Proximal Policy Optimization with Graph Neural Networks for Optimal Power Flow
Ángela López-Cardona
Guillermo Bernárdez
Pere Barlet-Ros
A. Cabellos-Aparicio
105
4
0
23 Dec 2022
Learning to Utilize Shaping Rewards: A New Approach of Reward Shaping
Yujing Hu
Weixun Wang
Hangtian Jia
Yixiang Wang
Yingfeng Chen
Jianye Hao
Feng Wu
Changjie Fan
OffRL
30
175
0
05 Nov 2020
Model-Augmented Actor-Critic: Backpropagating through Paths
I. Clavera
Yao Fu
Pieter Abbeel
48
88
0
16 May 2020
Decision-Making with Auto-Encoding Variational Bayes
Romain Lopez
Pierre Boyeau
Nir Yosef
Michael I. Jordan
Jeffrey Regier
BDL
138
10,591
0
17 Feb 2020
A Tutorial on Learning With Bayesian Networks
David Heckerman
CML
148
3,515
0
01 Feb 2020
PyTorch: An Imperative Style, High-Performance Deep Learning Library
Adam Paszke
Sam Gross
Francisco Massa
Adam Lerer
James Bradbury
...
Sasank Chilamkurthy
Benoit Steiner
Lu Fang
Junjie Bai
Soumith Chintala
ODL
126
42,038
0
03 Dec 2019
Benchmarking Model-Based Reinforcement Learning
Tingwu Wang
Xuchan Bao
I. Clavera
Jerrick Hoang
Yeming Wen
Eric D. Langlois
Matthew Shunshi Zhang
Guodong Zhang
Pieter Abbeel
Jimmy Ba
OffRL
52
361
0
03 Jul 2019
When to Trust Your Model: Model-Based Policy Optimization
Michael Janner
Justin Fu
Marvin Zhang
Sergey Levine
OffRL
48
939
0
19 Jun 2019
Statistical Performance of Radio Interferometric Calibration
S. Yatawatta
42
2
0
27 Feb 2019
Soft Actor-Critic Algorithms and Applications
Tuomas Haarnoja
Aurick Zhou
Kristian Hartikainen
George Tucker
Sehoon Ha
...
Vikash Kumar
Henry Zhu
Abhishek Gupta
Pieter Abbeel
Sergey Levine
94
2,391
0
13 Dec 2018
Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation
Qiang Liu
Lihong Li
Ziyang Tang
Dengyong Zhou
OffRL
71
354
0
29 Oct 2018
Model-Based Reinforcement Learning via Meta-Policy Optimization
I. Clavera
Jonas Rothfuss
John Schulman
Yasuhiro Fujita
Tamim Asfour
Pieter Abbeel
59
225
0
14 Sep 2018
Human-level performance in first-person multiplayer games with population-based deep reinforcement learning
Max Jaderberg
Wojciech M. Czarnecki
Iain Dunning
Luke Marris
Guy Lever
...
Joel Z Leibo
David Silver
Demis Hassabis
Koray Kavukcuoglu
T. Graepel
OffRL
58
717
0
03 Jul 2018
Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models
Kurtland Chua
Roberto Calandra
R. McAllister
Sergey Levine
BDL
137
1,263
0
30 May 2018
Improving a Neural Semantic Parser by Counterfactual Learning from Human Bandit Feedback
Carolin (Haas) Lawrence
Stefan Riezler
OffRL
195
57
0
03 May 2018
Addressing Function Approximation Error in Actor-Critic Methods
Scott Fujimoto
H. V. Hoof
David Meger
OffRL
134
5,121
0
26 Feb 2018
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
Tuomas Haarnoja
Aurick Zhou
Pieter Abbeel
Sergey Levine
174
8,236
0
04 Jan 2018
Deep Reinforcement Learning for Sepsis Treatment
Aniruddh Raghu
Matthieu Komorowski
Imran Ahmed
Leo Anthony Celi
Peter Szolovits
Marzyeh Ghassemi
OffRL
41
172
0
27 Nov 2017
Counterfactual Learning for Machine Translation: Degeneracies and Solutions
Carolin (Haas) Lawrence
Pratik Gajane
Stefan Riezler
OffRL
CML
35
8
0
23 Nov 2017
CARLA: An Open Urban Driving Simulator
Alexey Dosovitskiy
G. Ros
Felipe Codevilla
Antonio M. López
V. Koltun
VLM
117
5,111
0
10 Nov 2017
Deep Reinforcement Learning that Matters
Peter Henderson
Riashat Islam
Philip Bachman
Joelle Pineau
Doina Precup
David Meger
OffRL
94
1,940
0
19 Sep 2017
Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning
Anusha Nagabandi
G. Kahn
R. Fearing
Sergey Levine
64
967
0
08 Aug 2017
Counterfactual Learning from Bandit Feedback under Deterministic Logging: A Case Study in Statistical Machine Translation
Carolin (Haas) Lawrence
Artem Sokolov
Stefan Riezler
OffRL
42
34
0
28 Jul 2017
Reinforcement Learning for Bandit Neural Machine Translation with Simulated Human Feedback
Khanh Nguyen
Hal Daumé
Jordan L. Boyd-Graber
46
137
0
24 Jul 2017
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
324
129,831
0
12 Jun 2017
Constrained Policy Optimization
Joshua Achiam
David Held
Aviv Tamar
Pieter Abbeel
88
1,313
0
30 May 2017
Understanding Black-box Predictions via Influence Functions
Pang Wei Koh
Percy Liang
TDI
114
2,854
0
14 Mar 2017
Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles
Balaji Lakshminarayanan
Alexander Pritzel
Charles Blundell
UQCV
BDL
451
5,748
0
05 Dec 2016
Quantile Reinforcement Learning
Hugo Gilbert
Paul Weng
OffRL
26
11
0
03 Nov 2016
Optimization Methods for Large-Scale Machine Learning
Léon Bottou
Frank E. Curtis
J. Nocedal
165
3,191
0
15 Jun 2016
OpenAI Gym
Greg Brockman
Vicki Cheung
Ludwig Pettersson
Jonas Schneider
John Schulman
Jie Tang
Wojciech Zaremba
OffRL
ODL
164
5,048
0
05 Jun 2016
Stochastic Structured Prediction under Bandit Feedback
Artem Sokolov
Julia Kreutzer
Christopher Lo
Stefan Riezler
31
30
0
02 Jun 2016
TensorFlow: A system for large-scale machine learning
Martín Abadi
P. Barham
Jianmin Chen
Zhiwen Chen
Andy Davis
...
Vijay Vasudevan
Pete Warden
Martin Wicke
Yuan Yu
Xiaoqiang Zhang
GNN
AI4CE
290
18,300
0
27 May 2016
End to End Learning for Self-Driving Cars
Mariusz Bojarski
D. Testa
Daniel Dworakowski
Bernhard Firner
B. Flepp
...
Urs Muller
Jiakai Zhang
Xin Zhang
Jake Zhao
Karol Zieba
SSL
40
4,153
0
25 Apr 2016
Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation
Tejas D. Kulkarni
Karthik Narasimhan
A. Saeedi
J. Tenenbaum
41
1,130
0
20 Apr 2016
Asynchronous Methods for Deep Reinforcement Learning
Volodymyr Mnih
Adria Puigdomenech Badia
Mehdi Mirza
Alex Graves
Timothy Lillicrap
Tim Harley
David Silver
Koray Kavukcuoglu
152
8,805
0
04 Feb 2016
Bandit Structured Prediction for Learning from Partial Feedback in Statistical Machine Translation
Artem Sokolov
Stefan Riezler
Tanguy Urvoy
27
22
0
18 Jan 2016