ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

Proximal Policy Optimization Algorithms
arXiv:1707.06347

20 July 2017
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
    OffRL

Papers citing "Proximal Policy Optimization Algorithms"

50 / 7,062 papers shown
Generalization in Monitored Markov Decision Processes (Mon-MDPs)
Montaser Mohammedalamen
Michael Bowling
34
0
0
13 May 2025
Reinforcement Learning-based Fault-Tolerant Control for Quadrotor with Online Transformer Adaptation
Dohyun Kim
Jayden Dongwoo Lee
Hyochoong Bang
Jungho Bae
41
0
0
13 May 2025
Reinforcement Learning meets Masked Video Modeling : Trajectory-Guided Adaptive Token Selection
Ayush K. Rai
Kyle Min
Tarun Krishna
Feiyan Hu
Alan F. Smeaton
Noel E. O'Connor
VGen
36
0
0
13 May 2025
Deep reinforcement learning-based longitudinal control strategy for automated vehicles at signalised intersections
Pankaj Kumar
Aditya Mishra
Pranamesh Chakraborty
Subrahmanya Swamy Peruru
37
0
0
13 May 2025
HuB: Learning Extreme Humanoid Balance
Tong Zhang
Boyuan Zheng
Ruiqian Nai
Yingdong Hu
Yen-Jen Wang
...
Fanqi Lin
Jiongye Li
Chuye Hong
Koushil Sreenath
Yang Gao
33
0
0
12 May 2025
Measuring General Intelligence with Generated Games
Vivek Verma
David Huang
William Chen
Dan Klein
Nicholas Tomlin
ReLM
ELM
LM&MA
LRM
61
1
0
12 May 2025
H$^{\mathbf{3}}$DP: Triply-Hierarchical Diffusion Policy for Visuomotor Learning
Yiyang Lu
Yufeng Tian
Zhecheng Yuan
Xinyu Wang
Pu Hua
Zhengrong Xue
Huazhe Xu
36
0
0
12 May 2025
Assessing and Mitigating Medical Knowledge Drift and Conflicts in Large Language Models
Weiyi Wu
Xinwen Xu
Chongyang Gao
Xingjian Diao
Siting Li
Lucas A. Salas
Jiang Gui
28
0
0
12 May 2025
Direct Density Ratio Optimization: A Statistically Consistent Approach to Aligning Large Language Models
Rei Higuchi
Taiji Suzuki
36
0
0
12 May 2025
Reinforced Internal-External Knowledge Synergistic Reasoning for Efficient Adaptive Search Agent
Ziyang Huang
Xiaowei Yuan
Yiming Ju
Jun Zhao
Kang Liu
RALM
KELM
33
1
0
12 May 2025
Average-Reward Maximum Entropy Reinforcement Learning for Global Policy in Double Pendulum Tasks
Jean Seong Bjorn Choe
Bumkyu Choi
Jong-kook Kim
31
0
0
12 May 2025
SEM: Reinforcement Learning for Search-Efficient Large Language Models
Zeyang Sha
Shiwen Cui
Weiqiang Wang
KELM
OffRL
LRM
36
0
0
12 May 2025
A Multi-Dimensional Constraint Framework for Evaluating and Improving Instruction Following in Large Language Models
Junjie Ye
Caishuang Huang
Zhaoyu Chen
Wenjie Fu
Chenyuan Yang
...
Tao Gui
Qi Zhang
Zhongchao Shi
Jianping Fan
Xuanjing Huang
ALM
54
0
0
12 May 2025
Must Read: A Systematic Survey of Computational Persuasion
Nimet Beyza Bozdag
Shuhaib Mehri
Xiaocheng Yang
Hyeonjeong Ha
Zirui Cheng
Esin Durmus
Jiaxuan You
Heng Ji
Gokhan Tur
Dilek Hakkani-Tur
60
0
0
12 May 2025
Agent RL Scaling Law: Agent RL with Spontaneous Code Execution for Mathematical Problem Solving
Xinji Mai
Haotian Xu
X. Wu
Weinong Wang
Yingying Zhang
Wenqiang Zhang
ReLM
LRM
51
0
0
12 May 2025
DanceGRPO: Unleashing GRPO on Visual Generation
Zeyue Xue
Jie Wu
Yu Gao
Fangyuan Kong
Lingting Zhu
...
Zhiheng Liu
Wei Liu
Qiushan Guo
Weilin Huang
Ping Luo
EGVM
VGen
57
1
0
12 May 2025
FACET: Force-Adaptive Control via Impedance Reference Tracking for Legged Robots
Botian Xu
Haoyang Weng
Qingzhou Lu
Yang Gao
Huazhe Xu
34
0
0
11 May 2025
X-Sim: Cross-Embodiment Learning via Real-to-Sim-to-Real
Prithwish Dan
Kushal Kedia
Angela Chao
Edward Weiyi Duan
Maximus Adrian Pace
Wei-Chiu Ma
Sanjiban Choudhury
34
0
0
11 May 2025
Towards Human-Centric Autonomous Driving: A Fast-Slow Architecture Integrating Large Language Model Guidance with Reinforcement Learning
Chengkai Xu
Jiaqi Liu
Yicheng Guo
Yanzhe Zhang
Peng Hang
Jian Sun
36
0
0
11 May 2025
LineFlow: A Framework to Learn Active Control of Production Lines
Kai Müller
Martin Wenzel
Tobias Windisch
AI4CE
26
0
0
10 May 2025
FALCON: Learning Force-Adaptive Humanoid Loco-Manipulation
Yujie Zhang
Yifu Yuan
Prajwal Gurunath
Tairan He
Shayegan Omidshafiei
Ali-akbar Agha-mohammadi
Marcell Vazquez-Chanlatte
Liam Pedersen
Guanya Shi
38
0
0
10 May 2025
References Indeed Matter? Reference-Free Preference Optimization for Conversational Query Reformulation
Doyoung Kim
Youngjun Lee
Joeun Kim
Jihwan Bang
Hwanjun Song
Susik Yoon
Jae-Gil Lee
38
0
0
10 May 2025
JAEGER: Dual-Level Humanoid Whole-Body Controller
Ziluo Ding
Haobin Jiang
Yuxuan Wang
Zhenguo Sun
Yu Zhang
Xiaojie Niu
M. Yang
Weishuai Zeng
Xinrun Xu
Zongqing Lu
36
0
0
10 May 2025
Video-Enhanced Offline Reinforcement Learning: A Model-Based Approach
Minting Pan
Yitao Zheng
Jiajian Li
Yunbo Wang
Xiaokang Yang
OffRL
53
0
0
10 May 2025
REFINE-AF: A Task-Agnostic Framework to Align Language Models via Self-Generated Instructions using Reinforcement Learning from Automated Feedback
Aniruddha Roy
Pretam Ray
Abhilash Nandy
Somak Aditya
Pawan Goyal
ALM
34
0
0
10 May 2025
DAPPER: Discriminability-Aware Policy-to-Policy Preference-Based Reinforcement Learning for Query-Efficient Robot Skill Acquisition
Yuki Kadokawa
Jonas Frey
Takahiro Miki
Takamitsu Matsubara
Marco Hutter
36
0
0
09 May 2025
Let Humanoids Hike! Integrative Skill Development on Complex Trails
Kwan-Yee Lin
Stella X. Yu
41
0
0
09 May 2025
Towards Developmentally Plausible Rewards: Communicative Success as a Learning Signal for Interactive Language Models
Lennart Stöpler
Rufat Asadli
Mitja Nikolaus
Ryan Cotterell
Alex Warstadt
LRM
54
0
0
09 May 2025
VIN-NBV: A View Introspection Network for Next-Best-View Selection for Resource-Efficient 3D Reconstruction
Noah Frahm
Dongxu Zhao
Andrea Dunn Beltran
Ron Alterovitz
Jan-Michael Frahm
Junier Oliva
Roni Sengupta
265
0
0
09 May 2025
Learn to Think: Bootstrapping LLM Reasoning Capability Through Graph Representation Learning
Hang Gao
Chenhao Zhang
Tie Wang
Junsuo Zhao
Fengge Wu
Changwen Zheng
Huaping Liu
LRM
39
0
0
09 May 2025
Flow-GRPO: Training Flow Matching Models via Online RL
Jie Liu
Gongye Liu
Jiajun Liang
Yongqian Li
Jiaheng Liu
Xinyu Wang
Pengfei Wan
Di Zhang
Wanli Ouyang
AI4CE
78
0
0
08 May 2025
Multi-agent Embodied AI: Advances and Future Directions
Zhaohan Feng
Ruiqi Xue
Lei Yuan
Yang Yu
Ning Ding
M. Liu
Bingzhao Gao
Jian Sun
Gang Wang
AI4CE
63
1
0
08 May 2025
GFlowNets for Active Learning Based Resource Allocation in Next Generation Wireless Networks
Charbel Bou Chaaya
M. Bennis
50
0
0
08 May 2025
Morphologically Symmetric Reinforcement Learning for Ambidextrous Bimanual Manipulation
Zechu Li
Yufeng Jin
Daniel Felipe Ordoñez Apraez
Claudio Semini
Puze Liu
Georgia Chalvatzaki
264
0
0
08 May 2025
Scalable Chain of Thoughts via Elastic Reasoning
Yuhui Xu
Hanze Dong
Lei Wang
Doyen Sahoo
Junnan Li
Caiming Xiong
OffRL
LRM
62
3
0
08 May 2025
Reinforcement Learning for Game-Theoretic Resource Allocation on Graphs
Zijian An
Lifeng Zhou
36
0
0
08 May 2025
Latent Preference Coding: Aligning Large Language Models via Discrete Latent Codes
Zhuocheng Gong
Jian Guan
Wei Wu
Huishuai Zhang
Dongyan Zhao
72
1
0
08 May 2025
LLAMAPIE: Proactive In-Ear Conversation Assistants
Tuochao Chen
Nicholas Batchelder
Alisa Liu
Noah A. Smith
Shyamnath Gollakota
235
0
0
07 May 2025
Optimization of Infectious Disease Intervention Measures Based on Reinforcement Learning - Empirical analysis based on UK COVID-19 epidemic data
Baida Zhang
Yakai Chen
Huichun Li
Zhenghu Zu
34
0
0
07 May 2025
EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning
Zhenghao Xing
Xiaowei Hu
Chi-Wing Fu
Wei Wang
Jifeng Dai
Pheng-Ann Heng
MLLM
OffRL
VLM
LRM
60
0
0
07 May 2025
ZeroSearch: Incentivize the Search Capability of LLMs without Searching
Hao Sun
Zile Qiao
Jiayan Guo
Xuanbo Fan
Yingyan Hou
Yong Jiang
Pengjun Xie
Yan Zhang
Fei Huang
Jingren Zhou
OffRL
71
5
0
07 May 2025
ARDNS-FN-Quantum: A Quantum-Enhanced Reinforcement Learning Framework with Cognitive-Inspired Adaptive Exploration for Dynamic Environments
Umberto Gonçalves de Sousa
29
0
0
07 May 2025
Fight Fire with Fire: Defending Against Malicious RL Fine-Tuning via Reward Neutralization
Wenjun Cao
AAML
47
0
0
07 May 2025
Large Language Models are Autonomous Cyber Defenders
Sebastián R. Castro
Roberto Campbell
Nancy Lau
Octavio Villalobos
Jiaqi Duan
Alvaro A. Cardenas
LLMAG
50
0
0
07 May 2025
On-Device LLM for Context-Aware Wi-Fi Roaming
Ju-Hyung Lee
Yanqing Lu
Klaus Doppler
35
0
0
07 May 2025
DYSTIL: Dynamic Strategy Induction with Large Language Models for Reinforcement Learning
Borui Wang
Kathleen McKeown
Rex Ying
OffRL
44
0
0
06 May 2025
Visual Imitation Enables Contextual Humanoid Control
Arthur Allshire
Hongsuk Choi
Junyi Zhang
David McAllister
Anthony Zhang
Chung Min Kim
Trevor Darrell
Pieter Abbeel
Jitendra Malik
Angjoo Kanazawa
LM&Ro
245
0
0
06 May 2025
Sustainable Smart Farm Networks: Enhancing Resilience and Efficiency with Decision Theory-Guided Deep Reinforcement Learning
Dian Chen
Zelin Wan
D. Ha
Jin-Hee Cho
22
0
0
06 May 2025
Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning
Yibin Wang
Zhimin Li
Yuhang Zang
Chunyu Wang
Qinglin Lu
Cheng Jin
Jinqiao Wang
LRM
53
2
0
06 May 2025
RIFT: Closed-Loop RL Fine-Tuning for Realistic and Controllable Traffic Simulation
Keyu Chen
Wenchao Sun
Hao Cheng
Sifa Zheng
52
0
0
06 May 2025