ResearchTrend.AI
You May Not Need Ratio Clipping in PPO

31 January 2022
Mingfei Sun
Vitaly Kurin
Guoqing Liu
Sam Devlin
Tao Qin
Katja Hofmann
Shimon Whiteson

Papers citing "You May Not Need Ratio Clipping in PPO"

13 / 13 papers shown
It's Not You, It's Clipping: A Soft Trust-Region via Probability Smoothing for LLM RL
Madeleine Dwyer
Adam Sobey
Adriane Chapman
73
0
0
25 Sep 2025
Improving Value Estimation Critically Enhances Vanilla Policy Gradient
Tao Wang
Ruipeng Zhang
Sicun Gao
OffRL
196
2
0
25 May 2025
No Representation, No Trust: Connecting Representation, Collapse, and Trust Issues in PPO
Skander Moalla
Andrea Miele
Razvan Pascanu
Çağlar Gülçehre
317
17
0
01 May 2024
RL-X: A Deep Reinforcement Learning Library (not only) for RoboCup
Nico Bohlinger
Klaus Dorer
188
5
0
20 Oct 2023
Absolute Policy Optimization
Weiye Zhao
Feihan Li
Yifan Sun
Rui Chen
Tianhao Wei
Changliu Liu
430
5
0
20 Oct 2023
Universal Morphology Control via Contextual Modulation
International Conference on Machine Learning (ICML), 2023
Zheng Xiong
Jacob Beck
Shimon Whiteson
330
23
0
22 Feb 2023
Trust-Region-Free Policy Optimization for Stochastic Policies
Mingfei Sun
Benjamin Ellis
Anuj Mahajan
Sam Devlin
Katja Hofmann
Shimon Whiteson
236
3
0
15 Feb 2023
Sample Dropout: A Simple yet Effective Variance Reduction Technique in Deep Policy Optimization
Zichuan Lin
Xiapeng Wu
Mingfei Sun
Deheng Ye
Qiang Fu
Wei Yang
Wei Liu
224
3
0
05 Feb 2023
Revisiting Estimation Bias in Policy Gradients for Deep Reinforcement Learning
Haoxuan Pan
Deheng Ye
Xiaoming Duan
Qiang Fu
Wei Yang
Jianping He
Mingfei Sun
OffRL
207
2
0
20 Jan 2023
SMACv2: An Improved Benchmark for Cooperative Multi-Agent Reinforcement Learning
Neural Information Processing Systems (NeurIPS), 2022
Benjamin Ellis
Jonathan Cook
S. Moalla
Mikayel Samvelyan
Mingfei Sun
Anuj Mahajan
Jakob N. Foerster
Shimon Whiteson
441
132
0
14 Dec 2022
Inspector: Pixel-Based Automated Game Testing via Exploration, Detection, and Investigation
Guoqing Liu
Mengzhang Cai
Li Zhao
Tao Qin
Adrian Brown
Jimmy Bischoff
Tie-Yan Liu
196
12
0
18 Jul 2022
The Sufficiency of Off-Policyness and Soft Clipping: PPO is still Insufficient according to an Off-Policy Measure
AAAI Conference on Artificial Intelligence (AAAI), 2022
Xing Chen
Dongcui Diao
Hechang Chen
Hengshuai Yao
Haiyin Piao
Zhixiao Sun
Zhiwei Yang
Randy Goebel
Bei Jiang
Yi-Ju Chang
OffRL
414
23
0
20 May 2022
Trust Region Bounds for Decentralized PPO Under Non-stationarity
Adaptive Agents and Multi-Agent Systems (AAMAS), 2022
Mingfei Sun
Sam Devlin
Jacob Beck
Katja Hofmann
Shimon Whiteson
308
13
0
31 Jan 2022