ResearchTrend.AI

ReLOAD: Reinforcement Learning with Optimistic Ascent-Descent for Last-Iterate Convergence in Constrained MDPs

2 February 2023
Theodore H. Moskovitz
Brendan O'Donoghue
Vivek Veeriah
Sebastian Flennerhag
Satinder Singh
Tom Zahavy

Papers citing "ReLOAD: Reinforcement Learning with Optimistic Ascent-Descent for Last-Iterate Convergence in Constrained MDPs"

4 / 4 papers shown
Title
Near-Optimal Policy Identification in Robust Constrained Markov Decision Processes via Epigraph Form
Toshinori Kitamura
Tadashi Kozuno
Wataru Kumagai
Kenta Hoshino
Y. Hosoe
Kazumi Kasaura
Masashi Hamaya
Paavo Parmas
Yutaka Matsuo
72
0
0
29 Aug 2024
One-Shot Safety Alignment for Large Language Models via Optimal Dualization
Xinmeng Huang
Shuo Li
Edgar Dobriban
Osbert Bastani
Hamed Hassani
Dongsheng Ding
47
3
0
29 May 2024
A Policy Gradient Primal-Dual Algorithm for Constrained MDPs with Uniform PAC Guarantees
Toshinori Kitamura
Tadashi Kozuno
Masahiro Kato
Yuki Ichihara
Soichiro Nishimori
Akiyoshi Sannai
Sho Sonoda
Wataru Kumagai
Yutaka Matsuo
37
2
0
31 Jan 2024
Confronting Reward Model Overoptimization with Constrained RLHF
Ted Moskovitz
Aaditya K. Singh
DJ Strouse
T. Sandholm
Ruslan Salakhutdinov
Anca D. Dragan
Stephen Marcus McAleer
34
47
0
06 Oct 2023