Reinforcing Language Agents via Policy Optimization with Action Decomposition

23 May 2024
Muning Wen
Ziyu Wan
Weinan Zhang
Jun Wang
Ying Wen
arXiv: 2405.15821

Papers citing "Reinforcing Language Agents via Policy Optimization with Action Decomposition"

10 / 10 papers shown
Towards Efficient Online Tuning of VLM Agents via Counterfactual Soft Reinforcement Learning
Lang Feng
Weihao Tan
Zhiyi Lyu
Longtao Zheng
Haiyang Xu
M. Yan
Fei Huang
Bo An
10
0
0
01 May 2025
SAC-GLAM: Improving Online RL for LLM agents with Soft Actor-Critic and Hindsight Relabeling
Loris Gaven
Clément Romac
Thomas Carta
Sylvain Lamprier
Olivier Sigaud
Pierre-Yves Oudeyer
LLMAG
OffRL
20
1
0
16 Oct 2024
Agentic Information Retrieval
Weinan Zhang
Junwei Liao
Ning Li
Kounianhua Du
Jianghao Lin
AIFin
41
2
0
13 Oct 2024
Hammer: Robust Function-Calling for On-Device Language Models via Function Masking
Qiqiang Lin
Muning Wen
Qiuying Peng
Guanyu Nie
Junwei Liao
...
Jiamu Zhou
Cheng Cheng
Yin Zhao
Jun Wang
Weinan Zhang
27
15
0
06 Oct 2024
Self-Rewarding Language Models
Weizhe Yuan
Richard Yuanzhe Pang
Kyunghyun Cho
Xian Li
Sainbayar Sukhbaatar
Jing Xu
Jason Weston
ReLM
SyDa
ALM
LRM
215
291
0
18 Jan 2024
Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions
Yevgen Chebotar
Q. Vuong
A. Irpan
Karol Hausman
F. Xia
...
Brianna Zitkovich
Tomas Jackson
Kanishka Rao
Chelsea Finn
Sergey Levine
OffRL
104
81
0
18 Sep 2023
Tackling Cooperative Incompatibility for Zero-Shot Human-AI Coordination
Yang Li
Shao Zhang
Jichen Sun
Wenhao Zhang
Yali Du
Ying Wen
Xinbing Wang
Wei Pan
11
13
0
05 Jun 2023
ReAct: Synergizing Reasoning and Acting in Language Models
Shunyu Yao
Jeffrey Zhao
Dian Yu
Nan Du
Izhak Shafran
Karthik Narasimhan
Yuan Cao
LLMAG
ReLM
LRM
208
2,413
0
06 Oct 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
315
8,261
0
28 Jan 2022