ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2305.16621
  4. Cited By
A Reminder of its Brittleness: Language Reward Shaping May Hinder
  Learning for Instruction Following Agents

A Reminder of its Brittleness: Language Reward Shaping May Hinder Learning for Instruction Following Agents

26 May 2023
Sukai Huang
N. Lipovetzky
Trevor Cohn
ArXivPDFHTML

Papers citing "A Reminder of its Brittleness: Language Reward Shaping May Hinder Learning for Instruction Following Agents"

12 / 12 papers shown
Title
Learning Communication Policies for Different Follower Behaviors in a
  Collaborative Reference Game
Learning Communication Policies for Different Follower Behaviors in a Collaborative Reference Game
P. Sadler
Sherzod Hakimov
David Schlangen
21
1
0
07 Feb 2024
FoMo Rewards: Can we cast foundation models as reward functions?
FoMo Rewards: Can we cast foundation models as reward functions?
Ekdeep Singh Lubana
Johann Brehmer
P. D. Haan
Taco S. Cohen
OffRL
LRM
33
2
0
06 Dec 2023
DANLI: Deliberative Agent for Following Natural Language Instructions
DANLI: Deliberative Agent for Following Natural Language Instructions
Yichi Zhang
Jianing Yang
Jiayi Pan
Shane Storks
N. Devraj
Ziqiao Ma
Keunwoo Peter Yu
Yuwei Bao
J. Chai
LM&Ro
48
16
0
22 Oct 2022
Masked World Models for Visual Control
Masked World Models for Visual Control
Younggyo Seo
Danijar Hafner
Hao Liu
Fangchen Liu
Stephen James
Kimin Lee
Pieter Abbeel
OffRL
77
145
0
28 Jun 2022
Efficient Reward Poisoning Attacks on Online Deep Reinforcement Learning
Efficient Reward Poisoning Attacks on Online Deep Reinforcement Learning
Yinglun Xu
Qi Zeng
Gagandeep Singh
AAML
25
5
0
30 May 2022
Housekeep: Tidying Virtual Households using Commonsense Reasoning
Housekeep: Tidying Virtual Households using Commonsense Reasoning
Yash Kant
Arun Ramachandran
Sriram Yenamandra
Igor Gilitschenski
Dhruv Batra
Andrew Szot
Harsh Agrawal
LM&Ro
LRM
152
71
0
22 May 2022
Planning with Diffusion for Flexible Behavior Synthesis
Planning with Diffusion for Flexible Behavior Synthesis
Michael Janner
Yilun Du
J. Tenenbaum
Sergey Levine
DiffM
202
626
0
20 May 2022
Training language models to follow instructions with human feedback
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
303
11,881
0
04 Mar 2022
Compositionality as Lexical Symmetry
Compositionality as Lexical Symmetry
Ekin Akyürek
Jacob Andreas
CoGe
42
8
0
30 Jan 2022
BACKDOORL: Backdoor Attack against Competitive Reinforcement Learning
BACKDOORL: Backdoor Attack against Competitive Reinforcement Learning
Lun Wang
Zaynah Javed
Xian Wu
Wenbo Guo
Xinyu Xing
D. Song
AAML
119
80
0
02 May 2021
Out of Order: How Important Is The Sequential Order of Words in a
  Sentence in Natural Language Understanding Tasks?
Out of Order: How Important Is The Sequential Order of Words in a Sentence in Natural Language Understanding Tasks?
Thang M. Pham
Trung Bui
Long Mai
Anh Totti Nguyen
207
122
0
30 Dec 2020
Challenges and Countermeasures for Adversarial Attacks on Deep
  Reinforcement Learning
Challenges and Countermeasures for Adversarial Attacks on Deep Reinforcement Learning
Inaam Ilahi
Muhammad Usama
Junaid Qadir
M. Janjua
Ala I. Al-Fuqaha
D. Hoang
Dusit Niyato
AAML
55
129
0
27 Jan 2020
1