Text Generation by Learning from Demonstrations

16 September 2020
Richard Yuanzhe Pang
He He
OffRL
arXiv: 2009.07839

Papers citing "Text Generation by Learning from Demonstrations"

46 / 46 papers shown
On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification
Yongliang Wu
Y. Zhou
Zhou Ziheng
Yingzhe Peng
Xinyu Ye
Xinting Hu
Wenbo Zhu
Lu Qi
Ming-Hsuan Yang
Xu Yang
OffRL, LRM
310
76
0
07 Aug 2025
Asymmetric REINFORCE for off-Policy Reinforcement Learning: Balancing positive and negative rewards
Charles Arnal
Gaëtan Narozniak
Vivien A. Cabannes
Yunhao Tang
Julia Kempe
Rémi Munos
OffRL
328
19
0
25 Jun 2025
Tapered Off-Policy REINFORCE: Stable and efficient reinforcement learning for LLMs
Nicolas Le Roux
Marc G. Bellemare
Jonathan Lebensold
Arnaud Bergeron
Joshua Greaves
Alex Fréchette
Carolyne Pelletier
Eric Thibodeau-Laufer
Sándor Toth
Sam Work
OffRL
577
44
0
18 Mar 2025
Sequence-level Large Language Model Training with Contrastive Preference Optimization
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Zhili Feng
Dhananjay Ram
Cole Hawkins
Aditya Rawal
Jinman Zhao
Sheng Zha
450
2
0
23 Feb 2025
Robust Zero-Shot Text-to-Speech Synthesis with Reverse Inference Optimization
Yuchen Hu
Chen Chen
Siyin Wang
Eng Siong Chng
C. Zhang
268
15
0
02 Jul 2024
Beyond Sparse Rewards: Enhancing Reinforcement Learning with Language Model Critique in Text Generation
Meng Cao
Lei Shu
Lei Yu
Yun Zhu
Nevan Wichers
Yinxiao Liu
Lei Meng
OffRL, ALM
383
22
0
14 Jan 2024
Successor Features for Efficient Multisubject Controlled Text Generation
Mengyao Cao
Mehdi Fatemi
Jackie Chi Kit Cheung
Samira Shabanian
BDL
208
1
0
03 Nov 2023
Beyond MLE: Convex Learning for Text Generation
Neural Information Processing Systems (NeurIPS), 2023
Chenze Shao
Zhengrui Ma
Min Zhang
Yang Feng
299
4
0
26 Oct 2023
Building Persona Consistent Dialogue Agents with Offline Reinforcement Learning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Ryan Shea
Zhou Yu
OffRL
364
13
0
16 Oct 2023
EMO: Earth Mover Distance Optimization for Auto-Regressive Language Modeling
International Conference on Learning Representations (ICLR), 2023
Siyu Ren
Zhiyong Wu
Kenny Q. Zhu
441
8
0
07 Oct 2023
Language Model Decoding as Direct Metrics Optimization
International Conference on Learning Representations (ICLR), 2023
Haozhe Ji
Pei Ke
Hongning Wang
Shiyu Huang
392
8
0
02 Oct 2023
Reinforcement Learning for Generative AI: A Survey
Yuanjiang Cao
Quan Z. Sheng
Julian McAuley
Lina Yao
SyDa
595
27
0
28 Aug 2023
Prompt-Based Length Controlled Generation with Reinforcement Learning
Renlong Jie
Xiaojun Meng
Lifeng Shang
Xin Jiang
Qun Liu
443
19
0
23 Aug 2023
Reinforced Self-Training (ReST) for Language Modeling
Çağlar Gülçehre
T. Paine
S. Srinivasan
Ksenia Konyushkova
L. Weerts
...
Chenjie Gu
Wolfgang Macherey
Arnaud Doucet
Orhan Firat
Nando de Freitas
OffRL
512
423
0
17 Aug 2023
ESRL: Efficient Sampling-based Reinforcement Learning for Sequence Generation
AAAI Conference on Artificial Intelligence (AAAI), 2023
Chenglong Wang
Hang Zhou
Yimin Hu
Yi Huo
Bei Li
Tongran Liu
Tong Xiao
Jingbo Zhu
295
13
0
04 Aug 2023
On the Effectiveness of Offline RL for Dialogue Response Generation
International Conference on Machine Learning (ICML), 2023
Paloma Sodhi
Felix Wu
Ethan R. Elenberg
Kilian Q. Weinberger
Ryan T. McDonald
OffRL
242
6
0
23 Jul 2023
On the Efficacy of Sampling Adapters
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Clara Meister
Tiago Pimentel
Luca Malagutti
Ethan Gotlieb Wilcox
Robert Bamler
418
19
0
07 Jul 2023
Semi-Offline Reinforcement Learning for Optimized Text Generation
International Conference on Machine Learning (ICML), 2023
Changyu Chen
Xiting Wang
Yiqiao Jin
Victor Ye Dong
Li Dong
Jie Cao
Yi Liu
Rui Yan
OffRL
257
18
0
16 Jun 2023
MiniLLM: Knowledge Distillation of Large Language Models
International Conference on Learning Representations (ICLR), 2023
Yuxian Gu
Li Dong
Furu Wei
Shiyu Huang
ALM
735
94
0
14 Jun 2023
Preference-grounded Token-level Guidance for Language Model Fine-tuning
Neural Information Processing Systems (NeurIPS), 2023
Shentao Yang
Shujian Zhang
Congying Xia
Yihao Feng
Caiming Xiong
Mi Zhou
573
33
0
01 Jun 2023
Zero-shot Visual Question Answering with Language Model Feedback
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Yifan Du
Junyi Li
Tianyi Tang
Wayne Xin Zhao
Ji-Rong Wen
383
27
0
26 May 2023
Leftover Lunch: Advantage-based Offline Reinforcement Learning for Language Models
International Conference on Learning Representations (ICLR), 2023
Ashutosh Baheti
Ximing Lu
Faeze Brahman
Ronan Le Bras
Maarten Sap
Mark O. Riedl
465
16
0
24 May 2023
On Learning to Summarize with Large Language Models as References
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Yixin Liu
Kejian Shi
Katherine S He
Longtian Ye
Alexander R. Fabbri
Pengfei Liu
Dragomir R. Radev
Arman Cohan
ELM
568
131
0
23 May 2023
Think Outside the Code: Brainstorming Boosts Large Language Models in Code Generation
Xinyu Li
Jiang-Tian Xue
Zheng Xie
Ming Li
LRM
249
40
0
18 May 2023
Self-Edit: Fault-Aware Code Editor for Code Generation
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Kechi Zhang
Zhuo Li
Jia Li
Ge Li
Zhi Jin
625
164
0
06 May 2023
GEMINI: Controlling the Sentence-level Writing Style for Abstractive Text Summarization
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Guangsheng Bao
Zebin Ou
Yue Zhang
255
11
0
07 Apr 2023
SPEC: Summary Preference Decomposition for Low-Resource Abstractive Summarization
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Yi-Syuan Chen
Yun-Zhu Song
Hong-Han Shuai
230
6
0
24 Mar 2023
Tailoring Language Generation Models under Total Variation Distance
International Conference on Learning Representations (ICLR), 2023
Haozhe Ji
Pei Ke
Zhipeng Hu
Rongsheng Zhang
Shiyu Huang
311
29
0
26 Feb 2023
Learning with Rejection for Abstractive Text Summarization
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Mengyao Cao
Yue Dong
Jingyi He
Jackie C.K. Cheung
228
9
0
16 Feb 2023
Dynamic Scheduled Sampling with Imitation Loss for Neural Text Generation
Xiang Lin
Prathyusha Jwalapuram
Shafiq Joty
DiffM
248
0
0
31 Jan 2023
Weakly-Supervised Questions for Zero-Shot Relation Extraction
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023
Saeed Najafi
Alona Fyshe
289
11
0
21 Jan 2023
Revisiting the Gold Standard: Grounding Summarization Evaluation with Robust Human Evaluation
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Yixin Liu
Alexander R. Fabbri
Pengfei Liu
Yilun Zhao
Linyong Nan
...
Simeng Han
Shafiq Joty
Chien-Sheng Wu
Caiming Xiong
Dragomir R. Radev
ALM
403
164
0
15 Dec 2022
KRLS: Improving End-to-End Response Generation in Task Oriented Dialog with Reinforced Keywords Learning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Xiao Yu
Qingyang Wu
Kun Qian
Zhou Yu
OffRL
333
13
0
30 Nov 2022
GoSum: Extractive Summarization of Long Documents by Reinforcement Learning and Graph Organized discourse state
Knowledge and Information Systems (KAIS), 2022
Junyi Bian
Xiaodi Huang
Hong Zhou
Shanfeng Zhu
336
14
0
18 Nov 2022
Reward Gaming in Conditional Text Generation
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Richard Yuanzhe Pang
Vishakh Padmakumar
Thibault Sellam
Ankur P. Parikh
He He
457
30
0
16 Nov 2022
Teacher Forcing Recovers Reward Functions for Text Generation
Neural Information Processing Systems (NeurIPS), 2022
Yongchang Hao
Yuxin Liu
Lili Mou
OffRL
510
20
0
17 Oct 2022
Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization
Rajkumar Ramamurthy
Prithviraj Ammanabrolu
Kianté Brantley
Jack Hessel
R. Sifa
Christian Bauckhage
Hannaneh Hajishirzi
Yejin Choi
OffRL
686
289
0
03 Oct 2022
Text Summarization with Oracle Expectation
International Conference on Learning Representations (ICLR), 2022
Yumo Xu
Mirella Lapata
VLM
208
4
0
26 Sep 2022
MAD for Robust Reinforcement Learning in Machine Translation
Domenic Donato
Lei Yu
Wang Ling
Chris Dyer
MoE
284
8
0
18 Jul 2022
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone
Neural Information Processing Systems (NeurIPS), 2022
Zi-Yi Dou
Aishwarya Kamath
Zhe Gan
Pengchuan Zhang
Jianfeng Wang
...
Ce Liu
Yann LeCun
Nanyun Peng
Jianfeng Gao
Lijuan Wang
VLM, ObjD
343
159
0
15 Jun 2022
Offline RL for Natural Language Generation with Implicit Language Q Learning
International Conference on Learning Representations (ICLR), 2022
Charles Burton Snell
Ilya Kostrikov
Yi Su
Mengjiao Yang
Sergey Levine
OffRL
515
143
0
05 Jun 2022
Knowledge Infused Decoding
International Conference on Learning Representations (ICLR), 2022
Ruibo Liu
Guoqing Zheng
Shashank Gupta
Radhika Gaonkar
Chongyang Gao
Soroush Vosoughi
Milad Shokouhi
Ahmed Hassan Awadallah
KELM
269
18
0
06 Apr 2022
BRIO: Bringing Order to Abstractive Summarization
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Yixin Liu
Pengfei Liu
Dragomir R. Radev
Graham Neubig
465
325
0
31 Mar 2022
Amortized Noisy Channel Neural Machine Translation
Richard Yuanzhe Pang
He He
Dong Wang
255
5
0
16 Dec 2021
Improving Scheduled Sampling with Elastic Weight Consolidation for Neural Machine Translation
Michalis Korakakis
Andreas Vlachos
CLL
267
3
0
13 Sep 2021
AgreeSum: Agreement-Oriented Multi-Document Summarization
Findings (Findings), 2021
Richard Yuanzhe Pang
Á. Lelkes
Vinh Q. Tran
Cong Yu
256
18
0
04 Jun 2021