ResearchTrend.AI

Sequence Tutor: Conservative Fine-Tuning of Sequence Generation Models with KL-control

9 November 2016
Natasha Jaques
S. Gu
Dzmitry Bahdanau
José Miguel Hernández-Lobato
Richard E. Turner
Douglas Eck

Papers citing "Sequence Tutor: Conservative Fine-Tuning of Sequence Generation Models with KL-control"

33 / 33 papers shown
Lightweight and Direct Document Relevance Optimization for Generative Information Retrieval
Kidist Amde Mekonnen
Yubao Tang
Maarten de Rijke
60
0
0
07 Apr 2025
Prompt Optimization with Logged Bandit Data
Haruka Kiyohara
Daniel Yiming Cao
Yuta Saito
Thorsten Joachims
64
0
0
03 Apr 2025
Can RLHF be More Efficient with Imperfect Reward Models? A Policy Coverage Perspective
Jiawei Huang
Bingcong Li
Christoph Dann
Niao He
OffRL
80
0
0
26 Feb 2025
Contrastive Policy Gradient: Aligning LLMs on sequence-level scores in a supervised-friendly fashion
Yannis Flet-Berliac
Nathan Grinsztajn
Florian Strub
Bill Wu
Eugene Choi
...
Arash Ahmadian
Yash Chandak
M. G. Azar
Olivier Pietquin
Matthieu Geist
OffRL
62
4
0
17 Jan 2025
GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-time Alignment
Yuancheng Xu
Udari Madhushani Sehwag
Alec Koppel
Sicheng Zhu
Bang An
Furong Huang
Sumitra Ganesh
55
6
0
10 Oct 2024
RL, but don't do anything I wouldn't do
Michael K. Cohen
Marcus Hutter
Yoshua Bengio
Stuart J. Russell
OffRL
33
2
0
08 Oct 2024
Moral Alignment for LLM Agents
Elizaveta Tennant
Stephen Hailes
Mirco Musolesi
45
0
0
02 Oct 2024
WARP: On the Benefits of Weight Averaged Rewarded Policies
Alexandre Ramé
Johan Ferret
Nino Vieillard
Robert Dadashi
Léonard Hussenot
Pierre-Louis Cedoz
Pier Giuseppe Sessa
Sertan Girgin
Arthur Douillard
Olivier Bachem
56
14
0
24 Jun 2024
Mitigating Open-Vocabulary Caption Hallucinations
Assaf Ben-Kish
Moran Yanuka
Morris Alper
Raja Giryes
Hadar Averbuch-Elor
MLLM
VLM
20
6
0
06 Dec 2023
Reinforcement Learning for Generative AI: A Survey
Yuanjiang Cao
Quan Z. Sheng
Julian McAuley
Lina Yao
SyDa
46
10
0
28 Aug 2023
Offline RL for Natural Language Generation with Implicit Language Q Learning
Charles Burton Snell
Ilya Kostrikov
Yi Su
Mengjiao Yang
Sergey Levine
OffRL
125
101
0
05 Jun 2022
On Reinforcement Learning and Distribution Matching for Fine-Tuning Language Models with no Catastrophic Forgetting
Tomasz Korbak
Hady ElSahar
Germán Kruszewski
Marc Dymetman
CLL
17
50
0
01 Jun 2022
X2T: Training an X-to-Text Typing Interface with Online Learning from User Feedback
Jensen Gao
S. Reddy
Glen Berseth
Nicholas Hardy
N. Natraj
K. Ganguly
Anca Dragan
Sergey Levine
15
10
0
04 Mar 2022
Offline Reinforcement Learning with Fisher Divergence Critic Regularization
Ilya Kostrikov
Jonathan Tompson
Rob Fergus
Ofir Nachum
OffRL
27
300
0
14 Mar 2021
Learning to summarize from human feedback
Nisan Stiennon
Long Ouyang
Jeff Wu
Daniel M. Ziegler
Ryan J. Lowe
Chelsea Voss
Alec Radford
Dario Amodei
Paul Christiano
ALM
14
1,966
0
02 Sep 2020
Keep Doing What Worked: Behavioral Modelling Priors for Offline Reinforcement Learning
Noah Y. Siegel
Jost Tobias Springenberg
Felix Berkenkamp
A. Abdolmaleki
Michael Neunert
Thomas Lampe
Roland Hafner
Nicolas Heess
Martin Riedmiller
OffRL
14
282
0
19 Feb 2020
RL-Duet: Online Music Accompaniment Generation Using Deep Reinforcement Learning
Nan Jiang
Sheng Jin
Z. Duan
Changshui Zhang
OffRL
45
49
0
08 Feb 2020
Contrastive Multi-document Question Generation
W. Cho
Yizhe Zhang
Sudha Rao
Asli Celikyilmaz
Chenyan Xiong
Jianfeng Gao
Mengdi Wang
Bill Dolan
SyDa
17
28
0
08 Nov 2019
Benchmarking Batch Deep Reinforcement Learning Algorithms
Shih-Han Chou
Wen-Yen Chang
W. Hsu
Jianlong Fu
OffRL
11
181
0
03 Oct 2019
Deep Reinforcement Learning For Modeling Chit-Chat Dialog With Discrete Attributes
Chinnadhurai Sankar
Sujith Ravi
OffRL
21
33
0
05 Jul 2019
LiveSketch: Query Perturbations for Guided Sketch-based Visual Search
John Collomosse
Tu Bui
Hailin Jin
19
56
0
14 Apr 2019
Molecular Sets (MOSES): A Benchmarking Platform for Molecular Generation Models
Daniil Polykovskiy
Alexander Zhebrak
Benjamín Sánchez-Lengeling
Sergey Golovanov
Oktai Tatanov
...
Simon Johansson
Hongming Chen
Sergey I. Nikolenko
Alán Aspuru-Guzik
Alex Zhavoronkov
ELM
191
633
0
29 Nov 2018
GuacaMol: Benchmarking Models for De Novo Molecular Design
Nathan Brown
Marco Fiscato
Marwin H. S. Segler
Alain C. Vaucher
ELM
36
691
0
22 Nov 2018
Harmonic Recomposition using Conditional Autoregressive Modeling
Kyle Kastner
Rithesh Kumar
Tim Cooijmans
Aaron Courville
22
0
0
18 Nov 2018
Latent Molecular Optimization for Targeted Therapeutic Design
Tristan Aumentado-Armstrong
13
41
0
05 Sep 2018
DeepJ: Style-Specific Music Generation
H. H. Mao
Taylor Shin
G. Cottrell
22
92
0
03 Jan 2018
Music Generation by Deep Learning - Challenges and Directions
Jean-Pierre Briot
F. Pachet
MGen
35
126
0
09 Dec 2017
Deep Reinforcement Learning for De-Novo Drug Design
Mariya Popova
Olexandr Isayev
Alexander Tropsha
14
1,003
0
29 Nov 2017
MuseGAN: Multi-track Sequential Generative Adversarial Networks for Symbolic Music Generation and Accompaniment
Hao-Wen Dong
Wen-Yi Hsiao
Li-Chia Yang
Yi-Hsuan Yang
MGen
GAN
20
535
0
19 Sep 2017
Constrained Bayesian Optimization for Automatic Chemical Design
Ryan-Rhys Griffiths
José Miguel Hernández-Lobato
BDL
39
76
0
16 Sep 2017
Deep Learning Techniques for Music Generation -- A Survey
Jean-Pierre Briot
Gaëtan Hadjeres
F. Pachet
MGen
32
297
0
05 Sep 2017
Automatic chemical design using a data-driven continuous representation of molecules
Rafael Gómez-Bombarelli
Jennifer N. Wei
D. Duvenaud
José Miguel Hernández-Lobato
Benjamín Sánchez-Lengeling
Dennis Sheberla
J. Aguilera-Iparraguirre
Timothy D. Hirzel
Ryan P. Adams
Alán Aspuru-Guzik
3DV
17
2,885
0
07 Oct 2016
Deep Reinforcement Learning for Dialogue Generation
Jiwei Li
Will Monroe
Alan Ritter
Michel Galley
Jianfeng Gao
Dan Jurafsky
214
1,326
0
05 Jun 2016