Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization

3 October 2022
Rajkumar Ramamurthy, Prithviraj Ammanabrolu, Kianté Brantley, Jack Hessel, R. Sifa, Christian Bauckhage, Hannaneh Hajishirzi, Yejin Choi
OffRL

Papers citing "Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization"

50 / 202 papers shown

The RL/LLM Taxonomy Tree: Reviewing Synergies Between Reinforcement Learning and Large Language Models
M. Pternea, Prerna Singh, Abir Chakraborty, Y. Oruganti, M. Milletarí, Sayli Bapat, Kebei Jiang
OffRL
02 Feb 2024

Plan-Grounded Large Language Models for Dual Goal Conversational Settings
Diogo Glória-Silva, Rafael Ferreira, Diogo Tavares, David Semedo, João Magalhães
LLMAG
01 Feb 2024

Dense Reward for Free in Reinforcement Learning from Human Feedback
Alex J. Chan, Hao Sun, Samuel Holt, M. Schaar
01 Feb 2024

Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF
Banghua Zhu, Michael I. Jordan, Jiantao Jiao
29 Jan 2024

SEER: Facilitating Structured Reasoning and Explanation via Reinforcement Learning
Guoxin Chen, Kexin Tang, Chao Yang, Fuying Ye, Yu Qiao, Yiming Qian
LRM
24 Jan 2024

Finding a Needle in the Adversarial Haystack: A Targeted Paraphrasing Approach For Uncovering Edge Cases with Minimal Distribution Distortion
Aly M. Kassem, Sherif Saad
AAML
21 Jan 2024

Aligning Large Language Models with Counterfactual DPO
Bradley Butcher
ALM
17 Jan 2024

Bootstrapping LLM-based Task-Oriented Dialogue Agents via Self-Talk
Dennis Ulmer, Elman Mansimov, Kaixiang Lin, Justin Sun, Xibin Gao, Yi Zhang
LLMAG
10 Jan 2024

Uncertainty-Penalized Reinforcement Learning from Human Feedback with Diverse Reward LoRA Ensembles
Yuanzhao Zhai, Han Zhang, Yu Lei, Yue Yu, Kele Xu, Dawei Feng, Bo Ding, Huaimin Wang
AI4CE
30 Dec 2023

Preference as Reward, Maximum Preference Optimization with Importance Sampling
Zaifan Jiang, Xing Huang, Chao Wei
27 Dec 2023

Aligning Large Language Models with Human Preferences through Representation Engineering
Wenhao Liu, Xiaohua Wang, Muling Wu, Tianlong Li, Changze Lv, Zixuan Ling, Jianhao Zhu, Cenyuan Zhang, Xiaoqing Zheng, Xuanjing Huang
26 Dec 2023

Reasons to Reject? Aligning Language Models with Judgments
Weiwen Xu, Deng Cai, Zhisong Zhang, Wai Lam, Shuming Shi
ALM
22 Dec 2023

OpenRL: A Unified Reinforcement Learning Framework
Shiyu Huang, Wentse Chen, Yiwen Sun, Fuqing Bie, Weijuan Tu
20 Dec 2023

Language Model Alignment with Elastic Reset
Michael Noukhovitch, Samuel Lavoie, Florian Strub, Aaron Courville
KELM
06 Dec 2023

Axiomatic Preference Modeling for Longform Question Answering
Corby Rosset, Guoqing Zheng, Victor C. Dibia, Ahmed Hassan Awadallah, Paul Bennett
SyDa
02 Dec 2023

LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models
Marwa Abdulhai, Isadora White, Charles Burton Snell, Charles Sun, Joey Hong, Yuexiang Zhai, Kelvin Xu, Sergey Levine
LLMAG, OffRL, LRM
30 Nov 2023

Rescue: Ranking LLM Responses with Partial Ordering to Improve Response Generation
Yikun Wang, Rui Zheng, Haoming Li, Qi Zhang, Tao Gui, Fei Liu
OffRL
15 Nov 2023

Value FULCRA: Mapping Large Language Models to the Multidimensional Spectrum of Basic Human Values
Jing Yao, Xiaoyuan Yi, Xiting Wang, Yifan Gong, Xing Xie
15 Nov 2023

Fine-tuning Language Models for Factuality
Katherine Tian, Eric Mitchell, Huaxiu Yao, Christopher D. Manning, Chelsea Finn
KELM, HILM, SyDa
14 Nov 2023

Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations
Joey Hong, Sergey Levine, Anca Dragan
OffRL, LLMAG
09 Nov 2023

Beyond Imitation: Leveraging Fine-grained Quality Signals for Alignment
Geyang Guo, Ranchi Zhao, Tianyi Tang, Wayne Xin Zhao, Ji-Rong Wen
ALM
07 Nov 2023

Implicit Chain of Thought Reasoning via Knowledge Distillation
Yuntian Deng, Kiran Prasad, Roland Fernandez, P. Smolensky, Vishrav Chaudhary, Stuart M. Shieber
ReLM, LRM
02 Nov 2023

A Simple Solution for Offline Imitation from Observations and Examples with Possibly Incomplete Trajectories
Kai Yan, A. Schwing, Yu-xiong Wang
OffRL
02 Nov 2023

Blending Reward Functions via Few Expert Demonstrations for Faithful and Accurate Knowledge-Grounded Dialogue Generation
Wanyu Du, Yangfeng Ji
02 Nov 2023

Vanishing Gradients in Reinforcement Finetuning of Language Models
Noam Razin, Hattie Zhou, Omid Saremi, Vimal Thilak, Arwen Bradley, Preetum Nakkiran, Josh Susskind, Etai Littwin
31 Oct 2023

Counterfactual-Augmented Importance Sampling for Semi-Offline Policy Evaluation
Shengpu Tang, Jenna Wiens
OffRL, CML
26 Oct 2023

COPR: Continual Learning Human Preference through Optimal Policy Regularization
Han Zhang, Lin Gui, Yuanzhao Zhai, Hui Wang, Yu Lei, Ruifeng Xu
CLL
24 Oct 2023

Retrieval-based Knowledge Transfer: An Effective Approach for Extreme Large Language Model Compression
Jiduan Liu, Jiahao Liu, Qifan Wang, Jingang Wang, Xunliang Cai, Dongyan Zhao, R. Wang, Rui Yan
24 Oct 2023

Controlled Randomness Improves the Performance of Transformer Models
Tobias Deußer, Cong Zhao, Wolfgang Krämer, David Leonhard, Christian Bauckhage, R. Sifa
20 Oct 2023

Emptying the Ocean with a Spoon: Should We Edit Models?
Yuval Pinter, Michael Elhadad
KELM
18 Oct 2023

ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models
Ziniu Li, Tian Xu, Yushun Zhang, Zhihang Lin, Yang Yu, Ruoyu Sun, Zhimin Luo
16 Oct 2023

VLIS: Unimodal Language Models Guide Multimodal Language Generation
Jiwan Chung, Youngjae Yu
VLM
15 Oct 2023

Improving Factual Consistency for Knowledge-Grounded Dialogue Systems via Knowledge Enhancement and Alignment
Boyang Xue, Weichao Wang, Hongru Wang, Fei Mi, Rui Wang, Yasheng Wang, Lifeng Shang, Xin Jiang, Qun Liu, Kam-Fai Wong
KELM, HILM
12 Oct 2023

Understanding the Effects of RLHF on LLM Generalisation and Diversity
Robert Kirk, Ishita Mediratta, Christoforos Nalmpantis, Jelena Luketina, Eric Hambro, Edward Grefenstette, Roberta Raileanu
AI4CE, ALM
10 Oct 2023

Reinforced UI Instruction Grounding: Towards a Generic UI Task Automation API
Zhizheng Zhang, Wenxuan Xie, Xiaoyi Zhang, Yan Lu
07 Oct 2023

Confronting Reward Model Overoptimization with Constrained RLHF
Ted Moskovitz, Aaditya K. Singh, DJ Strouse, T. Sandholm, Ruslan Salakhutdinov, Anca D. Dragan, Stephen Marcus McAleer
06 Oct 2023

A Long Way to Go: Investigating Length Correlations in RLHF
Prasann Singhal, Tanya Goyal, Jiacheng Xu, Greg Durrett
05 Oct 2023

Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization
Zhanhui Zhou, Jie Liu, Chao Yang, Jing Shao, Yu Liu, Xiangyu Yue, Wanli Ouyang, Yu Qiao
05 Oct 2023

Reinforcement Learning from Automatic Feedback for High-Quality Unit Test Generation
Benjamin Steenhoek, Michele Tufano, Neel Sundaresan, Alexey Svyatkovskiy
OffRL, ALM
03 Oct 2023

Pairwise Proximal Policy Optimization: Harnessing Relative Feedback for LLM Alignment
Tianhao Wu, Banghua Zhu, Ruoyu Zhang, Zhaojin Wen, Kannan Ramchandran, Jiantao Jiao
30 Sep 2023

An Interactive Framework for Profiling News Media Sources
Nikhil Mehta, Dan Goldwasser
14 Sep 2023

Statistical Rejection Sampling Improves Preference Optimization
Tianqi Liu, Yao-Min Zhao, Rishabh Joshi, Misha Khalman, Mohammad Saleh, Peter J. Liu, Jialu Liu
13 Sep 2023

Mitigating the Alignment Tax of RLHF
Yong Lin, Hangyu Lin, Wei Xiong, Shizhe Diao, Zeming Zheng, ..., Han Zhao, Nan Jiang, Heng Ji, Yuan Yao, Tong Zhang
MoMe, CLL
12 Sep 2023

Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties
Taylor Sorensen, Liwei Jiang, Jena D. Hwang, Sydney Levine, Valentina Pyatkin, ..., Kavel Rao, Chandra Bhagavatula, Maarten Sap, J. Tasioulas, Yejin Choi
SLR
02 Sep 2023

Reinforcement Learning for Generative AI: A Survey
Yuanjiang Cao, Quan.Z Sheng, Julian McAuley, Lina Yao
SyDa
28 Aug 2023

Chunk, Align, Select: A Simple Long-sequence Processing Method for Transformers
Jiawen Xie, Pengyu Cheng, Xiao Liang, Yong Dai, Nan Du
25 Aug 2023

Prompt-Based Length Controlled Generation with Reinforcement Learning
Renlong Jie, Xiaojun Meng, Lifeng Shang, Xin Jiang, Qun Liu
23 Aug 2023

From Instructions to Intrinsic Human Values -- A Survey of Alignment Goals for Big Models
Jing Yao, Xiaoyuan Yi, Xiting Wang, Jindong Wang, Xing Xie
ALM
23 Aug 2023

Zhongjing: Enhancing the Chinese Medical Capabilities of Large Language Model through Expert Feedback and Real-world Multi-turn Dialogue
Songhua Yang, Hanjia Zhao, Senbin Zhu, Guangyu Zhou, Hongfei Xu, Yuxiang Jia, Hongying Zan
AI4MH, LM&MA
07 Aug 2023

Reinforcement Learning for Generative AI: State of the Art, Opportunities and Open Research Challenges
Giorgio Franceschelli, Mirco Musolesi
AI4CE
31 Jul 2023