Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization
arXiv: 2210.01241
3 October 2022
Rajkumar Ramamurthy, Prithviraj Ammanabrolu, Kianté Brantley, Jack Hessel, R. Sifa, Christian Bauckhage, Hannaneh Hajishirzi, Yejin Choi
Community: OffRL

Papers citing "Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization"
Showing 50 of 202 citing papers.

A Survey on Progress in LLM Alignment from the Perspective of Reward Design
Miaomiao Ji, Yanqiu Wu, Zhibin Wu, Shoujin Wang, Jian Yang, Mark Dras, Usman Naseem
05 May 2025

Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Yang Yue, Zhiqi Chen, Rui Lu, Andrew Zhao, Zhaokai Wang, Yang Yue, Shiji Song, Gao Huang
Tags: ReLM, LRM
18 Apr 2025

A Domain-Based Taxonomy of Jailbreak Vulnerabilities in Large Language Models
Carlos Peláez-González, Andrés Herrera-Poyatos, Cristina Zuheros, David Herrera-Poyatos, Virilo Tejedor, F. Herrera
Tags: AAML
07 Apr 2025

Prompt Optimization with Logged Bandit Data
Haruka Kiyohara, Daniel Yiming Cao, Yuta Saito, Thorsten Joachims
03 Apr 2025

Multi-head Reward Aggregation Guided by Entropy
Xiaomin Li, Xupeng Chen, Jingxuan Fan, Eric Hanchen Jiang, Mingye Gao
Tags: AAML
26 Mar 2025

Latent Embedding Adaptation for Human Preference Alignment in Diffusion Planners
Wen Zheng Terence Ng, Jianda Chen, Yuan Xu, Tianwei Zhang
24 Mar 2025

CoKe: Customizable Fine-Grained Story Evaluation via Chain-of-Keyword Rationalization
Brihi Joshi, Sriram Venkatapathy, Mohit Bansal, Nanyun Peng, Haw-Shiuan Chang
Tags: LRM
21 Mar 2025

Aligning Crowd-sourced Human Feedback for Reinforcement Learning on Code Generation by Large Language Models
M. Wong, C. Tan
Tags: ALM
19 Mar 2025

GTR: Guided Thought Reinforcement Prevents Thought Collapse in RL-based VLM Agent Training
Tong Wei, Yijun Yang, Junliang Xing, Yuanchun Shi, Zongqing Lu, Deheng Ye
Tags: OffRL, LRM
11 Mar 2025

Towards Autonomous Reinforcement Learning for Real-World Robotic Manipulation with Large Language Models
Niccolò Turcato, Matteo Iovino, Aris Synodinos, Alberto Dalla Libera, R. Carli, Pietro Falco
Tags: LM&Ro
06 Mar 2025

Visual-RFT: Visual Reinforcement Fine-Tuning
Ziyu Liu, Zeyi Sun, Yuhang Zang, Xiaoyi Dong, Y. Cao, Haodong Duan, D. Lin, Jiaqi Wang
Tags: ObjD, VLM, LRM
03 Mar 2025

Lean and Mean: Decoupled Value Policy Optimization with Global Value Guidance
Chenghua Huang, Lu Wang, Fangkai Yang, Pu Zhao, Z. Li, Qingwei Lin, Dongmei Zhang, Saravan Rajmohan, Qi Zhang
Tags: OffRL
24 Feb 2025

Oreo: A Plug-in Context Reconstructor to Enhance Retrieval-Augmented Generation
Sha Li, Naren Ramakrishnan
Tags: RALM, KELM
18 Feb 2025

ConRFT: A Reinforced Fine-tuning Method for VLA Models via Consistency Policy
Yuhui Chen, Shuai Tian, Shugao Liu, Yingting Zhou, Haoran Li, Dongbin Zhao
Tags: OffRL
08 Feb 2025

IPO: Iterative Preference Optimization for Text-to-Video Generation
Xiaomeng Yang, Zhiyu Tan, Xuecheng Nie
Tags: VGen
04 Feb 2025

SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Tianzhe Chu, Yuexiang Zhai, Jihan Yang, Shengbang Tong, Saining Xie, Dale Schuurmans, Quoc V. Le, Sergey Levine, Yi-An Ma
Tags: OffRL
28 Jan 2025

When LLM Meets DRL: Advancing Jailbreaking Efficiency via DRL-guided Search
Xuan Chen, Yuzhou Nie, Wenbo Guo, Xiangyu Zhang
28 Jan 2025

Quality Estimation based Feedback Training for Improving Pronoun Translation
Harshit Dhankhar, Baban Gain, Asif Ekbal, Yogesh Mani Tripathi
06 Jan 2025

Disentangling Preference Representation and Text Generation for Efficient Individual Preference Alignment
Jianfei Zhang, Jun Bai, B. Li, Yanmeng Wang, Rumei Li, Chenghua Lin, Wenge Rong
31 Dec 2024

AddrLLM: Address Rewriting via Large Language Model on Nationwide Logistics Data
Qinchen Yang, Zhiqing Hong, Dongjiang Cao, Haotian Wang, Zejun Xie, Tian He, Yunhuai Liu, Yu Yang, Desheng Zhang
Tags: KELM
17 Nov 2024

TODO: Enhancing LLM Alignment with Ternary Preferences
Yuxiang Guo, Lu Yin, Bo Jiang, Jiaqi Zhang
02 Nov 2024

Science Out of Its Ivory Tower: Improving Accessibility with Reinforcement Learning
Haining Wang, Jason Clark, Hannah McKelvey, Leila Sterman, Zheng Gao, Zuoyu Tian, Sandra Kübler, Xiaozhong Liu
22 Oct 2024

Negative-Prompt-driven Alignment for Generative Language Model
Shiqi Qiao, Ning Xv, Biao Liu, Xin Geng
Tags: ALM, SyDa
16 Oct 2024

QE-EBM: Using Quality Estimators as Energy Loss for Machine Translation
Gahyun Yoo, Jay Yoon Lee
14 Oct 2024

Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization
Noam Razin, Sadhika Malladi, Adithya Bhaskar, Danqi Chen, Sanjeev Arora, Boris Hanin
11 Oct 2024

Attribute Controlled Fine-tuning for Large Language Models: A Case Study on Detoxification
Tao Meng, Ninareh Mehrabi, Palash Goyal, Anil Ramakrishna, Aram Galstyan, Richard Zemel, Kai-Wei Chang, Rahul Gupta, Charith Peris
07 Oct 2024

The Perfect Blend: Redefining RLHF with Mixture of Judges
Tengyu Xu, Eryk Helenowski, Karthik Abinav Sankararaman, Di Jin, Kaiyan Peng, ..., Gabriel Cohen, Yuandong Tian, Hao Ma, Sinong Wang, Han Fang
30 Sep 2024

Model-based Preference Optimization in Abstractive Summarization without Human Feedback
Jaepill Choi, Kyubyung Chae, Jiwoo Song, Yohan Jo, Taesup Kim
27 Sep 2024

Self-supervised Preference Optimization: Enhance Your Language Model with Preference Degree Awareness
Jian Li, Haojing Huang, Yujia Zhang, Pengfei Xu, Xi Chen, Rui Song, Lida Shi, Jingwen Wang, Hao Xu
26 Sep 2024

Modulated Intervention Preference Optimization (MIPO): Keep the Easy, Refine the Difficult
Cheolhun Jang
26 Sep 2024

Generalizing Alignment Paradigm of Text-to-Image Generation with Preferences through f-divergence Minimization
Haoyuan Sun, Bo Xia, Yongzhe Chang, Xueqian Wang
Tags: EGVM
15 Sep 2024

Table-to-Text Generation with Pretrained Diffusion Models
Aleksei S. Krylov, Oleg D. Somov
10 Sep 2024

Inverse-Q*: Token Level Reinforcement Learning for Aligning Large Language Models Without Preference Data
Han Xia, Songyang Gao, Qiming Ge, Zhiheng Xi, Qi Zhang, Xuanjing Huang
27 Aug 2024

Making Large Language Models Better Planners with Reasoning-Decision Alignment
Zhijian Huang, Tao Tang, Shaoxiang Chen, Sihao Lin, Zequn Jie, Lin Ma, Guangrun Wang, Xiaodan Liang
25 Aug 2024

TLCR: Token-Level Continuous Reward for Fine-grained Reinforcement Learning from Human Feedback
Eunseop Yoon, Hee Suk Yoon, Soohwan Eom, Gunsoo Han, D. W. Nam, DaeJin Jo, Kyoung-Woon On, M. Hasegawa-Johnson, Sungwoong Kim, C. Yoo
Tags: ALM
23 Jul 2024

Clinical Reading Comprehension with Encoder-Decoder Models Enhanced by Direct Preference Optimization
Md Sultan al Nahian, R. Kavuluru
Tags: MedIm, AI4CE
19 Jul 2024

New Desiderata for Direct Preference Optimization
Xiangkun Hu, Tong He, David Wipf
12 Jul 2024

Exposing Privacy Gaps: Membership Inference Attack on Preference Data for LLM Alignment
Qizhang Feng, Siva Rajesh Kasa, Santhosh Kumar Kasa, Hyokun Yun, C. Teo, S. Bodapati
08 Jul 2024

MAPO: Boosting Large Language Model Performance with Model-Adaptive Prompt Optimization
Yuyan Chen, Zhihao Wen, Ge Fan, Zhengyu Chen, Wei Yu Wu, Dayiheng Liu, Zhixu Li, Bang Liu, Yanghua Xiao
04 Jul 2024

Learning to Reduce: Towards Improving Performance of Large Language Models on Structured Data
Younghun Lee, Sungchul Kim, Ryan A. Rossi, Tong Yu, Xiang Chen
Tags: LMTD
03 Jul 2024

DogeRM: Equipping Reward Models with Domain Knowledge through Model Merging
Tzu-Han Lin, Chen An Li, Hung-yi Lee, Yun-Nung Chen
Tags: VLM, ALM
01 Jul 2024

Step-Controlled DPO: Leveraging Stepwise Error for Enhanced Mathematical Reasoning
Zimu Lu, Aojun Zhou, Ke Wang, Houxing Ren, Weikang Shi, Junting Pan, Mingjie Zhan, Hongsheng Li
Tags: LRM
30 Jun 2024

Computational Politeness in Natural Language Processing: A Survey
Priyanshu Priya, Mauajama Firdaus, Asif Ekbal
28 Jun 2024

Suri: Multi-constraint Instruction Following for Long-form Text Generation
Chau Minh Pham, Simeng Sun, Mohit Iyyer
Tags: ALM, LRM
27 Jun 2024

DiVERT: Distractor Generation with Variational Errors Represented as Text for Math Multiple-choice Questions
Nigel Fernandez, Alexander Scarlatos, Simon Woodhead, Andrew S. Lan
Tags: AAML
27 Jun 2024

Alignment For Performance Improvement in Conversation Bots
Raghav Garg, Kapil Sharma, Shrey Singla
27 Jun 2024

Rewarding What Matters: Step-by-Step Reinforcement Learning for Task-Oriented Dialogue
Huifang Du, Shuqin Li, Minghao Wu, Xuejing Feng, Yuan-Fang Li, Haofen Wang
Tags: OffRL
20 Jun 2024

FoRAG: Factuality-optimized Retrieval Augmented Generation for Web-enhanced Long-form Question Answering
Tianchi Cai, Zhiwen Tan, Xierui Song, Tao Sun, Jiyan Jiang, Yunqi Xu, Yinger Zhang, Jinjie Gu
19 Jun 2024

A Survey on Human Preference Learning for Large Language Models
Ruili Jiang, Kehai Chen, Xuefeng Bai, Zhixuan He, Juntao Li, Muyun Yang, Tiejun Zhao, Liqiang Nie, Min Zhang
17 Jun 2024

Eliminating Biased Length Reliance of Direct Preference Optimization via Down-Sampled KL Divergence
Junru Lu, Jiazheng Li, Siyu An, Meng Zhao, Yulan He, Di Yin, Xing Sun
16 Jun 2024