arXiv:2106.07704
Efficient (Soft) Q-Learning for Text Generation with Limited Good Data
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
14 June 2021
Han Guo
Bowen Tan
Zhengzhong Liu
Eric P. Xing
Zhiting Hu
OffRL
Papers citing "Efficient (Soft) Q-Learning for Text Generation with Limited Good Data"
17 / 17 papers shown
Title
How to Auto-optimize Prompts for Domain Tasks? Adaptive Prompting and Reasoning through Evolutionary Domain Knowledge Adaptation
Yang Zhao
Pu Wang
Hao Frank Yang
LRM
76
0
0
24 Oct 2025
Trajectory Bellman Residual Minimization: A Simple Value-Based Method for LLM Reasoning
Yurun Yuan
Fan Chen
Zeyu Jia
Alexander Rakhlin
Tengyang Xie
OffRL
340
1
0
21 May 2025
Supervised Optimism Correction: Be Confident When LLMs Are Sure
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Jing Zhang
Rushuai Yang
Shunyu Liu
Ting-En Lin
Fei Huang
Yi Chen
Yongqian Li
Dacheng Tao
OffRL
283
4
0
10 Apr 2025
When LLM Meets DRL: Advancing Jailbreaking Efficiency via DRL-guided Search
Neural Information Processing Systems (NeurIPS), 2024
Xuan Chen
Yuzhou Nie
Wenbo Guo
Xiangyu Zhang
400
38
0
28 Jan 2025
Segmenting Text and Learning Their Rewards for Improved RLHF in Language Model
Yueqin Yin
Shentao Yang
Yujia Xie
Ziyi Yang
Yuting Sun
Hany Awadalla
Weizhu Chen
Mingyuan Zhou
320
5
0
07 Jan 2025
Alignment of Diffusion Models: Fundamentals, Challenges, and Future
Buhua Liu
Shitong Shao
Bao Li
Lichen Bai
Zhiqiang Xu
Haoyi Xiong
James Kwok
Sumi Helal
Bo Han
431
22
0
11 Sep 2024
Can a Bayesian Oracle Prevent Harm from an Agent?
Conference on Uncertainty in Artificial Intelligence (UAI), 2024
Yoshua Bengio
Michael K. Cohen
Nikolay Malkin
Matt MacDermott
Damiano Fornasiere
Pietro Greiner
Younesse Kaddar
376
9
0
09 Aug 2024
M2Lingual: Enhancing Multilingual, Multi-Turn Instruction Alignment in Large Language Models
Rishabh Maheshwary
Vikas Yadav
Hoang Nguyen
Khyati Mahajan
Sathwik Tejaswi Madhusudhan
444
7
0
24 Jun 2024
An Automatic Prompt Generation System for Tabular Data Tasks
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Ashlesha Akella
Abhijit Manatkar
Brij Chavda
Hima Patel
LMTD
183
2
0
09 May 2024
APrompt4EM: Augmented Prompt Tuning for Generalized Entity Matching
Yikuan Xia
Jiazun Chen
Xinchi Li
Jun Gao
VLM
307
3
0
08 May 2024
PRewrite: Prompt Rewriting with Reinforcement Learning
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Weize Kong
Spurthi Amba Hombaiah
Mingyang Zhang
Qiaozhu Mei
Michael Bendersky
LLMAG
208
38
0
16 Jan 2024
GROVE: A Retrieval-augmented Complex Story Generation Framework with A Forest of Evidence
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Zhihua Wen
Zhiliang Tian
Wei Wu
Yuxin Yang
Yanqi Shi
Zhen Huang
Dongsheng Li
RALM
307
19
0
09 Oct 2023
Reinforcement Learning for Generative AI: A Survey
Yuanjiang Cao
Quan.Z Sheng
Julian McAuley
Lina Yao
SyDa
503
22
0
28 Aug 2023
Preference-grounded Token-level Guidance for Language Model Fine-tuning
Neural Information Processing Systems (NeurIPS), 2023
Shentao Yang
Shujian Zhang
Congying Xia
Yihao Feng
Caiming Xiong
Mi Zhou
449
31
0
01 Jun 2023
Inference-Time Policy Adapters (IPA): Tailoring Extreme-Scale LMs without Fine-tuning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Ximing Lu
Faeze Brahman
Peter West
Jaehun Jang
Khyathi Chandu
...
Bill Yuchen Lin
Skyler Hallinan
Xiang Ren
Sean Welleck
Yejin Choi
318
33
0
24 May 2023
Leftover Lunch: Advantage-based Offline Reinforcement Learning for Language Models
International Conference on Learning Representations (ICLR), 2023
Ashutosh Baheti
Ximing Lu
Faeze Brahman
Ronan Le Bras
Maarten Sap
Mark O. Riedl
322
13
0
24 May 2023
The Contribution of Knowledge in Visiolinguistic Learning: A Survey on Tasks and Challenges
Maria Lymperaiou
Giorgos Stamou
VLM
216
5
0
04 Mar 2023