
Efficient (Soft) Q-Learning for Text Generation with Limited Good Data

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021

14 June 2021

Papers citing "Efficient (Soft) Q-Learning for Text Generation with Limited Good Data"


How to Auto-optimize Prompts for Domain Tasks? Adaptive Prompting and Reasoning through Evolutionary Domain Knowledge Adaptation

24 Oct 2025

Trajectory Bellman Residual Minimization: A Simple Value-Based Method for LLM Reasoning

354

21 May 2025

Supervised Optimism Correction: Be Confident When LLMs Are Sure

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

292

10 Apr 2025

When LLM Meets DRL: Advancing Jailbreaking Efficiency via DRL-guided Search

Neural Information Processing Systems (NeurIPS), 2024

415

28 Jan 2025

Segmenting Text and Learning Their Rewards for Improved RLHF in Language Model

324

07 Jan 2025

Alignment of Diffusion Models: Fundamentals, Challenges, and Future

451

11 Sep 2024

Can a Bayesian Oracle Prevent Harm from an Agent?

Conference on Uncertainty in Artificial Intelligence (UAI), 2024

388

09 Aug 2024

M2Lingual: Enhancing Multilingual, Multi-Turn Instruction Alignment in Large Language Models

Sathwik Tejaswi Madhusudhan

470

24 Jun 2024

An Automatic Prompt Generation System for Tabular Data Tasks

North American Chapter of the Association for Computational Linguistics (NAACL), 2024

195

09 May 2024

APrompt4EM: Augmented Prompt Tuning for Generalized Entity Matching

333

08 May 2024

PRewrite: Prompt Rewriting with Reinforcement Learning

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

Weize Kong

Spurthi Amba Hombaiah

232

16 Jan 2024

GROVE: A Retrieval-augmented Complex Story Generation Framework with A Forest of Evidence

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Dongsheng Li

320

09 Oct 2023

Reinforcement Learning for Generative AI: A Survey

Yuanjiang Cao

534

28 Aug 2023

Preference-grounded Token-level Guidance for Language Model Fine-tuning

Neural Information Processing Systems (NeurIPS), 2023

472

01 Jun 2023

Inference-Time Policy Adapters (IPA): Tailoring Extreme-Scale LMs without Fine-tuning

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Faeze Brahman

...

Xiang Ren

Yejin Choi

326

24 May 2023

Leftover Lunch: Advantage-based Offline Reinforcement Learning for Language Models

International Conference on Learning Representations (ICLR), 2023

Faeze Brahman

347

24 May 2023

The Contribution of Knowledge in Visiolinguistic Learning: A Survey on Tasks and Challenges

Maria Lymperaiou

Giorgos Stamou

VLM

235

04 Mar 2023