Papers citing 'Efficient (Soft) Q-Learning for Text Generation with Limited Good Data'

How to Auto-optimize Prompts for Domain Tasks? Adaptive Prompting and Reasoning through Evolutionary Domain Knowledge Adaptation

24 Oct 2025

Trajectory Bellman Residual Minimization: A Simple Value-Based Method for LLM Reasoning

21 May 2025

Supervised Optimism Correction: Be Confident When LLMs Are Sure

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

10 Apr 2025

When LLM Meets DRL: Advancing Jailbreaking Efficiency via DRL-guided Search

Neural Information Processing Systems (NeurIPS), 2024

28 Jan 2025

Segmenting Text and Learning Their Rewards for Improved RLHF in Language Model

07 Jan 2025

Alignment of Diffusion Models: Fundamentals, Challenges, and Future

11 Sep 2024

Can a Bayesian Oracle Prevent Harm from an Agent?

Conference on Uncertainty in Artificial Intelligence (UAI), 2024

09 Aug 2024

M2Lingual: Enhancing Multilingual, Multi-Turn Instruction Alignment in Large Language Models

Sathwik Tejaswi Madhusudhan

24 Jun 2024

An Automatic Prompt Generation System for Tabular Data Tasks

North American Chapter of the Association for Computational Linguistics (NAACL), 2024

09 May 2024

APrompt4EM: Augmented Prompt Tuning for Generalized Entity Matching

08 May 2024

PRewrite: Prompt Rewriting with Reinforcement Learning

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

Weize Kong

Spurthi Amba Hombaiah

16 Jan 2024

GROVE: A Retrieval-augmented Complex Story Generation Framework with A Forest of Evidence

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Dongsheng Li

09 Oct 2023

Reinforcement Learning for Generative AI: A Survey

Yuanjiang Cao

28 Aug 2023

Preference-grounded Token-level Guidance for Language Model Fine-tuning

Neural Information Processing Systems (NeurIPS), 2023

01 Jun 2023

Inference-Time Policy Adapters (IPA): Tailoring Extreme-Scale LMs without Fine-tuning

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Faeze Brahman

...

Xiang Ren

Yejin Choi

24 May 2023

Leftover Lunch: Advantage-based Offline Reinforcement Learning for Language Models

International Conference on Learning Representations (ICLR), 2023

Faeze Brahman

24 May 2023

The Contribution of Knowledge in Visiolinguistic Learning: A Survey on Tasks and Challenges

Maria Lymperaiou

Giorgos Stamou

VLM

04 Mar 2023
