arXiv: 2011.02511

Offline Reinforcement Learning from Human Feedback in Real-World Sequence-to-Sequence Tasks

4 November 2020

Carolin (Haas) Lawrence

Papers citing "Offline Reinforcement Learning from Human Feedback in Real-World Sequence-to-Sequence Tasks"

11 / 11 papers shown

PARL: Prompt-based Agents for Reinforcement Learning

Yarik Menchaca Resendiz

228

0

0

24 Oct 2025

Conversational User-AI Intervention: A Study on Prompt Rewriting for Improved LLM Response Generation

Bahareh Sarrafzadeh

N. Chandrasekaran

393

7

0

21 Mar 2025

TCR-GPT: Integrating Autoregressive Model and Reinforcement Learning for T-Cell Receptor Repertoires Generation

Roberto Santana

223

6

0

02 Aug 2024

Participation in the age of foundation models

435

60

0

29 May 2024

Multi-User Chat Assistant (MUCA): a Framework Using LLMs to Facilitate Group Conversations

476

14

0

10 Jan 2024

Reinforcement Learning for Generative AI: A Survey

Yuanjiang Cao

588

27

0

28 Aug 2023

Simple Embodied Language Learning as a Byproduct of Meta-Reinforcement Learning

International Conference on Machine Learning (ICML), 2023

284

4

0

14 Jun 2023

SPEECH: Structured Prediction with Energy-Based Event-Centric Hyperspheres

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

Shumin Deng

Ningyu Zhang

Bryan Hooi

285

6

0

23 May 2023

Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization

Rajkumar Ramamurthy

Prithviraj Ammanabrolu

Kianté Brantley

Christian Bauckhage

Hannaneh Hajishirzi

Yejin Choi

684

286

0

03 Oct 2022

Learning Non-Autoregressive Models from Search for Unsupervised Sentence Summarization

Annual Meeting of the Association for Computational Linguistics (ACL), 2022

237

20

0

28 May 2022

A Survey of Human-in-the-loop for Machine Learning

Xingjiao Wu

655

717

0

02 Aug 2021
