Reinforcement Learning in the Era of LLMs: What is Essential? What is needed? An RL Perspective on RLHF, Prompting, and Beyond

9 October 2023 · Hao Sun · OffRL

Papers citing "Reinforcement Learning in the Era of LLMs: What is Essential? What is needed? An RL Perspective on RLHF, Prompting, and Beyond"

14 / 14 papers shown
Stratified Expert Cloning with Adaptive Selection for User Retention in Large-Scale Recommender Systems
Chengzhi Lin, Annan Xie, Shuchang Liu, Wuhong Wang, Chuyuan Wang, Yongqi Liu
OffRL · 08 Apr 2025
Advantage-Guided Distillation for Preference Alignment in Small Language Models
Shiping Gao, Fanqi Wan, Jiajian Guo, Xiaojun Quan, Qifan Wang
ALM · 25 Feb 2025
Reinforcement Learning Enhanced LLMs: A Survey
Shuhe Wang, Shengyu Zhang, J. Zhang, Runyi Hu, Xiaoya Li, Tianwei Zhang, Jiwei Li, Fei Wu, G. Wang, Eduard H. Hovy
OffRL · 05 Dec 2024
Exploring Prompt Engineering: A Systematic Review with SWOT Analysis
Aditi Singh, Abul Ehtesham, Gaurav Kumar Gupta, Nikhil Kumar Chatta, Saket Kumar, T. T. Khoei
09 Oct 2024
Optimizing Autonomous Driving for Safety: A Human-Centric Approach with LLM-Enhanced RLHF
Yuan Sun, Navid Salami Pargoo, Peter J. Jin, Jorge Ortiz
06 Jun 2024
Using RL to Identify Divisive Perspectives Improves LLMs Abilities to Identify Communities on Social Media
Nikhil Mehta, Dan Goldwasser
03 Jun 2024
Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer
Zhihan Liu, Miao Lu, Shenao Zhang, Boyi Liu, Hongyi Guo, Yingxiang Yang, Jose H. Blanchet, Zhaoran Wang
26 May 2024
Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment
Rui Yang, Xiaoman Pan, Feng Luo, Shuang Qiu, Han Zhong, Dong Yu, Jianshu Chen
15 Feb 2024
The RL/LLM Taxonomy Tree: Reviewing Synergies Between Reinforcement Learning and Large Language Models
M. Pternea, Prerna Singh, Abir Chakraborty, Y. Oruganti, M. Milletarí, Sayli Bapat, Kebei Jiang
OffRL · 02 Feb 2024
Towards Robust Offline Reinforcement Learning under Diverse Data Corruption
Rui Yang, Han Zhong, Jiawei Xu, Amy Zhang, Chong Zhang, Lei Han, Tong Zhang
OffRL, OnRL · 19 Oct 2023
SELF: Self-Evolution with Language Feedback
Jianqiao Lu, Wanjun Zhong, Wenyong Huang, Yufei Wang, Qi Zhu, ..., Weichao Wang, Xingshan Zeng, Lifeng Shang, Xin Jiang, Qun Liu
LRM, SyDa · 01 Oct 2023
Reinforcement Learning for Generative AI: A Survey
Yuanjiang Cao, Quan.Z Sheng, Julian McAuley, Lina Yao
SyDa · 28 Aug 2023
What is Flagged in Uncertainty Quantification? Latent Density Models for Uncertainty Categorization
Hao Sun, B. V. Breugel, Jonathan Crabbé, Nabeel Seedat, M. Schaar
11 Jul 2022
Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
OSLM, ALM · 04 Mar 2022