Reflect-RL: Two-Player Online RL Fine-Tuning for LMs

20 February 2024
Runlong Zhou
Simon S. Du
Beibin Li
    OffRL
ArXiv (abs) | PDF | HTML | GitHub (10★)

Papers citing "Reflect-RL: Two-Player Online RL Fine-Tuning for LMs"

3 / 3 papers shown
Reinforced Language Models for Sequential Decision Making
Jim Dilkes
V. Yazdanpanah
Sebastian Stein
LLMAG
143
0
0
14 Aug 2025
Self-Rewarding Language Models
Weizhe Yuan
Richard Yuanzhe Pang
Kyunghyun Cho
Xian Li
Sainbayar Sukhbaatar
Jing Xu
Jason Weston
ReLM, SyDa, ALM, LRM
979
540
0
18 Jan 2024
AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation
Qingyun Wu
Gagan Bansal
Jieyu Zhang
Yiran Wu
Beibin Li
...
Jiale Liu
Ahmed Hassan Awadallah
Ryen W. White
Doug Burger
Chi Wang
LLMAG, AI4CE
591
1,285
0
16 Aug 2023