ReDit: Reward Dithering for Improved LLM Policy Optimization

23 June 2025
Chenxing Wei
Jiarui Yu
Y. He
Hande Dong
Yao Shu
Fei Richard Yu
    LRM

Papers citing "ReDit: Reward Dithering for Improved LLM Policy Optimization"

3 / 3 papers shown
GAPO: Robust Advantage Estimation for Real-World Code LLMs
Jianqing Zhang
Zhezheng Hao
Wei Xia
Hande Dong
Hong Wang
Chenxing Wei
Yuyan Zhou
Yubin Qi
Qiang Lin
Jian Cao
210
0
0
22 Oct 2025
Test-Time Policy Adaptation for Enhanced Multi-Turn Interactions with LLMs
Chenxing Wei
Hong Wang
Ying He
Fei Richard Yu
Yao Shu
96
1
0
27 Sep 2025
Flexora: Flexible Low Rank Adaptation for Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Chenxing Wei
Yao Shu
Y. He
Fei Richard Yu
AI4CE
308
8
0
20 Aug 2024