2506.18631
ReDit: Reward Dithering for Improved LLM Policy Optimization
23 June 2025
Chenxing Wei
Jiarui Yu
Y. He
Hande Dong
Yao Shu
Fei Richard Yu
LRM
HuggingFace (7 upvotes)
Github (16024★)
Papers citing "ReDit: Reward Dithering for Improved LLM Policy Optimization"
3 / 3 papers shown
GAPO: Robust Advantage Estimation for Real-World Code LLMs
Jianqing Zhang
Zhezheng Hao
Wei Xia
Hande Dong
Hong Wang
Chenxing Wei
Yuyan Zhou
Yubin Qi
Qiang Lin
Jian Cao
210
0
0
22 Oct 2025
Test-Time Policy Adaptation for Enhanced Multi-Turn Interactions with LLMs
Chenxing Wei
Hong Wang
Ying He
Fei Richard Yu
Yao Shu
96
1
0
27 Sep 2025
Flexora: Flexible Low Rank Adaptation for Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Chenxing Wei
Yao Shu
Y. He
Fei Richard Yu
AI4CE
308
8
0
20 Aug 2024