arXiv:2410.21438
UFT: Unifying Fine-Tuning of SFT and RLHF/DPO/UNA through a Generalized Implicit Reward Function
28 October 2024
Zhichao Wang, Bin Bi, Z. Zhu, Xiangbo Mao, Jun Wang, Shiyu Wang
CLL
Papers citing "UFT: Unifying Fine-Tuning of SFT and RLHF/DPO/UNA through a Generalized Implicit Reward Function"
1 / 1 papers shown
Title
Towards Widening The Distillation Bottleneck for Reasoning Models
Huifeng Yin, Yu Zhao, M. Wu, Xuanfan Ni, Bo Zeng, ..., Liangying Shao, Chenyang Lyu, Longyue Wang, Weihua Luo, Kaifu Zhang
LRM
42
1
0
03 Mar 2025