ResearchTrend.AI

UFT: Unifying Fine-Tuning of SFT and RLHF/DPO/UNA through a Generalized Implicit Reward Function

28 October 2024
Zhichao Wang, Bin Bi, Z. Zhu, Xiangbo Mao, Jun Wang, Shiyu Wang
    CLL
arXiv: 2410.21438

Papers citing "UFT: Unifying Fine-Tuning of SFT and RLHF/DPO/UNA through a Generalized Implicit Reward Function"

1 / 1 papers shown
Title
Towards Widening The Distillation Bottleneck for Reasoning Models
Huifeng Yin, Yu Zhao, M. Wu, Xuanfan Ni, Bo Zeng, ..., Liangying Shao, Chenyang Lyu, Longyue Wang, Weihua Luo, Kaifu Zhang
LRM
42
1
0
03 Mar 2025