Post-edits Are Preferences Too

24 February 2025
Nathaniel Berger
Stefan Riezler
Miriam Exel
Matthias Huck
Abstract

Preference Optimization (PO) techniques are currently among the state-of-the-art methods for fine-tuning large language models (LLMs) on pairwise preference feedback from human annotators. However, in machine translation, this sort of feedback can be difficult to solicit. Additionally, Kreutzer et al. (2018) have shown that, for machine translation, pairwise preferences are less reliable than other forms of human feedback, such as 5-point ratings. We examine post-edits to see if they can be a source of human preferences that are reliable by construction. In PO, a human annotator is shown sequences $s_1$ and $s_2$ and asked for a preference judgment $s_1 > s_2$; in post-editing, by contrast, editors create $s_1$ and know that it should be better than $s_2$. We attempt to use these implicit preferences for PO and show that this helps the model move towards post-edit-like hypotheses and away from machine-translation-like hypotheses. Furthermore, we show that the best results are obtained by pre-training the model with supervised fine-tuning (SFT) on post-edits in order to promote post-edit-like hypotheses to the top output ranks.
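
As an illustration of the implicit preferences described in the abstract, the sketch below (not from the paper; the record fields, prompt template, and filtering rule are assumptions) shows how post-edit triples could be converted into the prompt/chosen/rejected pairs commonly consumed by DPO-style preference-optimization trainers, treating the post-edit as the preferred output and the original machine translation as the dispreferred one.

from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PostEditRecord:
    source: str      # source-language sentence
    mt_output: str   # machine translation shown to the editor (s_2)
    post_edit: str   # editor's corrected translation (s_1), better by construction


def to_preference_pairs(records: List[PostEditRecord]) -> List[Dict[str, str]]:
    """Turn post-edit triples into prompt/chosen/rejected pairs for a
    DPO-style trainer. The prompt wording is a hypothetical placeholder."""
    pairs = []
    for r in records:
        # Unedited segments carry no preference signal, so skip them.
        if r.post_edit.strip() == r.mt_output.strip():
            continue
        pairs.append({
            "prompt": f"Translate into English: {r.source}",
            "chosen": r.post_edit,    # s_1: preferred by construction
            "rejected": r.mt_output,  # s_2: the hypothesis the editor corrected
        })
    return pairs


if __name__ == "__main__":
    demo = [PostEditRecord(
        source="Der Vertrag tritt am 1. Januar in Kraft.",
        mt_output="The contract comes into power on January 1.",
        post_edit="The contract enters into force on January 1.",
    )]
    print(to_preference_pairs(demo))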

View on arXiv: https://arxiv.org/abs/2410.02320
@article{berger2025_2410.02320,
  title={Post-edits Are Preferences Too},
  author={Nathaniel Berger and Miriam Exel and Matthias Huck and Stefan Riezler},
  journal={arXiv preprint arXiv:2410.02320},
  year={2025}
}