Dr Genre: Reinforcement Learning from Decoupled LLM Feedback for Generic Text Rewriting

Abstract

Generic text rewriting is a prevalent large language model (LLM) application that covers diverse real-world tasks, such as style transfer, fact correction, and email editing. These tasks vary in their rewriting objectives (e.g., factual consistency vs. semantic preservation), making it challenging to develop a unified model that excels across all dimensions. Existing methods often specialize in either a single task or a specific objective, limiting their generalizability. In this work, we introduce a generic model proficient in factuality, stylistic, and conversational rewriting tasks. To simulate real-world user rewrite requests, we construct a conversational rewrite dataset, ChatRewrite, which presents "natural"-sounding instructions generated from raw emails using LLMs. Combined with other popular rewrite datasets, including LongFact for the factuality rewrite task and RewriteLM for the stylistic rewrite task, this forms a broad benchmark for training and evaluating generic rewrite models. To align with task-specific objectives, we propose Dr Genre, a Decoupled-reward learning framework for Generic rewriting that utilizes objective-oriented reward models with task-specific weighting. Evaluation shows that Dr Genre delivers higher-quality rewrites across all targeted tasks, improving objectives including instruction following (agreement), internal consistency (coherence), and minimal unnecessary edits (conciseness).
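The decoupled-reward design described in the abstract can be pictured as a task-specific weighted combination of per-objective reward scores. Below is a minimal Python sketch of that aggregation; the objective names follow the abstract, but the weight values and scoring interface are illustrative assumptions, not details drawn from the paper.

from typing import Dict

# Objectives named in the abstract; scores for each would come from
# separate LLM-based reward models (interface assumed, not from the paper).
OBJECTIVES = ("agreement", "coherence", "conciseness")

# Hypothetical task-specific weights: each rewrite task emphasizes
# different objectives (values are illustrative only).
TASK_WEIGHTS: Dict[str, Dict[str, float]] = {
    "factuality":     {"agreement": 0.5, "coherence": 0.3, "conciseness": 0.2},
    "stylistic":      {"agreement": 0.3, "coherence": 0.4, "conciseness": 0.3},
    "conversational": {"agreement": 0.4, "coherence": 0.3, "conciseness": 0.3},
}

def decoupled_reward(task: str, scores: Dict[str, float]) -> float:
    """Combine per-objective reward scores with task-specific weights."""
    weights = TASK_WEIGHTS[task]
    return sum(weights[obj] * scores[obj] for obj in OBJECTIVES)

# Example: aggregate scores for one candidate rewrite on a factuality task.
scores = {"agreement": 0.9, "coherence": 0.7, "conciseness": 0.6}
print(decoupled_reward("factuality", scores))  # ~0.78

The resulting scalar would then serve as the reward signal in a standard RL fine-tuning loop, keeping each objective's reward model decoupled while the task determines how the objectives are traded off.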

@article{li2025_2503.06781,
  title={Dr Genre: Reinforcement Learning from Decoupled LLM Feedback for Generic Text Rewriting},
  author={Yufei Li and John Nham and Ganesh Jawahar and Lei Shu and David Uthus and Yun-Hsuan Sung and Chengrun Yang and Itai Rolnick and Yi Qiao and Cong Liu},
  journal={arXiv preprint arXiv:2503.06781},
  year={2025}
}