Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2403.02513
Cited By
Balancing Enhancement, Harmlessness, and General Capabilities: Enhancing Conversational LLMs with Direct RLHF
4 March 2024
Chen Zheng
Ke Sun
Hang Wu
Chenguang Xi
Xun Zhou
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Balancing Enhancement, Harmlessness, and General Capabilities: Enhancing Conversational LLMs with Direct RLHF"
7 / 7 papers shown
Title
LiPO: Listwise Preference Optimization through Learning-to-Rank
Tianqi Liu
Zhen Qin
Junru Wu
Jiaming Shen
Misha Khalman
...
Mohammad Saleh
Simon Baumgartner
Jialu Liu
Peter J. Liu
Xuanhui Wang
133
47
0
28 Jan 2025
KTO: Model Alignment as Prospect Theoretic Optimization
Kawin Ethayarajh
Winnie Xu
Niklas Muennighoff
Dan Jurafsky
Douwe Kiela
153
437
0
02 Feb 2024
ICE-GRT: Instruction Context Enhancement by Generative Reinforcement based Transformers
Chen Zheng
Ke Sun
Da Tang
Yukun Ma
Yuyu Zhang
Chenguang Xi
Xun Zhou
LRM
LLMAG
39
2
0
04 Jan 2024
A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity
Andrew Lee
Xiaoyan Bai
Itamar Pres
Martin Wattenberg
Jonathan K. Kummerfeld
Rada Mihalcea
55
95
0
03 Jan 2024
Two-stage LLM Fine-tuning with Less Specialization and More Generalization
Yihan Wang
Si Si
Daliang Li
Michal Lukasik
Felix X. Yu
Cho-Jui Hsieh
Inderjit S Dhillon
Sanjiv Kumar
32
29
0
01 Nov 2022
Whose Language Counts as High Quality? Measuring Language Ideologies in Text Data Selection
Suchin Gururangan
Dallas Card
Sarah K. Drier
E. K. Gade
Leroy Z. Wang
Zeyu Wang
Luke Zettlemoyer
Noah A. Smith
160
72
0
25 Jan 2022
Latent Hatred: A Benchmark for Understanding Implicit Hate Speech
Mai Elsherief
Caleb Ziems
D. Muchlinski
Vaishnavi Anupindi
Jordyn Seybolt
M. D. Choudhury
Diyi Yang
87
233
0
11 Sep 2021
1