MPO: Multilingual Safety Alignment via Reward Gap Optimization
arXiv:2505.16869 · 22 May 2025
Weixiang Zhao, Yulin Hu, Yang Deng, Tongtong Wu, Wenxuan Zhang, Jiahe Guo, An Zhang, Yanyan Zhao, Bing Qin, Tat-Seng Chua, Ting Liu
Papers citing "MPO: Multilingual Safety Alignment via Reward Gap Optimization" (20 of 20 papers shown)
| Title | Authors | Tags | Counts | Date |
|---|---|---|---|---|
| The Hidden Space of Safety: Understanding Preference-Tuned LLMs in Multilingual context | Nikhil Verma, Manasa Bharadwaj | | 61 / 2 / 0 | 03 Apr 2025 |
| DreamDPO: Aligning Text-to-3D Generation with Human Preferences via Direct Preference Optimization | Zhenglin Zhou, Xiaobo Xia, Fan Ma, Hehe Fan, Yi Yang, Tat-Seng Chua | | 55 / 6 / 0 | 05 Feb 2025 |
| LIMO: Less is More for Reasoning | Yixin Ye, Zhen Huang, Yang Xiao, Ethan Chern, Shijie Xia, Pengfei Liu | LRM | 127 / 132 / 0 | 05 Feb 2025 |
| LiPO: Listwise Preference Optimization through Learning-to-Rank | Tianqi Liu, Zhen Qin, Junru Wu, Jiaming Shen, Misha Khalman, …, Mohammad Saleh, Simon Baumgartner, Jialu Liu, Peter J. Liu, Xuanhui Wang | | 231 / 54 / 0 | 28 Jan 2025 |
| Cross-lingual Transfer of Reward Models in Multilingual Alignment | Jiwoo Hong, Noah Lee, Rodrigo Martínez-Castaño, César Rodríguez, James Thorne | | 78 / 5 / 0 | 23 Oct 2024 |
| Towards Understanding the Fragility of Multilingual LLMs against Fine-Tuning Attacks | Samuele Poppi, Zheng-Xin Yong, Yifei He, Bobbie Chern, Han Zhao, Aobo Yang, Jianfeng Chi | AAML | 117 / 19 / 0 | 23 Oct 2024 |
| Language Imbalance Driven Rewarding for Multilingual Self-improving | Wen Yang, Junhong Wu, Chen Wang, Chengqing Zong, J.N. Zhang | ALM, LRM | 136 / 6 / 0 | 11 Oct 2024 |
| Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization | Noam Razin, Sadhika Malladi, Adithya Bhaskar, Danqi Chen, Sanjeev Arora, Boris Hanin | | 146 / 23 / 0 | 11 Oct 2024 |
| Sharing Matters: Analysing Neurons Across Languages and Tasks in LLMs | Weixuan Wang, Barry Haddow, Wei Peng, Alexandra Birch | MILM | 58 / 16 / 0 | 13 Jun 2024 |
| WildChat: 1M ChatGPT Interaction Logs in the Wild | Wenting Zhao, Xiang Ren, Jack Hessel, Claire Cardie, Yejin Choi, Yuntian Deng | | 67 / 208 / 0 | 02 May 2024 |
| Disentangling Length from Quality in Direct Preference Optimization | Ryan Park, Rafael Rafailov, Stefano Ermon, Chelsea Finn | ALM | 81 / 128 / 0 | 28 Mar 2024 |
| KTO: Model Alignment as Prospect Theoretic Optimization | Kawin Ethayarajh, Winnie Xu, Niklas Muennighoff, Dan Jurafsky, Douwe Kiela | | 226 / 510 / 0 | 02 Feb 2024 |
| Secrets of RLHF in Large Language Models Part II: Reward Modeling | Bing Wang, Rui Zheng, Luyao Chen, Yan Liu, Shihan Dou, …, Qi Zhang, Xipeng Qiu, Xuanjing Huang, Zuxuan Wu, Yuanyuan Jiang | ALM | 77 / 106 / 0 | 11 Jan 2024 |
| Cross-lingual Prompting: Improving Zero-shot Chain-of-Thought Reasoning across Languages | Libo Qin, Qiguang Chen, Fuxuan Wei, Shijue Huang, Wanxiang Che | LRM | 72 / 82 / 0 | 23 Oct 2023 |
| A General Theoretical Paradigm to Understand Learning from Human Preferences | M. G. Azar, Mark Rowland, Bilal Piot, Daniel Guo, Daniele Calandriello, Michal Valko, Rémi Munos | | 155 / 597 / 0 | 18 Oct 2023 |
| Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena | Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, …, Dacheng Li, Eric Xing, Haotong Zhang, Joseph E. Gonzalez, Ion Stoica | ALM, OSLM, ELM | 300 / 4,186 / 0 | 09 Jun 2023 |
| RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment | Hanze Dong, Wei Xiong, Deepanshu Goyal, Yihan Zhang, Winnie Chow, Rui Pan, Shizhe Diao, Jipeng Zhang, Kashun Shum, Tong Zhang | ALM | 54 / 439 / 0 | 13 Apr 2023 |
| No Language Left Behind: Scaling Human-Centered Machine Translation | NLLB Team, Marta R. Costa-jussà, James Cross, Onur Çelebi, Maha Elbayad, …, Alexandre Mourachko, C. Ropers, Safiyyah Saleem, Holger Schwenk, Jeff Wang | MoE | 166 / 1,220 / 0 | 11 Jul 2022 |
| Language Models are Few-Shot Learners | Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, …, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei | BDL | 548 / 41,106 / 0 | 28 May 2020 |
| Fine-Tuning Language Models from Human Preferences | Daniel M. Ziegler, Nisan Stiennon, Jeff Wu, Tom B. Brown, Alec Radford, Dario Amodei, Paul Christiano, G. Irving | ALM | 434 / 1,664 / 0 | 18 Sep 2019 |