MergeDistill: Merging Pre-trained Language Models using Distillation
arXiv:2106.02834
Simran Khanuja, Melvin Johnson, Partha P. Talukdar
5 June 2021
Papers citing "MergeDistill: Merging Pre-trained Language Models using Distillation" (8 of 8 papers shown)
| Title | Authors | Tags | Likes | Citations | Comments | Date |
|-------|---------|------|-------|-----------|----------|------|
| Advantage-Guided Distillation for Preference Alignment in Small Language Models | Shiping Gao, Fanqi Wan, Jiajian Guo, Xiaojun Quan, Qifan Wang | ALM | 58 | 0 | 0 | 25 Feb 2025 |
| Stolen Subwords: Importance of Vocabularies for Machine Translation Model Stealing | Vilém Zouhar | AAML | 35 | 0 | 0 | 29 Jan 2024 |
| Knowledge Fusion of Large Language Models | Fanqi Wan, Xinting Huang, Deng Cai, Xiaojun Quan, Wei Bi, Shuming Shi | MoMe | 27 | 61 | 0 | 19 Jan 2024 |
| How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models | Phillip Rust, Jonas Pfeiffer, Ivan Vulić, Sebastian Ruder, Iryna Gurevych | | 69 | 234 | 0 | 31 Dec 2020 |
| Rethinking embedding coupling in pre-trained language models | Hyung Won Chung, Thibault Févry, Henry Tsai, Melvin Johnson, Sebastian Ruder | | 93 | 142 | 0 | 24 Oct 2020 |
| Improving Multilingual Models with Language-Clustered Vocabularies | Hyung Won Chung, Dan Garrette, Kiat Chuan Tan, Jason Riesa | VLM | 58 | 65 | 0 | 24 Oct 2020 |
| What the [MASK]? Making Sense of Language-Specific BERT Models | Debora Nozza, Federico Bianchi, Dirk Hovy | | 84 | 105 | 0 | 5 Mar 2020 |
| Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation | Yonghui Wu, M. Schuster, Z. Chen, Quoc V. Le, Mohammad Norouzi, ..., Alex Rudnick, Oriol Vinyals, G. Corrado, Macduff Hughes, J. Dean | AIMat | 716 | 6,740 | 0 | 26 Sep 2016 |