Sinkhorn Distance Minimization for Knowledge Distillation
arXiv:2402.17110 · 27 February 2024
Xiao Cui, Yulei Qin, Yuting Gao, Enwei Zhang, Zihan Xu, Tong Wu, Ke Li, Xing Sun, Wen-gang Zhou, Houqiang Li
Papers citing "Sinkhorn Distance Minimization for Knowledge Distillation"

Scaling Laws for Speculative Decoding
Siyuan Yan, Mo Zhu, Guo-qing Jiang, Jianfei Wang, Jiaxing Chen, ..., Xiang Liao, Xiao Cui, Chen Zhang, Zhuoran Song, Ran Zhu
08 May 2025 · LRM

Multi-Level Optimal Transport for Universal Cross-Tokenizer Knowledge Distillation on Language Models
Xiao Cui, Mo Zhu, Yulei Qin, Liang Xie, Wengang Zhou, H. Li
19 Dec 2024

Improving Neural Cross-Lingual Summarization via Employing Optimal Transport Distance for Knowledge Distillation
Thong Nguyen, A. Luu
07 Dec 2021

Multitask Prompted Training Enables Zero-Shot Task Generalization
Victor Sanh, Albert Webson, Colin Raffel, Stephen H. Bach, Lintang Sutawika, ..., T. Bers, Stella Biderman, Leo Gao, Thomas Wolf, Alexander M. Rush
15 Oct 2021 · LRM

Distilling Linguistic Context for Language Model Compression
Geondo Park, Gyeongman Kim, Eunho Yang
17 Sep 2021

Learning Student-Friendly Teacher Networks for Knowledge Distillation
D. Park, Moonsu Cha, C. Jeong, Daesin Kim, Bohyung Han
12 Feb 2021

Training Deep Energy-Based Models with f-Divergence Minimization
Lantao Yu, Yang Song, Jiaming Song, Stefano Ermon
06 Mar 2020

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
20 Apr 2018 · ELM