
Sinkhorn Distance Minimization for Knowledge Distillation
arXiv:2402.17110 · 27 February 2024
Xiao Cui, Yulei Qin, Yuting Gao, Enwei Zhang, Zihan Xu, Tong Wu, Ke Li, Xing Sun, Wen-gang Zhou, Houqiang Li
ArXiv · PDF · HTML
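
For context on the paper's headline technique, below is a minimal, illustrative Python sketch of an entropy-regularized Sinkhorn (optimal-transport) distance used as a distillation loss between teacher and student output distributions. The function names, the 0/1 cost matrix, the temperature, and the hyperparameters eps and n_iters are assumptions chosen for illustration, not the paper's exact formulation.

# Illustrative sketch only: Sinkhorn-Knopp iterations for an
# entropy-regularized optimal-transport cost between two distributions.
import numpy as np

def sinkhorn_distance(p, q, cost, eps=0.1, n_iters=200):
    """Entropy-regularized OT cost between distributions p and q."""
    K = np.exp(-cost / eps)          # Gibbs kernel from the cost matrix
    u = np.ones_like(p)
    v = np.ones_like(q)
    for _ in range(n_iters):         # alternating scaling updates
        u = p / (K @ v)
        v = q / (K.T @ u)
    transport = np.outer(u, v) * K   # regularized transport plan
    return float(np.sum(transport * cost))

def softmax(x, T=1.0):
    z = np.exp((x - x.max()) / T)
    return z / z.sum()

# Example: compare a teacher's class distribution with a student's.
teacher_logits = np.array([2.0, 1.0, 0.1, -1.0])
student_logits = np.array([1.5, 0.8, 0.3, -0.5])
C = 1.0 - np.eye(4)                  # assumed 0/1 cost between classes
loss = sinkhorn_distance(softmax(teacher_logits), softmax(student_logits), C)
print(f"Sinkhorn distillation loss: {loss:.4f}")

In this kind of setup, the cost matrix and the regularization strength eps control how freely probability mass may move between classes; smaller eps approaches the unregularized optimal-transport cost but typically requires more iterations to converge.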

Papers citing "Sinkhorn Distance Minimization for Knowledge Distillation"

8 of 8 citing papers shown.

Scaling Laws for Speculative Decoding
Siyuan Yan, Mo Zhu, Guo-qing Jiang, Jianfei Wang, Jiaxing Chen, ..., Xiang Liao, Xiao Cui, Chen Zhang, Zhuoran Song, Ran Zhu
LRM · 08 May 2025

Multi-Level Optimal Transport for Universal Cross-Tokenizer Knowledge Distillation on Language Models
Xiao Cui, Mo Zhu, Yulei Qin, Liang Xie, Wengang Zhou, H. Li
19 Dec 2024

Improving Neural Cross-Lingual Summarization via Employing Optimal Transport Distance for Knowledge Distillation
Thong Nguyen, A. Luu
07 Dec 2021

Multitask Prompted Training Enables Zero-Shot Task Generalization
Victor Sanh, Albert Webson, Colin Raffel, Stephen H. Bach, Lintang Sutawika, ..., T. Bers, Stella Biderman, Leo Gao, Thomas Wolf, Alexander M. Rush
LRM · 15 Oct 2021

Distilling Linguistic Context for Language Model Compression
Geondo Park, Gyeongman Kim, Eunho Yang
17 Sep 2021

Learning Student-Friendly Teacher Networks for Knowledge Distillation
D. Park, Moonsu Cha, C. Jeong, Daesin Kim, Bohyung Han
12 Feb 2021

Training Deep Energy-Based Models with f-Divergence Minimization
Lantao Yu, Yang Song, Jiaming Song, Stefano Ermon
06 Mar 2020

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
ELM · 20 Apr 2018