ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning
17 December 2020
Zeyuan Allen-Zhu, Yuanzhi Li
FedML

Papers citing "Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning"

50 / 215 papers shown

Towards a theory of model distillation
Enric Boix-Adserà (FedML, VLM), 14 Mar 2024

How does promoting the minority fraction affect generalization? A theoretical study of the one-hidden-layer neural network on group imbalance
Hongkang Li, Shuai Zhang, Yihua Zhang, Meng Wang, Sijia Liu, Pin-Yu Chen, 12 Mar 2024

Learning to Maximize Mutual Information for Chain-of-Thought Distillation
Xin Chen, Hanxian Huang, Yanjun Gao, Yi Wang, Jishen Zhao, Ke Ding, 05 Mar 2024

How Do Nonlinear Transformers Learn and Generalize in In-Context Learning?
Hongkang Li, Meng Wang, Songtao Lu, Xiaodong Cui, Pin-Yu Chen (MLT), 23 Feb 2024

A Survey on Knowledge Distillation of Large Language Models
Xiaohan Xu, Ming Li, Chongyang Tao, Tao Shen, Reynold Cheng, Jinyang Li, Can Xu, Dacheng Tao, Tianyi Zhou (KELM, VLM), 20 Feb 2024

Node Duplication Improves Cold-start Link Prediction
Zhichun Guo, Tong Zhao, Yozen Liu, Kaiwen Dong, William Shiao, Neil Shah, Nitesh V. Chawla (AI4CE), 15 Feb 2024

How many views does your deep neural network use for prediction?
Keisuke Kawano, Takuro Kutsuna, Keisuke Sano (AI4CE), 02 Feb 2024

MoDE: A Mixture-of-Experts Model with Mutual Distillation among the Experts
Zhitian Xie, Yinger Zhang, Chenyi Zhuang, Qitao Shi, Zhining Liu, Jinjie Gu, Guannan Zhang (MoE), 31 Jan 2024

Bayes Conditional Distribution Estimation for Knowledge Distillation Based on Conditional Mutual Information
Linfeng Ye, Shayan Mohajer Hamidi, Renhao Tan, En-Hui Yang (VLM), 16 Jan 2024

Revisiting Knowledge Distillation under Distribution Shift
Songming Zhang, Ziyu Lyu, Xiaofeng Chen, 25 Dec 2023

DEAP: Design Space Exploration for DNN Accelerator Parallelism
Ekansh Agrawal, Xiangyu Sam Xu, 24 Dec 2023

Efficiency-oriented approaches for self-supervised speech representation learning
Luis Lugo, Valentin Vielzeuf (SSL), 18 Dec 2023

CR-SFP: Learning Consistent Representation for Soft Filter Pruning
Jingyang Xiang, Zhuangzhi Chen, Jianbiao Mei, Siqi Li, Jun Chen, Yong-Jin Liu, 17 Dec 2023

Diversifying Spatial-Temporal Perception for Video Domain Generalization
Kun-Yu Lin, Jia-Run Du, Yipeng Gao, Jiaming Zhou, Wei-Shi Zheng, 27 Oct 2023

DistillCSE: Distilled Contrastive Learning for Sentence Embeddings
Jiahao Xu, Wei Shao, Lihui Chen, Lemao Liu (FedML), 20 Oct 2023

Towards the Fundamental Limits of Knowledge Transfer over Finite Domains
Qingyue Zhao, Banghua Zhu, 11 Oct 2023

Promoting Robustness of Randomized Smoothing: Two Cost-Effective Approaches
Linbo Liu, T. Hoang, Lam M. Nguyen, Tsui-Wei Weng (AAML), 11 Oct 2023

Why Does Sharpness-Aware Minimization Generalize Better Than SGD?
Zixiang Chen, Junkai Zhang, Yiwen Kou, Xiangning Chen, Cho-Jui Hsieh, Quanquan Gu, 11 Oct 2023

In-Context Convergence of Transformers
Yu Huang, Yuan-Chia Cheng, Yingbin Liang (MLT), 08 Oct 2023

MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning
Ke Wang, Houxing Ren, Aojun Zhou, Zimu Lu, Sichun Luo, Weikang Shi, Renrui Zhang, Linqi Song, Mingjie Zhan, Hongsheng Li (ReLM, LRM, SyDa), 05 Oct 2023

Distilling Influences to Mitigate Prediction Churn in Graph Neural Networks
Andreas Roth, Thomas Liebig, 02 Oct 2023

Spurious Feature Diversification Improves Out-of-distribution Generalization
Yong Lin, Lu Tan, Yifan Hao, Honam Wong, Hanze Dong, Weizhong Zhang, Yujiu Yang, Tong Zhang (OODD), 29 Sep 2023

VERSE: Virtual-Gradient Aware Streaming Lifelong Learning with Anytime Inference
S. Banerjee, Vinay K. Verma, Avideep Mukherjee, Deepak Gupta, Vinay P. Namboodiri, Piyush Rai (CLL), 15 Sep 2023

Mitigating the Alignment Tax of RLHF
Yong Lin, Hangyu Lin, Wei Xiong, Shizhe Diao, Zeming Zheng, ..., Han Zhao, Nan Jiang, Heng Ji, Yuan Yao, Tong Zhang (MoMe, CLL), 12 Sep 2023

MoMA: Momentum Contrastive Learning with Multi-head Attention-based Knowledge Distillation for Histopathology Image Analysis
T. Vuong, J. T. Kwak, 31 Aug 2023

Boosting Multi-modal Model Performance with Adaptive Gradient Modulation
Hong Li, Xingyu Li, Pengbo Hu, Yinuo Lei, Chunxiao Li, Yi Zhou, 15 Aug 2023

Towards Better Query Classification with Multi-Expert Knowledge Condensation in JD Ads Search
Kun-Peng Ning, Ming Pang, Zheng Fang, Xue Jiang, Xi-Wei Zhao, Changping Peng, Zhangang Lin, Jinghe Hu, Jingping Shao, 02 Aug 2023

Cluster-aware Semi-supervised Learning: Relational Knowledge Distillation Provably Learns Clustering
Yijun Dong, Kevin Miller, Qiuyu Lei, Rachel A. Ward, 20 Jul 2023

MinT: Boosting Generalization in Mathematical Reasoning via Multi-View Fine-Tuning
Zhenwen Liang, Dian Yu, Xiaoman Pan, Wenlin Yao, Qingkai Zeng, Xiangliang Zhang, Dong Yu (ALM, LRM), 16 Jul 2023

Distilling Self-Supervised Vision Transformers for Weakly-Supervised Few-Shot Classification & Segmentation
Dahyun Kang, Piotr Koniusz, Minsu Cho, Naila Murray (VLM, ViT), 07 Jul 2023

Are Neurons Actually Collapsed? On the Fine-Grained Structure in Neural Representations
Yongyi Yang, Jacob Steinhardt, Wei Hu, 29 Jun 2023

Sparse Model Soups: A Recipe for Improved Pruning via Model Averaging
Max Zimmer, Christoph Spiegel, S. Pokutta (MoMe), 29 Jun 2023

Graph Neural Networks Provably Benefit from Structural Information: A Feature Learning Perspective
Wei Huang, Yuanbin Cao, Hong Wang, Xin Cao, Taiji Suzuki (MLT), 24 Jun 2023

The Implicit Bias of Batch Normalization in Linear Models and Two-layer Linear Convolutional Neural Networks
Yuan Cao, Difan Zou, Yuan-Fang Li, Quanquan Gu (MLT), 20 Jun 2023

Consistent Explanations in the Face of Model Indeterminacy via Ensembling
Dan Ley, Leonard Tang, Matthew Nazari, Hongjin Lin, Suraj Srinivas, Himabindu Lakkaraju, 09 Jun 2023

Revisit Few-shot Intent Classification with PLMs: Direct Fine-tuning vs. Continual Pre-training
Haode Zhang, Haowen Liang, Li-Ming Zhan, Xiao-Ming Wu, Albert Y. S. Lam (VLM), 08 Jun 2023

Robust Learning with Progressive Data Expansion Against Spurious Correlation
Yihe Deng, Yu Yang, Baharan Mirzasoleiman, Quanquan Gu (OOD, MLT), 08 Jun 2023

On the Joint Interaction of Models, Data, and Features
Yiding Jiang, Christina Baek, J. Zico Kolter (FedML), 07 Jun 2023

Patch-level Routing in Mixture-of-Experts is Provably Sample-efficient for Convolutional Neural Networks
Mohammed Nowaz Rabbani Chowdhury, Shuai Zhang, M. Wang, Sijia Liu, Pin-Yu Chen (MoE), 07 Jun 2023

Towards Understanding Clean Generalization and Robust Overfitting in Adversarial Training
Binghui Li, Yuanzhi Li (AAML), 02 Jun 2023

Birth of a Transformer: A Memory Viewpoint
A. Bietti, Vivien A. Cabannes, Diane Bouchacourt, Hervé Jégou, Léon Bottou, 01 Jun 2023

Provable Benefit of Mixup for Finding Optimal Decision Boundaries
Junsoo Oh, Chulee Yun, 01 Jun 2023

A Recipe for Efficient SBIR Models: Combining Relative Triplet Loss with Batch Normalization and Knowledge Distillation
Omar Seddati, Nathan Hubens, Stéphane Dupont, Thierry Dutoit, 30 May 2023

EmotionGesture: Audio-Driven Diverse Emotional Co-Speech 3D Gesture Generation
Xingqun Qi, Chen Liu, Lincheng Li, Jie Hou, Haoran Xin, Xin Yu (SLR), 30 May 2023

Matrix Information Theory for Self-Supervised Learning
Yifan Zhang, Zhi-Hao Tan, Jingqin Yang, Weiran Huang, Yang Yuan (SSL), 27 May 2023

Towards Higher Pareto Frontier in Multilingual Machine Translation
Yi-Chong Huang, Xiaocheng Feng, Xinwei Geng, Baohang Li, Bing Qin, 25 May 2023

PaD: Program-aided Distillation Can Teach Small Models Reasoning Better than Chain-of-thought Fine-tuning
Xuekai Zhu, Biqing Qi, Kaiyan Zhang, Xingwei Long, Zhouhan Lin, Bowen Zhou (ALM, LRM), 23 May 2023

Accurate Knowledge Distillation with n-best Reranking
Hendra Setiawan, 20 May 2023

Logit-Based Ensemble Distribution Distillation for Robust Autoregressive Sequence Uncertainties
Yassir Fathullah, Guoxuan Xia, Mark J. F. Gales (UQCV), 17 May 2023

TinyStories: How Small Can Language Models Be and Still Speak Coherent English?
Ronen Eldan, Yuan-Fang Li (SyDa, LRM), 12 May 2023