Distilling the Knowledge in a Neural Network
Geoffrey E. Hinton, Oriol Vinyals, J. Dean — 9 March 2015 — arXiv:1503.02531 [FedML]
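For context on the method the papers below build on: the paper's core recipe trains a small student network to match a large teacher's temperature-softened output distribution alongside the hard labels. A minimal sketch in plain Python follows; the function names and the default `temperature`/`alpha` values are illustrative choices, not values prescribed by the paper.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher T gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of (a) cross-entropy against the teacher's softened
    distribution and (b) standard cross-entropy against the hard label.
    The T^2 factor keeps the soft-target gradients at a comparable
    magnitude as the temperature changes, as the paper suggests."""
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    soft_loss = -sum(p * math.log(q)
                     for p, q in zip(soft_teacher, soft_student))
    hard_student = softmax(student_logits)  # T = 1 for the hard-label term
    hard_loss = -math.log(hard_student[true_label])
    return alpha * (temperature ** 2) * soft_loss + (1 - alpha) * hard_loss
```

In practice the same structure appears with framework loss functions (e.g. a KL-divergence term on log-probabilities), but the two-term weighted objective is the common thread among the citing papers listed below.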

Papers citing "Distilling the Knowledge in a Neural Network" (50 of 327 shown)
Hybrid Attention Model Using Feature Decomposition and Knowledge Distillation for Glucose Forecasting
Ebrahim Farahmand, Shovito Barua Soumma, Nooshin Taheri Chatrudi, Hassan Ghasemzadeh — 16 Nov 2024

Dual-Head Knowledge Distillation: Enhancing Logits Utilization with an Auxiliary Head
Penghui Yang, Chen-Chen Zong, Sheng-Jun Huang, Lei Feng, Bo An — 13 Nov 2024

Quantifying Knowledge Distillation Using Partial Information Decomposition
Pasan Dissanayake, Faisal Hamman, Barproda Halder, Ilia Sucholutsky, Qiuyi Zhang, Sanghamitra Dutta — 12 Nov 2024

LLM-NEO: Parameter Efficient Knowledge Distillation for Large Language Models
Runming Yang, Taiqiang Wu, Jiahao Wang, Pengfei Hu, Ngai Wong, Yujiu Yang — 11 Nov 2024

Fine-Grained Reward Optimization for Machine Translation using Error Severity Mappings
Miguel Moura Ramos, Tomás Almeida, Daniel Vareta, Filipe Azevedo, Sweta Agrawal, Patrick Fernandes, André F. T. Martins — 08 Nov 2024

Transformer-Based Fault-Tolerant Control for Fixed-Wing UAVs Using Knowledge Distillation and In-Context Adaptation
Francisco Giral, Ignacio Gómez, Ricardo Vinuesa, S. L. Clainche — 05 Nov 2024
On the Impact of White-box Deployment Strategies for Edge AI on Latency and Model Performance
Jaskirat Singh, Bram Adams, Ahmed E. Hassan — 01 Nov 2024 [VLM]

Scale-Aware Recognition in Satellite Images under Resource Constraints
Shreelekha Revankar, Cheng Perng Phoo, Utkarsh Mall, Bharath Hariharan, Kavita Bala — 31 Oct 2024

Multimodality Helps Few-shot 3D Point Cloud Semantic Segmentation
Zhaochong An, Guolei Sun, Yun Liu, Runjia Li, Min Wu, Ming-Ming Cheng, Ender Konukoglu, Serge Belongie — 29 Oct 2024

Beyond Autoregression: Fast LLMs via Self-Distillation Through Time
Justin Deschenaux, Çağlar Gülçehre — 28 Oct 2024

Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA
Sangmin Bae, Adam Fisch, Hrayr Harutyunyan, Ziwei Ji, Seungyeon Kim, Tal Schuster — 28 Oct 2024 [KELM]
Sparse Decomposition of Graph Neural Networks
Yaochen Hu, Mai Zeng, Ge Zhang, Pavel Rumiantsev, Liheng Ma, Yingxue Zhang, Mark Coates — 25 Oct 2024

SWITCH: Studying with Teacher for Knowledge Distillation of Large Language Models
Jahyun Koo, Yerin Hwang, Yongil Kim, Taegwan Kang, Hyunkyung Bae, Kyomin Jung — 25 Oct 2024

High-dimensional Analysis of Knowledge Distillation: Weak-to-Strong Generalization and Scaling Laws
M. E. Ildiz, Halil Alperen Gozeten, Ege Onur Taga, Marco Mondelli, Samet Oymak — 24 Oct 2024

Self-calibration for Language Model Quantization and Pruning
Miles Williams, G. Chrysostomou, Nikolaos Aletras — 22 Oct 2024 [MQ]

Emphasizing Discriminative Features for Dataset Distillation in Complex Scenarios
Kai Wang, Zekai Li, Zhi-Qi Cheng, Samir Khaki, A. Sajedi, Ramakrishna Vedantam, Konstantinos N. Plataniotis, Alexander G. Hauptmann, Yang You — 22 Oct 2024 [DD]
MiniPLM: Knowledge Distillation for Pre-Training Language Models
Yuxian Gu, Hao Zhou, Fandong Meng, Jie Zhou, Minlie Huang — 22 Oct 2024

TIPS: Text-Image Pretraining with Spatial awareness
Kevis-Kokitsi Maninis, Kaifeng Chen, Soham Ghosh, Arjun Karpur, Koert Chen, ..., Jan Dlabal, Dan Gnanapragasam, Mojtaba Seyedhosseini, Howard Zhou, Andre Araujo — 21 Oct 2024 [VLM]

YOLO-RD: Introducing Relevant and Compact Explicit Knowledge to YOLO by Retriever-Dictionary
Hao-Tang Tsui, Chien-Yao Wang, H. Liao — 20 Oct 2024 [ObjD, VLM]

Future-Guided Learning: A Predictive Approach To Enhance Time-Series Forecasting
Skye Gunasekaran, Assel Kembay, Hugo J. Ladret, Rui-Jie Zhu, Laurent Udo Perrinet, Omid Kavehei, Jason K. Eshraghian — 19 Oct 2024 [AI4TS]
StyleDistance: Stronger Content-Independent Style Embeddings with Synthetic Parallel Examples
Ajay Patel, Jiacheng Zhu, Justin Qiu, Zachary Horvitz, Marianna Apidianaki, Kathleen McKeown, Chris Callison-Burch — 16 Oct 2024

MatryoshkaKV: Adaptive KV Compression via Trainable Orthogonal Projection
Bokai Lin, Zihao Zeng, Zipeng Xiao, Siqi Kou, Tianqi Hou, Xiaofeng Gao, Hao Zhang, Zhijie Deng — 16 Oct 2024

Speculative Knowledge Distillation: Bridging the Teacher-Student Gap Through Interleaved Sampling
Wenyuan Xu, Rujun Han, Zhenting Wang, L. Le, Dhruv Madeka, Lei Li, Wenjie Wang, Rishabh Agarwal, Chen-Yu Lee, Tomas Pfister — 15 Oct 2024

Locality Alignment Improves Vision-Language Models
Ian Covert, Tony Sun, James Zou, Tatsunori Hashimoto — 14 Oct 2024 [VLM]
Self-Data Distillation for Recovering Quality in Pruned Large Language Models
Vithursan Thangarasa, Ganesh Venkatesh, Mike Lasby, Nish Sinnadurai, Sean Lie — 13 Oct 2024 [SyDa]

Simultaneous Reward Distillation and Preference Learning: Get You a Language Model Who Can Do Both
Abhijnan Nath, Changsoo Jung, Ethan Seefried, Nikhil Krishnaswamy — 11 Oct 2024

Joint Fine-tuning and Conversion of Pretrained Speech and Language Models towards Linear Complexity
Mutian He, Philip N. Garner — 09 Oct 2024

JPEG Inspired Deep Learning
Ahmed H. Salamah, Kaixiang Zheng, Yiwen Liu, En-Hui Yang — 09 Oct 2024

DataEnvGym: Data Generation Agents in Teacher Environments with Student Feedback
Zaid Khan, Elias Stengel-Eskin, Jaemin Cho, Joey Tianyi Zhou — 08 Oct 2024 [VGen]

MIRACLE3D: Memory-efficient Integrated Robust Approach for Continual Learning on Point Clouds via Shape Model Construction
Hossein Resani, B. Nasihatkon — 08 Oct 2024 [3DV]
Efficient Inference for Large Language Model-based Generative Recommendation
Xinyu Lin, Chaoqun Yang, Wenjie Wang, Yongqi Li, Cunxiao Du, Fuli Feng, See-Kiong Ng, Tat-Seng Chua — 07 Oct 2024

Provable Weak-to-Strong Generalization via Benign Overfitting
David X. Wu, A. Sahai — 06 Oct 2024

Decoding Game: On Minimax Optimality of Heuristic Text Generation Strategies
Sijin Chen, Omar Hagrass, Jason M. Klusowski — 04 Oct 2024

Selective Attention Improves Transformer
Yaniv Leviathan, Matan Kalman, Yossi Matias — 03 Oct 2024

Dataset Distillation via Knowledge Distillation: Towards Efficient Self-Supervised Pre-Training of Deep Networks
S. Joshi, Jiayi Ni, Baharan Mirzasoleiman — 03 Oct 2024 [DD]

Unveiling AI's Blind Spots: An Oracle for In-Domain, Out-of-Domain, and Adversarial Errors
Shuangpeng Han, Mengmi Zhang — 03 Oct 2024
Backdooring Vision-Language Models with Out-Of-Distribution Data
Weimin Lyu, Jiachen Yao, Saumya Gupta, Lu Pang, Tao Sun, Lingjie Yi, Lijie Hu, Haibin Ling, Chao Chen — 02 Oct 2024 [VLM, AAML]

HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models
Seanie Lee, Haebin Seong, Dong Bok Lee, Minki Kang, Xiaoyin Chen, Dominik Wagner, Yoshua Bengio, Juho Lee, Sung Ju Hwang — 02 Oct 2024

Dual Consolidation for Pre-Trained Model-Based Domain-Incremental Learning
Da-Wei Zhou, Zi-Wen Cai, Han-Jia Ye, Lijun Zhang, De-Chuan Zhan — 01 Oct 2024 [CLL, AI4CE]

Classroom-Inspired Multi-Mentor Distillation with Adaptive Learning Strategies
Shalini Sarode, Muhammad Saif Ullah Khan, Tahira Shehzadi, Didier Stricker, Muhammad Zeshan Afzal — 30 Sep 2024

Enhancing elusive clues in knowledge learning by contrasting attention of language models
Jian Gao, Xiao Zhang, Ji Wu, Miao Li — 26 Sep 2024
Dynamic-Width Speculative Beam Decoding for Efficient LLM Inference
Zongyue Qin, Zifan He, Neha Prakriya, Jason Cong, Yizhou Sun — 25 Sep 2024

MT2KD: Towards A General-Purpose Encoder for Speech, Speaker, and Audio Events
Xiaoyu Yang, Qiujia Li, Chao Zhang, P. Woodland — 25 Sep 2024

Frequency-Guided Masking for Enhanced Vision Self-Supervised Learning
Amin Karimi Monsefi, Mengxi Zhou, Nastaran Karimi Monsefi, Ser-Nam Lim, Wei-Lun Chao, R. Ramnath — 16 Sep 2024

Your Weak LLM is Secretly a Strong Teacher for Alignment
Leitian Tao, Yixuan Li — 13 Sep 2024

What is the Role of Small Models in the LLM Era: A Survey
Lihu Chen, Gaël Varoquaux — 10 Sep 2024 [ALM]
Replay Consolidation with Label Propagation for Continual Object Detection
Riccardo De Monte, Davide Dalle Pezze, Marina Ceccon, Francesco Pasti, Francesco Paissan, Elisabetta Farella, Gian Antonio Susto, Nicola Bellotto — 09 Sep 2024

On the Complexity of Neural Computation in Superposition
Micah Adler, Nir Shavit — 05 Sep 2024

DKDM: Data-Free Knowledge Distillation for Diffusion Models with Any Architecture
Qianlong Xiang, Miao Zhang, Yuzhang Shang, Jianlong Wu, Yan Yan, Liqiang Nie — 05 Sep 2024 [DiffM]

Collaborative Learning for Enhanced Unsupervised Domain Adaptation
Minhee Cho, Hyesong Choi, Hayeon Jo, Dongbo Min — 04 Sep 2024