ResearchTrend.AI
MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers

25 February 2020
Wenhui Wang
Furu Wei
Li Dong
Hangbo Bao
Nan Yang
Ming Zhou
    VLM

Papers citing "MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers"

50 / 144 papers shown
SMUTF: Schema Matching Using Generative Tags and Hybrid Features
Yu Zhang
Mei Di
Haozheng Luo
Chenwei Xu
Richard Tzong-Han Tsai
54
7
0
22 Jan 2024
Knowledge Fusion of Large Language Models
Fanqi Wan
Xinting Huang
Deng Cai
Xiaojun Quan
Wei Bi
Shuming Shi
MoMe
27
61
0
19 Jan 2024
An Empirical Study of Scaling Law for OCR
Miao Rang
Zhenni Bi
Chuanjian Liu
Yunhe Wang
Kai Han
27
6
0
29 Dec 2023
Vulnerability Analysis of Transformer-based Optical Character Recognition to Adversarial Attacks
Lucas Beerens
D. Higham
21
1
0
28 Nov 2023
How Well Do Large Language Models Truly Ground?
Hyunji Lee
Se June Joo
Chaeeun Kim
Joel Jang
Doyoung Kim
Kyoung-Woon On
Minjoon Seo
HILM
21
6
0
15 Nov 2023
SAGE: Smart home Agent with Grounded Execution
D. Rivkin
F. Hogan
Amal Feriani
Abhisek Konar
Adam Sigal
Steve Liu
Gregory Dudek
LM&Ro
LLMAG
ELM
LRM
18
3
0
01 Nov 2023
Kiki or Bouba? Sound Symbolism in Vision-and-Language Models
Morris Alper
Hadar Averbuch-Elor
28
10
0
25 Oct 2023
Let's Synthesize Step by Step: Iterative Dataset Synthesis with Large Language Models by Extrapolating Errors from Small Models
Ruida Wang
Wangchunshu Zhou
Mrinmaya Sachan
19
32
0
20 Oct 2023
IDTraffickers: An Authorship Attribution Dataset to link and connect Potential Human-Trafficking Operations on Text Escort Advertisements
V. Saxena
Benjamin Bashpole
Gijs Van Dijck
Gerasimos Spanakis
37
2
0
09 Oct 2023
Papeos: Augmenting Research Papers with Talk Videos
Tae Soo Kim
Matt Latzke
Jonathan Bragg
Amy X. Zhang
Joseph Chee Chang
14
10
0
29 Aug 2023
Accurate Retraining-free Pruning for Pretrained Encoder-based Language Models
Seungcheol Park
Ho-Jin Choi
U. Kang
VLM
25
5
0
07 Aug 2023
Towards General Text Embeddings with Multi-stage Contrastive Learning
Zehan Li
Xin Zhang
Yanzhao Zhang
Dingkun Long
Pengjun Xie
Meishan Zhang
47
336
0
07 Aug 2023
DARE: Towards Robust Text Explanations in Biomedical and Healthcare Applications
Adam Ivankay
Mattia Rigotti
P. Frossard
OOD
MedIm
16
1
0
05 Jul 2023
GIO: Gradient Information Optimization for Training Dataset Selection
Dante Everaert
Christopher Potts
19
3
0
20 Jun 2023
Scalable Performance Analysis for Vision-Language Models
Santiago Castro
Oana Ignat
Rada Mihalcea
VLM
19
1
0
30 May 2023
Referral Augmentation for Zero-Shot Information Retrieval
Michael Tang
Shunyu Yao
John Yang
Karthik Narasimhan
19
3
0
24 May 2023
Evaluating Prompt-based Question Answering for Object Prediction in the Open Research Knowledge Graph
Jennifer D'Souza
Moussab Hrou
Sören Auer
19
2
0
22 May 2023
Farewell to Aimless Large-scale Pretraining: Influential Subset Selection for Language Model
Xiao Wang
Wei Zhou
Qi Zhang
Jie Zhou
Songyang Gao
Junzhe Wang
Menghan Zhang
Xiang Gao
Yunwen Chen
Tao Gui
34
7
0
22 May 2023
MemoryBank: Enhancing Large Language Models with Long-Term Memory
Wanjun Zhong
Lianghong Guo
Qi-Fei Gao
He Ye
Yanlin Wang
LLMAG
RALM
KELM
28
123
0
17 May 2023
Curating corpora with classifiers: A case study of clean energy sentiment online
M. V. Arnold
P. Dodds
C. Danforth
16
0
0
04 May 2023
RIFormer: Keep Your Vision Backbone Effective While Removing Token Mixer
Jiahao Wang
Songyang Zhang
Yong Liu
Taiqiang Wu
Yujiu Yang
Xihui Liu
Kai-xiang Chen
Ping Luo
Dahua Lin
11
20
0
12 Apr 2023
Towards Efficient Fine-tuning of Pre-trained Code Models: An Experimental Study and Beyond
Ensheng Shi
Yanlin Wang
Hongyu Zhang
Lun Du
Shi Han
Dongmei Zhang
Hongbin Sun
28
41
0
11 Apr 2023
On Codex Prompt Engineering for OCL Generation: An Empirical Study
Seif Abukhalaf
Mohammad Hamdaqa
Foutse Khomh
26
20
0
28 Mar 2023
The Semantic Reader Project: Augmenting Scholarly Documents through AI-Powered Interactive Reading Interfaces
Kyle Lo
Joseph Chee Chang
Andrew Head
Jonathan Bragg
Amy X. Zhang
...
Caroline M Wu
Jiangjiang Yang
Angele Zamarron
Marti A. Hearst
Daniel S. Weld
19
19
0
25 Mar 2023
VideoXum: Cross-modal Visual and Textural Summarization of Videos
Jingyang Lin
Hang Hua
Ming Chen
Yikang Li
Jenhao Hsiao
C. Ho
Jiebo Luo
23
30
0
21 Mar 2023
Neural Architecture Search for Effective Teacher-Student Knowledge Transfer in Language Models
Aashka Trivedi
Takuma Udagawa
Michele Merler
Rameswar Panda
Yousef El-Kurdi
Bishwaranjan Bhattacharjee
16
6
0
16 Mar 2023
Smooth and Stepwise Self-Distillation for Object Detection
Jieren Deng
Xiaoxia Zhou
Hao Tian
Zhihong Pan
Derek Aguiar
ObjD
13
0
0
09 Mar 2023
KS-DETR: Knowledge Sharing in Attention Learning for Detection Transformer
Kaikai Zhao
Norimichi Ukita
MU
23
1
0
22 Feb 2023
HomoDistil: Homotopic Task-Agnostic Distillation of Pre-trained Transformers
Chen Liang
Haoming Jiang
Zheng Li
Xianfeng Tang
Bin Yin
Tuo Zhao
VLM
16
24
0
19 Feb 2023
Few-shot Multimodal Multitask Multilingual Learning
Aman Chadha
Vinija Jain
34
0
0
19 Feb 2023
Revisiting Intermediate Layer Distillation for Compressing Language Models: An Overfitting Perspective
Jongwoo Ko
Seungjoon Park
Minchan Jeong
S. Hong
Euijai Ahn
Duhyeuk Chang
Se-Young Yun
21
6
0
03 Feb 2023
idT5: Indonesian Version of Multilingual T5 Transformer
Mukhlish Fuadi
A. Wibawa
S. Sumpeno
6
6
0
02 Feb 2023
Improved Knowledge Distillation for Pre-trained Language Models via Knowledge Selection
Chenglong Wang
Yi Lu
Yongyu Mu
Yimin Hu
Tong Xiao
Jingbo Zhu
24
8
0
01 Feb 2023
Music Playlist Title Generation Using Artist Information
Haven Kim
Seungheon Doh
Junwon Lee
Juhan Nam
24
3
0
14 Jan 2023
InPars-Light: Cost-Effective Unsupervised Training of Efficient Rankers
Leonid Boytsov
Preksha Patel
Vivek Sourabh
Riddhi Nisar
Sayan Kundu
R. Ramanathan
Eric Nyberg
19
19
0
08 Jan 2023
Multi-label Few-shot ICD Coding as Autoregressive Generation with Prompt
Zhichao Yang
Sunjae Kwon
Zonghai Yao
Hongfeng Yu
13
17
0
24 Nov 2022
MGTCOM: Community Detection in Multimodal Graphs
E. Dmitriev
M. Chekol
S. Wang
25
0
0
10 Nov 2022
Gradient Knowledge Distillation for Pre-trained Language Models
Lean Wang
Lei Li
Xu Sun
VLM
16
5
0
02 Nov 2022
Multimodal Transformer Distillation for Audio-Visual Synchronization
Xuan-Bo Chen
Haibin Wu
Chung-Che Wang
Hung-yi Lee
J. Jang
19
3
0
27 Oct 2022
MTEB: Massive Text Embedding Benchmark
Niklas Muennighoff
Nouamane Tazi
L. Magne
Nils Reimers
21
369
0
13 Oct 2022
Short Text Pre-training with Extended Token Classification for E-commerce Query Understanding
Haoming Jiang
Tianyu Cao
Zheng Li
Cheng-hsin Luo
Xianfeng Tang
Qingyu Yin
Danqing Zhang
R. Goutam
Bing Yin
RALM
16
11
0
08 Oct 2022
An Embedding-Based Grocery Search Model at Instacart
Yuqing Xie
Taesik Na
X. Xiao
Saurav Manchanda
Young Rao
Zhihong Xu
Guanghua Shu
Esther Vasiete
Tejaswi Tenneti
Haixun Wang
DML
RALM
16
6
0
12 Sep 2022
Towards explainable evaluation of language models on the semantic similarity of visual concepts
Maria Lymperaiou
George Manoliadis
Orfeas Menis-Mastromichalakis
Edmund Dervakos
Giorgos Stamou
AAML
16
5
0
08 Sep 2022
Why So Toxic? Measuring and Triggering Toxic Behavior in Open-Domain Chatbots
Waiman Si
Michael Backes
Jeremy Blackburn
Emiliano De Cristofaro
Gianluca Stringhini
Savvas Zannettou
Yang Zhang
19
58
0
07 Sep 2022
Evaluating Dense Passage Retrieval using Transformers
Nima Sadri
11
0
0
15 Aug 2022
Threddy: An Interactive System for Personalized Thread-based Exploration and Organization of Scientific Literature
Hyeonsu B Kang
Joseph Chee Chang
Yongsung Kim
A. Kittur
22
39
0
06 Aug 2022
Towards trustworthy Energy Disaggregation: A review of challenges, methods and perspectives for Non-Intrusive Load Monitoring
Maria Kaselimi
Eftychios E. Protopapadakis
A. Voulodimos
N. Doulamis
Anastasios Doulamis
20
64
0
05 Jul 2022
An End-to-End Set Transformer for User-Level Classification of Depression and Gambling Disorder
Ana-Maria Bucur
Adrian Cosma
Liviu P. Dinu
Paolo Rosso
23
8
0
02 Jul 2022
Recall Distortion in Neural Network Pruning and the Undecayed Pruning Algorithm
Aidan Good
Jia-Huei Lin
Hannah Sieg
Mikey Ferguson
Xin Yu
Shandian Zhe
J. Wieczorek
Thiago Serra
14
11
0
07 Jun 2022
ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers
Z. Yao
Reza Yazdani Aminabadi
Minjia Zhang
Xiaoxia Wu
Conglong Li
Yuxiong He
VLM
MQ
34
438
0
04 Jun 2022