MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers

Neural Information Processing Systems (NeurIPS), 2020
25 February 2020
Wenhui Wang
Furu Wei
Li Dong
Hangbo Bao
Nan Yang
Ming Zhou
    VLM

Papers citing "MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers"

50 / 877 papers shown
CompoDistill: Attention Distillation for Compositional Reasoning in Multimodal LLMs
Jiwan Kim
Kibum Kim
Sangwoo Seo
Chanyoung Park
VLM CoGe LRM
202
5
0
10 Apr 2026
The Erosion of LLM Signatures: Can We Still Distinguish Human and LLM-Generated Scientific Ideas After Iterative Paraphrasing?
Sadat Shahriar
Navid Ayoobi
Arjun Mukherjee
DeLMO
219
0
0
04 Dec 2025
OpenREAD: Reinforced Open-Ended Reasoning for End-to-End Autonomous Driving with LLM-as-Critic
Songyan Zhang
Wenhui Huang
Zhan Chen
Chua Jiahao Collister
Qihang Huang
Chen Lv
OffRL LRM
287
4
0
01 Dec 2025
Flowchart2Mermaid: A Vision-Language Model Powered System for Converting Flowcharts into Editable Diagram Code
Pritam Deka
Barry Devereux
151
0
0
01 Dec 2025
Breaking It Down: Domain-Aware Semantic Segmentation for Retrieval Augmented Generation
Aparajitha Allamraju
Maitreya Prafulla Chitale
Hiranmai Sri Adibhatla
Rahul Mishra
Manish Shrivastava
112
0
0
29 Nov 2025
CourseTimeQA: A Lecture-Video Benchmark and a Latency-Constrained Cross-Modal Fusion Method for Timestamped QA
Vsevolod Kovalev
Parteek Kumar
123
0
0
29 Nov 2025
From Topology to Retrieval: Decoding Embedding Spaces with Unified Signatures
Florian Rottach
William Rudman
Bastian Rieck
Harrisen Scells
Carsten Eickhoff
187
0
0
27 Nov 2025
From Compound Figures to Composite Understanding: Developing a Multi-Modal LLM from Biomedical Literature with Medical Multiple-Image Benchmarking and Validation
Zhen Chen
Y. Fu
Gabriel Madera
Mauro Giuffre
Serina S Applebaum
Hyunjae Kim
Hua Xu
Qingyu Chen
106
0
0
27 Nov 2025
Multi-Crit: Benchmarking Multimodal Judges on Pluralistic Criteria-Following
Tianyi Xiong
Yi Ge
Ming Li
Zuolong Zhang
Pranav Kulkarni
...
Yanshuo Chen
X. Wang
Renrui Zhang
Wenhu Chen
Heng Huang
ELM
287
7
0
26 Nov 2025
BAMAS: Structuring Budget-Aware Multi-Agent Systems
Liming Yang
Junyu Luo
Xuanzhe Liu
Yiling Lou
Zhenpeng Chen
LLMAG
413
0
0
26 Nov 2025
MADRA: Multi-Agent Debate for Risk-Aware Embodied Planning
Junjian Wang
Lidan Zhao
Xi Sheryl Zhang
258
0
0
26 Nov 2025
Memories Retrieved from Many Paths: A Multi-Prefix Framework for Robust Detection of Training Data Leakage in Large Language Models
Trung Cuong Dang
David A. Mohaisen
AAML
237
2
0
25 Nov 2025
Building Domain-Specific Small Language Models via Guided Data Generation
Aman Kumar
Ekant Muljibhai Amin
Xian Yeow Lee
Lasitha Vidyaratne
Ahmed K. Farahat
Dipanjan Ghosh
Yuta Koreeda
Chetan Gupta
ALM
206
0
0
23 Nov 2025
Blu-WERP (Web Extraction and Refinement Pipeline): A Scalable Pipeline for Preprocessing Large Language Model Datasets
Gowtham
Sai Rupesh
Sanjay Kumar
Saravanan
Venkata Chaithanya
VLM
259
1
0
22 Nov 2025
ARIAL: An Agentic Framework for Document VQA with Precise Answer Localization
Ahmad Mohammadshirazi
Pinaki Prasad Guha Neogi
Dheeraj Kulshrestha
R. Ramnath
175
0
0
22 Nov 2025
When Better Teachers Don't Make Better Students: Revisiting Knowledge Distillation for CLIP Models in VQA
Pume Tuchinda
Parinthapat Pengpun
Romrawin Chumpu
Sarana Nutanong
Peerat Limkonchotiwat
VLM
158
0
0
22 Nov 2025
TurkColBERT: A Benchmark of Dense and Late-Interaction Models for Turkish Information Retrieval
Özay Ezerceli
Mahmoud El Hussieni
Selva Taş
Reyhan Bayraktar
Fatma Betül Terzioğlu
Yusuf Çelebi
Yağız Asker
VLM
192
0
0
20 Nov 2025
The Shifting Landscape of Vaccine Discourse: Insights From a Decade of Pre- to Post-COVID-19 Vaccine Posts on Social Media
PLoS ONE, 2025
Nikesh Gyawali
Doina Caragea
Cornelia Caragea
Saif M. Mohammad
94
0
0
20 Nov 2025
A Systematic Study of Model Extraction Attacks on Graph Foundation Models
Haoyan Xu
Ruizhi Qian
Jiate Li
Yushun Dong
Minghao Lin
...
Qinghua Liu
Junhao Dong
Ruopeng Huang
Yue Zhao
Mengyuan Li
AAML
173
1
0
14 Nov 2025
H-Model: Dynamic Neural Architectures for Adaptive Processing
Dmytro Hospodarchuk
120
0
0
11 Nov 2025
Think Before You Retrieve: Learning Test-Time Adaptive Search with Small Language Models
Supriti Vijay
Aman Priyanshu
Anu Vellore
Baturay Saglam
Amin Karbasi
LRM
192
0
0
10 Nov 2025
Optimizing Chain-of-Thought Confidence via Topological and Dirichlet Risk Analysis
Abhishek More
Anthony Zhang
Nicole Bonilla
Ashvik Vivekan
Kevin Zhu
Parham Sharafoleslami
Maheep Chaudhary
LRM
153
1
0
09 Nov 2025
RAG-targeted Adversarial Attack on LLM-based Threat Detection and Mitigation Framework
Seif Ikbarieh
Kshitiz Aryal
Maanak Gupta
AAML
232
1
0
09 Nov 2025
A Metamorphic Testing Perspective on Knowledge Distillation for Language Models of Code: Does the Student Deeply Mimic the Teacher?
Md. Abdul Awal
Mrigank Rochan
Chanchal K. Roy
248
1
0
07 Nov 2025
SARCH: Multimodal Search for Archaeological Archives
Nivedita Sinha
Bharati Khanijo
Sanskar Singh
Priyansh Mahant
Ashutosh Roy
Saubhagya Singh Bhadouria
Arpan Jain
Maya Ramanath
83
0
0
07 Nov 2025
CARMA: Comprehensive Automatically-annotated Reddit Mental Health Dataset for Arabic
Saad Mankarious
Ayah Zirikly
AI4MH
450
3
0
05 Nov 2025
Hybrid Fact-Checking that Integrates Knowledge Graphs, Large Language Models, and Search-Based Retrieval Agents Improves Interpretable Claim Verification
Shaghayegh Kolli
Richard Rosenbaum
Timo Cavelius
Lasse Strothe
Andrii Lata
Jana Diesner
KELM
189
1
0
05 Nov 2025
Do Methods to Jailbreak and Defend LLMs Generalize Across Languages?
Berk Atil
R. Passonneau
Fred Morstatter
AAML
287
1
0
01 Nov 2025
SpecAware: A Spectral-Content Aware Foundation Model for Unifying Multi-Sensor Learning in Hyperspectral Remote Sensing Mapping
Renjie Ji
Xue Wang
Chao Niu
Wen Zhang
Yong Mei
Kun Tan
141
1
0
31 Oct 2025
Elastic Architecture Search for Efficient Language Models
IEEE International Conference on Multimedia and Expo (ICME), 2025
Shang Wang
KELM
175
0
0
30 Oct 2025
Distilling Multilingual Vision-Language Models: When Smaller Models Stay Multilingual
Sukrit Sriratanawilai
Jhayahgrit Thongwat
Romrawin Chumpu
Patomporn Payoungkhamdee
Sarana Nutanong
Peerat Limkonchotiwat
VLM
201
0
0
30 Oct 2025
Vectorized Context-Aware Embeddings for GAT-Based Collaborative Filtering
Danial Ebrat
Sepideh Ahmadian
Luis Rueda
73
1
0
30 Oct 2025
NetEcho: From Real-World Streaming Side-Channels to Full LLM Conversation Recovery
Z. Zhang
Guanlong Wu
Sen Deng
Shuai Wang
Y. Zhang
208
0
0
29 Oct 2025
DynaStride: Dynamic Stride Windowing with MMCoT for Instructional Multi-Scene Captioning
Eddison Pham
Prisha Priyadarshini
Adrian Maliackel
Kanishk Bandi
Cristian Meo
Kevin Zhu
201
0
0
27 Oct 2025
COOPERA: Continual Open-Ended Human-Robot Assistance
Chenyang Ma
Kai Lu
Ruta Desai
Xavier Puig
Andrew Markham
Niki Trigoni
189
4
0
27 Oct 2025
Minimizing Human Intervention in Online Classification
William Réveillard
Vasileios Saketos
Alexandre Proutière
Richard Combes
152
0
0
27 Oct 2025
The Virtues of Brevity: Avoid Overthinking in Parallel Test-Time Reasoning
Raul Cavalcante Dinardi
Bruno Yamamoto
A. H. R. Costa
Artur Jordao
LRM
125
0
0
24 Oct 2025
Leveraging semantic similarity for experimentation with AI-generated treatments
Lei Shi
David Arbour
Raghavendra Addanki
Ritwik Sinha
Avi Feller
187
1
0
24 Oct 2025
Vision Language Models for Dynamic Human Activity Recognition in Healthcare Settings
Abderrazek Abid
Thanh-Cong Ho
Fakhri Karray
VLM
160
2
0
24 Oct 2025
The Dog the Cat Chased Stumped the Model: Measuring When Language Models Abandon Structure for Shortcuts
Sangmitra Madhusudan
Kaige Chen
Ali Emami
ELM LRM
187
0
0
23 Oct 2025
Restoring Pruned Large Language Models via Lost Component Compensation
Zijian Feng
Hanzhang Zhou
Zixiao Zhu
Tianjiao Li
Jia Jim Deryl Chua
Lee Onn Mak
Gee Wah Ng
Kezhi Mao
212
2
0
22 Oct 2025
Data Efficient Any Transformer-to-Mamba Distillation via Attention Bridge
Penghao Wang
Yuhao Zhou
Mengxuan Wu
Panpan Zhang
Zhangyang Wang
Kai Wang
Mamba
384
0
0
22 Oct 2025
EfficientNav: Towards On-Device Object-Goal Navigation with Navigation Map Caching and Retrieval
Zebin Yang
Sunjian Zheng
Tong Xie
Tianshi Xu
Bo Yu
Fan Wang
Jie Tang
Shaoshan Liu
Meng Li
184
3
0
21 Oct 2025
PP3D: An In-Browser Vision-Based Defense Against Web Behavior Manipulation Attacks
Spencer King
Irfan Ozen
Karthika Subramani
Saranyan Senthivel
Phani Vadrevu
R. Perdisci
AAML
114
0
0
21 Oct 2025
Unifying Inductive, Cross-Domain, and Multimodal Learning for Robust and Generalizable Recommendation
Chanyoung Chung
Kyeongryul Lee
Sunbin Park
Joyce Jiyoung Whang
HAI
157
0
0
21 Oct 2025
AtlasKV: Augmenting LLMs with Billion-Scale Knowledge Graphs in 20GB VRAM
Haoyu Huang
Hong Ting Tsang
Jiaxin Bai
Xi Peng
Gong Zhang
Yangqiu Song
RALM SLR
245
1
0
20 Oct 2025
Robustness in Text-Attributed Graph Learning: Insights, Trade-offs, and New Defenses
Runlin Lei
Lu Yi
Mingguo He
Pengyu Qiu
Zhewei Wei
Yongchao Liu
Chuntao Hong
AAML
200
0
0
20 Oct 2025
ImaGGen: Zero-Shot Generation of Co-Speech Semantic Gestures Grounded in Language and Image Input
Hendric Voss
Stefan Kopp
SLR
331
0
0
20 Oct 2025
Fantastic (small) Retrievers and How to Train Them: mxbai-edge-colbert-v0 Tech Report
Rikiya Takehi
Benjamin Clavié
Sean Lee
Aamir Shakir
VLM
166
4
0
16 Oct 2025
BitNet Distillation
Xun Wu
Shaohan Huang
Wenhui Wang
Ting Song
Li Dong
Yan Xia
Furu Wei
MQ
219
0
0
15 Oct 2025