MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers

Neural Information Processing Systems (NeurIPS), 2020

25 February 2020

Papers citing "MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers"

50 / 877 papers shown

CompoDistill: Attention Distillation for Compositional Reasoning in Multimodal LLMs

202

10 Apr 2026

The Erosion of LLM Signatures: Can We Still Distinguish Human and LLM-Generated Scientific Ideas After Iterative Paraphrasing?

219

04 Dec 2025

OpenREAD: Reinforced Open-Ended Reasoning for End-to-End Autonomous Driving with LLM-as-Critic

Songyan Zhang

Wenhui Huang

Zhan Chen

Chua Jiahao Collister

Qihang Huang

Chen Lv

OffRL LRM

287

01 Dec 2025

Flowchart2Mermaid: A Vision-Language Model Powered System for Converting Flowcharts into Editable Diagram Code

Pritam Deka

Barry Devereux

151

01 Dec 2025

Breaking It Down: Domain-Aware Semantic Segmentation for Retrieval Augmented Generation

Aparajitha Allamraju

Maitreya Prafulla Chitale

Hiranmai Sri Adibhatla

Rahul Mishra

Manish Shrivastava

112

29 Nov 2025

CourseTimeQA: A Lecture-Video Benchmark and a Latency-Constrained Cross-Modal Fusion Method for Timestamped QA

Vsevolod Kovalev

Parteek Kumar

123

29 Nov 2025

From Topology to Retrieval: Decoding Embedding Spaces with Unified Signatures

187

27 Nov 2025

From Compound Figures to Composite Understanding: Developing a Multi-Modal LLM from Biomedical Literature with Medical Multiple-Image Benchmarking and Validation

106

27 Nov 2025

Multi-Crit: Benchmarking Multimodal Judges on Pluralistic Criteria-Following

...

287

26 Nov 2025

BAMAS: Structuring Budget-Aware Multi-Agent Systems

413

26 Nov 2025

MADRA: Multi-Agent Debate for Risk-Aware Embodied Planning

Junjian Wang

Lidan Zhao

Xi Sheryl Zhang

258

26 Nov 2025

Memories Retrieved from Many Paths: A Multi-Prefix Framework for Robust Detection of Training Data Leakage in Large Language Models

Trung Cuong Dang

David A. Mohaisen

AAML

237

25 Nov 2025

Building Domain-Specific Small Language Models via Guided Data Generation

206

23 Nov 2025

Blu-WERP (Web Extraction and Refinement Pipeline): A Scalable Pipeline for Preprocessing Large Language Model Datasets

259

22 Nov 2025

ARIAL: An Agentic Framework for Document VQA with Precise Answer Localization

Ahmad Mohammadshirazi

Pinaki Prasad Guha Neogi

Dheeraj Kulshrestha

R. Ramnath

175

22 Nov 2025

When Better Teachers Don't Make Better Students: Revisiting Knowledge Distillation for CLIP Models in VQA

Peerat Limkonchotiwat

VLM

158

22 Nov 2025

TurkColBERT: A Benchmark of Dense and Late-Interaction Models for Turkish Information Retrieval

Fatma Betül Terzioğlu

Yusuf Çelebi

Yağız Asker

VLM

192

20 Nov 2025

The Shifting Landscape of Vaccine Discourse: Insights From a Decade of Pre- to Post-COVID-19 Vaccine Posts on Social Media

PLoS ONE (PLoS ONE), 2025

20 Nov 2025

A Systematic Study of Model Extraction Attacks on Graph Foundation Models

...

173

14 Nov 2025

H-Model: Dynamic Neural Architectures for Adaptive Processing

Dmytro Hospodarchuk

120

11 Nov 2025

Think Before You Retrieve: Learning Test-Time Adaptive Search with Small Language Models

192

10 Nov 2025

Optimizing Chain-of-Thought Confidence via Topological and Dirichlet Risk Analysis

Parham Sharafoleslami

Maheep Chaudhary

LRM

153

09 Nov 2025

RAG-targeted Adversarial Attack on LLM-based Threat Detection and Mitigation Framework

232

09 Nov 2025

A Metamorphic Testing Perspective on Knowledge Distillation for Language Models of Code: Does the Student Deeply Mimic the Teacher?

Md. Abdul Awal

Mrigank Rochan

Chanchal K. Roy

248

07 Nov 2025

SARCH: Multimodal Search for Archaeological Archives

Saubhagya Singh Bhadouria

Arpan Jain

Maya Ramanath

07 Nov 2025

CARMA: Comprehensive Automatically-annotated Reddit Mental Health Dataset for Arabic

Saad Mankarious

Ayah Zirikly

AI4MH

450

05 Nov 2025

Hybrid Fact-Checking that Integrates Knowledge Graphs, Large Language Models, and Search-Based Retrieval Agents Improves Interpretable Claim Verification

189

05 Nov 2025

Do Methods to Jailbreak and Defend LLMs Generalize Across Languages?

287

01 Nov 2025

SpecAware: A Spectral-Content Aware Foundation Model for Unifying Multi-Sensor Learning in Hyperspectral Remote Sensing Mapping

141

31 Oct 2025

Elastic Architecture Search for Efficient Language Models

IEEE International Conference on Multimedia and Expo (ICME), 2025

Shang Wang

KELM

175

30 Oct 2025

Distilling Multilingual Vision-Language Models: When Smaller Models Stay Multilingual

Sukrit Sriratanawilai

Jhayahgrit Thongwat

Romrawin Chumpu

Patomporn Payoungkhamdee

Sarana Nutanong

Peerat Limkonchotiwat

VLM

201

30 Oct 2025

Vectorized Context-Aware Embeddings for GAT-Based Collaborative Filtering

Danial Ebrat

Sepideh Ahmadian

Luis Rueda

30 Oct 2025

NetEcho: From Real-World Streaming Side-Channels to Full LLM Conversation Recovery

208

29 Oct 2025

DynaStride: Dynamic Stride Windowing with MMCoT for Instructional Multi-Scene Captioning

201

27 Oct 2025

COOPERA: Continual Open-Ended Human-Robot Assistance

189

27 Oct 2025

Minimizing Human Intervention in Online Classification

152

27 Oct 2025

The Virtues of Brevity: Avoid Overthinking in Parallel Test-Time Reasoning

Raul Cavalcante Dinardi

125

24 Oct 2025

Leveraging semantic similarity for experimentation with AI-generated treatments

187

24 Oct 2025

Vision Language Models for Dynamic Human Activity Recognition in Healthcare Settings

160

24 Oct 2025

The Dog the Cat Chased Stumped the Model: Measuring When Language Models Abandon Structure for Shortcuts

187

23 Oct 2025

Restoring Pruned Large Language Models via Lost Component Compensation

212

22 Oct 2025

Data Efficient Any Transformer-to-Mamba Distillation via Attention Bridge

384

22 Oct 2025

EfficientNav: Towards On-Device Object-Goal Navigation with Navigation Map Caching and Retrieval

184

21 Oct 2025

PP3D: An In-Browser Vision-Based Defense Against Web Behavior Manipulation Attacks

114

21 Oct 2025

Unifying Inductive, Cross-Domain, and Multimodal Learning for Robust and Generalizable Recommendation

157

21 Oct 2025

AtlasKV: Augmenting LLMs with Billion-Scale Knowledge Graphs in 20GB VRAM

245

20 Oct 2025

Robustness in Text-Attributed Graph Learning: Insights, Trade-offs, and New Defenses

200

20 Oct 2025

ImaGGen: Zero-Shot Generation of Co-Speech Semantic Gestures Grounded in Language and Image Input

Hendric Voss

Stefan Kopp

SLR

331

20 Oct 2025

Fantastic (small) Retrievers and How to Train Them: mxbai-edge-colbert-v0 Tech Report

166

16 Oct 2025

219

15 Oct 2025