MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices
6 April 2020
Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, Denny Zhou [MQ]

Papers citing "MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices" (50 of 145 papers shown)
• Efficiently Scaling Transformer Inference
  Reiner Pope, Sholto Douglas, Aakanksha Chowdhery, Jacob Devlin, James Bradbury, Anselm Levskaya, Jonathan Heek, Kefan Xiao, Shivani Agrawal, J. Dean (09 Nov 2022)
• Gradient Knowledge Distillation for Pre-trained Language Models [VLM]
  Lean Wang, Lei Li, Xu Sun (02 Nov 2022)
• COST-EFF: Collaborative Optimization of Spatial and Temporal Efficiency with Slenderized Multi-exit Language Models [VLM]
  Bowen Shen, Zheng Lin, Yuanxin Liu, Zhengxiao Liu, Lei Wang, Weiping Wang (27 Oct 2022)
• Augmentation with Projection: Towards an Effective and Efficient Data Augmentation Paradigm for Distillation
  Ziqi Wang, Yuexin Wu, Frederick Liu, Daogao Liu, Le Hou, Hongkun Yu, Jing Li, Heng Ji (21 Oct 2022)
• Efficiently Controlling Multiple Risks with Pareto Testing
  Bracha Laufer-Goldshtein, Adam Fisch, Regina Barzilay, Tommi Jaakkola (14 Oct 2022)
• InFi: End-to-End Learning to Filter Input for Resource-Efficiency in Mobile-Centric Inference
  Mu Yuan, Lan Zhang, Fengxiang He, Xueting Tong, Miao-Hui Song, Zhengyuan Xu, Xiang-Yang Li (28 Sep 2022)
• Multi-stage Distillation Framework for Cross-Lingual Semantic Similarity Matching
  Kunbo Ding, Weijie Liu, Yuejian Fang, Zhe Zhao, Qi Ju, Xuefeng Yang (13 Sep 2022)
• Activity report analysis with automatic single or multispan answer extraction
  R. Choudhary, A. Sridhar, Erik M. Visser (09 Sep 2022)
• Efficient Methods for Natural Language Processing: A Survey
  Marcos Vinícius Treviso, Ji-Ung Lee, Tianchu Ji, Betty van Aken, Qingqing Cao, ..., Emma Strubell, Niranjan Balasubramanian, Leon Derczynski, Iryna Gurevych, Roy Schwartz (31 Aug 2022)
• PANDA: Prompt Transfer Meets Knowledge Distillation for Efficient Model Adaptation [VLM, CLL]
  Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, Dacheng Tao (22 Aug 2022)
• Building an Efficiency Pipeline: Commutativity and Cumulativeness of Efficiency Operators for Transformers
  Ji Xin, Raphael Tang, Zhiying Jiang, Yaoliang Yu, Jimmy J. Lin (31 Jul 2022)
• Device-Cloud Collaborative Recommendation via Meta Controller
  Jiangchao Yao, Feng Wang, Xichen Ding, Shaohu Chen, Bo Han, Jingren Zhou, Hongxia Yang (07 Jul 2022)
• Knowledge Distillation of Transformer-based Language Models Revisited [VLM]
  Chengqiang Lu, Jianwei Zhang, Yunfei Chu, Zhengyu Chen, Jingren Zhou, Fei Wu, Haiqing Chen, Hongxia Yang (29 Jun 2022)
• All Mistakes Are Not Equal: Comprehensive Hierarchy Aware Multi-label Predictions (CHAMP)
  A. Vaswani, Gaurav Aggarwal, Praneeth Netrapalli, N. Hegde (17 Jun 2022)
• ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers [VLM, MQ]
  Z. Yao, Reza Yazdani Aminabadi, Minjia Zhang, Xiaoxia Wu, Conglong Li, Yuxiong He (04 Jun 2022)
• A Closer Look at Self-Supervised Lightweight Vision Transformers [ViT]
  Shaoru Wang, Jin Gao, Zeming Li, Jian Sun, Weiming Hu (28 May 2022)
• A Fast Attention Network for Joint Intent Detection and Slot Filling on Edge Devices
  Liang Huang, Senjie Liang, Feiyang Ye, Nan Gao (16 May 2022)
• Chemical transformer compression for accelerating both training and inference of molecular modeling
  Yi Yu, K. Börjesson (16 May 2022)
• Adaptable Adapters
  N. Moosavi, Quentin Delfosse, Kristian Kersting, Iryna Gurevych (03 May 2022)
• Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks [VLM, OffRL]
  Zhecan Wang, Noel Codella, Yen-Chun Chen, Luowei Zhou, Xiyang Dai, ..., Jianwei Yang, Haoxuan You, Kai-Wei Chang, Shih-Fu Chang, Lu Yuan (22 Apr 2022)
• MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation [MoE]
  Simiao Zuo, Qingru Zhang, Chen Liang, Pengcheng He, T. Zhao, Weizhu Chen (15 Apr 2022)
• CILDA: Contrastive Data Augmentation using Intermediate Layer Knowledge Distillation [CLL]
  Md. Akmal Haidar, Mehdi Rezagholizadeh, Abbas Ghaddar, Khalil Bibi, Philippe Langlais, Pascal Poupart (15 Apr 2022)
• MiniViT: Compressing Vision Transformers with Weight Multiplexing [ViT]
  Jinnian Zhang, Houwen Peng, Kan Wu, Mengchen Liu, Bin Xiao, Jianlong Fu, Lu Yuan (14 Apr 2022)
• Redwood: Using Collision Detection to Grow a Large-Scale Intent Classification Dataset
  Stefan Larson, Kevin Leach (12 Apr 2022)
• Searching for Efficient Neural Architectures for On-Device ML on Edge TPUs
  Berkin Akin, Suyog Gupta, Yun Long, Anton Spiridonov, Zhuo Wang, Marie White, Haonan Xu, Ping Zhou, Yanqi Zhou (09 Apr 2022)
• Structured Pruning Learns Compact and Accurate Models [VLM]
  Mengzhou Xia, Zexuan Zhong, Danqi Chen (01 Apr 2022)
• Compression of Generative Pre-trained Language Models via Quantization [MQ]
  Chaofan Tao, Lu Hou, Wei Zhang, Lifeng Shang, Xin Jiang, Qun Liu, Ping Luo, Ngai Wong (21 Mar 2022)
• Delta Keyword Transformer: Bringing Transformers to the Edge through Dynamically Pruned Multi-Head Self-Attention
  Zuzana Jelčicová, Marian Verhelst (20 Mar 2022)
• Dynamic N:M Fine-grained Structured Sparse Attention Mechanism
  Zhaodong Chen, Yuying Quan, Zheng Qu, L. Liu, Yufei Ding, Yuan Xie (28 Feb 2022)
• Short-answer scoring with ensembles of pretrained language models
  Christopher M. Ormerod (23 Feb 2022)
• ZeroGen: Efficient Zero-shot Learning via Dataset Generation [SyDa]
  Jiacheng Ye, Jiahui Gao, Qintong Li, Hang Xu, Jiangtao Feng, Zhiyong Wu, Tao Yu, Lingpeng Kong (16 Feb 2022)
• Constrained Optimization with Dynamic Bound-scaling for Effective NLP Backdoor Defense [AAML]
  Guangyu Shen, Yingqi Liu, Guanhong Tao, Qiuling Xu, Zhuo Zhang, Shengwei An, Shiqing Ma, Xinming Zhang (11 Feb 2022)
• pNLP-Mixer: an Efficient all-MLP Architecture for Language
  Francesco Fusco, Damian Pascual, Peter W. J. Staar, Diego Antognini (09 Feb 2022)
• Ensemble Transformer for Efficient and Accurate Ranking Tasks: an Application to Question Answering Systems
  Yoshitomo Matsubara, Luca Soldaini, Eric Lind, Alessandro Moschitti (15 Jan 2022)
• DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale
  Samyam Rajbhandari, Conglong Li, Z. Yao, Minjia Zhang, Reza Yazdani Aminabadi, A. A. Awan, Jeff Rasley, Yuxiong He (14 Jan 2022)
• Transfer-Tuning: Reusing Auto-Schedules for Efficient Tensor Program Code Generation
  Perry Gibson, José Cano (14 Jan 2022)
• ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation [ELM]
  Shuohuan Wang, Yu Sun, Yang Xiang, Zhihua Wu, Siyu Ding, ..., Tian Wu, Wei Zeng, Ge Li, Wen Gao, Haifeng Wang (23 Dec 2021)
• Distilling the Knowledge of Romanian BERTs Using Multiple Teachers
  Andrei-Marius Avram, Darius Catrina, Dumitru-Clementin Cercel, Mihai Dascălu, Traian Rebedea, Vasile Păiș, Dan Tufiș (23 Dec 2021)
• Pruning Pretrained Encoders with a Multitask Objective
  Patrick Xia, Richard Shin (10 Dec 2021)
• VIRT: Improving Representation-based Models for Text Matching through Virtual Interaction
  Dan Li, Yang Yang, Hongyin Tang, Jingang Wang, Tong Xu, Wei Yu Wu, Enhong Chen (08 Dec 2021)
• NN-LUT: Neural Approximation of Non-Linear Operations for Efficient Transformer Inference
  Joonsang Yu, Junki Park, Seongmin Park, Minsoo Kim, Sihwa Lee, Dong Hyun Lee, Jungwook Choi (03 Dec 2021)
• Hierarchical Knowledge Distillation for Dialogue Sequence Labeling
  Shota Orihashi, Yoshihiro Yamazaki, Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Ryo Masumura (22 Nov 2021)
• Character-level HyperNetworks for Hate Speech Detection
  Tomer Wullach, A. Adler, Einat Minkov (11 Nov 2021)
• Prune Once for All: Sparse Pre-Trained Language Models [VLM]
  Ofir Zafrir, Ariel Larey, Guy Boudoukh, Haihao Shen, Moshe Wasserblat (10 Nov 2021)
• NxMTransformer: Semi-Structured Sparsification for Natural Language Understanding via ADMM
  Connor Holmes, Minjia Zhang, Yuxiong He, Bo Wu (28 Oct 2021)
• Vis-TOP: Visual Transformer Overlay Processor [BDL, ViT]
  Wei Hu, Dian Xu, Zimeng Fan, Fang Liu, Yanxiang He (21 Oct 2021)
• Sparse Distillation: Speeding Up Text Classification by Using Bigger Student Models
  Qinyuan Ye, Madian Khabsa, M. Lewis, Sinong Wang, Xiang Ren, Aaron Jaech (16 Oct 2021)
• Kronecker Decomposition for GPT Compression
  Ali Edalati, Marzieh S. Tahaei, Ahmad Rashid, V. Nia, J. Clark, Mehdi Rezagholizadeh (15 Oct 2021)
• Towards Efficient NLP: A Standard Evaluation and A Strong Baseline [ELM]
  Xiangyang Liu, Tianxiang Sun, Junliang He, Jiawen Wu, Lingling Wu, Xinyu Zhang, Hao Jiang, Bo Zhao, Xuanjing Huang, Xipeng Qiu (13 Oct 2021)
• MoEfication: Transformer Feed-forward Layers are Mixtures of Experts [MoE]
  Zhengyan Zhang, Yankai Lin, Zhiyuan Liu, Peng Li, Maosong Sun, Jie Zhou (05 Oct 2021)