Pay Less Attention with Lightweight and Dynamic Convolutions

International Conference on Learning Representations (ICLR), 2019
29 January 2019
Felix Wu
Angela Fan
Alexei Baevski
Yann N. Dauphin
Michael Auli

Papers citing "Pay Less Attention with Lightweight and Dynamic Convolutions"

50 / 337 papers shown
Enable Deep Learning on Mobile Devices: Methods, Systems, and Applications
Han Cai
Ji Lin
Chengyue Wu
Zhijian Liu
Haotian Tang
Hanrui Wang
Ligeng Zhu
Song Han
226
132
0
25 Apr 2022
BLISS: Robust Sequence-to-Sequence Learning via Self-Supervised Input Representation
Zheng Zhang
Liang Ding
Dazhao Cheng
Xuebo Liu
Min Zhang
Dacheng Tao
139
11
0
16 Apr 2022
A Call for Clarity in Beam Search: How It Works and When It Stops
International Conference on Language Resources and Evaluation (LREC), 2022
Jungo Kasai
Keisuke Sakaguchi
Ronan Le Bras
Dragomir R. Radev
Yejin Choi
Noah A. Smith
253
9
0
11 Apr 2022
MAESTRO: Matched Speech Text Representations through Modality Matching
Interspeech (Interspeech), 2022
Zhehuai Chen
Yu Zhang
Andrew Rosenberg
Bhuvana Ramabhadran
Pedro J. Moreno
Ankur Bapna
Heiga Zen
213
119
0
07 Apr 2022
Paying More Attention to Self-attention: Improving Pre-trained Language Models via Attention Guiding
Shanshan Wang
Zhumin Chen
Zhaochun Ren
Huasheng Liang
Qiang Yan
Sudipta Singha Roy
116
10
0
06 Apr 2022
MaxViT: Multi-Axis Vision Transformer
European Conference on Computer Vision (ECCV), 2022
Zhengzhong Tu
Hossein Talebi
Han Zhang
Feng Yang
P. Milanfar
A. Bovik
Yinxiao Li
ViT
459
867
0
04 Apr 2022
COOL, a Context Outlooker, and its Application to Question Answering and other Natural Language Processing Tasks
International Joint Conference on Artificial Intelligence (IJCAI), 2022
Fangyi Zhu
See-Kiong Ng
S. Bressan
LRM
133
1
0
01 Apr 2022
Logit Normalization for Long-tail Object Detection
International Journal of Computer Vision (IJCV), 2022
Liang Zhao
Yao Teng
Limin Wang
213
17
0
31 Mar 2022
Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs
Computer Vision and Pattern Recognition (CVPR), 2022
Xiaohan Ding
Xinming Zhang
Yi Zhou
Jungong Han
Guiguang Ding
Jian Sun
VLM
313
662
0
13 Mar 2022
Look Backward and Forward: Self-Knowledge Distillation with Bidirectional Decoder for Neural Machine Translation
Xuan Zhang
Libin Shen
Disheng Pan
Liangguo Wang
Yanjun Miao
163
1
0
10 Mar 2022
EdgeFormer: A Parameter-Efficient Transformer for On-Device Seq2seq Generation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Tao Ge
Si-Qing Chen
Furu Wei
MoE
284
28
0
16 Feb 2022
General-purpose, long-context autoregressive modeling with Perceiver AR
International Conference on Machine Learning (ICML), 2022
Curtis Hawthorne
Andrew Jaegle
Cătălina Cangea
Sebastian Borgeaud
C. Nash
...
Hannah R. Sheahan
Neil Zeghidour
Jean-Baptiste Alayrac
João Carreira
Jesse Engel
208
74
0
15 Feb 2022
Improving Neural Machine Translation by Denoising Training
Liang Ding
Keqin Peng
Dacheng Tao
VLM
AI4CE
185
6
0
19 Jan 2022
PAEG: Phrase-level Adversarial Example Generation for Neural Machine Translation
International Conference on Computational Linguistics (COLING), 2022
Juncheng Wan
Jian Yang
Shuming Ma
Dongdong Zhang
Weinan Zhang
Yong Yu
Zhoujun Li
SILM
AAML
193
5
0
06 Jan 2022
GAT-CADNet: Graph Attention Network for Panoptic Symbol Spotting in CAD Drawings
Computer Vision and Pattern Recognition (CVPR), 2022
Zhaohua Zheng
Jianfang Li
Lingjie Zhu
Honghua Li
F. Petzold
Ping Tan
119
22
0
03 Jan 2022
Spatio-temporal Relation Modeling for Few-shot Action Recognition
Anirudh Thatipelli
Sanath Narayan
Salman Khan
Rao Muhammad Anwer
Fahad Shahbaz Khan
Guohao Li
ViT
235
109
0
09 Dec 2021
3D Medical Point Transformer: Introducing Convolution to Attention Networks for Medical Point Cloud Analysis
Jianhui Yu
Chaoyi Zhang
Heng Wang
Dingxin Zhang
Yang Song
Tiange Xiang
Dongnan Liu
Weidong (Tom) Cai
ViT
MedIm
179
43
0
09 Dec 2021
OW-DETR: Open-world Detection Transformer
Akshita Gupta
Sanath Narayan
K. J. Joseph
Salman Khan
Fahad Shahbaz Khan
M. Shah
ViT
215
231
0
02 Dec 2021
FastTrees: Parallel Latent Tree-Induction for Faster Sequence Encoding
B. Pung
Alvin Chan
115
0
0
28 Nov 2021
Dynamic Parameterized Network for CTR Prediction
Jian Zhu
Congcong Liu
Pei Wang
Xiwei Zhao
Guangpeng Chen
Junsheng Jin
Changping Peng
Zhangang Lin
Jingping Shao
168
2
0
09 Nov 2021
Mixed Transformer U-Net For Medical Image Segmentation
Hongyi Wang
Shiao Xie
Lanfen Lin
Yutaro Iwamoto
X. Han
Yenwei Chen
Ruofeng Tong
ViT
MedIm
136
340
0
08 Nov 2021
Direct Multi-view Multi-person 3D Pose Estimation
Tao Wang
Jianfeng Zhang
Yujun Cai
Shuicheng Yan
Jiashi Feng
3DH
271
118
0
07 Nov 2021
Towards Building ASR Systems for the Next Billion Users
Tahir Javed
Sumanth Doddapaneni
A. Raman
Kaushal Bhogale
Gowtham Ramesh
Anoop Kunchukuttan
Pratyush Kumar
Mitesh M. Khapra
216
69
0
06 Nov 2021
Efficiently Modeling Long Sequences with Structured State Spaces
International Conference on Learning Representations (ICLR), 2021
Albert Gu
Karan Goel
Christopher Ré
902
2,771
0
31 Oct 2021
Scatterbrain: Unifying Sparse and Low-rank Attention Approximation
Neural Information Processing Systems (NeurIPS), 2021
Beidi Chen
Tri Dao
Eric Winsor
Zhao Song
Atri Rudra
Christopher Ré
152
151
0
28 Oct 2021
Empirical Analysis of Korean Public AI Hub Parallel Corpora and in-depth Analysis using LIWC
Chanjun Park
Midan Shim
Sugyeong Eo
Seolhwa Lee
Jaehyung Seo
Hyeonseok Moon
Heuiseok Lim
104
8
0
28 Oct 2021
GNN-LM: Language Modeling based on Global Contexts via GNN
Yuxian Meng
Shi Zong
Xiaoya Li
Xiaofei Sun
Tianwei Zhang
Leilei Gan
Jiwei Li
LRM
467
45
0
17 Oct 2021
Taming Sparsely Activated Transformer with Stochastic Experts
International Conference on Learning Representations (ICLR), 2021
Simiao Zuo
Xiaodong Liu
Jian Jiao
Young Jin Kim
Hany Hassan
Ruofei Zhang
T. Zhao
Jianfeng Gao
MoE
223
133
0
08 Oct 2021
The NiuTrans System for WNGT 2020 Efficiency Task
Chi Hu
Bei Li
Ye Lin
Yinqiao Li
Yanyang Li
Chenglong Wang
Tong Xiao
Jingbo Zhu
78
7
0
16 Sep 2021
Improving Neural Machine Translation by Bidirectional Training
Liang Ding
Di Wu
Dacheng Tao
137
34
0
16 Sep 2021
RankNAS: Efficient Neural Architecture Search by Pairwise Ranking
Chi Hu
Chenglong Wang
Xiangnan Ma
Xia Meng
Yinqiao Li
Tong Xiao
Jingbo Zhu
Changliang Li
203
13
0
15 Sep 2021
Topic-Aware Contrastive Learning for Abstractive Dialogue Summarization
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Junpeng Liu
Yanyan Zou
Hainan Zhang
Hongshen Chen
Zhuoye Ding
Caixia Yuan
95
70
0
10 Sep 2021
BERT, mBERT, or BiBERT? A Study on Contextualized Embeddings for Neural Machine Translation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Haoran Xu
Benjamin Van Durme
Kenton W. Murray
238
73
0
09 Sep 2021
Low-Resource Dialogue Summarization with Domain-Agnostic Multi-Source Pretraining
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Yicheng Zou
Bolin Zhu
Xingwu Hu
Tao Gui
214
33
0
09 Sep 2021
Survey of Low-Resource Machine Translation
Computational Linguistics (CL), 2021
Barry Haddow
Rachel Bawden
Antonio Valerio Miceli Barone
Jindřich Helcl
Alexandra Birch
AIMat
444
194
0
01 Sep 2021
Lightweight Self-Attentive Sequential Recommendation
International Conference on Information and Knowledge Management (CIKM), 2021
Yang Li
Tong Chen
Pengfei Zhang
Hongzhi Yin
HAI
AI4TS
128
121
0
25 Aug 2021
Discriminative Region-based Multi-Label Zero-Shot Learning
Sanath Narayan
Akshita Gupta
Salman Khan
Fahad Shahbaz Khan
Ling Shao
M. Shah
VLM
187
55
0
20 Aug 2021
An Attention Module for Convolutional Neural Networks
Zhu Baozhou
P. Hofstee
Jinho Lee
Zaid Al-Ars
181
32
0
18 Aug 2021
Adaptive Graph Convolution for Point Cloud Analysis
Hao Zhou
Yidan Feng
Mingsheng Fang
Mingqiang Wei
J. Qin
Tong Lu
3DPC
186
166
0
18 Aug 2021
Fast Convergence of DETR with Spatially Modulated Co-Attention
IEEE International Conference on Computer Vision (ICCV), 2021
Shiyang Feng
Minghang Zheng
Xiaogang Wang
Jifeng Dai
Jiaming Song
ViT
229
363
0
05 Aug 2021
Dialogue Summarization with Supporting Utterance Flow Modeling and Fact Regularization
Wang Chen
Pijian Li
Hou Pong Chan
Irwin King
HILM
AI4TS
141
10
0
03 Aug 2021
Dynamic Convolution for 3D Point Cloud Instance Segmentation
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021
Tong He
Chunhua Shen
Anton Van Den Hengel
3DPC
221
17
0
18 Jul 2021
A Survey on Deep Domain Adaptation and Tiny Object Detection Challenges, Techniques and Datasets
Muhammed Muzammul
Xi Li
ObjD
214
18
0
16 Jul 2021
AutoBERT-Zero: Evolving BERT Backbone from Scratch
AAAI Conference on Artificial Intelligence (AAAI), 2021
Jiahui Gao
Hang Xu
Han Shi
Xiaozhe Ren
Philip L. H. Yu
Xiaodan Liang
Xin Jiang
Zhenguo Li
141
39
0
15 Jul 2021
Visual Parser: Representing Part-whole Hierarchies with Transformers
Shuyang Sun
Xiaoyu Yue
S. Bai
Juil Sock
207
28
0
13 Jul 2021
ABD-Net: Attention Based Decomposition Network for 3D Point Cloud Decomposition
Siddharth Katageri
S. V. Kudari
Akshaykumar Gunari
R. Tabib
U. Mudenagudi
3DPC
128
5
0
09 Jul 2021
A Survey on Dialogue Summarization: Recent Advances and New Frontiers
Xiachong Feng
Xiaocheng Feng
Bing Qin
257
113
0
07 Jul 2021
Learning Geometric Combinatorial Optimization Problems using Self-attention and Domain Knowledge
Jaeseung Lee
Woojin Choi
Jibum Kim
3DPC
127
0
0
05 Jul 2021
Language Models are Good Translators
Shuo Wang
Zhaopeng Tu
Zhixing Tan
Wenxuan Wang
Maosong Sun
Yang Liu
148
22
0
25 Jun 2021
VOLO: Vision Outlooker for Visual Recognition
Li-xin Yuan
Qibin Hou
Zihang Jiang
Jiashi Feng
Shuicheng Yan
ViT
359
372
0
24 Jun 2021