Pay Less Attention with Lightweight and Dynamic Convolutions

International Conference on Learning Representations (ICLR), 2019 · 29 January 2019 · arXiv:1901.10430
Felix Wu, Angela Fan, Alexei Baevski, Yann N. Dauphin, Michael Auli

Papers citing "Pay Less Attention with Lightweight and Dynamic Convolutions"

Showing 50 of 337 citing papers, most recent first. Each entry lists title, authors, topic tags where given, venue, and date.
Variational Neural Machine Translation with Normalizing Flows
Hendra Setiawan, Matthias Sperber, Udhay Nallasamy, Matthias Paulik
Topics: DRL
Annual Meeting of the Association for Computational Linguistics (ACL), 2020 · 28 May 2020

Normalized Attention Without Probability Cage
Oliver Richter, Roger Wattenhofer
19 May 2020

Conformer: Convolution-augmented Transformer for Speech Recognition
Anmol Gulati, James Qin, Chung-Cheng Chiu, Niki Parmar, Yu Zhang, ..., Wei Han, Shibo Wang, Zhengdong Zhang, Yonghui Wu, Ruoming Pang
16 May 2020

Rethinking and Improving Natural Language Generation with Layer-Wise Multi-View Decoding
Fenglin Liu, Xuancheng Ren, Guangxiang Zhao, Chenyu You, Xuewei Ma, Xian Wu, Xu Sun
16 May 2020

Hierarchical Attention Transformer Architecture For Syntactic Spell Correction
Abhishek Niranjan, B. Shaik, K. Verma
11 May 2020

Synthesizer: Rethinking Self-Attention in Transformer Models
Yi Tay, Dara Bahri, Donald Metzler, Da-Cheng Juan, Zhe Zhao, Che Zheng
International Conference on Machine Learning (ICML), 2020 · 02 May 2020

Hard-Coded Gaussian Attention for Neural Machine Translation
Weiqiu You, Simeng Sun, Mohit Iyyer
Annual Meeting of the Association for Computational Linguistics (ACL), 2020 · 02 May 2020

POINTER: Constrained Progressive Text Generation via Insertion-based Generative Pre-training
Yizhe Zhang, Guoyin Wang, Chunyuan Li, Zhe Gan, Chris Brockett, Bill Dolan
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020 · 01 May 2020

Exploring Self-attention for Image Recognition
Hengshuang Zhao, Jiaya Jia, V. Koltun
Topics: SSL
Computer Vision and Pattern Recognition (CVPR), 2020 · 28 Apr 2020
Lite Transformer with Long-Short Range Attention
Zhanghao Wu, Zhijian Liu, Ji Lin, Chengyue Wu, Song Han
International Conference on Learning Representations (ICLR), 2020 · 24 Apr 2020

DyNet: Dynamic Convolution for Accelerating Convolutional Neural Networks
Yikang Zhang, Jian Zhang, Qiang-qiang Wang, Zhaobai Zhong
22 Apr 2020

Understanding the Difficulty of Training Transformers
Liyuan Liu, Xiaodong Liu, Jianfeng Gao, Weizhu Chen, Jiawei Han
Topics: AI4CE
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020 · 17 Apr 2020

Highway Transformer: Self-Gating Enhanced Self-Attentive Networks
Yekun Chai, Jin Shuo, Xinwen Hou
Annual Meeting of the Association for Computational Linguistics (ACL), 2020 · 17 Apr 2020

Transform and Tell: Entity-Aware News Image Captioning
Alasdair Tran, A. Mathews, Lexing Xie
Topics: VLM
Computer Vision and Pattern Recognition (CVPR), 2020 · 17 Apr 2020

Training with Quantization Noise for Extreme Model Compression
Angela Fan, Pierre Stock, Benjamin Graham, Edouard Grave, Remi Gribonval, Armand Joulin
Topics: MQ
International Conference on Learning Representations (ICLR), 2020 · 15 Apr 2020

Neural Machine Translation: Challenges, Progress and Future
Jiajun Zhang, Chengqing Zong
Science China Technological Sciences (Sci China Technol Sci), 2020 · 13 Apr 2020

Longformer: The Long-Document Transformer
Iz Beltagy, Matthew E. Peters, Arman Cohan
Topics: RALM, VLM
10 Apr 2020

Exploring Versatile Generative Language Model Via Parameter-Efficient Transfer Learning
Mohammad Kachuee, Andrea Madotto, Pascale Fung
Findings, 2020 · 08 Apr 2020

Aligned Cross Entropy for Non-Autoregressive Machine Translation
Marjan Ghazvininejad, Vladimir Karpukhin, Luke Zettlemoyer, Omer Levy
International Conference on Machine Learning (ICML), 2020 · 03 Apr 2020

Abstractive Text Summarization based on Language Model Conditioning and Locality Modeling
Dmitrii Aksenov, J. Moreno-Schneider, Peter Bourgonje, Robert Schwarzenberg, Leonhard Hennig, Georg Rehm
International Conference on Language Resources and Evaluation (LREC), 2020 · 29 Mar 2020
Probing Word Translations in the Transformer and Trading Decoder for Encoder Layers
Hongfei Xu, Josef van Genabith, Qiuhui Liu, Deyi Xiong
21 Mar 2020

PowerNorm: Rethinking Batch Normalization in Transformers
Sheng Shen, Z. Yao, A. Gholami, Michael W. Mahoney, Kurt Keutzer
Topics: BDL
International Conference on Machine Learning (ICML), 2020 · 17 Mar 2020

Revisit Systematic Generalization via Meaningful Learning
Ning Shi, Wei Ping, Wei Wang, Xiangyu Liu, Zhouhan Lin
BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackboxNLP), 2020 · 14 Mar 2020

Meta-Embeddings Based On Self-Attention
Qichen Li, Xiaoke Jiang, Jun Xia, Jian Li
03 Mar 2020

Transformer++
Prakhar Thapak, P. Hore
02 Mar 2020

A Primer in BERTology: What we know about how BERT works
Anna Rogers, Olga Kovaleva, Anna Rumshisky
Topics: OffRL
Transactions of the Association for Computational Linguistics (TACL), 2020 · 27 Feb 2020

On Feature Normalization and Data Augmentation
Boyi Li, Felix Wu, Ser-Nam Lim, Serge J. Belongie, Kilian Q. Weinberger
Computer Vision and Pattern Recognition (CVPR), 2020 · 25 Feb 2020

Fixed Encoder Self-Attention Patterns in Transformer-Based Machine Translation
Alessandro Raganato, Yves Scherrer, Jörg Tiedemann
Findings, 2020 · 24 Feb 2020

Tree-structured Attention with Hierarchical Accumulation
Xuan-Phi Nguyen, Shafiq Joty, Guosheng Lin, R. Socher
International Conference on Learning Representations (ICLR), 2020 · 19 Feb 2020

Low-Rank Bottleneck in Multi-head Attention Models
Srinadh Bhojanapalli, Chulhee Yun, A. S. Rawat, Sashank J. Reddi, Sanjiv Kumar
International Conference on Machine Learning (ICML), 2020 · 17 Feb 2020

Incorporating BERT into Neural Machine Translation
Jinhua Zhu, Ziheng Lu, Lijun Wu, Di He, Tao Qin, Wen-gang Zhou, Houqiang Li, Tie-Yan Liu
Topics: FedML, AIMat
International Conference on Learning Representations (ICLR), 2020 · 17 Feb 2020
Time-aware Large Kernel Convolutions
Vasileios Lioutas, Yuhong Guo
Topics: AI4TS
International Conference on Machine Learning (ICML), 2020 · 08 Feb 2020

Towards the Systematic Reporting of the Energy and Carbon Footprints of Machine Learning
Peter Henderson, Jie Hu, Joshua Romoff, Emma Brunskill, Dan Jurafsky, Joelle Pineau
31 Jan 2020

Semi-Autoregressive Training Improves Mask-Predict Decoding
Marjan Ghazvininejad, Omer Levy, Luke Zettlemoyer
23 Jan 2020

Normalization of Input-output Shared Embeddings in Text Generation Models
Jinyang Liu, Yujia Zhai, Zizhong Chen
22 Jan 2020

Non-Autoregressive Machine Translation with Disentangled Context Transformer
Jungo Kasai, James Cross, Marjan Ghazvininejad, Jiatao Gu
International Conference on Machine Learning (ICML), 2020 · 15 Jan 2020

Is Attention All What You Need? -- An Empirical Investigation on Convolution-Based Active Memory and Self-Attention
Thomas D. Dowdell, Hongyu Zhang
27 Dec 2019

Explicit Sparse Transformer: Concentrated Attention Through Explicit Selection
Guangxiang Zhao, Junyang Lin, Zhiyuan Zhang, Xuancheng Ren, Qi Su, Xu Sun
25 Dec 2019

Tag-less Back-Translation
Idris Abdulmumin, B. Galadanci, Aliyu Dadan Garba
Machine Translation (MT), 2019 · 22 Dec 2019

Are Transformers universal approximators of sequence-to-sequence functions?
Chulhee Yun, Srinadh Bhojanapalli, A. S. Rawat, Sashank J. Reddi, Sanjiv Kumar
International Conference on Learning Representations (ICLR), 2019 · 20 Dec 2019
Neural Machine Translation: A Review and Survey
Felix Stahlberg
Topics: 3DV, AI4TS, MedIm
Journal of Artificial Intelligence Research (JAIR), 2019 · 04 Dec 2019

SAMSum Corpus: A Human-annotated Dialogue Dataset for Abstractive Summarization
Bogdan Gliwa, Iwona Mochol, M. Biesek, A. Wawer
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019 · 27 Nov 2019

Self-Attention Enhanced Selective Gate with Entity-Aware Embedding for Distantly Supervised Relation Extraction
Tao Shen, Guodong Long, Wanrong Zhu, Lina Yao, Huan Huo, Jing Jiang
AAAI Conference on Artificial Intelligence (AAAI), 2019 · 27 Nov 2019

Iterative Batch Back-Translation for Neural Machine Translation: A Conceptual Model
Idris Abdulmumin, B. Galadanci, Abubakar Isa
26 Nov 2019

MUSE: Parallel Multi-Scale Attention for Sequence to Sequence Learning
Guangxiang Zhao, Xu Sun, Jingjing Xu, Zhiyuan Zhang, Liangchen Luo
Topics: LRM
17 Nov 2019

Compressive Transformers for Long-Range Sequence Modelling
Jack W. Rae, Anna Potapenko, Siddhant M. Jayakumar, Timothy Lillicrap
Topics: RALM, VLM, KELM
International Conference on Learning Representations (ICLR), 2019 · 13 Nov 2019

Two-Headed Monster And Crossed Co-Attention Networks
Yaoyiran Li, Jing Jiang
10 Nov 2019

Distilling Knowledge Learned in BERT for Text Generation
Yen-Chun Chen, Zhe Gan, Yu Cheng, Jingzhou Liu, Jingjing Liu
10 Nov 2019

Data Diversification: A Simple Strategy For Neural Machine Translation
Xuan-Phi Nguyen, Shafiq Joty, Wu Kui, Ai Ti Aw
05 Nov 2019

Exploring Kernel Functions in the Softmax Layer for Contextual Word Classification
Yingbo Gao, Christian Herold, Weiyue Wang, Hermann Ney
International Workshop on Spoken Language Translation (IWSLT), 2019 · 28 Oct 2019