Modeling Recurrence for Transformer
arXiv:1904.03092 · 5 April 2019
Jie Hao, Xing Wang, Baosong Yang, Longyue Wang, Jinfeng Zhang, Zhaopeng Tu
Papers citing "Modeling Recurrence for Transformer" (50 of 52 papers shown)
Integrating Vision and Location with Transformers: A Multimodal Deep Learning Framework for Medical Wound Analysis. Ramin Mousa, Hadis Taherinia, Khabiba Abdiyeva, Amir Ali Bengari, Mohammadmahdi Vahediahmar. 14 Apr 2025.
Efficient Language Modeling for Low-Resource Settings with Hybrid RNN-Transformer Architectures. Gabriel Lindenmaier, Sean Papay, Sebastian Padó. 02 Feb 2025.
Neuro-symbolic Learning Yielding Logical Constraints. Zenan Li, Yunpeng Huang, Zhaoyu Li, Yuan Yao, Jingwei Xu, Taolue Chen, Xiaoxing Ma, Jian Lu. 28 Oct 2024. [NAI]
RNNs are not Transformers (Yet): The Key Bottleneck on In-context Retrieval. Kaiyue Wen, Xingyu Dang, Kaifeng Lyu. 28 Feb 2024.
Heterogeneous Encoders Scaling In The Transformer For Neural Machine Translation. J. Hu, Roberto Cavicchioli, Giulia Berardinelli, Alessandro Capotondi. 26 Dec 2023.
Learning to Solve Constraint Satisfaction Problems with Recurrent Transformer. Zhun Yang, Adam Ishay, Joohyung Lee. 10 Jul 2023.
Understanding Parameter Sharing in Transformers. Ye Lin, Mingxuan Wang, Zhexi Zhang, Xiaohui Wang, Tong Xiao, Jingbo Zhu. 15 Jun 2023. [MoE]
On the Complementarity between Pre-Training and Random-Initialization for Resource-Rich Machine Translation. Changtong Zan, Liang Ding, Li Shen, Yu Cao, Weifeng Liu, Dacheng Tao. 07 Sep 2022.
Exploiting Multiple Sequence Lengths in Fast End to End Training for Image Captioning. J. Hu, Roberto Cavicchioli, Alessandro Capotondi. 13 Aug 2022.
Exploring the sequence length bottleneck in the Transformer for Image Captioning. Jiapeng Hu, Roberto Cavicchioli, Alessandro Capotondi. 07 Jul 2022. [ViT]
Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding. Yifan Peng, Siddharth Dalmia, Ian Lane, Shinji Watanabe. 06 Jul 2022.
E2S2: Encoding-Enhanced Sequence-to-Sequence Pretraining for Language Understanding and Generation. Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, Dacheng Tao. 30 May 2022.
Simple Recurrence Improves Masked Language Models. Tao Lei, Ran Tian, Jasmijn Bastings, Ankur P. Parikh. 23 May 2022.
Implicit N-grams Induced by Recurrence. Xiaobing Sun, Wei Lu. 05 May 2022.
Hysteretic Behavior Simulation Based on Pyramid Neural Network: Principle, Network Architecture, Case Study and Explanation. Yongjia Xu, Xinzheng Lu, Yifan Fei, Yuli Huang. 29 Apr 2022.
Transformers in Time-series Analysis: A Tutorial. Sabeen Ahmed, Ian E. Nielsen, Aakash Tripathi, Shamoon Siddiqui, Ghulam Rasool, R. Ramachandran. 28 Apr 2022. [AI4TS]
Attention Mechanism in Neural Networks: Where it Comes and Where it Goes. Derya Soydaner. 27 Apr 2022. [3DV]
BLISS: Robust Sequence-to-Sequence Learning via Self-Supervised Input Representation. Zheng-Wei Zhang, Liang Ding, Dazhao Cheng, Xuebo Liu, Min Zhang, Dacheng Tao. 16 Apr 2022.
A Call for Clarity in Beam Search: How It Works and When It Stops. Jungo Kasai, Keisuke Sakaguchi, Ronan Le Bras, Dragomir R. Radev, Yejin Choi, Noah A. Smith. 11 Apr 2022.
Staircase Attention for Recurrent Processing of Sequences. Da Ju, Stephen Roller, Sainbayar Sukhbaatar, Jason Weston. 08 Jun 2021.
Self-Attention Networks Can Process Bounded Hierarchical Languages. Shunyu Yao, Binghui Peng, Christos H. Papadimitriou, Karthik Narasimhan. 24 May 2021.
Attention, please! A survey of Neural Attention Models in Deep Learning. Alana de Santana Correia, Esther Luna Colombini. 31 Mar 2021. [HAI]
Mask Attention Networks: Rethinking and Strengthen Transformer. Zhihao Fan, Yeyun Gong, Dayiheng Liu, Zhongyu Wei, Siyuan Wang, Jian Jiao, Nan Duan, Ruofei Zhang, Xuanjing Huang. 25 Mar 2021.
Random Feature Attention. Hao Peng, Nikolaos Pappas, Dani Yogatama, Roy Schwartz, Noah A. Smith, Lingpeng Kong. 03 Mar 2021.
When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute. Tao Lei. 24 Feb 2021. [RALM, VLM]
To Understand Representation of Layer-aware Sequence Encoders as Multi-order-graph. Sufeng Duan, Hai Zhao. 16 Jan 2021. [MILM]
Self-Paced Learning for Neural Machine Translation. Yu Wan, Baosong Yang, Derek F. Wong, Yikai Zhou, Lidia S. Chao, Haibo Zhang, Boxing Chen. 09 Oct 2020.
The Monte Carlo Transformer: a stochastic self-attention model for sequence prediction. Alice Martin, Charles Ollion, Florian Strub, Sylvain Le Corff, Olivier Pietquin. 15 Jul 2020.
I-BERT: Inductive Generalization of Transformer to Arbitrary Context Lengths. Hyoungwook Nam, S. Seo, Vikram Sharma Malithody, Noor Michael, Lang Li. 18 Jun 2020.
On the Computational Power of Transformers and its Implications in Sequence Modeling. S. Bhattamishra, Arkil Patel, Navin Goyal. 16 Jun 2020.
Transferring Inductive Biases through Knowledge Distillation. Samira Abnar, Mostafa Dehghani, Willem H. Zuidema. 31 May 2020.
A Mixture of h-1 Heads is Better than h Heads. Hao Peng, Roy Schwartz, Dianqi Li, Noah A. Smith. 13 May 2020. [MoE]
How Does Selective Mechanism Improve Self-Attention Networks? Xinwei Geng, Longyue Wang, Xing Wang, Bing Qin, Ting Liu, Zhaopeng Tu. 03 May 2020. [AAML]
Capsule-Transformer for Neural Machine Translation. Sufeng Duan, Juncheng Cao, Hai Zhao. 30 Apr 2020. [MedIm]
Self-Attention with Cross-Lingual Position Representation. Liang Ding, Longyue Wang, Dacheng Tao. 28 Apr 2020. [MILM]
Explicit Reordering for Neural Machine Translation. Kehai Chen, Rui Wang, Masao Utiyama, Eiichiro Sumita. 08 Apr 2020. [MILM]
Modeling Future Cost for Neural Machine Translation. Chaoqun Duan, Kehai Chen, Rui Wang, Masao Utiyama, Eiichiro Sumita, Conghui Zhu, T. Zhao. 28 Feb 2020. [AI4TS]
Addressing Some Limitations of Transformers with Feedback Memory. Angela Fan, Thibaut Lavril, Edouard Grave, Armand Joulin, Sainbayar Sukhbaatar. 21 Feb 2020.
A Survey of Deep Learning Techniques for Neural Machine Translation. Shu Yang, Yuxin Wang, X. Chu. 18 Feb 2020. [VLM, AI4TS, AI4CE]
Is Attention All What You Need? -- An Empirical Investigation on Convolution-Based Active Memory and Self-Attention. Thomas D. Dowdell, Hongyu Zhang. 27 Dec 2019.
Unsupervised Neural Dialect Translation with Commonality and Diversity Modeling. Yu Wan, Baosong Yang, Derek F. Wong, Lidia S. Chao, Haihua Du, B. Ao. 11 Dec 2019.
Neural Machine Translation: A Review and Survey. Felix Stahlberg. 04 Dec 2019. [3DV, AI4TS, MedIm]
Two-Headed Monster And Crossed Co-Attention Networks. Yaoyiran Li, Jing Jiang. 10 Nov 2019.
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations. Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut. 26 Sep 2019. [SSL, AIMat]
Multi-Granularity Self-Attention for Neural Machine Translation. Jie Hao, Xing Wang, Shuming Shi, Jinfeng Zhang, Zhaopeng Tu. 05 Sep 2019. [MILM]
Towards Better Modeling Hierarchical Structure for Self-Attention with Ordered Neurons. Jie Hao, Xing Wang, Shuming Shi, Jinfeng Zhang, Zhaopeng Tu. 04 Sep 2019.
Self-Attention with Structural Position Representations. Xing Wang, Zhaopeng Tu, Longyue Wang, Shuming Shi. 01 Sep 2019. [MILM]
Theoretical Limitations of Self-Attention in Neural Sequence Models. Michael Hahn. 16 Jun 2019.
Exploiting Sentential Context for Neural Machine Translation. Xing Wang, Zhaopeng Tu, Longyue Wang, Shuming Shi. 04 Jun 2019.
Assessing the Ability of Self-Attention Networks to Learn Word Order. Baosong Yang, Longyue Wang, Derek F. Wong, Lidia S. Chao, Zhaopeng Tu. 03 Jun 2019.