Modeling Recurrence for Transformer

5 April 2019
Jie Hao, Xing Wang, Baosong Yang, Longyue Wang, Jinfeng Zhang, Zhaopeng Tu

Papers citing "Modeling Recurrence for Transformer"

50 of 52 citing papers shown.

Integrating Vision and Location with Transformers: A Multimodal Deep Learning Framework for Medical Wound Analysis
Ramin Mousa, Hadis Taherinia, Khabiba Abdiyeva, Amir Ali Bengari, Mohammadmahdi Vahediahmar
14 Apr 2025

Efficient Language Modeling for Low-Resource Settings with Hybrid RNN-Transformer Architectures
Gabriel Lindenmaier, Sean Papay, Sebastian Padó
02 Feb 2025

Neuro-symbolic Learning Yielding Logical Constraints
Zenan Li, Yunpeng Huang, Zhaoyu Li, Yuan Yao, Jingwei Xu, Taolue Chen, Xiaoxing Ma, Jian Lu
28 Oct 2024 (NAI)

RNNs are not Transformers (Yet): The Key Bottleneck on In-context Retrieval
Kaiyue Wen, Xingyu Dang, Kaifeng Lyu
28 Feb 2024

Heterogeneous Encoders Scaling In The Transformer For Neural Machine Translation
J. Hu, Roberto Cavicchioli, Giulia Berardinelli, Alessandro Capotondi
26 Dec 2023

Learning to Solve Constraint Satisfaction Problems with Recurrent Transformer
Zhun Yang, Adam Ishay, Joohyung Lee
10 Jul 2023

Understanding Parameter Sharing in Transformers
Ye Lin, Mingxuan Wang, Zhexi Zhang, Xiaohui Wang, Tong Xiao, Jingbo Zhu
15 Jun 2023 (MoE)

On the Complementarity between Pre-Training and Random-Initialization for Resource-Rich Machine Translation
Changtong Zan, Liang Ding, Li Shen, Yu Cao, Weifeng Liu, Dacheng Tao
07 Sep 2022

Exploiting Multiple Sequence Lengths in Fast End to End Training for Image Captioning
J. Hu, Roberto Cavicchioli, Alessandro Capotondi
13 Aug 2022

Exploring the sequence length bottleneck in the Transformer for Image Captioning
Jiapeng Hu, Roberto Cavicchioli, Alessandro Capotondi
07 Jul 2022 (ViT)

Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding
Yifan Peng, Siddharth Dalmia, Ian Lane, Shinji Watanabe
06 Jul 2022

E2S2: Encoding-Enhanced Sequence-to-Sequence Pretraining for Language Understanding and Generation
Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, Dacheng Tao
30 May 2022

Simple Recurrence Improves Masked Language Models
Tao Lei, Ran Tian, Jasmijn Bastings, Ankur P. Parikh
23 May 2022

Implicit N-grams Induced by Recurrence
Xiaobing Sun, Wei Lu
05 May 2022

Hysteretic Behavior Simulation Based on Pyramid Neural Network: Principle, Network Architecture, Case Study and Explanation
Yongjia Xu, Xinzheng Lu, Yifan Fei, Yuli Huang
29 Apr 2022

Transformers in Time-series Analysis: A Tutorial
Sabeen Ahmed, Ian E. Nielsen, Aakash Tripathi, Shamoon Siddiqui, Ghulam Rasool, R. Ramachandran
28 Apr 2022 (AI4TS)

Attention Mechanism in Neural Networks: Where it Comes and Where it Goes
Derya Soydaner
27 Apr 2022 (3DV)

BLISS: Robust Sequence-to-Sequence Learning via Self-Supervised Input Representation
Zheng-Wei Zhang, Liang Ding, Dazhao Cheng, Xuebo Liu, Min Zhang, Dacheng Tao
16 Apr 2022

A Call for Clarity in Beam Search: How It Works and When It Stops
Jungo Kasai, Keisuke Sakaguchi, Ronan Le Bras, Dragomir R. Radev, Yejin Choi, Noah A. Smith
11 Apr 2022

Staircase Attention for Recurrent Processing of Sequences
Da Ju, Stephen Roller, Sainbayar Sukhbaatar, Jason Weston
08 Jun 2021

Self-Attention Networks Can Process Bounded Hierarchical Languages
Shunyu Yao, Binghui Peng, Christos H. Papadimitriou, Karthik Narasimhan
24 May 2021

Attention, please! A survey of Neural Attention Models in Deep Learning
Alana de Santana Correia, Esther Luna Colombini
31 Mar 2021 (HAI)

Mask Attention Networks: Rethinking and Strengthen Transformer
Zhihao Fan, Yeyun Gong, Dayiheng Liu, Zhongyu Wei, Siyuan Wang, Jian Jiao, Nan Duan, Ruofei Zhang, Xuanjing Huang
25 Mar 2021

Random Feature Attention
Hao Peng, Nikolaos Pappas, Dani Yogatama, Roy Schwartz, Noah A. Smith, Lingpeng Kong
03 Mar 2021

When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute
Tao Lei
24 Feb 2021 (RALM, VLM)

To Understand Representation of Layer-aware Sequence Encoders as Multi-order-graph
Sufeng Duan, Hai Zhao
16 Jan 2021 (MILM)

Self-Paced Learning for Neural Machine Translation
Yu Wan, Baosong Yang, Derek F. Wong, Yikai Zhou, Lidia S. Chao, Haibo Zhang, Boxing Chen
09 Oct 2020

The Monte Carlo Transformer: a stochastic self-attention model for sequence prediction
Alice Martin, Charles Ollion, Florian Strub, Sylvain Le Corff, Olivier Pietquin
15 Jul 2020

I-BERT: Inductive Generalization of Transformer to Arbitrary Context Lengths
Hyoungwook Nam, S. Seo, Vikram Sharma Malithody, Noor Michael, Lang Li
18 Jun 2020

On the Computational Power of Transformers and its Implications in Sequence Modeling
S. Bhattamishra, Arkil Patel, Navin Goyal
16 Jun 2020

Transferring Inductive Biases through Knowledge Distillation
Samira Abnar, Mostafa Dehghani, Willem H. Zuidema
31 May 2020

A Mixture of h-1 Heads is Better than h Heads
Hao Peng, Roy Schwartz, Dianqi Li, Noah A. Smith
13 May 2020 (MoE)

How Does Selective Mechanism Improve Self-Attention Networks?
Xinwei Geng, Longyue Wang, Xing Wang, Bing Qin, Ting Liu, Zhaopeng Tu
03 May 2020 (AAML)

Capsule-Transformer for Neural Machine Translation
Sufeng Duan, Juncheng Cao, Hai Zhao
30 Apr 2020 (MedIm)

Self-Attention with Cross-Lingual Position Representation
Liang Ding, Longyue Wang, Dacheng Tao
28 Apr 2020 (MILM)

Explicit Reordering for Neural Machine Translation
Kehai Chen, Rui Wang, Masao Utiyama, Eiichiro Sumita
08 Apr 2020 (MILM)

Modeling Future Cost for Neural Machine Translation
Chaoqun Duan, Kehai Chen, Rui Wang, Masao Utiyama, Eiichiro Sumita, Conghui Zhu, T. Zhao
28 Feb 2020 (AI4TS)

Addressing Some Limitations of Transformers with Feedback Memory
Angela Fan, Thibaut Lavril, Edouard Grave, Armand Joulin, Sainbayar Sukhbaatar
21 Feb 2020

A Survey of Deep Learning Techniques for Neural Machine Translation
Shu Yang, Yuxin Wang, X. Chu
18 Feb 2020 (VLM, AI4TS, AI4CE)

Is Attention All What You Need? -- An Empirical Investigation on Convolution-Based Active Memory and Self-Attention
Thomas D. Dowdell, Hongyu Zhang
27 Dec 2019

Unsupervised Neural Dialect Translation with Commonality and Diversity Modeling
Yu Wan, Baosong Yang, Derek F. Wong, Lidia S. Chao, Haihua Du, B. Ao
11 Dec 2019

Neural Machine Translation: A Review and Survey
Felix Stahlberg
04 Dec 2019 (3DV, AI4TS, MedIm)

Two-Headed Monster And Crossed Co-Attention Networks
Yaoyiran Li, Jing Jiang
10 Nov 2019

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut
26 Sep 2019 (SSL, AIMat)

Multi-Granularity Self-Attention for Neural Machine Translation
Jie Hao, Xing Wang, Shuming Shi, Jinfeng Zhang, Zhaopeng Tu
05 Sep 2019 (MILM)

Towards Better Modeling Hierarchical Structure for Self-Attention with Ordered Neurons
Jie Hao, Xing Wang, Shuming Shi, Jinfeng Zhang, Zhaopeng Tu
04 Sep 2019

Self-Attention with Structural Position Representations
Xing Wang, Zhaopeng Tu, Longyue Wang, Shuming Shi
01 Sep 2019 (MILM)

Theoretical Limitations of Self-Attention in Neural Sequence Models
Michael Hahn
16 Jun 2019

Exploiting Sentential Context for Neural Machine Translation
Xing Wang, Zhaopeng Tu, Longyue Wang, Shuming Shi
04 Jun 2019

Assessing the Ability of Self-Attention Networks to Learn Word Order
Baosong Yang, Longyue Wang, Derek F. Wong, Lidia S. Chao, Zhaopeng Tu
03 Jun 2019