Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1906.02762
Cited By
Understanding and Improving Transformer From a Multi-Particle Dynamic System Point of View
6 June 2019
Yiping Lu
Zhuohan Li
Di He
Zhiqing Sun
Bin Dong
Tao Qin
Liwei Wang
Tie-Yan Liu
AI4CE
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Understanding and Improving Transformer From a Multi-Particle Dynamic System Point of View"
36 / 36 papers shown
Title
IM-BERT: Enhancing Robustness of BERT through the Implicit Euler Method
Mihyeon Kim
Juhyoung Park
Youngbin Kim
34
0
0
11 May 2025
Transformer Meets Twicing: Harnessing Unattended Residual Information
Laziz U. Abdullaev
Tan M. Nguyen
41
2
0
02 Mar 2025
Learning to Decouple Complex Systems
Zihan Zhou
Tianshu Yu
BDL
74
4
0
17 Feb 2025
ChordFormer: A Conformer-Based Architecture for Large-Vocabulary Audio Chord Recognition
Muhammad Waseem Akram
Stefano Dettori
V. Colla
Giorgio Buttazzo
52
0
0
17 Feb 2025
OT-Transformer: A Continuous-time Transformer Architecture with Optimal Transport Regularization
Kelvin Kan
Xingjian Li
Stanley Osher
93
2
0
30 Jan 2025
Clustering in pure-attention hardmax transformers and its role in sentiment analysis
Albert Alcalde
Giovanni Fantuzzi
Enrique Zuazua
32
3
0
26 Jun 2024
How Smooth Is Attention?
Valérie Castin
Pierre Ablin
Gabriel Peyré
AAML
40
9
0
22 Dec 2023
A Comparative Study on E-Branchformer vs Conformer in Speech Recognition, Translation, and Understanding Tasks
Yifan Peng
Kwangyoun Kim
Felix Wu
Brian Yan
Siddhant Arora
William Chen
Jiyang Tang
Suwon Shon
Prashant Sridhar
Shinji Watanabe
21
17
0
18 May 2023
EENED: End-to-End Neural Epilepsy Detection based on Convolutional Transformer
Chenyu Liu
Xin-qiu Zhou
Yang Liu
ViT
MedIm
18
1
0
17 May 2023
The emergence of clusters in self-attention dynamics
Borjan Geshkovski
Cyril Letrouit
Yury Polyanskiy
Philippe Rigollet
22
46
0
09 May 2023
Learning PDE Solution Operator for Continuous Modeling of Time-Series
Yesom Park
Jaemoo Choi
Changyeon Yoon
Changhoon Song
Myung-joo Kang
AI4TS
AI4CE
27
3
0
02 Feb 2023
A Neural ODE Interpretation of Transformer Layers
Yaofeng Desmond Zhong
Tongtao Zhang
Amit Chakraborty
Biswadip Dey
20
9
0
12 Dec 2022
Parameter-Efficient Conformers via Sharing Sparsely-Gated Experts for End-to-End Speech Recognition
Ye Bai
Jie Li
W. Han
Hao Ni
Kaituo Xu
Zhuo Zhang
Cheng Yi
Xiaorui Wang
MoE
21
1
0
17 Sep 2022
Attention Enhanced Citrinet for Speech Recognition
Xianchao Wu
10
1
0
01 Sep 2022
Deep Sparse Conformer for Speech Recognition
Xianchao Wu
20
2
0
01 Sep 2022
Squeezeformer: An Efficient Transformer for Automatic Speech Recognition
Sehoon Kim
A. Gholami
Albert Eaton Shaw
Nicholas Lee
K. Mangalam
Jitendra Malik
Michael W. Mahoney
Kurt Keutzer
23
99
0
02 Jun 2022
Do Residual Neural Networks discretize Neural Ordinary Differential Equations?
Michael E. Sander
Pierre Ablin
Gabriel Peyré
32
25
0
29 May 2022
Does Simultaneous Speech Translation need Simultaneous Models?
Sara Papi
Marco Gaido
Matteo Negri
Marco Turchi
41
26
0
08 Apr 2022
CipherDAug: Ciphertext based Data Augmentation for Neural Machine Translation
Nishant Kambhatla
Logan Born
Anoop Sarkar
15
16
0
01 Apr 2022
DelightfulTTS: The Microsoft Speech Synthesis System for Blizzard Challenge 2021
Yanqing Liu
Rui Shao
G. Wang
Kuan Chen
Bohan Li
P. Yuen
Jinzhu Li
Lei He
Sheng Zhao
32
55
0
25 Oct 2021
Streaming Transformer Transducer Based Speech Recognition Using Non-Causal Convolution
Yangyang Shi
Chunyang Wu
Dilin Wang
Alex Xiao
Jay Mahadeokar
...
Ke Li
Yuan Shangguan
Varun K. Nagaraja
Ozlem Kalinli
M. Seltzer
30
15
0
07 Oct 2021
BERT, mBERT, or BiBERT? A Study on Contextualized Embeddings for Neural Machine Translation
Haoran Xu
Benjamin Van Durme
Kenton W. Murray
47
57
0
09 Sep 2021
Is Attention Better Than Matrix Decomposition?
Zhengyang Geng
Meng-Hao Guo
Hongxu Chen
Xia Li
Ke Wei
Zhouchen Lin
59
137
0
09 Sep 2021
ResIST: Layer-Wise Decomposition of ResNets for Distributed Training
Chen Dun
Cameron R. Wolfe
C. Jermaine
Anastasios Kyrillidis
16
21
0
02 Jul 2021
End-to-end Neural Diarization: From Transformer to Conformer
Yi Y. Liu
Eunjung Han
Chul Lee
A. Stolcke
22
40
0
14 Jun 2021
A Survey of Transformers
Tianyang Lin
Yuxin Wang
Xiangyang Liu
Xipeng Qiu
ViT
32
1,087
0
08 Jun 2021
UniDrop: A Simple yet Effective Technique to Improve Transformer without Extra Cost
Zhen Wu
Lijun Wu
Qi Meng
Yingce Xia
Shufang Xie
Tao Qin
Xinyu Dai
Tie-Yan Liu
12
22
0
11 Apr 2021
Mask Attention Networks: Rethinking and Strengthen Transformer
Zhihao Fan
Yeyun Gong
Dayiheng Liu
Zhongyu Wei
Siyuan Wang
Jian Jiao
Nan Duan
Ruofei Zhang
Xuanjing Huang
26
72
0
25 Mar 2021
Recent Developments on ESPnet Toolkit Boosted by Conformer
Pengcheng Guo
Florian Boyer
Xuankai Chang
Tomoki Hayashi
Yosuke Higuchi
...
Jing Shi
Shinji Watanabe
Kun Wei
Wangyou Zhang
Yuekai Zhang
34
262
0
26 Oct 2020
Conformer: Convolution-augmented Transformer for Speech Recognition
Anmol Gulati
James Qin
Chung-Cheng Chiu
Niki Parmar
Yu Zhang
...
Wei Han
Shibo Wang
Zhengdong Zhang
Yonghui Wu
Ruoming Pang
48
3,029
0
16 May 2020
A Mean-field Analysis of Deep ResNet and Beyond: Towards Provable Optimization Via Overparameterization From Depth
Yiping Lu
Chao Ma
Yulong Lu
Jianfeng Lu
Lexing Ying
MLT
36
78
0
11 Mar 2020
Deep Learning via Dynamical Systems: An Approximation Perspective
Qianxiao Li
Ting Lin
Zuowei Shen
AI4TS
AI4CE
14
107
0
22 Dec 2019
Improving Transformer Models by Reordering their Sublayers
Ofir Press
Noah A. Smith
Omer Levy
11
87
0
10 Nov 2019
A Review on Deep Learning in Medical Image Reconstruction
Hai-Miao Zhang
Bin Dong
MedIm
32
122
0
23 Jun 2019
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
297
6,959
0
20 Apr 2018
Bridging the Gaps Between Residual Learning, Recurrent Neural Networks and Visual Cortex
Q. Liao
T. Poggio
213
255
0
13 Apr 2016
1