RealFormer: Transformer Likes Residual Attention
Ruining He, Anirudh Ravula, Bhargav Kanagal, Joshua Ainslie
arXiv:2012.11747 · 21 December 2020

Papers citing "RealFormer: Transformer Likes Residual Attention" (16 of 16 papers shown)

Layerwise Recurrent Router for Mixture-of-Experts
Zihan Qiu, Zeyu Huang, Shuang Cheng, Yizhi Zhou, Zili Wang, Ivan Titov, Jie Fu
MoE · 68 · 2 · 0 · 13 Aug 2024

Paraphrase Identification with Deep Learning: A Review of Datasets and Methods
Chao Zhou, Cheng Qiu, Daniel Ernesto Acuna
24 · 25 · 0 · 13 Dec 2022

TencentPretrain: A Scalable and Flexible Toolkit for Pre-training Models of Different Modalities
Zhe Zhao, Yudong Li, Cheng-An Hou, Jing-xin Zhao, Rong Tian, ..., Xingwu Sun, Zhanhui Kang, Xiaoyong Du, Linlin Shen, Kimmo Yan
VLM · 29 · 23 · 0 · 13 Dec 2022

Relational Graph Convolutional Neural Networks for Multihop Reasoning: A Comparative Study
Ieva Staliunaite, P. Gorinski, Ignacio Iacobacci
GNN · 14 · 0 · 0 · 12 Oct 2022

EATFormer: Improving Vision Transformer Inspired by Evolutionary Algorithm
Jiangning Zhang, Xiangtai Li, Yabiao Wang, Chengjie Wang, Yibo Yang, Yong Liu, Dacheng Tao
ViT · 28 · 32 · 0 · 19 Jun 2022

Revisiting Over-smoothing in BERT from the Perspective of Graph
Han Shi, Jiahui Gao, Hang Xu, Xiaodan Liang, Zhenguo Li, Lingpeng Kong, Stephen M. S. Lee, James T. Kwok
8 · 70 · 0 · 17 Feb 2022

TRIG: Transformer-Based Text Recognizer with Initial Embedding Guidance
Yuefeng Tao, Zhiwei Jia, Runze Ma, Shugong Xu
ViT · 17 · 6 · 0 · 16 Nov 2021

MNet-Sim: A Multi-layered Semantic Similarity Network to Evaluate Sentence Similarity
Manuela Nayantara Jeyaraj, D. Kasthurirathna
11 · 3 · 0 · 09 Nov 2021

MedGPT: Medical Concept Prediction from Clinical Narratives
Z. Kraljevic, Anthony Shek, D. Bean, R. Bendayan, J. Teo, Richard J. B. Dobson
LM&MA · AI4TS · MedIm · 6 · 38 · 0 · 07 Jul 2021

Attention-based multi-channel speaker verification with ad-hoc microphone arrays
Che-Yuan Liang, Junqi Chen, Shanzheng Guan, Xiao-Lei Zhang
12 · 9 · 0 · 01 Jul 2021

Analogous to Evolutionary Algorithm: Designing a Unified Sequence Model
Jiangning Zhang, Chao Xu, Jian Li, Wenzhou Chen, Yabiao Wang, Ying Tai, Shuo Chen, Chengjie Wang, Feiyue Huang, Yong Liu
19 · 22 · 0 · 31 May 2021

VARA-TTS: Non-Autoregressive Text-to-Speech Synthesis based on Very Deep VAE with Residual Attention
Peng Liu, Yuewen Cao, Songxiang Liu, Na Hu, Guangzhi Li, Chao Weng, Dan Su
17 · 22 · 0 · 12 Feb 2021

Big Bird: Transformers for Longer Sequences
Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, ..., Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed
VLM · 249 · 2,009 · 0 · 28 Jul 2020

Efficient Content-Based Sparse Attention with Routing Transformers
Aurko Roy, M. Saffar, Ashish Vaswani, David Grangier
MoE · 238 · 578 · 0 · 12 Mar 2020

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
MoE · 243 · 1,815 · 0 · 17 Sep 2019

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
ELM · 294 · 6,927 · 0 · 20 Apr 2018