RealFormer: Transformer Likes Residual Attention

RealFormer: Transformer Likes Residual Attention

21 December 2020

Bhargav Kanagal

Joshua Ainslie

Papers citing "RealFormer: Transformer Likes Residual Attention"

16 / 16 papers shown

Title
Layerwise Recurrent Router for Mixture-of-Experts Zihan Qiu Zeyu Huang Shuang Cheng Yizhi Zhou Zili Wang Ivan Titov Jie Fu MoE 68 2 0 13 Aug 2024
Paraphrase Identification with Deep Learning: A Review of Datasets and Methods Chao Zhou Cheng Qiu Daniel Ernesto Acuna 24 25 0 13 Dec 2022
TencentPretrain: A Scalable and Flexible Toolkit for Pre-training Models of Different Modalities Zhe Zhao Yudong Li Cheng-An Hou Jing-xin Zhao Rong Tian ... Xingwu Sun Zhanhui Kang Xiaoyong Du Linlin Shen Kimmo Yan VLM 29 23 0 13 Dec 2022
Relational Graph Convolutional Neural Networks for Multihop Reasoning: A Comparative Study Ieva Staliunaite P. Gorinski Ignacio Iacobacci GNN 14 0 0 12 Oct 2022
EATFormer: Improving Vision Transformer Inspired by Evolutionary Algorithm Jiangning Zhang Xiangtai Li Yabiao Wang Chengjie Wang Yibo Yang Yong Liu Dacheng Tao ViT 28 32 0 19 Jun 2022
Revisiting Over-smoothing in BERT from the Perspective of Graph Han Shi Jiahui Gao Hang Xu Xiaodan Liang Zhenguo Li Lingpeng Kong Stephen M. S. Lee James T. Kwok 8 70 0 17 Feb 2022
TRIG: Transformer-Based Text Recognizer with Initial Embedding Guidance Yuefeng Tao Zhiwei Jia Runze Ma Shugong Xu ViT 17 6 0 16 Nov 2021
MNet-Sim: A Multi-layered Semantic Similarity Network to Evaluate Sentence Similarity Manuela Nayantara Jeyaraj D. Kasthurirathna 11 3 0 09 Nov 2021
MedGPT: Medical Concept Prediction from Clinical Narratives Z. Kraljevic Anthony Shek D. Bean R. Bendayan J. Teo Richard J. B. Dobson LM&MA AI4TS MedIm 6 38 0 07 Jul 2021
Attention-based multi-channel speaker verification with ad-hoc microphone arrays Che-Yuan Liang Junqi Chen Shanzheng Guan Xiao-Lei Zhang 12 9 0 01 Jul 2021
Analogous to Evolutionary Algorithm: Designing a Unified Sequence Model Jiangning Zhang Chao Xu Jian Li Wenzhou Chen Yabiao Wang Ying Tai Shuo Chen Chengjie Wang Feiyue Huang Yong Liu 19 22 0 31 May 2021
VARA-TTS: Non-Autoregressive Text-to-Speech Synthesis based on Very Deep VAE with Residual Attention Peng Liu Yuewen Cao Songxiang Liu Na Hu Guangzhi Li Chao Weng Dan Su 17 22 0 12 Feb 2021
Big Bird: Transformers for Longer Sequences Manzil Zaheer Guru Guruganesh Kumar Avinava Dubey Joshua Ainslie Chris Alberti ... Philip Pham Anirudh Ravula Qifan Wang Li Yang Amr Ahmed VLM 249 2,009 0 28 Jul 2020
Efficient Content-Based Sparse Attention with Routing Transformers Aurko Roy M. Saffar Ashish Vaswani David Grangier MoE 238 578 0 12 Mar 2020
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism M. Shoeybi M. Patwary Raul Puri P. LeGresley Jared Casper Bryan Catanzaro MoE 243 1,815 0 17 Sep 2019
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding Alex Jinpeng Wang Amanpreet Singh Julian Michael Felix Hill Omer Levy Samuel R. Bowman ELM 294 6,927 0 20 Apr 2018