Cited By: The Information Pathways Hypothesis: Transformers are Dynamic Self-Ensembles
Md Shamim Hussain, Mohammed J. Zaki, D. Subramanian
arXiv:2306.01705, 2 June 2023
Papers citing "The Information Pathways Hypothesis: Transformers are Dynamic Self-Ensembles" (10 of 10 shown):
Disrupting Diffusion-based Inpainters with Semantic Digression [DiffM]
Geonho Son, Juhun Lee, Simon S. Woo
14 Jul 2024

Triplet Interaction Improves Graph Transformers: Accurate Molecular Graph Learning with Triplet Graph Transformers [ViT]
Md Shamim Hussain, Mohammed J. Zaki, D. Subramanian
07 Feb 2024

GRPE: Relative Positional Encoding for Graph Transformer
Wonpyo Park, Woonggi Chang, Donggeon Lee, Juntae Kim, Seung-won Hwang
30 Jan 2022

Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
Ofir Press, Noah A. Smith, M. Lewis
27 Aug 2021

Combiner: Full Attention Transformer with Sparse Computation Cost
Hongyu Ren, H. Dai, Zihang Dai, Mengjiao Yang, J. Leskovec, Dale Schuurmans, Bo Dai
12 Jul 2021

Shortformer: Better Language Modeling using Shorter Inputs
Ofir Press, Noah A. Smith, M. Lewis
31 Dec 2020

Big Bird: Transformers for Longer Sequences [VLM]
Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, ..., Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed
28 Jul 2020

The Lottery Ticket Hypothesis for Pre-trained BERT Networks
Tianlong Chen, Jonathan Frankle, Shiyu Chang, Sijia Liu, Yang Zhang, Zhangyang Wang, Michael Carbin
23 Jul 2020

Efficient Content-Based Sparse Attention with Routing Transformers [MoE]
Aurko Roy, M. Saffar, Ashish Vaswani, David Grangier
12 Mar 2020

Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning [UQCV, BDL]
Y. Gal, Zoubin Ghahramani
06 Jun 2015