Your Transformer May Not be as Powerful as You Expect

26 May 2022

Papers citing "Your Transformer May Not be as Powerful as You Expect"

| Title | Authors | Tag | | Date |
|---|---|---|---|---|
| Approximation Rate of the Transformer Architecture for Sequence Modeling | Hao Jiang, Qianxiao Li | | 34 9 0 | 03 Jan 2025 |
| Let the Code LLM Edit Itself When You Edit the Code | Zhenyu He, Jun Zhang, Shengjie Luo, Jingjing Xu, Z. Zhang, Di He | KELM | 21 0 0 | 03 Jul 2024 |
| Are Transformers with One Layer Self-Attention Using Low-Rank Weight Matrices Universal Approximators? | T. Kajitsuka, Issei Sato | | 16 16 0 | 26 Jul 2023 |
| Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation | Ofir Press, Noah A. Smith, M. Lewis | | 234 690 0 | 27 Aug 2021 |
| Combiner: Full Attention Transformer with Sparse Computation Cost | Hongyu Ren, H. Dai, Zihang Dai, Mengjiao Yang, J. Leskovec, Dale Schuurmans, Bo Dai | | 73 66 0 | 12 Jul 2021 |
| Benchmarking Graph Neural Networks | Vijay Prakash Dwivedi, Chaitanya K. Joshi, Anh Tuan Luu, T. Laurent, Yoshua Bengio, Xavier Bresson | | 173 907 0 | 02 Mar 2020 |
| Geometric deep learning on graphs and manifolds using mixture model CNNs | Federico Monti, Davide Boscaini, Jonathan Masci, Emanuele Rodolà, Jan Svoboda, M. Bronstein | GNN | 231 1,801 0 | 25 Nov 2016 |