Your Transformer May Not be as Powerful as You Expect
Shengjie Luo, Shanda Li, Shuxin Zheng, Tie-Yan Liu, Liwei Wang, Di He
arXiv:2205.13401, 26 May 2022
Papers citing "Your Transformer May Not be as Powerful as You Expect" (7 papers):

1. Approximation Rate of the Transformer Architecture for Sequence Modeling. Hao Jiang, Qianxiao Li. 03 Jan 2025.
2. Let the Code LLM Edit Itself When You Edit the Code. Zhenyu He, Jun Zhang, Shengjie Luo, Jingjing Xu, Z. Zhang, Di He. 03 Jul 2024.
3. Are Transformers with One Layer Self-Attention Using Low-Rank Weight Matrices Universal Approximators? T. Kajitsuka, Issei Sato. 26 Jul 2023.
4. Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation. Ofir Press, Noah A. Smith, M. Lewis. 27 Aug 2021.
5. Combiner: Full Attention Transformer with Sparse Computation Cost. Hongyu Ren, H. Dai, Zihang Dai, Mengjiao Yang, J. Leskovec, Dale Schuurmans, Bo Dai. 12 Jul 2021.
6. Benchmarking Graph Neural Networks. Vijay Prakash Dwivedi, Chaitanya K. Joshi, Anh Tuan Luu, T. Laurent, Yoshua Bengio, Xavier Bresson. 02 Mar 2020.
7. Geometric deep learning on graphs and manifolds using mixture model CNNs. Federico Monti, Davide Boscaini, Jonathan Masci, Emanuele Rodolà, Jan Svoboda, M. Bronstein. 25 Nov 2016.