Your Transformer May Not be as Powerful as You Expect
arXiv:2205.13401 · 26 May 2022
Shengjie Luo, Shanda Li, Shuxin Zheng, Tie-Yan Liu, Liwei Wang, Di He
Papers citing "Your Transformer May Not be as Powerful as You Expect" (11 of 11 shown)
1. Approximation Rate of the Transformer Architecture for Sequence Modeling
   Hao Jiang, Qianxiao Li · 03 Jan 2025

2. Let the Code LLM Edit Itself When You Edit the Code
   Zhenyu He, Jun Zhang, Shengjie Luo, Jingjing Xu, Z. Zhang, Di He · 03 Jul 2024 · KELM

3. GeoMFormer: A General Architecture for Geometric Molecular Representation Learning
   Tianlang Chen, Shengjie Luo, Di He, Shuxin Zheng, Tie-Yan Liu, Liwei Wang · 24 Jun 2024 · AI4CE

4. Non-autoregressive Personalized Bundle Generation
   Wenchuan Yang, Cheng Yang, Jichao Li, Yuejin Tan, Xin Lu, Chuan Shi · 11 Jun 2024

5. Transformers are Expressive, But Are They Expressive Enough for Regression?
   Swaroop Nath, H. Khadilkar, Pushpak Bhattacharyya · 23 Feb 2024

6. Are Transformers with One Layer Self-Attention Using Low-Rank Weight Matrices Universal Approximators?
   T. Kajitsuka, Issei Sato · 26 Jul 2023

7. Graph Inductive Biases in Transformers without Message Passing
   Liheng Ma, Chen Lin, Derek Lim, Adriana Romero Soriano, P. Dokania, Mark J. Coates, Philip H. S. Torr, Ser-Nam Lim · 27 May 2023 · AI4CE

8. Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
   Ofir Press, Noah A. Smith, M. Lewis · 27 Aug 2021

9. Combiner: Full Attention Transformer with Sparse Computation Cost
   Hongyu Ren, H. Dai, Zihang Dai, Mengjiao Yang, J. Leskovec, Dale Schuurmans, Bo Dai · 12 Jul 2021

10. Benchmarking Graph Neural Networks
    Vijay Prakash Dwivedi, Chaitanya K. Joshi, Anh Tuan Luu, T. Laurent, Yoshua Bengio, Xavier Bresson · 02 Mar 2020

11. Geometric deep learning on graphs and manifolds using mixture model CNNs
    Federico Monti, Davide Boscaini, Jonathan Masci, Emanuele Rodolà, Jan Svoboda, M. Bronstein · 25 Nov 2016 · GNN