Your Transformer May Not be as Powerful as You Expect

26 May 2022
Shengjie Luo, Shanda Li, Shuxin Zheng, Tie-Yan Liu, Liwei Wang, Di He

Papers citing "Your Transformer May Not be as Powerful as You Expect"

11 papers shown
Approximation Rate of the Transformer Architecture for Sequence Modeling
Hao Jiang, Qianxiao Li · 03 Jan 2025

Let the Code LLM Edit Itself When You Edit the Code
Zhenyu He, Jun Zhang, Shengjie Luo, Jingjing Xu, Z. Zhang, Di He · KELM · 03 Jul 2024

GeoMFormer: A General Architecture for Geometric Molecular Representation Learning
Tianlang Chen, Shengjie Luo, Di He, Shuxin Zheng, Tie-Yan Liu, Liwei Wang · AI4CE · 24 Jun 2024

Non-autoregressive Personalized Bundle Generation
Wenchuan Yang, Cheng Yang, Jichao Li, Yuejin Tan, Xin Lu, Chuan Shi · 11 Jun 2024

Transformers are Expressive, But Are They Expressive Enough for Regression?
Swaroop Nath, H. Khadilkar, Pushpak Bhattacharyya · 23 Feb 2024

Are Transformers with One Layer Self-Attention Using Low-Rank Weight Matrices Universal Approximators?
T. Kajitsuka, Issei Sato · 26 Jul 2023

Graph Inductive Biases in Transformers without Message Passing
Liheng Ma, Chen Lin, Derek Lim, Adriana Romero Soriano, P. Dokania, Mark J. Coates, Philip H. S. Torr, Ser-Nam Lim · AI4CE · 27 May 2023

Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
Ofir Press, Noah A. Smith, M. Lewis · 27 Aug 2021

Combiner: Full Attention Transformer with Sparse Computation Cost
Hongyu Ren, H. Dai, Zihang Dai, Mengjiao Yang, J. Leskovec, Dale Schuurmans, Bo Dai · 12 Jul 2021

Benchmarking Graph Neural Networks
Vijay Prakash Dwivedi, Chaitanya K. Joshi, Anh Tuan Luu, T. Laurent, Yoshua Bengio, Xavier Bresson · 02 Mar 2020

Geometric deep learning on graphs and manifolds using mixture model CNNs
Federico Monti, Davide Boscaini, Jonathan Masci, Emanuele Rodolà, Jan Svoboda, M. Bronstein · GNN · 25 Nov 2016