Which transformer architecture fits my data? A vocabulary bottleneck in self-attention

9 May 2021
Noam Wies, Yoav Levine, Daniel Jannai, Amnon Shashua
arXiv:2105.03928

Papers citing "Which transformer architecture fits my data? A vocabulary bottleneck in self-attention"

14 / 14 papers shown:

  1. Transformers Can Overcome the Curse of Dimensionality: A Theoretical Study from an Approximation Perspective. Yuling Jiao, Yanming Lai, Yang Wang, Bokai Yan. 18 Apr 2025.
  2. Structure-informed Positional Encoding for Music Generation. Manvi Agarwal, Changhong Wang, Gaël Richard. 20 Feb 2024.
  3. Cramming: Training a Language Model on a Single GPU in One Day. Jonas Geiping, Tom Goldstein. 28 Dec 2022. [MoE]
  4. On the Ability of Graph Neural Networks to Model Interactions Between Vertices. Noam Razin, Tom Verbin, Nadav Cohen. 29 Nov 2022.
  5. Transformer Vs. MLP-Mixer: Exponential Expressive Gap For NLP Problems. D. Navon, A. Bronstein. 17 Aug 2022. [MoE]
  6. Learning to Learn to Predict Performance Regressions in Production at Meta. M. Beller, Hongyu Li, V. Nair, V. Murali, Imad Ahmad, Jürgen Cito, Drew Carlson, Gareth Ari Aye, Wes Dyer. 08 Aug 2022.
  7. Pure Transformers are Powerful Graph Learners. Jinwoo Kim, Tien Dat Nguyen, Seonwoo Min, Sungjun Cho, Moontae Lee, Honglak Lee, Seunghoon Hong. 06 Jul 2022.
  8. Implicit Regularization in Hierarchical Tensor Factorization and Deep Convolutional Neural Networks. Noam Razin, Asaf Maman, Nadav Cohen. 27 Jan 2022.
  9. Can Vision Transformers Perform Convolution? Shanda Li, Xiangning Chen, Di He, Cho-Jui Hsieh. 02 Nov 2021. [ViT]
  10. The Inductive Bias of In-Context Learning: Rethinking Pretraining Example Design. Yoav Levine, Noam Wies, Daniel Jannai, D. Navon, Yedid Hoshen, Amnon Shashua. 09 Oct 2021. [AI4CE]
  11. Scaling Laws for Neural Machine Translation. Behrooz Ghorbani, Orhan Firat, Markus Freitag, Ankur Bapna, M. Krikun, Xavier Garcia, Ciprian Chelba, Colin Cherry. 16 Sep 2021.
  12. Revisiting Deep Learning Models for Tabular Data. Yu. V. Gorishniy, Ivan Rubachev, Valentin Khrulkov, Artem Babenko. 22 Jun 2021. [LMTD]
  13. ByT5: Towards a token-free future with pre-trained byte-to-byte models. Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel. 28 May 2021.
  14. Scaling Laws for Neural Language Models. Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei. 23 Jan 2020.