Setting the Record Straight on Transformer Oversmoothing
G. Dovonon, M. Bronstein, Matt J. Kusner
arXiv:2401.04301, 9 January 2024
Papers citing "Setting the Record Straight on Transformer Oversmoothing" (9 of 9 shown)
Pretraining Graph Transformers with Atom-in-a-Molecule Quantum Properties for Improved ADMET Modeling
Alessio Fallani, Ramil I. Nugmanov, Jose A. Arjona-Medina, Jörg Kurt Wegner, Alexandre Tkatchenko, Kostiantyn Chernichenko
MedIm, AI4CE
19
0
0
10 Oct 2024
Bundle Neural Networks for message diffusion on graphs
Jacob Bamberger, Federico Barbero, Xiaowen Dong, Michael M. Bronstein
37
1
0
24 May 2024
On the Scalability of GNNs for Molecular Graphs
Maciej Sypetkowski, Frederik Wenkel, Farimah Poursafaei, Nia Dickson, Karush Suri, Philip Fradkin, Dominique Beaini
GNN, AI4CE
26
11
0
17 Apr 2024
The Hidden Attention of Mamba Models
Ameen Ali, Itamar Zimerman, Lior Wolf
Mamba
26
57
0
03 Mar 2024
Graph Convolutions Enrich the Self-Attention in Transformers!
Jeongwhan Choi, Hyowon Wi, Jayoung Kim, Yehjin Shin, Kookjin Lee, Nathaniel Trask, Noseong Park
19
3
0
07 Dec 2023
Stabilizing Transformer Training by Preventing Attention Entropy Collapse
Shuangfei Zhai, Tatiana Likhomanenko, Etai Littwin, Dan Busbridge, Jason Ramapuram, Yizhe Zhang, Jiatao Gu, J. Susskind
AAML
35
64
0
11 Mar 2023
Transformer in Transformer
Kai Han, An Xiao, Enhua Wu, Jianyuan Guo, Chunjing Xu, Yunhe Wang
ViT
282
1,490
0
27 Feb 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, ..., Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, Connor Leahy
AIMat
236
1,508
0
31 Dec 2020
Scaling Laws for Neural Language Models
Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei
220
3,054
0
23 Jan 2020