Cited By: arXiv 2202.08625

Revisiting Over-smoothing in BERT from the Perspective of Graph
17 February 2022
Han Shi, Jiahui Gao, Hang Xu, Xiaodan Liang, Zhenguo Li, Lingpeng Kong, Stephen M. S. Lee, James T. Kwok
Papers citing "Revisiting Over-smoothing in BERT from the Perspective of Graph" (50 of 50 papers shown)
CSE-SFP: Enabling Unsupervised Sentence Representation Learning via a Single Forward Pass
Bowen Zhang, Zixin Song, Chunping Li (01 May 2025)

MOOSComp: Improving Lightweight Long-Context Compressor via Mitigating Over-Smoothing and Incorporating Outlier Scores
Fengwei Zhou, Jiafei Song, Wenjin Jason Li, Gengjian Xue, Zhikang Zhao, Yichao Lu, Bailin Na (23 Apr 2025)

Bridging the Dimensional Chasm: Uncover Layer-wise Dimensional Reduction in Transformers through Token Correlation
Zhuo-Yang Song, Zeyu Li, Qing-Hong Cao, Ming-xing Luo, Hua Xing Zhu (28 Mar 2025)

Coeff-Tuning: A Graph Filter Subspace View for Tuning Attention-Based Large Models
Zichen Miao, Wei Chen, Qiang Qiu (24 Mar 2025)

Transformer Meets Twicing: Harnessing Unattended Residual Information
Laziz U. Abdullaev, Tan M. Nguyen (02 Mar 2025)

Revisiting Kernel Attention with Correlated Gaussian Process Representation
Long Minh Bui, Tho Tran Huu, Duy-Tung Dinh, T. Nguyen, Trong Nghia Hoang (27 Feb 2025)

A Survey of Graph Transformers: Architectures, Theories and Applications
Chaohao Yuan, Kangfei Zhao, Ercan Engin Kuruoglu, Liang Wang, Tingyang Xu, Wenbing Huang, Deli Zhao, Hong Cheng, Yu Rong (23 Feb 2025)

Understanding Why Adam Outperforms SGD: Gradient Heterogeneity in Transformers
Akiyoshi Tomihari, Issei Sato (31 Jan 2025) [ODL]

The Geometry of Tokens in Internal Representations of Large Language Models
Karthik Viswanathan, Yuri Gardinazzi, Giada Panerai, Alberto Cazzaniga, Matteo Biagetti (17 Jan 2025) [AIFin]
GrokFormer: Graph Fourier Kolmogorov-Arnold Transformers
Guoguo Ai, Guansong Pang, Hezhe Qiao, Yuan Gao, Hui Yan (26 Nov 2024)

Zipfian Whitening
Sho Yokoi, Han Bao, Hiroto Kurita, Hidetoshi Shimodaira (01 Nov 2024)

Lambda-Skip Connections: the architectural component that prevents Rank Collapse
Federico Arangath Joseph, Jerome Sieber, M. Zeilinger, Carmen Amo Alonso (14 Oct 2024)

Pretraining Graph Transformers with Atom-in-a-Molecule Quantum Properties for Improved ADMET Modeling
Alessio Fallani, Ramil I. Nugmanov, Jose A. Arjona-Medina, Jörg Kurt Wegner, Alexandre Tkatchenko, Kostiantyn Chernichenko (10 Oct 2024) [MedIm, AI4CE]

Does RoBERTa Perform Better than BERT in Continual Learning: An Attention Sink Perspective
Xueying Bai, Yifan Sun, Niranjan Balasubramanian (08 Oct 2024) [CLL]

Multi-task Heterogeneous Graph Learning on Electronic Health Records
Tsai Hor Chan, Guosheng Yin, Kyongtae Bae, Lequan Yu (14 Aug 2024) [CML]

Elliptical Attention
Stefan K. Nielsen, Laziz U. Abdullaev, R. Teo, Tan M. Nguyen (19 Jun 2024)

Unveiling the Hidden Structure of Self-Attention via Kernel Principal Component Analysis
R. Teo, Tan M. Nguyen (19 Jun 2024)

MultiMax: Sparse and Multi-Modal Attention Learning
Yuxuan Zhou, Mario Fritz, M. Keuper (03 Jun 2024)

On the Role of Attention Masks and LayerNorm in Transformers
Xinyi Wu, A. Ajorlou, Yifei Wang, Stefanie Jegelka, Ali Jadbabaie (29 May 2024)

Tackling Ambiguity from Perspective of Uncertainty Inference and Affinity Diversification for Weakly Supervised Semantic Segmentation
Zhiwei Yang, Yucong Meng, Kexue Fu, Shuo Wang, Zhijian Song (12 Apr 2024)

PIDformer: Transformer Meets Control Theory
Tam Nguyen, César A. Uribe, Tan-Minh Nguyen, Richard G. Baraniuk (25 Feb 2024)
SIBO: A Simple Booster for Parameter-Efficient Fine-Tuning
Zhihao Wen, Jie Zhang, Yuan Fang (19 Feb 2024) [MoE]

Setting the Record Straight on Transformer Oversmoothing
G. Dovonon, M. Bronstein, Matt J. Kusner (09 Jan 2024)

PanGu-π: Enhancing Language Model Architectures via Nonlinearity Compensation
Yunhe Wang, Hanting Chen, Yehui Tang, Tianyu Guo, Kai Han, ..., Qinghua Xu, Qun Liu, Jun Yao, Chao Xu, Dacheng Tao (27 Dec 2023)

Polynomial-based Self-Attention for Table Representation learning
Jayoung Kim, Yehjin Shin, Jeongwhan Choi, Hyowon Wi, Noseong Park (12 Dec 2023) [LMTD]

Graph Convolutions Enrich the Self-Attention in Transformers!
Jeongwhan Choi, Hyowon Wi, Jayoung Kim, Yehjin Shin, Kookjin Lee, Nathaniel Trask, Noseong Park (07 Dec 2023)

Mitigating Over-smoothing in Transformers via Regularized Nonlocal Functionals
Tam Nguyen, Tan-Minh Nguyen, Richard G. Baraniuk (01 Dec 2023)

URLOST: Unsupervised Representation Learning without Stationarity or Topology
Zeyu Yun, Juexiao Zhang, Bruno A. Olshausen, Yann LeCun (06 Oct 2023)

Transformers are efficient hierarchical chemical graph learners
Zihan Pengmei, Zimu Li, Chih-chan Tien, Risi Kondor, Aaron R Dinner (02 Oct 2023) [GNN]

Towards Deep Attention in Graph Neural Networks: Problems and Remedies
Soo Yong Lee, Fanchen Bu, Jaemin Yoo, Kijung Shin (04 Jun 2023) [GNN]

Centered Self-Attention Layers
Ameen Ali, Tomer Galanti, Lior Wolf (02 Jun 2023)

Demystifying Oversmoothing in Attention-Based Graph Neural Networks
Xinyi Wu, A. Ajorlou, Zihui Wu, Ali Jadbabaie (25 May 2023)
Masked Collaborative Contrast for Weakly Supervised Semantic Segmentation
Fangwen Wu, Jingxuan He, Yufei Yin, Y. Hao, Gang Huang, Lechao Cheng (15 May 2023) [ISeg]

Alleviating Over-smoothing for Unsupervised Sentence Representation
Nuo Chen, Linjun Shou, Ming Gong, Jian Pei, Bowen Cao, Jianhui Chang, Daxin Jiang, Jia Li (09 May 2023) [SSL]

Decomposed Cross-modal Distillation for RGB-based Temporal Action Detection
Pilhyeon Lee, Taeoh Kim, Minho Shim, Dongyoon Wee, H. Byun (30 Mar 2023)

Mimic before Reconstruct: Enhancing Masked Autoencoders with Feature Mimicking
Peng Gao, Renrui Zhang, Rongyao Fang, Ziyi Lin, Hongyang Li, Hongsheng Li, Qiao Yu (09 Mar 2023)

A Message Passing Perspective on Learning Dynamics of Contrastive Learning
Yifei Wang, Qi Zhang, Tianqi Du, Jiansheng Yang, Zhouchen Lin, Yisen Wang (08 Mar 2023) [SSL]

Token Contrast for Weakly-Supervised Semantic Segmentation
Lixiang Ru, Heliang Zheng, Yibing Zhan, Bo Du (02 Mar 2023) [ViT]

Specformer: Spectral Graph Neural Networks Meet Transformers
Deyu Bo, Chuan Shi, Lele Wang, Renjie Liao (02 Mar 2023)

Are More Layers Beneficial to Graph Transformers?
Haiteng Zhao, Shuming Ma, Dongdong Zhang, Zhi-Hong Deng, Furu Wei (01 Mar 2023)

EIT: Enhanced Interactive Transformer
Tong Zheng, Bei Li, Huiwen Bao, Tong Xiao, Jingbo Zhu (20 Dec 2022)
Modeling Fine-grained Information via Knowledge-aware Hierarchical Graph for Zero-shot Entity Retrieval
Taiqiang Wu, Xingyu Bai, Weigang Guo, Weijie Liu, Siheng Li, Yujiu Yang (20 Nov 2022)

DyREx: Dynamic Query Representation for Extractive Question Answering
Urchade Zaratiana, Niama El Khbir, Dennis Núñez, Pierre Holat, Nadi Tomeh, Thierry Charnois (26 Oct 2022)

AD-DROP: Attribution-Driven Dropout for Robust Language Model Fine-Tuning
Tao Yang, Jinghao Deng, Xiaojun Quan, Qifan Wang, Shaoliang Nie (12 Oct 2022)

Transformers from an Optimization Perspective
Yongyi Yang, Zengfeng Huang, David Wipf (27 May 2022)

Enhancing Continual Learning with Global Prototypes: Counteracting Negative Representation Drift
Xueying Bai, Jinghuan Shang, Yifan Sun, Niranjan Balasubramanian (24 May 2022) [CLL]

A Study on Transformer Configuration and Training Objective
Fuzhao Xue, Jianghai Chen, Aixin Sun, Xiaozhe Ren, Zangwei Zheng, Xiaoxin He, Yongming Chen, Xin Jiang, Yang You (21 May 2022)

Representation Learning on Graphs with Jumping Knowledge Networks
Keyulu Xu, Chengtao Li, Yonglong Tian, Tomohiro Sonobe, Ken-ichi Kawarabayashi, Stefanie Jegelka (09 Jun 2018) [GNN]

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman (20 Apr 2018) [ELM]

Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu, M. Schuster, Z. Chen, Quoc V. Le, Mohammad Norouzi, ..., Alex Rudnick, Oriol Vinyals, G. Corrado, Macduff Hughes, J. Dean (26 Sep 2016) [AIMat]