2003.02436
Cited By
Talking-Heads Attention
5 March 2020
Noam M. Shazeer
Zhenzhong Lan
Youlong Cheng
Nan Ding
L. Hou
Papers citing "Talking-Heads Attention"
42 / 42 papers shown
Knocking-Heads Attention
Zhanchao Zhou
Xiaodong Chen
Haoxing Chen
Zhenzhong Lan
Jianguo Li
138
1
0
27 Oct 2025
AttentionDrop: A Novel Regularization Method for Transformer Models
Mirza Samad Ahmed Baig
Syeda Anshrah Gillani
Abdul Akbar Khan
Shahid Munir Shah
Muhammad Omer Khan
285
0
0
16 Apr 2025
Multi-Token Attention
O. Yu. Golovneva
Tianlu Wang
Jason Weston
Sainbayar Sukhbaatar
446
7
0
01 Apr 2025
Devil is in the Uniformity: Exploring Diverse Learners within Transformer for Image Restoration
Shihao Zhou
Dayu Li
Jinshan Pan
Juncheng Zhou
Jinglei Shi
Jufeng Yang
367
2
0
26 Mar 2025
SAGE-Amine: Generative Amine Design with Multi-Property Optimization for Efficient CO2 Capture
Hocheol Lim
Hyein Cho
Jeonghoon Kim
296
1
0
04 Mar 2025
Dockformer: A transformer-based molecular docking paradigm for large-scale virtual screening
Zhangfan Yang
Junkai Ji
Shan He
Jianqiang Li
Ruibin Bai
Zexuan Zhu
Yew-Soon Ong
411
1
0
11 Nov 2024
Improving Vision Transformers by Overlapping Heads in Multi-Head Self-Attention
Tianxiao Zhang
Bo Luo
G. Wang
ViT
273
5
0
18 Oct 2024
DAPE V2: Process Attention Score as Feature Map for Length Extrapolation
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Chuanyang Zheng
Yihang Gao
Han Shi
Jing Xiong
Jiankai Sun
...
Xiaozhe Ren
Michael Ng
Xin Jiang
Zhenguo Li
Yu Li
412
12
0
07 Oct 2024
Depth-Wise Convolutions in Vision Transformers for Efficient Training on Small Datasets
Tianxiao Zhang
Wenju Xu
Bo Luo
Guanghui Wang
ViT
MDE
519
48
0
28 Jul 2024
MultiMax: Sparse and Multi-Modal Attention Learning
Yuxuan Zhou
Mario Fritz
Margret Keuper
657
3
0
03 Jun 2024
Improving Transformers with Dynamically Composable Multi-Head Attention
International Conference on Machine Learning (ICML), 2024
Da Xiao
Qingye Meng
Shengping Li
Xingyuan Yuan
338
6
0
14 May 2024
GvT: A Graph-based Vision Transformer with Talking-Heads Utilizing Sparsity, Trained from Scratch on Small Datasets
Dongjing Shan
Guiqiang Chen
ViT
350
1
0
07 Apr 2024
Enhancing Automatic Modulation Recognition through Robust Global Feature Extraction
IEEE Transactions on Vehicular Technology (IEEE Trans. Veh. Technol.), 2024
Yunpeng Qu
Zhilin Lu
Rui Zeng
Jintao Wang
Jian Wang
257
36
0
02 Jan 2024
MABViT -- Modified Attention Block Enhances Vision Transformers
Mahesh Ramesh
Aswinkumar Ramkumar
170
3
0
03 Dec 2023
Memory-efficient Stochastic methods for Memory-based Transformers
Vishwajit Kumar Vishnu
C. Sekhar
154
0
0
14 Nov 2023
ETDPC: A Multimodality Framework for Classifying Pages in Electronic Theses and Dissertations
Muntabir Hasan Choudhury
Lamia Salsabil
William A. Ingram
Edward A. Fox
Jian Wu
196
1
0
07 Nov 2023
How Much Context Does My Attention-Based ASR System Need?
Interspeech (Interspeech), 2023
Robert Flynn
Anton Ragni
304
5
0
24 Oct 2023
Entropic Score metric: Decoupling Topology and Size in Training-free NAS
Niccolò Cavagnero
Luc Robbiano
Francesca Pistilli
Barbara Caputo
Giuseppe Averta
239
4
0
06 Oct 2023
TpuGraphs: A Performance Prediction Dataset on Large Tensor Computational Graphs
Neural Information Processing Systems (NeurIPS), 2023
P. Phothilimthana
Sami Abu-El-Haija
Kaidi Cao
Bahare Fatemi
Mike Burrows
Charith Mendis
Bryan Perozzi
GNN
AI4TS
475
29
0
25 Aug 2023
Finding Stakeholder-Material Information from 10-K Reports using Fine-Tuned BERT and LSTM Models
V. Z. Chen
251
0
0
15 Aug 2023
Finding the Pillars of Strength for Multi-Head Attention
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Jinjie Ni
Rui Mao
Zonglin Yang
Han Lei
Xiaoshi Zhong
273
9
0
22 May 2023
Multi-Head State Space Model for Speech Recognition
Interspeech (Interspeech), 2023
Yassir Fathullah
Chunyang Wu
Yuan Shangguan
Junteng Jia
Wenhan Xiong
...
Chunxi Liu
Yangyang Shi
Ozlem Kalinli
M. Seltzer
Mark Gales
212
20
0
21 May 2023
ChatGPT-Like Large-Scale Foundation Models for Prognostics and Health Management: A Survey and Roadmaps
Reliability Engineering & System Safety (Reliab. Eng. Syst. Saf.), 2023
Yanfang Li
Huan Wang
Muxia Sun
LM&MA
AI4TS
AI4CE
441
105
0
10 May 2023
ElasticViT: Conflict-aware Supernet Training for Deploying Fast Vision Transformer on Diverse Mobile Devices
IEEE International Conference on Computer Vision (ICCV), 2023
Chen Tang
Li Zhang
Huiqiang Jiang
Jiahang Xu
Ting Cao
Quanlu Zhang
Yuqing Yang
Zhi Wang
Mao Yang
218
15
0
17 Mar 2023
Semantic Feature Integration network for Fine-grained Visual Classification
Haibo Wang
Yueyang Li
Haichi Luo
246
0
0
13 Feb 2023
EIT: Enhanced Interactive Transformer
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Tong Zheng
Bei Li
Huiwen Bao
Tong Xiao
Jingbo Zhu
318
3
0
20 Dec 2022
Rethinking Vision Transformers for MobileNet Size and Speed
IEEE International Conference on Computer Vision (ICCV), 2023
Yanyu Li
Ju Hu
Yang Wen
Georgios Evangelidis
Kamyar Salahi
Yanzhi Wang
Sergey Tulyakov
Jian Ren
ViT
430
284
0
15 Dec 2022
BJTU-WeChat's Systems for the WMT22 Chat Translation Task
Conference on Machine Translation (WMT), 2022
Yunlong Liang
Fandong Meng
Jinan Xu
Jie Zhou
160
3
0
28 Nov 2022
FF2: A Feature Fusion Two-Stream Framework for Punctuation Restoration
Yangjun Wu
Kebin Fang
Yao Zhao
Hao Zhang
Lifeng Shi
Mengqi Zhang
115
0
0
09 Nov 2022
TinyViT: Fast Pretraining Distillation for Small Vision Transformers
European Conference on Computer Vision (ECCV), 2022
Kan Wu
Jinnian Zhang
Houwen Peng
Xiyang Dai
Bin Xiao
Jianlong Fu
Lu Yuan
ViT
351
446
0
21 Jul 2022
FL-Tuning: Layer Tuning for Feed-Forward Network in Transformer
Jingping Liu
Yuqiu Song
Kui Xue
Hongli Sun
Chao Wang
Lihan Chen
Haiyun Jiang
Jiaqing Liang
Tong Ruan
237
3
0
30 Jun 2022
MiniViT: Compressing Vision Transformers with Weight Multiplexing
Computer Vision and Pattern Recognition (CVPR), 2022
Jinnian Zhang
Houwen Peng
Kan Wu
Xiyang Dai
Bin Xiao
Jianlong Fu
Lu Yuan
ViT
306
162
0
14 Apr 2022
Transformers in Medical Imaging: A Survey
Fahad Shamshad
Salman Khan
Syed Waqas Zamir
Muhammad Haris Khan
Munawar Hayat
Fahad Shahbaz Khan
Huazhu Fu
ViT
LM&MA
MedIm
410
1,034
0
24 Jan 2022
Streaming Transformer Transducer Based Speech Recognition Using Non-Causal Convolution
Yangyang Shi
Chunyang Wu
Dilin Wang
Alex Xiao
Jay Mahadeokar
...
Ke Li
Yuan Shangguan
Varun K. Nagaraja
Ozlem Kalinli
M. Seltzer
303
19
0
07 Oct 2021
WeChat Neural Machine Translation Systems for WMT21
Conference on Machine Translation (WMT), 2021
Xianfeng Zeng
Yanjun Liu
Ernan Li
Qiu Ran
Fandong Meng
Peng Li
Jinan Xu
Jie Zhou
234
21
0
05 Aug 2021
MedGPT: Medical Concept Prediction from Clinical Narratives
Z. Kraljevic
Anthony Shek
D. Bean
R. Bendayan
J. Teo
Richard J. B. Dobson
LM&MA
AI4TS
MedIm
271
52
0
07 Jul 2021
A Survey of Transformers
AI Open (AO), 2021
Tianyang Lin
Yuxin Wang
Xiangyang Liu
Xipeng Qiu
ViT
634
1,442
0
08 Jun 2021
Refiner: Refining Self-attention for Vision Transformers
Daquan Zhou
Yujun Shi
Bingyi Kang
Weihao Yu
Zihang Jiang
Yuan Li
Xiaojie Jin
Qibin Hou
Jiashi Feng
ViT
254
69
0
07 Jun 2021
Vision Transformers with Patch Diversification
Chengyue Gong
Dilin Wang
Meng Li
Vikas Chandra
Qiang Liu
ViT
346
69
0
26 Apr 2021
Going deeper with Image Transformers
IEEE International Conference on Computer Vision (ICCV), 2021
Hugo Touvron
Matthieu Cord
Alexandre Sablayrolles
Gabriel Synnaeve
Edouard Grave
ViT
785
1,244
0
31 Mar 2021
Multi-Head Attention: Collaborate Instead of Concatenate
Jean-Baptiste Cordonnier
Andreas Loukas
Martin Jaggi
246
160
0
29 Jun 2020
Global Attention based Graph Convolutional Neural Networks for Improved Materials Property Prediction
Steph-Yves M. Louis
Yong Zhao
Alireza Nasiri
Xiran Wong
Yuqi Song
Fei Liu
Jianjun Hu
AI4CE
137
16
0
11 Mar 2020