Talking-Heads Attention
arXiv:2003.02436

5 March 2020
Noam M. Shazeer, Zhenzhong Lan, Youlong Cheng, Nan Ding, L. Hou
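For context on what the citing papers below build on: talking-heads attention inserts learned linear projections across the heads dimension, once on the attention logits before the softmax and once on the attention weights after it, so heads can exchange information. A minimal NumPy sketch of that idea follows; the function name and the mixing matrices `P_logits` / `P_weights` are illustrative names chosen here, not the paper's code, and batching is omitted for clarity.

```python
import numpy as np

def talking_heads_attention(q, k, v, P_logits, P_weights):
    """Minimal sketch of talking-heads attention (Shazeer et al., 2020).

    q: (h, n, d)  queries,  k: (h, m, d)  keys,  v: (h, m, dv)  values
    P_logits:  (h, h) mixes logits across heads before the softmax
    P_weights: (h, h) mixes attention weights across heads after the softmax
    """
    d = q.shape[-1]
    # Per-head scaled dot-product logits: (h, n, m)
    logits = np.einsum('hnd,hmd->hnm', q, k) / np.sqrt(d)
    # "Talk" across heads before the softmax
    logits = np.einsum('hl,lnm->hnm', P_logits, logits)
    # Numerically stable softmax over the key axis
    weights = np.exp(logits - logits.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)
    # "Talk" across heads again after the softmax
    weights = np.einsum('hl,lnm->hnm', P_weights, weights)
    # Weighted sum of values: (h, n, dv)
    return np.einsum('hnm,hmd->hnd', weights, v)
```

With both mixing matrices set to the identity, this reduces to ordinary multi-head attention, which is a convenient sanity check when experimenting with the head-mixing projections.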

Papers citing "Talking-Heads Attention"

42 papers
  • Knocking-Heads Attention. Zhanchao Zhou, Xiaodong Chen, Haoxing Chen, Zhenzhong Lan, Jianguo Li. 27 Oct 2025.
  • AttentionDrop: A Novel Regularization Method for Transformer Models. Mirza Samad Ahmed Baig, Syeda Anshrah Gillani, Abdul Akbar Khan, Shahid Munir Shah, Muhammad Omer Khan. 16 Apr 2025.
  • Multi-Token Attention. O. Yu. Golovneva, Tianlu Wang, Jason Weston, Sainbayar Sukhbaatar. 01 Apr 2025.
  • Devil is in the Uniformity: Exploring Diverse Learners within Transformer for Image Restoration. Shihao Zhou, Dayu Li, Jinshan Pan, Juncheng Zhou, Jinglei Shi, Jufeng Yang. 26 Mar 2025.
  • SAGE-Amine: Generative Amine Design with Multi-Property Optimization for Efficient CO2 Capture. Hocheol Lim, Hyein Cho, Jeonghoon Kim. 04 Mar 2025.
  • Dockformer: A transformer-based molecular docking paradigm for large-scale virtual screening. Zhangfan Yang, Junkai Ji, Shan He, Jianqiang Li, Ruibin Bai, Zexuan Zhu, Yew-Soon Ong. 11 Nov 2024.
  • Improving Vision Transformers by Overlapping Heads in Multi-Head Self-Attention. Tianxiao Zhang, Bo Luo, G. Wang. 18 Oct 2024.
  • DAPE V2: Process Attention Score as Feature Map for Length Extrapolation. Annual Meeting of the Association for Computational Linguistics (ACL), 2024. Chuanyang Zheng, Yihang Gao, Han Shi, Jing Xiong, Jiankai Sun, ..., Xiaozhe Ren, Michael Ng, Xin Jiang, Zhenguo Li, Yu Li. 07 Oct 2024.
  • Depth-Wise Convolutions in Vision Transformers for Efficient Training on Small Datasets. Tianxiao Zhang, Wenju Xu, Bo Luo, Guanghui Wang. 28 Jul 2024.
  • MultiMax: Sparse and Multi-Modal Attention Learning. Yuxuan Zhou, Mario Fritz, Margret Keuper. 03 Jun 2024.
  • Improving Transformers with Dynamically Composable Multi-Head Attention. International Conference on Machine Learning (ICML), 2024. Da Xiao, Qingye Meng, Shengping Li, Xingyuan Yuan. 14 May 2024.
  • GvT: A Graph-based Vision Transformer with Talking-Heads Utilizing Sparsity, Trained from Scratch on Small Datasets. Dongjing Shan, guiqiang chen. 07 Apr 2024.
  • Enhancing Automatic Modulation Recognition through Robust Global Feature Extraction. IEEE Transactions on Vehicular Technology, 2024. Yunpeng Qu, Zhilin Lu, Rui Zeng, Jintao Wang, Jian Wang. 02 Jan 2024.
  • MABViT -- Modified Attention Block Enhances Vision Transformers. Mahesh Ramesh, Aswinkumar Ramkumar. 03 Dec 2023.
  • Memory-efficient Stochastic methods for Memory-based Transformers. Vishwajit Kumar Vishnu, C. Sekhar. 14 Nov 2023.
  • ETDPC: A Multimodality Framework for Classifying Pages in Electronic Theses and Dissertations. Muntabir Hasan Choudhury, Lamia Salsabil, William A. Ingram, Edward A. Fox, Jian Wu. 07 Nov 2023.
  • How Much Context Does My Attention-Based ASR System Need? Interspeech, 2023. Robert Flynn, Anton Ragni. 24 Oct 2023.
  • Entropic Score metric: Decoupling Topology and Size in Training-free NAS. Niccolò Cavagnero, Luc Robbiano, Francesca Pistilli, Barbara Caputo, Giuseppe Averta. 06 Oct 2023.
  • TpuGraphs: A Performance Prediction Dataset on Large Tensor Computational Graphs. Neural Information Processing Systems (NeurIPS), 2023. P. Phothilimthana, Sami Abu-El-Haija, Kaidi Cao, Bahare Fatemi, Mike Burrows, Charith Mendis, Bryan Perozzi. 25 Aug 2023.
  • Finding Stakeholder-Material Information from 10-K Reports using Fine-Tuned BERT and LSTM Models. V. Z. Chen. 15 Aug 2023.
  • Finding the Pillars of Strength for Multi-Head Attention. Annual Meeting of the Association for Computational Linguistics (ACL), 2023. Jinjie Ni, Rui Mao, Zonglin Yang, Han Lei, Xiaoshi Zhong. 22 May 2023.
  • Multi-Head State Space Model for Speech Recognition. Interspeech, 2023. Yassir Fathullah, Chunyang Wu, Yuan Shangguan, Junteng Jia, Wenhan Xiong, ..., Chunxi Liu, Yangyang Shi, Ozlem Kalinli, M. Seltzer, Mark Gales. 21 May 2023.
  • ChatGPT-Like Large-Scale Foundation Models for Prognostics and Health Management: A Survey and Roadmaps. Reliability Engineering & System Safety, 2023. Yanfang Li, Huan Wang, Muxia Sun. 10 May 2023.
  • ElasticViT: Conflict-aware Supernet Training for Deploying Fast Vision Transformer on Diverse Mobile Devices. IEEE International Conference on Computer Vision (ICCV), 2023. Chen Tang, Li Zhang, Huiqiang Jiang, Jiahang Xu, Ting Cao, Quanlu Zhang, Yuqing Yang, Zhi Wang, Mao Yang. 17 Mar 2023.
  • Semantic Feature Integration network for Fine-grained Visual Classification. Haibo Wang, Yueyang Li, Haichi Luo. 13 Feb 2023.
  • EIT: Enhanced Interactive Transformer. Annual Meeting of the Association for Computational Linguistics (ACL), 2022. Tong Zheng, Bei Li, Huiwen Bao, Tong Xiao, Jingbo Zhu. 20 Dec 2022.
  • Rethinking Vision Transformers for MobileNet Size and Speed. IEEE International Conference on Computer Vision (ICCV), 2022. Yanyu Li, Ju Hu, Yang Wen, Georgios Evangelidis, Kamyar Salahi, Yanzhi Wang, Sergey Tulyakov, Jian Ren. 15 Dec 2022.
  • BJTU-WeChat's Systems for the WMT22 Chat Translation Task. Conference on Machine Translation (WMT), 2022. Yunlong Liang, Fandong Meng, Jinan Xu, Jie Zhou. 28 Nov 2022.
  • FF2: A Feature Fusion Two-Stream Framework for Punctuation Restoration. Yangjun Wu, Kebin Fang, Yao Zhao, Hao Zhang, Lifeng Shi, Mengqi Zhang. 09 Nov 2022.
  • TinyViT: Fast Pretraining Distillation for Small Vision Transformers. European Conference on Computer Vision (ECCV), 2022. Kan Wu, Jinnian Zhang, Houwen Peng, Xiyang Dai, Bin Xiao, Jianlong Fu, Lu Yuan. 21 Jul 2022.
  • FL-Tuning: Layer Tuning for Feed-Forward Network in Transformer. Jingping Liu, Yuqiu Song, Kui Xue, Hongli Sun, Chao Wang, Lihan Chen, Haiyun Jiang, Jiaqing Liang, Tong Ruan. 30 Jun 2022.
  • MiniViT: Compressing Vision Transformers with Weight Multiplexing. Computer Vision and Pattern Recognition (CVPR), 2022. Jinnian Zhang, Houwen Peng, Kan Wu, Xiyang Dai, Bin Xiao, Jianlong Fu, Lu Yuan. 14 Apr 2022.
  • Transformers in Medical Imaging: A Survey. Fahad Shamshad, Salman Khan, Syed Waqas Zamir, Muhammad Haris Khan, Munawar Hayat, Fahad Shahbaz Khan, Huazhu Fu. 24 Jan 2022.
  • Streaming Transformer Transducer Based Speech Recognition Using Non-Causal Convolution. Yangyang Shi, Chunyang Wu, Dilin Wang, Alex Xiao, Jay Mahadeokar, ..., Ke Li, Yuan Shangguan, Varun K. Nagaraja, Ozlem Kalinli, M. Seltzer. 07 Oct 2021.
  • WeChat Neural Machine Translation Systems for WMT21. Conference on Machine Translation (WMT), 2021. Xianfeng Zeng, Yanjun Liu, Ernan Li, Qiu Ran, Fandong Meng, Peng Li, Jinan Xu, Jie Zhou. 05 Aug 2021.
  • MedGPT: Medical Concept Prediction from Clinical Narratives. Z. Kraljevic, Anthony Shek, D. Bean, R. Bendayan, J. Teo, Richard J. B. Dobson. 07 Jul 2021.
  • A Survey of Transformers. AI Open, 2021. Tianyang Lin, Yuxin Wang, Xiangyang Liu, Xipeng Qiu. 08 Jun 2021.
  • Refiner: Refining Self-attention for Vision Transformers. Daquan Zhou, Yujun Shi, Bingyi Kang, Weihao Yu, Zihang Jiang, Yuan Li, Xiaojie Jin, Qibin Hou, Jiashi Feng. 07 Jun 2021.
  • Vision Transformers with Patch Diversification. Chengyue Gong, Dilin Wang, Meng Li, Vikas Chandra, Qiang Liu. 26 Apr 2021.
  • Going deeper with Image Transformers. IEEE International Conference on Computer Vision (ICCV), 2021. Hugo Touvron, Matthieu Cord, Alexandre Sablayrolles, Gabriel Synnaeve, Edouard Grave. 31 Mar 2021.
  • Multi-Head Attention: Collaborate Instead of Concatenate. Jean-Baptiste Cordonnier, Andreas Loukas, Martin Jaggi. 29 Jun 2020.
  • Global Attention based Graph Convolutional Neural Networks for Improved Materials Property Prediction. Steph-Yves M. Louis, Yong Zhao, Alireza Nasiri, Xiran Wong, Yuqi Song, Fei Liu, Jianjun Hu. 11 Mar 2020.