Multi-Head Attention with Disagreement Regularization
arXiv:1810.10183 · 24 October 2018
Jian Li, Zhaopeng Tu, Baosong Yang, Michael R. Lyu, Tong Zhang
Papers citing "Multi-Head Attention with Disagreement Regularization" (23 of 23 papers shown)
Style4Rec: Enhancing Transformer-based E-commerce Recommendation Systems with Style and Shopping Cart Information
Berke Ugurlu, Ming-Yi Hong, Che Lin · 17 Jan 2025

LookHere: Vision Transformers with Directed Attention Generalize and Extrapolate
A. Fuller, Daniel G. Kyrollos, Yousef Yassin, James R. Green · 22 May 2024

Disentangling the Linguistic Competence of Privacy-Preserving BERT
Stefan Arnold, Nils Kemmerzell, Annika Schreiner · 17 Oct 2023

EIT: Enhanced Interactive Transformer
Tong Zheng, Bei Li, Huiwen Bao, Tong Xiao, Jingbo Zhu · 20 Dec 2022

Explanation on Pretraining Bias of Finetuned Vision Transformer
Bumjin Park, Jaesik Choi · ViT · 18 Nov 2022

Relaxed Attention for Transformer Models
Timo Lohrenz, Björn Möller, Zhengyang Li, Tim Fingscheidt · KELM · 20 Sep 2022

Paying More Attention to Self-attention: Improving Pre-trained Language Models via Attention Guiding
Shanshan Wang, Zhumin Chen, Z. Ren, Huasheng Liang, Qiang Yan, Pengjie Ren · 06 Apr 2022

Universal Simultaneous Machine Translation with Mixture-of-Experts Wait-k Policy
Shaolei Zhang, Yang Feng · MoE · 11 Sep 2021

Discrete Auto-regressive Variational Attention Models for Text Modeling
Xianghong Fang, Haoli Bai, Jian Li, Zenglin Xu, Michael Lyu, Irwin King · 16 Jun 2021

A Survey of Transformers
Tianyang Lin, Yuxin Wang, Xiangyang Liu, Xipeng Qiu · ViT · 08 Jun 2021

Learning Slice-Aware Representations with Mixture of Attentions
Cheng Wang, Sungjin Lee, Sunghyun Park, Han Li, Young-Bum Kim, R. Sarikaya · 04 Jun 2021

On the Sub-Layer Functionalities of Transformer Decoder
Yilin Yang, Longyue Wang, Shuming Shi, Prasad Tadepalli, Stefan Lee, Zhaopeng Tu · 06 Oct 2020

Syntax-Infused Transformer and BERT models for Machine Translation and Natural Language Understanding
Dhanasekar Sundararaman, Vivek Subramanian, Guoyin Wang, Shijing Si, Dinghan Shen, Dong Wang, Lawrence Carin · 10 Nov 2019

Multi-Head Attention with Diversity for Learning Grounded Multilingual Multimodal Representations
Po-Yao (Bernie) Huang, Xiaojun Chang, Alexander G. Hauptmann · 30 Sep 2019

DropAttention: A Regularization Method for Fully-Connected Self-Attention Networks
Zehui Lin, Pengfei Liu, Luyao Huang, Junkun Chen, Xipeng Qiu, Xuanjing Huang · 3DPC · 25 Jul 2019

Empirical Evaluation of Sequence-to-Sequence Models for Word Discovery in Low-resource Settings
Marcely Zanon Boito, Aline Villavicencio, Laurent Besacier · 29 Jun 2019

What Does BERT Look At? An Analysis of BERT's Attention
Kevin Clark, Urvashi Khandelwal, Omer Levy, Christopher D. Manning · MILM · 11 Jun 2019

Neural Keyphrase Generation via Reinforcement Learning with Adaptive Rewards
Hou Pong Chan, Wang Chen, Lu Wang, Irwin King · 10 Jun 2019

Convolutional Self-Attention Networks
Baosong Yang, Longyue Wang, Derek F. Wong, Lidia S. Chao, Zhaopeng Tu · 05 Apr 2019

Modeling Recurrence for Transformer
Jie Hao, Xing Wang, Baosong Yang, Longyue Wang, Jinfeng Zhang, Zhaopeng Tu · 05 Apr 2019

Context-Aware Self-Attention Networks
Baosong Yang, Jian Li, Derek F. Wong, Lidia S. Chao, Xing Wang, Zhaopeng Tu · 15 Feb 2019

Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu, M. Schuster, Z. Chen, Quoc V. Le, Mohammad Norouzi, ..., Alex Rudnick, Oriol Vinyals, G. Corrado, Macduff Hughes, J. Dean · AIMat · 26 Sep 2016

Effective Approaches to Attention-based Neural Machine Translation
Thang Luong, Hieu H. Pham, Christopher D. Manning · 17 Aug 2015